
Keeper of Expired Web Pages Is Sued


The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a form called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to amend similar charges to its original suit against Health Advocate, but the judge denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.

The archive's repository now holds approximately one petabyte - roughly one million gigabytes - of historical Web site content, much of which would have been lost as Web site owners deleted, changed or otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.

Most search engines program their Web crawlers to recognize a robots.txt file, and follow its commands. The Internet Archive goes a step further, allowing Web site administrators to use the robots.txt file to control the archiving of current content, as well as block access to any older versions already stored in the archive's database before a robots.txt file was put in place.
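As a rough sketch of the mechanism described above, a site owner who wanted the Internet Archive both to stop crawling a site and to hide already-archived pages could publish a file like the following at the root of the site. The `ia_archiver` user-agent is the one the archive's crawler has historically answered to; this file is illustrative, not Healthcare Advocates' actual file:

```
# Hypothetical robots.txt served at http://example.com/robots.txt
# Block the Internet Archive's crawler from the entire site.
User-agent: ia_archiver
Disallow: /

# All other crawlers remain free to index everything.
User-agent: *
Disallow:
```

Under the archive's stated policy, such a file not only stops future crawls but also suppresses public access to pages already stored in the Wayback Machine.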

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the requests. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, is accused of breach of contract and fiduciary duty, negligence and other charges for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, the lawyer whose firm is named along with the archive, said, however, that no breach ever took place. "We wouldn't know how to, in effect, bypass a block," he said.

Even if they had, it is unclear that any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."
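Mr. Koster's point that compliance is voluntary is easy to see in code: a cooperating crawler fetches robots.txt, parses it, and simply declines to request disallowed URLs; nothing stops a crawler that skips the check. Below is a minimal sketch using Python's standard urllib.robotparser module (the rules and URLs are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# Rules a site owner might publish. They are only a request:
# a crawler that never runs this check sees no barrier at all.
rules = """
User-agent: *
Disallow: /archive/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A cooperating bot asks permission before each fetch.
print(parser.can_fetch("examplebot", "http://example.com/archive/old-page.html"))  # False
print(parser.can_fetch("examplebot", "http://example.com/index.html"))             # True
```

The enforcement, such as it is, lives entirely in the crawler's decision to call `can_fetch` before downloading a page.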

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

By TOM ZELLER Jr.
The New York Times

More in Tux Machines

Introducing the potential new Ubuntu Studio Council

Back in 2016, Set Hallström was elected as the new Team Lead for Ubuntu Studio, just in time for the 16.04 Xenial Long Term Support (LTS) release. It was intended that Ubuntu Studio would be able to utilise Set's leadership skills at least up until the next LTS release in April 2018. Unfortunately, as happens occasionally in the world of volunteer work, Set's personal circumstances changed and he is no longer able to devote as much time to Ubuntu Studio as he would like. Therefore, an IRC meeting was held between interested Ubuntu Studio contributors on 21st May 2017 to agree on how to fill the void. We decided to follow the lead of Xubuntu and create a Council to take care of Ubuntu Studio, rather than continuing to place the burden of leadership on the shoulders of one particular person. Unfortunately, although the meeting ended with an agreement to form the first Ubuntu Studio Council from its participants, we all got busy and the council was never set up. Read more

today's leftovers

  • My Experience with MailSpring on Linux
    On the Linux Desktop, there are quite a few choices for email applications. Each of these has their own pros and cons which should be weighed depending on one’s needs. Some clients will have MS Exchange support. Others do not. In general, because email is reasonably close to free (and yes, we can thank Hotmail for that) it has been a difficult place to make money. Without a cash flow to encourage developers, development has trickled at best.
  • Useful FFMPEG Commands for Managing Audio and Video Files
  • Set Up A Python Django Development Environment on Debian 9 Stretch Linux
  • How To Run A Command For A Specific Time In Linux
  • Kubuntu 17.10 Guide for Newbie Part 7
  • Why Oppo and Vivo are losing steam in Chinese smartphone market
    China’s smartphone market has seen intense competition over the past few years with four local brands capturing more than 60 percent of sales in 2017. Huawei Technologies, Oppo, Vivo and Xiaomi Technology recorded strong shipment growth on a year-on-year basis. But some market experts warned that Oppo and Vivo may see the growth of their shipments slow this year as users become more discriminating.
  • iPhones Blamed for More than 1,600 Accidental 911 Calls Since October
    The new Emergency SOS feature released by Apple for the iPhone is the one to blame for no fewer than 1,600 false calls to 911 since October, according to dispatchers. And surprisingly, emergency teams in Elk Grove and Sacramento County in California say they receive at least 20 such 911 calls every day from what appears to be an Apple service center. While it's not exactly clear why the iPhones that are probably brought in for repairs end up dialing 911, dispatchers told CBS that the false calls were first noticed in the fall of last year. Apple launched new iPhones in September 2017 and they went on sale later the same month and in November, but it's not clear if these new devices are in any way related to the increasing number of accidental calls to 911.
  • Game Studio Found To Install Malware DRM On Customers' Machines, Defends Itself, Then Apologizes
    The thin line that exists between entertainment industry DRM software and plain malware has been pointed out both recently and in the past. There are many layers to this onion, ranging from Sony's rootkit fiasco, to performance hits on machines thanks to DRM installed by video games, up to and including the insane idea that copyright holders ought to be able to use malware payloads to "hack back" against accused infringers. What is different in more recent times is the public awareness regarding DRM, computer security, and an overall fear of malware. This is a natural kind of progression: as the public becomes more connected and reliant on computer systems and the internet, they likewise become more concerned about those systems. That may likely explain the swift public backlash to a small game-modding studio seemingly installing something akin to malware in every installation of its software, whether from a legitimate purchase or piracy.

Server: Benchmarks, IBM and Red Hat

  • 36-Way Comparison Of Amazon EC2 / Google Compute Engine / Microsoft Azure Cloud Instances vs. Intel/AMD CPUs
    Earlier this week I delivered a number of benchmarks comparing Amazon EC2 instances to bare metal Intel/AMD systems. Due to interest from that, here is a larger selection of cloud instance types from the leading public clouds of Amazon Elastic Compute Cloud, Microsoft Azure, and Google Compute Engine.
  • IBM's Phil Estes on the Turbulent Waters of Container History
    Phil Estes painted a different picture of container history at Open Source 101 in Raleigh last weekend, speaking from the perspective of someone who had a front row seat. To hear him tell it, this rise and success is a story filled with intrigue, and enough drama to keep a daytime soap opera going for a season or two.
  • Red Hat CSA Mike Bursell on 'managed degradation' and open data
    As part of Red Hat's CTO office, chief security architect Mike Bursell has to be informed of security threats past, present and yet to come - as many as 10 years into the future. The open source company has access to a wealth of customers in verticals including health, finance, defence, the public sector and more. So how do these insights inform the company's understanding of the future threat landscape?
  • Red Hat Offers New Decision Management Tech Platform
    Red Hat (NYSE: RHT) has released a platform that will work to support information technology applications and streamline the deployment of rules-based tools in efforts to automate processes for business decision management, ExecutiveBiz reported Thursday.

Vulkan Anniversary and Generic FBDEV Emulation Continues To Be Worked On For DRM Drivers

  • Vulkan Turns Two Years Old, What Do You Hope For Next?
    This last week marked two years since the debut of Vulkan 1.0; you can see our original launch article. My overworked memory missed realizing it by a few days, but it's been a pretty miraculous two years for this high-performance graphics and compute API.
  • Generic FBDEV Emulation Continues To Be Worked On For DRM Drivers
    Noralf Trønnes has spent the past few months working on generic FBDEV emulation for Direct Rendering Manager (DRM) drivers and this week he volleyed his third revision of these patches, which now includes a new in-kernel API along with some clients like a bootsplash system, VT console, and fbdev implementation.