Keeper of Expired Web Pages Is Sued

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a form called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to add similar charges to its original suit against Health Advocate, but the judge denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.
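
How those stored copies are recalled can be illustrated with the archive's public "availability" interface, which returns the capture closest to a requested date. The sketch below is illustrative only, not code from the archive or from anyone in the case; the endpoint and field names follow the Wayback Machine's published JSON interface, and example.com is a placeholder.

    # Minimal sketch: ask the Wayback Machine's availability API for the archived
    # capture of a site closest to a given date. "example.com" is a placeholder;
    # the endpoint and field names follow the archive's published JSON interface.
    import json
    import urllib.parse
    import urllib.request

    def closest_snapshot(url, timestamp="19990101"):
        query = urllib.parse.urlencode({"url": url, "timestamp": timestamp})
        with urllib.request.urlopen("https://archive.org/wayback/available?" + query) as resp:
            data = json.load(resp)
        # "closest" is missing when the archive holds no capture of the URL.
        return data.get("archived_snapshots", {}).get("closest")

    snap = closest_snapshot("example.com")
    print(snap["url"] if snap else "no archived capture found")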

The archive's repository now has approximately one petabyte - roughly one million gigabytes - worth of historical Web site content, much of which would have been lost as Web site owners deleted, changed and otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.

Most search engines program their Web crawlers to recognize a robots.txt file, and follow its commands. The Internet Archive goes a step further, allowing Web site administrators to use the robots.txt file to control the archiving of current content, as well as block access to any older versions already stored in the archive's database before a robots.txt file was put in place.
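
In practice, the convention is just a small text file: a site owner publishes lines such as "User-agent: *" and "Disallow: /old-pages/" in robots.txt, and a cooperating crawler checks them before fetching anything. The following minimal sketch uses Python's standard-library parser; the site address and crawler name are placeholders, and, as the article notes below, nothing compels a robot to run this check at all.

    # Minimal sketch of how a cooperating crawler consults robots.txt before
    # fetching a page. The site URL and user-agent name are placeholders.
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()  # download and parse the file; compliance is entirely voluntary

    for path in ("/", "/old-pages/"):
        page = "https://www.example.com" + path
        verdict = "allowed" if rp.can_fetch("ExampleArchiveBot", page) else "disallowed"
        print(path, verdict)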

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, is accused of breach of contract and fiduciary duty, negligence and other charges for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, the lawyer whose firm is named along with the archive, said, however, that no breach was ever made. "We wouldn't know how to, in effect, bypass a block," he said.

Even if they had, it is unclear whether any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention. "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

The New York Times

More in Tux Machines

Linux 4.8.4

I'm announcing the release of the 4.8.4 kernel. And yeah, sorry about the quicker releases, I'll be away tomorrow and as they seem to have passed all of the normal testing, I figured it would be better to get them out earlier instead of later. And I like releasing stuff on this date every year... All users of the 4.8 kernel series must upgrade. The updated 4.8.y git tree can be found at: git:// linux-4.8.y and can be browsed at the normal git web browser. Read more. Also: Linux 4.7.10, Linux 4.4.27

New Releases: Budgie, Solus, SalentOS, and Slackel

  • Open-Source Budgie Desktop Sees New Release
    The pet parakeet of the Linux world, Budgie, has a new release available for download. In this post we look at what's new and tell you how you can get it.
  • Solus Linux Making Performance Gains With Its BLAS Configuration
    Those making use of the promising Solus Linux distribution will soon find their BLAS-based workloads are faster. Solus developer Peter O'Connor tweeted this week that he's found some issues with the BLAS linking on the distribution and he's made fixes for Solus. He also mentioned that he uncovered these BLAS issues by using our Phoronix Test Suite benchmarking software.
  • SalentOS “Luppìu” 1.0 released!
    With great pleasure the team announces the release of SalentOS “Luppìu” 1.0.
  • Slackel "Live kde" 4.14.21
    This release is available in both 32-bit and 64-bit architectures; the 64-bit iso images support booting on UEFI systems. The 32-bit iso images support both i686 PAE SMP and i486, non-PAE capable systems. Iso images are isohybrid.

Security News

  • Free tool protects PCs from master boot record attacks [Ed: UEFI has repeatedly been found to be both a detriment to security and an enabler of Microsoft lock-in]
    Cisco's Talos team has developed an open-source tool that can protect the master boot record of Windows computers from modification by ransomware and other malicious attacks. The tool, called MBRFilter, functions as a signed system driver and puts the disk's sector 0 into a read-only state. It is available for both 32-bit and 64-bit Windows versions and its source code has been published on GitHub. The master boot record (MBR) consists of executable code that's stored in the first sector (sector 0) of a hard disk drive and launches the operating system's boot loader. The MBR also contains information about the disk's partitions and their file systems. Since the MBR code is executed before the OS itself, it can be abused by malware programs to increase their persistence and gain a head start before antivirus programs. Malware programs that infect the MBR to hide from antivirus programs have historically been known as bootkits -- boot-level rootkits. Microsoft attempted to solve the bootkit problem by implementing cryptographic verification of the bootloader in Windows 8 and later. This feature is known as Secure Boot and is based on the Unified Extensible Firmware Interface (UEFI) -- the modern BIOS. (A read-only sketch of inspecting sector 0 follows this list.)
  • DDOS Attack On Internet Infrastructure
    I hope somebody's paying attention. There's been another big DDoS attack, this time against the infrastructure of the Internet. It began at 7:10 a.m. EDT today against Dyn, a major DNS host, and was brought under control at 9:36 a.m. According to Gizmodo, which was the first to report the story, at least 40 sites were made unreachable to users on the US East Coast. Many of the sites affected are among the most trafficked on the web, including CNN, Twitter, PayPal, Pinterest and Reddit, to name a few. The developer community was also affected, as GitHub was made unreachable. This event comes on the heels of a record-breaking 620 Gbps DDoS attack about a month ago that brought down security expert Brian Krebs' website, KrebsOnSecurity. In that attack, Krebs determined that the traffic had been launched by botnets primarily made up of compromised IoT devices, and it was seen by some as ushering in a new era of Internet security woes.
  • This Is Why Half the Internet Shut Down Today [Update: It’s Getting Worse]
    Twitter, Spotify and Reddit, and a huge swath of other websites were down or screwed up this morning. This was happening as hackers unleashed a large distributed denial of service (DDoS) attack on the servers of Dyn, a major DNS host. It’s probably safe to assume that the two situations are related.
  • Major DNS provider Dyn hit with DDoS attack
    Attacks against DNS provider Dyn continued into Friday afternoon. Shortly before noon, the company said it began "monitoring and mitigating a DDoS attack" against its Dyn Managed DNS infrastructure. The attack may also have impacted Managed DNS advanced service "with possible delays in monitoring."
  • What We Know About Friday’s Massive East Coast Internet Outage
    Friday morning is prime time for some casual news reading, tweeting, and general Internet browsing, but you may have had some trouble accessing your usual sites and services this morning and throughout the day, from Spotify and Reddit to the New York Times and even good ol’ For that, you can thank a distributed denial of service attack (DDoS) that took down a big chunk of the Internet for most of the Eastern seaboard. This morning’s attack started around 7 am ET and was aimed at Dyn, an Internet infrastructure company headquartered in New Hampshire. That first bout was resolved after about two hours; a second attack began just before noon. Dyn reported a third wave of attacks a little after 4 pm ET. In all cases, traffic to Dyn’s Internet directory servers throughout the US—primarily on the East Coast but later on the opposite end of the country as well—was stopped by a flood of malicious requests from tens of millions of IP addresses disrupting the system. Late in the day, Dyn described the events as a “very sophisticated and complex attack.” Still ongoing, the situation is a definite reminder of the fragility of the web, and the power of the forces that aim to disrupt it.
  • Either IoT will be secure or the internet will be crippled forever
    First things first, a disclaimer. I neither like nor trust the National Security Agency (NSA). I believe them to be mainly engaged in economic spying for the corporate American empire. Glenn Greenwald has clearly proven that in his book No Place to Hide. At the NSA, profit and power come first and I have no fucking clue as to how high they prioritize national security. Having said that, the NSA should hack the Internet of (insecure) Things (IoT) to death. I know Homeland Security and the FBI are investigating where the DDoS of doomsday proportions is coming from and the commentariat is already screaming RUSSIA! But it is really no secret what is enabling this clusterfuck. It’s the Mirai botnet. If you buy a “smart camera” from the Chinese company Hangzhou XiongMai Technologies and do not change the default password, it will be part of a botnet five minutes after you connect it to the internet. We were promised a future where we would have flying cars, but we’re living in a future where cameras, light bulbs, doorbells and fridges can get you in serious trouble because your home appliances are breaking the law.
  • IoT at the Network Edge
    Fog computing, also known as fog networking, is a decentralized computing infrastructure. Computing resources and application services are distributed in logical, efficient places at any point along the connection from the data source (endpoint) to the cloud. The concept is to process data locally and then use the network for communicating with other resources for further processing and analysis. Data could be sent to a data center or a cloud service. A worthwhile reference published by Cisco is the white paper, "Fog Computing and the Internet of Things: Extend the Cloud to Where the Things Are."
  • Canonical now offers live kernel patching for Ubuntu 16.04 LTS users
    Canonical has announced its ‘Livepatch Service’, which any user can enable on their current installations to eliminate the need for rebooting their machine after installing an update for the Linux kernel. With the release of Linux 4.0, users have been able to update their kernel packages without rebooting; however, Ubuntu will be the first distribution to offer this feature for free.
  • The Dirty Cow Linux bug: A silly name for a serious problem
    Dirty Cow is a silly name, but it's a serious Linux kernel problem. According to the Red Hat bug report, "a race condition was found in the way the Linux kernel's memory subsystem handled the copy-on-write (COW) breakage of private read-only memory mappings. An unprivileged local user could use this flaw to gain write access to otherwise read-only memory mappings and thus increase their privileges on the system." (A short illustration of a private copy-on-write mapping follows this list.)
  • Ancient Privilege Escalation Bug Haunts Linux
  • Is Dirty COW a serious concern for Linux?
  • There is a Dirty Cow in Linux
  • Red Hat Discovers Dirty COW Archaic Linux Kernel Flaw Exploited In The Wild
  • Linux kernel bug being exploited in the wild
  • Update Linux now: Critical privilege escalation security flaw gives hackers full root access
  • Linux kernel bug: DirtyCOW “easyroot” hole and what you need to know
  • 'Most serious' Linux privilege-escalation bug ever discovered
  • New 'Dirty Cow' vulnerability threatens Linux systems
  • Serious Dirty Cow Linux Vulnerability Under Attack
  • Easy-to-exploit rooting flaw puts Linux PCs at risk
  • Linux just patched a vulnerability it's had for 9 years
  • Dirty COW Linux vulnerability has existed for nine years
  • 'Dirty Cow' Linux Vulnerability Found
  • 'Dirty Cow' Linux Vulnerability Found After Nine Years
  • FakeFile Trojan Opens Backdoors on Linux Computers, Except openSUSE
    Malware authors are taking aim at Linux computers, more precisely desktops and not servers, with a new trojan named FakeFile, currently distributed in live attacks. Russian antivirus vendor Dr.Web discovered this new trojan in October. The company's malware analysts say the trojan is spread in the form of an archived PDF, Microsoft Office, or OpenOffice file.
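
The MBRFilter item above describes the master boot record as boot code and partition data held in a disk's first 512-byte sector, sector 0, ending in a 0x55AA boot signature. The following read-only sketch merely inspects that sector; it is not MBRFilter, which is a Windows driver, and the default device path is a placeholder that requires root privileges to open.

    # Read-only sketch: dump sector 0 (the 512-byte MBR), check the 0x55AA boot
    # signature and list non-empty partition entries. The device path is a
    # placeholder; opening a raw disk device requires root privileges.
    import struct
    import sys

    def describe_mbr(device="/dev/sda"):
        with open(device, "rb") as disk:
            mbr = disk.read(512)
        if len(mbr) != 512:
            raise ValueError("short read; expected a full 512-byte sector")
        signature = struct.unpack_from("<H", mbr, 510)[0]
        print("boot signature:", hex(signature), "(ok)" if signature == 0xAA55 else "(missing)")
        # The partition table is four 16-byte entries starting at offset 446.
        for i in range(4):
            entry = mbr[446 + 16 * i:446 + 16 * (i + 1)]
            if entry[4]:  # byte 4 of an entry is the partition type; 0 means unused
                print("partition", i + 1, "type", hex(entry[4]))

    describe_mbr(sys.argv[1] if len(sys.argv) > 1 else "/dev/sda")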
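
The Dirty COW items above hinge on what a private, copy-on-write mapping of a read-only file is supposed to guarantee: writes land in the process's own copy and never reach the file on disk. The sketch below demonstrates only that normal behaviour, using Python's mmap module; it is not the exploit, which raced the kernel into letting such writes reach the underlying file.

    # Sketch of a private copy-on-write mapping, the mechanism Dirty COW subverted.
    # Writes through an ACCESS_COPY mapping modify process memory only; the
    # read-only file on disk is left untouched. This shows the intended behaviour.
    import mmap
    import os
    import tempfile

    fd, path = tempfile.mkstemp()
    os.write(fd, b"original contents")
    os.close(fd)
    os.chmod(path, 0o400)  # file is now read-only on disk

    with open(path, "rb") as f:
        cow = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
        cow[:8] = b"modified"  # permitted: the write lands in a private copy
        print("mapping sees :", bytes(cow[:]))

    with open(path, "rb") as f:
        print("file still has:", f.read())  # unchanged: b'original contents'

    os.remove(path)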

today's howtos