Keeper of Expired Web Pages Is Sued

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a tool called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to add similar charges to its original suit against Health Advocate, but the judge denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.

The archive's repository now holds approximately one petabyte - roughly one million gigabytes - of historical Web site content, much of which would otherwise have been lost as Web site owners deleted, changed and otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.
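
As a hypothetical illustration (the directory name here is invented), a site owner who wanted all crawlers to skip one part of a site while leaving the rest open would publish a file like this at the site's root:

    User-agent: *
    Disallow: /private/

The User-agent line names the crawlers a rule applies to - an asterisk means all of them - and each Disallow line lists a path prefix those crawlers are asked to avoid.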

Most search engines program their Web crawlers to recognize a robots.txt file, and follow its commands. The Internet Archive goes a step further, allowing Web site administrators to use the robots.txt file to control the archiving of current content, as well as block access to any older versions already stored in the archive's database before a robots.txt file was put in place.
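
The archive's crawler has historically identified itself as ia_archiver, so the convention the archive describes for opting out entirely - including blocking access to pages already stored - would look something like this (a sketch of the published convention, not a guarantee of any particular crawler's behavior):

    User-agent: ia_archiver
    Disallow: /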

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumvention of "technological measures" designed to protect copyrighted materials. The suit further contends that, among other violations, the firm infringed copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, faces claims of breach of contract, breach of fiduciary duty, negligence and other wrongs for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, the lawyer whose firm is named along with the archive, said, however, that no breach ever occurred. "We wouldn't know how to, in effect, bypass a block," he said.

Even if they had, it is unclear that any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

By TOM ZELLER Jr.
The New York Times
