Keeper of Expired Web Pages Is Sued

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a form called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Harding Earley used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had already tried to add similar charges to its original suit against Health Advocate, and the judge had denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.
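The capture-and-recall cycle described above can be pictured as a time-indexed store. The following Python sketch is a hypothetical, greatly simplified model, not the Internet Archive's software: the `WaybackIndex` class and its method names are invented for illustration, and it assumes 14-digit `YYYYMMDDhhmmss` timestamps and a lookup rule of "latest capture on or before the requested moment".

```python
import bisect
from collections import defaultdict
from typing import Optional, Tuple

class WaybackIndex:
    """Hypothetical time-indexed store of periodic page captures.

    Timestamps are 14-digit strings (YYYYMMDDhhmmss), so ordinary string
    comparison orders them chronologically.
    """

    def __init__(self) -> None:
        self._times = defaultdict(list)  # url -> sorted capture timestamps
        self._bodies = {}                # (url, timestamp) -> stored copy

    def capture(self, url: str, timestamp: str, body: str) -> None:
        """Record one automated crawl of a URL."""
        if (url, timestamp) not in self._bodies:
            bisect.insort(self._times[url], timestamp)
        self._bodies[(url, timestamp)] = body

    def recall(self, url: str, when: str) -> Optional[Tuple[str, str]]:
        """Return (timestamp, body) of the latest capture at or before `when`."""
        times = self._times.get(url, [])
        i = bisect.bisect_right(times, when)
        if i == 0:
            return None  # nothing captured that early
        ts = times[i - 1]
        return ts, self._bodies[(url, ts)]

if __name__ == "__main__":
    idx = WaybackIndex()
    idx.capture("http://www.example.com/", "19990501120000", "1999 version")
    idx.capture("http://www.example.com/", "20010301080000", "2001 version")
    # The live page may have changed since, but the stored copies remain.
    print(idx.recall("http://www.example.com/", "20000101000000"))
    # ('19990501120000', '1999 version')
```

Each periodic crawl adds another timestamp for the URL, so a page its owner later deletes or rewrites stays retrievable as it stood on the date it was captured.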

The archive's repository now has approximately one petabyte - roughly one million gigabytes - worth of historical Web site content, much of which would have been lost as Web site owners deleted, changed and otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.
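To make the convention concrete, here is a small sketch using Python's standard-library `urllib.robotparser` module, which implements this kind of robots.txt matching. The rules, the example.com URLs and the bot names are illustrative assumptions. `ia_archiver` is assumed here as the name an archive crawler answers to (it is the name the Internet Archive's exclusion instructions have historically referenced), and none of this reproduces the file Healthcare Advocates actually posted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named archive crawler entirely,
# and keep every other robot out of /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("ia_archiver", "http://www.example.com/"),
        ("SomeSearchBot", "http://www.example.com/"),
        ("SomeSearchBot", "http://www.example.com/private/notes.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if rp.can_fetch(agent, url) else "disallowed"
        print(agent, url, verdict)
```

Each `User-agent` group applies only to the robots it names, with `*` serving as the fallback for every other robot.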

Most search engines program their Web crawlers to recognize a robots.txt file and follow its commands. The Internet Archive goes a step further, letting Web site administrators use the robots.txt file not only to control the archiving of current content but also to block access to older versions that were stored in the archive's database before the robots.txt file was put in place.
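The retroactive behavior is the unusual part: the check happens when someone asks to view an old copy, not only when the page is crawled. A hypothetical sketch of that design follows, built on Python's standard `urllib.robotparser` module. The `SnapshotArchive` class, its methods and the `ARCHIVE_AGENT` name are assumptions for illustration, not how the Internet Archive actually implements it.

```python
from dataclasses import dataclass, field
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent name the archive checks at replay time.
ARCHIVE_AGENT = "ia_archiver"

@dataclass
class SnapshotArchive:
    """Toy archive: stored page copies keyed by (url, capture timestamp)."""
    snapshots: dict = field(default_factory=dict)

    def store(self, url: str, timestamp: str, html: str) -> None:
        self.snapshots[(url, timestamp)] = html

    def replay(self, url: str, timestamp: str, current_robots_txt: str) -> str:
        """Return an old copy only if the site's *current* robots.txt allows it.

        The rules consulted are today's, not those in force at capture time,
        so a Disallow added later hides copies captured earlier.
        """
        rp = RobotFileParser()
        rp.parse(current_robots_txt.splitlines())
        if not rp.can_fetch(ARCHIVE_AGENT, url):
            raise PermissionError(f"blocked by the site's robots.txt: {url}")
        return self.snapshots[(url, timestamp)]

if __name__ == "__main__":
    archive = SnapshotArchive()
    archive.store("http://www.example.com/", "19990501120000",
                  "<html>1999 home page</html>")
    # No robots.txt rules yet: the 1999 copy is served.
    print(archive.replay("http://www.example.com/", "19990501120000", ""))
    # The owner later adds an exclusion: the same stored copy is refused.
    try:
        archive.replay("http://www.example.com/", "19990501120000",
                       "User-agent: ia_archiver\nDisallow: /\n")
    except PermissionError as err:
        print(err)
```

Because the decision is made at replay time against whatever file the site serves today, a snapshot taken in 1999 becomes invisible as soon as the owner adds a matching `Disallow` rule, even though in this sketch the stored copy itself is never deleted.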

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, is accused of breach of contract and fiduciary duty, negligence and other charges for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, the lawyer whose firm is named along with the archive, said, however, that no breach was ever made. "We wouldn't know how to, in effect, bypass a block," he said.

Even if they had, it is unclear that any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."
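Koster's point that compliance is voluntary shows up directly in crawler code: a robots.txt parser only answers the question "may I?", and it is up to the robot to act on the answer. In this sketch the `crawl` function, the `obey_robots` flag and the `ExampleBot` agent name are hypothetical, and a plain dictionary stands in for a live Web site so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

def crawl(urls, robots_txt, fetch, agent="ExampleBot", obey_robots=True):
    """Fetch each URL and return {url: body}.

    robots.txt is consulted only when obey_robots is True; nothing in the
    convention itself prevents a robot from skipping the check.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    pages = {}
    for url in urls:
        if obey_robots and not rp.can_fetch(agent, url):
            continue  # a cooperating robot honors the Disallow rule
        pages[url] = fetch(url)
    return pages

if __name__ == "__main__":
    # A dictionary stands in for the live site (hypothetical pages).
    site = {
        "http://www.example.com/": "home page",
        "http://www.example.com/old/": "old page",
    }
    rules = "User-agent: *\nDisallow: /old/\n"
    print(crawl(site, rules, site.get))                     # home page only
    print(crawl(site, rules, site.get, obey_robots=False))  # both pages
```

The only thing standing between the non-cooperating call and the "disallowed" page is the robot's own choice to skip the check.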

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

By TOM ZELLER Jr.
The New York Times