Keeper of Expired Web Pages Is Sued

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a form called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Harding Earley used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to add similar charges to its original suit against Health Advocate, but the judge denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.

The archive's repository now has approximately one petabyte - roughly one million gigabytes - worth of historical Web site content, much of which would have been lost as Web site owners deleted, changed and otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.

Most search engines program their Web crawlers to recognize a robots.txt file, and follow its commands. The Internet Archive goes a step further, allowing Web site administrators to use the robots.txt file to control the archiving of current content, as well as block access to any older versions already stored in the archive's database before a robots.txt file was put in place.
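As a rough illustration of the convention described above, the following Python sketch shows how a cooperating crawler might consult a robots.txt file before fetching a page. It uses the standard library's urllib.robotparser. The file contents, the example.com URLs and the crawler name "SomeOtherBot" are hypothetical, and the "ia_archiver" user-agent name is an assumption made for illustration; the article does not say which name the archive's crawler answers to or what Healthcare Advocates' file contained.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption, not the file from the suit).
# The first record shuts out one named crawler entirely; the second applies
# to every other crawler and only fences off one directory.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/index.html"
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, "may fetch", page, "->", is_allowed(ROBOTS_TXT, agent, page))
```

The check runs entirely inside the crawler. Nothing on the Web server enforces the answer, so a robot that never calls a function like is_allowed is not stopped by the file.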

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, is accused of breach of contract and fiduciary duty, negligence and other charges for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, the lawyer whose firm is named along with the archive, said, however, that no breach was ever made. "We wouldn't know how to, in effect, bypass a block," he said.

Even if they had, it is unclear that any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

By TOM ZELLER Jr.
The New York Times
