Debate Over the Size of the Web

How big is the World Wide Web? Many Internet engineers consider that query one of those imponderable philosophical questions, like how many angels can dance on the head of a pin.

But the question about the size of the Web came under intense debate last week after Yahoo announced at an Internet search engine conference in Santa Clara, Calif., that its search engine index - an accounting of the number of documents that can be located from its databases - had reached 19.2 billion.

Because the number was more than twice as large as the number of documents (8.1 billion) currently reported by Google, Yahoo's fierce competitor and Silicon Valley neighbor, the announcement - actually a brief mention in a Yahoo company Web log - set off a spat. Google questioned the way its rival was counting its numbers.

Sergey Brin, Google's co-founder, suggested that the Yahoo index was inflated with duplicate entries in such a way as to cut its effectiveness despite its large size.

"The comprehensiveness of any search engine should be measured by real Web pages that can be returned in response to real search queries and verified to be unique," he said on Friday. "We report the total index size of Google based on this approach."

But Yahoo executives stood by their earlier statement. "The number of documents in our index is accurate," Jeff Weiner, senior vice president of Yahoo's search and marketplace group, said on Saturday. "We're proud of the accomplishments of our search engineers and scientists and look forward to continuing to satisfy our users by delivering the world's highest-quality search experience."

The scope of Internet search engines, and thus indirectly the size of the Internet, has long been a lively area of computer science research and debate.

Moreover, all camps in the discussion are quick to note that index size is only loosely - and possibly even somewhat inversely - related to the quality of results returned.

The major commercial search engines use software programs known as Web crawlers to scour the Internet systematically for documents and index them.

The indexes themselves are maintained as arcane structures of computer data that permit the search engines to return lists of hundreds of answers in fractions of a second when Web users enter terms like "Britney Spears" or "Iraq and weapons of mass destruction."

On Sunday, researchers at the National Center for Supercomputing Applications attempted to shed light on the debate by running a large number of random searches against both indexes. From a random sample of 10,012 queries, they concluded that Google, on average, returned 166.9 percent more results than Yahoo. In only three percent of the cases did the Yahoo searches return more results than Google. The group said the Yahoo index claim was suspicious.
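The arithmetic behind a comparison of this kind can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual method, which the article does not describe: the function name, the choice to average per-query percentage differences, and the sample counts are all assumptions made up for the example.

```python
from statistics import mean


def compare_result_counts(pairs):
    """Summarize per-query result counts from two search engines.

    pairs: list of (count_a, count_b) tuples, one per query.
    Returns (average percent by which A exceeds B,
             percent of queries where B returned more than A).
    Hypothetical helper: averaging per-query percentages is an assumption.
    """
    # Skip queries where B returned nothing, to avoid dividing by zero.
    usable = [(a, b) for a, b in pairs if b > 0]
    pct_more = mean((a - b) / b * 100 for a, b in usable)
    # Share of all queries in which B returned more results than A.
    share_b_wins = sum(1 for a, b in pairs if b > a) / len(pairs) * 100
    return pct_more, share_b_wins


# Made-up counts for four queries, purely illustrative.
sample = [(300, 100), (250, 100), (90, 100), (400, 200)]
pct_more, share_b_wins = compare_result_counts(sample)
print(f"A returned {pct_more:.1f} percent more results on average")
print(f"B returned more results in {share_b_wins:.1f} percent of queries")
```

Note that averaging per-query percentages and comparing total result counts can give quite different figures, which is one reason a headline number such as "166.9 percent more" is hard to interpret without the underlying methodology.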

Neither Yahoo nor Google makes public the software algorithms that underlie their collection methods. In fact, those details are closely guarded secrets, which lie near the heart of heated competition now going on between Google, Yahoo and Microsoft over who can provide the most relevant answers to a user's query.

"It's a little bit silly," said Christopher Manning, a Stanford University professor who teaches a course on information retrieval. "It's difficult, and the whole question of how big indexes are has clearly become extremely political and commercial."

Even if the methodology is unclear, there is no shortage of outside speculation about what the different numbers mean, if anything.

Jean Veronis, a linguist in France and director of the Centre Informatique pour les Lettres et Sciences Humaines, posted a discussion on a blog noting that the increase in Yahoo references in French appeared consistent with the larger overall number that Yahoo was now reporting.

He added a caveat, however. "All of this should of course be taken with a large pinch of salt," he wrote. "So far, I haven't quite caught Yahoo red-handed when it comes to fiddling the books, but this could simply be because they are smarter with their figures than their competitors ;-)"

In contrast, a fellow blogger, Akash Jain, did his own random query test and wrote that it appeared that Google's index remained about 50 percent larger.

Other search engine specialists remained skeptical about the ability to estimate Web or index size as long as the search engines were being secretive about their methods. "I don't have any good way of checking," said Raul Valdes-Perez, a computer scientist who is chief executive of Vivisimo, which operates the Clusty search engine. "It feels a little like Harvard and Yale decided to argue over who has the most books in their respective libraries."

By JOHN MARKOFF
The New York Times
