Debate Over the Size of the Web

How big is the World Wide Web? Many Internet engineers consider that query one of those imponderable philosophical questions, like how many angels can dance on the head of a pin.

But the question about the size of the Web came under intense debate last week after Yahoo announced at an Internet search engine conference in Santa Clara, Calif., that its search engine index - an accounting of the number of documents that can be located from its databases - had reached 19.2 billion.

Because the number was more than twice as large as the number of documents (8.1 billion) currently reported by Google, Yahoo's fierce competitor and Silicon Valley neighbor, the announcement - actually a brief mention in a Yahoo company Web log - set off a spat. Google questioned the way its rival was counting its numbers.

Sergey Brin, Google's co-founder, suggested that the Yahoo index was inflated with duplicate entries in such a way as to cut its effectiveness despite its large size.

"The comprehensiveness of any search engine should be measured by real Web pages that can be returned in response to real search queries and verified to be unique," he said on Friday. "We report the total index size of Google based on this approach."

But Yahoo executives stood by their earlier statement. "The number of documents in our index is accurate," Jeff Weiner, senior vice president of Yahoo's search and marketplace group, said on Saturday. "We're proud of the accomplishments of our search engineers and scientists and look forward to continuing to satisfy our users by delivering the world's highest-quality search experience."

The scope of Internet search engines, and thus indirectly the size of the Internet, has long been a lively area of computer science research and debate.

Moreover, all camps in the discussion are quick to note that index size is only loosely - and possibly even somewhat inversely - related to the quality of results returned.

The major commercial search engines use software programs known as Web crawlers to scour the Internet systematically for documents and index them.

The indexes themselves are maintained as arcane structures of computer data that permit the search engines to return lists of hundreds of answers in fractions of a second when Web users enter terms like "Britney Spears" or "Iraq and weapons of mass destruction."

On Sunday, researchers at the National Center for Supercomputing Applications attempted to shed light on the debate by running a large number of random searches against both indexes. Using a random sample of 10,012 queries, they concluded that Google, on average, returned 166.9 percent more results than Yahoo. In only three percent of the cases did the Yahoo searches return more results than Google. The group said the Yahoo index claim was suspicious.
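For readers curious how a comparison of this kind might be tallied, here is a minimal Python sketch. It is not the researchers' code: the article does not say how they averaged their figures, so the per-query percentage difference used here is an assumption, and the function name and the four sample counts are hypothetical.

```python
from statistics import mean


def compare_result_counts(pairs):
    """Summarize paired result counts, one (google, yahoo) tuple per query.

    Assumption: "percent more" is averaged query by query; the article
    does not state which averaging method the researchers used.
    """
    # Skip queries where Yahoo returned nothing, to avoid dividing by zero.
    usable = [(g, y) for g, y in pairs if y > 0]
    pct_more = [100.0 * (g - y) / y for g, y in usable]
    # Share of all queries on which Yahoo reported more results than Google.
    yahoo_larger = sum(1 for g, y in pairs if y > g)
    return {
        "mean_pct_more_google": mean(pct_more),
        "share_yahoo_larger": yahoo_larger / len(pairs),
    }


# Hypothetical counts for four queries, for illustration only.
sample = [(300, 100), (50, 100), (200, 100), (120, 100)]
summary = compare_result_counts(sample)
print(summary)
```

A different choice, such as comparing the totals across all queries rather than averaging per query, would weight popular search terms more heavily and could yield a noticeably different figure.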

Neither Yahoo nor Google makes public the software algorithms that underlie their collection methods. In fact, those details are closely guarded secrets, which lie near the heart of heated competition now going on between Google, Yahoo and Microsoft over who can provide the most relevant answers to a user's query.

"It's a little bit silly," said Christopher Manning, a Stanford University professor who teaches a course on information retrieval. "It's difficult, and the whole question of how big indexes are has clearly become extremely political and commercial."

Even if the methodology is unclear, there is no shortage of outside speculation about what the different numbers mean, if anything.

Jean Veronis, a linguist in France and director of the Centre Informatique pour les Lettres et Sciences Humaines, posted a discussion on a blog noting that the increase in Yahoo references in French appeared consistent with the larger overall number that Yahoo was now reporting.

He added a caveat, however. "All of this should of course be taken with a large pinch of salt," he wrote. "So far, I haven't quite caught Yahoo red-handed when it comes to fiddling the books, but this could simply be because they are smarter with their figures than their competitors ;-)"

In contrast, a fellow blogger, Akash Jain, did his own random query test and wrote that it appeared that Google's index remained about 50 percent larger.

Other search engine specialists remained skeptical about the ability to estimate Web or index size as long as the search engines were being secretive about their methods. "I don't have any good way of checking," said Raul Valdes-Perez, a computer scientist who is chief executive of Vivisimo, which operates the Clusty search engine. "It feels a little like Harvard and Yale decided to argue over who has the most books in their respective libraries."

By JOHN MARKOFF
The New York Times
