Language Selection

English French German Italian Portuguese Spanish

Enough Keyword Searches. Just Answer My Question.

Filed under
Web

SEARCH engines are so powerful. And they are so pathetically weak.

When it comes to digging up a specific name, date, phrase or price, search engines are unstoppable. The same is true for details from the previously concealed past. For better and worse, any information about any of us - true or false, flattering or compromising - that has ever appeared on a publicly available site is likely to be retrievable forever, or until we run out of electricity for the server farms. Carefree use of e-mail was once a sign of sophistication. Now to trust confidential information to e-mail is to be a rube. Despite the sneering term snail mail, plain old letters are the form of long-distance communication least likely to be intercepted, misdirected, forwarded, retrieved or otherwise inspected by someone you didn't have in mind.

Yet for anything but simple keyword queries, even the best search engines are surprisingly ineffective.

Recently, for example, I was trying to track the changes in California's spending on its schools. In the 1960's, when I was in public school there, the legend was that only Connecticut spent more per student than California did. Now, the legend is that only the likes of Louisiana and Mississippi spend less. Was either belief true? When I finally called an education expert on a Monday morning, she gave me the answer off the top of her head. (Answer: right in spirit, exaggerated in detail.) But that was only after I'd wasted what seemed like hours over the weekend with normal search tools. If it sounds easy, try using keyword searches to find consistent state-by-state data covering the last 40 years.

We live with these imperfections by trying to outguess the engines - what if I put "per capita spending by states" in quotation marks? - and by realizing that they're right for some jobs and wrong for others.

One branch of the federal government is desperate enough for a better search tool that its efforts could be a stimulus for fundamental long-term improvements. Last week, I spent a day at a workshop near Washington for the Aquaint project, whose work is unclassified but has gone virtually unnoticed in the news media. The name stands for "advanced question answering for intelligence," and it refers to a joint effort by the National Security Agency, the C.I.A. and other federal intelligence organizations. To computer scientists, "question answering," or Q.A., means a form of search that does not just match keywords but also scans, parses and "understands" vast quantities of information to respond to queries. An ideal Q.A. system would let me ask, "How has California's standing among states in per-student school funds changed since the 1960's?" - and it would draw from all relevant sources to find the right answer.

In the real Aquaint program, the questions are more likely to be, "Did any potential terrorist just buy an airplane ticket?" or "How strong is the new evidence of nuclear programs in Country X?" The presentations I saw, by scientists at universities and private companies, reported progress on seven approaches to the problem. (The new I.B.M. search technology discussed here last year is also part of the Aquaint project.)

There will be more to say later about this effort. On the bright side, apart from whatever the project does for national security, its innovations could eventually improve civilian search systems, much as the Pentagon's Arpanet eventually became the civilian Internet. Of course, the dark potential in ever more effective search-and-surveillance systems is also obvious.

For the moment, consider several here-and-now innovations that can improve on the standard Google-style list of search hits. Ask Jeeves, whose site is Ask.com, recently introduced two features that enhance its long-established question-and-answer format. One tries to recast search terms into a question that can be answered on the Web; the other offers suggestions to broaden or narrow the search. Answers.com, a free version of what was once called GuruNet, combines conventional search results with questions and answers.

Two related sites, Clusty.com and its parent, Vivisimo.com, categorize the hits from each search, producing a kind of table of contents of results. Another site, Grokker.com, does something similar in a visual form; it is free online or $49 for a desktop version. And the bizarrely named but extremely useful MrSapo.com has become my favorite search portal, because it allows quick, easy comparisons of the results of the same search on virtually any major engine.

By JAMES FALLOWS.

More in Tux Machines

Leftovers: OSS

OSS in the Back End

  • Open Source NFV Part Four: Open Source MANO
    Defined in ETSI ISG NFV architecture, MANO (Management and Network Orchestration) is a layer — a combination of multiple functional entities — that manages and orchestrates the cloud infrastructure, resources and services. It is comprised of, mainly, three different entities — NFV Orchestrator, VNF Manager and Virtual Infrastructure Manager (VIM). The figure below highlights the MANO part of the ETSI NFV architecture.
  • After the hype: Where containers make sense for IT organizations
    Container software and its related technologies are on fire, winning the hearts and minds of thousands of developers and catching the attention of hundreds of enterprises, as evidenced by the huge number of attendees at this week’s DockerCon 2016 event. The big tech companies are going all in. Google, IBM, Microsoft and many others were out in full force at DockerCon, scrambling to demonstrate how they’re investing in and supporting containers. Recent surveys indicate that container adoption is surging, with legions of users reporting they’re ready to take the next step and move from testing to production. Such is the popularity of containers that SiliconANGLE founder and theCUBE host John Furrier was prompted to proclaim that, thanks to containers, “DevOps is now mainstream.” That will change the game for those who invest in containers while causing “a world of hurt” for those who have yet to adapt, Furrier said.
  • Is Apstra SDN? Same idea, different angle
    The company’s product, called Apstra Operating System (AOS), takes policies based on the enterprise’s intent and automatically translates them into settings on network devices from multiple vendors. When the IT department wants to add a new component to the data center, AOS is designed to figure out what needed changes would flow from that addition and carry them out. The distributed OS is vendor-agnostic. It will work with devices from Cisco Systems, Hewlett Packard Enterprise, Juniper Networks, Cumulus Networks, the Open Compute Project and others.
  • MapR Launches New Partner Program for Open Source Data Analytics
    Converged data vendor MapR has launched a new global partner program for resellers and distributors to leverage the company's integrated data storage, processing and analytics platform.
  • A Seamless Monitoring System for Apache Mesos Clusters
  • All Marathons Need a Runner. Introducing Pheidippides
    Activision Publishing, a computer games publisher, uses a Mesos-based platform to manage vast quantities of data collected from players to automate much of the gameplay behavior. To address a critical configuration management problem, James Humphrey and John Dennison built a rather elegant solution that puts all configurations in a single place, and named it Pheidippides.
  • New Tools and Techniques for Managing and Monitoring Mesos
    The platform includes a large number of tools including Logstash, Elasticsearch, InfluxDB, and Kibana.
  • BlueData Can Run Hadoop on AWS, Leave Data on Premises
    We've been watching the Big Data space pick up momentum this year, and Big Data as a Service is one of the most interesting new branches of this trend to follow. In a new development in this space, BlueData, provider of a leading Big-Data-as-a-Service software platform, has announced that the enterprise edition of its BlueData EPIC software will run on Amazon Web Services (AWS) and other public clouds. Essentially, users can now run their cloud and computing applications and services in an Amazon Web Services (AWS) instance while keeping data on-premises, which is required for some companies in the European Union.

today's howtos

Industrial SBC builds on Raspberry Pi Compute Module

On Kickstarter, a “MyPi” industrial SBC using the RPi Compute Module offers a mini-PCIe slot, serial port, wide-range power, and modular expansion. You might wonder why in 2016 someone would introduce a sandwich-style single board computer built around the aging, ARM11 based COM version of the original Raspberry Pi, the Raspberry Pi Compute Module. First off, there are still plenty of industrial applications that don’t need much CPU horsepower, and second, the Compute Module is still the only COM based on Raspberry Pi hardware, although the cheaper, somewhat COM-like Raspberry Pi Zero, which has the same 700MHz processor, comes close. Read more