Enough Keyword Searches. Just Answer My Question.

SEARCH engines are so powerful. And they are so pathetically weak.

When it comes to digging up a specific name, date, phrase or price, search engines are unstoppable. The same is true for details from the previously concealed past. For better and worse, any information about any of us, true or false, flattering or compromising, that has ever appeared on a publicly available site is likely to be retrievable forever, or until we run out of electricity for the server farms. Carefree use of e-mail was once a sign of sophistication. Now to trust confidential information to e-mail is to be a rube. Despite the sneering term "snail mail," plain old letters are the form of long-distance communication least likely to be intercepted, misdirected, forwarded, retrieved or otherwise inspected by someone you didn't have in mind.

Yet for anything but simple keyword queries, even the best search engines are surprisingly ineffective.

Recently, for example, I was trying to track the changes in California's spending on its schools. In the 1960's, when I was in public school there, the legend was that only Connecticut spent more per student than California did. Now, the legend is that only the likes of Louisiana and Mississippi spend less. Was either belief true? When I finally called an education expert on a Monday morning, she gave me the answer off the top of her head. (Answer: right in spirit, exaggerated in detail.) But that was only after I'd wasted what seemed like hours over the weekend with normal search tools. If it sounds easy, try using keyword searches to find consistent state-by-state data covering the last 40 years.

We live with these imperfections by trying to outguess the engines - what if I put "per capita spending by states" in quotation marks? - and by realizing that they're right for some jobs and wrong for others.
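
To make the quotation-mark trick concrete, here is a minimal Python sketch contrasting plain keyword matching (every word appears somewhere, in any order) with quoted-phrase matching (the words appear together, in order). It is a toy, not how any real engine ranks pages; the function names `keyword_match` and `phrase_match` and the three sample documents are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_match(query, doc):
    """True if every query word appears somewhere in the document, in any order."""
    doc_words = set(tokenize(doc))
    return all(word in doc_words for word in tokenize(query))

def phrase_match(query, doc):
    """True if the query words appear consecutively and in order (a quoted search)."""
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Hypothetical sample documents, invented for illustration.
docs = [
    "Per capita spending by states on prisons rose sharply.",
    "States differ in spending; per capita income is tracked by region.",
    "School funds per student, state by state, since the 1960s.",
]
query = "per capita spending by states"

for doc in docs:
    print(keyword_match(query, doc), phrase_match(query, doc), doc)
```

Quoting the phrase weeds out the second document, which merely contains the same words, but neither method finds the third document, the one that actually addresses school funding per student. That gap is the problem question-answering systems try to close.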

One branch of the federal government is desperate enough for a better search tool that its efforts could be a stimulus for fundamental long-term improvements. Last week, I spent a day at a workshop near Washington for the Aquaint project, whose work is unclassified but has gone virtually unnoticed in the news media. The name stands for "advanced question answering for intelligence," and it refers to a joint effort by the National Security Agency, the C.I.A. and other federal intelligence organizations. To computer scientists, "question answering," or Q.A., means a form of search that does not just match keywords but also scans, parses and "understands" vast quantities of information to respond to queries. An ideal Q.A. system would let me ask, "How has California's standing among states in per-student school funds changed since the 1960's?" - and it would draw from all relevant sources to find the right answer.

In the real Aquaint program, the questions are more likely to be, "Did any potential terrorist just buy an airplane ticket?" or "How strong is the new evidence of nuclear programs in Country X?" The presentations I saw, by scientists at universities and private companies, reported progress on seven approaches to the problem. (The new I.B.M. search technology discussed here last year is also part of the Aquaint project.)

There will be more to say later about this effort. On the bright side, apart from whatever the project does for national security, its innovations could eventually improve civilian search systems, much as the Pentagon's Arpanet eventually became the civilian Internet. Of course, the dark potential in ever more effective search-and-surveillance systems is also obvious.

For the moment, consider several here-and-now innovations that can improve on the standard Google-style list of search hits. Ask Jeeves recently introduced two features that enhance its long-established question-and-answer format. One tries to recast search terms into a question that can be answered on the Web; the other offers suggestions to broaden or narrow the search. Another site, a free version of what was once called GuruNet, combines conventional search results with questions and answers.

Two related sites, one of them the parent of the other, categorize the hits from each search, producing a kind of table of contents of results. Another site does something similar in visual form; it is free online or $49 for a desktop version. And one bizarrely named but extremely useful site has become my favorite search portal, because it allows quick, easy comparisons of the results of the same search on virtually any major engine.
