Google is the No. 1 free tool to snoop on friends or strangers. But government agencies, including the Federal Aviation Administration, are investing in a new search engine being developed at the University at Buffalo to do some of their more sensitive detective work.
The technology, released as a prototype in recent weeks, is designed to mine a corpus of documents for associated ideas or connections, such as links between two seemingly unrelated concepts that would otherwise go unseen or take countless hours of investigative work to discover. The project was funded specifically for anti-terrorism efforts and was initially used to search the 9/11 Commission report and public Web pages related to the attacks carried out by terrorists who hijacked four U.S. commercial planes.
"Say you have the kind of question that connects these two people that we don't know about. You could start reading through all those documents. But our system is designed to look specifically for those evidence trails" that connect those two people, said Rohini Srihari, UB professor of computer science and engineering.
John McCarthy, professor emeritus of computer science at Stanford University, said that linking concepts is an old idea, but that a new way of doing it could be an important breakthrough. In general, search engines such as Google and Yahoo mine documents for textual clues, or matches to query terms, rather than for the occurrence of ideas. Still, Google is working in the area of searching for concepts.
"The tools that we already have would be more useful if we could search on concepts," McCarthy said.
Srihari and a team in the Center of Excellence in Document Analysis and Recognition in the UB School of Engineering and Applied Sciences have been developing the search engine for the last two years. She said that her team plans to have a deliverable system for the FAA and the intelligence community by the end of the year, but it will not be widely available to the public. The underlying research, co-funded by the National Science Foundation, will also be published.
The technology, called a concept chain graph, uses several mathematical algorithms to find the best path connecting two different concepts. It then lists the links along that path from strongest to weakest.
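The article does not say which algorithms the UB system uses, so the following is only a minimal sketch of the general idea of finding a best path between two concepts. It assumes concepts are nodes in a graph whose edges carry an association strength between 0 and 1, and that the "best" chain is the one with the highest product of strengths. The function names (`best_chain`, `ranked_links`), the scoring rule, and the sample data are all hypothetical. The search itself is ordinary Dijkstra shortest path over the cost `-log(strength)`.

```python
import heapq
import math


def best_chain(graph, start, goal):
    """Return (score, path) for the strongest chain, or None if unreachable.

    graph maps each concept to {neighbor: strength}, with 0 < strength <= 1.
    Maximizing the product of strengths is the same as minimizing the sum
    of -log(strength), so plain Dijkstra applies.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == goal:
            break
        for nbr, strength in graph.get(node, {}).items():
            new_cost = cost - math.log(strength)
            if new_cost < dist.get(nbr, math.inf):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if goal not in dist:
        return None
    # Walk the predecessor links back from the goal to rebuild the chain.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[goal]), path


def ranked_links(graph, path):
    """List the links along a chain, strongest first."""
    links = [(a, b, graph[a][b]) for a, b in zip(path, path[1:])]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical association strengths, invented for illustration only.
sample_graph = {
    "person_a": {"flight_school": 0.8, "city_x": 0.3},
    "flight_school": {"person_a": 0.8, "person_b": 0.5},
    "city_x": {"person_a": 0.3, "person_b": 0.9},
    "person_b": {"flight_school": 0.5, "city_x": 0.9},
}

if __name__ == "__main__":
    score, path = best_chain(sample_graph, "person_a", "person_b")
    print(path, round(score, 3))
    for a, b, strength in ranked_links(sample_graph, path):
        print(f"{a} -> {b}: {strength}")
```

In a real system the strengths would have to come from the document corpus, for example from how often two concepts appear together, and that extraction step is outside this sketch.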