Linux Desktop Search Engines Compared
I have a large electronic library (over 15,000 books) and I was looking for a way to cope with this mass of information. I didn't like the idea of a special catalog, since it would take a lot of manual work to enter the metadata. Besides, my books are in various formats, from HTML, to RTF, to DOC, to PDF, to DjVU. These files lack metadata way too often and I thought a local indexing service with a full-text search might solve my problem. I knew there are more options to choose from than just Google, but I could not find a good modern comparison. Even the table in Wikinfo's Comparison of desktop search software contained too many errors, as I discovered.
I had to compare them myself.
My task imposed certain restrictions on the one hand, but made the others irrelevant on the other hand. So, I was especially interested in a wide gamut of file types, in the ability to add new ones (Epub, fb2, html.zip) and in extensive query language. All software, except for GDS and DocFetcher, was installed from Ubuntu 9.10 repositories.
I have no special preferences regarding the backend, it may be Xapian- or Lucene-based tool, or even a custom backend. On the other hand, Xapian usually requires more disk space, and there is never too much space on desktops.