The Road to KDE 4: Strigi and File Information Extraction

After a short delay due to a heavy dose of Real Life(tm), I return to bring you more on the technologies behind KDE 4. This week I am featuring Strigi, an information extraction subsystem that is being fully deployed for KDE 4.0. KDE has previously had the ability to extract information about files of various types, and has used it in a variety of functional contexts, such as the Properties Dialog. Strigi promises many improvements over the existing implementation. Read on for more...

Strigi is a library that sits at a lower level than KDE. It is written in C++, and is designed to present a series of generic calls that a program can use to find more information about a given file or files. It is in no way tied to KDE except that the development version lives in KDE's SVN repository. It also has search capabilities, which are not really the focus of this article.

The Strigi libraries are used to get information from within files, such as the dimensions of an image, the length of an audio clip, embedded thumbnails, the number of lines in a log, or source code licensing info, or just to search a text file for a given string. Strigi has another advantage: it works seamlessly inside compressed files, archives, and so forth. In fact, it ships with a couple of useful utility programs, deepgrep and deepfind. These command line programs let you search for information within binary file formats as easily as using grep or find on plain text files. KDE is inheriting the same libraries, so we also get the ability to pull information out of files that are buried within binary formats, such as .tgz files.
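To make the "search inside archives" idea concrete, here is a minimal sketch in Python. It does not use Strigi or its API; it stands in the standard library's tarfile module to show the general technique of searching text that is buried inside a (possibly nested) .tgz file. The names deep_grep and make_tgz are hypothetical helpers invented for this illustration.

```python
import io
import tarfile


def deep_grep(fileobj, needle, prefix=""):
    """Return (path, line_number, line) hits for `needle` inside a tar archive.

    `fileobj` is a seekable binary file object holding a tar archive,
    optionally compressed. Archives nested inside it are searched too.
    This is an illustration only, not how Strigi is implemented.
    """
    hits = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            path = prefix + member.name
            data = tar.extractfile(member).read()
            try:
                # If the member is itself an archive, descend into it.
                hits.extend(deep_grep(io.BytesIO(data), needle, path + "!/"))
                continue
            except tarfile.ReadError:
                pass  # Not an archive: treat it as text below.
            text = data.decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    hits.append((path, number, line))
    return hits


def make_tgz(files):
    """Demo helper (hypothetical): build an in-memory .tgz from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    inner = make_tgz({"notes.txt": b"first line\nKDE 4 is coming\n"})
    outer = make_tgz({"readme.txt": b"nothing here\n",
                      "inner.tgz": inner.getvalue()})
    for path, number, line in deep_grep(outer, "KDE"):
        print(f"{path}:{number}: {line}")
```

The sketch reads each archive member into memory and tries to open it as a further archive before falling back to a line-by-line text search, which is enough to show why a tool that understands container formats can find strings that plain grep would miss.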

Full Story.
