PhpDig excels at small Web site indexing

Filed under: HowTos

Webmasters looking to provide search capabilities for their site would do well to try out PhpDig, a Web spider and search engine written in PHP with a MySQL backend. There are other open source search engines, all of which have their own advantages. PhpDig just happens to suit the needs of my Information Technology for Greenhouses and Horticulture site. Here's how I got it working.

Webmasters with small sites know the problem of providing useful site search capabilities. Typically, visitors enter keywords in a search box and the search engine returns a ranked list of pages related to the query. This is a useful service -- provided the visitor can tune the search and the results returned are reliable and relevant.

Some Webmasters rely on Google for this service. In practical terms, a listing in Google or another mainstream engine is a must-have, so it is easy enough to piggyback a site-specific search on the main engine -- provided Google understands your site and keeps coming back for updates, which isn't always the case.

Large search engines boast of indexing billions of pages, but we are only interested in digesting a hundred pages or so. We need them indexed on a regular basis -- daily, or at least more often than Google might do it.
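Scheduling that kind of re-index is straightforward once the spider can be run from the command line. Here is a rough sketch of a nightly crontab entry; the /var/www/phpdig path, the admin/spider.php location, and the exact invocation are assumptions to check against your own PhpDig install and its documentation:

    # Re-spider the site every night at 02:30 (paths and invocation are assumptions)
    30 2 * * *  cd /var/www/phpdig/admin && php -f spider.php http://www.example.com/ > /dev/null 2>&1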

It is also important to know that our site is behaving correctly: serving public pages, hiding private pages, and following links as expected. Since Google uses algorithms it doesn't share, we have no way of predicting the indexing results or doing any testing in advance. Advance testing is useful if, for example, you have private files that you want to be sure will not be indexed, but you are relying on your robots.txt file to deny access to bots. A spelling mistake in robots.txt could put our private pages in Google's cache for the world to read. We also need to control which words are indexed and customize our own search and result pages.
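For reference, the rules involved are only a few lines long, which is exactly why a typo is easy to miss. A minimal sketch, with /private/ and /drafts/ standing in for whatever you actually want hidden:

    # Keep crawlers out of the non-public areas of the site
    User-agent: *
    Disallow: /private/
    Disallow: /drafts/

    # A misspelled directive ("Disalow:") or a wrong-case path ("/Private/")
    # is silently ignored, so the "protected" pages get crawled anyway.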

Enter PhpDig.
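As a taste of what the integration looks like, a search box on your own pages can simply point at PhpDig's search script. The sketch below assumes PhpDig lives under /phpdig/ and that the text field is named query_string, as in a typical PhpDig install -- verify both against the templates shipped with your version:

    <!-- Minimal search form; the action path and field name are assumptions -->
    <form method="get" action="/phpdig/search.php">
      <input type="text" name="query_string" size="20">
      <input type="submit" value="Search">
    </form>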
