PhpDig excels at small Web site indexing

Webmasters looking to provide search capabilities for their site would do well to try out PhpDig, a Web spider and search engine written in PHP with a MySQL backend. There are other open source search engines, all of which have their own advantages. PhpDig just happens to suit the needs of my Information Technology for Greenhouses and Horticulture site. Here's how I got it working.

Webmasters with small sites know the problem of providing useful site search capabilities. Typically, visitors enter keywords in a search box and the search engine returns a ranked list of pages related to the query. This is a useful service -- provided the visitor can tune the search and the results returned are reliable and relevant.

Some Webmasters rely on Google for this service. A listing in Google or another mainstream engine is a practical must-have, so it is easy enough to piggyback on that engine with a site-specific search. That only works, though, if Google understands your site and keeps coming back for updates, which isn't always the case.

Large search engines boast of indexing billions of pages, but we are only interested in digesting a hundred pages or so. We need them indexed on a regular basis: daily, or at least more often than Google might do it.

It is also important to know whether our site is responding correctly: serving public pages, hiding private pages, and following links as intended. Since Google uses algorithms that it doesn't share, we have no way of predicting the indexing results or doing any testing in advance. Advance testing is useful if, for example, you have private files that must not be indexed and you are relying on your robots.txt file to deny access to bots. If we make a spelling mistake in robots.txt, our private pages could end up in Google's cache for the world to read. We also need to control which words are indexed and to customize our own search and result pages.
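
The robots.txt concern above can be tested before any crawler visits. What follows is a minimal sketch using Python's standard urllib.robotparser module, not anything PhpDig itself provides. The helper name check_paths, the /private/ path, and both sample files are invented for illustration. Other crawlers, including Google's and PhpDig's spider, may interpret a malformed file differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def check_paths(robots_txt, paths, agent="*"):
    """Return {path: allowed}, where allowed is True if `agent` may fetch path."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


# Hypothetical robots.txt contents; "/private/" is an invented example path.
CORRECT = """\
User-agent: *
Disallow: /private/
"""

# Same file with the directive misspelled ("Dissallow").
MISSPELLED = """\
User-agent: *
Dissallow: /private/
"""

if __name__ == "__main__":
    pages = ["/index.html", "/private/notes.html"]
    for label, text in (("correct", CORRECT), ("misspelled", MISSPELLED)):
        print(label, check_paths(text, pages))
```

With this parser, the misspelled directive is silently ignored, so the private page is reported as fetchable. That is exactly the kind of mistake worth catching on your own machine first.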

Enter PhpDig.
