Short bio: Computer Scientist, FOSS supporter
Tux Machines (TM)-specific
This article continues the analysis of DistroWatch.com's logs that I started one week ago. Last time, the data were prepared so that we could investigate the evolution, in time and space, of the popularity of GNU/Linux distributions. Pre-processing the logs differently allows us to focus on other interesting questions. Although the extracted patterns will have the same "shape" as last week's, this time they will help us discover groups of distributions that fulfill similar purposes.
Instead of last week's ternary relation, this time we will end up mining a 4-ary relation: more precisely, a symmetric graph of distributions that evolves in time and space. Take the red pill and welcome to the real world... of data mining! When a visitor to DistroWatch.com (identified by her IP address, from which her country is inferred) visits pages related to different distributions on the same day, she is probably searching for a Free operating system that fulfills her specific needs. Each such pair of distributions, together with the day and the visitor's country, therefore becomes one element of the 4-ary relation (distribution × distribution × day × country).
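The pre-processing described above can be sketched in a few lines. This is a minimal illustration, not the code used for this analysis: it assumes the logs have already been reduced to hypothetical `(ip, day, country, distribution)` records, with the country already inferred from the IP address. Every pair of distinct distributions consulted by the same visitor on the same day yields a tuple of the 4-ary relation, stored in both orientations so that the graph stays symmetric.

```python
from collections import defaultdict
from itertools import combinations


def build_relation(records):
    """Build the symmetric 4-ary relation from pre-processed log records.

    Assumption: each record is a hypothetical (ip, day, country, distribution)
    tuple; the real DistroWatch.com log format is not described here.
    Returns the set of (dist_a, dist_b, day, country) tuples such that at
    least one visitor from `country` consulted both distributions on `day`.
    """
    # Group, per visitor and per day, the distributions whose pages were seen.
    visited = defaultdict(set)  # (ip, day, country) -> {distributions}
    for ip, day, country, dist in records:
        visited[(ip, day, country)].add(dist)

    relation = set()
    for (_ip, day, country), dists in visited.items():
        # Each pair of distinct distributions seen by the same visitor on the
        # same day becomes an edge, added in both directions (symmetric graph).
        for a, b in combinations(sorted(dists), 2):
            relation.add((a, b, day, country))
            relation.add((b, a, day, country))
    return relation
```

A visitor who consulted a single distribution that day contributes nothing, since no pair can be formed; the same IP address seen on two different days is treated as two separate visits.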
Among the millions of visits in a semester, almost every pair of distribution pages has been consulted, on some day, by at least one visitor.
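Because almost every pair co-occurs at least once, keeping a tuple as soon as a single visitor supports it would produce a nearly complete graph. One common remedy, sketched here as an assumption rather than as the criterion actually used in this analysis, is to keep a `(dist_a, dist_b, day, country)` tuple only when enough distinct visitors support it. The threshold `min_visitors` is a hypothetical parameter, and the input records use the same hypothetical `(ip, day, country, distribution)` layout as an illustrative pre-processed log.

```python
from collections import defaultdict
from itertools import combinations


def build_thresholded_relation(records, min_visitors):
    """Keep (dist_a, dist_b, day, country) only if at least `min_visitors`
    distinct visitors (IP addresses) from `country` consulted both
    distributions on `day`.

    Assumptions: records are hypothetical (ip, day, country, distribution)
    tuples, and `min_visitors` is an illustrative, not documented, threshold.
    """
    # Distributions consulted by each visitor on each day.
    visited = defaultdict(set)  # (ip, day, country) -> {distributions}
    for ip, day, country, dist in records:
        visited[(ip, day, country)].add(dist)

    # Distinct visitors supporting each unordered pair, per day and country.
    supporters = defaultdict(set)  # (a, b, day, country) -> {ips}, with a < b
    for (ip, day, country), dists in visited.items():
        for a, b in combinations(sorted(dists), 2):
            supporters[(a, b, day, country)].add(ip)

    relation = set()
    for (a, b, day, country), ips in supporters.items():
        if len(ips) >= min_visitors:
            # Store both orientations so the graph remains symmetric.
            relation.add((a, b, day, country))
            relation.add((b, a, day, country))
    return relation
```

With `min_visitors=1`, every co-visited pair is kept, which is exactly the nearly complete graph described above; raising the threshold retains only associations made by several visitors from the same country on the same day.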
Let us start with the biggest community: the old, mainstream, general-purpose distributions. At the center of this community (again, remember that this does not relate to popularity but to the common purposes these distributions serve) lie Slackware, Gentoo and Ubuntu. A bit further away (i.e., less strongly related to the other distributions of this group) are Fedora, openSUSE and Debian. At the border of this community sit Yellow Dog, MEPIS, Mandriva, Vector, FreeBSD and Damn Small Linux.
The countries present in these patterns show that visitors from some European countries are clearly the ones making these associations. The United Kingdom stands out by appearing in almost all of these patterns. Finland is also very prominent, and Australia, Greece and Denmark are not far behind. Why would these European and Australian visitors focus more on mainstream distributions than visitors from elsewhere? Maybe they are more conservative and keep tracking the evolution of these solid distributions instead of searching for more specialized ones.
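The reading of "center", "border" and country presence above is done on the mined patterns. One plausible way to quantify it, offered as an assumption and not as the method behind the observations above, is to compute in what fraction of a community's patterns each distribution or country appears. The sketch assumes each pattern is a hypothetical dict of sets with keys such as `"distributions"` and `"countries"`; this representation is illustrative only.

```python
from collections import Counter


def membership_share(patterns, field):
    """Return (item, share) pairs, highest share first, where `share` is the
    fraction of `patterns` in which `item` appears under key `field`.

    Assumption: each pattern is a hypothetical dict of sets, e.g.
    {"distributions": {...}, "days": {...}, "countries": {...}}.
    Ties are broken alphabetically so the output order is deterministic.
    """
    if not patterns:
        return []
    counts = Counter()
    for pattern in patterns:
        counts.update(pattern[field])
    total = len(patterns)
    return sorted(
        ((item, n / total) for item, n in counts.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
```

Under this proxy, distributions with a share close to 1 would sit at the center of the community and those with a small share at its border; applying the same function to `"countries"` gives a rough measure of how often each country takes part in the community's patterns.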