Roy Schestowitz's blog
Summary: A 2-hour investigation reveals that Tux Machines is now the victim of an arrogant, out-of-control Baidu
TUX MACHINES has been mostly offline since late this morning. It has evidently become the victim of Baidu's lawlessness, having fallen under huge dumps of requests from IP addresses which can be traced back to Baidu and whose requests identify themselves as Baidu as well (we tried blocking these, but it's not easy to do by IP because they have so many). They don't obey robots.txt rules; not even close! It turns out that others suffer from this as well. These A-holes have been causing a lot of problems to the site as of late (slowdowns were one of those problems), including damage to the underlying framework. Should we report them? To whom exactly? Looking around the Web, there are no contact details (in English anyway) by which to reach them.
Baidu can be very evil towards Web sites. Evil. Just remember that. █
Update: 3 major DDOS attacks (so far today) led to a lot of problems, and they also revealed that Baidu was not at fault; the culprits were botmasters who used "Baidu" to masquerade, hiding among some real and legitimate requests from Baidu (with Baidu-owned IP addresses). We have changed our firewall accordingly. We don't know who's behind these attacks or what the motivations may be.
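For anyone facing the same masquerade, one common way to tell a real crawler from an impostor is a forward-confirmed reverse DNS check. The following is only a minimal sketch and not the firewall change we made. The trusted domain suffixes are an assumption that should be checked against Baidu's own webmaster guidance, and the function name and injectable lookup parameters are made up for illustration.

```python
import socket

# Assumption: genuine Baiduspider hosts reverse-resolve to names under these
# domains. Check Baidu's current webmaster guidance before relying on this.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_genuine_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a client claiming to be Baidu."""
    # The lookups are injectable so the logic can be exercised offline.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: treat the client as unverified
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False  # PTR name lies outside the trusted domains
    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False
    # The name must resolve back to the same address; otherwise the PTR record
    # could have been set by whoever controls the reverse zone.
    return ip in addresses
```

A request whose User-Agent says "Baidu" but whose address fails this check can then be treated like any other unknown client. The default lookups handle IPv4 only.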
QUIETLY but surely, last week marked an important milestone, with traffic at the back end (not the cache layer*) exceeding 1.8 million hits, thus establishing a new record. So far this week it looks as though we are going to break this record again. We hope that the new format, which places emphasis on high importance links (as standalone nodes) and puts less important links in topical groups (groupings like games or howtos), makes reading the site more convenient and makes keeping abreast of the news easier, without overloading readers beyond what is manageable (links inside groups are typically less important, as intended). We're open to any suggestions readers may have to ensure we remain a leading syndicator of GNU/Linux and Free/Open Source software news. Any feedback can improve the site. █
* It is difficult to measure what happens at the Varnish layer as it's shared among several domains, including Techrights.
IN CASE it's not already obvious, we have been posting fewer links since the 14th of this month because we are both away and we catch up with some news only when time permits. Today's hot day (38 degrees) will probably allow us to stay indoors for more time than usual and therefore post some more links (from Rianne's laptop), but a week from now is when we'll properly catch up with everything that was missed and gradually get back to normal, hopefully for a long time to come.
Please bear with us while we enjoy our last chance to have a summer vacation. It's already cold back home in Manchester. █
Summary: Some numbers to show what goes on in sites that do not share information about their visitors (unlike Windows-centric sites which target non-technical audiences)
THE common perception of GNU/Linux is that it is scarcely used, based on statistics gathered from privacy-hostile Web sites that share (or sell) access log data, embed spyware in all of their pages, and so on. Our sites are inherently different because of a reasonable -- if not sometimes fanatic -- appreciation of privacy at both ends (server and client). People who read technical sites know how to block ads, impede spurious scripts etc. These sites also actively avoid anything which is privacy-infringing, such as interactive 'social' media buttons (these let third parties spy on all visitors in all pages).
Techrights and Tux Machines attract the lion's share of our traffic (and server capacity). They both have dedicated servers. These are truly popular and some of the leaders in their respective areas. Techrights deals with threats to software freedom, whereas Tux Machines is about real-time news discovery and organisation (pertaining to Free software and GNU/Linux).
The Varnish layer, which protects both of these large sites (nearly 100,000 pages in each, necessitating a very large cache pool), handles somewhere between 1 and 2.5 gigabytes of data per hour (depending on the time of day, and usually somewhere in the middle of this range).
The Apache layer, which now boasts 32 GB of RAM and sports many CPU cores, handled 1,324,232 hits for Techrights (ranked 6636th for traffic in Netcraft) in this past week and 1,065,606 for Tux Machines (ranked 6214th for traffic in Netcraft).
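To put these figures in more familiar units, here is a rough back-of-the-envelope conversion. It is only a sketch: it assumes decimal gigabytes (10**9 bytes) and a full seven-day week, and the function names are invented for illustration.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_WEEK = 7 * 24 * SECONDS_PER_HOUR


def gb_per_hour_to_mbit_per_s(gigabytes):
    """Average line rate in Mbit/s, assuming decimal gigabytes (10**9 bytes)."""
    return gigabytes * 8 * 1000 / SECONDS_PER_HOUR


def weekly_hits_to_per_second(hits):
    """Average requests per second over a seven-day week."""
    return hits / SECONDS_PER_WEEK


# Figures quoted above: Varnish data volume per hour, Apache hits per week.
for label, gb in (("low", 1.0), ("high", 2.5)):
    print(f"Varnish {label}: {gb_per_hour_to_mbit_per_s(gb):.2f} Mbit/s")
for site, hits in (("Techrights", 1_324_232), ("Tux Machines", 1_065_606)):
    print(f"{site}: {weekly_hits_to_per_second(hits):.2f} hits/s")
```

These are averages only; they say nothing about peaks, which are what actually strain a server.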
Based on VISITORS Web Log Analyzer, this is what we've had in Techrights:
Unknown (e.g. bots/spiders): 23.0%
As a graph (charted with LibreOffice):
Tux Machines reveals a somewhat different pattern. Based on grepping/filtering the past month's log at the Apache back end (not Varnish, which would have been a more sensible but harder thing to do), presenting the top 3 only:
One month is as far as retention goes, so it's not possible to show long-term trends (as before, based on Susan's summary of data). Logs older than that are automatically deleted, as promised, for both sites -- forever! We just need a small tail of data (temporarily) for DDOS prevention. █
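For readers curious about what such grepping/filtering might look like, here is a minimal sketch. It is not the script we used. It assumes Apache's "combined" log format (where the User-Agent is the last quoted field) and assumes that the tally is by operating system, guessed from tokens in the User-Agent. The token list and function names are illustrative only.

```python
import re
from collections import Counter

# In Apache's combined log format the User-Agent is the last quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')

# Order matters: Android user agents also contain the word "Linux".
PLATFORM_TOKENS = (
    ("Android", "Android"),
    ("Windows", "Windows"),
    ("Macintosh", "Mac OS X"),
    ("Linux", "GNU/Linux"),
)


def classify(line):
    """Return a coarse platform label for one access-log line."""
    match = USER_AGENT.search(line)
    agent = match.group(1) if match else ""
    for token, label in PLATFORM_TOKENS:
        if token in agent:
            return label
    return "Unknown"


def top_platforms(lines, n=3):
    """Tally platform labels over log lines and return the n most common."""
    return Counter(classify(line) for line in lines).most_common(n)
```

Passing an open log file to top_platforms() yields the top 3 as (label, count) pairs. User-Agent strings are trivially spoofed, so numbers obtained this way are indicative at best.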
IN the coming days we will prioritise very recent news and of course important news, but at the same time we shall be catching up with some older but important news that we missed. This means that some older items (one or two weeks old) may occasionally appear. In line with requests from readers we will also stop abbreviating long summaries of news, such as today's leftovers and howto roundups. █
THIS COMING WEEK, starting Tuesday in particular, will be a lot less busy than usual because Rianne and I are flying away and will be absent for a couple of weeks. Depending on availability of Wi-Fi, we ought to be able to still post some links, just not the usual volume of links.
We kindly ask anyone who is interested and willing to submit links highlighting relevant news, as every registered user can do that. It will greatly help us run the site while we are very far away in east Asia. █
2014 was a great year for Tux Machines. The site moved to a new server with much higher capacity and better caching, Rianne and I moved to a better house, and we finally set up a tree the way we wanted to. Financial contributions from readers were enough to subsidise a laptop for Rianne and she now happily submits a lot of links from there.
In 2015 we expect to improve both volume and quality of links. We are going to think of ways to improve the Web site and we openly welcome suggestions from readers. The goal is to make the site more informative more efficiently. We wish to help readers steer away from cruft and gossip and instead identify news of importance, without repetition unless new information and details arise. █
TOMORROW is my birthday, so we are going away to Liverpool for a while. Over the holidays we won't be too active in this site, at the very least because there is no major news, no announcements of substance, and we also wish to spend some time with our extended family.
As always, anyone in Tux Machines can create an account and submit stories to the front page (as of late only spammers have been doing that almost every morning). We encourage readers to submit any links which they find relevant and of interest to the community. █
Fireworks continue to appear all over the place, even a day after Guy Fawkes Night. Yesterday the city was full of smoke (as though it is under heavy fog all around), but that is just an annually-recurring tradition. It's very bad for the environment, but hey, lots of people enjoy it.
Over 400 years ago Guy Fawkes tried to destroy the House of Lords, just as some gang or person has been trying now for nearly two months to keep Tux Machines offline. Thankfully, however, the attacks are not succeeding anymore because we have refined our defenses and the offending zombie PCs are being banned left and right (all day long). Surely the plot has been foiled. All we need now is effigies. █
TUX MACHINES HAS BEEN under attack for nearly two weeks now. We need not really comment on our technical means of defence and how we mostly overcome these attacks (we are not giving too many clues to the attackers, who are mostly deflected with blacklists and redirects for the time being), but for the most part the Web site continues to run and to serve visitors. That's what is important. We work hard to keep posting the latest news and not let distraction, aggravation or sabotage get in the way. It is hard to imagine who would want to attack a site like this. This site is not even political or controversial.
In more general news, Manchester has had a nice and warm September. It continues into October (so far). Today we started seeing some hybrid (partly electric) double-decker buses and today we also found out that the health club we always go to has been voted best in the north west and third-best in the UK for the second year in a row. We still post some news whilst out of the house (if a wireless connection becomes available) and this morning the weather was so fine that we managed to play some badminton outdoors.
Life goes on and no level of attacks on the site is going to stop it. There are many ways to combat DDOS attacks, so they are merely a nuisance. The attackers should know that they are only wasting their time; there are much better things to do in life. Those commandeering Microsoft Windows botnets would be better off targeting the KKK or something, not a GNU and Linux news site. █
There are rogue bots hammering on this site all day long. It has gone on for quite a few days and it is getting worse. The bots are getting harder to block. Strategies are changing. They are all acting like zombies/botnet and they all have a "Microsoft Windows" in their HTTP header.
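Matching on the User-Agent alone would also lock out legitimate readers who use Windows, so one common alternative is to flag clients by request rate instead. What follows is only a sketch of that general idea, not a description of our own defences. The class name, the thresholds and the sliding-window approach are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class RateFlagger:
    """Flags clients that exceed max_hits requests within window seconds."""

    def __init__(self, max_hits=100, window=60.0):
        self.max_hits = max_hits
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps inside the window
        self.flagged = set()

    def record(self, ip, timestamp):
        """Record one request; return True if the client is (now) flagged."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Discard timestamps that have slid out of the window.
        while hits and timestamp - hits[0] > self.window:
            hits.popleft()
        if len(hits) > self.max_hits:
            self.flagged.add(ip)
        return ip in self.flagged
```

Feeding it (address, time) pairs parsed from the access log gives a set of candidate addresses in flagged, which can then be reviewed before anything is banned. A botnet that spreads its requests thinly across many addresses will slip under any per-address threshold, which is part of why such bots keep getting harder to block.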
The corporate media seems preoccupied with a bug in GNU Bash. It predicts gloom and doom, just as it did when there was a bug in OpenSSL that Microsoft partners dubbed "heartbleed" (although not so much actually happened in terms of damages).
Perhaps it is time to remind that media that Microsoft, with its back doors, is causing turbulence on the Web. Among the outcomes there are GNU/Linux Web sites that are brought down, with administrators who work around the clock trying to block Windows-running PCs from trying to take down their sites. █
Aggregators in Tux Machines have been universally disabled (temporarily we hope) after a week or so of heavy load that took the site down (well, over capacity and hence not accessible). The culprit seems to be mostly -- although not exclusively -- a bunch of bots that hammer on the aggregators with spammy requests. It's sad that so many hours need to be spent just keeping script kiddies out of the site, resulting in fewer bits of output, slower pageloads (performance degradation), and restlessness (monitoring alerts all day long), not to mention crafting of rules that merely keep the site running. Running Tux Machines is not quite as peaceful and trivial/simple as it may seem from the outside. It's like a full-time job, or at least it feels like it, especially whenever the site gets flooded by rogue bots, necessitating special attention 24/7. █
TODAY we have taken a bit of a break. It's Sunday after all. But here is a bit of a site status update.
The site's design has evolved a bit and it hopefully makes navigation a little better. SPAM is still a problem, but we do our best to keep it out of the sight of visitors. It's the result of a permissive policy that lets everyone publish a story, blog post, etc.
In terms of server load, we are still coping most of the time, but sometimes there's a flood of SPAM/rogue traffic that renders the server virtually unreachable. We use some ad hoc filters to address this nuisance, but if we are away, then the site can be paralysed for a long time. We still need to find better solutions to that.
Thanks in advance for any feedback you may have and thanks for reading Tux Machines. █
THE WEATHER has been getting more pleasant and the news too is pleasant these days. Software patents are in a state of perpetual demise, Microsoft is dealing with its large-scale demise (layoffs also), FOSS is being adopted by very large nations (Russia and China are among them), the UK has adopted OpenDocument Format as the standard, and our family benefits from government migrations to FOSS (Rianne and I work through a FOSS specialist).
While it may seem like the FOSS world is quiet (judging by the volume of news), the truth of the matter is that FOSS professionals are busy migrating many systems from proprietary to FOSS. These people are committed to the cause not just with words but also with actions.
Tux Machines, realising that games for GNU/Linux are now a dozen a week (not literally), lumps together gaming news. Android, being a Linux-based platform with huge worldwide impact, receives frequent mentions. If anyone wishes to suggest other editorial priorities, please share with us in the comments. █
TODAY was the last day of the log rotation. The uncached requests to Apache (bypassing Varnish proxy) exceeded the record by a huge gap (around 20%) and nearly reached 300 megabytes.
It is reassuring and gratifying to know that our reader base is expanding each week and we welcome submissions (news, blogs, etc.), which can be automatically pushed to the front page by any subscriber. █
FIVE days ago TuxMachines turned 10 years old. Rianne and I were on holiday in Scotland at the time, but were still able to keep the site up to date, owing to a Wi-Fi connection which we had to work exceptionally hard for (an open Wi-Fi connection is hard to find in the UK, especially one that enables anonymous use).
Running the site requires a lot of dedication because in order to stay up-to-the-minute TuxMachines requires never-ending research/survey of news. It's truly life-changing, potentially affecting the first hours of the morning and the small hours of the night. Sometimes it affects holidays and every couple of days I browse through news and post links in-between sets at the gym. Both Rianne and I are very dedicated to the site.
Since this site keeps growing in size and in traffic (the past week saw traffic climbing 20% above the previous record) it's all worthwhile in the end, and we have no intention of slowing down. What's more, seeing how Linux expands in use (and clout) around the world assures us that efforts to popularise GNU/Linux are succeeding. █
SEVERAL days ago we visited Trafford Centre, which is a large shopping mall in Greater Manchester. The place is quite nice as it embodies very modern (yet classic) ornamental features, encompassing the best of outdoor and indoor decorations. It's all geared up towards consumerism, but there is also a nice cinema there. Now, here's the deal. Upon entering the mall one cannot help noticing that there is a strong, universal Wi-Fi signal. Let's leave aside health implications. It's the same in other malls, such as the Arndale Centre near our house. It is also the same at airports, but if there is no payment needed for the Wi-Fi, then the user's identity is requested (if a payment is made, then the payment itself exposes the user's identity).
Following basic principles and common sense, I gave some fake details so that I could use the 'free' Wi-Fi anonymously and log into Tux Machines (checking the latest), but I cannot help wondering, still. Given what we know about NSA- and GCHQ-centric plans for surveillance on in-flight Wi-Fi, what are the chances that users' identities are being requested not just for marketing purposes but also for surveillance? It is becoming very hard to access the Net anonymously now. The UK is cracking down on 'free' Wi-Fi, saying that it facilitates copyright infringement, and our home hub, which is open for all to use (no password needed), keeps warning us that it is "not secure" (because it facilitates sharing). This is actively being discouraged if not forbidden. In all sorts of beverage-serving places (hot or cold, or alcoholic) and restaurants it is getting hard to gain anonymous Wi-Fi access, and the only way I've found (out of curiosity) to attain anonymous Wi-Fi use is First Class in high-speed British rail, provided one purchases the train ticket with cash. Similarly, it is getting harder to purchase groceries with cash here, at least without being penalised (not receiving a discount in exchange for identifying cards like Nectar). It sure seems like the very idea of anonymity here is becoming synonymous with crime. For experimental reasons I researched which shops in the UK still enable people to purchase a mobile phone anonymously. It was not easy, but it was still possible. It may no longer be possible, as I haven't surveyed the shops in almost 3 years.
We are entering a new unprecedented norm as those in power gradually phase in scary forms of governance in society, where the assumption is that anonymity deserves to be maligned and people should always identify themselves everywhere (also enable tracking of themselves by carrying a mobile phone) so as to avoid looking "suspicious". That's the mentality of mass surveillance that people have become accustomed to (and rather apathetic towards) in the UK.
It's stuff like this that made me exceptionally stubborn about deleting server logs in Tux Machines and not connecting to any third-party entity (e.g. with interactive social buttons, cookies), unlike most other GNU/Linux/FOSS sites. █