Date: 2020-11-30
Kernel: LWN Articles (Paywall Expired), Alexander Popov, and Phoronix on Bcachefs and VPU
A realtime developer's checklist
We want all applications to be correct and bug-free, Ogness began; in the realtime domain, correctness "means running at the correct time". The application must wake up within a bounded time limit when there is time-critical work to do. Ogness highlighted that, in realtime systems, the right timing of tasks is a requirement; things will go wrong if the constraints are not met. Developers need to define which tasks and applications are time-critical; he noted that a lot of people mistakenly think that all tasks in a realtime system are realtime, while most of them are not.
The good news for developers is that, under Linux, they can write realtime applications using only the POSIX API with the realtime extensions. The code will look familiar, and only three additional header files are required: sched.h (a member of the audience noted that the musl C library does not implement this one), time.h, and pthread.h.
There are three properties that a realtime operating system must have: deterministic scheduling behavior, interruptibility (the CPU is always running something, so there should be a way to interrupt a task), and a way to avoid priority inversion, which happens when a high-priority task must wait for a lower-priority one. The third property, which might be less familiar to non-realtime developers, was described with an example.
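The three headers named above are enough for a small illustration. The following is a minimal sketch, not code from the talk: the rt_* function names are hypothetical, and priority inheritance is only one common POSIX answer to priority inversion (the summary above does not say which mechanism Ogness demonstrated). It prepares thread attributes for the SCHED_FIFO policy, creates a priority-inheritance mutex, and runs a periodic loop against absolute deadlines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <time.h>

/* Fill `attr` so that a thread created with it runs under SCHED_FIFO
 * at `priority`. Returns 0 or an errno-style code. */
int rt_thread_attr_init(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int err;

    assert(attr != NULL);
    err = pthread_attr_init(attr);
    if (err)
        return err;
    /* Without EXPLICIT_SCHED the new thread inherits its creator's policy. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (!err)
        err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (!err)
        err = pthread_attr_setschedparam(attr, &param);
    if (err)
        pthread_attr_destroy(attr);
    return err;
}

/* Create a mutex whose owner is boosted to the priority of the
 * highest-priority waiter while it holds the lock. */
int rt_mutex_init(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t mattr;
    int err = pthread_mutexattr_init(&mattr);

    if (err)
        return err;
    err = pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutex_init(mutex, &mattr);
    pthread_mutexattr_destroy(&mattr);
    return err;
}

/* Advance an absolute deadline by period_ns, keeping tv_nsec normalised. */
void rt_next_deadline(struct timespec *t, long period_ns)
{
    assert(t != NULL && period_ns > 0);
    t->tv_nsec += period_ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec += 1;
    }
}

/* Call work(arg) once per period until it returns nonzero. Sleeping
 * until an absolute time avoids accumulating drift across cycles. */
int rt_run_periodic(long period_ns, int (*work)(void *), void *arg)
{
    struct timespec next;
    int err;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return errno;
    while (work(arg) == 0) {
        rt_next_deadline(&next, period_ns);
        do {
            err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (err == EINTR);
        if (err)
            return err;
    }
    return 0;
}
```

Preparing the attributes needs no special rights, but passing them to pthread_create() usually fails with EPERM unless the process is allowed to use realtime scheduling policies.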
iproute2 and libbpf: vendoring on the small scale
LWN's recent article on Kubernetes in Debian discussed the challenges of packaging a massive project with hundreds of dependencies. Many of the issues that arose there, however, are not limited to such projects, as can be seen in the ongoing discussion about whether a copy of the relatively small libbpf library should be shipped with the iproute2 collection of networking tools. Fast-moving projects, it would seem, continue to feel limited by the restrictions imposed by the Linux distribution model.
Iproute2 is a collection of network-configuration tools found on almost any Linux system; it includes utilities like arpd, ip (the command old-timers guiltily think they should be using when they type ifconfig), ss, and tc. That last command, in particular, is used to configure the traffic-control subsystem, which allows administrators to manage and prioritize the flow of network traffic through their systems. This subsystem has, for some years, had the capability to load and run BPF programs to both classify packets and make decisions on how to queue them. This mechanism gives administrators a great deal of flexibility in the management of network traffic.
The code for handling BPF programs within iproute2 is old, though, and lacks support for many of the features that have been added to BPF in the last few years. Since that code was written, the BPF community has developed libbpf (which lives in the kernel source tree) as the preferred way to work with BPF programs and load them into the kernel. This is not a small task; libbpf must interpret the instructions encoded as special ELF sections in an executable BPF program and make the necessary calls to the sprawling bpf() system call. This work can include creating maps, "relocating" structure offsets to match the configuration of the running kernel, loading programs, and attaching those programs to the appropriate hooks within the kernel. Libbpf has grown quickly, along with the rest of the BPF ecosystem.
Systemd catches up with bind events
The kernel project has a strong focus on not breaking user-space applications; if something works with a given kernel release, it should continue to work with subsequent releases. So it may be discouraging to read the lengthy exposition on an apparent user-space API break in the announcement for the systemd 247-rc2 release. Changes to udev configuration files will be needed to keep systems working, but the systemd project claims that it "is not [the] fault of systemd or udev, but caused by an incompatible kernel change that happened back in Linux 4.12". It seems like an appropriate time to look at what happened, how administrators need to respond, and whether anything can be done to prevent this kind of thing from happening again.
Modern computers tend to be highly dynamic, with devices (of both the physical and virtual variety) appearing and disappearing while the system is running. The kernel handles the low-level details with regard to these device events, but it is up to user space to take care of the rest. For that to happen, user space needs to know when something has changed with the system's configuration.
To that end, events are emitted to user space from deep within the kernel's driver-core subsystem whenever something changes; for example, plugging in a USB device will result in the creation of one or more ADD events to tell user space that the new device is available. The udev daemon is charged with responding to these events according to a set of rules; it can create device nodes, set permissions, notify other user-space components, and more, all in response to properties attached to events by matching rules. The set of possible events is relatively small and does not change often.
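To make the event mechanism concrete, here is a small sketch of how a consumer might pick apart one event. It assumes the message layout the kernel uses on its uevent netlink socket, a summary string followed by KEY=VALUE strings separated by NUL bytes, and the action names are the ones the kernel is understood to define. The function and enum names are hypothetical, and this is not udev's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum uevent_action {
    UEVENT_ADD, UEVENT_REMOVE, UEVENT_CHANGE, UEVENT_MOVE,
    UEVENT_ONLINE, UEVENT_OFFLINE, UEVENT_BIND, UEVENT_UNBIND,
    UEVENT_UNKNOWN
};

/* Find `key` in a buffer of NUL-terminated "KEY=VALUE" strings.
 * Returns a pointer to the value inside `buf`, or NULL if absent. */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t klen, pos = 0;

    assert(buf != NULL && key != NULL);
    klen = strlen(key);
    while (pos < len) {
        const char *entry = buf + pos;
        size_t elen = strnlen(entry, len - pos);

        if (elen == len - pos)
            break; /* last entry is not NUL-terminated: stop */
        if (elen > klen && entry[klen] == '=' &&
            memcmp(entry, key, klen) == 0)
            return entry + klen + 1;
        pos += elen + 1;
    }
    return NULL;
}

/* Map an ACTION value to an enum; anything unrecognised (or NULL)
 * becomes UEVENT_UNKNOWN so the caller can ignore it. */
enum uevent_action uevent_action_parse(const char *s)
{
    static const char *const names[] = {
        "add", "remove", "change", "move",
        "online", "offline", "bind", "unbind"
    };
    size_t i;

    if (s == NULL)
        return UEVENT_UNKNOWN;
    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(s, names[i]) == 0)
            return (enum uevent_action)i; /* names[] mirrors the enum order */
    return UEVENT_UNKNOWN;
}
```

Mapping unrecognised actions to an explicit "unknown" value is a deliberate choice here: a consumer that skips what it does not understand keeps working if the set of events grows.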
Changed-block tracking and differential backups in QEMU
The block layer of QEMU, the open-source machine emulator and virtualizer, forms the backbone of many storage virtualization features: the QEMU Copy-On-Write (QCOW2) disk-image file format, disk image chains, point-in-time snapshots, backups, and more. At the recently concluded 2020 KVM Forum virtual event, Eric Blake gave a talk on the current work in QEMU and libvirt to make differential backups more powerful. As the name implies, "differential backups" address the efficiency problems of full disk backups: space usage and speed of backup creation.
There's also the similar-sounding term, "incremental backups". The difference is that differential tracks what has changed since any given backup, while incremental tracks changes only since the last backup. Incremental backups are a subset of differential backups, but both are often lumped under the "incremental backups" term. This article will stick to "differential" as the broader term.
With differential backups, one of the two endpoints when creating backups is always the current point in time. In other words, it is not like Git, where, if the latest version of a file is, say, v4, you can still diff between v2 and v3 — with differential backups, one of the two diff points is always v4, the current point in time.
QEMU has had block-layer primitives to support full backups for some time; these were most commonly used for live-migrating the entire storage of a virtual machine, or for point-in-time snapshots. But over the past couple of years, QEMU and libvirt have picked up steam toward the goal of making differential backups a first-class feature that is enabled by default.
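The distinction between incremental and differential backups can be shown with a toy changed-block tracker. This is a sketch under stated assumptions, not QEMU's or libvirt's implementation: it keeps one small dirty bitmap per checkpoint interval for a 64-block disk, and all names (struct cbt, cbt_*) are hypothetical. Asking for changes since the most recent checkpoint gives the incremental set; asking for changes since any earlier checkpoint gives a differential set. In both cases the other endpoint is the present.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CBT_MAX_CHECKPOINTS 16
#define CBT_BLOCKS 64 /* one bit per block fits in a uint64_t */

struct cbt {
    /* dirty[i] holds the blocks written after checkpoint i and before
     * checkpoint i + 1; index 0 covers writes before any checkpoint. */
    uint64_t dirty[CBT_MAX_CHECKPOINTS + 1];
    int active; /* index of the bitmap currently recording writes */
};

void cbt_init(struct cbt *t)
{
    memset(t, 0, sizeof *t);
}

/* Record that a guest write touched `block`. */
void cbt_write(struct cbt *t, unsigned block)
{
    assert(block < CBT_BLOCKS);
    t->dirty[t->active] |= UINT64_C(1) << block;
}

/* Take a checkpoint (for example, when a backup is made). Returns the
 * new checkpoint id, or -1 when the table is full. */
int cbt_checkpoint(struct cbt *t)
{
    if (t->active == CBT_MAX_CHECKPOINTS)
        return -1;
    t->active++;
    t->dirty[t->active] = 0;
    return t->active;
}

/* Bitmask of blocks changed between `checkpoint` and now, built by
 * merging every bitmap recorded from that checkpoint onward. */
uint64_t cbt_changed_since(const struct cbt *t, int checkpoint)
{
    uint64_t mask = 0;
    int i;

    assert(checkpoint >= 0 && checkpoint <= t->active);
    for (i = checkpoint; i <= t->active; i++)
        mask |= t->dirty[i];
    return mask;
}
```

Only the set of changed blocks is tracked, not their old contents, which is why the data copied for such a backup always reflects the current state of the disk.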
Linux kernel heap quarantine versus use-after-free exploits | Alexander Popov
It's 2020. Quarantines are everywhere – and here I'm writing about one, too. But this quarantine is of a different kind.
In this article I'll describe the Linux Kernel Heap Quarantine that I developed for mitigating kernel use-after-free exploitation. I will also summarize the discussion about the prototype of this security feature on the Linux Kernel Mailing List (LKML).
Bcachefs Going Through Period Of More Performance Optimizations - Phoronix
Bcachefs was sent out for another round of review at the end of October. While it doesn't look like this file system, born out of Linux's block cache code, will be mainlined in the near future, it's still on a good trajectory.
The October post to the Linux kernel mailing list outlined all of the current features and those recently completed like erasure coding, inline data extents, and more -- plus many bug fixes.
Intel Begins Upstreaming Work For Their Vision Processing Unit On Linux
While Intel engineers began upstreaming various elements of the Keem Bay SoC support over the course of the year, the actual Vision Processing Unit (VPU) enablement had not been sent out for review until now. Intel has sent out its initial patches for bringing up the Vision Processing Unit on the open-source Linux kernel.
Intel's Vision Processing Unit that is new to the Keem Bay SoC can be used as a standalone SoC or as a PCI Express vision processing accelerator. The Linux kernel work is ultimately about fulfilling both possible uses.
IBM/Red Hat Leftovers
What Is the Best Linux Distro for Laptops?
Let's start with those aging, venerable machines: your old laptop. Linux carries a strong reputation for breathing life into old hardware, and Lubuntu is one of the best options. Lubuntu, as you might guess from the name, is an Ubuntu derivative. It uses a different desktop environment from Ubuntu, opting for the more lightweight and less resource-intensive LXDE desktop instead of GNOME. The result is a lightweight Linux distro that will run nicely on an older laptop. Lubuntu requires a minimum of 1GB RAM for "advanced internet services" such as YouTube and Facebook, while just 512MB RAM will suffice for basic operations such as LibreOffice and basic web browsing. In terms of CPU, you'll need at least an Intel Pentium 4 or Pentium M, or an AMD K8.
Don't Panic: Kubernetes and Docker
Docker as an underlying runtime is being deprecated in favor of runtimes that use the Container Runtime Interface (CRI) created for Kubernetes. Docker-produced images will continue to work in your cluster with all runtimes, as they always have. If you’re an end-user of Kubernetes, not a whole lot will be changing for you. This doesn’t mean the death of Docker, and it doesn’t mean you can’t, or shouldn’t, use Docker as a development tool anymore. Docker is still a useful tool for building containers, and the images that result from running docker build can still run in your Kubernetes cluster.
If you’re using a managed Kubernetes service like GKE or EKS, you will need to make sure your worker nodes are using a supported container runtime before Docker support is removed in a future version of Kubernetes. If you have node customizations you may need to update them based on your environment and runtime requirements. Please work with your service provider to ensure proper upgrade testing and planning.
If you’re rolling your own clusters, you will also need to make changes to avoid your clusters breaking. At v1.20, you will get a deprecation warning for Docker. When Docker runtime support is removed in a future Kubernetes release (currently planned for the 1.23 release in late 2021), you will need to switch to one of the other compliant container runtimes, like containerd or CRI-O. Just make sure that the runtime you choose supports the docker daemon configurations you currently use (e.g. logging).
International Day Against DRM (IDAD) is almost here: Stand with us on Dec. 4
Although many of us are in quarantine, that doesn't mean that we have to cease our fight against Digital Restrictions Management (DRM). The International Day Against DRM (IDAD) is just two days away, and we're here to let you know how we can all stand up this Friday, December 4th, against the latest encroachments from one of DRM's major players: Netflix. As pandemic response measures all over the world forced so many people to stay home, we've seen a corresponding and dangerous increase in dependence on streaming media for entertainment. Streaming media has gone from an ethically problematic pastime to being a playground for dystopia. In a world where media is served over ephemeral streaming, these services can delete things from history, or rewrite them, sometimes without any announcement. Besides deciding what people can and can't view with their service, corporations like Netflix also dictate what can and can't be made, now that they're one of the heavyweights in television and film production and distribution. This rise in control is in part due to their constant mistreatment of their subscribers, having used DRM to prevent legitimate uses of their media and dictate which devices can play it. December 4th marks the start of Netflix's "StreamFest" initiative in certain countries -- letting users have a taste of the poison apple before they commit to taking the bite. It's at times like these that we as a community need to step up and say that enough is enough, and let them know that DRM is unacceptable no matter where it appears or how it's being used. We may not be meeting in person, but that doesn't mean we can't come together and let our voices be heard. We hope you'll join us in this year's IDAD by following one or more of the suggestions we've provided below.
