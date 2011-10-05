Kernel: LWN's Latest and IO_uring Patches
-
Resource limits in user namespaces
User namespaces provide a number of interesting challenges for the kernel. They give a user the illusion of owning the system, but must still operate within the restrictions that apply outside of the namespace. Resource limits represent one type of restriction that, it seems, is proving too restrictive for some users. This patch set from Alexey Gladkov attempts to address the problem by way of a not-entirely-obvious approach.
Consider the following use case, as stated in the patch series. Some user wants to run a service that is known not to fork within a container. As a way of constraining that service, the user sets the resource limit for the number of processes to one, explicitly preventing the process from forking. That limit is global, though, so if this user tries to run two containers with that service, the second one will exceed the limit and fail to start. As a result, our user becomes depressed and considers a career change to goat farming.
Clearly, what is needed is a way to make at least some resource limits apply on a per-container basis; then each container could run its service with the process limit set to one and everybody would be happy (except perhaps the goats).
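The use case above can be sketched in a few lines of Python. This is only an illustration of the existing, per-user limit, not of the patch set itself; run_with_process_limit() is a hypothetical helper name, and the sketch assumes a Unix system where Python's resource module exposes RLIMIT_NPROC.

```python
import resource
import subprocess
import sys


def run_with_process_limit(argv, max_procs=1):
    """Run argv with RLIMIT_NPROC lowered to max_procs (hypothetical helper)."""

    def lower_limit():
        # Runs in the child between fork() and exec(), so only the
        # service is constrained, not the caller.
        resource.setrlimit(resource.RLIMIT_NPROC, (max_procs, max_procs))

    return subprocess.run(argv, preexec_fn=lower_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # The child simply reports the limit it inherited.
    report = "import resource; print(resource.getrlimit(resource.RLIMIT_NPROC))"
    result = run_with_process_limit([sys.executable, "-c", report])
    print(result.stdout.strip())
```

The kernel checks this limit against every process owned by the same user ID, which is why a second container running the same service under the same user trips over it. Privileged processes (those with CAP_SYS_RESOURCE or CAP_SYS_ADMIN) are not subject to the limit at all.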
-
Fast commits for ext4
The Linux 5.10 release included a change that is expected to significantly increase the performance of the ext4 filesystem; it goes by the name "fast commits" and introduces a new, lighter-weight journaling method. Let us look into how the feature works, who can benefit from it, and when its use may be appropriate.
Ext4 is a journaling filesystem, designed to ensure that filesystem structures appear consistent on disk at all times. A single filesystem operation (from the user's point of view) may require multiple changes in the filesystem, which will only be coherent after all of those changes are present on the disk. If a power failure or a system crash happens in the middle of those operations, corruption of the data and filesystem structure (including unrelated files) is possible. Journaling prevents corruption by maintaining a log of transactions in a separate journal on disk. In case of a power failure, the recovery procedure can replay the journal and restore the filesystem to a consistent state.
The ext4 journal includes the metadata changes associated with an operation, but not necessarily the related data changes. Mount options can be used to select one of three journaling modes, as described in the ext4 kernel documentation. data=ordered, the default, causes ext4 to write all data before committing the associated metadata to the journal. It does not put the data itself into the journal. The data=journal option, instead, causes all data to be written to the journal before it is put into the main filesystem; as a side effect, it disables delayed allocation and direct-I/O support. Finally, data=writeback relaxes the constraints, allowing data to be written to the filesystem after the metadata has been committed to the journal.
Another important ext4 feature is delayed allocation, where the filesystem defers the allocation of blocks on disk for data written by applications until that data is actually written to disk. The idea is to wait until the application finishes its operations on the file, then allocate the actual number of data blocks needed on the disk at once. This optimization limits unneeded operations related to short-lived, small files, batches large writes, and helps ensure that data space is allocated contiguously. On the other hand, the writing of data to disk might be delayed (with the default settings) by a minute or so. In the default data=ordered mode, where the journal entry is written only after flushing all pending data, delayed allocation might thus delay the writing of the journal. To ensure that data is actually written to disk, applications use the fsync() or fdatasync() system calls, causing the data (and the journal) to be written immediately.
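As a minimal sketch of what that looks like from an application, the following Python function writes a file and forces it out with fsync(). The name durable_write() is hypothetical; os.fsync() is the standard-library wrapper around the system call.

```python
import os


def durable_write(path, data: bytes) -> None:
    """Write data to path and ask the kernel to flush it to storage."""
    with open(path, "wb") as f:
        f.write(data)
        # Move Python's own userspace buffer into the kernel's page cache.
        f.flush()
        # Ask the kernel to push the file's data and metadata to the device.
        # os.fdatasync() (where available) skips metadata that is not
        # needed to read the data back.
        os.fsync(f.fileno())
```

Without the fsync() call, the data could sit in the page cache until the kernel's normal writeback gets to it.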
-
MAINTAINERS truth and fiction
Since the release of the 5.5 kernel in January 2020, there have been almost 87,000 patches from just short of 4,600 developers merged into the mainline repository. Reviewing all of those patches would be a tall order for even the most prolific of kernel developers, so decisions on patch acceptance are delegated to a long list of subsystem maintainers, each of whom takes partial or full responsibility for a specific portion of the kernel. These maintainers are documented in a file called, surprisingly, MAINTAINERS. But the MAINTAINERS file, too, must be maintained; how well does it reflect reality?
The MAINTAINERS file doesn't exist just to give credit to maintainers; developers make use of it to know where to send patches. The get_maintainer.pl script automates this process by looking at the files modified by a patch and generating a list of email addresses to send it to. Given that misinformation in this file can send patches astray, one would expect it to be kept up-to-date. Recently, your editor received a suggestion from Jakub Kicinski that there may be insights to be gleaned from comparing MAINTAINERS entries against activity in the real world. A bit of Python bashing later, a new analysis script was born.
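The lookup that get_maintainer.pl performs can be approximated in a few lines of Python. This is a simplified sketch rather than the script's actual logic: it handles only the M:, L:, and F: fields, its pattern matching is looser than the real rules, and the entries and addresses in SAMPLE are invented for illustration.

```python
from fnmatch import fnmatch

# Hypothetical entries in the general style of the MAINTAINERS file.
SAMPLE = """\
EXAMPLE FILESYSTEM
M: Jane Doe <jane@example.org>
L: fs-list@example.org
F: fs/example/

EXAMPLE DRIVER
M: John Roe <john@example.org>
F: drivers/example/*.c
"""


def parse_entries(text):
    """Split the text into entries, each holding lists of M, L, and F values."""
    entries = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        entry = {"name": lines[0].strip(), "M": [], "L": [], "F": []}
        for line in lines[1:]:
            tag, _, value = line.partition(":")
            if tag in ("M", "L", "F"):
                entry[tag].append(value.strip())
        entries.append(entry)
    return entries


def matches(pattern, path):
    """Simplified matching: a trailing slash covers everything below it."""
    if pattern.endswith("/"):
        return path.startswith(pattern)
    return fnmatch(path, pattern)


def recipients(entries, changed_files):
    """Collect addresses from every entry that covers a changed file."""
    found = []
    for entry in entries:
        if any(matches(p, f) for p in entry["F"] for f in changed_files):
            for address in entry["M"] + entry["L"]:
                if address not in found:
                    found.append(address)
    return found
```

Calling recipients(parse_entries(SAMPLE), ["fs/example/inode.c"]) returns the maintainer and list addresses of the first entry only. A stale or wrong F: line would silently drop or misdirect recipients.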
-
Experimental Patches Allow For New Ioctls To Be Built Over IO_uring
IO_uring continues to be one of the most exciting technical innovations in the Linux kernel in recent years, not only delivering more performant I/O but also opening doors for other Linux innovations. IO_uring has continued adding features since being mainlined in 2019, and the newest proposed feature is the ability to build new ioctls / kernel interfaces atop IO_uring.
The idea of supporting kernel ioctls over IO_uring has been brought up in the past, and today lead IO_uring developer Jens Axboe sent out his initial patches. These patches are considered experimental and were sent out as a "request for comments"; they supply the infrastructure for a file-private command type, with IO_uring handling the passing of the arbitrary data.
-
Installing Debian on modern hardware
It is an unfortunate fact of life that non-free firmware blobs are required to use some hardware, such as network devices (WiFi in particular), audio peripherals, and video cards. Beyond that, those blobs may even be required in order to install a Linux distribution, so an installation over the network may need to get non-free firmware directly from the installation media. That, as might be guessed, is a bit of a problem for distributions that are not willing to officially ship said firmware because of its non-free status, as a recent discussion in the Debian community shows.
Surely Dan Pal did not expect the torrent of responses he received to his short note to the debian-devel mailing list about problems he encountered trying to install Debian. He wanted to install the distribution on a laptop that was running Windows 10, but could not use the normal network installation mechanism because the WiFi device required non-free firmware. He tracked down the DVD version of the distribution and installed that, but worried that Debian is shooting itself in the foot by not prominently offering more installation options: "The current policy of hiding other versions of Debian is limiting the adoption of your OS by people like me who are interested in moving from Windows 10."
The front page at debian.org currently has a prominent "Download" button that starts to retrieve a network install ("netinst") CD image when clicked. But that image will not be terribly useful for systems that need non-free firmware to make the network adapter work. Worse yet, it is "impossible to find" a working netinst image with non-free firmware, Sven Joachim said, though he was overstating things a bit. Alexis Murzeau suggested adding a link under the big download button that would lead users to alternate images containing non-free firmware. He also pointed out that there are two open bugs (one from 2010 and another from 2016) that are related, so the problem is hardly a new one.
-
Wayland 1.19 Released With Small Protocol Updates, Fixes
Wayland 1.18 was released back in February 2020; now, nearly one year later, it has been succeeded by Wayland 1.19. Even with a year having passed, Wayland 1.19 is a very minor update over Wayland 1.18. Part of the reason the project moved off timed releases in the first place is that the core Wayland code and protocol are quite stable at this point: there is very little change. Most of the work remaining to get Wayland ready for production use across all workloads is on the compositor side, with KDE Plasma's KWin seeing improvements, GNOME Shell and Mutter being in very good shape, and so on. There is also the obstacle of the NVIDIA proprietary driver, whose support is not ideal at the moment, though improvements are pending there. All of that lies outside the core Wayland code itself, which consists of the protocol and key libraries.
-
Android Leftovers
-
Stable Kernels: 5.10.11, 5.4.93, and 4.19.171
I'm announcing the release of the 5.10.11 kernel. All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y and can be browsed at the normal kernel.org git web browser: https://git.kernel.org/?p=linux/kernel/git/stable/linux-s...
thanks, greg k-h
Also: Linux 5.4.93, Linux 4.19.171
