LWN's Latest (Outside Paywall) Kernel Articles
Date: 2022-04-05
-
Printbuf rebuffed for now [LWN.net]
There is a long and growing list of options for getting information out of the kernel but, in the real world, print statements still tend to be the tool of choice. The kernel's printk() function often comes up short, despite the fact that it provides a set of kernel-specific features, so there has, for some time, been interest in better APIs for textual output from the kernel. The "printbuf" proposal from Kent Overstreet is one step in that direction, but will need some changes to make it work well with features the kernel already has.
A call to printk() works well when kernel code needs to output a simple line of text. It is not as convenient when there is a need for complex formatting or when multiple lines of output must be generated. It is possible to use multiple printk() calls for even a single line of text, just as it is with printf() in user space, but there is a problem: the kernel is a highly concurrent environment, and anything can happen between successive printk() calls, including printk() calls from other contexts. That results in intermixed output, often described with technical terms like "garbled", that can be painful to make sense of.
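The interleaving problem can be illustrated with a small user-space analogue: build the whole message in a private buffer, then hand it to the output layer in one call. This is only a sketch; the `struct linebuf` type and the `linebuf_*` helpers are hypothetical names invented for this example and are not the API proposed in the printbuf patches.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper type; this is NOT the printbuf API from the patches. */
struct linebuf {
    char   data[256];
    size_t len;          /* bytes used, excluding the terminating NUL */
};

static void linebuf_init(struct linebuf *b)
{
    b->len = 0;
    b->data[0] = '\0';
}

/* Append formatted text; output that does not fit is truncated. */
static void linebuf_printf(struct linebuf *b, const char *fmt, ...)
{
    size_t room = sizeof(b->data) - b->len;   /* always at least 1 */
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(b->data + b->len, room, fmt, ap);
    va_end(ap);

    if (n < 0) {
        b->data[b->len] = '\0';        /* formatting error: drop this piece */
        return;
    }
    if ((size_t)n >= room)
        b->len = sizeof(b->data) - 1;  /* truncated; the buffer is now full */
    else
        b->len += (size_t)n;
}

/* Emit the accumulated message with a single call, then reset the buffer. */
static void linebuf_flush(struct linebuf *b, FILE *out)
{
    fputs(b->data, out);
    linebuf_init(b);
}
```

A caller would use several `linebuf_printf()` calls to assemble a message and one `linebuf_flush()` to emit it. Because the output stream sees only one call per message, text from other contexts cannot land between the pieces, which is the property that successive printk() calls lack.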
-
The BPF allocator runs into trouble [LWN.net]
One of the changes merged for the 5.18 kernel was a specialized memory allocator for BPF programs that have been loaded into the kernel. Since then, though, this feature has run into a fair amount of turbulence and will almost certainly be disabled in the final 5.18 release. This outcome is partly a result of bugs in the allocator itself, but this work also had the bad luck to trip some older and deeper bugs within the kernel's memory-management subsystem.
In current kernels, memory space for BPF programs (after JIT translation) is allocated using the same code that allocates space for loadable kernel modules; this would seem to make sense since, in either case, that space will be used for executable code that runs within the kernel. But there is a key difference between those two use cases. Kernel modules are relatively static; they are almost never removed once they have been loaded. BPF programs, instead, can come and go frequently; there can be thousands of loading and unloading events over the life of the system.
That difference turns out to be important. Memory for executable code must, unsurprisingly, have execute permission set and, thus, must also be read-only. That requires this memory to have its own mapping in the page tables, meaning that it must be split out of the kernel's (huge-page) direct mapping. That breaks up the direct map into smaller pages. Over time, this has the effect of fragmenting the direct map, which can affect performance measurably. The main goal for the BPF allocator was to segregate these allocations into a set of dedicated huge pages and avoid this fragmentation.
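The cost of scattering versus packing can be made concrete with a toy model. The sketch below assumes 4KB base pages and 2MB huge pages (as on x86-64); the function names are invented for illustration, and nothing here is kernel code. It counts how many huge-page regions would have to be split to give a set of allocations their own mappings.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE_PAGE_SIZE ((uint64_t)4 << 10)   /* assumed: 4KB base pages */
#define HUGE_PAGE_SIZE ((uint64_t)2 << 20)   /* assumed: 2MB huge pages */

/* Base-page entries needed to replace one huge-page mapping. */
static uint64_t entries_per_split(void)
{
    return HUGE_PAGE_SIZE / BASE_PAGE_SIZE;
}

/*
 * Count the distinct huge-page regions that contain at least one of the
 * given addresses. The addrs array must be sorted in ascending order.
 */
static size_t huge_regions_touched(const uint64_t *addrs, size_t n)
{
    size_t count = 0;
    uint64_t last = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t region = addrs[i] / HUGE_PAGE_SIZE;

        if (i == 0 || region != last) {
            count++;
            last = region;
        }
    }
    return count;
}
```

In this model, four allocations that each land in a different 2MB region force four huge mappings to be split, each into 512 base-page entries, while the same four allocations packed into one region split only one.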
Shortly after this code was merged, though, the regression reports, along with more general expressions of concern, started to roll in. That drew the attention of Linus Torvalds and other developers, and revealed a series of problems. While some of those problems were in the BPF allocator itself, the most disruptive issue came down to an older change made in an entirely different subsystem: the vmalloc() allocator.
-
NUMA rebalancing on tiered-memory systems [LWN.net]
The classic NUMA architecture is built around nodes, each of which contains a set of CPUs and some local memory; all nodes are more-or-less equal. Recently, though, "tiered-memory" NUMA systems have begun to appear; these include CPU-less nodes that contain persistent memory rather than (faster, but more expensive) DRAM. One possible use for that memory is to hold less-frequently-used pages rather than forcing them out to a backing-store device. There is an interesting problem that emerges from this use case, though: how does the kernel manage the movement of pages between faster and slower memory? Several recent patch sets have taken differing approaches to the problem of rebalancing memory on these systems.
-
The 2022 Linux Storage, Filesystem, Memory-Management, and BPF Summit [LWN.net]
The Linux Storage, Filesystem, Memory-Management, and BPF Summit (LSFMM) has long been one of the key events for many core kernel developers. The last LSFMM event, though, was held in the innocent, pre-pandemic days of early 2019. After three years, it finally was possible to hold an in-person gathering at the beginning of May 2022 in Palm Springs, California, USA. As usual, LWN was there.
-
A memory-folio update [LWN.net]
The folio project is not yet two years old, but it has already resulted in significant changes to the kernel's memory-management and filesystem layers. While much work has been done, quite a bit remains. In the opening plenary session at the 2022 Linux Storage, Filesystem, Memory-Management, and BPF Summit, Matthew Wilcox provided an update on the folio transition and led a discussion on the work that remains to be done.
Wilcox began with an overview of the folio work, a more complete description of which can be found in the above-linked article. In short, a folio is a way of representing a set of physically contiguous base pages. It is a response to a longstanding confusion in the memory-management subsystem, wherein a "page" can refer either to a base page or a larger compound page. Adding a new term disambiguates the term "page" and simplifies many memory-management interfaces.
Beyond terminology, there is another motivation for the folio work. The kernel really needs to manage memory in larger chunks than 4KB base pages. There are millions of those pages even on a typical laptop; that is a lot of pages to manage and a pain to deal with in general, causing the waste of a lot of time and energy. Better interfaces are needed to facilitate management of larger units, though; folios are meant to be that better interface.
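The "millions of pages" figure is easy to check with a little arithmetic. The sketch below is a user-space model, not kernel code: it assumes 4KB base pages and a machine with 16GiB of RAM, and the function names are invented for this example. A unit of order n is taken to cover 2^n physically contiguous base pages.

```c
#include <stdint.h>

#define BASE_PAGE_SHIFT 12   /* assumed: 4KB base pages (2^12 bytes) */

/* Number of base pages covered by one unit of the given order. */
static uint64_t base_pages_in_order(unsigned int order)
{
    return UINT64_C(1) << order;
}

/* Number of units needed to cover mem_bytes when every unit has this order. */
static uint64_t units_to_track(uint64_t mem_bytes, unsigned int order)
{
    uint64_t unit_bytes = UINT64_C(1) << (BASE_PAGE_SHIFT + order);

    return mem_bytes / unit_bytes;
}
```

Under these assumptions, 16GiB managed as order-0 (4KB) units means 4,194,304 objects to track, while the same memory managed as order-9 (2MB) units means 8,192.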
-
