Kernel: UEFI, LWN on Linux (Paywall Lapse) and BPF
Why you want a Linux bootloader even on UEFI systems
While you can load Linux directly through UEFI with the kernel's EFISTUB support, the result is not up to the standards that people want and expect from a Linux boot environment, at least on servers. It might be okay for a Linux machine with a single disk where the user will only ever boot the most recent kernel or the second most recent one (in case the most recent one doesn't work), and where, if that fails too, they'll boot from some alternate Linux live media or recovery media.
Avoiding retpolines with static calls
January 2018 was a sad time in the kernel community. The Meltdown and Spectre vulnerabilities had finally been disclosed, and the required workarounds hurt kernel performance in a number of ways. One of those workarounds — retpolines — continues to cause pain, with developers going out of their way to avoid indirect calls, since they must now be implemented with retpolines. In some cases, though, there may be a way to avoid retpolines and regain much of the lost performance; after a long gestation period, the "static calls" mechanism may finally be nearing the point where it can be merged upstream.
Indirect calls happen when the address of a function to be called is not known at compile time; instead, that address is stored in a pointer variable and used at run time. These indirect calls, as it turns out, are readily exploited by speculative-execution attacks. Retpolines defeat these attacks by turning an indirect call into a rather more complex (and expensive) code sequence that cannot be executed speculatively.
Retpolines solved the problem, but they also slow down the kernel, so developers have been keenly interested in finding ways to avoid them. A number of approaches have been tried, a few of which were covered here in late 2018. While some of those techniques have been merged, static calls have remained outside of the mainline. They have recently returned in the form of this patch set posted by Peter Zijlstra; it contains the work of others as well, in particular Josh Poimboeuf, who posted the original static-call implementation.
An indirect call works from a location in writable memory where the destination of the jump can be found. Changing the destination of the call is a matter of storing a new address in that location. Static calls, instead, use a location in executable memory containing a jump instruction that points to the target function. Actually executing a static call requires "calling" to this special location, which will immediately jump to the real target. The static-call location is, in other words, a classic code trampoline. Since both jumps are direct — the target address is found directly in the executable code itself — no retpolines are needed and execution is fast.
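Where the call target lives in each scheme can be modelled in a few lines. This is a toy simulation in Python, not kernel code: the dictionaries `data` and `text` stand in for writable data memory and executable memory, and the helpers `indirect_call()`, `static_call()`, and `static_call_update()` are hypothetical names used only for illustration. Nothing here patches real machine code or affects speculation.

```python
# Toy model with hypothetical names: where the call target lives for an
# indirect call versus a static-call trampoline. This simulation does no
# real code patching and has no effect on speculative execution.

def fast_path(x):
    return x + 1

def slow_path(x):
    return x * 2

# Indirect call: the target is stored in writable data memory and loaded
# at run time; the call then goes through that pointer, which is the kind
# of branch retpolines must protect.
data = {"ops_ptr": fast_path}

def indirect_call(x):
    target = data["ops_ptr"]
    return target(x)

# Static call: callers always "call" one fixed trampoline location in
# executable memory, which holds nothing but a jump to the real target.
text = {"tramp_ops": ("jmp", fast_path)}

def static_call(name, x):
    opcode, target = text[name]
    assert opcode == "jmp"  # the trampoline is a single direct jump
    return target(x)

def static_call_update(name, func):
    # Retargeting rewrites the jump instruction itself rather than a
    # pointer in data memory (in the kernel this means patching code).
    text[name] = ("jmp", func)
```

After `static_call_update("tramp_ops", slow_path)`, every later `static_call("tramp_ops", ...)` reaches `slow_path` while `data` is untouched. The design trades a more expensive update, since code has to be rewritten, for a call path with no run-time pointer load.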
Per-system-call kernel-stack offset randomization
In recent years, the kernel has (finally) upped its game when it comes to hardening. It is rather harder to compromise a running kernel than it used to be. But "rather harder" is relative: attackers still manage to find ways to exploit kernel bugs. One piece of information that can be helpful to attackers is the location of the kernel stack; this patch set from Kees Cook and Elena Reshetova may soon make that information harder to come by and nearly useless in any case.
The kernel stack will always be an attractive target. It typically contains no end of useful information that can be used, for example, to find the location of other kernel data structures. If it can be written to, it can be used for return-oriented programming attacks. Many exploits seen in the wild (Cook mentioned this video4linux exploit as an example) depend on locating the kernel stack as part of the sequence of steps to take over a running system.
In current kernels, the kernel stack is allocated from the vmalloc() area at process creation time. Among other things, this approach makes the location of any given process's kernel stack hard to guess, since it depends on the state of the memory allocator at the time of its creation. Once the stack has been allocated, though, its location remains fixed for as long as the process runs. So if an attacker can figure out where the kernel stack for a target process is, that information can be used for as long as that process lives.
As it turns out, there are a number of ways for an attacker to do that. Despite extensive cleanup work, there are still numerous kernel messages that will expose addresses of data structures, including the stack, in the kernel log. There are also attacks using ptrace() and cache timing that can be used to locate the stack. So the protection offered by an uncertain stack location is not as strong as one might like it to be.
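The excerpt stops before describing the fix itself. The following minimal sketch assumes the idea is to leave the stack allocation where it is but shift the stack pointer by a fresh random, aligned amount on every system-call entry. The base address, the 16-byte alignment, and the 1024-byte cap are illustrative values chosen for the sketch, not parameters taken from the patch set.

```python
import secrets

# Illustrative values only; not taken from the patch set.
STACK_BASE = 0xFFFFC90000400000  # hypothetical stack top, fixed at process creation
STACK_ALIGN = 16                 # assumed required stack-pointer alignment
MAX_OFFSET = 1024                # hypothetical cap on the random offset, in bytes

def syscall_stack_pointer(base=STACK_BASE):
    """Return the stack pointer one system call would start from.

    The base stays fixed for the life of the process; only the offset
    below it (the stack grows downward) is re-randomized on each call.
    """
    offset = secrets.randbelow(MAX_OFFSET)  # 0 .. MAX_OFFSET - 1
    offset &= ~(STACK_ALIGN - 1)            # round down to keep alignment
    return base - offset
```

With these assumed values there are only 64 possible positions (1024 / 16), so the randomization is coarse. The point is that an address leaked during one system call no longer pins down exactly where the stack will sit during the next.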
Some 5.6 kernel development statistics
When the 5.6 kernel was released on March 29, 12,665 non-merge changesets had been accepted from 1,712 developers, making this a fairly typical development cycle in a number of ways. As per longstanding LWN tradition, what follows is a look at where those changesets came from and who supported the work that created them. This may have been an ordinary cycle, but there are still a couple of differences worth noting.
As Linus Torvalds pointed out in the release announcement, the current coronavirus pandemic does not appear to have seriously affected kernel development — so far. One should not, though, lose track of the fact that the 5.6 merge window closed in early February, well before the impact of this disaster was broadly felt outside of China. Most of the work merged for 5.6 was done even earlier, of course. Given the delays involved in getting work into the mainline, the full effect may not be felt until the 5.8 cycle.
It goes without saying that we hope those effects are minimal, and that the people in our community (and beyond) come through this experience as well as possible.
Of the developers working on 5.6, 214 were first-time contributors. Many projects would be delighted to have that many new contributors in a nine-week period, but that is low for the kernel — the lowest since 3.11, which featured 203 first-time contributors and was released in September 2013.
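The headline numbers can be checked with a little arithmetic. On average, each developer contributed about 7.4 changesets, and first-time contributors made up exactly one-eighth of the developer count.

```python
# Figures quoted in the text for the 5.6 development cycle.
changesets = 12_665
developers = 1_712
first_time_contributors = 214

changesets_per_developer = changesets / developers       # about 7.40
first_time_share = first_time_contributors / developers  # exactly 0.125

print(f"{changesets_per_developer:.2f} changesets per developer")
print(f"{first_time_share:.1%} of developers were first-time contributors")
```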
eBPF - Rethinking the Linux Kernel
Thomas Graf talks about how companies like Facebook and Google use BPF to patch 0-day exploits, how BPF will change the way features are added to the kernel forever, and how BPF is introducing a new type of application deployment method for the Linux kernel.
