Language Selection

English French German Italian Portuguese Spanish

Linux kernel coverage at LWN (now outside the paywall)

Filed under
  • Flash storage topics

    At the 2018 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Jaegeuk Kim described some current issues for flash storage, especially with regard to Android. Kim is the F2FS developer and maintainer, and the filesystem-track session was ostensibly about that filesystem. In the end, though, the talk did not focus on F2FS and instead ranged over a number of problem areas for Android flash storage.

    He started by noting that Universal Flash Storage (UFS) devices have high read/write speeds, but can also have high latency for some operations. For example, ext4 will issue a discard command but a UFS device might take ten seconds to process it. That leads the user to think that Android is broken, he said.

  • The ZUFS zero-copy filesystem

    At the 2018 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Boaz Harrosh presented his zero-copy user-mode filesystem (ZUFS). It is both a filesystem in its own right and a framework similar to FUSE for implementing filesystems in user space. It is geared toward extremely low latency and high performance, particularly for systems using persistent memory.

    Harrosh began by saying that the idea behind his talk is to hopefully entice others into helping out with ZUFS. There are lots of "big iron machines" these days, some with extremely fast I/O paths (e.g. NVMe over fabrics with throughput higher than memory). "For some reason" there may be a need to run a filesystem in user space but the current interface is slow because "everyone is copy happy", he said.

  • A filesystem "change journal" and other topics

    At the 2017 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Amir Goldstein presented his work on adding a superblock watch mechanism to provide a scalable way to notify applications of changes in a filesystem. At the 2018 edition of LSFMM, he was back to discuss adding NTFS-like change journals to the kernel in support of backup solutions of various sorts. As a second topic for the session, he also wanted to discuss doing more performance-regression testing for filesystems.

    Goldstein said he is working on getting the superblock watch feature merged. It works well and is used in production by his employer, CTERA Networks, but there is a need to get information about filesystem changes even after a crash. Jan Kara suggested that what was wanted was an indication of which files had changed since the last time the filesystem changes were queried; Goldstein agreed.

  • Will staging lose its Lustre?

    The kernel's staging tree is meant to be a path by which substandard code can attract increased developer attention, be improved, and eventually find its way into the mainline kernel. Not every module graduates from staging; some are simply removed after it becomes clear that nobody cares about them. It is rare, though, for a project that is actively developed and widely used to be removed from the staging tree, but that may be about to happen with the Lustre filesystem.

    The staging tree was created almost exactly ten years ago as a response to the ongoing problem of out-of-tree drivers that had many users but which lacked the code quality to get into the kernel. By giving such code a toehold, it was hoped, the staging tree would help it to mature more quickly; in the process, it would also provide a relatively safe place for aspiring kernel developers to get their hands dirty fixing up the code. By some measures, staging has been a great success: it has seen nearly 50,000 commits contributed by a large community of developers, and a number of drivers have, indeed, shaped up and moved into the mainline. The "ccree" TrustZone CryptoCell driver graduated from staging in 4.17, for example, and the visorbus driver moved to the mainline in 4.16.

  • Statistics from the 4.17 kernel development cycle

    The 4.17 kernel appears to be on track for a June 3 release, barring an unlikely last-minute surprise. So the time has come for the usual look at some development statistics for this cycle. While 4.17 is a normal cycle for the most part, it does have one characteristic of note: it is the third kernel release ever to be smaller (in terms of lines of code) than its predecessor.

    The 4.17 kernel, as of just after 4.17-rc7, has brought in 13,453 non-merge changesets from 1,696 developers. Of those developers, 256 made their first contribution to the kernel in this cycle; that is the smallest number of first-time developers since 4.8 (which had 237). The changeset count is nearly equal to 4.16 (which had 13,630), but the developer count is down from the 1,774 seen in the previous cycle.

  • Deferring seccomp decisions to user space

    There has been a lot of work in recent years to use BPF to push policy decisions into the kernel. But sometimes, it seems, what is really wanted is a way for a BPF program to punt a decision back to user space. That is the objective behind this patch set giving the secure computing (seccomp) mechanism a way to pass complex decisions to a user-space helper program.

    Seccomp, in its most flexible mode, allows user space to load a BPF program (still "classic" BPF, not the newer "extended" BPF) that has the opportunity to review every system call made by the controlled process. This program can choose to allow a call to proceed, or it can intervene by forcing a failure return or the immediate death of the process. These seccomp filters are known to be challenging to write for a number of reasons, even when the desired policy is simple.

    Tycho Andersen, the author of the "seccomp trap to user space" patch set, sees a number of situations where the current mechanism falls short. His scenarios include allowing a container to load modules, create device nodes, or mount filesystems — with rigid controls applied. For example, creation of a /dev/null device would be allowed, but new block devices (or almost anything else) would not. Policies to allow this kind of action can be complex and site-specific; they are not something that would be easily implemented in a BPF program. But it might be possible to write something in user space that could handle decisions like these.