Coverage From the 2018 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM)
-
Is it time to remove ZONE_DMA?
The DMA zone (ZONE_DMA) is a memory-management holdover from the distant past. Once upon a time, many devices (those on the ISA bus in particular) could only use 24 bits for DMA addresses, and were thus limited to the bottom 16MB of memory. Such devices are hard to find on contemporary computers. Luis Rodriguez scheduled the last memory-management-track session of the 2018 Linux Storage, Filesystem, and Memory-Management Summit to discuss whether the time has come to remove ZONE_DMA altogether.
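The 16MB figure follows directly from the address width: 24 bits can name 2^24 = 16,777,216 bytes. The short sketch below makes that arithmetic concrete and checks whether a buffer lies entirely below such a limit. The function names are hypothetical illustrations, not kernel interfaces (the kernel expresses similar limits with its DMA_BIT_MASK() macro).

```python
# Illustrative only: the reach of a 24-bit DMA address, as on ISA-era devices.
# dma_limit_bytes() and fits_dma_mask() are hypothetical helpers, not kernel APIs.
ISA_DMA_BITS = 24

def dma_limit_bytes(address_bits: int) -> int:
    """Number of bytes addressable with the given number of address bits."""
    return 1 << address_bits

def fits_dma_mask(phys_addr: int, length: int, address_bits: int) -> bool:
    """True if the buffer [phys_addr, phys_addr + length) lies entirely
    below the limit implied by address_bits."""
    mask = dma_limit_bytes(address_bits) - 1
    last = phys_addr + length - 1
    return length > 0 and phys_addr >= 0 and last <= mask

limit = dma_limit_bytes(ISA_DMA_BITS)
print(limit, limit // (1024 * 1024))                     # 16777216 16
print(fits_dma_mask(0x00FF0000, 0x10000, ISA_DMA_BITS))  # True: ends at 16MB - 1
print(fits_dma_mask(0x01000000, 4096, ISA_DMA_BITS))     # False: starts at 16MB
```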
-
Zone-lock and mmap_sem scalability
The memory-management subsystem is a central point that handles all of the system's memory, so it is naturally subject to scalability problems as systems grow larger. Two sessions during the memory-management track of the 2018 Linux Storage, Filesystem, and Memory-Management Summit looked at specific contention points: the zone locks and the mmap_sem semaphore.
-
Hotplugging and poisoning
Memory hotplugging is one of the least-loved areas of the memory-management subsystem; there are many use cases for it, but nobody has taken ownership of it. A similar situation exists for hardware page poisoning, a somewhat neglected mechanism for dealing with memory errors. At the 2018 Linux Storage, Filesystem, and Memory-Management Summit, Michal Hocko and Mike Kravetz dedicated a pair of brief memory-management track sessions to problems that have been encountered in these subsystems, one of which seems more likely to get the attention it needs than the other.
-
Reworking page-table traversal
A system's page tables are organized into a tree that is as many as five levels deep. In many ways those levels are all similar, but the kernel treats them all as being different, with the result that page-table manipulations include a fair amount of repetitive code. During the memory-management track of the 2018 Linux Storage, Filesystem, and Memory-Management Summit, Kirill Shutemov proposed reworking how page tables are maintained. The idea was popular, but the implementation is likely to be tricky.
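To illustrate what treating the levels uniformly could look like, here is a toy model that walks a five-level tree with a single loop instead of one hand-written function per level. It assumes an x86-64-style layout (4 KiB pages, 9 index bits per level, levels named pgd, p4d, pud, pmd, and pte) and uses nested dictionaries as tables. It is a sketch of the general idea only, not Shutemov's proposed implementation.

```python
# Toy five-level page-table tree walked by one generic loop.
# Assumes x86-64-style paging: 4 KiB pages, 9 index bits (512 entries) per level.
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
LEVEL_NAMES = ["pgd", "p4d", "pud", "pmd", "pte"]  # top to bottom
ENTRIES = 1 << BITS_PER_LEVEL

def level_shift(level: int) -> int:
    """Shift of the index field for a level (0 = pgd ... 4 = pte)."""
    return PAGE_SHIFT + BITS_PER_LEVEL * (len(LEVEL_NAMES) - 1 - level)

def level_index(vaddr: int, level: int) -> int:
    return (vaddr >> level_shift(level)) & (ENTRIES - 1)

def map_page(root: dict, vaddr: int, pfn: int) -> None:
    """Install a mapping, allocating intermediate tables on demand."""
    table = root
    for level in range(len(LEVEL_NAMES) - 1):
        table = table.setdefault(level_index(vaddr, level), {})
    table[level_index(vaddr, len(LEVEL_NAMES) - 1)] = pfn

def walk(root: dict, vaddr: int):
    """Translate vaddr to a physical address, or return None if unmapped.
    The same loop body handles every level."""
    entry = root
    for level in range(len(LEVEL_NAMES)):
        entry = entry.get(level_index(vaddr, level))
        if entry is None:
            return None
    return (entry << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))

root = {}
map_page(root, 0x7F12_3456_7000, pfn=0x1234)
print(hex(walk(root, 0x7F12_3456_7ABC)))  # 0x1234abc
print(walk(root, 0x7F12_3456_8000))       # None
```

Because every level shares the same index arithmetic, adding or removing a level changes only the LEVEL_NAMES list, which is the kind of repetition the proposal aims to eliminate.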
-
get_user_pages() continued
At a plenary session held relatively early during the 2018 Linux Storage, Filesystem, and Memory-Management Summit, the developers discussed a number of problems with the kernel's get_user_pages() interface. During the waning hours of LSFMM, a tired (but dedicated) set of developers convened again in the memory-management track to continue the discussion and try to push it toward a real solution.
Jan Kara and Dan Williams scheduled the session to try to settle on a way to deal with the issues associated with get_user_pages() — in particular, the fact that code that has pinned pages in this way can modify those pages in ways that will surprise other users, such as filesystems. During the first session, Jérôme Glisse had suggested using the MMU notifier mechanism as a way to solve these problems. Rather than pin pages with get_user_pages(), kernel code could leave the pages unpinned and respond to notifications when the status of those pages changes. Kara said he had thought about the idea, and it seemed to make some sense.
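The difference between pinning and responding to notifications can be sketched with a toy model. Here a device-side cache holds no pins. Instead, it registers a callback, and the address space invokes it before a range of pages changes, so the device drops its stale copies and re-fetches them later. AddressSpace, DeviceMirror, and their methods are hypothetical names for illustration. The kernel's real mechanism is struct mmu_notifier with callbacks such as invalidate_range_start().

```python
# Toy model of notifier-driven invalidation as an alternative to pinning.
# AddressSpace and DeviceMirror are hypothetical names, not kernel APIs.
from typing import Callable, Dict, List

class AddressSpace:
    def __init__(self) -> None:
        self.pages: Dict[int, str] = {}  # virtual page number -> contents
        self.listeners: List[Callable[[int, int], None]] = []

    def register_notifier(self, callback: Callable[[int, int], None]) -> None:
        self.listeners.append(callback)

    def invalidate_range(self, start: int, end: int) -> None:
        """Tell every listener that pages in [start, end) are about to change,
        then change them (here: drop them, as truncation would)."""
        for callback in self.listeners:
            callback(start, end)
        for vpn in range(start, end):
            self.pages.pop(vpn, None)

class DeviceMirror:
    """A device-side cache that holds no pins; it forgets anything the
    address space says is being invalidated."""
    def __init__(self, mm: AddressSpace) -> None:
        self.mm = mm
        self.cached: Dict[int, str] = {}
        mm.register_notifier(self.on_invalidate)

    def access(self, vpn: int) -> str:
        # On a miss, re-fetch from the address space instead of keeping a pin.
        if vpn not in self.cached:
            self.cached[vpn] = self.mm.pages[vpn]
        return self.cached[vpn]

    def on_invalidate(self, start: int, end: int) -> None:
        for vpn in [v for v in self.cached if start <= v < end]:
            del self.cached[vpn]
```

For example, after dev.access(1), a call to mm.invalidate_range(0, 2) removes page 1 from the device cache, and the next access sees whatever the address space holds at that point.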
-
XFS parent pointers
At the 2018 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Allison Henderson led a session to discuss an XFS feature she has been working on: parent pointers. These would be pointers stored in extended attributes (xattrs) that would allow various tools to reconstruct the path for a file from its inode. In XFS repair scenarios, that path will help with reconstruction as well as provide users with better information about where the problems lie.
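Reconstructing a path from parent pointers amounts to walking upward from an inode to the root, collecting each entry name, and then reversing the list. The sketch below assumes a hypothetical in-memory table that maps an inode number to a (parent inode, name) pair. XFS's actual xattr format differs, and a hard-linked file would carry one record per link. The checks for missing pointers and cycles reflect the repair setting, where metadata may be damaged.

```python
# Sketch: rebuild a path by following parent pointers upward.
# The table layout and ROOT_INO value are hypothetical, for illustration only.
from typing import Dict, Tuple

ROOT_INO = 128  # hypothetical root inode number for this example

def path_from_inode(ino: int, parents: Dict[int, Tuple[int, str]]) -> str:
    components = []
    seen = set()
    while ino != ROOT_INO:
        if ino in seen:
            raise ValueError(f"cycle detected at inode {ino}")
        seen.add(ino)
        if ino not in parents:
            raise KeyError(f"inode {ino} has no parent pointer")
        ino, name = parents[ino]
        components.append(name)
    return "/" + "/".join(reversed(components))

parents = {131: (ROOT_INO, "home"), 140: (131, "alice"), 200: (140, "notes.txt")}
print(path_from_inode(200, parents))  # /home/alice/notes.txt
```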
-
Shared memory mappings for devices
In a short filesystem-only discussion at the 2018 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Jérôme Glisse wanted to talk about some (more) changes to support GPUs, FPGAs, and RDMA devices. In other talks at LSFMM, he discussed changes to struct page in support of these kinds of devices, but here he was looking to discuss other changes to support mapping a device's memory into multiple processes. It should be noted that I had a hard time following the discussion in this session, so there may be significant gaps in what follows.
-
A new API for mounting filesystems
The mount() system call suffers from a number of different shortcomings that have led some to consider a different API. At last year's Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), that someone was Miklos Szeredi, who led a session to discuss his ideas for a new filesystem mounting API. Since then, David Howells has been working with Szeredi and VFS maintainer Al Viro on this API; at the 2018 LSFMM, Howells presented that work.
He began by noting some of the downsides of the current mounting API. For one thing, the option data passed to mount() is limited to a single page; if too many options are needed, or the options have long parameters, they won't fit. The error messages and other information about what went wrong could also be better. In addition, some filesystems have a bug where an invalid option fails the mount() call but leaves the superblock in an inconsistent state, because earlier options have already been applied. Several in the audience were quick to note that both ext4 and XFS had fixed the latter bug along the way, though there may still be filesystems that have that behavior.
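One way to avoid the partially-applied-options bug is to parse and validate every option into a separate context, collecting descriptive error messages, and to touch the superblock only if all options are valid. The sketch below shows that validate-then-commit pattern. The option names, MountContext, and Superblock are all hypothetical illustrations, not the API Howells presented. Setting options one at a time also means no single buffer has to hold them all.

```python
# Hypothetical validate-then-commit sketch for mount options.
# Option names and the Superblock/MountContext classes are made up for illustration.
from dataclasses import dataclass, field
from typing import Dict, List

KNOWN_OPTIONS = {"ro": bool, "noatime": bool, "commit": int}

@dataclass
class Superblock:
    settings: Dict[str, object] = field(default_factory=dict)

@dataclass
class MountContext:
    pending: Dict[str, object] = field(default_factory=dict)
    errors: List[str] = field(default_factory=list)

    def set_option(self, key: str, value: str = "") -> None:
        """Record one option; nothing is applied to a superblock yet."""
        kind = KNOWN_OPTIONS.get(key)
        if kind is None:
            self.errors.append(f"unknown option '{key}'")
        elif kind is bool:
            self.pending[key] = True
        else:
            try:
                self.pending[key] = int(value)
            except ValueError:
                self.errors.append(f"option '{key}' needs an integer, got '{value}'")

    def apply(self, sb: Superblock) -> bool:
        """Commit all pending options at once, or none of them."""
        if self.errors:
            return False
        sb.settings.update(self.pending)
        return True
```

If any option fails to validate, apply() returns False and the superblock is left exactly as it was, while errors explains what went wrong.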
-
Controlling block-I/O latency
Chris Mason and Josef Bacik led a brief discussion on the block-I/O controller for control groups (cgroups) in the filesystem track at the 2018 Linux Storage, Filesystem, and Memory-Management Summit. Mostly they were just aiming to get feedback on the approach they have taken. They are trying to address the needs of their employer, Facebook, with regard to the latency of I/O operations.
Mason said that the goal is to strictly control the latency of block I/O operations, but that the filesystems themselves have priority inversions that make that difficult. For Btrfs and XFS, they have patches to tag the I/O requests, which mostly deals with the problem. They have changes for ext4 as well, but those are not quite working yet.
-
A mapping layer for filesystems
In a plenary session on the second day of the Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Dave Chinner described his ideas for a virtual block address-space layer. It would allow "space accounting to be shared and managed at various layers in the storage stack". One of the targets for this work is for filesystems on thin-provisioned devices, where the filesystem is larger than the storage devices holding it (and administrators are expected to add storage as needed); in current systems, running out of space causes huge problems for filesystems and users because the filesystem cannot communicate that error in a usable fashion.
His talk is not about block devices, he said; it is about a layer that provides a managed logical-block address (LBA) space. It will allow user space to make fallocate() calls that truly reserve the space requested. Currently, a filesystem will tell a caller that the space was reserved even though the underlying block device may not actually have that space (or won't when user space goes to use it), as in a thin-provisioned scenario. He also said that he would not be talking about his ideas for a snapshottable subvolume for XFS, which were the subject of his talk at linux.conf.au 2018.
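The accounting problem can be illustrated with a toy thin pool. A reservation is charged against real physical blocks, so a request the pool cannot back fails immediately with ENOSPC at fallocate() time rather than later, when a write finally needs the space. ThinPool and its methods are hypothetical names, and this is a sketch of the general idea rather than Chinner's design.

```python
# Toy model of space accounting on thin-provisioned storage.
# ThinPool and its methods are hypothetical, for illustration only.
import errno

class ThinPool:
    def __init__(self, physical_blocks: int) -> None:
        self.physical_blocks = physical_blocks
        self.used = 0      # blocks already written
        self.reserved = 0  # blocks promised but not yet written

    def free(self) -> int:
        return self.physical_blocks - self.used - self.reserved

    def reserve(self, blocks: int) -> int:
        """Model fallocate(): return 0 on success or -ENOSPC, reserving
        nothing if the pool cannot back the whole request."""
        if blocks > self.free():
            return -errno.ENOSPC
        self.reserved += blocks
        return 0

    def write_reserved(self, blocks: int) -> None:
        """Consume previously reserved blocks; this cannot run out of space."""
        if blocks > self.reserved:
            raise ValueError("writing more than was reserved")
        self.reserved -= blocks
        self.used += blocks

    def add_storage(self, blocks: int) -> None:
        """An administrator grows the pool."""
        self.physical_blocks += blocks
```

A filesystem on a 100-block pool might advertise far more logical space. With this accounting, reserving 80 blocks succeeds, a further request for 30 fails up front, and it succeeds once an administrator adds storage.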