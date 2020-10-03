The set_fs() function dates back to the earliest days of the Linux kernel; it is a key part of the machinery that keeps user-space and kernel-space memory separated from each other. It is also easy to misuse and has been the source of various security problems over the years; kernel developers have long wanted to be rid of it. They won't completely get their wish in the 5.10 kernel but, as the result of work that has been quietly progressing for several months, the end of set_fs() will be easily visible at that point.

This 2017 article describes set_fs() and its history in some detail. The short version is that set_fs() sets the location of the boundary between the user-space portion of the address space and the kernel's part. Any virtual address that is below the boundary set by the last set_fs() call on behalf of a given process is fair game for that process to access, though the memory permissions stored in the page tables still apply. Anything above that limit belongs to the kernel and is out of bounds.
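The boundary check described above can be modeled in a few lines of user-space C. This is only a sketch, not kernel code: the names toy_task, toy_set_fs(), and toy_access_ok(), and the location of the default limit, are all hypothetical and chosen for illustration. It ignores the page-table permissions that still apply to addresses below the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name and constant below is hypothetical. */
#define TOY_USER_LIMIT   UINT64_C(0x0000800000000000) /* assumed user/kernel split */
#define TOY_KERNEL_LIMIT UINT64_MAX                    /* effectively "no limit" */

struct toy_task {
    uint64_t addr_limit; /* boundary set by the last toy_set_fs() call */
};

/* Move the boundary for one task, as set_fs() does for the calling process. */
static void toy_set_fs(struct toy_task *task, uint64_t limit)
{
    assert(task != NULL);
    task->addr_limit = limit;
}

/*
 * Return true if the range [addr, addr + size) lies entirely below the
 * task's current limit. The comparison is arranged so that addr + size
 * is never computed and therefore cannot overflow.
 */
static bool toy_access_ok(const struct toy_task *task, uint64_t addr, uint64_t size)
{
    if (size > task->addr_limit)
        return false;
    return addr <= task->addr_limit - size;
}
```

In this model, a task with the default limit fails the check for any address in the upper part of the address space; once toy_set_fs() raises the limit, the same address passes.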

Normally, that boundary should be firmly fixed in place. When the need to move it arises, the reason is usually the same: some kernel subsystem needs to invoke a function that is intended to access user-space data, but to pass it a kernel-space address instead. Think, for example, of the simple task of reading the contents of a file into a memory buffer; the read() system call will do that, but it also performs all of the usual access checks, meaning that it will refuse to read into a kernel-space buffer. If a kernel subsystem must perform such a read, it first calls set_fs() to disable those checks; if all goes well, it remembers to restore the old boundary with another set_fs() call when the work is done.
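The save, widen, call, restore sequence can be sketched as a user-space simulation. Everything here is a hypothetical stand-in rather than real kernel interfaces: toy_read() plays the part of a function that checks its destination against the current limit, and toy_kernel_read() plays the part of a subsystem that needs to read into a kernel-space buffer. No data is actually copied; the model only tracks whether the access would be allowed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model only: every name and constant below is hypothetical. */
#define TOY_USER_DS   UINT64_C(0x0000800000000000) /* assumed default boundary */
#define TOY_KERNEL_DS UINT64_MAX                    /* checks effectively disabled */

static uint64_t toy_addr_limit = TOY_USER_DS;

static uint64_t toy_get_fs(void) { return toy_addr_limit; }
static void toy_set_fs(uint64_t limit) { toy_addr_limit = limit; }

/*
 * Stand-in for a read-like function: it refuses any destination range
 * that is not entirely below the current limit, otherwise it pretends
 * to have copied len bytes.
 */
static long toy_read(uint64_t dst, uint64_t len)
{
    if (len > toy_addr_limit || dst > toy_addr_limit - len)
        return -EFAULT;
    return (long)len;
}

/* A subsystem that wants to "read" into a kernel-space buffer. */
static long toy_kernel_read(uint64_t kbuf, uint64_t len)
{
    uint64_t old_fs = toy_get_fs();  /* remember the current boundary */
    long ret;

    toy_set_fs(TOY_KERNEL_DS);       /* lift the limit for this call */
    ret = toy_read(kbuf, len);
    toy_set_fs(old_fs);              /* the step that must not be forgotten */

    assert(toy_get_fs() == old_fs);
    return ret;
}
```

If the restoring toy_set_fs(old_fs) call is skipped, for example on an early error return, every later toy_read() in this model accepts kernel addresses too, which is the kind of mistake that makes the real interface hazardous.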

Naturally, history has proved that all does not always go well. It's thus not surprising that the development community has wanted to rid itself of set_fs() for many years. It is equally unsurprising that this hasn't happened yet. The kernel project does not lack for developers, but there is always a shortage of people who are willing and able to do this sort of deep infrastructural work; it tends not to feature highly in any company's marketing plan. So the task of removing set_fs() has languished for years.