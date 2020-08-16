

The realtime developers have been working for many years to create a kernel where the highest-priority task is always able to run without delay. That has meant a long process of finding and fixing situations where high-priority tasks might be blocked from running; one of the persistent problems in this regard has been kernel code that disables preemption. One tool that the realtime developers have reached for is disabling migration (moving a process from one CPU to another) rather than preemption; this approach has not been entirely popular among scheduler developers, though. Even so, the solution would appear to be this migration-disable patch set from scheduler developer Peter Zijlstra.

One of the key scalability techniques used in the kernel is per-CPU data. System-wide locking is an effective way of protecting shared data, but it can kill performance in a number of ways, even if a given lock is itself not heavily contested. Any data structure that is only accessed by a single CPU does not need to be protected by system-wide locks, avoiding this problem. Thus, for example, the memory allocators maintain per-CPU lists of available memory that can be handed out without interference from the other CPUs on the system.

But kernel code can only safely manipulate per-CPU data if it has exclusive access to the CPU; if some other process is able to jump in, it could find (or create) inconsistent per-CPU data structures. The normal way to prevent this from happening is to disable preemption when necessary; it is a cheap operation (setting a flag, essentially) that ensures that a given task will not be interrupted until its work is done. Disabling preemption runs afoul of the goals of the realtime developers, who have put so much work into ensuring that any given task can be interrupted if a higher-priority task needs the CPU.
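The hazard with per-CPU data can be seen in a small toy model. The sketch below is not kernel code: it is a user-space Python simulation in which a `yield` stands in for a point where a task could be preempted, and the names `increment`, `run_preemptible`, and `run_nonpreemptible` are invented for illustration. It shows how two tasks sharing one CPU's counter can lose an update when one is interrupted between its read and its write, and how running each update to completion (roughly what disabling preemption guarantees) avoids that.

```python
def increment(counters, cpu):
    """Read-modify-write on one CPU's slot, with a simulated
    preemption point (the yield) between the read and the write."""
    value = counters[cpu]        # read this CPU's counter
    yield                        # another task may run on this CPU here
    counters[cpu] = value + 1    # write back a possibly stale value


def run_preemptible(counters, cpu):
    """Two tasks on the same CPU, interleaved at the preemption point."""
    task_a = increment(counters, cpu)
    task_b = increment(counters, cpu)
    next(task_a)                 # A reads the counter, then is preempted
    next(task_b)                 # B reads the same value
    for task in (task_a, task_b):
        next(task, None)         # each task finishes its write


def run_nonpreemptible(counters, cpu):
    """Each task runs its whole update before the next one starts."""
    for _ in range(2):
        for _ in increment(counters, cpu):
            pass                 # nothing else runs at the yield point


if __name__ == "__main__":
    racy = [0, 0]
    run_preemptible(racy, 0)
    print(racy)                  # [1, 0]: one increment was lost

    safe = [0, 0]
    run_nonpreemptible(safe, 0)
    print(safe)                  # [2, 0]: both increments landed
```

In the kernel itself, the protected region would typically be bracketed by `preempt_disable()` and `preempt_enable()`; the model above only mimics the effect of that bracketing and says nothing about its cost.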
As the realtime developers have worked to remove preemption-disabled regions, they have observed that, often, all that is really needed is to keep tasks from being moved between CPUs while they are accessing per-CPU data, with perhaps some (normally CPU-local) locking as well. See, for example, the kmap_local() work. Disabling migration still allows a process to be preempted, so it does not interfere with the goals of the realtime project, or so those developers hope.

Disabling migration brings problems of its own, though. The kernel's CPU scheduler is tasked with making the best use of all of the CPUs in the system. If there are N CPUs available, they should be running the N highest-priority tasks at any given time. That goal cannot be achieved without occasionally moving tasks between CPUs; it would be nice if tasks just happened to land on the right processors every time, but the real world is not like that. Depriving the scheduler of the ability to migrate tasks, even for brief periods, thus takes away a tool that is crucial for the overall behavior and throughput of the system.
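The scheduler developers' concern can be illustrated with a toy model of the "N CPUs run the N highest-priority tasks" rule. This is a simplified sketch, not how the kernel scheduler works: the `Task` class, the `pinned_cpu` field (standing in for a task that currently has migration disabled), and the greedy `schedule()` function are all assumptions made for illustration, and larger `prio` values mean higher priority only by convention here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    name: str
    prio: int                          # larger means more important (toy convention)
    pinned_cpu: Optional[int] = None   # set while "migration is disabled"


def schedule(tasks, n_cpus):
    """Greedy toy scheduler: visit tasks from highest to lowest priority
    and give each the first free CPU it is allowed to use."""
    running = {}
    for task in sorted(tasks, key=lambda t: -t.prio):
        if task.pinned_cpu is not None:
            candidates = [task.pinned_cpu]   # may not move elsewhere
        else:
            candidates = range(n_cpus)       # free to migrate
        for cpu in candidates:
            if cpu not in running:
                running[cpu] = task
                break
    return running


if __name__ == "__main__":
    # B was preempted by A on CPU 0 while migration was disabled.
    pinned = [Task("A", 3, 0), Task("B", 2, 0), Task("C", 1)]
    print(sorted(t.name for t in schedule(pinned, 2).values()))
    # ['A', 'C']: B waits even though lower-priority C is running

    # Same workload, but B is free to move to CPU 1.
    movable = [Task("A", 3, 0), Task("B", 2), Task("C", 1)]
    print(sorted(t.name for t in schedule(movable, 2).values()))
    # ['A', 'B']: the two highest-priority tasks run
```

In the first case the second-highest-priority task sits idle behind A while a lower-priority task occupies the other CPU, which is exactly the kind of priority inversion that migration would normally resolve.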