Date: 2019-05-27

Even with radiators and fans, a system's CPUs can overheat. When that happens, the kernel's thermal governor will cap the maximum frequency of that CPU to allow it to cool. The scheduler, however, is not aware that the CPU's capacity has changed; it may schedule more work than optimal in the current conditions, leading to a performance degradation. Recently, Thara Gopinath did some research and posted a patch set to address this problem. The solution adds an interface to inform the scheduler about thermal events so that it can assign tasks better and thus improve the overall system performance.

The thermal framework in Linux includes a number of elements, including the thermal governor. Its task is to manage the temperature of the system's thermal zones, keeping them within an acceptable range while maintaining good performance (an overview of the thermal framework can be found in this slide set [PDF]). Several thermal governors can be found in the drivers/thermal/ subdirectory of the kernel tree. If a CPU overheats, the governor may cap that CPU's maximum frequency, which reduces its processing capacity as well.

The CPU capacity in the scheduler is a value representing the ability of a specific CPU to process tasks (interested readers can find more information in this article). The capacities of the CPUs in a system may vary, especially on architectures like big.LITTLE. The scheduler knows (at least it assumes it knows) how much work can be done on each CPU; it uses that information to balance the task load across the system. If the information the scheduler has on what a given CPU can do is inaccurate because of thermal events (or any other frequency capping), it is likely to put too much work onto that CPU.
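
To make the effect of stale capacity information concrete, here is a toy model in Python. It is not the kernel's load balancer: the function `share_load()`, the load figure, and the two-CPU layout are all hypothetical, and the only assumption borrowed from the kernel is that a full-speed CPU is given a capacity of 1024. The sketch splits a fixed amount of work in proportion to the capacity the scheduler believes each CPU has, then compares the result against what the CPUs can really do.

```python
def share_load(total_load, capacities):
    """Split total_load across CPUs in proportion to the given capacities.

    Hypothetical helper for illustration; the real scheduler balances
    load incrementally and takes many more factors into account.
    """
    total_capacity = sum(capacities)
    return [total_load * cap / total_capacity for cap in capacities]


# Capacity the scheduler assumes: two identical CPUs at full speed.
believed = [1024, 1024]
# Capacity actually available: CPU 1 has been capped to half its frequency.
actual = [1024, 512]

# Balancing on stale information puts the same load on both CPUs.
stale_assignment = share_load(1200, believed)
stale_utilization = [load / cap for load, cap in zip(stale_assignment, actual)]

# Balancing on accurate information shifts work away from the capped CPU.
fresh_assignment = share_load(1200, actual)
fresh_utilization = [load / cap for load, cap in zip(fresh_assignment, actual)]

print(stale_assignment, stale_utilization)
print(fresh_assignment, fresh_utilization)
```

With the stale view, the capped CPU is asked to do more than it can (its utilization exceeds 1.0) while the other CPU has room to spare; with the accurate view, both CPUs end up equally loaded.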

Gopinath introduces a term that is useful when talking about this kind of event: "thermal pressure". It is the difference between the maximum processing capacity of a CPU and the capacity that is currently available, which may have been reduced by overheating events. Gopinath explained in the patch set cover letter that the raw thermal pressure is hard to observe and that there is a delay between the capping of the frequency and the scheduler taking it into account. Because of this, the proposal is to use a weighted average over time, where the weight corresponds to the amount of time the maximum frequency was capped.
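
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the patch set: the function names are hypothetical, capacity is assumed to scale linearly with the frequency cap, and the averaging shown is a plain time-weighted mean, whereas the averaging used in the actual patches may differ.

```python
def thermal_pressure(max_capacity, max_freq, capped_freq):
    """Return the capacity lost to a frequency cap.

    Assumes (for illustration) that available capacity scales
    linearly with the capped frequency.
    """
    available = max_capacity * capped_freq // max_freq
    return max_capacity - available


def time_weighted_pressure(samples):
    """Average thermal pressure, weighting each value by its duration.

    samples is a list of (pressure, duration_ms) pairs; this simple
    mean stands in for whatever averaging the patch set really uses.
    """
    total_time = sum(duration for _, duration in samples)
    if total_time == 0:
        return 0.0
    weighted_sum = sum(pressure * duration for pressure, duration in samples)
    return weighted_sum / total_time


# A CPU with capacity 1024 capped from 2.0 GHz to 1.5 GHz (values in kHz).
pressure = thermal_pressure(1024, 2_000_000, 1_500_000)

# Uncapped for 30 ms, then capped for 10 ms.
average = time_weighted_pressure([(0, 30), (pressure, 10)])

print(pressure, average)
```

In this example the cap removes a quarter of the CPU's capacity, but since it was in force for only a quarter of the observed window, the averaged pressure seen by the scheduler is correspondingly smaller. A short spike thus has a limited effect, while a long-lasting cap pushes the average toward the raw value.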