Operating-System-Directed Power-Management (OSPM) Summit
-
The third Operating-System-Directed Power-Management summit
The third edition of the Operating-System-Directed Power-Management (OSPM) summit was held May 20-22 at the ReTiS Lab of the Scuola Superiore Sant'Anna in Pisa, Italy. The summit is a venue for collaborating on ways to reduce the energy consumption of Linux systems while still meeting performance and other goals. It is attended by scheduler, power-management, and other kernel developers, as well as academics, industry representatives, and others interested in the topics.
-
The future of SCHED_DEADLINE and SCHED_RT for capacity-constrained and asymmetric-capacity systems
The kernel's deadline scheduling class (SCHED_DEADLINE) enables realtime scheduling in which every task is guaranteed to meet its deadlines. Unfortunately, SCHED_DEADLINE's current view of CPU capacity is far too simple. It doesn't take dynamic voltage and frequency scaling (DVFS), simultaneous multithreading (SMT), asymmetric CPU capacity, or any kind of performance capping (e.g., due to thermal constraints) into consideration.
In particular, if we consider running deadline tasks on a system with performance capping, the question becomes: "what level of guarantee should SCHED_DEADLINE provide?" An interesting discussion about the pros and cons of different approaches (weak, hard, or mixed guarantees) developed during this presentation. There were many different views, but the discussion did not reach a conclusion and will have to continue at the Linux Plumbers Conference later this year.
The topic of guaranteed performance will become more important for mobile systems as performance capping becomes more common. Defining hard guarantees is almost impossible on real systems, since silicon behavior depends heavily on environmental conditions. The main pushback on the existing scheme is that the guaranteed bandwidth budget might be too conservative. Hence, SCHED_DEADLINE might not allow enough bandwidth to be reserved for use cases that have higher bandwidth requirements but can tolerate their bandwidth reservations not being honored.
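To make the capacity problem concrete, here is a minimal Python sketch of a bandwidth-based admission test, assuming a simplified model in which a task set is admitted when the sum of its runtime/period ratios fits within the summed CPU capacity. The 0-1024 capacity scale mirrors the kernel's SCHED_CAPACITY_SCALE, but `DeadlineTask` and `admissible()` are hypothetical names, and this is not the kernel's actual admission code. On multiprocessors, a test of this form is generally understood to bound tardiness rather than to guarantee every deadline.

```python
# Hypothetical, simplified SCHED_DEADLINE-style admission test.
# NOT the kernel's code: it only shows how performance capping shrinks
# the bandwidth that can be reserved.
from dataclasses import dataclass
from fractions import Fraction

CAPACITY_SCALE = 1024  # same scale as the kernel's SCHED_CAPACITY_SCALE


@dataclass(frozen=True)
class DeadlineTask:
    runtime_us: int  # worst-case execution time per period, at full capacity
    period_us: int   # activation period (deadline == period in this sketch)

    @property
    def bandwidth(self) -> Fraction:
        return Fraction(self.runtime_us, self.period_us)


def admissible(tasks, cpu_capacities) -> bool:
    """Admit the task set if its total bandwidth fits in the summed capacity.

    cpu_capacities: per-CPU capacity on the 0..1024 scale, already reduced
    by any performance cap (thermal, power budget, ...).
    """
    total_bw = sum((t.bandwidth for t in tasks), Fraction(0))
    available = Fraction(sum(cpu_capacities), CAPACITY_SCALE)
    return total_bw <= available


tasks = [DeadlineTask(6000, 10000)] * 3        # 3 x 0.6 = 1.8 CPUs of bandwidth
print(admissible(tasks, [1024, 1024]))         # → True  (1.8 <= 2.0)
print(admissible(tasks, [768, 768]))           # → False (1.8 >  1.5)
```

Capping both CPUs to 75% of their capacity turns an admissible task set into a rejected one. That is the guarantee question in miniature: either admission accounts for the cap and becomes conservative, or reservations may not be honored once the cap kicks in.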
-
Scheduler behavioral testing
Validating scheduler behavior is a tricky affair, as multiple subsystems both compete and cooperate with each other to produce the task placement we observe. Valentin Schneider from Arm described the approach taken by his team (the folks behind energy-aware scheduling — EAS) to tackle this problem.
-
CFS wakeup path and Arm big.LITTLE/DynamIQ
"One task per CPU" workloads, as emulated by multi-core Geekbench, can suffer on traditional two-cluster big.LITTLE systems due to the fact that tasks finish earlier on the big CPUs. Arm has introduced a more flexible DynamIQ architecture that can combine big and LITTLE CPUs into a single cluster; in this case, early products apply what's known as phantom scheduler domains (PDs). The concept of PDs is needed for DynamIQ so that the task scheduler can use the existing big.LITTLE extensions in the Completely Fair Scheduler (CFS) scheduler class.
Multi-core Geekbench consists of several tests during which N CFS tasks perform an equal amount of work. The synchronization mechanism pthread_barrier_wait() (which is built on a futex) is used to wait for all tasks to finish their work in test T before starting the tasks again for test T+1.
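The barrier pattern described above can be sketched as follows. This is a hypothetical emulation, not Geekbench code: Python's `threading.Barrier` stands in for `pthread_barrier_wait()`, and a simple summation loop stands in for each test's equal amount of work. Because of CPython's global interpreter lock, the threads do not actually run in parallel, so the sketch only shows the synchronization structure.

```python
# Hypothetical emulation of a "one task per CPU" multi-test benchmark:
# N workers do equal work per test and meet at a barrier before the next test.
import threading


def run_tests(n_workers: int, n_tests: int, work_items: int) -> list[list[int]]:
    barrier = threading.Barrier(n_workers)  # stands in for pthread_barrier_wait()
    # results[t][w] holds worker w's checksum for test t
    results = [[0] * n_workers for _ in range(n_tests)]

    def worker(w: int) -> None:
        for t in range(n_tests):
            acc = 0
            for i in range(work_items):  # the same amount of work for every worker
                acc += i * (t + 1)
            results[t][w] = acc
            barrier.wait()  # sleep until every worker has finished test t

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


print(run_tests(4, 3, 100)[1])  # → [9900, 9900, 9900, 9900]
```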
The problem for Geekbench on big.LITTLE is related to the grouping of big and LITTLE CPUs in separate scheduler (or CPU) groups of the so-called die-level scheduler domain. The two groups exist because the big CPUs share a last-level cache (LLC), and so do the LITTLE CPUs. This is no longer true for DynamIQ, hence the use of the "phantom" notion here.
The tasks of test T finish earlier on big CPUs and go to sleep at the barrier B. Load balancing then makes sure that the tasks on the LITTLE CPUs migrate to the big CPUs where they continue to run the rest of their work in T before they also go to sleep at B. At this moment, all the tasks in the wake queue have a big CPU as their previous CPU (p->prev_cpu). After the last task has entered pthread_barrier_wait() on a big CPU, all tasks on the wake queue are woken up.
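The effect of every woken task having a big CPU as `p->prev_cpu` can be illustrated with a toy placement model. It assumes, as a simplification, that a waking task first tries its previous CPU, otherwise searches only the CPUs sharing that CPU's last-level cache, and stacks on `prev_cpu` if none is idle. This is loosely inspired by the idea of an idle-sibling search in CFS, not by the kernel's actual `select_task_rq_fair()` logic; `place_wakeups()` and the eight-CPU layout are hypothetical.

```python
# Toy wakeup-placement model (hypothetical; not the kernel's wakeup path).
from collections import Counter


def place_wakeups(prev_cpus, llc_groups):
    """Return the CPU each woken task is placed on, in wakeup order.

    prev_cpus:  previous CPU (p->prev_cpu) of each task on the wake queue.
    llc_groups: tuples of CPUs assumed to share a last-level cache.
    """
    cpu_to_group = {cpu: group for group in llc_groups for cpu in group}
    queued = Counter()  # number of tasks already placed on each CPU
    placement = []
    for prev in prev_cpus:
        if queued[prev] == 0:
            target = prev  # previous CPU is idle: reuse it
        else:
            # Search only the LLC group of prev_cpu for an idle CPU.
            idle = [c for c in cpu_to_group[prev] if queued[c] == 0]
            target = idle[0] if idle else prev  # otherwise stack on prev_cpu
        queued[target] += 1
        placement.append(target)
    return placement


BIG, LITTLE = (0, 1, 2, 3), (4, 5, 6, 7)
prev = [0, 1, 2, 3, 0, 1, 2, 3]  # every task last ran on a big CPU
print(place_wakeups(prev, [BIG, LITTLE]))  # → [0, 1, 2, 3, 0, 1, 2, 3]
print(place_wakeups(prev, [BIG + LITTLE]))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

With two separate LLC groups, the four LITTLE CPUs stay idle while every big CPU has two tasks queued; with one shared group, the same wakeups spread across all eight CPUs.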
-
I-MECH: realtime virtualization for industrial automation
The typical systems used in industrial automation (e.g., for axis control) consist of a "black box" executing a commercial realtime operating system (RTOS) plus a set of control design tools meant to be run on a different desktop machine. This approach, besides imposing expensive royalties on the system integrator, often does not offer the desired degree of flexibility for testing or implementing novel solutions (e.g., running both control code and design tools on the same platform).
-
Virtual-machine scheduling and scheduling in virtual machines
As is probably well known, a scheduler is the component of an operating system that decides which CPU each task should run on and for how long it is allowed to do so. This is true when an OS runs on the bare hardware of a physical host, and it is also true when the OS runs inside a virtual machine; the only difference is that, in the latter case, the OS scheduler marshals tasks among virtual CPUs.
And what are virtual CPUs? Well, in most platforms they are also a kind of special task and they want to run on some CPUs ... therefore we need a scheduler for that! This is usually called the "double-scheduling" property of systems employing virtualization because, well, there literally are two schedulers: one — let us call it the host scheduler, or the hypervisor scheduler — that schedules the virtual CPUs on the host physical CPUs; and another one — let us call it the guest scheduler — that schedules the guest OS's tasks on the guest's virtual CPUs.
Now, what are these two schedulers? That depends on the virtualization platform. They are always different, in the sense that it will never happen that, at runtime, a scheduler has to deal with scheduling virtual CPUs and also scheduling tasks that want to run on those same virtual CPUs (well, it can happen, but then you are not doing virtualization). They can be the same in terms of code, or they can be completely different in that respect as well.
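Double scheduling can be illustrated with a toy two-level round-robin simulation, assuming a fixed set of guest tasks queued on each virtual CPU. Each tick, the host scheduler picks which vCPUs occupy the physical CPUs, and the guest scheduler picks which task runs on each vCPU that got one. Real hypervisor and guest schedulers are far more sophisticated; `simulate()` and its parameters are hypothetical.

```python
# Toy two-level ("double") scheduler: host round-robins vCPUs over pCPUs,
# guest round-robins the tasks queued on each vCPU. Illustrative only.
from collections import deque


def simulate(n_pcpus, vcpu_tasks, ticks):
    """vcpu_tasks: {vcpu_name: [guest task names queued on that vCPU]}.

    Returns {task: number of ticks the task actually ran}.
    """
    host_rq = deque(vcpu_tasks)  # host scheduler's run queue of vCPUs
    guest_rqs = {v: deque(ts) for v, ts in vcpu_tasks.items()}  # guest run queues
    ran = {t: 0 for ts in vcpu_tasks.values() for t in ts}
    for _ in range(ticks):
        # Host scheduler: put up to n_pcpus vCPUs on physical CPUs this tick.
        running = [host_rq.popleft() for _ in range(min(n_pcpus, len(host_rq)))]
        for v in running:
            rq = guest_rqs[v]
            if rq:
                task = rq.popleft()  # guest scheduler: pick a task on this vCPU
                ran[task] += 1
                rq.append(task)
            host_rq.append(v)  # vCPU goes back to the host run queue
    return ran


print(simulate(1, {"vcpu0": ["a", "b"], "vcpu1": ["c"]}, 4))
# → {'a': 1, 'b': 1, 'c': 2}
```

Task `c` runs twice as often as `a` or `b` only because it has its vCPU to itself; any guest task can run only while the host scheduler has given its vCPU a physical CPU.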
-
Rock and a hard place: How hard it is to be a CPU idle-time governor
In the opening session of OSPM 2019, Rafael Wysocki from Intel gave a talk about potential problems faced by the designers of CPU idle-time-management governors, which was inspired by his own experience from the timer-events oriented (TEO) governor work done last year.
In the first place, he said, it should be noted that "CPU idleness" is defined at the level of logical CPUs, which may be CPU cores or simultaneous multithreading (SMT) threads, depending on the hardware configuration of the processor. In Linux, a logical CPU is idle when there are no runnable tasks in its queue, so it falls back to executing the idle task associated with it (there is one idle task for each logical CPU in the system, but they all share the same code, which is the idle loop). Therefore "CPU idleness" is an OS (not hardware) concept and if the idle loop is entered by a CPU, there is an opportunity to save some energy with a relatively small impact on performance (or even without any impact on performance at all) — if the hardware supports that.
The idle loop runs on each idle CPU and it only takes this particular CPU into consideration. As a rule, two code modules are invoked in every iteration of it. The first one, referred to as the CPU idle-time-management governor, is responsible for deciding whether or not to stop the scheduler tick and what to tell the hardware to do; the second one, called the CPU idle-time-management driver, passes the governor's decisions down to the hardware, usually in an architecture- or platform-specific way. Then, presumably, the processor enters a special state in which the CPU in question stops fetching instructions (that is, it does literally nothing at all); that may allow the processor's power draw to be reduced and some energy to be saved as a result. If that happens, the processor needs to be woken up from that state by a hardware event after spending some time, referred to as the idle duration, in it. At that point, the governor is called again so it can save the idle-duration value for future use.
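The governor/driver split in the idle loop can be sketched as follows, under heavily simplified assumptions. The `select()` and `reflect()` method names echo the callbacks of Linux cpuidle governors, but everything else (the state table, the prediction rule, the 4ms tick assuming HZ=250, `ToyGovernor`, and the `idle_iteration()` helper) is hypothetical and does not reproduce the menu or TEO governors.

```python
# Hypothetical idle-loop sketch: governor selects a state and a tick decision,
# the "driver" step is only modeled, and reflect() saves the idle duration.
from dataclasses import dataclass

TICK_US = 4000  # scheduler tick period, assuming HZ=250 (hypothetical choice)


@dataclass(frozen=True)
class IdleState:
    name: str
    target_residency_us: int  # minimum idle time for the state to save energy
    exit_latency_us: int      # time needed to wake up from the state


# Hypothetical state table, ordered shallow to deep.
STATES = (
    IdleState("C1", 2, 2),
    IdleState("C3", 100, 50),
    IdleState("C6", 600, 200),
)


class ToyGovernor:
    def __init__(self, states):
        self.states = states
        self.last_idle_us = None  # idle duration saved by reflect()

    def select(self, next_timer_us, latency_limit_us):
        """Pick an idle state and decide whether to stop the tick."""
        # The next timer bounds the idle duration; a shorter recent idle
        # period suggests a non-timer wakeup may come sooner.
        predicted = next_timer_us
        if self.last_idle_us is not None:
            predicted = min(predicted, self.last_idle_us)
        chosen = self.states[0]  # shallowest state as the fallback
        for state in self.states:
            if (state.target_residency_us <= predicted
                    and state.exit_latency_us <= latency_limit_us):
                chosen = state
        stop_tick = predicted >= TICK_US  # keep the tick if idle will be short
        return chosen, stop_tick

    def reflect(self, measured_idle_us):
        """Called after wakeup: save the idle duration for future use."""
        self.last_idle_us = measured_idle_us


def idle_iteration(gov, next_timer_us, latency_limit_us, event_after_us):
    """One pass of the idle loop; event_after_us models a non-timer wakeup."""
    state, stop_tick = gov.select(next_timer_us, latency_limit_us)
    # A real driver would now ask the hardware to enter `state`.
    idle_us = min(event_after_us, next_timer_us)
    if not stop_tick:
        idle_us = min(idle_us, TICK_US)  # a running tick also wakes the CPU
    gov.reflect(idle_us)
    return state.name, stop_tick, idle_us


gov = ToyGovernor(STATES)
print(idle_iteration(gov, 10_000, 1_000, 50))  # → ('C6', True, 50)
print(idle_iteration(gov, 10_000, 1_000, 50))  # → ('C1', False, 50)
```

After one short idle period, the saved idle duration makes the governor pick a shallow state and keep the tick running on the next pass, which is why the measured idle duration is worth recording.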