Bridging the synchronization gap on Linux

Submitted by Roy Schestowitz on Friday 10th of June 2022 12:35:15 AM
Filed under: Development, Graphics/Benchmarks, Linux

With older graphics APIs like OpenGL, the client makes a series of API calls, each of which either mutates some bit of state or performs a draw operation. There are a number of techniques that have been used over the years to parallelize the rendering work, but the implementation has to ensure that everything appears to happen in order from the client's perspective. While this served us well for years, it's become harder and harder to keep the GPU fully occupied. Games and other 3D applications have gotten more complex and need multiple CPU cores in order to have enough processing power to reliably render their entire scene in less than 16 milliseconds and achieve a smooth 60 frames per second. GPUs have also gotten larger with more parallelism, and there's only so much a driver can do behind the client's back to parallelize things.

To improve both GPU and CPU utilization, modern APIs like Vulkan take a different approach. Most Vulkan objects such as images are immutable: while the underlying image contents may change, the fundamental properties of the image such as its dimensions, color format, and number of miplevels do not. This is different from OpenGL where the application can change any property of anything at any time. To draw, the client records sequences of rendering commands in command buffers which are submitted to the GPU as a separate step. The command buffers themselves are still stateful, and the recorded commands have the same in-order guarantees as OpenGL. However, the state and ordering guarantees only apply within the command buffer, making it safe to record multiple command buffers simultaneously from different threads. The client only needs to synchronize between threads at the last moment when they submit those command buffers to the GPU. Vulkan also allows the driver to expose multiple hardware work queues of different types which all run in parallel. Getting the most out of a large desktop GPU often requires having 3D rendering, compute, and image/buffer copy (DMA) work happening all at the same time and in parallel with the CPU prep work for the next batch of GPU work.
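
The recording model described above can be sketched in a few lines of Python. This is a toy model, not the Vulkan API: the `CommandBuffer` and `Queue` classes, the `record` helper, and the pipeline names are all hypothetical stand-ins. It only illustrates that state lives inside each command buffer, so several buffers can be recorded on different threads without locking, and the only shared step is the final submit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CommandBuffer:
    """Toy stand-in for a command buffer: all state is private to the buffer."""

    def __init__(self, name):
        self.name = name
        self.bound_pipeline = None  # state lives here, not in a global context
        self.commands = []

    def bind_pipeline(self, pipeline):
        self.bound_pipeline = pipeline
        self.commands.append(("bind", pipeline))

    def draw(self, vertex_count):
        # Ordering is only guaranteed relative to other commands in this buffer.
        self.commands.append(("draw", self.bound_pipeline, vertex_count))


class Queue:
    """Toy stand-in for a hardware queue; submit is the only synchronized step."""

    def __init__(self):
        self._lock = threading.Lock()
        self.submitted = []

    def submit(self, buffers):
        with self._lock:
            self.submitted.extend(buffers)


def record(index):
    """Record one command buffer; safe to call from any thread."""
    cb = CommandBuffer(f"cb{index}")
    cb.bind_pipeline(f"pipeline{index}")
    for _ in range(3):
        cb.draw(vertex_count=6)
    return cb


def render_frame(worker_count=4):
    queue = Queue()
    # Recording happens in parallel with no locking between threads.
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        buffers = list(pool.map(record, range(worker_count)))
    # Threads only meet at the last moment, when the work is submitted.
    queue.submit(buffers)
    return queue
```

Calling `render_frame(4)` records four buffers on four worker threads and hands them to the queue in a single submit. Under OpenGL's model, by contrast, `bound_pipeline` would be shared context state, and the threads would have to serialize on it.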

Enabling all this additional CPU and GPU parallelism comes at a cost: synchronization. One piece of GPU work may depend on other pieces of GPU work, possibly on a different queue. For instance, you may upload a texture on a copy queue and then use that texture on a 3D queue. Because command buffers can be built in parallel and the driver has no idea what the client is actually trying to do, the client has to explicitly provide that dependency information to the driver. In Vulkan, this is done through VkSemaphore objects. If command buffers are the nodes in the dependency graph of work to be done, semaphores are the edges. When a command buffer is submitted to a queue, the client provides two sets of semaphores: a set to wait on before executing the command buffer and a set to signal when the command buffer completes. In our texture upload example, the client would tell the driver to signal a semaphore when the texture upload operation completes and then have it wait on that same semaphore before doing the 3D rendering which uses the texture. This allows the client to take advantage of as much parallelism as it can manage while still having things happen in the correct order as needed.
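
As a rough illustration of the graph described above, the following Python sketch treats submissions as nodes and semaphores as edges. It is a simplified model and not the Vulkan API: `Submission`, `execute`, and the semaphore names are hypothetical, and unlike real binary semaphores, a semaphore here stays signaled once set.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One command buffer submission with its wait and signal semaphore sets."""
    name: str
    queue: str
    wait: tuple = ()    # semaphores that must be signaled before this runs
    signal: tuple = ()  # semaphores signaled when this completes


def execute(submissions):
    """Return an execution order that respects the semaphore dependencies."""
    signaled = set()
    pending = list(submissions)
    order = []
    while pending:
        # Everything whose wait set is satisfied may run in this round.
        ready = [s for s in pending if all(sem in signaled for sem in s.wait)]
        if not ready:
            raise RuntimeError("deadlock: a waited-on semaphore is never signaled")
        for s in ready:
            order.append(s.name)
            signaled.update(s.signal)
            pending.remove(s)
    return order


# The texture upload example: the render is submitted first but still runs last.
frame = [
    Submission("render_scene", queue="3d", wait=("texture_ready",)),
    Submission("upload_texture", queue="copy", signal=("texture_ready",)),
    Submission("simulate_particles", queue="compute"),
]
print(execute(frame))
```

The upload and the compute job have no edge between them, so the model is free to run them in the same round, while the render waits for `texture_ready`. That is the point of explicit semaphores: only the dependencies the client declares constrain the order.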

Sailfish OS - Command Line Interface & Customisation

MTP (the same protocol Android devices use) will expose the /home/nemo folder to the connecting computer, both to graphical and command line applications. But it hides some files and folders from the connected system. A more powerful way is to connect through a terminal via SSH, as explained here. The phone's shell and my beloved URxvt terminal emulator have difficulties communicating. I installed xterm on my computer and am using this to connect to the phone. Works perfectly. Of course one can always use the terminal app directly on the phone. It gives full access to the system via devel-su, just as via SSH.

Haiku and FreeBSD OS Development Reports

  • Haiku Activity & Contract Report: May 2022

    As now seems to be the usual way of things, the monthly Activity Report is hereby combined with my Contract Report. This report covers hrev56088 to hrev56147. Before I get into the development items from May, you may want to know that, though it has not been formally announced yet, you can now donate to Haiku, Inc. (and thus support my ongoing contract) through GitHub Sponsors!

    Applications

    PulkoMandy fixed text in the status view being cut off in Installer. Jim906 improved window cascading in FileTypes. jadedctrl added per-track scripting support to MediaPlayer. Now you can use hey (or another interface to Haiku’s scripting suites system) to control MediaPlayer’s playlist. korli added more vendor identifiers to Screen preferences, so that it can identify monitor manufacturers more accurately. apl fixed version date updates in HaikuDepot. He also properly fixed a crash for which waddlesplash had implemented a workaround last time. dcieslak fixed locale-aware display in DeskCalc, which had been somewhat broken by his changes last time. Jim906 fixed WebPositive to not display the “…” icon on the bookmark bar when the overflow menu would be empty.

  • FreeBSD January to March 2022 Status Report

Fedora Project / Red Hat Leftovers

  • Fedora Community Blog: Community Blog monthly summary: May 2022

    In May, we published 17 posts. The site had 7,126 visits from 4,510 unique viewers. 2,591 visits came from search engines, while 61 came from Fedora Discussion and 58 came from Twitter. The most read post last month was Help Us Test Fedora Linux 36 Beta wallpaper with 758 views. The most read post written last month was Nest With Fedora: Call for proposals and sponsors with 116 views.

  • EBS Storage Performance Notes – Instance throughput vs Volume throughput

    This is important because we have to be aware of the Maximum Total Throughput Capacity for a specific volume vs the Maximum Total Instance Throughput. If your instance type (or server) is able to produce a throughput of 1250MiB/s (e.g. M4.16xl) and your EBS Maximum Throughput is 500MiB/s (e.g. ST1), not only will you hit a bottleneck trying to write to the specific volumes, but throttling might also occur (e.g. EBS on cloud services).

  • Cockpit Project: Cockpit 271

    Cockpit is the modern Linux admin interface. We release regularly.

  • Three insights you might have missed from KubeCon + CloudNativeCon EU [Ed: "Disclosure: TheCUBE is a paid media partner for the KubeCon + CloudNativeCon Europe event." This is paid-for "media" "coverage" -- a growing and disturbing trend]

    If there has been a slowdown in the rush to build new technologies for the Kubernetes and cloud-native world, there wasn’t any sign of it in Valencia, Spain, during the recent KubeCon + CloudNativeCon Europe event. The show was a sellout this year, with more than 7,500 attendees. Perhaps even more significant was that 65% of them attended the gathering for the first time. Attendees included a portion of the Cloud Native Computing Foundation’s 7.1 million cloud-native developers, who gathered to hear the latest on the non-profit’s more than 120 active projects.

  • 3 IT hiring myths about generational differences

    Generational stereotypes aren’t always insulting, but they can be harmful and limit our ability to design and deliver the solutions users want. Many people tend to think of the youngest generation of workers as the most innovative and the oldest as technophobic and resistant to change. While that perception may be true of some people in each generation, it is generally not supported by data. Here are three generational myths worth dispelling.

  • Digital transformation: 3 ways it changes companies

    I’ve worked on intelligent automation projects with many organizations worldwide, and there is no doubt: The digital transformation process has a huge impact – not only by bringing about obvious initial change but also by affecting business leaders’ outlook for the future. One common result is they want more, expect more, and invest more. Here’s why.

  • Using Multipath TCP to better survive outages and increase bandwidth

    MultiPath TCP (MPTCP) gives you a way to bundle multiple paths between systems to increase bandwidth and resilience from failures. Here we show you how to configure a basic setup so you can try out MPTCP, using two virtual guests. We’ll then look at the details of a real world scenario, measuring bandwidth and latency when single links disappear.

  • systemd-oomd issues on desktop [Ed: Systemd is IBM]
    
During the 22.04 cycle, we enabled systemd-oomd [1] by default on
desktop. Since then, there have been reports of systemd-oomd killing
user applications too frequently (e.g. browsers, IDEs, and gnome-shell
in some cases). In addition to a couple of LPs [2][3], I have heard
these reports by word-of-mouth, and there have been discussions on
internal Mattermost. A common theme in these reports is that e.g.
Chrome is killed "suddenly" without any other observable symptoms of
the system nearing OOM.

For more context, systemd-oomd basically has two methods for deciding
a unit's cgroup is a candidate for OOM kill:

1. When total system memory usage _and_ swap usage both exceed
SwapUsedLimit (90% by default, and on Ubuntu) [4], monitored cgroups
with greater than 5% swap usage become OOM kill candidates, and
cgroups with the highest swap usage are acted on first.

2. When a unit's cgroup memory pressure exceeds MemoryPressureLimit
[5] for at least MemoryPressureDurationSec [6], monitored descendant
cgroups will be acted on starting from the ones with the most reclaim
activity to the least reclaim activity.
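
As a rough illustration of the two methods above (a simplified model,
not systemd-oomd's actual code), the selection logic can be sketched
in Python. The Cgroup fields, the function names, the one-sample-per-
second pressure list, and the pressure limit and duration used in the
usage note are assumptions made for this example; only the 90% and 5%
thresholds come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    name: str
    swap_used_fraction: float  # this cgroup's share of total swap, 0.0 to 1.0
    reclaim_activity: int      # hypothetical measure of recent reclaim work


def swap_kill_candidates(mem_used, swap_used, cgroups,
                         swap_used_limit=0.90, min_cgroup_swap=0.05):
    """Method 1: memory AND swap usage must both exceed SwapUsedLimit."""
    if not (mem_used > swap_used_limit and swap_used > swap_used_limit):
        return []
    eligible = [c for c in cgroups if c.swap_used_fraction > min_cgroup_swap]
    # Highest swap usage is acted on first.
    return sorted(eligible, key=lambda c: c.swap_used_fraction, reverse=True)


def pressure_kill_candidates(pressure_samples, descendants,
                             pressure_limit, duration_sec):
    """Method 2: pressure above the limit for at least duration_sec seconds.

    pressure_samples holds one reading per second, newest last (an
    assumption of this sketch); duration_sec must be at least 1.
    """
    recent = pressure_samples[-duration_sec:]
    sustained = (len(recent) >= duration_sec
                 and all(p > pressure_limit for p in recent))
    if not sustained:
        return []
    # Most reclaim activity is acted on first.
    return sorted(descendants, key=lambda c: c.reclaim_activity, reverse=True)


cgroups = [
    Cgroup("browser", swap_used_fraction=0.40, reclaim_activity=10),
    Cgroup("ide", swap_used_fraction=0.20, reclaim_activity=50),
    Cgroup("shell", swap_used_fraction=0.02, reclaim_activity=5),
]
```

With these sample cgroups, swap_kill_candidates(0.95, 0.93, cgroups)
selects the browser first and then the IDE, and skips the shell
because it is under the 5% floor. Calling pressure_kill_candidates
with thirty readings of 0.7, pressure_limit=0.6, and duration_sec=30
(illustrative values) orders the same cgroups by reclaim activity
instead, so the IDE comes first.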

In the reports I refer to above, applications are being killed due to
(1). In practice, the SwapUsedLimit might be too easy to reach on
Ubuntu, largely because Ubuntu provides just 1GB of swap. Since we
follow the suggestion of setting ManagedOOMSwap=kill on the root slice
[7], every cgroup is eligible for swap kill. When this condition is
met, user applications like browsers are going to be killed first.

While investigating [2], we patched upstream systemd-oomd to fix how
"used memory" was calculated, and we brought the patch into Jammy.
This may have helped the situation a bit, but it does not appear this
was enough to fix the issue entirely.

Given the current situation, I think we should re-consider how
systemd-oomd is configured on Ubuntu. These are the options that come
to mind:

1. Increase SwapUsedLimit (again, currently at 90%). I think this is
probably the safest change, but it is not clear to me how significant
of an impact this would have.

2. Set ManagedOOMSwap more selectively. Again, we currently follow the
recommendation of setting ManagedOOMSwap=kill on the root slice
(-.slice), so every descendant cgroup is a candidate for swap kill. It
_might_ be effective to say "do not swap kill cgroups descendant of
user's app.slice". The downsides of this approach would be that the
configuration does not scale well (i.e. a lot more configuration
needed to get the proper swap kill "coverage"), and this may just
place the problem onto a different class of processes.

3. Do not enable swap kill at all. This would mean systemd-oomd would
only act when memory pressure limits are reached. Given Ubuntu's swap
configuration, does it make sense for systemd-oomd to act on high swap
usage?

4. Increase swap on Ubuntu. I am adding this for completeness, but I
doubt this is a viable option.

I think that either option (1) or (3) would be the most reasonable --
maybe trying (1) first and falling back to (3) if necessary. If anyone
has an opinion on this, or can think of other options, I would
appreciate the input.

Thanks,

Nick 'enr0n' Rosbrook

Videos: KDE, Careers, and Wayland

