Kernel and Graphics: Vulkan, NVIDIA Memory Compaction and Intel DRM Driver

Filed under
Graphics/Benchmarks
Linux
  • vkBasalt CAS Vulkan Layer Adds FXAA Support

    The open-source vkBasalt project is an independent effort to implement AMD's Radeon Image Sharpening / Contrast Adaptive Sharpening (CAS) technique as a Vulkan post-processing layer that can be used with any Vulkan-powered game. vkBasalt 0.1 now also adds the ability to apply FXAA.

    Fast Approximate Anti-Aliasing (FXAA) is the newest vkBasalt feature alongside contrast adaptive sharpening. In the v0.1 release, however, CAS and FXAA cannot both be enabled at the same time; enabling both together is on the project's TODO list for a future release. Like the existing CAS support, FXAA can be used with any Vulkan game because it is implemented as a post-processing layer for this graphics API.

  • mm: Proactive compaction
    For some applications we need to allocate almost all memory as
    hugepages. However, on a running system, higher order allocations can
    fail if the memory is fragmented. The Linux kernel currently does on-demand
    compaction as we request more hugepages, but this style of compaction
    incurs very high latency. Experiments with one-time full memory
    compaction (followed by hugepage allocations) show that the kernel is able
    to restore a highly fragmented memory state to a fairly compacted memory
    state within <1 sec for a 32G system. Such data suggests that a more
    proactive compaction can help us allocate a large fraction of memory as
    hugepages while keeping allocation latencies low.
    
    For a more proactive compaction, the approach taken here is to define
    a per-node tunable called ‘hpage_compaction_effort’ which dictates
    bounds for external fragmentation for HPAGE_PMD_ORDER pages which
    kcompactd should try to maintain.
    
    The tunable is exposed through sysfs:
      /sys/kernel/mm/compaction/node-n/hpage_compaction_effort
    
    The value of this tunable is used to determine low and high thresholds
    for external fragmentation wrt HPAGE_PMD_ORDER order.
    
    Note that the previous version of this patch [1] was found to introduce too
    many tunables (per-order, extfrag_{low, high}) but this one reduces them
    to just (per-node, hpage_compaction_effort). Also, the new tunable is an
    opaque value instead of asking for specific bounds of “external
    fragmentation” which would have been difficult to estimate. The internal
    interpretation of this opaque value allows for future fine-tuning.
    
    Currently, we use a simple translation from this tunable to [low, high]
    extfrag thresholds (low=100-hpage_compaction_effort, high=low+10%). To
    periodically check per-node extfrag status, we reuse per-node kcompactd
    threads which are woken up every few milliseconds to check the same. If
    any zone on its corresponding node has extfrag above the high threshold
    for the HPAGE_PMD_ORDER order, the thread starts compaction in
    background till all zones are below the low extfrag level for this
    order. By default, the tunable is set to 0 (=> low=100%,
    high=100%).
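The translation described above can be sketched in a few lines of Python. This is an illustration only, not the patch's kernel code: the function names are hypothetical, "high=low+10%" is read here as ten percentage points, and the cap at 100 is inferred from the stated default (0 => low=100%, high=100%).

```python
from typing import Iterable, Tuple


def extfrag_thresholds(effort: int) -> Tuple[int, int]:
    """Map the opaque effort value to (low, high) extfrag thresholds in percent."""
    # Assumption: the tunable is an integer in 0..100.
    if not 0 <= effort <= 100:
        raise ValueError("effort is assumed to lie in 0..100")
    low = 100 - effort
    # Assumption: "+10%" means ten percentage points, capped at 100.
    high = min(low + 10, 100)
    return low, high


def should_start_compaction(zone_extfrag: Iterable[int], high: int) -> bool:
    """Start when any zone on the node is above the high threshold."""
    return any(frag > high for frag in zone_extfrag)


def should_stop_compaction(zone_extfrag: Iterable[int], low: int) -> bool:
    """Stop once all zones on the node are below the low threshold."""
    return all(frag < low for frag in zone_extfrag)
```

With the default of 0, both thresholds sit at 100%, so no zone can exceed the high threshold and background compaction never starts.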
    
    This patch is largely based on ideas from Michal Hocko posted here:
    https://lore.kernel.org/linux-mm/20161230131412.GI13301@dhcp22.suse.cz/
    
    * Performance data
    
    System: x86_64, 32G RAM, 12 cores.
    
    I made a small driver that allocates as many hugepages as possible and
    measures allocation latency:
    
    The driver first tries to allocate a hugepage using GFP_TRANSHUGE_LIGHT
    and if that fails, tries to allocate with `GFP_TRANSHUGE |
    __GFP_RETRY_MAYFAIL`. The driver stops when both methods fail for a
    hugepage allocation.
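The control flow of such a driver can be modelled in userspace Python. This is only a sketch of the fallback logic: the real driver is kernel code, and `try_light` and `try_heavy` are hypothetical callables standing in for the GFP_TRANSHUGE_LIGHT and `GFP_TRANSHUGE | __GFP_RETRY_MAYFAIL` allocation attempts.

```python
import time
from typing import Callable, List


def allocate_until_exhausted(
    try_light: Callable[[], bool],
    try_heavy: Callable[[], bool],
) -> List[float]:
    """Return the latency in seconds of each successful hugepage allocation."""
    latencies: List[float] = []
    while True:
        start = time.perf_counter()
        # Cheap attempt first; the heavier attempt runs only if it fails.
        ok = try_light() or try_heavy()
        elapsed = time.perf_counter() - start
        if not ok:
            # Both methods failed for this hugepage, so the driver stops.
            return latencies
        latencies.append(elapsed)
```

The measured latency of an allocation that needed the fallback includes the time spent on the failed light attempt, which is one plausible way to account for slow-path cost.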
    
    Before starting the driver, the system was fragmented from a userspace
    program that allocates all memory and then for each 2M aligned section,
    frees 3/4 of base pages using munmap. The workload is mainly anonymous
    userspace pages which are easy to move around. I intentionally avoided
    unmovable pages in this test to see how much latency we incur just by
    hitting the slow path for most allocations.
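The fragmentation step can be illustrated by computing which byte ranges such a program would pass to munmap. This is a sketch under assumptions the text does not state: 4 KiB base pages, and the freed 3/4 taken as the contiguous tail of each 2M section. The function name is hypothetical.

```python
from typing import List, Tuple

SECTION_SIZE = 2 * 1024 * 1024  # 2M aligned section, as in the description
BASE_PAGE_SIZE = 4096           # assumption: 4 KiB base pages


def unmap_plan(total_bytes: int) -> List[Tuple[int, int]]:
    """Return (offset, length) ranges to unmap, one per complete 2M section."""
    keep = SECTION_SIZE // 4  # the first quarter of each section stays mapped
    return [
        (start + keep, SECTION_SIZE - keep)
        for start in range(0, total_bytes - SECTION_SIZE + 1, SECTION_SIZE)
    ]
```

Under these assumptions each section keeps 128 base pages and frees 384, so no 2M region is entirely free until the remaining pages are migrated.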
    
  • NVIDIA Engineer Continues Working On Proactive Memory Compaction For Linux

    NVIDIA's Nitin Gupta continues working on proactive compaction for the Linux kernel's memory management code.

    This proactive compaction is designed to avoid the high latency currently incurred when the Linux kernel performs on-demand compaction because an application needs a lot of hugepages. With proactive compaction, a large number of hugepages can be requested while keeping allocation latencies low.

  • Intel Submits Last Bits For Linux 5.5 DRM Driver - Includes More TGL/Gen12, Discrete Bit

    Intel's open-source crew has submitted the last of their feature updates to their "i915" Direct Rendering Manager graphics driver for staging in DRM-Next ahead of the upcoming Linux 5.5 kernel cycle.

    In the previous weeks they've been bringing up a lot of their Tiger Lake / Gen12 graphics code as the dominating theme for the Linux 5.5 kernel. There has also been Jasper Lake support, Xe multi-GPU prepping, and their other routine code clean-ups and driver improvements. Out this morning is the last of their feature work targeting Linux 5.5.

More in Tux Machines

Security: Updates, Mozilla AMO and Reproducible Arch Linux Packages

  • Security updates for Monday

    Security updates have been issued by Debian (ampache, chromium, djvulibre, firefox-esr, gdal, and ruby-haml), Fedora (chromium, file, gd, hostapd, nspr, and rssh), openSUSE (bcm20702a1-firmware, firefox, gdal, libtomcrypt, php7, python-ecdsa, python3, samba, and thunderbird), SUSE (apache2-mod_auth_openidc, libssh2_org, and rsyslog), and Ubuntu (bash).

  • Security improvements in AMO upload tools

    We are making some changes to the submission flow for all add-ons (both AMO- and self-hosted) to improve our ability to detect malicious activity. These changes, which will go into effect later this month, will introduce a small delay in automatic approval for all submissions. The delay can be as short as a few minutes, but may take longer depending on the add-on file. If you use a version of web-ext older than 3.2.1, or a custom script that connects to AMO’s upload API, this new delay in automatic approval will likely cause a timeout error. This does not mean your upload failed; the submission will still go through and be approved shortly after the timeout notification. Your experience using these tools should remain the same otherwise.

  • Reproducible Arch Linux Packages

    Arch Linux has been involved with the reproducible builds efforts since 2016. The goal is to achieve deterministic building of software packages to enhance the security of the distribution. After almost 3 years of continued effort, along with the release of pacman 5.2 and contributions from a lot of people, we are finally able to reproduce packages distributed by Arch Linux! This enables users to build packages and compare them with the ones distributed by the Arch Linux team. Users can independently verify the work done by our packagers, and figure out if malicious code has been included in the pristine source during the build, which in turn enhances the overall supply chain security. We are one of the first binary distributions that has achieved this, and can provide tooling down to users. That was the TL;DR! The rest of the blog post will explain the reproducible builds efforts, and the technical work that has gone into achieving this.

  • Arch Linux Updates Its Kernel Installation Handling

    Arch Linux has updated the behavior when installing the linux, linux-lts, linux-zen, and linux-hardened kernel options on this popular distribution. The actual kernel images for their official Linux, Linux LTS, Linux Zen, and Linux Hardened flavors will no longer be installed to /boot by default. Not having the actual kernel reside on /boot should help those with separate, quite small boot partitions avoid running out of space when keeping multiple kernels installed.
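The package comparison mentioned in the Reproducible Arch Linux Packages item above comes down to checking that a locally built package is bit-for-bit identical to the distributed one. A minimal sketch, assuming both files are already on disk; the function names are hypothetical and this is not Arch Linux's own tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(local_pkg: Path, distributed_pkg: Path) -> bool:
    """True when the rebuilt package is bit-for-bit identical to the distributed one."""
    return sha256_of(local_pkg) == sha256_of(distributed_pkg)
```

A mismatch only says that the two files differ; finding out why requires inspecting the package contents.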

Sparky 2019.11 Special Editions

There are new live/install media of Sparky 2019.11 “Po Tolo” Special Editions available to download: GameOver, Multimedia & Rescue. The live system is based on the testing branch of Debian “Bullseye”. GameOver Edition features a very large number of preinstalled games, useful tools and scripts. It’s targeted at gamers. Multimedia Edition features a large set of tools for creating and editing graphics, audio, video and HTML pages. The live system of Rescue Edition contains a large set of tools for scanning and fixing files, partitions and operating systems installed on hard drives.

The Many Features & Improvements of the KDE Plasma 5.18 LTS Desktop Environment

With the KDE Plasma 5.17 release out the door last month, it's time to take a closer look at the new features and improvements coming to KDE Plasma 5.18, which will be released early next year as the next LTS (Long Term Support) version of the open-source desktop environment designed to run on GNU/Linux distributions. Among the enhancements of the KDE Plasma 5.18 LTS desktop environment, we can mention the ability to select and remove multiple Bluetooth devices simultaneously, support for KSysGuard to display stats for Nvidia graphics hardware, and a new "Home" button in System Settings that will take users back to the main page.

Open-spec, dual-port router offers a choice of Allwinner H3 or H5

FriendlyElec’s Linux-driven, $20 “NanoPi R1S-H3” router uses a modified version of the Allwinner H3-based NanoPi R1, upgrading the second LAN port to GbE while removing a USB port. There’s also a similar, $23 “NanoPi R1S-H5” with a quad-core, Cortex-A53 H5. Back in February, FriendlyElec launched the community-backed NanoPi R1 router SBC, which still sells for $29. Now it has followed up with two more affordable NanoPi R1S routers based on upgraded versions of the NanoPi R1 that give you dual GbE ports instead of 10/100Mbps and 10/100/1000Mbps. The mainboards are smaller than the R1 at 55.6 x 52mm, and the board and the case have been entirely redesigned.