Planet Debian - https://planet.debian.org/

Russell Coker: Jitsi on Debian

15 hours 14 min ago

I’ve just set up an instance of the Jitsi video-conference software for my local LUG. Here is an overview of how to set it up on Debian.

Firstly create a new virtual machine to run it. Jitsi is complex and has lots of inter-dependencies. Its packages want to help you by dragging in other packages and configuring them. This is great if you have a blank slate to start with, but if you already have one component installed and running then it can break things. It wants to configure the Prosody Jabber server and a web server, and my first attempt at an install failed when it tried to reconfigure the running instances of Prosody and Apache.

Here’s the upstream install docs [1]. They cover everything fairly well, but I’ll document the configuration I wanted (basic public server with password required to create a meeting).

Basic Installation

The first thing to do is to get a short DNS name like j.example.com. People will type that every time they connect and will thank you for making it short.

Using Certbot for certificates is best. It seems that you need them for j.example.com and auth.j.example.com.

apt install curl certbot
/usr/bin/letsencrypt certonly --standalone -d j.example.com,auth.j.example.com -m you@example.com
curl https://download.jitsi.org/jitsi-key.gpg.key | gpg --dearmor > /etc/apt/jitsi-keyring.gpg
echo "deb [signed-by=/etc/apt/jitsi-keyring.gpg] https://download.jitsi.org stable/" > /etc/apt/sources.list.d/jitsi-stable.list
apt-get update
apt-get -y install jitsi-meet

When apt installs jitsi-meet and its dependencies you get asked many questions for configuring things. Most of it works well.

If you get the nginx certificate wrong or don’t have the full chain then phone clients will abort connections for no apparent reason. It seems that you need to edit /etc/nginx/sites-enabled/j.example.com.conf to use the following SSL configuration:

ssl_certificate /etc/letsencrypt/live/j.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/j.example.com/privkey.pem;

Then you have to edit /etc/prosody/conf.d/j.example.com.cfg.lua to use the following ssl configuration:

key = "/etc/letsencrypt/live/j.example.com/privkey.pem";
certificate = "/etc/letsencrypt/live/j.example.com/fullchain.pem";

It seems that you need to have an /etc/hosts entry with the public IP address of your server and the names “j.example.com j auth.j.example.com”. Jitsi also appears to use the names “speakerstats.j.example.com conferenceduration.j.example.com lobby.j.example.com conference.j.example.com internal.auth.j.example.com”, but they aren’t required for a basic setup. I guess you could add them to /etc/hosts to avoid the possibility of strange errors due to it not finding an internal host name. There are optional features of Jitsi which require some of these names, but so far I’ve only used the basic functionality.
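
For illustration, such an /etc/hosts entry could look like the following (203.0.113.10 is just a placeholder, substitute your server’s real public IP address):

203.0.113.10 j.example.com j auth.j.example.com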

Access Control

This section describes how to restrict conference creation to authenticated users.

The secure-domain document [2] shows how to restrict access, but I’ll summarise the basics.

Edit /etc/prosody/conf.avail/j.example.com.cfg.lua and use the following line in the main VirtualHost section:

authentication = "internal_hashed"

Then add the following section:

VirtualHost "guest.j.example.com"
    authentication = "anonymous"
    c2s_require_encryption = false
    modules_enabled = {
        "turncredentials";
    }

Edit /etc/jitsi/meet/j.example.com-config.js and add the following line:

anonymousdomain: 'guest.j.example.com',

Edit /etc/jitsi/jicofo/sip-communicator.properties and add the following line:

org.jitsi.jicofo.auth.URL=XMPP:j.example.com

Then run commands like the following to create new users who can create rooms:

prosodyctl register admin j.example.com

Then restart most things (Prosody at least, maybe parts of Jitsi too); I rebooted the VM.

Now only the accounts you created on the Prosody server will be able to create new meetings. Once you have set this up, you should be able to add and delete users and change their passwords via prosodyctl while it’s running.
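
For example, day-to-day user management could then look something like this (alice is just a placeholder account name; adduser prompts for a password, passwd changes it, deluser removes the account):

prosodyctl adduser alice@j.example.com
prosodyctl passwd alice@j.example.com
prosodyctl deluser alice@j.example.com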

Conclusion

Once I gave up on the idea of running Jitsi on the same server as anything else, it wasn’t particularly difficult to set up. Some bits were a little fiddly and hopefully this post will be a useful resource for people who have trouble understanding the documentation. Generally it’s not difficult to install if it is the only thing running on a VM.


Markus Koschany: My Free Software Activities in July 2020

16 hours 13 min ago

Welcome to gambaru.de. Here is my monthly report (+ the first week in August) that covers what I have been doing for Debian. If you’re interested in Java, Games and LTS topics, this might be interesting for you.

Debian Games
  • Last month GCC 10 became the new default compiler for Debian 11 and compilation errors are now release critical. The change affected dozens of games in the archive but fortunately most of them are rather easy to fix and a quick workaround is available. I uploaded several packages with patches from Reiner Herrmann including blastem, freegish, gngb, phlipple, xaos, xboard, gamazons and freesweep. I could add to this list atomix, teg, neverball and biniax2. I am quite confident we can fix the rest of those FTBFS bugs before the freeze.
  • Finally freeorion 0.4.10 was released last month. Among new gameplay changes and bug fixes, freeorion’s Python 2 code was ported to Python 3.
  • Due to the ongoing Python 2 removal, pygame-sdl2 in unstable could no longer be built from source and I had to upload the new Python 3 version from experimental. This in turn breaks renpy, a framework for developing visual-novel type games. At the moment it is uncertain whether there will be a Python 3 version of renpy for Debian 11 in time; the issue is still being worked on upstream.
  • I uploaded a new upstream release of mgba, a Game Boy Advance emulator, for Ryan Tandy.
Debian Java Misc
  • I fixed the GCC 10 FTBFS in iftop and packaged a new upstream release of osmo, a lean and lightweight personal organizer.
  • New versions of privacybadger, binaryen, wabt and most importantly ublock-origin are also available now. Since the new binary packages webext-ublock-origin-firefox and webext-ublock-origin-chromium were finally accepted into the archive, I am planning to package version 1.29.0 now.
Debian LTS

This was my 53rd month as a paid contributor and I have been paid to work 15 hours on Debian LTS, a project started by Raphaël Hertzog. In that time I did the following:

  • DLA-2278-2. Issued a regression update for squid3. It was discovered that the patch for CVE-2019-12523 interrupted the communication between squid and icap or ecap services. The setup is most commonly used with clamav or similar antivirus scanners. I debugged the problem and created a new patch to address the error. In this process I also updated the patch for CVE-2019-12529 to use more code from Debian’s cryptographic nettle library. I also enabled the test suite by default now and corrected a failing test.
  • I have been working on fixing CVE-2020-15049 in squid3. The upstream patch for the 4.x series appears to be simple, but to completely address the underlying problem, squid3 requires a backport of the new HttpHeader parsing code which has improved a lot over the last couple of years. The patch is complete but requires more testing. A new update will follow soon.
ELTS

Extended Long Term Support (ELTS) is a project led by Freexian to further extend the lifetime of Debian releases. It is not an official Debian project but all Debian users benefit from it without cost. The current ELTS release is Debian 8 “Jessie”. This was my 26th month and I have been paid to work 13.25 hours on ELTS.

  • ELA-242-1. Issued a security update for tomcat7 fixing 1 CVE.
  • ELA-243-1. Issued a security update for tomcat8 fixing 1 CVE.
  • ELA-253-1. Issued a security update for imagemagick fixing 18 CVEs.
  • ELA-254-1. Issued a security update for imagemagick fixing 1 CVE.

Thanks for reading and see you next time.

Jonathan Carter: bashtop, now in buster-backports

17 hours 15 min ago

Recently, I discovered bashtop, yet another fancy top-like utility that’s mostly written in bash (it uses some python3-psutil and shells out to other common system utilities). I like its use of high-colour graphics, and despite being written in bash it’s not as resource-heavy as I would have expected; it’s also quite snappy (even on a Raspberry Pi). While writing this post, I also discovered that the author of bashtop ported it to Python and that the Python version is called bpytop (hmm, doesn’t quite have the same ring to it), which is even faster and less resource intensive than the bash version (although I haven’t tried that yet, I guess I will soon…).

I set out to package it, but someone beat me to it. Since I’m also on the backports team these days, I went ahead and backported it for buster. So if you have backports enabled, you can now install it using “apt install bashtop -t buster-backports”.

Dylan Aïssi, who packaged bashtop in Debian, has already filed an ITP for bpytop, so we’ll soon have yet another top-like tool in our collection :-)

Sven Hoexter: Retrieve WLAN PSK via nmcli

18 hours 8 min ago

Note to myself so I do not have to figure that out every few months when I have to dig out a WLAN PSK from my existing configuration.

Step 1: Figure out the UUID of the network:

$ nmcli con show
NAME             UUID                                  TYPE    DEVICE
br-59d010130b86  d8672d3d-7cf6-484f-9ef8-e6ec3e73bef7  bridge  br-59d010130b86
FRITZ!Box 7411   1ed1cec1-f586-4e75-ba6d-c9f6f4cba6e2  wifi    wlp4s0
[...]

Step 2: Request to view the PSK for this network based on the UUID

$ nmcli --show-secrets --fields 802-11-wireless-security.psk con show '1ed1cec1-f586-4e75-ba6d-c9f6f4cba6e2'
802-11-wireless-security.psk:           0815471123420511111

Jonathan Dowland: Generic Haskell

Friday 14th of August 2020 06:41:23 AM

When I did the work described earlier in template haskell, I also explored generic programming in Haskell to solve a particular problem. StrIoT is a program generator: it outputs source code, which may depend upon other modules, which need to be imported via declarations at the top of the source code files.

The data structure that StrIoT manipulates contains information about what modules are loaded to resolve the names that have been used in the input code, so we can walk that structure to automatically derive an import list. The generic programming tools I used for this are from Scrap Your Boilerplate (SYB), a module written to complement a paper of the same name. In this code snippet, everything and mkQ are from SYB:

extractNames :: Data a => a -> [Name]
extractNames = everything (++) (\a -> mkQ [] f a)
    where f = (:[])

The input must be any type which implements typeclass Data, as must all its members (and their members etc.): this holds for the Template Haskell Exp types. The output is a normal list of Names. The utility function f has the more specific type Name -> [Name]. This is all that's needed to walk over the heterogeneous data structures and do something specific (f) when we encounter a Name.

Post-processing the Names to get a list of modules is simple:

nub . catMaybes . map nameModule . concatMap extractNames

Unfortunately, there's a weird GHC behaviour relating to the module names for some Prelude functions that makes the above less useful in practice. For example, the Prelude function words :: String -> [String] can normally be used without an explicit import (since it's a Prelude function). However, once round-tripped through a Name, it becomes GHC.OldList.words. Attempting to import GHC.OldList fails in some contexts, because it's a hidden module or package. I've been meaning to investigate further and, if necessary, file a GHC bug about this.

For this reason I've scrapped all the above and gone with a different plan. We go back to requiring the user to specify their required import list explicitly. We then walk over the Exp data type prior to code generation and decanonicalize all the Names. I also use generic programming/SYB to do this:

unQualifyNames :: Exp -> Exp
unQualifyNames = everywhere (\a -> mkT f a)
    where f :: Name -> Name
          f n = if n == '(.)
              then mkName "."
              else (mkName . last . splitOn "." . pprint) n

I've had to special-case composition (.) since that code-point is also used as the delimiter between package, module and function. Otherwise this looks very similar to the earlier function, except using everywhere and mkT (make transformation) instead of everything and mkQ (make query).

Keith Packard: picolibc-news

Friday 14th of August 2020 05:43:53 AM
Picolibc Updates

I thought work on picolibc would slow down at some point, but I keep finding more things that need work. I spent a few weeks working in libm and then discovered some important memory allocation bugs in the last week that needed attention too.

Cleaning up the Picolibc Math Library

Picolibc uses the same math library sources as newlib, which includes code from a range of sources:

  • SunPro (Sun Microsystems). This forms the bulk of the common code for the math library, with copyright dates stretching back to 1993. This code is designed for processors with FPUs and uses 'float' for float functions and 'double' for double functions.

  • NetBSD. This is where the complex functions came from, with Copyright dates of 2007.

  • FreeBSD. fenv support for aarch64, arm and sparc

  • IBM. SPU processor support along with a number of stubs for long-double support where long double is the same as double

  • Various processor vendors have provided processor-specific code for exceptions and a few custom functions.

  • Szabolcs Nagy and Wilco Dijkstra (ARM). These two re-implemented some of the more important functions in 2017-2018 for both float and double using double precision arithmetic and fused multiply-add primitives to improve performance for systems with hardware double precision support.

The original SunPro math code had been split into two levels at some point:

  1. IEEE-754 functions. These offer pure IEEE-754 semantics, including return values and exceptions. They do not set the POSIX errno value. These are all prefixed with __ieee754_ and can be called directly by applications if desired.

  2. POSIX functions. These can offer POSIX semantics, including setting errno and returning expected values when errno is set.

New Code Sponsored by ARM

Szabolcs Nagy and Wilco Dijkstra's work in the last few years has been to improve the performance of some of the core math functions, which is much appreciated. They've adopted a more modern coding style (C99) and written faster code at the expense of a larger memory footprint.

One interesting choice was to use double computations for the float implementations of various functions. This makes these functions shorter and more accurate than versions done using float throughout. However, for machines which don't have HW double, this pulls in soft double code which adds considerable size to the resulting binary and slows down the computations, especially if the platform does support HW float.

The new code also takes advantage of HW fused-multiply-add instructions. Those offer more precision than a sequence of primitive instructions, and so the new code can be much shorter as a result.

The method used to detect whether the target machine supported fma operations was slightly broken on 32-bit ARM platforms, where those with 'float' fma acceleration but without 'double' fma acceleration would use the shorter code sequence, but with an emulated fma operation that used the less-precise sequence of operations, leading to significant reductions in the quality of the resulting math functions.

I fixed the double fma detection and then also added float fma detection along with implementations of float and double fma for ARM and RISC-V. Now both of those platforms get fma-enhanced math functions where available.

Errno Adventures

I'd submitted patches to newlib a while ago that aliased the regular math library names to the __ieee754_ functions when the library was configured to not set errno, which is pretty common for embedded environments where a shared errno is a pain anyways.

Note the use of the word “can” in the remark about the old POSIX wrapper functions. That's because all of these functions are run-time switchable between “_IEEE_” and “_POSIX_” mode using the _LIB_VERSION global symbol. When left in the usual _IEEE_ mode, none of this extra code was ever executed, so these wrapper functions never did anything beyond what the underlying __ieee754_ functions did.

The new code by Nagy and Dijkstra changed how functions are structured to eliminate the underlying IEEE-754 api. These new functions use tail calls to various __math_ error reporting functions. Those can be configured at library build time to set errno or not, centralizing those decisions in a few functions.

The result of this combination of source material is that in the default configuration, some library functions (those written by Nagy and Dijkstra) would set errno and others (the old SunPro code) would not. To disable all errno references, the library would need to be compiled with a set of options, -D_IEEE_LIBM to disable errno in the SunPro code and -DWANT_ERRNO=0 to disable errno in the new code. To enable errno everywhere, you'd set -D_POSIX_MODE to make the default value for _LIB_VERSION be _POSIX_ instead of _IEEE_.

To clean all of this up, I removed the run-time _LIB_VERSION variable and made that compile-time. In combination with the earlier work to alias the __ieee754_ functions to the regular POSIX names when _IEEE_LIBM was defined this means that the old SunPro POSIX functions now only get used when _IEEE_LIBM is not defined, and in that case the _LIB_VERSION tests always force use of the errno setting code. In addition, I made the value of WANT_ERRNO depend on whether _IEEE_LIBM was defined, so now a single definition (-D_IEEE_LIBM) causes all of the errno handling from libm to be removed, independent of which code is in use.

As part of this work, I added a range of errno tests for the math functions to find places where the wrong errno value was being used.

Exceptions

As an alternative to errno, C also provides for IEEE-754 exceptions through the fenv functions. These have some significant advantages, including having independent bits for each exception type and having them accumulate instead of sharing errno with a huge range of other C library functions. Plus, they're generally implemented in hardware, so you get exceptions for both library functions and primitive operations.

Well, you should get exceptions everywhere, except that the GCC soft float libraries don't support them at all. So, errno can still be useful if you need to know what happened in your library functions when using soft floats.

Newlib has recently seen a spate of fenv support being added for various architectures, so I decided that it would be a good idea to add some tests. I added tests for both primitive operations, and then tests for library functions to check both exceptions and errno values. Oddly, this uncovered a range of minor mistakes in various math functions. Lots of these were mistakes in the SunPro POSIX wrapper functions where they modified the return values from the __ieee754_ implementations. Simply removing those value modifications fixed many of those errors.

Fixing Memory Allocator bugs

Picolibc inherits malloc code from newlib which offers two separate implementations, one big and fast, the other small and slow(er). Selecting between them is done while building the library, and as Picolibc is expected to be used on smaller systems, the small and slow one is the default.

Contributed by someone from ARM back in 2012/2013, nano-mallocr reminds me of the old V7 memory allocator. A linked list, sorted in address order, holds discontiguous chunks of available memory.

Allocation is done by searching for a large enough chunk in the list. The first chunk that is large enough is selected; if it is larger than needed, a piece is split off and left on the free list while the remainder is handed to the application. When the list doesn't have any chunk large enough, sbrk is called to get more memory.

Free operations involve walking the list and inserting the chunk in the right location, merging the freed memory with any immediately adjacent chunks to reduce fragmentation.

The size of each chunk is stored just before the first byte of memory used by the application, where it remains while the memory is in use and while on the free list. The free list is formed by pointers stored in the active area of the chunk, so the only overhead for chunks in use is the size field.
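
As a rough picture of that layout (my own sketch with made-up names, not the actual nano_mallocr declarations):

#include <stddef.h>

/* Illustration only: the size lives just before the application's memory,
 * and the 'next' pointer re-uses the active area while the chunk is free. */
typedef struct chunk {
    size_t size;                /* total size of the chunk, header included */
    union {
        struct chunk *next;     /* valid only while the chunk is on the free list */
        char payload[1];        /* first byte handed to the application */
    } u;
} chunk_t;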

Something Something Padding

To deal with the vagaries of alignment, the original nano-mallocr code would allow for there to be 'padding' between the size field and the active memory area. The amount of padding could vary, depending on the alignment required for a particular chunk (in the case of memalign, that padding can be quite large). If present, nano-mallocr would store the padding value in the location immediately before the active area and distinguish that from a regular size field by a negative sign.

The whole padding thing seems mysterious to me -- why would it ever be needed when the allocator could simply create chunks that were aligned to the required value and a multiple of that value in size. The only use I could think of was for memalign; adding this padding field would allow for less over-allocation to find a suitable chunk. I didn't feel like this one (infrequent) use case was worth the extra complexity; it certainly caused me difficulty in reading the code.

A Few Bugs

In reviewing the code, I found a couple of easy-to-fix bugs.

  • calloc was not checking for overflow in multiplication. This is something I've only heard about in the last five or six years -- multiplying the size of each element by the number of elements can end up wrapping around to a small value which may actually succeed and cause the program to mis-behave.

  • realloc copied new_size bytes from the original location to the new location. If the new size was larger than the old, this would read off the end of the original allocation, potentially disclosing information from an adjacent allocation, or walking off the end of physical memory and causing a hard fault.

Time For Testing

Once I had uncovered a few bugs in this code, I decided that it would be good to write a few tests to exercise the API. With the tests running on four architectures in nearly 60 variants, it seemed like I'd be able to uncover at least a few more failures:

  • Error tests. Allocate too much memory and make sure the correct errors are returned and that nothing obviously untoward happens.

  • Touch tests. Just call the allocator and validate the return values.

  • Stress test. Allocate lots of blocks, resize them and free them. Make sure, using 'mallinfo', that the malloc arena looked reasonable.

These new tests did find bugs. But not where I expected them. Which is why I'm so fond of testing.

GCC Optimizations

One of my tests was to call calloc and make sure it returned a chunk of memory that appeared to work or failed with a reasonable value. To my surprise, on aarch64, that test never finished. It worked elsewhere, but on that architecture it hung in the middle of calloc itself. Which looked like this:

void * nano_calloc(malloc_size_t n, malloc_size_t elem)
{
    ptrdiff_t bytes;
    void * mem;

    if (__builtin_mul_overflow (n, elem, &bytes)) {
        RERRNO = ENOMEM;
        return NULL;
    }
    mem = nano_malloc(bytes);
    if (mem != NULL)
        memset(mem, 0, bytes);
    return mem;
}

Note the naming here -- nano_mallocr uses nano_ prefixes in the code, but then uses #defines to change their names to those expected in the ABI. (No, I don't understand why either). However, GCC sees the real names and has some idea of what these functions are supposed to do. In particular, the pattern:

foo = malloc(n);
if (foo)
    memset(foo, '\0', n);

is converted into a shorter and semantically equivalent:

foo = calloc(n, 1);

Alas, GCC doesn't take into account that this optimization is occurring inside of the implementation of calloc.

Another sequence of code looked like this:

chunk->size = foo;
nano_free((char *) chunk + CHUNK_OFFSET);

Well, GCC knows that the content of memory passed to free cannot affect the operation of the application, and so it converted this into:

nano_free((char *) chunk + CHUNK_OFFSET);

Remember that nano_mallocr stores the size of the chunk just before the active memory. In this case, nano_mallocr was splitting a large chunk into two pieces, setting the size of the left-over part and placing that on the free list. Failing to set that size value left whatever was there before for the size and usually resulted in the free list becoming quite corrupted.

Both of these problems can be corrected by compiling the code with a couple of GCC command-line switches (-fno-builtin-malloc and -fno-builtin-free).

Reworking Malloc

Having spent this much time reading through the nano_mallocr code, I decided to just go through it and make it easier for me to read today, hoping that other people (which includes 'future me') will also find it a bit easier to follow. I picked a couple of things to focus on:

  1. All newly allocated memory should be cleared. This reduces information disclosure between whatever code freed the memory and whatever code is about to use the memory. Plus, it reduces the effect of un-initialized allocations as they now consistently get zeroed memory. Yes, this masks bugs. Yes, this goes slower. This change is dedicated to Kees Cook, but please blame me for it not him.

  2. Get rid of the 'Padding' notion. Every time I read this code it made my brain hurt. I doubt I'll get any smarter in the future.

  3. Realloc could use some love, improving its efficiency in common cases to reduce memory usage.

  4. Reworking linked list walking. nano_mallocr uses a singly-linked free list and open-codes all list walking. Normally, I'd switch to a library implementation to avoid introducing my own bugs, but in this fairly simple case, I think it's a reasonable compromise to open-code the list operations using some patterns I learned while working at MIT from Bob Scheifler.

  5. Discover necessary values, like padding and the limits of the memory space, from the environment rather than having them hard-coded.

Padding

To get rid of 'Padding' in malloc, I needed to make sure that every chunk was aligned and sized correctly. Remember that there is a header on every allocated chunk which is stored before the active memory which contains the size of the chunk. On 32-bit machines, that size is 4 bytes. If the machine requires allocations to be aligned on 8-byte boundaries (as might be the case for 'double' values), we're now going to force the alignment of the header to 8-bytes, wasting four bytes between the size field and the active memory.

Well, the existing nano_mallocr code also wastes those four bytes to store the 'padding' value. Using a consistent alignment for chunk starting addresses and chunk sizes has made the code a lot simpler and easier to reason about while not using extra memory for normal allocation. Except for memalign, which I'll cover in the next section.

realloc

The original nano_realloc function was as simple as possible:

mem = nano_malloc(new_size);
if (mem) {
    memcpy(mem, old, MIN(old_size, new_size));
    nano_free(old);
}
return mem;

However, this really performs badly when the application is growing a buffer while accumulating data. A couple of simple optimizations occurred to me:

  1. If there's a free chunk just after the original location, it could be merged to the existing block and avoid copying the data.

  2. If the original chunk is at the end of the heap, call sbrk() to increase the size of the chunk.

The second one seems like the more important case; in a small system, the buffer will probably land at the end of the heap at some point, at which point growing it to the size of available memory becomes quite efficient.

When shrinking the buffer, instead of allocating new space and copying, if there's enough space being freed for a new chunk, create one and add it to the free list.
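
As a toy model of the first optimization (absorbing the free chunk that immediately follows), using my own simplified chunk layout rather than picolibc's actual structures:

#include <stddef.h>

/* Toy model: chunks carry their total size in a header, and free chunks are
 * linked through a singly-linked free list sorted by address. */
typedef struct chunk {
    size_t size;                /* total size of the chunk, header included */
    struct chunk *next;         /* valid only while the chunk is free */
} chunk_t;

static chunk_t *free_list;

/* If the chunk immediately after 'c' is on the free list, unlink it and fold
 * its space into 'c', so realloc can grow in place instead of copying.
 * Returns nonzero on success; a real allocator would then split off any
 * excess and put it back on the free list. */
static int absorb_following_free_chunk(chunk_t *c)
{
    chunk_t *after = (chunk_t *)((char *)c + c->size);
    for (chunk_t **p = &free_list, *f; (f = *p); p = &f->next) {
        if (f == after) {
            *p = f->next;       /* single store re-links the free list */
            c->size += f->size;
            return 1;
        }
    }
    return 0;
}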

List Walking

Walking singly-linked lists seems like one of the first things we see when learning pointer manipulation in C:

for (element = head; element; element = element->next)
    do stuff ...

However, this becomes pretty complicated when 'do stuff' includes removing something from the list:

prev = NULL;
for (element = head; element; element = element->next)
    ...
    if (found)
        break;
    ...
    prev = element;
if (prev != NULL)
    prev->next = element->next;
else
    head = element->next;

An extra variable, and a test to figure out how to re-link the list. Bob showed me a simpler way, which I'm sure many people are familiar with:

for (ptr = &head; (element = *ptr); ptr = &(element->next))
    ...
    if (found)
        break;
*ptr = element->next;

Insertion is similar, as you would expect:

for (ptr = &head; (element = *ptr); ptr = &(element->next))
    if (found)
        break;
new_element->next = element;
*ptr = new_element;

In terms of memory operations, it's the same -- each 'next' pointer is fetched exactly once and the list is re-linked by performing a single store. In terms of reading the code, once you've seen this pattern, getting rid of the extra variable and the conditionals around the list update makes it shorter and less prone to errors.

In the nano_mallocr code, instead of using 'prev = NULL', it actually used 'prev = free_list', and the test for updating the head was 'prev == element', which really caught me unawares.
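
To see the pattern end to end, here is a tiny self-contained program (my own illustration, not code from nano_mallocr) that removes values from a list of integers with no prev variable and no special case for the head:

#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static struct node *head;

static void push(int value)
{
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next = head;
    head = n;
}

/* Remove the first node with the given value using the pointer-to-pointer
 * pattern described above. */
static void remove_value(int value)
{
    struct node *element;
    for (struct node **ptr = &head; (element = *ptr); ptr = &element->next) {
        if (element->value == value) {
            *ptr = element->next;   /* a single store re-links the list */
            free(element);
            return;
        }
    }
}

int main(void)
{
    push(3); push(2); push(1);      /* list is now 1 -> 2 -> 3 */
    remove_value(1);                /* removing the head needs no special case */
    remove_value(3);                /* nor does removing the tail */
    for (struct node *n = head; n; n = n->next)
        printf("%d\n", n->value);   /* prints: 2 */
    return 0;
}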

System Parameters

Any malloc implementation needs to know a couple of things about the system it's running on:

  1. Address space. The maximum range of possible addresses sets the limit on how large a block of memory might be allocated, and hence the size of the 'size' field. Fortunately, we've got the 'size_t' type for this, so we can just use that.

  2. Alignment requirements. These derive from the alignment requirements of the basic machine types, including pointers, integers and floating point numbers which are formed from a combination of machine requirements (some systems will fault if attempting to use memory with the wrong alignment) along with a compromise between memory usage and memory system performance.

I decided to let the system tell me the alignment necessary using a special type declaration and the 'offsetof' operation:

typedef struct {
    char c;
    union {
        void *p;
        double d;
        long long ll;
        size_t s;
    } u;
} align_t;

#define MALLOC_ALIGN (offsetof(align_t, u))

Because C requires struct fields to be stored in order of declaration, the 'u' field would have to be after the 'c' field, and would have to be assigned an offset equal to the largest alignment necessary for any of its members. Testing on a range of machines yields the following alignment requirements:

Architecture  Alignment
x86_64        8
RISC-V        8
aarch64       8
arm           8
x86           4

So, I guess I could have just used a constant value of '8' and not worried about it, but using the compiler-provided value means that running picolibc on older architectures might save a bit of memory at no real cost in the code.
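
For the curious, a minimal probe (re-using the align_t definition above) that prints the detected value for whatever target it is compiled for:

#include <stddef.h>
#include <stdio.h>

typedef struct {
    char c;
    union { void *p; double d; long long ll; size_t s; } u;
} align_t;

#define MALLOC_ALIGN (offsetof(align_t, u))

int main(void)
{
    /* Per the table above: 8 on x86_64, RISC-V, aarch64 and arm; 4 on 32-bit x86. */
    printf("%zu\n", MALLOC_ALIGN);
    return 0;
}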

Now, the header containing the 'size' field can be aligned to this value, and all allocated blocks can be allocated in units of this value.

memalign

memalign, valloc and pvalloc all allocate memory with restrictions on the alignment of the base address and length. You'd think these would be simple -- allocate a large chunk, align within that chunk and return the address. However, they also all require that the address can be successfully passed to free. Which means that the allocator needs to do some tricks to make it all work. Essentially, you allocate 'lots' of memory and then arrange that any bytes at the head and tail of the allocation can be returned to the free list.

The tail part is easy; if it's large enough to form a free chunk (which must contain the size and a 'next' pointer for the free list), it can be split off. Otherwise, it just sits at the end of the allocation being wasted space.

The head part is a bit tricky when it's not large enough to form a free chunk. That's where the 'padding' business came in handy; that can be as small as a 'size_t' value, which (on 32-bit systems) is only four bytes.

Now that we're giving up trying to reason about 'padding', any extra block at the start must be big enough to hold a free block, which includes the size and a next pointer. On 32-bit systems, that's just 8 bytes which (for most of our targets) is the same as the alignment value we're using. On 32-bit systems that can use 4-byte alignment, and on 64-bit systems, it's possible that the alignment required by the application for memalign and the alignment of a chunk returned by malloc might be off by too small an amount to create a free chunk.

So, we just allocate a lot of extra space; enough so that we can create a block of size 'toosmall + align' at the start and create a free chunk of memory out of that.

This works, and at least returns all of the unused memory back for other allocations.
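
As a rough sketch of that head-trimming arithmetic (my own names and a simplified MIN_CHUNK, not picolibc's actual memalign):

#include <stddef.h>
#include <stdint.h>

/* Smallest piece that can live on the free list: a size field plus a 'next'
 * pointer (illustrative definition, not picolibc's). */
#define MIN_CHUNK (sizeof(size_t) + sizeof(void *))

/* Over-allocate roughly size + align + MIN_CHUNK bytes, then place the
 * application block so that anything left in front of it is big enough to be
 * returned to the free list. */
static void *align_within(void *raw, size_t align)
{
    uintptr_t p = (uintptr_t) raw + MIN_CHUNK;         /* leave room for a head chunk */
    p = (p + align - 1) & ~(uintptr_t) (align - 1);    /* round up to the alignment */
    return (void *) p;
}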

Sending Patches Back to Newlib

I've sent the floating point fixes upstream to newlib where they've already landed on master. I've sent most of the malloc fixes, but I'm not sure they really care about seeing nano_mallocr refactored. If they do, I'll spend the time necessary to get the changes ported back to the newlib internal APIs and merged upstream.

John Goerzen: In Which COVID-19 Misinformation Leads To A Bunch of Graphs Made With Rust

Friday 14th of August 2020 01:44:03 AM

A funny — and by funny, I mean sad — thing has happened. Recently the Kansas Department of Health and Environment (KDHE) has been analyzing data from the patchwork implementation of mask requirements in Kansas. They came to a conclusion that shouldn’t be surprising to anyone: masks help. They published a chart showing this. A right-wing propaganda publication got ahold of this, and claimed the numbers were “doctored” because there were two different Y-axes.

I set about to analyze the data myself from public sources, and produced graphs of various kinds using a single Y-axis, supporting the idea that the graphs were not, in fact, doctored. Here’s one graph that shows that.

In order to do that, I imported COVID-19 data from various public sources. Many states in the US are large enough to have significant variation in COVID-19 conditions, and many of the sources people look at don’t show county-level data over time. I wanted to do that.

Eventually, I wrote covid19db, which ingests data from a number of public sources and generates a SQLite database file. Using Github Actions, this file is automatically updated every morning and available for download. Or, you can download the code and generate a database yourself locally.

Then, I wrote covid19ks, which generates various pretty graphs covering the data. These graphs, incidentally, turn out to highlight just how poorly the United States is doing compared to the rest of the industrialized world.

I hope that these resources, and especially covid19db, might be useful to others that would like to analyze the data. The code isn’t the prettiest since it was done in a hurry, but I think that functionally this is useful.

Reproducible Builds (diffoscope): diffoscope 156 released

Friday 14th of August 2020 12:00:00 AM

The diffoscope maintainers are pleased to announce the release of diffoscope version 156. This version includes the following changes:

[ Chris Lamb ]
* Update PPU tests for compatibility with Free Pascal versions 3.2.0 or greater. (Closes: #968124)
* Emit a debug-level logging message when our ppudump(1) version does not match file header.
* Add and use an assert_diff helper that loads and compares a fixture output to avoid a bunch of test boilerplate.

[ Frazer Clews ]
* Apply some pylint suggestions to the codebase.

You can find out more by visiting the project homepage.

Sven Hoexter: An Average IT Org

Thursday 13th of August 2020 04:14:25 PM

Supply chain attacks are a known issue, and lately there was also a discussion around the relevance of reproducible builds. Looking in comparison at an average IT org doing something with the internet, I believe the pressing problem is neither supply chain attacks nor a lack of reproducible builds. The real problem is the amount of prefabricated binaries, supplied by someone else and created in an unknown build environment with unknown tools, that the average IT org requires to do anything.

The Mess the World Runs on

By chance I had an opportunity to look at what some other people I know use, and here is the list I could compile by scratching just at the surface:

  • 80% of what HashiCorp releases. Vagrant, packer, nomad, terraform, just all of it. In the case of terraform of course with a bunch of providers and for Vagrant with machine images from the official registry.
  • Lots of ansible usecases, usually retrieved by pip.
  • Jenkins + a myriad of plugins from the Jenkins plugin registry.
  • All the tools/SDKs of a cloud provider du jour to interface with the Cloud. Mostly via Debian repository.
  • docker (the repo for dockerd) and DockerHub
  • Mobile SDKs.
  • Kafka fetched somewhere from apache.org.
  • Binary downloads from github. Many. Go and Rust make it possible.
  • Elastic, more or less the whole stack they offer via their Debian repo.
  • Postgres + the tools around it from the apt.postgresql.org Debian repo.
  • archive.debian.org because it's hard to keep up at times.
  • Maven Central.

Of course there are also all the script language repos - Python, Ruby, Node/Typescript - around as well.

Looking at myself, who's working in a different IT org but with a similar focus, I have the following lingering around on my for work laptop and retrieved it as a binary from a 3rd party:

  • dockerd from the docker repo
  • vscode from the microsoft repo
  • vivaldi from the vivaldi repo
  • Google Cloud SDK from the google repo
  • terraform + all the providers from hashicorp
  • govc from github
  • containerdiff from github (yes, by now included in Debian main)
  • github gh cli tool from github
  • wtfutil from github

Yes some of that is even non-free and might contain spyw^telemetry.

Takeaway I

Guessing based on the Pareto Principle, probably 80% of the software mentioned above is also open source software. But, and here we leave Pareto behind, close to none of it is built by the average IT org from source.

Why should the average IT org care about advanced issues like supply chain attacks on source code and mitigations, when it already gets into very hot water the day DockerHub closes down, HashiCorp moves from open core to full proprietary or Elastic decides to no longer offer free binary builds?

The reality out there seems to be that infrastructure of "modern" IT orgs is managed similar to the Windows 95 installation of my childhood. You just grab running binaries from somewhere and run them. The main difference seems to be that you no longer have the inconvenience of downloading a .xls from geocities that you have to rename to .rar, and that it's legal.

Takeaway II

In the end the binary supply is like a drug for the user, and somehow the Debian project is also just another dealer / middle man in this setup. There are probably a lot of open questions to think about in that context.

Are we the better dealer because we care about signed sources we retrieve from upstream and because we engage in reproducible build projects?

Are our own means of distributing binaries any better than a binary download from github via https with a manual checksum verification, or the Debian repo at download.docker.com?

Is the approach of the BSD/Gentoo ports, where you have to compile at least some software from source, the better one?

Do I really want to know how some of the software is actually built?

Or some more candid ones like is gnutls a good choice for the https support in apt and how solid is the gnupg code base?

Norbert Preining: Switching from KDE/Plasma to Gnome3 for one week

Thursday 13th of August 2020 12:35:21 PM

This guy is doing great work, providing patches to improve kwin, and then tried Gnome3 for a week, 7 days. His verdict:

overall, after one week of using this, I can say…

it’s f**king HORRIBLE!!!

…well, it’s not as bad as I thought… but yeah, it was overall a pretty unpleasant experience…
I definitely have seen improvements, but the desktop is still not in shape.

I mean, it’s still not as usable as KDE is….

Honestly, I can’t agree more. I have tried Gnome3 for over a year, again and again, and it feels like a block of concrete put onto the feet of dissidents by Italian mafia bosses. It drowns and kills you.

Here are the links: Start, Day 1, Day 2, Day 3, Day 4, Day 5, Day 6, Day 7.

Thanks a lot for these blog posts, incredibly informative and convincing!

Erich Schubert: Publisher MDPI lies to prospective authors

Thursday 13th of August 2020 08:21:40 AM

The publisher MDPI is a spammer and lies.

If you upload a paper draft to arXiv, MDPI will send spam to the authors to solicit submission. Within minutes of an upload I received the following email (sent by MDPI staff, not some overly eager new editor):

We read your recent manuscript "[...]" on arXiv, and sincerely invite you to submit it to our journal Future Internet, if it has not been published or submitted elsewhere. Future Internet (ISSN 1999-5903, indexed by Scopus, Ei compendex, *ESCI*-Web of Science) is a journal on Internet technologies and the information society. It maintains a rigorous and fast peer review system with a median publication time of 35 days from submission to online publication, and 3 days from acceptance to publication. The journal scope is shown here: https://www.mdpi.com/journal/futureinternet/about. Editorial Board: https://www.mdpi.com/journal/futureinternet/editors. Since Future Internet is an open access journal there is a publication fee. Your paper will be published, with a 20% discount (amounting to 200 CHF), and provided that it is accepted after our standard peer-review procedure.

First of all, the email begins with a lie. Because this paper clearly states that it is submitted elsewhere. Also, it fits other journals much better, and if they had read even just the abstract, they would have known.

This is predatory behavior by MDPI. Clearly, it is just about getting as many submissions as possible. The journal charges 1000 CHF (next year, 1400 CHF) to publish the papers. It’s about the money.

Also, there have been reports that MDPI ignores the reviews, and always publishes even when reviewers recommended rejection…

The reviewer requests I have received from MDPI came with unreasonable deadlines, which will not allow for a thorough peer review. Hence I asked to not ever be emailed by them again. I must assume that many other qualified reviewers do the same. MDPI boasts in their 2019 annual report a median time to first decision of 19 days – in my discipline the typical time window to ask for reviews is at least a month (for shorter conference papers, not full journal articles), because professors tend to have lots of other duties, hence they need more flexibility. The above paper was submitted in March and has now been under review for 4 months already. This is an annoyingly long time window, and I would appreciate if this were less, but it shows how extremely short the MDPI time frame is. They also claim 269.1k submissions and 106.2k published papers, so the acceptance rate is around 40% on average, and assuming that there are some journals with higher standards there, then some must have acceptance rates much higher than this. I’d assume that many reputable journals have a 40% desk-rejection rate for papers that are not even on-topic …

The average cost to authors is given as 1144 CHF (after discounts, 25% waived fees etc.), so we are talking about 120 million CHF of revenue from authors. Is that what you want academic publishing to be?

I am not happy with some of the established publishers such as Elsevier that also overcharge universities heavily. I do think we need to change academic publishing, and arXiv is a big improvement here. But I do not respect publishers such as MDPI that lie and send spam.

Norbert Preining: KDE/Plasma Status Update 2020-08-13

Wednesday 12th of August 2020 10:37:03 PM

Short status update on my KDE/Plasma packages for Debian sid and testing:

  • Frameworks update to 5.73
  • Apps are now available from the main archive in the same upstream version, superseding my packages
  • New architecture: aarch64

Hope that helps a few people. See this post for how to setup archives.

Enjoy.

Michael Stapelberg: distri: 20x faster initramfs (initrd) from scratch

Wednesday 12th of August 2020 08:07:23 AM

In case you are not yet familiar with why an initramfs (or initrd, or initial ramdisk) is typically used when starting Linux, let me quote the wikipedia definition:

“[…] initrd is a scheme for loading a temporary root file system into memory, which may be used as part of the Linux startup process […] to make preparations before the real root file system can be mounted.”

Many Linux distributions do not compile all file system drivers into the kernel, but instead load them on-demand from an initramfs, which saves memory.

Another common scenario, in which an initramfs is required, is full-disk encryption: the disk must be unlocked from userspace, but since userspace is encrypted, an initramfs is used.

Motivation

Thus far, building a distri disk image was quite slow:

This is on an AMD Ryzen 3900X 12-core processor (2019):

distri % time make cryptimage serial=1
80.29s user 13.56s system 186% cpu 50.419 total   # 19s image, 31s initrd

Of these 50 seconds, dracut’s initramfs generation accounts for 31 seconds (62%)!

Initramfs generation time drops to 8.7 seconds once dracut no longer needs to use the single-threaded gzip(1), but the multi-threaded replacement pigz(1).

This brings the total time to build a distri disk image down to:

distri % time make cryptimage serial=1
76.85s user 13.23s system 327% cpu 27.509 total   # 19s image, 8.7s initrd

Clearly, when you use dracut on any modern computer, you should make pigz available. dracut should fail to compile unless one explicitly opts into the known-slower gzip. For more thoughts on optional dependencies, see “Optional dependencies don’t work”.

But why does it take 8.7 seconds still? Can we go faster?

The answer is Yes! I recently built a distri-specific initramfs I’m calling minitrd. I wrote both big parts from scratch:

  1. the initramfs generator program (distri initrd)
  2. a custom Go userland (cmd/minitrd), running as /init in the initramfs.

minitrd generates the initramfs image in ≈400ms, bringing the total time down to:

distri % time make cryptimage serial=1
50.09s user 8.80s system 314% cpu 18.739 total   # 18s image, 400ms initrd

(The remaining time is spent in preparing the file system, then installing and configuring the distri system, i.e. preparing a disk image you can run on real hardware.)

How can minitrd be 20 times faster than dracut?

dracut is mainly written in shell, with a C helper program. It drives the generation process by spawning lots of external dependencies (e.g. ldd or the dracut-install helper program). I assume that the combination of using an interpreted language (shell) that spawns lots of processes and precludes a concurrent architecture is to blame for the poor performance.

minitrd is written in Go, with speed as a goal. It leverages concurrency and uses no external dependencies; everything happens within a single process (but with enough threads to saturate modern hardware).

Measuring early boot time using qemu, I measured the dracut-generated initramfs taking 588ms to display the full disk encryption passphrase prompt, whereas minitrd took only 195ms.

The rest of this article dives deeper into how minitrd works.

What does an initramfs do?

Ultimately, the job of an initramfs is to make the root file system available and continue booting the system from there. Depending on the system setup, this involves the following 5 steps:

1. Load kernel modules to access the block devices with the root file system

Depending on the system, the block devices with the root file system might already be present when the initramfs runs, or some kernel modules might need to be loaded first. On my Dell XPS 9360 laptop, the NVMe system disk is already present when the initramfs starts, whereas in qemu, we need to load the virtio_pci module, followed by the virtio_scsi module.

How will our userland program know which kernel modules to load? Linux kernel modules declare patterns for their supported hardware as an alias, e.g.:

initrd# grep virtio_pci lib/modules/5.4.6/modules.alias
alias pci:v00001AF4d*sv*sd*bc*sc*i* virtio_pci

Devices in sysfs have a modalias file whose content can be matched against these declarations to identify the module to load:

initrd# cat /sys/devices/pci0000:00/*/modalias
pci:v00001AF4d00001005sv00001AF4sd00000004bc00scFFi00
pci:v00001AF4d00001004sv00001AF4sd00000008bc01sc00i00
[…]

Hence, for the initial round of module loading, it is sufficient to locate all modalias files within sysfs and load the responsible modules.
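
As a sketch of that matching step (minitrd itself does this in Go; the C below and its names are just an illustration): modules.alias lines have the form "alias <glob-pattern> <module>", so a modalias string read from sysfs can be matched with fnmatch(3):

#include <stdio.h>
#include <fnmatch.h>

/* Sketch only: scan modules.alias for entries whose glob pattern matches a
 * modalias string read from a sysfs modalias file. */
static void modules_for_alias(const char *aliases_path, const char *modalias)
{
    FILE *f = fopen(aliases_path, "r");
    if (!f)
        return;
    char line[4096];
    while (fgets(line, sizeof line, f)) {
        char pattern[2048], module[256];
        if (sscanf(line, "alias %2047s %255s", pattern, module) != 2)
            continue;
        if (fnmatch(pattern, modalias, 0) == 0)
            printf("would load module %s\n", module);
    }
    fclose(f);
}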

Loading a kernel module can result in new devices appearing. When that happens, the kernel sends a uevent, which the uevent consumer in userspace receives via a netlink socket. Typically, this consumer is udev(7), but in our case, it’s minitrd.

For each uevent message that comes with a MODALIAS variable, minitrd will load the relevant kernel module(s).
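
For illustration, a minimal uevent listener could look like this in C (again a sketch of the mechanism, not minitrd's Go code):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Subscribe to the kernel's uevent multicast group and print the MODALIAS
 * variable of every event that carries one. */
int main(void)
{
    struct sockaddr_nl addr = {
        .nl_family = AF_NETLINK,
        .nl_groups = 1,   /* kernel uevent multicast group */
    };
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
    if (fd < 0 || bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0)
        return 1;

    char buf[8192];
    for (;;) {
        ssize_t len = recv(fd, buf, sizeof buf - 1, 0);
        if (len <= 0)
            break;
        buf[len] = '\0';
        /* A uevent is "action@devpath" followed by NUL-separated KEY=VALUE pairs. */
        for (char *p = buf; p < buf + len; p += strlen(p) + 1)
            if (strncmp(p, "MODALIAS=", 9) == 0)
                printf("%s\n", p + 9);   /* a real initramfs would now load this module */
    }
    close(fd);
    return 0;
}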

When loading a kernel module, its dependencies need to be loaded first. Dependency information is stored in the modules.dep file in a Makefile-like syntax:

initrd# grep virtio_pci lib/modules/5.4.6/modules.dep
kernel/drivers/virtio/virtio_pci.ko: kernel/drivers/virtio/virtio_ring.ko kernel/drivers/virtio/virtio.ko

To load a module, we can open its file and then call the Linux-specific finit_module(2) system call. Some modules are expected to return an error code, e.g. ENODEV or ENOENT when some hardware device is not actually present.
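
A sketch of that step in C (minitrd does the equivalent in Go; error handling is simplified here):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Load a single module file via the finit_module(2) system call, invoked
 * through syscall(2) since not every C library ships a wrapper. "Hardware not
 * present" style errors are tolerated, as described above. */
static int load_module(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    long rc = syscall(SYS_finit_module, fd, "", 0);   /* no module parameters */
    int saved = errno;
    close(fd);
    if (rc == 0 || saved == EEXIST || saved == ENODEV || saved == ENOENT)
        return 0;
    errno = saved;
    return -1;
}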

Side note: next to the textual versions, there are also binary versions of the modules.alias and modules.dep files. Presumably, those can be queried more quickly, but for simplicity, I have not (yet?) implemented support in minitrd.

2. Console settings: font, keyboard layout

Setting a legible font is necessary for hi-dpi displays. On my Dell XPS 9360 (3200 x 1800 QHD+ display), the following works well:

initrd# setfont latarcyrheb-sun32

Setting the user’s keyboard layout is necessary for entering the LUKS full-disk encryption passphrase in their preferred keyboard layout. I use the NEO layout:

initrd# loadkeys neo

3. Block device identification

In the Linux kernel, block device enumeration order is not necessarily the same on each boot. Even if it was deterministic, device order could still be changed when users modify their computer’s device topology (e.g. connect a new disk to a formerly unused port).

Hence, it is good style to refer to disks and their partitions with stable identifiers. This also applies to boot loader configuration, and so most distributions will set a kernel parameter such as root=UUID=1fa04de7-30a9-4183-93e9-1b0061567121.

Identifying the block device or partition with the specified UUID is the initramfs’s job.

Depending on what the device contains, the UUID comes from a different place. For example, ext4 file systems have a UUID field in their file system superblock, whereas LUKS volumes have a UUID in their LUKS header.

Canonically, probing a device to extract the UUID is done by libblkid from the util-linux package, but the logic can easily be re-implemented in other languages and changes rarely. minitrd comes with its own implementation to avoid cgo or running the blkid(8) program.
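
To give an idea of what such probing involves, here is a sketch for ext2/3/4 only (minitrd's real Go implementation also understands other formats such as LUKS headers); the ext superblock starts 1024 bytes into the device, with the magic at offset 0x38 and the 16-byte UUID at offset 0x68:

#include <stdio.h>
#include <stdint.h>

/* Read the first 1152 bytes of the device and, if it looks like an ext file
 * system, print the UUID from its superblock. Illustration only. */
static int print_ext_uuid(const char *device)
{
    unsigned char sb[1024 + 128];
    FILE *f = fopen(device, "rb");
    if (!f || fread(sb, 1, sizeof sb, f) != sizeof sb) {
        if (f)
            fclose(f);
        return -1;
    }
    fclose(f);

    uint16_t magic = (uint16_t) (sb[1024 + 0x38] | (sb[1024 + 0x39] << 8));
    if (magic != 0xEF53)
        return -1;                /* not an ext file system */

    const unsigned char *u = sb + 1024 + 0x68;
    printf("%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x\n",
           u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
           u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
    return 0;
}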

4. LUKS full-disk encryption unlocking (only on encrypted systems)

Unlocking a LUKS-encrypted volume is done in userspace. The kernel handles the crypto, but reading the metadata, obtaining the passphrase (or e.g. key material from a file) and setting up the device mapper table entries are done in user space.

initrd# modprobe algif_skcipher
initrd# cryptsetup luksOpen /dev/sda4 cryptroot1

After the user entered their passphrase, the root file system can be mounted:

initrd# mount /dev/dm-0 /mnt

5. Continuing the boot process (switch_root)

Now that everything is set up, we need to pass execution to the init program on the root file system with a careful sequence of chdir(2), mount(2), chroot(2), chdir(2) and execve(2) system calls that is explained in this busybox switch_root comment.

initrd# mount -t devtmpfs dev /mnt/dev
initrd# exec switch_root -c /dev/console /mnt /init

To conserve RAM, the files in the temporary file system to which the initramfs archive is extracted are typically deleted.

How is an initramfs generated?

An initramfs “image” (more accurately: archive) is a compressed cpio archive. Typically, gzip compression is used, but the kernel supports a bunch of different algorithms and distributions such as Ubuntu are switching to lz4.

Generators typically prepare a temporary directory and feed it to the cpio(1) program. In minitrd, we read the files into memory and generate the cpio archive using the go-cpio package. We use the pgzip package for parallel gzip compression.

The following files need to go into the cpio archive:

minitrd Go userland

The minitrd binary is copied into the cpio archive as /init and will be run by the kernel after extracting the archive.

Like the rest of distri, minitrd is built statically without cgo, which means it can be copied as-is into the cpio archive.

Linux kernel modules

Aside from the modules.alias and modules.dep metadata files, the kernel modules themselves reside in e.g. /lib/modules/5.4.6/kernel and need to be copied into the cpio archive.

Copying all modules results in a ≈80 MiB archive, so it is common to only copy modules that are relevant to the initramfs’s features. This reduces archive size to ≈24 MiB.

The filtering relies on hard-coded patterns and module names. For example, disk encryption related modules are all kernel modules underneath kernel/crypto, plus kernel/drivers/md/dm-crypt.ko.

When generating a host-only initramfs (works on precisely the computer that generated it), some initramfs generators look at the currently loaded modules and just copy those.

Console Fonts and Keymaps

The kbd package’s setfont(8) and loadkeys(1) programs load console fonts and keymaps from /usr/share/consolefonts and /usr/share/keymaps, respectively.

Hence, these directories need to be copied into the cpio archive. Depending on whether the initramfs should be generic (work on many computers) or host-only (works on precisely the computer/settings that generated it), the entire directories are copied, or only the required font/keymap.

cryptsetup, setfont, loadkeys

These programs are (currently) required because minitrd does not implement their functionality.

As they are dynamically linked, not only the programs themselves need to be copied, but also the ELF dynamic linking loader (path stored in the .interp ELF section) and any ELF library dependencies.

For example, cryptsetup in distri declares the ELF interpreter /ro/glibc-amd64-2.27-3/out/lib/ld-linux-x86-64.so.2 and declares dependencies on shared libraries libcryptsetup.so.12, libblkid.so.1 and others. Luckily, in distri, packages contain a lib subdirectory containing symbolic links to the resolved shared library paths (hermetic packaging), so it is sufficient to mirror the lib directory into the cpio archive, recursing into shared library dependencies of shared libraries.

cryptsetup also requires the GCC runtime library libgcc_s.so.1 to be present at runtime, and will abort with an error message about not being able to call pthread_cancel(3) if it is unavailable.

time zone data

To print log messages in the correct time zone, we copy /etc/localtime from the host into the cpio archive.

minitrd outside of distri?

I currently have no desire to make minitrd available outside of distri. While the technical challenges (such as extending the generator to not rely on distri’s hermetic packages) are surmountable, I don’t want to support people’s initramfs remotely.

Also, I think that people’s efforts should in general be spent on rallying behind dracut and making it work faster, thereby benefiting all Linux distributions that use dracut (increasingly more). With minitrd, I have demonstrated that significant speed-ups are achievable.

Conclusion

It was interesting to dive into how an initramfs really works. I had been working with the concept for many years, from small tasks such as “debug why the encrypted root file system is not unlocked” to more complicated tasks such as “set up a root file system on DRBD for a high-availability setup”. But even with that sort of experience, I didn’t know all the details, until I was forced to implement every little thing.

As I suspected going into this exercise, dracut is much slower than it needs to be. Re-implementing its generation stage in a modern language instead of shell helps a lot.

Of course, my minitrd does a bit less than dracut, but not drastically so. The overall architecture is the same.

I hope my effort helps with two things:

  1. As a teaching implementation: instead of wading through the various components that make up a modern initramfs (udev, systemd, various shell scripts, …), people can learn about how an initramfs works in a single place.

  2. I hope the significant time difference motivates people to improve dracut.

Appendix: qemu development environment

Before writing any Go code, I did some manual prototyping. Learning how other people prototype is often immensely useful to me, so I’m sharing my notes here.

First, I copied all kernel modules and a statically built busybox binary:

% mkdir -p lib/modules/5.4.6
% cp -Lr /ro/lib/modules/5.4.6/* lib/modules/5.4.6/
% cp ~/busybox-1.22.0-amd64/busybox sh

To generate an initramfs from the current directory, I used:

% find . | cpio -o -H newc | pigz > /tmp/initrd

In distri’s Makefile, I append these flags to the QEMU invocation:

-kernel /tmp/kernel \
-initrd /tmp/initrd \
-append "root=/dev/mapper/cryptroot1 rdinit=/sh ro console=ttyS0,115200 rd.luks=1 rd.luks.uuid=63051f8a-54b9-4996-b94f-3cf105af2900 rd.luks.name=63051f8a-54b9-4996-b94f-3cf105af2900=cryptroot1 rd.vconsole.keymap=neo rd.vconsole.font=latarcyrheb-sun32 init=/init systemd.setenv=PATH=/bin rw vga=836"

The vga= mode parameter is required for loading font latarcyrheb-sun32.

Once in the busybox shell, I manually prepared the required mount points and kernel modules:

ln -s sh mount
ln -s sh lsmod
mkdir /proc /sys /run /mnt
mount -t proc proc /proc
mount -t sysfs sys /sys
mount -t devtmpfs dev /dev
modprobe virtio_pci
modprobe virtio_scsi

As a next step, I copied cryptsetup and dependencies into the initramfs directory:

% for f in /ro/cryptsetup-amd64-2.0.4-6/lib/*; do full=$(readlink -f $f); rel=$(echo $full | sed 's,^/,,g'); mkdir -p $(dirname $rel); install $full $rel; done
% ln -s ld-2.27.so ro/glibc-amd64-2.27-3/out/lib/ld-linux-x86-64.so.2
% cp /ro/glibc-amd64-2.27-3/out/lib/ld-2.27.so ro/glibc-amd64-2.27-3/out/lib/ld-2.27.so
% cp -r /ro/cryptsetup-amd64-2.0.4-6/lib ro/cryptsetup-amd64-2.0.4-6/
% mkdir -p ro/gcc-libs-amd64-8.2.0-3/out/lib64/
% cp /ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1 ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1
% ln -s /ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1 ro/cryptsetup-amd64-2.0.4-6/lib
% cp -r /ro/lvm2-amd64-2.03.00-6/lib ro/lvm2-amd64-2.03.00-6/

In busybox, I used the following commands to unlock the root file system:

modprobe algif_skcipher
./cryptsetup luksOpen /dev/sda4 cryptroot1
mount /dev/dm-0 /mnt

Michael Stapelberg: distri: a Linux distribution to research fast package management

Wednesday 12th of August 2020 08:07:23 AM

Over the last year or so I have worked on a research Linux distribution in my spare time. It’s not a distribution for researchers (like Scientific Linux), but my personal playground project to research Linux distribution development, i.e. try out fresh ideas.

This article focuses on the package format and its advantages, but there is more to distri, which I will cover in upcoming blog posts.

Motivation

I was a Debian Developer for the 7 years from 2012 to 2019, but using the distribution often left me frustrated, ultimately resulting in me winding down my Debian work.

Frequently, I noticed a large gap between the actual speed of an operation (e.g. doing an update) and the speed that seemed possible based on back-of-the-envelope calculations. I wrote more about this in my blog post “Package managers are slow”.

To me, this observation means that either there is potential to optimize the package manager itself (e.g. apt), or what the system does is just too complex. While I remember seeing some low-hanging fruit¹, through my work on distri, I wanted to explore whether all the complexity we currently have in Linux distributions such as Debian or Fedora is inherent to the problem space.

I have completed enough of the experiment to conclude that the complexity is not inherent: I can build a Linux distribution for general-enough purposes which is much less complex than existing ones.

① Those were low-hanging fruit from a user perspective. I’m not saying that fixing them is easy in the technical sense; I know too little about apt’s code base to make such a statement.

Key idea: packages are images, not archives

One key idea is to switch from using archives to using images for package contents. Common package managers such as dpkg(1) use tar(1) archives with various compression algorithms.

distri uses SquashFS images, a comparatively simple file system image format that I happen to be familiar with from my work on the gokrazy Raspberry Pi 3 Go platform.

This idea is not novel: AppImage and snappy also use images, but only for individual, self-contained applications. distri however uses images for distribution packages with dependencies. In particular, there is no duplication of shared libraries in distri.

A nice side effect of using read-only image files is that applications are immutable and can hence not be broken by accidental (or malicious!) modification.

Key idea: separate hierarchies

Package contents are made available under a fully-qualified path. E.g., all files provided by package zsh-amd64-5.6.2-3 are available under /ro/zsh-amd64-5.6.2-3. The mountpoint /ro stands for read-only, which is short yet descriptive.

Perhaps surprisingly, building software with custom prefix values of e.g. /ro/zsh-amd64-5.6.2-3 is widely supported, thanks to:

  1. Linux distributions, which build software with prefix set to /usr, whereas FreeBSD (and the autotools default) builds with prefix set to /usr/local.

  2. Enthusiast users in corporate or research environments, who install software into their home directories.

Because using a custom prefix is a common scenario, upstream awareness for prefix-correctness is generally high, and the rarely required patch will be quickly accepted.

Key idea: exchange directories

Software packages often exchange data by placing or locating files in well-known directories. Here are just a few examples:

  • gcc(1) locates the libusb(3) headers via /usr/include
  • man(1) locates the nginx(1) manpage via /usr/share/man.
  • zsh(1) locates executable programs via PATH components such as /bin

In distri, these locations are called exchange directories and are provided via FUSE in /ro.

Exchange directories come in two different flavors:

  1. global. The exchange directory, e.g. /ro/share, provides the union of the share subdirectory of all packages in the package store.
    Global exchange directories are largely used for compatibility, see below.

  2. per-package. Useful for tight coupling: e.g. irssi(1) does not provide any ABI guarantees, so plugins such as irssi-robustirc can declare that they want e.g. /ro/irssi-amd64-1.1.1-1/out/lib/irssi/modules to be a per-package exchange directory and contain files from their lib/irssi/modules.

Note: Only a few exchange directories are also available in the package build environment (as opposed to run-time).

Search paths sometimes need to be fixed

Programs which use exchange directories sometimes use search paths to access multiple exchange directories. In fact, the examples above were taken from gcc(1)’s INCLUDEPATH, man(1)’s MANPATH and zsh(1)’s PATH. These are prominent ones, but more examples are easy to find: zsh(1) loads completion functions from its FPATH.

Some search path values are derived from --datadir=/ro/share and require no further attention, but others might derive from e.g. --prefix=/ro/zsh-amd64-5.6.2-3/out and need to be pointed to an exchange directory via a specific command line flag.

Note: To create the illusion of a writable search path at package build-time, $DESTDIR/ro/share and $DESTDIR/ro/lib are diverted to $DESTDIR/$PREFIX/share and $DESTDIR/$PREFIX/lib, respectively.

FHS compatibility

Global exchange directories are used to make distri provide enough of the Filesystem Hierarchy Standard (FHS) that third-party software largely just works. This includes a C development environment.

I successfully ran a few programs from their binary packages such as Google Chrome, Spotify, or Microsoft’s Visual Studio Code.

Fast package manager

I previously wrote about how Linux distribution package managers are too slow.

distri’s package manager is extremely fast. Its main bottleneck is typically the network link, even on high-speed links (I tested with a 100 Gbps link).

Its speed comes largely from an architecture which allows the package manager to do less work. Specifically:

  1. Package images can be added atomically to the package store, so we can safely skip fsync(2). Corruption will be cleaned up automatically, and durability is not important: if an interactive installation is interrupted, the user can just repeat it, as it will be fresh in their mind.

  2. Because all packages are co-installable thanks to separate hierarchies, there are no conflicts at the package store level, and no dependency resolution (an optimization problem requiring SAT solving) is required at all.
    In exchange directories, we resolve conflicts by selecting the package with the highest monotonically increasing distri revision number.

  3. distri proves that we can build a useful Linux distribution entirely without hooks and triggers. Not having to serialize hook execution allows us to download packages into the package store with maximum concurrency.

  4. Because we are using images instead of archives, we do not need to unpack anything. This means installing a package is really just writing its package image and metadata to the package store. Sequential writes are typically the fastest kind of storage usage pattern.
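
To illustrate points 1 and 4, here is a minimal sketch of an fsync-free, atomic addition to a package store: the image is written sequentially to a temporary file in the store directory and then renamed into place, since rename(2) is atomic within a file system. This is only the idea, not distri’s actual code.

package store

import (
	"io"
	"os"
	"path/filepath"
)

// Add writes the package image named name into storeDir. The write goes to a
// temporary file first and is then renamed into place, so readers either see
// the complete image or nothing. fsync(2) is skipped on purpose: a crash at
// worst leaves a stray temporary file to clean up.
func Add(storeDir, name string, image io.Reader) error {
	tmp, err := os.CreateTemp(storeDir, ".incoming-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // effectively a no-op once the rename succeeded

	if _, err := io.Copy(tmp, image); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(storeDir, name))
}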

Fast installation also makes other use-cases more bearable, such as creating disk images, be it for testing them in qemu(1), booting them on real hardware from a USB drive, or for cloud providers such as Google Cloud.

Note: To saturate links above 1 Gbps, transfer packages without compression.

Fast package builder

Contrary to how distribution package builders are usually implemented, the distri package builder does not actually install any packages into the build environment.

Instead, distri makes available a filtered view of the package store (only declared dependencies are available) at /ro in the build environment.

This means that even for large dependency trees, setting up a build environment happens in a fraction of a second! Such a low latency really makes a difference in how comfortable it is to iterate on distribution packages.

Package stores

In distri, package images are installed from a remote package store into the local system package store /roimg, which backs the /ro mount.

A package store is implemented as a directory of package images and their associated metadata files.

You can easily make available a package store by using distri export.

To provide a mirror for your local network, you can periodically distri update from the package store you want to mirror, and then distri export your local copy. Special tooling (e.g. debmirror in Debian) is not required because distri install is atomic (and update uses install).

Producing derivatives is easy: just add your own packages to a copy of the package store.

The package store is intentionally kept simple to manage and distribute. Its files could be exchanged via peer-to-peer file systems, or synchronized from an offline medium.

distri’s first release

distri works well enough to demonstrate the ideas explained above. I have branched this state into branch jackherer, distri’s first release code name. This way, I can keep experimenting in the distri repository without breaking your installation.

From the branch contents, our autobuilder creates:

  1. disk images, which…

  2. a package repository. Installations can pick up new packages with distri update.

  3. documentation for the release.

The project website can be found at https://distr1.org. The website is just the README for now, but we can improve that later.

The repository can be found at https://github.com/distr1/distri

Project outlook

Right now, distri is mainly a vehicle for my spare-time Linux distribution research. I don’t recommend anyone use distri for anything but research, and there are no medium-term plans of that changing. At the very least, please contact me before basing anything serious on distri so that we can talk about limitations and expectations.

I expect the distri project to live for as long as I have blog posts to publish, and we’ll see what happens afterwards. Note that this is a hobby for me: I will continue to explore, at my own pace, parts that I find interesting.

My hope is that established distributions might get a useful idea or two from distri.

There’s more to come: subscribe to the distri feed

I don’t want to make this post too long, but there is much more!

Please subscribe to the following URL in your feed reader to get all posts about distri:

https://michael.stapelberg.ch/posts/tags/distri/feed.xml

Next in my queue are articles about hermetic packages and good package maintainer experience (including declarative packaging).

Feedback or questions?

I’d love to discuss these ideas in case you’re interested!

Please send feedback to the distri mailing list so that everyone can participate!

Michael Stapelberg: Linux distributions: Can we do without hooks and triggers?

Wednesday 12th of August 2020 08:07:23 AM

Hooks are an extension feature provided by all package managers that are used in larger Linux distributions. For example, Debian uses apt, which has various maintainer scripts. Fedora uses rpm, which has scriptlets. Different package managers use different names for the concept, but all of them offer package maintainers the ability to run arbitrary code during package installation and upgrades. Example hook use cases include adding daemon user accounts to your system (e.g. postgres), or generating/updating cache files.

Triggers are a kind of hook which run when other packages are installed. For example, on Debian, the man(1) package comes with a trigger which regenerates the search database index whenever any package installs a manpage. When, for example, the nginx(8) package is installed, a trigger provided by the man(1) package runs.

Over the past few decades, Open Source software has become more and more uniform: instead of each piece of software defining its own rules, a small number of build systems are now widely adopted.

Hence, I think it makes sense to revisit whether offering extension via hooks and triggers is a net win or net loss.

Hooks preclude concurrent package installation

Package managers can commonly make very few assumptions about what hooks do, what preconditions they require, and which conflicts might be caused by running multiple packages’ hooks concurrently.

Hence, package managers cannot concurrently install packages. At least the hook/trigger part of the installation needs to happen in sequence.

While it seems technically feasible to retrofit package manager hooks with concurrency primitives such as locks for mutual exclusion between different hook processes, the required overhaul of all hooks¹ seems like such a daunting task that it might be better to just get rid of the hooks instead. Only deleting code frees you from the burden of maintenance, automated testing and debugging.

① In Debian, there are 8620 non-generated maintainer scripts, as reported by find shard*/src/*/debian -regex ".*\(pre\|post\)\(inst\|rm\)$" on a Debian Code Search instance.

Triggers slow down installing/updating other packages

Personally, I never use the apropos(1) command, so I don’t appreciate the man(1) package’s trigger which updates the database used by apropos(1). The process takes a long time and, because hooks and triggers must be executed serially (see previous section), blocks my installation or update.

When I tell people this, they are often surprised to learn about the existence of the apropos(1) command. I suggest adopting an opt-in model.

Unnecessary work if programs are not used between updates

Hooks run when packages are installed. If a package’s contents are not used between two updates, running the hook in the first update could have been skipped. Running the hook lazily when the package contents are used reduces unnecessary work.

As a welcome side-effect, lazy hook evaluation automatically makes the hook work in operating system images, such as live USB thumb drives or SD card images for the Raspberry Pi. Such images must not ship the same crypto keys (e.g. OpenSSH host keys) to all machines, but instead generate a different key on each machine.

Why do users keep packages installed they don’t use? It’s extra work to remember and clean up those packages after use. Plus, users might not realize or value that having fewer packages installed has benefits such as faster updates.

I can also imagine that there are people for whom the cost of re-installing packages incentivizes them to just keep packages installed—you never know when you might need the program again…

Implemented in an interpreted language

While working on hermetic packages (more on that in another blog post), where the contained programs are started with modified environment variables (e.g. PATH) via a wrapper bash script, I noticed that the overhead of those wrapper bash scripts quickly becomes significant. For example, when using the excellent magit interface for Git in Emacs, I encountered second-long delays² when using hermetic packages compared to standard packages. Re-implementing wrappers in a compiled language provided a significant speed-up.

Similarly, getting rid of an extension point which mandates using shell scripts allows us to build an efficient and fast implementation of a predefined set of primitives, where you can reason about their effects and interactions.

② magit needs to run git a few times for displaying the full status, so small overhead quickly adds up.

Incentivizing more upstream standardization

Hooks are an escape hatch for distribution maintainers to express anything which their packaging system cannot express.

Distributions should only rely on well-established interfaces such as autoconf’s classic ./configure && make && make install (including commonly used flags) to build a distribution package. Integrating upstream software into a distribution should not require custom hooks. For example, instead of requiring a hook which updates a cache of schema files, the library used to interact with those files should transparently (re-)generate the cache or fall back to a slower code path.

Distribution maintainers are hard to come by, so we should value their time. In particular, there is a 1:n relationship of packages to distribution package maintainers (software is typically available in multiple Linux distributions), so it makes sense to spend the work in the 1 and have the n benefit.

Can we do without them?

If we want to get rid of hooks, we need another mechanism to achieve what we currently achieve with hooks.

If the hook is not specific to the package, it can be moved to the package manager. The desired system state should either be derived from the package contents (e.g. required system users can be discovered from systemd service files) or declaratively specified in the package build instructions—more on that in another blog post. This turns hooks (arbitrary code) into configuration, which allows the package manager to collapse and sequence the required state changes. E.g., when 5 packages are installed which each need a new system user, the package manager could update /etc/passwd just once.
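
As a sketch of that declarative approach (requiredUsers is a hypothetical helper, not part of any existing package manager): the package manager can scan the systemd unit files shipped by all packages in one installation transaction, collect their User= directives, and create the missing accounts in a single pass.

package main

import (
	"fmt"
	"log"
	"os"
	"strings"
)

// requiredUsers collects the User= directives from a set of systemd unit
// files, so that all missing accounts can be created in one /etc/passwd update
// instead of one maintainer-script run per package.
func requiredUsers(unitFiles []string) (map[string]bool, error) {
	users := make(map[string]bool)
	for _, path := range unitFiles {
		b, err := os.ReadFile(path)
		if err != nil {
			return nil, err
		}
		for _, line := range strings.Split(string(b), "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "User=") {
				users[strings.TrimPrefix(line, "User=")] = true
			}
		}
	}
	return users, nil
}

func main() {
	users, err := requiredUsers(os.Args[1:])
	if err != nil {
		log.Fatal(err)
	}
	for u := range users {
		fmt.Println("needs system user:", u)
	}
}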

If the hook is specific to the package, it should be moved into the package contents. This typically means moving the functionality into the program start (or the systemd service file if we are talking about a daemon). If (while?) upstream is not convinced, you can either wrap the program or patch it. Note that this case is relatively rare: I have worked with hundreds of packages and the only package-specific functionality I came across was automatically generating host keys before starting OpenSSH’s sshd(8)³.

There is one exception where moving the hook doesn’t work: packages which modify state outside of the system, such as bootloaders or kernel images.

③ Even that can be moved out of a package-specific hook, as Fedora demonstrates.

Conclusion

Global state modifications performed as part of package installation today use hooks, an overly expressive extension mechanism.

Instead, all modifications should be driven by configuration. This is feasible because there are only a few different kinds of desired state modifications. This makes it possible for package managers to optimize package installation.

Michael Stapelberg: Optional dependencies don’t work

Wednesday 12th of August 2020 08:07:23 AM

In the i3 projects, we have always tried hard to avoid optional dependencies. There are a number of reasons behind it, and as I have recently encountered some of the downsides of optional dependencies firsthand, I summarized my thoughts in this article.

What is a (compile-time) optional dependency?

When building software from source, most programming languages and build systems support conditional compilation: different parts of the source code are compiled based on certain conditions.

An optional dependency is conditional compilation hooked up directly to a knob (e.g. command line flag, configuration file, …), with the effect that the software can now be built without an otherwise required dependency.

Let’s walk through a few issues with optional dependencies.

Inconsistent experience in different environments

Software is usually not built by end users, but by packagers, at least when we are talking about Open Source.

Hence, end users don’t see the knob for the optional dependency, they are just presented with the fait accompli: their version of the software behaves differently than other versions of the same software.

Depending on the kind of software, this situation can be made obvious to the user: for example, if the optional dependency is needed to print documents, the program can produce an appropriate error message when the user tries to print a document.

Sometimes, this isn’t possible: when i3 introduced an optional dependency on cairo and pangocairo, the behavior itself (rendering window titles) worked in all configurations, but non-ASCII characters might break depending on whether i3 was compiled with cairo.

For users, it is frustrating to only discover in conversation that a program has a feature that the user is interested in, but it’s not available on their computer. For support, this situation can be hard to detect, and even harder to resolve to the user’s satisfaction.

Packaging is more complicated

Unfortunately, many build systems don’t stop the build when optional dependencies are not present. Instead, you sometimes end up with a broken build, or, even worse: with a successful build that does not work correctly at runtime.

This means that packagers need to closely examine the build output to know which dependencies to make available. In the best case, there is a summary of available and enabled options, clearly outlining what this build will contain. In the worst case, you need to infer the features from the checks that are done, or work your way through the --help output.

The better alternative is to configure your build system such that it stops when any dependency was not found, and thereby have packagers acknowledge each optional dependency by explicitly disabling the option.

Untested code paths bit rot

Code paths which are not used will inevitably bit rot. If you have optional dependencies, you need to test both the code path without the dependency and the code path with the dependency. It doesn’t matter whether the tests are automated or manual, the test matrix must cover both paths.

Interestingly enough, this principle seems to apply to all kinds of software projects (but it slows down as change slows down): one might think that important Open Source building blocks should have enough users to cover all sorts of configurations.

However, consider this example: building cairo without libxrender results in all GTK application windows, menus, etc. being displayed as empty grey surfaces. Cairo does not fail to build without libxrender, but the code path clearly is broken without libxrender.

Can we do without them?

I’m not saying optional dependencies should never be used. In fact, for bootstrapping, disabling dependencies can save a lot of work and can sometimes allow breaking circular dependencies. For example, in an early bootstrapping stage, binutils can be compiled with --disable-nls to disable internationalization.

However, optional dependencies are broken so often that I conclude they are overused. Read on and see for yourself whether you would rather commit to best practices or not introduce an optional dependency.

Best practices

If you do decide to make dependencies optional, please:

  1. Set up automated testing for all code path combinations.
  2. Fail the build until packagers explicitly pass a --disable flag.
  3. Tell users their version is missing a dependency at runtime, e.g. in --version.
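
To make practices 2 and 3 concrete, here is a hedged sketch in Go using build tags (the package, file and tag names are made up): the dependency-free code path is an explicit stub that fails loudly and reports itself, instead of a silently missing feature.

// printing_stub.go — compiled only when the optional "printing" dependency is
// disabled (go build). A sibling file printing_real.go, guarded by
// //go:build printing, would contain the implementation that actually links
// against the printing library (go build -tags printing).

//go:build !printing

package app

import "errors"

// Print is the stub used in builds without printing support. It fails with a
// clear message instead of silently doing nothing.
func Print(doc string) error {
	return errors.New("this build was compiled without printing support")
}

// Features lists compiled-in optional features, e.g. for --version output.
func Features() []string {
	return nil
}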

Michael Stapelberg: Hermetic packages (in distri)

Wednesday 12th of August 2020 08:07:23 AM

In distri, packages (e.g. emacs) are hermetic. By hermetic, I mean that the dependencies a package uses (e.g. libusb) don’t change, even when newer versions are installed.

For example, if package libusb-amd64-1.0.22-7 is available at build time, the package will always use that same version, even after the newer libusb-amd64-1.0.23-8 will be installed into the package store.

Another way of saying the same thing is: packages in distri are always co-installable.

This makes the package store more robust: additions to it will not break the system. On a technical level, the package store is implemented as a directory containing distri SquashFS images and metadata files, into which packages are installed in an atomic way.

Out of scope: plugins are not hermetic by design

One exception where hermeticity is not desired are plugin mechanisms: optionally loading out-of-tree code at runtime obviously is not hermetic.

As an example, consider glibc’s Name Service Switch (NSS) mechanism. Section 29.4.1 (Adding another Service to NSS) of the glibc manual describes how glibc searches $prefix/lib for shared libraries at runtime.

Debian ships about a dozen NSS libraries for a variety of purposes, and enterprise setups might add their own into the mix.

systemd (as of v245) accounts for 4 NSS libraries, e.g. nss-systemd for user/group name resolution for users allocated through systemd’s DynamicUser= option.

Having packages be as hermetic as possible remains a worthwhile goal despite any exceptions: I will gladly use a 99% hermetic system over a 0% hermetic system any day.

Side note: Xorg’s driver model (which can be characterized as a plugin mechanism) does not fall under this category because of its tight API/ABI coupling! For this case, where drivers are only guaranteed to work with precisely the Xorg version for which they were compiled, distri uses per-package exchange directories.

Implementation of hermetic packages in distri

On a technical level, the requirement is: all paths used by the program must always result in the same contents. This is implemented in distri via the read-only package store mounted at /ro, e.g. files underneath /ro/emacs-amd64-26.3-15 never change.

In practice, three strategies cover most of the paths a program uses:

ELF interpreter and dynamic libraries

Programs on Linux use the ELF file format, which contains two kinds of references:

First, the ELF interpreter (PT_INTERP segment), which is used to start the program. For dynamically linked programs on 64-bit systems, this is typically ld.so(8).

Many distributions use system-global paths such as /lib64/ld-linux-x86-64.so.2, but distri compiles programs with -Wl,--dynamic-linker=/ro/glibc-amd64-2.31-4/out/lib/ld-linux-x86-64.so.2 so that the full path ends up in the binary.

The ELF interpreter is shown by file(1), but you can also use readelf -a $BINARY | grep 'program interpreter' to display it.

And secondly, the rpath, a run-time search path for dynamic libraries. Instead of storing full references to all dynamic libraries, we set the rpath so that ld.so(8) will find the correct dynamic libraries.

Originally, we used to just set a long rpath, containing one entry for each dynamic library dependency. However, we have since switched to using a single lib subdirectory per package as its rpath, and placing symlinks with full path references into that lib directory, e.g. using -Wl,-rpath=/ro/grep-amd64-3.4-4/lib. This is better for performance, as ld.so uses a per-directory cache.
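
A sketch of how such a per-package lib directory could be assembled at build time, given already-resolved full paths of the dependencies (buildLibDir is a hypothetical helper, not distri’s builder code):

package main

import (
	"log"
	"os"
	"path/filepath"
)

// buildLibDir creates libDir (e.g. /ro/grep-amd64-3.4-4/lib) and fills it with
// symlinks that point at the fully qualified paths of the resolved shared
// libraries, so that a single -Wl,-rpath=libDir entry covers all of them.
func buildLibDir(libDir string, resolvedDeps []string) error {
	if err := os.MkdirAll(libDir, 0o755); err != nil {
		return err
	}
	for _, dep := range resolvedDeps { // e.g. /ro/glibc-amd64-2.31-4/out/lib/libc.so.6
		link := filepath.Join(libDir, filepath.Base(dep))
		if err := os.Symlink(dep, link); err != nil && !os.IsExist(err) {
			return err
		}
	}
	return nil
}

func main() {
	err := buildLibDir("lib", []string{
		"/ro/glibc-amd64-2.31-4/out/lib/libc.so.6",
	})
	if err != nil {
		log.Fatal(err)
	}
}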

Note that program load times are significantly influenced by how quickly you can locate the dynamic libraries. distri uses a FUSE file system to load programs from, so getting proper -ENOENT caching into place drastically sped up program load times.

Instead of compiling software with the -Wl,--dynamic-linker and -Wl,-rpath flags, one can also modify these fields after the fact using patchelf(1). For closed-source programs, this is the only possibility.

The rpath can be inspected by using e.g. readelf -a $BINARY | grep RPATH.

Environment variable setup wrapper programs

Many programs are influenced by environment variables: to start another program, said program is often found by checking each directory in the PATH environment variable.

Such search paths are prevalent in scripting languages, too, to find modules. Python has PYTHONPATH, Perl has PERL5LIB, and so on.

To set up these search path environment variables at run time, distri employs an indirection. Instead of e.g. teensy-loader-cli, you run a small wrapper program that calls precisely one execve system call with the desired environment variables.

Initially, I used shell scripts as wrapper programs because they are easily inspectable. This turned out to be too slow, so I switched to compiled programs. I’m linking them statically for fast startup, and I’m linking them against musl libc for significantly smaller file sizes than glibc (per-executable overhead adds up quickly in a distribution!).

Note that the wrapper programs prepend to the PATH environment variable; they don’t replace it in its entirety. This is important so that users have a way to extend the PATH (and other variables) if they so choose. This doesn’t hurt hermeticity because it is only relevant for programs that were not present at build time, i.e. plugin mechanisms which, by design, cannot be hermetic.
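
A sketch of such a wrapper in Go (distri’s real wrappers are small compiled binaries; the package paths below are reused from the teensy-loader-cli demo in the appendix, and which variables a real wrapper sets depends on the package):

package main

import (
	"log"
	"os"
	"strings"
	"syscall"
)

// prepend adds val in front of the existing value of the environment variable
// key (creating it if absent), keeping the user's own entries intact.
func prepend(env []string, key, val string) []string {
	for i, kv := range env {
		if strings.HasPrefix(kv, key+"=") {
			env[i] = key + "=" + val + ":" + strings.TrimPrefix(kv, key+"=")
			return env
		}
	}
	return append(env, key+"="+val)
}

func main() {
	env := prepend(os.Environ(), "PATH", "/ro/teensy-loader-cli-amd64-2.1+g20180927-7/bin")

	// Precisely one execve(2): replace the wrapper with the real program.
	target := "/ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli"
	if err := syscall.Exec(target, append([]string{target}, os.Args[1:]...), env); err != nil {
		log.Fatal(err)
	}
}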

Shebang interpreter patching

The Shebang of scripts contains a path, too, and hence needs to be changed.

We don’t do this in distri yet (the number of packaged scripts is small), but we should.

Performance requirements

The performance improvements in the previous sections are not just good to have, but practically required when many processes are involved: without them, you’ll encounter second-long delays in magit which spawns many git processes under the covers, or in dracut, which spawns one cp(1) process per file.

Downside: rebuild of packages required to pick up changes

Linux distributions such as Debian consider it an advantage to roll out security fixes to the entire system by updating a single shared library package (e.g. openssl).

The flip side of that coin is that changes to a single critical package can break the entire system.

With hermetic packages, all reverse dependencies must be rebuilt when a library’s changes should be picked up by the whole system. E.g., when openssl changes, curl must be rebuilt to pick up the new version of openssl.

This approach trades off using more bandwidth and more disk space (temporarily) against reducing the blast radius of any individual package update.

Downside: long env variables are cumbersome to deal with

This can be partially mitigated by removing empty directories at build time, which will result in shorter variables.

In general, there is no getting around this. One little trick is to use tr : '\n', e.g.:

distri0# echo $PATH
/usr/bin:/bin:/usr/sbin:/sbin:/ro/openssh-amd64-8.2p1-11/out/bin
distri0# echo $PATH | tr : '\n'
/usr/bin
/bin
/usr/sbin
/sbin
/ro/openssh-amd64-8.2p1-11/out/bin

Edge cases

The implementation outlined above works well in hundreds of packages, and only a small handful exhibited problems of any kind. Here are some issues I encountered:

Issue: accidental ABI breakage in plugin mechanisms

NSS libraries built against glibc 2.28 and newer cannot be loaded by glibc 2.27. In all likelihood, such changes do not happen too often, but it does illustrate that glibc’s published interface spec is not sufficient for forwards and backwards compatibility.

In distri, we could likely use a per-package exchange directory for glibc’s NSS mechanism to prevent the above problem from happening in the future.

Issue: wrapper bypass when a program re-executes itself

Some programs try to arrange for themselves to be re-executed outside of their current process tree. For example, consider building a program with the meson build system:

  1. When meson first configures the build, it generates ninja files (think Makefiles) which contain command lines that run the meson --internal helper.

  2. Once meson returns, ninja is called as a separate process, so it will not have the environment which the meson wrapper sets up. ninja then runs the previously persisted meson command line. Since the command line uses the full path to meson (not to its wrapper), it bypasses the wrapper.

Luckily, not many programs try to arrange for other process trees to run them. Here is a table summarizing how affected programs might try to arrange for re-execution, whether the technique results in a wrapper bypass, and what we do about it in distri:

technique to execute itself          | uses wrapper | mitigation
run-time: find own basename in PATH  | yes          | wrapper program
compile-time: embed expected path    | no; bypass!  | configure or patch
run-time: argv[0] or /proc/self/exe  | no; bypass!  | patch

One might think that setting argv[0] to the wrapper location seems like a way to side-step this problem. We tried doing this in distri, but had to revert and go the other way.

Misc smaller issues

Appendix: Could other distributions adopt hermetic packages?

At a very high level, adopting hermetic packages will require two steps:

  1. Using fully qualified paths whose contents don’t change (e.g. /ro/emacs-amd64-26.3-15) generally requires rebuilding programs, e.g. with --prefix set.

  2. Once you use fully qualified paths you need to make the packages able to exchange data. distri solves this with exchange directories, implemented in the /ro file system which is backed by a FUSE daemon.

The first step is pretty simple, whereas the second step is where I expect controversy around any suggested mechanism.

Appendix: demo (in distri)

This appendix contains commands and their outputs, run on the upcoming distri version supersilverhaze, but verified to work on older versions, too.


The /bin directory contains symlinks for the union of all packages’ bin subdirectories:

distri0# readlink -f /bin/teensy_loader_cli
/ro/teensy-loader-cli-amd64-2.1+g20180927-7/bin/teensy_loader_cli

The wrapper program in the bin subdirectory is small:

distri0# ls -lh $(readlink -f /bin/teensy_loader_cli)
-rwxr-xr-x 1 root root 46K Apr 21 21:56 /ro/teensy-loader-cli-amd64-2.1+g20180927-7/bin/teensy_loader_cli

Wrapper programs execute quickly:

distri0# strace -fvy /bin/teensy_loader_cli |& head | cat -n
     1  execve("/bin/teensy_loader_cli", ["/bin/teensy_loader_cli"], ["USER=root", "LOGNAME=root", "HOME=/root", "PATH=/ro/bash-amd64-5.0-4/bin:/r"..., "SHELL=/bin/zsh", "TERM=screen.xterm-256color", "XDG_SESSION_ID=c1", "XDG_RUNTIME_DIR=/run/user/0", "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "XDG_SESSION_TYPE=tty", "XDG_SESSION_CLASS=user", "SSH_CLIENT=10.0.2.2 42556 22", "SSH_CONNECTION=10.0.2.2 42556 10"..., "SSH_TTY=/dev/pts/0", "SHLVL=1", "PWD=/root", "OLDPWD=/root", "_=/usr/bin/strace", "LD_LIBRARY_PATH=/ro/bash-amd64-5"..., "PERL5LIB=/ro/bash-amd64-5.0-4/ou"..., "PYTHONPATH=/ro/bash-amd64-5.0-4/"...]) = 0
     2  arch_prctl(ARCH_SET_FS, 0x40c878) = 0
     3  set_tid_address(0x40ca9c) = 715
     4  brk(NULL) = 0x15b9000
     5  brk(0x15ba000) = 0x15ba000
     6  brk(0x15bb000) = 0x15bb000
     7  brk(0x15bd000) = 0x15bd000
     8  brk(0x15bf000) = 0x15bf000
     9  brk(0x15c1000) = 0x15c1000
    10  execve("/ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli", ["/ro/teensy-loader-cli-amd64-2.1+"...], ["USER=root", "LOGNAME=root", "HOME=/root", "PATH=/ro/bash-amd64-5.0-4/bin:/r"..., "SHELL=/bin/zsh", "TERM=screen.xterm-256color", "XDG_SESSION_ID=c1", "XDG_RUNTIME_DIR=/run/user/0", "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "XDG_SESSION_TYPE=tty", "XDG_SESSION_CLASS=user", "SSH_CLIENT=10.0.2.2 42556 22", "SSH_CONNECTION=10.0.2.2 42556 10"..., "SSH_TTY=/dev/pts/0", "SHLVL=1", "PWD=/root", "OLDPWD=/root", "_=/usr/bin/strace", "LD_LIBRARY_PATH=/ro/bash-amd64-5"..., "PERL5LIB=/ro/bash-amd64-5.0-4/ou"..., "PYTHONPATH=/ro/bash-amd64-5.0-4/"...]) = 0

Confirm which ELF interpreter is set for a binary using readelf(1):

distri0# readelf -a /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli | grep 'program interpreter'
[Requesting program interpreter: /ro/glibc-amd64-2.31-4/out/lib/ld-linux-x86-64.so.2]

Confirm the rpath is set to the package’s lib subdirectory using readelf(1):

distri0# readelf -a /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli | grep RPATH
0x000000000000000f (RPATH) Library rpath: [/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib]

…and verify the lib subdirectory has the expected symlinks and target versions:

distri0# find /ro/teensy-loader-cli-amd64-*/lib -type f -printf '%P -> %l\n'
libc.so.6 -> /ro/glibc-amd64-2.31-4/out/lib/libc-2.31.so
libpthread.so.0 -> /ro/glibc-amd64-2.31-4/out/lib/libpthread-2.31.so
librt.so.1 -> /ro/glibc-amd64-2.31-4/out/lib/librt-2.31.so
libudev.so.1 -> /ro/libudev-amd64-245-11/out/lib/libudev.so.1.6.17
libusb-0.1.so.4 -> /ro/libusb-compat-amd64-0.1.5-7/out/lib/libusb-0.1.so.4.4.4
libusb-1.0.so.0 -> /ro/libusb-amd64-1.0.23-8/out/lib/libusb-1.0.so.0.2.0

To verify the correct libraries are actually loaded, you can set the LD_DEBUG environment variable for ld.so(8):

distri0# LD_DEBUG=libs teensy_loader_cli
[…]
       678:  find library=libc.so.6 [0]; searching
       678:   search path=/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib (RPATH from file /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli)
       678:    trying file=/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib/libc.so.6
       678:
[…]

NSS libraries that distri ships:

find /lib/ -name "libnss_*.so.2" -type f -printf '%P -> %l\n'
libnss_myhostname.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_myhostname.so.2
libnss_mymachines.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_mymachines.so.2
libnss_resolve.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_resolve.so.2
libnss_systemd.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_systemd.so.2
libnss_compat.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_compat.so.2
libnss_db.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_db.so.2
libnss_dns.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_dns.so.2
libnss_files.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_files.so.2
libnss_hesiod.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_hesiod.so.2

Dirk Eddelbuettel: RcppSimdJson 0.1.1: More Features

Tuesday 11th of August 2020 10:15:00 PM

A first update following the exciting RcppSimdJson 0.1.0 release last month is now on CRAN. Version 0.1.1 brings further enhancements such as direct parsing of raw chars, working with compressed files as well as much expanded querying ability, all thanks to Brendan, some improvements to our demos thanks to Daniel, as well as a small fix via a one-liner borrowed from upstream for a reported UBSAN issue.

RcppSimdJson wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators. Via very clever algorithmic engineering to obtain largely branch-free code, coupled with modern C++ and newer compiler instructions, it results in parsing gigabytes of JSON per second, which is quite mind-boggling. The best-case performance is ‘faster than CPU speed’ as use of parallel SIMD instructions and careful branch avoidance can lead to less than one cpu cycle use per byte parsed; see the video of the talk by Daniel Lemire at QCon (also voted best talk).

The detailed list of changes follows.

Changes in version 0.1.1 (2020-08-10)
  • Corrected incorrect file deletion when mixing local and remote files (Brendan in #34) closing #33.

  • Added support for raw vectors, compressed files, and compressed downloads (Dirk and Brendan in #36, #39, and #45 closing #35 and addressing issues raised in #40 and #44).

  • Examples in two demos are now more self-sufficient (Daniel Lemire and Dirk in #42).

  • Expanded query functionality to include single, flat, and nested queries (Brendan in #45 closing #43).

  • Split error handling parameters from error_ok/on_error into parse_error_ok/on_parse_error and query_error_ok/on_query_error (Brendan in #45).

  • One-line upstream change to address sanitizer error on cast.

Courtesy of CRANberries, there is also a diffstat report for this release.

For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Jonathan Carter: GameMode in Debian

Tuesday 11th of August 2020 03:35:49 PM

What is GameMode, what does it do?

About two years ago, I ran into some bugs running a game on Debian, so installed Windows 10 on a spare computer and ran it on there. I learned that when you launch a game in Windows 10, it automatically disables notifications, screensaver, reduces power saving measures and gives the game maximum priority. I thought “Oh, that’s actually quite nice, but we probably won’t see that kind of integration on Linux any time soon”. The very next week, I read the initial announcement of GameMode, a tool from Feral Interactive that does a bunch of tricks to maximise performance for games running on Linux.

When GameMode is invoked it:

  • Sets the kernel performance governor from ‘powersave’ to ‘performance’
  • Provides I/O priority to the game process
  • Optionally sets nice value to the game process
  • Inhibits the screensaver
  • Tweaks the kernel scheduler to enable soft real-time capabilities (handled by the MuQSS kernel scheduler, if available in your kernel)
  • Sets GPU performance mode (NVIDIA and AMD)
  • Attempts GPU overclocking (on supported NVIDIA cards)
  • Runs custom pre/post run scripts. You might want to run a script to disable your ethereum mining or suspend VMs when you start a game and resume it all once you quit.

How GameMode is invoked

Some newer games (proprietary games like “Rise of the Tomb Raider”, “Total War Saga: Thrones of Britannia”, “Total War: WARHAMMER II”, “DiRT 4” and “Total War: Three Kingdoms”) will automatically invoke GameMode if it’s installed. For games that don’t, you can manually invoke it using the gamemoderun command.

Lutris is a tool that makes it easy to install and run games on Linux, and it also integrates with GameMode. (Lutris is currently being packaged for Debian, hopefully it will make it in on time for Bullseye).

Screenshot of Lutris, a tool that makes it easy to install your non-Linux games, which also integrates with GameMode.

GameMode in Debian

The latest GameMode is packaged in Debian (Stephan Lachnit and I maintain it in the Debian Games Team) and it’s also available for Debian 10 (Buster) via buster-backports. All you need to do to get up and running with GameMode is to install the ‘gamemode’ package.

GameMode in Debian supports 64 bit and 32 bit mode, so running it with older games (and many proprietary games) still works. Some distributions (like Arch Linux) have dropped 32 bit support, so 32 bit games on such systems lose any kind of integration with GameMode, even if you can get those games running via other wrappers on such systems.

We also include a binary called ‘gamemode-simulate-game’ (installed under /usr/games/). This is a minimalistic program that will invoke gamemode automatically for 10 seconds and then exit without an error if it was successful. Its source code might be useful if you’d like to add GameMode support to your game, or patch a game in Debian to automatically invoke it.

In Debian we install Gamemode’s example config file to /etc/gamemode.ini where a user can customise their system-wide preferences, or alternatively they can place a copy of that in ~/.gamemode.ini with their personal preferences. In this config file, you can also choose to explicitly allow or deny games.

GameMode might also be useful for many pieces of software that aren’t games. I haven’t done any benchmarks on such software yet, but it might be great for users who use CAD programs or use a combination of their CPU/GPU to crunch a large amount of data.

I’ve also packaged an extension for GNOME called gamemode-extension. The Debian package is called ‘gnome-shell-extension-gamemode’. You’ll need to enable it using gnome-tweaks after installation, it will then display a green controller in your notification area whenever GameMode is active. It’s only in testing/bullseye since it relies on a newer gnome-shell than what’s available in buster.

Running gamemode-simulate-game, with the shell extension showing that it’s activated in the top left corner.

Mike Gabriel: No Debian LTS Work in July 2020

Tuesday 11th of August 2020 03:16:00 PM

In July 2020, I was originally assigned 8h of work on Debian LTS as a paid contributor, but holiday season overwhelmed me and I did not do any LTS work, at all.

The assigned hours from July I have taken with me into August 2020.

light+love,
Mike

More in Tux Machines

IBM/Red Hat/Fedora Leftovers

  • The modern developer experience

    We hear from many clients that developer productivity and efficiency continue to be pain points. Cloud adoption can help normalize developer experiences across application stacks and runtimes. The path and steps for your developers to push code should be clear, simple, and easy to implement, even on Day 1. The modern developer experience provides a unified and normalized practice with modern tools. Developers thrive in the inner loop where unit tests and code come together, and in a penalty-free runtime execution environment where no one gets hurt, no processes take down precious workloads, and no one knows that it took 20 minutes to resolve that pesky runtime error. The inner loop occurs in a developer workspace that is easy to set up, manage, prepare, maintain, and, more importantly, easy to allocate. If a new developer is added to your squad, they can have all of the mechanical things they need to push code changes into the pipeline on their first day. An important part of the modern developer experience is expressed as Red Hat CodeReady Workspaces, which provides a set of constructs to provision a developer workspace in the cloud where they can perform their inner loop. A save action to a workspace file initiates an inner loop build in their local workspace, and an endpoint for the developer to see their changes quickly.

  • Call for Code Daily: Grillo, and how your code can help

    The power of Call for Code® is in the global community that we have built around this major #TechforGood initiative. Whether it is the deployments that are underway across pivotal projects, developers leveraging the starter kits in the cloud, or ecosystem partners joining the fight, everyone has a story to tell. Call for Code Daily highlights all the amazing #TechforGood stories taking place around the world. Every day, you can count on us to share these stories with you. Check out the stories from the week of August 10th:

  • Culture of Innovation and Collaboration: Hybrid Cloud, Privacy in AI and Data Caching

    Red Hat is continually innovating and part of that innovation includes researching and striving to solve the problems our customers face. That innovation is driven through the Office of the CTO and includes OpenShift, OpenShift Container Storage and use cases such as the hybrid cloud, privacy concerns in AI, and data caching. We recently interviewed Hugh Brock, research director for the office of the CTO, here at Red Hat about these very topics.

  • Fedora program update: 2020-33

    Here’s your report of what has happened in Fedora this week. Fedora 33 has branched from Rawhide. Please update the Release Readiness page with your team’s status. I have weekly office hours in #fedora-meeting-1. Drop by if you have any questions or comments about the schedule, Changes, elections, or anything else.

  • Fedora Magazine: Come test a new release of pipenv, the Python development tool

    Pipenv is a tool that helps Python developers maintain isolated virtual environments with a specifically defined set of dependencies to achieve reproducible development and deployment environments. It is similar to tools for different programming languages, such as bundler, composer, npm, cargo, yarn, etc. A new version of pipenv, 2020.6.2, has been recently released. It is now available in Fedora 33 and rawhide. For older Fedoras, the maintainers decided to package it in COPR to be tested first. So come try it out, before they push it into stable Fedora versions. The new version doesn’t bring any fancy new features, but after two years of development it fixes a lot of problems and does many things differently under the hood. What worked for you previously should continue to work, but might behave slightly differently.

  • Introduction to cloud-native CI/CD with Tekton (KubeCon Europe 2020)

    If you’re interested in cloud-native CI/CD and Tekton but haven’t had a chance to get hands-on with the technology yet, the KubeCon Europe Virtual event provides an opportunity to do that. Tekton is a powerful and flexible open source framework for creating cloud-native CI/CD pipelines. It integrates with Kubernetes and allows developers to build, test, and deploy across multiple cloud providers and on-premises clusters as shown in Figure 1.

  • Introduction to Strimzi: Apache Kafka on Kubernetes (KubeCon Europe 2020)

    Apache Kafka has emerged as the leading platform for building real-time data pipelines. Born as a messaging system, mainly for the publish/subscribe pattern, Kafka has established itself as a data-streaming platform for processing data in real-time. Today, Kafka is also heavily used for developing event-driven applications, enabling the services in your infrastructure to communicate with each other through events using Apache Kafka as the backbone. Meanwhile, cloud-native application development is gathering more traction thanks to Kubernetes. Thanks to the abstraction layer provided by this platform, it’s easy to move your applications from running on bare metal to any cloud provider (AWS, Azure, GCP, IBM, and so on) enabling hybrid-cloud scenarios as well. But how do you move your Apache Kafka workloads to the cloud? It’s possible, but it’s not simple. You could learn all of the Apache Kafka tools for handling a cluster well enough to move your Kafka workloads to Kubernetes, or you could leverage the Kubernetes knowledge you already have using Strimzi.

  • OpenShift for Kubernetes developers: Getting started

    If you are familiar with containers and Kubernetes, you have likely heard of the enterprise features that Red Hat OpenShift brings to this platform. In this article, I introduce developers familiar with Kubernetes to OpenShift’s command-line features and native extension API resources, including build configurations, deployment configurations, and image streams.

  • Man-DB Brings Documentation to IBM i

    IBM i developers who have a question about how a particular command or feature works in open source packages now have an easy way to look up documentation, thanks to the addition of support for the Man-DB utility in IBM i, which IBM unveiled in late July. Man-DB is an open source implementation of the standard Unix documentation system. It provides a mechanism for easily accessing the documentation that exists for open source packages, such as the Node.js language, or even for commands, like Curl. The software, which can be installed via YUM, only works with open source software on IBM i at the moment; it doesn’t support native programs or commands.

  • Making open decisions in five steps

    The group's leader made a decision, and everyone else accepted it. The leader may have been a manager, a team lead, or the alpha in a social group. Was that decision the best one for the group? Did it take all relevant factors into account? It didn’t really matter, because people didn’t want to buck authority and face the ramifications. But this behavior was typical of life in hierarchical systems.

  • 7 tips for giving and receiving better feedback

Wine 5.15 and Beyond

  • Wine Announcement
    The Wine development release 5.15 is now available.

    What's new in this release (see below for details):
      - Initial implementation of the XACT Engine libraries.
      - Beginnings of a math library in MSVCRT based on Musl.
      - Still more restructuration of the console support.
      - Direct Input performance improvements.
      - Exception handling fixes on x86-64.
      - Various bug fixes.

    The source is available from the following locations:
      https://dl.winehq.org/wine/source/5.x/wine-5.15.tar.xz
      http://mirrors.ibiblio.org/wine/source/5.x/wine-5.15.tar.xz

    Binary packages for various distributions will be available from:
      https://www.winehq.org/download

    You will find documentation on https://www.winehq.org/documentation

    You can also get the current source directly from the git repository. Check https://www.winehq.org/git for details.

    Wine is available thanks to the work of many people. See the file AUTHORS in the distribution for the complete list.
    
  • Wine 5.15 Release Brings Initial Work On XACT Engine Libraries

    Wine 5.15 is out as the latest bi-weekly development snapshot for this program allowing Windows games/applications to generally run quite gracefully on Linux and other platforms. 

  • Wine Developer Begins Experimenting With macOS ARM64 Support

    Over the months ahead, with Apple preparing future desktops/laptops with their in-house Apple silicon built on the ARM 64-bit architecture, Wine developers are beginning to eye how to support these future 64-bit ARM systems with macOS Big Sur. Wine developer Martin Storsjo has been experimenting with the macOS + ARM64 support and has gotten the code far enough along that "small test executables" can run on the patched copy of Wine.

Wandboard IMX8M-Plus SBC debuts AI-enabled i.MX8M Plus

TechNexion’s “Wandboard IMX8M-Plus” SBC runs Linux or Android on NXP’s new i.MX8M Plus with 2.3-TOPS NPU. Pre-orders go for $134 with 2GB RAM or $159 with 4GB and WiFi/BT, both with 32GB and M.2 with NVMe. In January, NXP announced its i.MX8M Plus — its first i.MX8 SoC with an NPU for AI acceleration — but so far the only product we’ve seen based on it is a briefly teased Verdin iMX8M Plus module from Toradex. Now, TechNexion has opened pre-orders for a Wandboard IMX8M-Plus SBC based on a SODIMM-style “EDM SOM” module equipped with the i.MX8M Plus.

Kernel: Linux 5.9 Changes and Linux Plumbers Passes

  • Linux 5.9 Brings Safeguard Following NVIDIA's Recent "GPL Condom" Incident

    Stemming from the recent discussions over NVIDIA NetGPU code that relied on another shim for interfacing between NVIDIA's proprietary driver and the open-source kernel code, a new patch is on the way for Linux 5.9 to fight back against such efforts. As a result of that "NetGPU" code patch series and the ensuing discussion, longtime kernel developer Christoph Hellwig followed through with a set of kernel patches to tighten up access to kernel symbols exported as GPL-only, which are frequently used by these open-source "shim" drivers to sit between the open-source kernel code and the binary kernel modules. This situation, also known as the "GPL condom" defense, is what Linux 5.9+ kernels now work to prevent.

  • Linux 5.9 Dropping Xen 32-bit PV Guest Support

    Back in Linux 5.4, Xen 32-bit PV guest support was deprecated; now, with Linux 5.9, it is set to be removed entirely. Last year's deprecation came as 32-bit usage dwindled in general, PVH is preferred over PV, Meltdown mitigations are not present, and the code saw little activity. For Linux 5.9 that support is now being gutted.

  • Final passes for sale for Linux Plumbers

    We hit our registration cap again and have added a few more passes. The final date for purchasing passes is August 19th at 11:59pm PST. If the passes sell out before then we will not be adding more. Thank you all once again for your enthusiasm and we look forward to seeing you August 24-28!