Planet Debian


Michael Stapelberg: Linux distributions: Can we do without hooks and triggers?

Saturday 17th of August 2019 04:47:29 PM

Hooks are an extension feature provided by all package managers that are used in larger Linux distributions. For example, Debian uses apt, which has various maintainer scripts. Fedora uses rpm, which has scriptlets. Different package managers use different names for the concept, but all of them offer package maintainers the ability to run arbitrary code during package installation and upgrades. Example hook use cases include adding daemon user accounts to your system (e.g. postgres), or generating/updating cache files.

Triggers are a kind of hook which run when other packages are installed. For example, on Debian, the man(1) package comes with a trigger which regenerates the search database index whenever any package installs a manpage. When, for example, the nginx(8) package is installed, a trigger provided by the man(1) package runs.

Over the past few decades, Open Source software has become more and more uniform: instead of each piece of software defining its own rules, a small number of build systems are now widely adopted.

Hence, I think it makes sense to revisit whether offering extension via hooks and triggers is a net win or net loss.

Hooks preclude concurrent package installation

Package managers can commonly make very few assumptions about what hooks do, what preconditions they require, and which conflicts might be caused by running multiple packages’ hooks concurrently.

Hence, package managers cannot concurrently install packages. At least the hook/trigger part of the installation needs to happen in sequence.

While it seems technically feasible to retrofit package manager hooks with concurrency primitives such as locks for mutual exclusion between different hook processes, the required overhaul of all hooks¹ seems like such a daunting task that it might be better to just get rid of the hooks instead. Only deleting code frees you from the burden of maintenance, automated testing and debugging.

① In Debian, there are 8620 non-generated maintainer scripts, as reported by find shard*/src/*/debian -regex ".*\(pre\|post\)\(inst\|rm\)$" on a Debian Code Search instance.

Triggers slow down installing/updating other packages

Personally, I never use the apropos(1) command, so I don’t appreciate the man(1) package’s trigger which updates the database used by apropos(1). The process takes a long time and, because hooks and triggers must be executed serially (see previous section), blocks my installation or update.

When I tell people this, they are often surprised to learn about the existence of the apropos(1) command. I suggest adopting an opt-in model.

Unnecessary work if programs are not used between updates

Hooks run when packages are installed. If a package’s contents are not used between two updates, running the hook in the first update could have been skipped. Running the hook lazily when the package contents are used reduces unnecessary work.

As a welcome side-effect, lazy hook evaluation automatically makes the hook work in operating system images, such as live USB thumb drives or SD card images for the Raspberry Pi. Such images must not ship the same crypto keys (e.g. OpenSSH host keys) to all machines, but instead generate a different key on each machine.
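
As an illustration (a sketch only, with the staleness check reduced to modification times; this is not code from any existing package or library), lazy evaluation boils down to checking at use-time whether the cached data is older than its inputs:

// Minimal sketch of lazy (re)generation: instead of a hook rebuilding a cache
// at install time, the consumer rebuilds it on first use when it is stale.
// The rebuild step and the paths passed in are up to the caller.
package lazycache

import (
	"os"
	"path/filepath"
)

// EnsureFresh rebuilds cachePath iff it is missing or older than any file in srcDir.
func EnsureFresh(srcDir, cachePath string, rebuild func() error) error {
	cacheInfo, err := os.Stat(cachePath)
	stale := err != nil // a missing cache counts as stale
	if !stale {
		err = filepath.WalkDir(srcDir, func(p string, d os.DirEntry, err error) error {
			if err != nil || d.IsDir() {
				return err
			}
			info, err := d.Info()
			if err != nil {
				return err
			}
			if info.ModTime().After(cacheInfo.ModTime()) {
				stale = true
			}
			return nil
		})
		if err != nil {
			return err
		}
	}
	if stale {
		return rebuild()
	}
	return nil
}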

Why do users keep packages installed they don’t use? It’s extra work to remember and clean up those packages after use. Plus, users might not realize or value that having fewer packages installed has benefits such as faster updates.

I can also imagine that there are people for whom the cost of re-installing packages incentivizes them to just keep packages installed—you never know when you might need the program again…

Implemented in an interpreted language

While working on hermetic packages (more on that in another blog post), where the contained programs are started with modified environment variables (e.g. PATH) via a wrapper bash script, I noticed that the overhead of those wrapper bash scripts quickly becomes significant. For example, when using the excellent magit interface for Git in Emacs, I encountered second-long delays² when using hermetic packages compared to standard packages. Re-implementing wrappers in a compiled language provided a significant speed-up.

Similarly, getting rid of an extension point which mandates using shell scripts allows us to build an efficient and fast implementation of a predefined set of primitives, where you can reason about their effects and interactions.

② magit needs to run git a few times for displaying the full status, so small overhead quickly adds up.

Incentivizing more upstream standardization

Hooks are an escape hatch for distribution maintainers to express anything which their packaging system cannot express.

Distributions should only rely on well-established interfaces such as autoconf’s classic ./configure && make && make install (including commonly used flags) to build a distribution package. Integrating upstream software into a distribution should not require custom hooks. For example, instead of requiring a hook which updates a cache of schema files, the library used to interact with those files should transparently (re-)generate the cache or fall back to a slower code path.

Distribution maintainers are hard to come by, so we should value their time. In particular, there is a 1:n relationship of packages to distribution package maintainers (software is typically available in multiple Linux distributions), so it makes sense to spend the work in the 1 and have the n benefit.

Can we do without them?

If we want to get rid of hooks, we need another mechanism to achieve what we currently achieve with hooks.

If the hook is not specific to the package, it can be moved to the package manager. The desired system state should either be derived from the package contents (e.g. required system users can be discovered from systemd service files) or declaratively specified in the package build instructions—more on that in another blog post. This turns hooks (arbitrary code) into configuration, which allows the package manager to collapse and sequence the required state changes. E.g., when 5 packages are installed which each need a new system user, the package manager could update /etc/passwd just once.
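
To illustrate the idea (a rough sketch with made-up types, not the code of any actual package manager), collapsing the declarations of a whole transaction into one /etc/passwd update could look like this:

// Sketch of turning "this package needs a system user" from a hook into
// configuration: the package manager collects the declarations of all
// packages in one transaction and updates /etc/passwd once.
package declusers

import (
	"fmt"
	"os"
	"strings"
)

// Decl is what a package would declare instead of shipping a hook.
type Decl struct {
	Name, Home, Shell string
}

// Apply appends all missing users in a single write to passwdPath.
func Apply(passwdPath string, decls []Decl) error {
	b, err := os.ReadFile(passwdPath)
	if err != nil {
		return err
	}
	existing := map[string]bool{}
	for _, line := range strings.Split(string(b), "\n") {
		name, _, _ := strings.Cut(line, ":")
		existing[name] = true
	}
	var add []string
	uid := 900 // illustrative; a real implementation allocates free system IDs
	for _, d := range decls {
		if !existing[d.Name] {
			add = append(add, fmt.Sprintf("%s:x:%d:%d::%s:%s", d.Name, uid, uid, d.Home, d.Shell))
			uid++
		}
	}
	if len(add) == 0 {
		return nil
	}
	f, err := os.OpenFile(passwdPath, os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	// One append covers all packages in this transaction.
	_, err = f.WriteString(strings.Join(add, "\n") + "\n")
	return err
}

A transaction installing five packages that each declare a user results in a single append to /etc/passwd.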

If the hook is specific to the package, it should be moved into the package contents. This typically means moving the functionality into the program start (or the systemd service file if we are talking about a daemon). If (while?) upstream is not convinced, you can either wrap the program or patch it. Note that this case is relatively rare: I have worked with hundreds of packages and the only package-specific functionality I came across was automatically generating host keys before starting OpenSSH’s sshd(8)³.
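
As an illustration of the wrapper approach (a sketch only, not how any distribution actually ships sshd), a small wrapper can generate missing host keys at start and then exec the real daemon:

// Sketch of "move the hook into program start": create missing OpenSSH host
// keys, then replace ourselves with sshd.
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	if _, err := os.Stat("/etc/ssh/ssh_host_ed25519_key"); os.IsNotExist(err) {
		// ssh-keygen -A generates any missing host keys with default settings.
		if out, err := exec.Command("ssh-keygen", "-A").CombinedOutput(); err != nil {
			log.Fatalf("generating host keys: %v: %s", err, out)
		}
	}
	// Replace this wrapper process with the real daemon.
	if err := syscall.Exec("/usr/sbin/sshd", append([]string{"sshd"}, os.Args[1:]...), os.Environ()); err != nil {
		log.Fatal(err)
	}
}

For a daemon, the same effect can be achieved with an ExecStartPre= line in its systemd service file instead of a wrapper binary.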

There is one exception where moving the hook doesn’t work: packages which modify state outside of the system, such as bootloaders or kernel images.

③ Even that can be moved out of a package-specific hook, as Fedora demonstrates.

Conclusion

Global state modifications performed as part of package installation today use hooks, an overly expressive extension mechanism.

Instead, all modifications should be driven by configuration. This is feasible because there are only a few different kinds of desired state modifications. This makes it possible for package managers to optimize package installation.

Michael Stapelberg: Linux package managers are slow

Saturday 17th of August 2019 04:36:31 PM

I measured how long the most popular Linux distributions’ package managers take to install small and large packages (the ack(1p) source code search Perl script and qemu, respectively).

Where required, my measurements include metadata updates such as transferring an up-to-date package list. For me, requiring a metadata update is the more common case, particularly on live systems or within Docker containers.

All measurements were taken on an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz running Docker 1.13.1 on Linux 4.19, backed by a Samsung 970 Pro NVMe drive boasting many hundreds of MB/s write performance.

See Appendix B for details on the measurement method and command outputs.

Measurements

Keep in mind that these are one-time measurements. They should be indicative of actual performance, but your experience may vary.

ack (small Perl program)

distribution  package manager  data    wall-clock time  rate
Fedora        dnf              107 MB  29s              3.7 MB/s
NixOS         Nix              15 MB   14s              1.1 MB/s
Debian        apt              15 MB   4s               3.7 MB/s
Arch Linux    pacman           6.5 MB  3s               2.1 MB/s
Alpine        apk              10 MB   1s               10.0 MB/s

qemu (large C program)

distribution  package manager  data    wall-clock time  rate
Fedora        dnf              266 MB  1m8s             3.9 MB/s
Arch Linux    pacman           124 MB  1m2s             2.0 MB/s
Debian        apt              159 MB  51s              3.1 MB/s
NixOS         Nix              262 MB  38s              6.8 MB/s
Alpine        apk              26 MB   2.4s             10.8 MB/s


The difference between the slowest and fastest package managers is 30x!

How can Alpine’s apk and Arch Linux’s pacman be an order of magnitude faster than the rest? They are doing a lot less than the others, and more efficiently, too.

Pain point: too much metadata

For example, Fedora transfers a lot more data than others because its main package list is 60 MB (compressed!) alone. Compare that with Alpine’s 734 KB APKINDEX.tar.gz.

Of course the extra metadata which Fedora provides helps some use cases, otherwise they would hopefully have removed it altogether. Still, the amount of metadata seems excessive for the use case of installing a single package, which I consider the main use case of an interactive package manager.

I expect any modern Linux distribution to only transfer absolutely required data to complete my task.

Pain point: no concurrency

Because they need to sequence executing arbitrary package maintainer-provided code (hooks and triggers), all tested package managers need to install packages sequentially (one after the other) instead of concurrently (all at the same time).

In my blog post “Can we do without hooks and triggers?”, I outline that hooks and triggers are not strictly necessary to build a working Linux distribution.

Thought experiment: further speed-ups

Strictly speaking, the only required feature of a package manager is to make available the package contents so that the package can be used: a program can be started, a kernel module can be loaded, etc.

By only implementing what’s needed for this feature, and nothing more, a package manager could likely beat apk’s performance. It could, for example:

  • skip archive extraction by mounting file system images (like AppImage or snappy)
  • use compression which is light on CPU, as networks are fast (like apk)
  • skip fsync when it is safe to do so (see the sketch after this list), i.e.:
    • package installations don’t modify system state
    • atomic package installation (e.g. an append-only package store)
    • automatically clean up the package store after crashes
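
A minimal sketch of the atomic, fsync-free installation idea (illustrative only, not the code of apk or any other package manager):

// Install a package image by writing to a temporary file and renaming it into
// the store. The rename is atomic, so fsync can be skipped; leftovers from a
// crash are swept up later.
package pkgstore

import (
	"io"
	"os"
	"path/filepath"
	"strings"
)

func installImage(store, name string, r io.Reader) error {
	tmp, err := os.CreateTemp(store, ".tmp-"+name+"-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op once the rename has succeeded
	if _, err := io.Copy(tmp, r); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	// No fsync: if we crash, the temp file is garbage-collected below and the
	// user simply repeats the installation.
	return os.Rename(tmp.Name(), filepath.Join(store, name))
}

// cleanStore removes half-written temporary files after a crash.
func cleanStore(store string) error {
	entries, err := os.ReadDir(store)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if strings.HasPrefix(e.Name(), ".tmp-") {
			os.Remove(filepath.Join(store, e.Name()))
		}
	}
	return nil
}

If the process crashes mid-copy, only a .tmp- file is left behind, which cleanStore removes on the next run.
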
Current landscape

Here’s a table outlining how the various package managers listed on Wikipedia’s list of software package management systems fare:

name        scope   package file format                hooks/triggers
AppImage    apps    image: ISO9660, SquashFS           no
snappy      apps    image: SquashFS                    yes: hooks
FlatPak     apps    archive: OSTree                    no
0install    apps    archive: tar.bz2                   no
nix, guix   distro  archive: nar.{bz2,xz}              activation script
dpkg        distro  archive: tar.{gz,xz,bz2} in ar(1)  yes
rpm         distro  archive: cpio.{bz2,lz,xz}          scriptlets
pacman      distro  archive: tar.xz                    install
slackware   distro  archive: tar.{gz,xz}               yes: doinst.sh
apk         distro  archive: tar.gz                    yes: .post-install
Entropy     distro  archive: tar.bz2                   yes
ipkg, opkg  distro  archive: tar{,.gz}                 yes

Conclusion

As per the current landscape, there is no distribution-scoped package manager which uses images and leaves out hooks and triggers, not even in smaller Linux distributions.

I think that space is really interesting, as it uses a minimal design to achieve significant real-world speed-ups.

I have explored this idea in much more detail, and am happy to talk more about it in my post “Introducing the distri research linux distribution”.

Appendix A: related work

There are a couple of recent developments going in the same direction:

Appendix B: measurement details

ack


Fedora’s dnf takes almost 30 seconds to fetch and unpack 107 MB.

% docker run -t -i fedora /bin/bash
[root@722e6df10258 /]# time dnf install -y ack
Fedora Modular 30 - x86_64             4.4 MB/s | 2.7 MB  00:00
Fedora Modular 30 - x86_64 - Updates   3.7 MB/s | 2.4 MB  00:00
Fedora 30 - x86_64 - Updates            17 MB/s |  19 MB  00:01
Fedora 30 - x86_64                      31 MB/s |  70 MB  00:02
[…]
Install 44 Packages
Total download size: 13 M
Installed size: 42 M
[…]
real    0m29.498s
user    0m22.954s
sys     0m1.085s

NixOS’s Nix takes 14s to fetch and unpack 15 MB.

% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -i perl5.28.2-ack-2.28'
unpacking channels...
created 2 symlinks in user environment
installing 'perl5.28.2-ack-2.28'
these paths will be fetched (14.91 MiB download, 80.83 MiB unpacked):
  /nix/store/57iv2vch31v8plcjrk97lcw1zbwb2n9r-perl-5.28.2
  /nix/store/89gi8cbp8l5sf0m8pgynp2mh1c6pk1gk-attr-2.4.48
  /nix/store/gkrpl3k6s43fkg71n0269yq3p1f0al88-perl5.28.2-ack-2.28-man
  /nix/store/iykxb0bmfjmi7s53kfg6pjbfpd8jmza6-glibc-2.27
  /nix/store/k8lhqzpaaymshchz8ky3z4653h4kln9d-coreutils-8.31
  /nix/store/svgkibi7105pm151prywndsgvmc4qvzs-acl-2.2.53
  /nix/store/x4knf14z1p0ci72gl314i7vza93iy7yc-perl5.28.2-File-Next-1.16
  /nix/store/zfj7ria2kwqzqj9dh91kj9kwsynxdfk0-perl5.28.2-ack-2.28
copying path '/nix/store/gkrpl3k6s43fkg71n0269yq3p1f0al88-perl5.28.2-ack-2.28-man' from 'https://cache.nixos.org'...
copying path '/nix/store/iykxb0bmfjmi7s53kfg6pjbfpd8jmza6-glibc-2.27' from 'https://cache.nixos.org'...
copying path '/nix/store/x4knf14z1p0ci72gl314i7vza93iy7yc-perl5.28.2-File-Next-1.16' from 'https://cache.nixos.org'...
copying path '/nix/store/89gi8cbp8l5sf0m8pgynp2mh1c6pk1gk-attr-2.4.48' from 'https://cache.nixos.org'...
copying path '/nix/store/svgkibi7105pm151prywndsgvmc4qvzs-acl-2.2.53' from 'https://cache.nixos.org'...
copying path '/nix/store/k8lhqzpaaymshchz8ky3z4653h4kln9d-coreutils-8.31' from 'https://cache.nixos.org'...
copying path '/nix/store/57iv2vch31v8plcjrk97lcw1zbwb2n9r-perl-5.28.2' from 'https://cache.nixos.org'...
copying path '/nix/store/zfj7ria2kwqzqj9dh91kj9kwsynxdfk0-perl5.28.2-ack-2.28' from 'https://cache.nixos.org'...
building '/nix/store/q3243sjg91x1m8ipl0sj5gjzpnbgxrqw-user-environment.drv'...
created 56 symlinks in user environment
real    0m 14.02s
user    0m 8.83s
sys     0m 2.69s

Debian’s apt takes almost 10 seconds to fetch and unpack 16 MB.

% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y ack-grep)
Get:1 http://cdn-fastly.deb.debian.org/debian sid InRelease [233 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian sid/main amd64 Packages [8270 kB]
Fetched 8502 kB in 2s (4764 kB/s)
[…]
The following NEW packages will be installed:
  ack ack-grep libfile-next-perl libgdbm-compat4 libgdbm5 libperl5.26 netbase perl perl-modules-5.26
The following packages will be upgraded:
  perl-base
1 upgraded, 9 newly installed, 0 to remove and 60 not upgraded.
Need to get 8238 kB of archives.
After this operation, 42.3 MB of additional disk space will be used.
[…]
real    0m9.096s
user    0m2.616s
sys     0m0.441s

Arch Linux’s pacman takes a little over 3s to fetch and unpack 6.5 MB.

% docker run -t -i archlinux/base
[root@9604e4ae2367 /]# time (pacman -Sy && pacman -S --noconfirm ack)
:: Synchronizing package databases...
 core        132.2 KiB  1033K/s 00:00
 extra      1629.6 KiB  2.95M/s 00:01
 community     4.9 MiB  5.75M/s 00:01
[…]
Total Download Size:   0.07 MiB
Total Installed Size:  0.19 MiB
[…]
real    0m3.354s
user    0m0.224s
sys     0m0.049s

Alpine’s apk takes only about 1 second to fetch and unpack 10 MB.

% docker run -t -i alpine
/ # time apk add ack
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
(1/4) Installing perl-file-next (1.16-r0)
(2/4) Installing libbz2 (1.0.6-r7)
(3/4) Installing perl (5.28.2-r1)
(4/4) Installing ack (3.0.0-r0)
Executing busybox-1.30.1-r2.trigger
OK: 44 MiB in 18 packages
real    0m 0.96s
user    0m 0.25s
sys     0m 0.07s

qemu


Fedora’s dnf takes over a minute to fetch and unpack 266 MB.

% docker run -t -i fedora /bin/bash
[root@722e6df10258 /]# time dnf install -y qemu
Fedora Modular 30 - x86_64             3.1 MB/s | 2.7 MB  00:00
Fedora Modular 30 - x86_64 - Updates   2.7 MB/s | 2.4 MB  00:00
Fedora 30 - x86_64 - Updates            20 MB/s |  19 MB  00:00
Fedora 30 - x86_64                      31 MB/s |  70 MB  00:02
[…]
Install 262 Packages
Upgrade 4 Packages
Total download size: 172 M
[…]
real    1m7.877s
user    0m44.237s
sys     0m3.258s

NixOS’s Nix takes 38s to fetch and unpack 262 MB.

% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -i qemu-4.0.0'
unpacking channels...
created 2 symlinks in user environment
installing 'qemu-4.0.0'
these paths will be fetched (262.18 MiB download, 1364.54 MiB unpacked):
[…]
real    0m 38.49s
user    0m 26.52s
sys     0m 4.43s

Debian’s apt takes 51 seconds to fetch and unpack 159 MB.

% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y qemu-system-x86)
Get:1 http://cdn-fastly.deb.debian.org/debian sid InRelease [149 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian sid/main amd64 Packages [8426 kB]
Fetched 8574 kB in 1s (6716 kB/s)
[…]
Fetched 151 MB in 2s (64.6 MB/s)
[…]
real    0m51.583s
user    0m15.671s
sys     0m3.732s

Arch Linux’s pacman takes 1m2s to fetch and unpack 124 MB.

% docker run -t -i archlinux/base
[root@9604e4ae2367 /]# time (pacman -Sy && pacman -S --noconfirm qemu)
:: Synchronizing package databases...
 core        132.2 KiB   751K/s 00:00
 extra      1629.6 KiB  3.04M/s 00:01
 community     4.9 MiB  6.16M/s 00:01
[…]
Total Download Size:   123.20 MiB
Total Installed Size:  587.84 MiB
[…]
real    1m2.475s
user    0m9.272s
sys     0m2.458s

Alpine’s apk takes only about 2.4 seconds to fetch and unpack 26 MB.

% docker run -t -i alpine
/ # time apk add qemu-system-x86_64
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
[…]
OK: 78 MiB in 95 packages
real    0m 2.43s
user    0m 0.46s
sys     0m 0.09s

Michael Stapelberg: distri: a Linux distribution to research fast package management

Saturday 17th of August 2019 04:36:31 PM

Over the last year or so I have worked on a research linux distribution in my spare time. It’s not a distribution for researchers (like Scientific Linux), but my personal playground project to research linux distribution development, i.e. try out fresh ideas.

This article focuses on the package format and its advantages, but there is more to distri, which I will cover in upcoming blog posts.

Motivation

I was a Debian Developer for the 7 years from 2012 to 2019, but using the distribution often left me frustrated, ultimately resulting in me winding down my Debian work.

Frequently, I was noticing a large gap between the actual speed of an operation (e.g. doing an update) and the possible speed based on back of the envelope calculations. I wrote more about this in my blog post “Package managers are slow”.

To me, this observation means that either there is potential to optimize the package manager itself (e.g. apt), or what the system does is just too complex. While I remember seeing some low-hanging fruit¹, through my work on distri, I wanted to explore whether all the complexity we currently have in Linux distributions such as Debian or Fedora is inherent to the problem space.

I have completed enough of the experiment to conclude that the complexity is not inherent: I can build a Linux distribution for general-enough purposes which is much less complex than existing ones.

① Those were low-hanging fruit from a user perspective. I’m not saying that fixing them is easy in the technical sense; I know too little about apt’s code base to make such a statement.

Key idea: packages are images, not archives

One key idea is to switch from using archives to using images for package contents. Common package managers such as dpkg(1) use tar(1) archives with various compression algorithms.

distri uses SquashFS images, a comparatively simple file system image format that I happen to be familiar with from my work on the gokrazy Raspberry Pi 3 Go platform.

This idea is not novel: AppImage and snappy also use images, but only for individual, self-contained applications. distri however uses images for distribution packages with dependencies. In particular, there is no duplication of shared libraries in distri.

A nice side effect of using read-only image files is that applications are immutable and can hence not be broken by accidental (or malicious!) modification.

Key idea: separate hierarchies

Package contents are made available under a fully-qualified path. E.g., all files provided by package zsh-amd64-5.6.2-3 are available under /ro/zsh-amd64-5.6.2-3. The mountpoint /ro stands for read-only, which is short yet descriptive.

Perhaps surprisingly, building software with custom prefix values of e.g. /ro/zsh-amd64-5.6.2-3 is widely supported, thanks to:

  1. Linux distributions, which build software with prefix set to /usr, whereas FreeBSD (and the autotools default) builds with prefix set to /usr/local.

  2. Enthusiast users in corporate or research environments, who install software into their home directories.

Because using a custom prefix is a common scenario, upstream awareness for prefix-correctness is generally high, and the rarely required patch will be quickly accepted.

Key idea: exchange directories

Software packages often exchange data by placing or locating files in well-known directories. Here are just a few examples:

  • gcc(1) locates the libusb(3) headers via /usr/include
  • man(1) locates the nginx(1) manpage via /usr/share/man.
  • zsh(1) locates executable programs via PATH components such as /bin

In distri, these locations are called exchange directories and are provided via FUSE in /ro.

Exchange directories come in two different flavors:

  1. global. The exchange directory, e.g. /ro/share, provides the union of the share subdirectory of all packages in the package store (see the sketch after this list).
    Global exchange directories are largely used for compatibility, see below.

  2. per-package. Useful for tight coupling: e.g. irssi(1) does not provide any ABI guarantees, so plugins such as irssi-robustirc can declare that they want e.g. /ro/irssi-amd64-1.1.1-1/out/lib/irssi/modules to be a per-package exchange directory and contain files from their lib/irssi/modules.
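
To illustrate the union semantics of a global exchange directory (a simplified sketch; distri actually implements this with FUSE and, on conflicts, prefers the package with the highest distri revision — this toy lookup just returns the first match):

// Resolve maps a path like "man/man1/nginx.1" beneath an exchange directory
// (e.g. "share") to the package that provides it under /ro/<pkg>/out/<dir>.
package exchange

import (
	"os"
	"path/filepath"
)

func Resolve(ro, dir, rel string) (string, bool) {
	pkgs, err := os.ReadDir(ro)
	if err != nil {
		return "", false
	}
	for _, p := range pkgs {
		// e.g. /ro/zsh-amd64-5.6.2-3/out/share/man/man1/...
		candidate := filepath.Join(ro, p.Name(), "out", dir, rel)
		if _, err := os.Stat(candidate); err == nil {
			return candidate, true
		}
	}
	return "", false
}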

Note: Only a few exchange directories are also available in the package build environment (as opposed to run-time).

Search paths sometimes need to be fixed

Programs which use exchange directories sometimes use search paths to access multiple exchange directories. In fact, the examples above were taken from gcc(1)’s INCLUDEPATH, man(1)’s MANPATH and zsh(1)’s PATH. These are prominent ones, but more examples are easy to find: zsh(1) loads completion functions from its FPATH.

Some search path values are derived from --datadir=/ro/share and require no further attention, but others might derive from e.g. --prefix=/ro/zsh-amd64-5.6.2-3/out and need to be pointed to an exchange directory via a specific command line flag.

Note: To create the illusion of a writable search path at package build-time, $DESTDIR/ro/share and $DESTDIR/ro/lib are diverted to $DESTDIR/$PREFIX/share and $DESTDIR/$PREFIX/lib, respectively.

FHS compatibility

Global exchange directories are used to make distri provide enough of the Filesystem Hierarchy Standard (FHS) that third-party software largely just works. This includes a C development environment.

I successfully ran a few programs from their binary packages such as Google Chrome, Spotify, or Microsoft’s Visual Studio Code.

Fast package manager

I previously wrote about how Linux distribution package managers are too slow.

distri’s package manager is extremely fast. Its main bottleneck is typically the network link, even at high speed links (I tested with a 100 Gbps link).

Its speed comes largely from an architecture which allows the package manager to do less work. Specifically:

  1. Package images can be added atomically to the package store, so we can safely skip fsync(2). Corruption will be cleaned up automatically, and durability is not important: if an interactive installation is interrupted, the user can just repeat it, as it will be fresh in their mind.

  2. Because all packages are co-installable thanks to separate hierarchies, there are no conflicts at the package store level, and no dependency resolution (an optimization problem requiring SAT solving) is required at all.
    In exchange directories, we resolve conflicts by selecting the package with the highest monotonically increasing distri revision number.

  3. distri proves that we can build a useful Linux distribution entirely without hooks and triggers. Not having to serialize hook execution allows us to download packages into the package store with maximum concurrency.

  4. Because we are using images instead of archives, we do not need to unpack anything. This means installing a package is really just writing its package image and metadata to the package store. Sequential writes are typically the fastest kind of storage usage pattern.

Fast installation also makes other use cases more bearable, such as creating disk images, be it for testing them in qemu(1), booting them on real hardware from a USB drive, or for cloud providers such as Google Cloud.

Note: To saturate links above 1 Gbps, transfer packages without compression.

Fast package builder

Contrary to how distribution package builders are usually implemented, the distri package builder does not actually install any packages into the build environment.

Instead, distri makes available a filtered view of the package store (only declared dependencies are available) at /ro in the build environment.

This means that even for large dependency trees, setting up a build environment happens in a fraction of a second! Such a low latency really makes a difference in how comfortable it is to iterate on distribution packages.

Package stores

In distri, package images are installed from a remote package store into the local system package store /roimg, which backs the /ro mount.

A package store is implemented as a directory of package images and their associated metadata files.

You can easily make available a package store by using distri export.

To provide a mirror for your local network, you can periodically distri update from the package store you want to mirror, and then distri export your local copy. Special tooling (e.g. debmirror in Debian) is not required because distri install is atomic (and update uses install).

Producing derivatives is easy: just add your own packages to a copy of the package store.

The package store is intentionally kept simple to manage and distribute. Its files could be exchanged via peer-to-peer file systems, or synchronized from an offline medium.

distri’s first release

distri works well enough to demonstrate the ideas explained above. I have branched this state into branch jackherer, distri’s first release code name. This way, I can keep experimenting in the distri repository without breaking your installation.

From the branch contents, our autobuilder creates:

  1. disk images, which…

  2. a package repository. Installations can pick up new packages with distri update.

  3. documentation for the release.

The project website can be found at https://distr1.org. The website is just the README for now, but we can improve that later.

The repository can be found at https://github.com/distr1/distri

Project outlook

Right now, distri is mainly a vehicle for my spare-time Linux distribution research. I don’t recommend anyone use distri for anything but research, and there are no medium-term plans of that changing. At the very least, please contact me before basing anything serious on distri so that we can talk about limitations and expectations.

I expect the distri project to live for as long as I have blog posts to publish, and we’ll see what happens afterwards. Note that this is a hobby for me: I will continue to explore, at my own pace, parts that I find interesting.

My hope is that established distributions might get a useful idea or two from distri.

There’s more to come: subscribe to the distri feed

I don’t want to make this post too long, but there is much more!

Please subscribe to the following URL in your feed reader to get all posts about distri:

https://michael.stapelberg.ch/posts/tags/distri/feed.xml

Next in my queue are articles about hermetic packages and good package maintainer experience (including declarative packaging).

Feedback or questions?

I’d love to discuss these ideas in case you’re interested!

Please send feedback to the distri mailing list so that everyone can participate!

Cyril Brulebois: Sending HTML messages with Net::XMPP (Perl)

Saturday 17th of August 2019 06:15:00 AM
Executive summary

It’s perfectly possible! Jump to the HTML demo!

Longer version

This started with a very simple need: wanting to improve the notifications I’m receiving from various sources. Those include:

  • changes or failures reported during Puppet runs on my own infrastructure, and at a customer’s;
  • build failures for the Debian Installer;
  • changes in banking amounts;
  • and lately: build status for jobs in a customer’s Jenkins instance.

I’ve been using plaintext notifications for a number of years but I decided to try and pimp them a little by adding some colors.

While the XMPP-sending details are usually hidden in a local module, here’s a small self-contained example: connecting to a server, sending credentials, and then sending a message to someone else. Of course, one might want to tweak the Configuration section before trying to run this script…

#!/usr/bin/perl
use strict;
use warnings;

use Net::XMPP;

# Configuration:
my $hostname  = 'example.org';
my $username  = 'bot';
my $password  = 'call-me-alan';
my $resource  = 'demo';
my $recipient = 'human@example.org';

# Open connection:
my $con = Net::XMPP::Client->new();
my $status = $con->Connect(
    hostname       => $hostname,
    connectiontype => 'tcpip',
    tls            => 1,
    ssl_ca_path    => '/etc/ssl/certs',
);
die 'XMPP connection failed'
    if ! defined($status);

# Log in:
my @result = $con->AuthSend(
    hostname => $hostname,
    username => $username,
    password => $password,
    resource => $resource,
);
die 'XMPP authentication failed'
    if $result[0] ne 'ok';

# Send plaintext message:
my $msg = 'Hello, World!';
my $res = $con->MessageSend(
    to   => $recipient,
    body => $msg,
    type => 'chat',
);
die('ERROR: XMPP message failed')
    if $res != 0;

For reference, here’s what the XML message looks like in Gajim’s XML console (on the receiving end):

<message type='chat' to='human@example.org' from='bot@example.org/demo'>
  <body>Hello, World!</body>
</message>

Issues start when one tries to send some HTML message, e.g. with the last part changed to:

# Send plaintext message:
my $msg = 'This is a <b>failing</b> test';
my $res = $con->MessageSend(
    to   => $recipient,
    body => $msg,
    type => 'chat',
);

as that leads to the following message:

<message type='chat' to='human@example.org' from='bot@example.org/demo'>
  <body>This is a &lt;b&gt;failing&lt;/b&gt; test</body>
</message>

So tags are getting encoded and one gets to see the uninterpreted “HTML code”.

Trying various things to embed that inside <body> and <html> tags, with or without namespaces, led nowhere.

Looking at a message sent from Gajim to Gajim (so that I could craft an HTML message myself and inspect it), I’ve noticed it goes this way (edited to concentrate on important parts):

<message xmlns="jabber:client" to="human@example.org/Gajim" type="chat"> <body>Hello, World!</body> <html xmlns="http://jabber.org/protocol/xhtml-im"> <body xmlns="http://www.w3.org/1999/xhtml"> <p>Hello, <strong>World</strong>!</p> </body> </html> </message>

Two takeaways here:

  • The message is sent both in plaintext and in HTML. It seems Gajim archives the plaintext version, as opening the history/logs only shows the textual version.

  • The fact that the HTML message is under a different path (/message/html as opposed to /message/body) means that one cannot use the MessageSend method to send HTML messages…

This was verified by checking the documentation and code of the Net::XMPP::Message module. It comes with various getters and setters for attributes. Those are then automatically collected when the message is serialized into XML (through the GetXML() method). Trying to add handling for a new HTML attribute would mean being extra careful as that would need to be treated with $type = 'raw'…

Oh, wait a minute! While using git grep in the sources, looking for that raw type thing, I’ve discovered what sounded promising: an InsertRawXML() method, that doesn’t appear anywhere in either the code or the documentation of the Net::XMPP::Message module.

It’s available, though! Because Net::XMPP::Message is derived from Net::XMPP::Stanza:

use Net::XMPP::Stanza;
use base qw( Net::XMPP::Stanza );

which then in turn comes with this function:

##############################################################################
#
# InsertRawXML - puts the specified string onto the list for raw XML to be
#                included in the packet.
#
##############################################################################

Let’s put that aside for a moment and get back to the MessageSend() method. It wants parameters that can be passed to the Net::XMPP::Message SetMessage() method, and here is its entire code:

###############################################################################
#
# MessageSend - Takes the same hash that Net::XMPP::Message->SetMessage
#               takes and sends the message to the server.
#
###############################################################################
sub MessageSend
{
    my $self = shift;

    my $mess = $self->_message();
    $mess->SetMessage(@_);
    $self->Send($mess);
}

The first assignment is basically equivalent to my $mess = Net::XMPP::Message->new();, so what this function does is: creating a Net::XMPP::Message for us, passing all parameters there, and handing the resulting object over to the Send() method. All in all, that’s merely a proxy.

HTML demo

The question becomes: what if we were to create that object ourselves, then tweaking it a little, and then passing it directly to Send(), instead of using the slightly limited MessageSend()? Let’s see what the rewritten sending part would look like:

# Send HTML message:
my $text = 'This is a working test';
my $html = 'This is a <b>working</b> test';

my $message = Net::XMPP::Message->new();
$message->SetMessage(
    to   => $recipient,
    body => $text,
    type => 'chat',
);
$message->InsertRawXML("<html><body>$html</body></html>");
my $res = $con->Send($message);

And tada!

<message type='chat' to='human@example.org' from='bot@example.org/demo'>
  <body>This is a working test</body>
  <html>
    <body>This is a <b>working</b> test</body>
  </html>
</message>

I’m absolutely no expert when it comes to XMPP standards, and one might need/want to set some more metadata like xmlns but I’m happy enough with this solution that I thought I’d share it as is. ;)

Bits from Debian: Debian celebrates 26 years, Happy DebianDay!

Friday 16th of August 2019 06:12:00 PM

26 years ago today in a single post to the comp.os.linux.development newsgroup, Ian Murdock announced the completion of a brand new Linux release named #Debian.

Since that day we’ve been into outer space, typed over 1,288,688,830 lines of code, spawned over 300 derivatives, been enhanced by 6,155 known contributors, and filed over 975,619 bug reports.

We are home to a community of thousands of users around the globe. We gather to host our annual Debian Developers Conference #DebConf, which spans the world in a different country each year, and of course today there are many #DebianDay celebrations held around the world.

It's not too late to throw an impromptu #DebianDay celebration or to go and join one of the many celebrations already underway.

As we celebrate our own anniversary, we also want to celebrate our many contributors, developers, teams, groups, maintainers, and users. It is all of your effort, support, and drive that continue to make Debian truly: The universal operating system.

Happy #DebianDay!

Jonathan McDowell: DebConf19: Brazil

Friday 16th of August 2019 05:46:54 PM

My first DebConf was DebConf4, held in Porto Alegre, Brazil back in 2004. Uncle Steve did the majority of the travel arrangements for 6 of us to go. We had some mishaps which we still tease him about, but it was a great experience. So when I learnt DebConf19 was to be in Brazil again, this time in Curitiba, I had to go. So last November I realised flights were only likely to get more expensive, that I’d really kick myself if I didn’t go, and so I booked my tickets. A bunch of life happened in the meantime that meant the timing wasn’t particularly great for me - it’s been a busy 6 months - but going was still the right move.

One thing that struck me about DC19 is that a lot of the faces I’m used to seeing at a DebConf weren’t there. Only myself and Steve from the UK DC4 group made it, for example. I don’t know if that’s due to the travelling distances involved, or just the fact that attendance varies and this happened to be a year where a number of people couldn’t make it. Nonetheless I was able to catch up with a number of people I only really see at DebConfs, as well as getting to hang out with some new folk.

Given how busy I’ve been this year and expect to be for at least the next year, I set myself a hard goal of not committing to any additional tasks. That said, DebConf often provides a welcome space to concentrate on technical bits. I reviewed and merged dkg’s work on WKD and DANE for the Debian keyring under debian.org - we’re not exposed to the recent keyserver network issues due to the fact the keyring is curated, but providing additional access to our keyring makes sense if it can be done easily. I spent some time with Ian Jackson talking about dgit - I’m not a user of it at present, but I’m intrigued by the potential for being able to do Debian package uploads via signed git tags. Of course I also attended a variety of different talks (and, as usual, at times the schedule conflicted such that I had a difficult choice about which option to choose for a particular slot).

This also marks the first time I did a non-team related talk at DebConf, warbling about my home automation (similar to my NI Dev Conf talk but with some more bits about the Debian involvement thrown in):

In addition I co-presented a couple of talks for teams I’m part of:

I only realised late in the week that 2 talks I’d normally expect to attend, a Software in the Public Interest BoF and a New Member BoF, were not on the schedule, but to be honest I don’t think I’d have been able to run either even if I’d realised in advance.

Finally, DebConf wouldn’t be DebConf without playing with some embedded hardware at some point, and this year it was the Caninos Loucos Labrador. This is a Brazilian-grown, single-board, ARM-based computer with a modular form factor designed for easy integration into bigger projects. There’s nothing particularly remarkable about the hardware and you might ask why not just use a Pi? The reason is that import duties in Brazil make such things prohibitively expensive - importing a $35 board can end up costing $150 by the time shipping, taxes and customs fees are all taken into account. The intent is to design and build locally, as components can be imported with minimal taxes if the final product is being assembled within Brazil. And Mercosul allows access to many other South American countries without tariffs. I’d have loved to get hold of one of the boards, but they’ve only produced 1000 in the initial run and really need to get them into the hands of people who can help progress the project rather than those who don’t have enough time.

Next year DebConf20 is in Haifa - a city I’ve spent some time in before - but I’ve made the decision not to attend; rather than spending a single 7-10 day chunk away from home I’m going to aim to attend some more local conferences for shorter periods of time.

François Marier: Passwordless restricted guest account on Ubuntu

Friday 16th of August 2019 03:10:00 AM

Here's how I created a restricted but not ephemeral guest account on an Ubuntu 18.04 desktop computer that can be used without a password.

Create a user that can login without a password

First of all, I created a new user with a random password (using pwgen -s 64):

adduser guest

Then following these instructions, I created a new group and added the user to it:

addgroup nopasswdlogin
adduser guest nopasswdlogin

In order to let that user login using GDM without a password, I added the following to the top of /etc/pam.d/gdm-password:

auth sufficient pam_succeed_if.so user ingroup nopasswdlogin

Note that this user is unable to ssh into this machine since it's not part of the sshuser group I have setup in my sshd configuration.

Privacy settings

In order to reduce the amount of digital traces left between guest sessions, I logged into the account using a GNOME session and then opened gnome-control-center. I set the following in the privacy section:

Then I replaced Firefox with Brave in the sidebar, set it as the default browser in gnome-control-center:

and configured it to clear everything on exit:

Create a password-less system keyring

In order to suppress prompts to unlock gnome-keyring, I opened seahorse and deleted the default keyring.

Then I started Brave, which prompted me to create a new keyring so that it can save the contents of its password manager securely. I set an empty password on that new keyring, since I'm not going to be using it.

I also made sure to disable saving of passwords, payment methods and addresses in the browser too.

Restrict user account further

Finally, taking an idea from this similar solution, I prevented the user from making any system-wide changes by putting the following in /etc/polkit-1/localauthority/50-local.d/10-guest-policy.pkla:

[guest-policy]
Identity=unix-user:guest
Action=*
ResultAny=no
ResultInactive=no
ResultActive=no

If you know of any other restrictions that could be added, please leave a comment!

Julian Andres Klode: APT Patterns

Thursday 15th of August 2019 01:55:47 PM

If you have ever used aptitude a bit more extensively on the command-line, you’ll probably have come across its patterns. This week I spent some time implementing (some) patterns for apt, so you do not need aptitude for that, and I want to let you in on the details of this merge request !74.

so, what are patterns?

Patterns allow you to specify complex search queries to select the packages you want to install/show. For example, the pattern ?garbage can be used to find all packages that have been automatically installed but are no longer depended upon by manually installed packages. Or the pattern ?automatic allows you find all automatically installed packages.

You can combine patterns into more complex ones; for example, ?and(?automatic,?obsolete) matches all automatically installed packages that do not exist any longer in a repository.
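
(One practical note: ? and parentheses are shell metacharacters, so when you pass a composite pattern such as ?and(?automatic,?obsolete) on the command line, quote it, e.g. '?and(?automatic,?obsolete)'; otherwise the shell will try to interpret it before apt ever sees it.)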

There are also explicit targets, so you can perform queries like ?for x: ?depends(?recommends(x)): Find all packages x that depend on another package that recommends x. I do not fully comprehend those yet - I did not manage to create a pattern that matches all manually installed packages that a meta-package depends upon. I am not sure it is possible.

reducing pattern syntax

aptitude’s syntax for patterns is quite context-sensitive. If you have a pattern ?foo(?bar) it can have two possible meanings:

  1. If ?foo takes arguments (like ?depends did), then ?bar is the argument.
  2. Otherwise, ?foo(?bar) is equivalent to ?foo?bar which is short for ?and(?foo,?bar)

I find that very confusing. So, when looking at implementing patterns in APT, I went for a different approach. I first parse the pattern into a generic parse tree, without knowing anything about the semantics, and then I convert the parse tree into an APT::CacheFilter::Matcher, an object that can match against packages.

This is useful, because the syntactic structure of the pattern can be seen, without having to know which patterns have arguments and which do not - basically, for the parser ?foo and ?foo() are the same thing. That said, the second pass knows whether a pattern accepts arguments or not and insists on you adding them if required and not having them if it does not accept any, to prevent you from confusing yourself.
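
To make the two-pass idea concrete, here is a toy sketch (APT’s real implementation is C++; the names and the abbreviated pattern table below are made up, and string-valued arguments such as ?name’s regex are left out):

// Pass 1 builds a generic tree for "?word" / "?word(PATTERN, ...)" without any
// semantics; pass 2 checks whether each pattern accepts arguments.
package patterns

import (
	"errors"
	"fmt"
	"strings"
)

// Node is one node of the generic parse tree (pass 1).
type Node struct {
	Name string
	Args []*Node
}

// Parse consumes one pattern from s and returns the node plus the remainder.
func Parse(s string) (*Node, string, error) {
	if !strings.HasPrefix(s, "?") {
		return nil, "", errors.New("pattern must start with '?'")
	}
	s = s[1:]
	i := strings.IndexAny(s, "(),")
	if i == -1 {
		i = len(s)
	}
	n, rest := &Node{Name: s[:i]}, s[i:]
	if strings.HasPrefix(rest, "(") { // for the parser, ?foo and ?foo() are the same
		rest = rest[1:]
		for !strings.HasPrefix(rest, ")") {
			arg, r, err := Parse(rest)
			if err != nil {
				return nil, "", err
			}
			n.Args, rest = append(n.Args, arg), r
			if strings.HasPrefix(rest, ",") {
				rest = rest[1:]
			}
		}
		rest = rest[1:] // consume ')'
	}
	return n, rest, nil
}

// takesArgs is the "semantic" table consulted by the second pass (abbreviated).
var takesArgs = map[string]bool{
	"and": true, "or": true, "not": true,
	"automatic": false, "obsolete": false, "garbage": false,
}

// Check is pass 2: it rejects unknown patterns and wrong argument usage.
func Check(n *Node) error {
	wantsArgs, known := takesArgs[n.Name]
	if !known {
		return fmt.Errorf("unknown pattern ?%s", n.Name)
	}
	if wantsArgs != (len(n.Args) > 0) {
		return fmt.Errorf("?%s: unexpected or missing arguments", n.Name)
	}
	for _, a := range n.Args {
		if err := Check(a); err != nil {
			return err
		}
	}
	return nil
}

Parsing ?and(?automatic,?obsolete) yields and(automatic, obsolete); Check then verifies that ?and has arguments while ?automatic does not.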

aptitude also supports shortcuts. For example, you could write ~c instead of config-files, or ~m for automatic; then combine them like ~m~c instead of using ?and. I have not implemented these short patterns for now, focusing instead on getting the basic functionality working.

So in our example ?foo(?bar) above, we can immediately dismiss parsing that as ?foo?bar:

  1. we do not support concatenation instead of ?and.
  2. we automatically parse ( as the argument list, no matter whether ?foo supports arguments or not

(Screenshot: apt not understanding invalid patterns)

Supported syntax

At the moment, APT supports two kinds of patterns: Basic logic ones like ?and, and patterns that apply to an entire package as opposed to a specific version. This was done as a starting point for the merge, patterns for versions will come in the next round.

We also do not have any support for explicit search targets such as ?for x: ... yet - as explained, I do not yet fully understand them, and hence do not want to commit on them.

The full list of the first round of patterns is below, helpfully converted from the apt-patterns(7) docbook to markdown by pandoc.

logic patterns

These patterns provide the basic means to combine other patterns into more complex expressions, as well as ?true and ?false patterns.

?and(PATTERN, PATTERN, ...)

Selects objects where all specified patterns match.

?false

Selects nothing.

?not(PATTERN)

Selects objects where PATTERN does not match.

?or(PATTERN, PATTERN, ...)

Selects objects where at least one of the specified patterns match.

?true

Selects all objects.

package patterns

These patterns select specific packages.

?architecture(WILDCARD)

Selects packages matching the specified architecture, which may contain wildcards using any.

?automatic

Selects packages that were installed automatically.

?broken

Selects packages that have broken dependencies.

?config-files

Selects packages that are not fully installed, but have solely residual configuration files left.

?essential

Selects packages that have Essential: yes set in their control file.

?exact-name(NAME)

Selects packages with the exact specified name.

?garbage

Selects packages that can be removed automatically.

?installed

Selects packages that are currently installed.

?name(REGEX)

Selects packages where the name matches the given regular expression.

?obsolete

Selects packages that no longer exist in repositories.

?upgradable

Selects packages that can be upgraded (have a newer candidate).

?virtual

Selects all virtual packages; that is packages without a version. These exist when they are referenced somewhere in the archive, for example because something depends on that name.

examples
apt remove ?garbage

Remove all packages that are automatically installed and no longer needed - same as apt autoremove

apt purge ?config-files

Purge all packages that only have configuration files left

oddities

Some things are not yet where I want them:

  • ?architecture does not support all, native, or same
  • ?installed should match only the installed version of the package, not the entire package (that is what aptitude does, and it’s a bit surprising that ?installed implies a version and ?upgradable does not)
the future

Of course, I do want to add support for the missing version patterns and explicit search patterns. I might even add support for some of the short patterns, but no promises. Some of those explicit search patterns might have slightly different syntax, e.g. ?for(x, y) instead of ?for x: y in order to make the language more uniform and easier to parse.

Another thing I want to do ASAP is to disable fallback to regular expressions when specifying package names on the command-line: apt install g++ should always look for a package called g++, and not for any package containing g (g++ being a valid regex) when there is no g++ package. I think continuing to allow regular expressions if they start with ^ or end with $ is fine - that prevents any overlap with package names, and would avoid breaking most stuff.

There also is the fallback to fnmatch(): Currently, if apt cannot find a package with the specified name using the exact name or the regex, it would fall back to interpreting the argument as a glob(7) pattern. For example, apt install apt* would fallback to installing every package starting with apt if there is no package matching that as a regular expression. We can actually keep those in place, as the glob(7) syntax does not overlap with valid package names.

Maybe I should allow using [] instead of () so larger patterns become more readable, and/or some support for comments.

There are also plans for AppStream based patterns. This would allow you to use apt install ?provides-mimetype(text/xml) or apt install ?provides-lib(libfoo.so.2). It’s not entirely clear how to package this though, we probably don’t want to have libapt-pkg depend directly on libappstream.

feedback

Talk to me on IRC, comment on the Mastodon thread, or send me an email if there’s anything you think I’m missing or should be looking at.

Steve Kemp: That time I didn't find a kernel bug, or did I?

Tuesday 13th of August 2019 06:00:44 PM

Recently I saw a post to the linux kernel mailing-list containing a simple fix for a use-after-free bug. The code in question originally read:

hdr->pkcs7_msg = pkcs7_parse_message(buf + buf_len, sig_len);
if (IS_ERR(hdr->pkcs7_msg)) {
        kfree(hdr);
        return PTR_ERR(hdr->pkcs7_msg);
}

Here the bug is obvious once it has been pointed out:

  • A structure is freed.
    • But then it is dereferenced, to provide a return value.

This is the kind of bug that would probably have been obvious to me if I'd happened to read the code myself. However, the patch was submitted, so job done? I did have some free time, so I figured I'd scan for similar bugs. Writing a trivial perl script to look for similar things didn't take too long, though it is a bit shoddy:

  • Open each file.
  • If we find a line containing "free(.*)" record the line and the thing that was freed.
  • The next time we find a return, look to see if the return value uses the thing that was free'd.
    • If so that's a possible bug. Report it.

Of course my code is nasty, but it looked like it immediately paid off. I found this snippet of code in linux-5.2.8/drivers/media/pci/tw68/tw68-video.c:

if (hdl->error) {
        v4l2_ctrl_handler_free(hdl);
        return hdl->error;
}

That looks promising:

  • The structure hdl is freed, via a dedicated freeing-function.
  • But then we return the member error from it.

Chasing down the code I found that linux-5.2.8/drivers/media/v4l2-core/v4l2-ctrls.c contains the code for the v4l2_ctrl_handler_free call and while it doesn't actually free the structure - just some members - it does reset the contents of hdl->error to zero.

Ahah! The code I've found looks for an error and, if one was found, returns zero, meaning the error is lost. I can fix it by changing it to this:

if (hdl->error) {
        int err = hdl->error;

        v4l2_ctrl_handler_free(hdl);
        return err;
}

I did that. Then looked more closely to see if I was missing something. The code I've found lives in the function tw68_video_init1, that function is called only once, and the return value is ignored!

So, that's the story of how I scanned the Linux kernel for use-after-free bugs and contributed nothing to anybody.

Still fun though.

I'll go over my list more carefully later, but nothing else jumped out as being immediately bad.

There is a weird case I spotted in ./drivers/media/platform/s3c-camif/camif-capture.c with a similar pattern. In that case the function involved is s3c_camif_create_subdev which is invoked by ./drivers/media/platform/s3c-camif/camif-core.c:

ret = s3c_camif_create_subdev(camif);
if (ret < 0)
        goto err_sd;

So I suspect there is something odd there:

  • If there's an error in s3c_camif_create_subdev
    • Then handler->error will be reset to zero.
    • Which means that return handler->error will return 0.
    • Which means that the s3c_camif_create_subdev call should have returned an error, but won't be recognized as having done so.
    • i.e. "0 < 0" is false.

Of course the error-value is only set if this code is hit:

hdl->buckets = kvmalloc_array(hdl->nr_of_buckets,
                              sizeof(hdl->buckets[0]),
                              GFP_KERNEL | __GFP_ZERO);
hdl->error = hdl->buckets ? 0 : -ENOMEM;

Which means that the registration of the sub-device fails if there is no memory, and at that point what can you even do?

It's a bug, but it isn't a security bug.

Ricardo Mones: When your mail hub password is updated...

Tuesday 13th of August 2019 03:30:02 PM
don't forget to run postmap on your /etc/postfix/sasl_passwd
(repeat 100 times sotto voce or until falling asleep, whatever happens first).

Sven Hoexter: Debian/buster on HPE DL360G10 - interfaces change back to ethX

Tuesday 13th of August 2019 01:24:17 PM

For yet unknown reasons, some recently installed HPE DL360G10 systems running buster changed the interface names back from the expected "onboard"-based names eno5 and eno6 to ethX after a reboot.

My current workaround is a link file which kind of enforces the onboard scheme.

$ cat /etc/systemd/network/101-onboard-rd.link
[Link]
NamePolicy=onboard kernel database slot path
MACAddressPolicy=none

The hosts are running the latest buster kernel

Linux foobar 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux

A downgrade of the kernel did not change anything. So I currently like to believe this is not related to a kernel change.

I tried to collect some information on one of the broken systems while it was in the broken state:

root@foobar:~# SYSTEMD_LOG_LEVEL=debug udevadm test-builtin net_setup_link /sys/class/net/eth0
=== trie on-disk ===
tool version:          241
file size:         9492053 bytes
header size             80 bytes
strings            2069269 bytes
nodes              7422704 bytes
Load module index
Found container virtualization none.
timestamp of '/etc/systemd/network' changed
Skipping overridden file '/usr/lib/systemd/network/99-default.link'.
Parsed configuration file /etc/systemd/network/99-default.link
Created link configuration context.
ID_NET_DRIVER=i40e
eth0: No matching link configuration found.
Builtin command 'net_setup_link' fails: No such file or directory
Unload module index
Unloaded link configuration context.

root@foobar:~# udevadm test-builtin net_id /sys/class/net/eth0 2>/dev/null
ID_NET_NAMING_SCHEME=v240
ID_NET_NAME_MAC=enx48df37944ab0
ID_OUI_FROM_DATABASE=Hewlett Packard Enterprise
ID_NET_NAME_ONBOARD=eno5
ID_NET_NAME_PATH=enp93s0f0

The most interesting hint right now seems to be that /sys/class/net/eth0/name_assign_type is invalid, while on systems before the reboot that breaks it, and after setting up the .link file fix, it contains a 4.

Since those hosts were initially installed with buster, there are no remains of any ethX related configuration present. If someone has an idea what is going on, write a mail (sven at stormbind dot net), or blog on planet.d.o.

I found a vaguely similar bug report for a Dell PE server in #929622, though that was a change from 4.9 (stretch) to the 4.19 stretch-bpo kernel, the device names were not changed back to the ethX scheme, and Ben found a reason for it inside the kernel. Also the hardware is different, using bnxt_en, while I have tg3 and i40e in use.

Robert McQueen: Flathub, brought to you by…

Monday 12th of August 2019 03:31:02 PM

Over the past 2 years Flathub has evolved from a wild idea at a hackfest to a community of app developers and publishers making over 600 apps available to end-users on dozens of Linux-based OSes. We couldn’t have gotten anything off the ground without the support of the 20 or so generous souls who backed our initial fundraising, and to make the service a reality since then we’ve relied on the contributions of dozens of individuals and organisations such as Codethink, Endless, GNOME, KDE and Red Hat. But for our day to day operations, we depend on the continuous support and generosity of a few companies who provide the services and resources that Flathub uses 24/7 to build and deliver all of these apps. This post is about saying thank you to those companies!

Running the infrastructure

Mythic Beasts is a UK-based “no-nonsense” hosting provider who provide managed and un-managed co-location, dedicated servers, VPS and shared hosting. They are also conveniently based in Cambridge where I live, and very nice people to have a coffee or beer with, particularly if you enjoy talking about IPv6 and how many web services you can run on a rack full of Raspberry Pis. The “heart” of Flathub is a physical machine donated by them which originally ran everything in separate VMs – buildbot, frontend, repo master – and they have subsequently increased their donation with several VMs hosted elsewhere within their network. We also benefit from huge amounts of free bandwidth, backup/storage, monitoring, management and their expertise and advice at scaling up the service.

Starting with everything running on one box in 2017 we quickly ran into scaling bottlenecks as traffic started to pick up. With Mythic’s advice and a healthy donation of 100s of GB / month more of bandwidth, we set up two caching frontend servers running in virtual machines in two different London data centres to cache the commonly-accessed objects, shift the load away from the master server, and take advantage of the physical redundancy offered by the Mythic network.

As load increased and we brought a CDN online to bring the content closer to the user, we also moved the Buildbot (and its associated Postgres database) to a VM hosted at Mythic in order to offload as much IO bandwidth as possible from the repo server and keep up sustained HTTP throughput during update operations. This helped significantly, but we are in discussions with them about a yet larger box with a mixture of disks and SSDs to handle the concurrent read and write load that we need.

Even after all of these changes, we keep the repo master on one, big, physical machine with directly attached storage because repo update and delta computations are hugely IO intensive operations, and our OSTree repos contain over 9 million inodes which get accessed randomly during this process. We also have a physical HSM (a YubiKey) which stores the GPG repo signing key for Flathub, and it’s really hard to plug a USB key into a cloud instance, and know where it is and that it’s physically secure.

Building the apps

Our first build workers were under Alex’s desk, in Christian’s garage, and a VM donated by Scaleway for our first year. We still have several ARM workers donated by Codethink, but at the start of 2018 it became pretty clear that we were not going to keep up with the growing pace of builds without some more serious iron behind the Buildbot. We also wanted to be able to offer PR and test builds, beta builds, etc – all of which multiplies the workload significantly.

Thanks to an introduction by the most excellent Jorge Castro and the approval and support of the Linux Foundation’s CNCF Infrastructure Lab, we were able to get access to an “all expenses paid” account at Packet. Packet is a “bare metal” cloud provider — like AWS except you get entire boxes and dedicated switch ports etc to yourself – at a handful of main datacenters around the world with a full range of server, storage and networking equipment, and a larger number of edge facilities for distribution/processing closer to the users. They have an API and a magical provisioning system which means that at the click of a button or one method call you can bring up all manner of machines, configure networking and storage, etc. Packet is clearly a service built by engineers for engineers – they are smart, easy to get hold of on e-mail and chat, share their roadmap publicly and set priorities based on user feedback.

We currently have 4 Huge Boxes (2 Intel, 2 ARM) from Packet which do the majority of the heavy lifting when it comes to building everything that is uploaded, and also use a few other machines there for auxiliary tasks such as caching source downloads and receiving our streamed logs from the CDN. We also used their flexibility to temporarily set up a whole separate test infrastructure (a repo, buildbot, worker and frontend on one box) while we were prototyping recent changes to the Buildbot.

A special thanks to Ed Vielmetti at Packet who has patiently supported our requests for lots of 32-bit compatible ARM machines, and for his support of other Linux desktop projects such as GNOME and the Freedesktop SDK who also benefit hugely from Packet’s resources for build and CI.

Delivering the data

Even with two redundant / load-balancing front end servers and huge amounts of bandwidth, OSTree repos have so many files that if those servers are too far away from the end users, the latency and round trips cause a serious problem with throughput. In the end you can’t distribute something like Flathub from a single physical location – you need to get closer to the users. Fortunately the OSTree repo format is very efficient to distribute via a CDN, as almost all files in the repository are immutable.

After a very speedy response to a plea for help on Twitter, Fastly – one of the world’s leading CDNs – generously agreed to donate free use of their CDN service to support Flathub. All traffic to the dl.flathub.org domain is served through the CDN, and automatically gets cached at dozens of points of presence around the world. Their service is frankly really really cool – the configuration and stats are really powerful, unlike any other CDN service I’ve used. Our configuration allows us to collect custom logs which we use to generate our Flathub stats, and to define edge logic in Varnish’s VCL which we use to allow larger files to stream to the end user while they are still being downloaded by the edge node, improving throughput. We also use their API to purge the summary file from their caches worldwide each time the repository updates, so that it can stay cached for longer between updates.
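The post does not spell out the exact purge mechanism Flathub uses, but as a rough illustration, Fastly supports issuing the PURGE HTTP method against a single object URL (the service can be configured to require authentication for this), so a repository update could trigger something along these lines:

curl -X PURGE https://dl.flathub.org/repo/summary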

To get a feeling for how well this works, here are some statistics: the Flathub main repo is 929 GB, of which 73 GB are static deltas and 1.9 GB are screenshots. It contains 7280 refs for 640 apps (plus runtimes and extensions) over 4 architectures. Fastly is serving the dl.flathub.org domain fully cached, with a cache hit rate of ~98.7%. Averaging 9.8 million hits and 464 Gb downloaded per hour, Flathub uses between 1-2 Gbps sustained bandwidth depending on the time of day. Here are some nice graphs produced by the Fastly management UI (the numbers are per-hour over the last month):

To buy the scale of services and support that Flathub receives from our commercial sponsors would cost tens if not hundreds of thousands of dollars a month. Flathub could not exist without Mythic Beasts, Packet and Fastly's support of the free and open source Linux desktop. Thank you!

William (Bill) Blough: Free Software Activities (July 2019)

Saturday 10th of August 2019 10:04:14 PM
Debian
  • Bug 932626: passwordsafe — Non-English locales don't work due to translation files being installed in the wrong directory.

    The fixed versions are:

    • unstable/testing: 1.06+dfsg-2
    • buster: 1.06+dfsg-1+deb10u1 (via 932945)
    • stretch: 1.00+dfsg-1+deb9u1 (via 932944)
  • Bug 932947: file — The --mime-type flag fails on arm64 due to seccomp

    Recently, there was a message on debian-devel about enabling seccomp sandboxing for the file utility. While I knew that passwordsafe uses file to determine some mime type information, testing on my development box (which is amd64-based) didn't show any problems.

    However, this was happening around the same time that I was preparing the fix for 932626 as noted above. Lo and behold, when I uploaded the fix, everything went fine except on the arm64 architecture. The build there failed due to the package's test suite failing.

    After doing some troubleshooting on one of the arm64 porterboxes, it was clear that the seccomp change to file was the culprit. I haven't worked with arm64 very much, so I don't know all of the details. But based on my research, it appears that arm64 doesn't implement the access() system call, but uses faccessat() instead. However, in this case, seccomp was allowing calls to access(), but not calls to faccessat(). This led to the difference in behavior between arm64 and the other architectures (a small illustration follows at the end of this item).

    So I filed the bug to let the maintainer know the details, in hopes that the seccomp filters could be adjusted. However, it seems he took it as the "final straw" with regard to some of the other problems he was hearing about, and decided to revert the seccomp change altogether.

    Once the change was reverted, I requested a rebuild of the failed passwordsafe package on arm64 so it could be rebuilt against the fixed dependency without doing another full upload.
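    As a rough illustration of the syscall difference described above (a sketch, not output from the actual porterbox session; the traced binary and test file are arbitrary examples):

    # amd64: glibc's access(3) wrapper issues the access syscall,
    # which the seccomp allow-list permitted
    strace -f -e trace=access,faccessat file --mime-type /etc/passwd

    # arm64: there is no access syscall, so glibc issues faccessat instead,
    # which the allow-list did not cover, so the call was denied
    strace -f -e trace=access,faccessat file --mime-type /etc/passwd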

  • I updated django-cas-server in unstable to 1.1.0, which is the latest upstream version. I also did some miscellaneous cleanup/maintenance on the packaging.

  • I attended DebConf19 in Curitiba, Brazil.

    This was my 3rd DebConf, and my first trip to Brazil. Actually, it was my first trip to anywhere in the Southern Hemisphere.

    As usual, DebConf was quite enjoyable. From a technical perspective, there were lots of interesting talks. I learned some new things, and was also exposed to some new (to me) projects and techniques, as well as some new ideas in general. It also gave me some ideas of other ways/places I could potentially contribute to Debian.

    From a social perspective, it was a good opportunity to see and spend time with people that I normally only get to interact with via email or irc. I also enjoyed being able to put faces/voices to names that I only see on mailing lists. Even if I don't know or interact with them much, it really helps my mental picture when I'm reading things they wrote. And of course, I met some new people, too. It was nice to share stories and experiences over food and drinks, or in the hacklabs.

    If any of the DebConf team read this, thanks for your hard work. It was another great DebConf.

Steve McIntyre: DebConf in Brazil again!

Saturday 10th of August 2019 06:33:00 PM

I was lucky enough to meet up with my extended Debian family again this year. We went back to Brazil for the first time since 2004, this time in Curitiba. And this time I didn't lose anybody's clothes! :-)

I had a very busy time, as usual - lots of sessions to take part in, and lots of conversations with people from all over. As part of the Community Team (ex-AH Team), I had a lot of things to catch up on too, and a sprint report to send. Despite all that, I even managed to do some technical things too!

I ran sessions about UEFI Secure Boot, the Arm ports and the Community Team. I was meant to be running a session for the web team too, but the dreaded DebConf 'flu took me out for a day. It's traditional - bring hundreds of people together from all over the world, mix them up with too much alcohol and not enough sleep and many people get ill... :-( Once I'm back from vacation, I'll be doing my usual task of sending session summaries to the Debian mailing lists to describe what happened in my sessions.

Maddog showed a group of us round the micro-brewery at Hop'n'Roll which was extra fun. I'm sure I wasn't the only experienced guy there, but it's always nice to listen to geeky people talking about their passion.

Of course, I couldn't get to all the sessions I wanted to - there are just too many things going on in DebConf week, and sessions clash at the best of times. So I have a load of videos on my laptop to watch while I'm away. Heartfelt thanks to our always-awesome video team for their efforts to make that possible. And I know that I had at least one follower at home watching the live streams too!

Daniel Lange: Cleaning a broken GnuPG (gpg) key

Saturday 10th of August 2019 03:38:55 PM

I've long said that the main tools in the Open Source security space, OpenSSL and GnuPG (gpg), are broken and only a complete re-write will solve this. And that is still pending as nobody came forward with the funding. It's not a sexy topic, so it has to get really bad before it'll get better.

Gpg has a UI that is close to useless. That won't substantially change with more bolted-on improvements.

Now Robert J. Hansen and Daniel Kahn Gillmor had somebody add ~50k signatures (read 1, 2, 3, 4 for the g{l}ory details) to their keys and - oops - they say that breaks gpg.

But does it?

I downloaded Robert J. Hansen's key off the SKS-Keyserver network. It's a nice 45MB file when de-ascii-armored (gpg --dearmor broken_key.asc ; mv broken_key.asc.gpg broken_key.gpg).

Now a friendly:

$ /usr/bin/time -v gpg --no-default-keyring --keyring ./broken_key.gpg --batch --quiet --edit-key 0x1DCBDC01B44427C7 clean save quit

pub  rsa3072/0x1DCBDC01B44427C7
     erzeugt: 2015-07-16  verfällt: niemals     Nutzung: SC  
     Vertrauen: unbekannt     Gültigkeit: unbekannt
sub  ed25519/0xA83CAE94D3DC3873
     erzeugt: 2017-04-05  verfällt: niemals     Nutzung: S  
sub  cv25519/0xAA24CC81B8AED08B
     erzeugt: 2017-04-05  verfällt: niemals     Nutzung: E  
sub  rsa3072/0xDC0F82625FA6AADE
     erzeugt: 2015-07-16  verfällt: niemals     Nutzung: E  
[ unbekannt ] (1). Robert J. Hansen <rjh@sixdemonbag.org>
[ unbekannt ] (2)  Robert J. Hansen <rob@enigmail.net>
[ unbekannt ] (3)  Robert J. Hansen <rob@hansen.engineering>

User-ID "Robert J. Hansen <rjh@sixdemonbag.org>": 49705 Signaturen entfernt
User-ID "Robert J. Hansen <rob@enigmail.net>": 49704 Signaturen entfernt
User-ID "Robert J. Hansen <rob@hansen.engineering>": 49701 Signaturen entfernt

pub  rsa3072/0x1DCBDC01B44427C7
     erzeugt: 2015-07-16  verfällt: niemals     Nutzung: SC  
     Vertrauen: unbekannt     Gültigkeit: unbekannt
sub  ed25519/0xA83CAE94D3DC3873
     erzeugt: 2017-04-05  verfällt: niemals     Nutzung: S  
sub  cv25519/0xAA24CC81B8AED08B
     erzeugt: 2017-04-05  verfällt: niemals     Nutzung: E  
sub  rsa3072/0xDC0F82625FA6AADE
     erzeugt: 2015-07-16  verfällt: niemals     Nutzung: E  
[ unbekannt ] (1). Robert J. Hansen <rjh@sixdemonbag.org>
[ unbekannt ] (2)  Robert J. Hansen <rob@enigmail.net>
[ unbekannt ] (3)  Robert J. Hansen <rob@hansen.engineering>

        Command being timed: "gpg --no-default-keyring --keyring ./broken_key.gpg --batch --quiet --edit-key 0x1DCBDC01B44427C7 clean save quit"
        User time (seconds): 3911.14
        System time (seconds): 2442.87
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:45:56
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 107660
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 26630
        Voluntary context switches: 43
        Involuntary context switches: 59439
        Swaps: 0
        File system inputs: 112
        File system outputs: 48
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
 

And the result is a nicely usable 3835 byte file of the clean public key. If you supply a keyring instead of --no-default-keyring it will also keep the non-self signatures that are useful for you (as you apparently know the signing party).
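For completeness, a sketch of that variant: running the same clean against your normal keyring (i.e. without the --no-default-keyring --keyring options) keeps cross-signatures from keys you already hold:

$ gpg --batch --quiet --edit-key 0x1DCBDC01B44427C7 clean save quit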

So it does not break gpg. It does break things that call gpg at runtime and not asynchronously. I heard Enigmail is affected, quelle surprise.

Now the main problem here is the runtime. 1h45min is just ridiculous. As Filippo Valsorda puts it:

Someone added a few thousand entries to a list that lets anyone append to it. GnuPG, software supposed to defeat state actors, suddenly takes minutes to process entries. How big is that list you ask? 17 MiB. Not GiB, 17 MiB. Like a large picture. https://dev.gnupg.org/T4592

If I were a gpg / SKS keyserver developer, I'd

  • speed this up so the edit-key run above completes in less than 10 s (just getting rid of the lseek/read dance and deferring all time-based decisions should get close)
  • (ideally) make the drop-sig import-filter syntax useful (date-ranges, non-reciprocal signatures, ...)
  • clean affected keys on the SKS keyservers (needs coordination of sysops, drop servers from unreachable people)
  • (ideally) use the opportunity to clean all keyserver filesystem and the message board over pgp key servers keys, too
  • only accept new keys and new signatures on keys extending the strong set (rather small change to the existing codebase)

That way another key can only be added to the keyserver network if it contains at least one signature from a previously known strong-set key. Attacking the keyserver network would become at least non-trivial. And the web-of-trust thing may make sense again.

Updates

09.07.2019

GnuPG 2.2.17 has been released with another set of quickly bolted together fixes:

* gpg: Ignore all key-signatures received from keyservers. This change is required to mitigate a DoS due to keys flooded with faked key-signatures. The old behaviour can be achieved by adding keyserver-options no-self-sigs-only,no-import-clean to your gpg.conf. [#4607]
* gpg: If an imported keyblocks is too large to be stored in the keybox (pubring.kbx) do not error out but fallback to an import using the options "self-sigs-only,import-clean". [#4591]
* gpg: New command --locate-external-key which can be used to refresh keys from the Web Key Directory or via other methods configured with --auto-key-locate.
* gpg: New import option "self-sigs-only".
* gpg: In --auto-key-retrieve prefer WKD over keyservers. [#4595]
* dirmngr: Support the "openpgpkey" subdomain feature from draft-koch-openpgp-webkey-service-07. [#4590].
* dirmngr: Add an exception for the "openpgpkey" subdomain to the CSRF protection. [#4603]
* dirmngr: Fix endless loop due to http errors 503 and 504. [#4600]
* dirmngr: Fix TLS bug during redirection of HKP requests. [#4566]
* gpgconf: Fix a race condition when killing components. [#4577]

Bug T4607 shows that these changes are anything but well thought-out. They introduce artificial limits, like 64kB for WKD-distributed keys or 5MB for local signature imports (Bug T4591), which weaken the web-of-trust further.

I recommend not running gpg 2.2.17 in production environments without extensive testing, as these limits and the unverified network traffic may bite you. Do validate your upgrade with valid and broken keys that have segments (packet groups) surpassing the above-mentioned limits. You may be surprised what gpg does. On the upside: you can now refresh keys (sans signatures) via WKD. So if your buddies still believe in limiting their subkey validities, you can more easily update them, bypassing the SKS keyserver network. NB: I have not tested that functionality. So test before deploying.
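A sketch of what such a WKD-based refresh could look like (equally untested here), using the new --locate-external-key command from the changelog quoted above; the address is a placeholder and the mail provider has to publish the key via WKD:

$ gpg --auto-key-locate clear,wkd,nodefault --locate-external-key someone@example.org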

10.08.2019

Christopher Wellons (skeeto) has released his pgp-poisoner tool. It is a Go program that can add thousands of malicious signatures to a GnuPG key per second. He comments "[pgp-poisoner is] proof that such attacks are very easy to pull off. It doesn't take a nation-state actor to break the PGP ecosystem, just one person and couple evenings studying RFC 4880. This system is not robust." He also hints at the next likely attack vector: public subkeys can be bound to a primary key of choice.

Petter Reinholdtsen: Legal to share more than 16,000 movies listed on IMDB?

Saturday 10th of August 2019 10:00:00 AM

The recent announcement from the New York Public Library on its results in identifying books published in the USA that are now in the public domain inspired me to update the scripts I use to track down movies that are in the public domain. This involved updating the script used to extract lists of movies believed to be in the public domain to work with the latest version of the source web sites. In particular, the new edition of the Retro Film Vault web site now seems to list all the films available from that distributor, bringing the films identified there to more than 12,000 movies, and I was able to connect 46% of these to IMDB titles.

The new total is 16307 IMDB IDs (aka films) in the public domain or creative commons licensed, with unknown status for 31460 movies (possibly duplicates of the 16307).

The complete data set is available from a public git repository, including the scripts used to create it.

Anyway, this is the summary of the 28 collected data sources so far:

 2361 entries (   50 unique) with and 22472 without IMDB title ID in free-movies-archive-org-search.json
 2363 entries (  146 unique) with and     0 without IMDB title ID in free-movies-archive-org-wikidata.json
  299 entries (   32 unique) with and    93 without IMDB title ID in free-movies-cinemovies.json
   88 entries (   52 unique) with and    36 without IMDB title ID in free-movies-creative-commons.json
 3190 entries ( 1532 unique) with and    13 without IMDB title ID in free-movies-fesfilm-xls.json
  620 entries (   24 unique) with and   283 without IMDB title ID in free-movies-fesfilm.json
 1080 entries (  165 unique) with and   651 without IMDB title ID in free-movies-filmchest-com.json
  830 entries (   13 unique) with and     0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
   19 entries (   19 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-gb.json
 7410 entries ( 7101 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-us.json
 1205 entries (   41 unique) with and     0 without IMDB title ID in free-movies-imdb-pd.json
  163 entries (   22 unique) with and    88 without IMDB title ID in free-movies-infodigi-pd.json
  158 entries (  103 unique) with and     0 without IMDB title ID in free-movies-letterboxd-looney-tunes.json
  113 entries (    4 unique) with and     0 without IMDB title ID in free-movies-letterboxd-pd.json
  182 entries (   71 unique) with and     0 without IMDB title ID in free-movies-letterboxd-silent.json
  248 entries (   85 unique) with and     0 without IMDB title ID in free-movies-manual.json
  158 entries (    4 unique) with and    64 without IMDB title ID in free-movies-mubi.json
   85 entries (    1 unique) with and    23 without IMDB title ID in free-movies-openflix.json
  520 entries (   22 unique) with and   244 without IMDB title ID in free-movies-profilms-pd.json
  343 entries (   14 unique) with and    10 without IMDB title ID in free-movies-publicdomainmovies-info.json
  701 entries (   16 unique) with and   560 without IMDB title ID in free-movies-publicdomainmovies-net.json
   74 entries (   13 unique) with and    60 without IMDB title ID in free-movies-publicdomainreview.json
  698 entries (   16 unique) with and   118 without IMDB title ID in free-movies-publicdomaintorrents.json
 5506 entries ( 2941 unique) with and  6585 without IMDB title ID in free-movies-retrofilmvault.json
   16 entries (    0 unique) with and     0 without IMDB title ID in free-movies-thehillproductions.json
  110 entries (    2 unique) with and    29 without IMDB title ID in free-movies-two-movies-net.json
   73 entries (   20 unique) with and   131 without IMDB title ID in free-movies-vodo.json
16307 unique IMDB title IDs in total, 12509 only in one list, 31460 without IMDB title ID

New this time is a list of all the identified IMDB titles, with title, year and running time, provided in free-complete.json. This file also indicates which source is used to conclude that the video is free to distribute.

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Andy Simpkins: gov.uk paperwork

Friday 9th of August 2019 01:11:36 PM

[Edit: Removed accusation of non UK hosting – thank you to Richard Mortimer & Philipp Edelmann for pointing out I had incorrectly looked up the domain “householdresponce.com” in place of  “householdresponse.com”.  Learn to spell…]

I live in England, the government keeps an Electoral Roll, a list of people registered to vote.  This list needs to be maintained, so once a year we are required to update the database.  To make sure we don’t forget we get sent a handy letter through the door looking like this:

Is it a scam?

Well that’s the first page anyway.   Correctly addressed to the “Current Occupier”.   So why am I posting about this?

Phishing emails land in our inbox all the time (hopefully only a few, because our spam filters eat the rest).  These are unsolicited emails trying to trick us into doing something; usually they look like something official and warn us about something that we should take action on, for example an email that looks like it has come from your bank warning about suspicious activity in your account. They then ask you to follow a link to the ‘bank’s website’ where you can log in and confirm whether the activity is genuine – obviously taking you through a ‘man in the middle’ website that harvests your account credentials.

The government is justifiably concerned about this (as too are banks and other businesses that are impersonated in this way) and so runs media campaigns to educate the public in the dangers of such scams and what to look out for.

So back to the “Household Enquiry” form…

How do I know that this is genuine?  Well, I don’t.  I can’t easily verify the letter, I can’t be sure who sent it, it arrived through my letterbox unbidden, and even if I was expecting it, wouldn’t the perfect time to send such a scam letter be at the same time genuine letters are being distributed?

All I can do is read the letter carefully and apply the same rational tests that I would to any unsolicited (e)mail.

1) Does it claim to come from a source I would have dealings with? (Bulk mailing is so cheap that sending to huge numbers of people is still effective even if most of the recipients will know it is a scam because they wouldn’t have dealings with the alleged sender.)  Yes, it claims to have been sent by South Cambridgeshire District Council, and they are my district council and would send me this letter.

2) Do all the communication links point to the sender?  No.  Stop: this is probably a scam.

 

Alarm bells should now be ringing – their preferred method of communication is for me to visit the website www.householdresponse.com/southcambs.  Sure, they have a gov.uk website mentioned, they claim to be South Cambridgeshire District Council, and they have an email address elections@scambs.gov.uk, but all the fake emails claiming to come from my bank look like they come from my bank as well – the only thing that doesn’t is the link they want you to follow.  Just like this letter…

OK, time for a bit of detective work

:~$ whois householdresponse.com
Domain Name: HOUSEHOLDRESPONSE.COM
Registry Domain ID: 2036860356_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.easyspace.com
Registrar URL: http://www.easyspace.com
Updated Date: 2018-05-23T05:56:38Z
Creation Date: 2016-06-22T09:24:15Z
Registry Expiry Date: 2020-06-22T09:24:15Z
Registrar: EASYSPACE LIMITED
Registrar IANA ID: 79
Registrar Abuse Contact Email: abuse@easyspace.com
Registrar Abuse Contact Phone: +44.3707555066
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
Name Server: NS1.NAMECITY.COM
Name Server: NS2.NAMECITY.COM
DNSSEC: unsigned
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of whois database: 2019-08-09T17:05:57Z <<<
<snip>

Really?  Just a hosting company’s details for a domain claiming to be from local government?

:~$ nslookup householdresponse.com
<snip>
Name:    householdresponse.com
Address: 62.25.101.164

:~$ whois 62.25.101.164
<snip>
% Information related to '62.25.64.0 - 62.25.255.255'
% Abuse contact for '62.25.64.0 - 62.25.255.255' is 'ipabuse@vodafone.co.uk'

inetnum:        62.25.64.0 - 62.25.255.255
netname:        UK-VODAFONE-WORLDWIDE-20000329
country:        GB
org:            ORG-VL225-RIPE
admin-c:        GNOC4-RIPE
tech-c:         GNOC4-RIPE
status:         ALLOCATED PA
mnt-by:         RIPE-NCC-HM-MNT
mnt-by:         VODAFONE-WORLDWIDE-MNTNER
mnt-lower:      VODAFONE-WORLDWIDE-MNTNER
mnt-domains:    VODAFONE-WORLDWIDE-MNTNER
mnt-routes:     VODAFONE-WORLDWIDE-MNTNER
created:        2017-10-18T09:50:20Z
last-modified:  2017-10-18T09:50:20Z
source:         RIPE # Filtered

organisation:   ORG-VL225-RIPE
org-name:       Vodafone Limited
org-type:       LIR
address:        Vodafone House, The Connection
address:        RG14 2FN
address:        Newbury
address:        UNITED KINGDOM
phone:          +44 1635 33251
admin-c:        GSOC-RIPE
tech-c:         GSOC-RIPE
abuse-c:        AR40377-RIPE
mnt-ref:        CW-EUROPE-GSOC
mnt-by:         RIPE-NCC-HM-MNT
mnt-by:         CW-EUROPE-GSOC
created:        2017-05-11T14:35:11Z
last-modified:  2018-01-03T15:48:36Z
source:         RIPE # Filtered

role:           Cable and Wireless IP GNOC Munich
remarks:        Cable&Wireless Worldwide Hostmaster
address:        Smale House
address:        London SE1
address:        UK
admin-c:        DOM12-RIPE
admin-c:        DS3356-RIPE
admin-c:        EJ343-RIPE
admin-c:        FM1414-RIPE
admin-c:        MB4
tech-c:         AB14382-RIPE
tech-c:         MG10145-RIPE
tech-c:         DOM12-RIPE
tech-c:         JO361-RIPE
tech-c:         DS3356-RIPE
tech-c:         SA79-RIPE
tech-c:         EJ343-RIPE
tech-c:         MB4
tech-c:         FM1414-RIPE
abuse-mailbox:  ipabuse@vodafone.co.uk
nic-hdl:        GNOC4-RIPE
mnt-by:         CW-EUROPE-GSOC
created:        2004-02-03T16:44:58Z
last-modified:  2017-05-25T12:03:34Z
source:         RIPE # Filtered

% Information related to '62.25.64.0/18AS1273'

route:          62.25.64.0/18
descr:          Vodafone Hosting
origin:         AS1273
mnt-by:         ENERGIS-MNT
created:        2019-02-28T08:50:03Z
last-modified:  2019-02-28T08:57:04Z
source:         RIPE

% Information related to '62.25.64.0/18AS2529'

route:          62.25.64.0/18
descr:          Energis UK
origin:         AS2529
mnt-by:         ENERGIS-MNT
created:        2014-03-26T16:21:40Z
last-modified:  2014-03-26T16:21:40Z
source:         RIPE

Is this a scam…
I only wish it was :-(

A quick search of https://www.scambs.gov.uk/elections/electoral-registration-faqs/ and the very first thing on the webpage is a link to www.householdresponse.com/southcambs…

A phone call to the council, just to confirm that they haven’t been hacked, and I am told that yes, this is for real.

OK, let’s look at the Privacy statement (on the same letter)

Right, a link to a UK government website… https://www.scambs.gov.uk/privacynotice

A copy of this page as of 2019-08-09 (because websites have a habit of changing) can be found here:
http://koipond.org.uk/photo/Screenshot_2019-08-09_CustomerPrivacyNotice.png

[Edit
I originally thought that this was being hosted outside the UK (on a US based server), which would be outside of GDPR.  I am still pissed off that this looks and feels ‘spammy’ and that the site is being hosted outside of a gov.uk based server, but this is not the righteous rage that I previously felt]

Summary Of Issue
  1. UK Government, District and Local Councils should be exemplars of best practice.  Any correspondence from any part of UK government should only use websites within the gov.uk domain (fraud prevention)
Actions taken
  • 2019-08-09
    • I spoke with South Cambridgeshire District Council and confirmed that this was genuine
    • Spoke with South Cambridgeshire District Council Electoral Services Team and made them aware of both issues (and sent follow up email)
    • Spoke with the ICO and asked for advice.  They will take up the issue if South Cambs do not resolve this within 20 working days.
    • Spoke again with the ICO – even though I had mistakenly believed this was being hosted outside the UK (and this is not the case), they are still interested in pushing for a move to a .gov.uk domain

 

Mike Gabriel: Kudos to the Rspamd developers

Friday 9th of August 2019 01:02:23 PM

I just migrated the first mail server site (a customer's) away from Amavis+SpamAssassin to Rspamd. The main reasons for the migration were speed, and the fact that the setup needed a polish-up anyway. People on site had been complaining about too much SPAM for quite a while. Plus, it is always good to dive into something new. Mission accomplished.

Implemented functionalities:

  • Sophos AV (savdi) antivirus checks backend
  • Clam AV antivirus backend as fallback (a configuration sketch for both antivirus backends follows below this list)
  • Auto-Learner CRON Job for SPAM mails published by https://artinvoice.hu
  • Work-around lacking http proxy support
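A minimal configuration sketch for those two antivirus backends, following the option names documented for Rspamd's antivirus module; the paths, ports and symbol names are illustrative assumptions, not the customer's actual setup:

# /etc/rspamd/local.d/antivirus.conf (sketch)
sophos {
  type = "sophos";                        # talks SSSP to a Sophos SAVDI daemon
  servers = "127.0.0.1:4010";             # assumed SAVDI listener address
  symbol = "SOPHOS_VIRUS";
}
clamav {
  type = "clamav";                        # ClamAV fallback scanner
  servers = "/var/run/clamav/clamd.ctl";  # clamd unix socket (Debian default path)
  symbol = "CLAM_VIRUS";
}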

Unfortunately, I could not enable the full scope of Rspamd features, as that specific site I worked on is on a private network, behind a firewall, etc. Some features don't make sense there (e.g. greylisting) or are hard-disabled in Rspamd once it detects that the mail host is on some local network infrastructure (local as in RFC 1918, or the corresponding RFC for IPv6's fd00:: range, which I currently can't remember).

Kudos + Thanks!

Rspamd is just awesome!!! I am really really pleased with the result (and so is the customer, I heard). Thanks to the upstream developers, thanks to the Debian maintainers of the rspamd Debian package. [1]

Credits + Thanks for sharing your Work

The main part of the work had already been documented in a blog post [2] by someone with the nick "zac" (no real name found). Thanks for that!

The Sophos AV integration was a little tricky at the start, but worked out well, after some trial and error, log reading, Rspamd code studies, etc.

About halfway through, one tricky part popped up that could be avoided by the Rspamd upstream maintainers in future releases. As far as I can tell from [3], Rspamd lacks support for retrieving its map files and such (hosted on *.rspamd.com, or at other 3rd party providers) via an HTTP proxy server. This was nearly a full blocker for my last project, as the customer's mail gateway is part of a larger infrastructure and hosted inside a double ring of firewalls. The only access to the internet goes through a non-transparent Squid proxy server (one which I don't have control over).

To work around this, I set up a transparent https proxy on "localhost", using a neat Python script [4]. Thanks for sharing this script.

I love all the sharing we do in FLOSS

Working on projects like this is just pure fun. And deeply interesting, as well. Such a project is especially fun when, like this one, it is 98% FLOSS and 100% in the spirit of FLOSS and the correlated sharing mentality. I love this spirit of sharing one's work with the rest of the world, whether someone finds what I have to share useful or not.

I invite everyone to join in with sharing and in fact, for the IT business, I dearly recommend it.

I did not post config snippets here and such (as some of them are really customer specific), but if you stumble over similar issues when setting up your anti-SPAM gateway mail site using Rspamd, feel free to poke me and I'll see how I can help.

light+love
Mike (aka sunweaver at debian.org)

References

Dirk Eddelbuettel: RQuantLib 0.4.10: Pure maintenance

Wednesday 7th of August 2019 02:32:00 PM

A new version 0.4.10 of RQuantLib just got onto CRAN; a Debian upload will follow in due course.

QuantLib is a very comprehensive free/open-source library for quantitative finance; RQuantLib connects it to the R environment and language.

This version does two things related to the new upstream QuantLib release 1.16. First, it updates the Windows build script in two ways: it uses binaries for the brand new 1.16 release as prepared by Jeroen, and it sets win-builder up for the current and “prospective next version”, also set up by Jeroen. I also updated the Dockerfile used for CI to pick QuantLib 1.16 from Debian’s unstable repo as it is too new to have moved to testing (which the r-base container we build on defaults to). The complete set of changes is listed below:

Changes in RQuantLib version 0.4.10 (2019-08-07)
  • Changes in RQuantLib build system:

    • The src/Makevars.win and tools/winlibs.R file get QuantLib 1.16 for either toolchain (Jeroen in #136).

    • The custom Docker container now downloads QuantLib from Debian unstable to get release 1.16 (from yesterday, no less)

Courtesy of CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the RQuantLib page. Questions, comments etc should go to the new rquantlib-devel mailing list. Issue tickets can be filed at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Jonas Meurer: debian lts report 2019.07

Tuesday 6th of August 2019 10:17:04 AM
Debian LTS report for July 2019

This month I was allocated 17 hours. I also had 2 hours left over from June, which makes a total of 19 hours. I spent all of them on the following tasks/issues.

  • DLA-1843-1: Fixed CVE-2019-10162 and CVE-2019-10163 in pdns.
  • DLA-1852-1: Fixed CVE-2019-9948 in python3.4. Also found, debugged and fixed several further regressions in the former CVE-2019-9740 patches.
  • Improved testing of LTS uploads: We had some internal discussion in the Debian LTS team on how to improve the overall quality of LTS security uploads by doing more (semi-)automated testing of the packages before uploading them to jessie-security. I tried to summarize the internal discussion, bringing it to the public debian-lts mailinglist. I also did a lot of testing and worked on Jessie support in Salsa-CI. Now that salsa-ci-team/images MR !74 and ci-team/debci MR !89 got merged, we only have to wait for a new debci release in order to enable autopkgtest Jessie support in Salsa-CI. Afterwards, we can use the Salsa-CI pipeline for (semi-)automatic testing of packages targeted at jessie-security.
Links

More in Tux Machines

Events: LibreOffice Conference 2020, MariaDB's Thomas Boyd and Upcoming Linux Foundation’s Open Source Summit

  • LibreOffice Conference 2020 Proposals

    The Document Foundation has received two different proposals for the organization of LibOCon 2020 from the Turkish and German communities. When this has happened in the past, in 2012 (Berlin vs Zaragoza) and 2013 (Milan vs Montreal), TDF Members have been asked to decide by casting their vote. This document provides an outline of the two proposals, which are attached in their original format.

  • Thomas Boyd Discusses Which Open Source Database is the Best Fit for the Business

    The world's largest and most innovative businesses are turning to enterprise open source databases for mission-critical applications, with the most popular open source relational databases being MariaDB, MySQL, and Postgres. However, while all three of these databases are open source, mature, and available in enterprise editions, there are significant differences between them — both in terms of application development as well as database administration and operations. DBTA recently held a webinar featuring Thomas Boyd, director of technical marketing, MariaDB Corporation, who discussed the differences between MariaDB, MySQL, and Postgres. [...] EnterpriseDB is heap only while MySQL and MariaDB offer InnoDB, Columnar, Aria, MyRocks, and more.

  • Open Source Summit welcomes Platform9 experts

    Cloud-native experts share tips and practical learnings for Kubernetes in the enterprise, Kubernetes on bare metal or with stateful MySQL databases, and optimizing the cost and performance of Serverless applications.

  • Transform Your Career: Attend Open Source Summit North America this August in San Diego

    For the last decade, The Linux Foundation’s Open Source Summit has proven to be invaluable for attendees.  A 2018 participant recently wrote an article on OpenSource.com stating “Last August, I arrived at the Vancouver Convention Centre to give a lightning talk and speak on a panel at Open Source Summit North America 2018. It’s no exaggeration to say that this conference—and applying to speak at it—transformed my career.” We encourage you to read the article and discover why attending Open Source Summit can be a game changer for you as well.

OSS Leftovers

  • Intervalometerator: Open Source Code for a Remote Timelapse DSLR

    Want to set up a remote DSLR for shooting a time-lapse? The Intervalometerator (AKA ‘intvlm8r’) is an open-source intervalometer that can help you do so at minimal hardware cost (as long as you’re comfortable tinkering with hardware and software). Created by Sydney-based coder Greig Sheridan and his photographer partner Rocky over the course of a year, the Intervalometerator is designed to be both cheap and easy to build with familiar tools and using Raspberry Pi and Arduino microcontrollers. “My partner and I have been working for over twelve months now on an intervalometer in order to shoot a DSLR-based time-lapse of the construction of our friends’ home in NZ,” Sheridan tells PetaPixel. “It was at the time a seemingly clever idea for a house-warming present, but it grew like tribbles to consume an incredible amount of effort.”

  • Open Source Tools & Framework: Microservices Perspective
  • Open Source flexiWAN SD-WAN Software Beta Ships
  • Agile and open source can complement each other

    Despite the growing popularity of both Agile development and open-source practices, it’s not often that they come up in the same conversation. When these two concepts do intersect, it’s often to highlight the contradicting viewpoints that these two models supposedly represent. While there are core differences, Agile doesn’t have to be the enemy of open source—in fact, I would argue the opposite.

  • SD Times Open-Source Project of the Week: Twilio CLI

    In an effort to help its developers be more productive, Twilio has announced the beta version of Twilio CLI. It is an open-source command line interface that enables developers to access Twilio through their command prompt. “It’s hard to beat the flexibility and power that a CLI provides at development time. Until now, there was no CLI designed for typical communications requirements,” Ashley Roach, the product manager for developer interfaces at Twilio, wrote in a post.

  • Using open source in your enterprise? What to look out for

    According to Statista, the open source market was valued at $11.4 billion in 2017 and is estimated to grow to $32.95 billion by 2022, showing it has no intention of slowing down anytime soon. Founded on the belief that collaboration and cooperation build better software, open source sounds closer to a utopian dream than to the cold digital world of programming. Research showed that open source code has overtaken proprietary code in applications, at 57%. This has numerous benefits, such as speeding up the software development process or creating more effective and innovative software. For example, open source frontend development frameworks, such as Angular, are often found in custom web apps, which allows companies to get their products to market at ever-increasing rates. In addition, companies tend to engage open source when at the cusp of technological innovation, especially when it comes to AR, blockchain, IoT, and AI.

  • Open Source Technology: What's It All About?

    To understand how open source works, it is important to appreciate where it all began. The very idea behind its inception isn’t exactly a new one. It’s been adopted by scientists for decades. Let’s imagine a scientist working on a project to develop a cure for an illness. If this scientist only published the results and kept the methods a secret, this would undoubtedly inhibit scientific discovery and further research in this area. On the other hand, teaming up with other researchers and making results and methodologies visible allows for greater and faster innovation. This is the premise from which open source was originally born. Open source refers to software that has an open source code so it can be viewed, modified for a particular need, and importantly, shared (under license). One of the first well known open source initiatives was developed in 1998 by Netscape, which released its Navigator browser as free software and demonstrated the benefits of taking an open source approach. Since then, there have been a number of pivotal moments in open source history that have shaped the technology industry as we know it today. Nowadays, some of the latest technology you use on a daily basis, like your smartphone or laptop, will have been built using open source software. [...] Recent research found that 60 percent of organizations are already using open source software. Many businesses are realizing the benefits that the technology can bring in relation to driving innovation and reducing costs. This in turn is seeing a growing number of organizations integrate open source into their IT operations or even building entire businesses around it. With emerging technologies such as cloud, AI and machine learning only driving this adoption further, open source will continue to play a central and growing role throughout the technology landscape.

  • How to Take Your Open Source Project from Good to Great

    Whether or not you expect anyone to contribute to your project, you should be prepared for the possibility of others wanting to help your cause. And when that happens, your contributing guide will show those helpers exactly how they can get involved. This guide, usually in the form of a CONTRIBUTING.md file, should include information on how one should submit a pull request or open an issue for your project and what kinds of help you’re looking for (bug fixes, design direction, feature requests, etc.).

  • ForgeRock Delivers Open Source IoT Edge Controller for Device Identity

    According to a recent announcement, ForgeRock, a platform provider of digital identity management solutions, has launched its IoT Edge Controller, which is designed to provide consumer and industrial manufacturers the ability to deliver trusted identity at the device level.

  • Browser Settings Too Complex? Let Firefox Handle That for You

    Firefox SVP David Camp doesn't want internet users wasting time 'understanding how the internet is watching you.'

  • Exclusive: Automattic CEO Matt Mullenweg on what’s next for Tumblr

    It’s been a long and winding road for Tumblr, the blogging site that launched a thousand writing careers. It sold to Yahoo for $1.1 billion in 2013, then withered as Yahoo sold itself to AOL, AOL sold itself to Verizon, and Verizon realized it was a phone company after all. Through all that, the site’s fierce community hung on: it’s still Taylor Swift’s go-to social media platform, and fandoms of all kinds have homes there. Verizon sold Tumblr for a reported $3 million this week, a far cry from the billion-dollar valuation it once had. But to Verizon’s credit, it chose to sell Tumblr to Automattic, the company behind WordPress, the publishing platform that runs some 34 percent of the world’s websites. Automattic CEO Matt Mullenweg thinks the future of Tumblr is bright. He wants the platform to bring back the best of old-school blogging, reinvented for mobile and connected to Tumblr’s still-vibrant community, and he’s retaining all 200 Tumblr employees to build that future. It’s the most exciting vision for Tumblr in years. Matt joined Verge reporter Julia Alexander and me on a special Vergecast interview episode to chat about the deal, how it came together, what Automattic’s plans for Tumblr look like, and whether Tumblr might become an open-source project, like WordPress itself. (“That would be pretty cool,” said Matt.) Oh, and that porn ban.

Apache: Self Assessment and Security

  • The Apache® Software Foundation Announces Annual Report for 2019 Fiscal Year

    The Apache® Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of the annual report for its 2019 fiscal year, which ended 30 April 2019.

  • Open Source at the ASF: A Year in Numbers

    332 active projects, 71 million lines of code changed, 7,000+ committers… The Apache Software Foundation has published its annual report for fiscal 2019. The hub of a sprawling, influential open source community, the ASF remains in rude good health, despite challenges this year including the need for “an outsized amount of effort” dealing with trademark infringements, and “some in the tech industry trying to exploit the goodwill earned by the larger Open Source community.” [...] The ASF names 10 “platinum” sponsors: AWS, Cloudera, Comcast, Facebook, Google, LeaseWeb, Microsoft, the Pineapple Fund, Tencent Cloud, and Verizon Media

  • Apache Software Foundation Is Worth $20 Billion

    Yes, Apache is worth $20 billion by its own valuation of the software it offers for free. But what price can you realistically put on open source code? If you only know the name Apache in connection with the web server then you are missing out on some interesting software. The Apache Software Foundation, ASF, grew out of the Apache HTTP Server project in 1999 with the aim of furthering open source software. It provides a licence, the Apache licence, and decentralized governance, and requires projects to be licensed to the ASF so that it can protect the intellectual property rights.

  • Apache Security Advisories Red Flag Wrong Versions in Patching Gaffe

    Researchers have pinpointed errors in two dozen Apache Struts security advisories, which warn users of vulnerabilities in the popular open-source web app development framework. They say that the security advisories listed incorrect versions impacted by the vulnerabilities. The concern from this research is that security administrators in companies using the actual impacted versions would incorrectly think that their versions weren’t affected – and would thus refrain from applying patches, said researchers with Synopsys who made the discovery, Thursday. “The real question here from this research is whether there remain unpatched versions of the newly disclosed versions in production scenarios,” Tim Mackey, principal security strategist for the Cybersecurity Research Center at Synopsys, told Threatpost. “In all cases, the Struts community had already issued patches for the vulnerabilities so the patches exist, it’s just a question of applying them.”

Google and Android Code

  • Google releases source code for I/O 2019 app with Android Q gesture nav, dark theme

    The Google I/O companion app for Android often takes advantage of the latest design stylings and OS features. It demoed Android Q’s gesture navigation and dark theme this year, with the company today releasing the I/O 2019 source code.

  • Introducing Coil, an open-source Android image loading library backed by Kotlin Coroutines

    Yesterday, Colin White, a Senior Android Engineer at Instacart, introduced Coroutine Image Loader (Coil). It is a fast, lightweight, and modern image loading library for Android backed by Kotlin.

  • Google open-sources Live Transcribe’s speech engine

    Google today open-sourced the speech engine that powers its Android speech recognition transcription tool Live Transcribe. The company hopes doing so will let any developer deliver captions for long-form conversations. The source code is available now on GitHub. Google released Live Transcribe in February. The tool uses machine learning algorithms to turn audio into real-time captions. Unlike Android’s upcoming Live Caption feature, Live Transcribe is a full-screen experience, uses your smartphone’s microphone (or an external microphone), and relies on the Google Cloud Speech API. Live Transcribe can caption real-time spoken words in over 70 languages and dialects. You can also type back into it — Live Transcribe is really a communication tool. The other main difference: Live Transcribe is available on 1.8 billion Android devices. (When Live Caption arrives later this year, it will only work on select Android Q devices.)