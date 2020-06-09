AMD has shared with us that they have published a video to explain in basic terms for the audience at large "What is ROCm?", a.k.a. the Radeon Open Compute stack. The video is arguably long overdue with ROCm being several years old, but it has been evolving a lot lately with new features and capabilities for better taking on the likes of NVIDIA CUDA and Intel oneAPI. With AMD increasingly securing super-computing wins, they have also been ramping up their efforts on this standards-based GPU compute stack.

In the kernel graphics world, there has been a longstanding "line in the sand" that disallows merging kernel drivers without a corresponding free-software user-space driver. The idea is that not having a way to test the full functionality means that the kernel developers cannot verify the proper functioning and security of the driver; changes to the kernel driver may lead to unforeseen (and untestable) problems on the user-space side. More recently, though, we have seen other types of devices with complex drivers, but no useful free user-space piece, that have been proposed for inclusion into the kernel; at least one was merged, but the tide has perhaps turned against those types of drivers at this point—or some of them, anyway. In mid-May, Jeffrey Hugo posted an RFC patch for the "Qualcomm Cloud AI 100" device, which is a PCIe card with an application-specific integrated circuit (ASIC) that targets "deep learning" workloads. The device is also referred to as a QAIC device; it presents a modem host interface (MHI) control path and a DMA engine for the data path. These are exposed in the driver as a Linux character device with ioctl() commands to access the data path.

There’s not much in terms of software apart from an Arduino sketch initializing Ethernet and connecting to Baidu. I’m not sure why the 6-pin programming interface is needed, but a separate CH340C-based “Downloader” board with Micro USB and USB-C ports is sold as an option with the board. ESP32 should be programmable via the USB-C port unless the specs are wrong, and there isn’t any on-board CP2104 chip… The board photos are not clear enough to confirm…

The Orange Pi 4 SBC is one of the most cost-effective Rockchip RK3399 SBCs, as it sells for as low as $50 with 4GB RAM, Gigabit Ethernet, WiFi, Bluetooth, HDMI 2.0 output, etc. The board also comes with a 24-pin PCIe connector that’s not of much use on its own, so the company introduced a $4 PCIe adapter board providing access to a standard mPCIe socket and a SIM card slot so you could install your own card. 4G mini PCIe cards can easily cost around $50 or more, but Shenzhen Xunlong Software has now launched its own 4G LTE mini PCIe card based on Rockchip RM310 module and sold for $16 on Aliexpress, excluding shipping.

Martin Peres, who is known for his decade-plus in the X.Org community, his longstanding work on the open-source Nouveau driver, and in recent years his work on Intel's open-source graphics driver team, has been brewing a new hobby project around generic open-source Linux drivers for FPGAs. This week Peres wrote a blog post giving his personal opinions on why there are so few open-source drivers for FPGAs and open hardware, especially when it comes to upstream support.

“Almost all” turned out to be a key phrase. Since that post, we discovered a new antenna issue outside of the GNSS one we reported before, along with a microphone regression (in both cases something we weren’t expecting, but that were related to the new PCB design). This set us back a couple of weeks as we dove into troubleshooting these unexpected issues. Now though, we have firm ship dates. We will manufacture all Dogwood phones this week and next, begin individual order packaging and fulfillment immediately with first shipments going out the first week of July. [...] As far as Evergreen and Librem 5 USA shipping dates go, while there are parts of that process that are running in parallel to Dogwood, there are other parts (such as moulds and FCC/CE testing on the final mass-produced PCB) which must wait until after the final Dogwood phones have arrived and have been thoroughly evaluated. Before we commit to a revised shipping date for Evergreen and Librem 5 USA, we’d like a few more weeks to complete the evaluation of the final Dogwood phones.

Linux Kernel: Development Statistics for 5.7, FSGSBASE, Schedulers and Cgroup v2 in UEK5

Development statistics for the 5.7 kernel

The 5.7 kernel was released on May 31. By all appearances this was a normal development cycle, unaffected by the troubles in the wider world. Still, there are things to be learned by looking at where the code came from this time around. Read on for LWN's traditional look at who contributed to 5.7, who supported that work, and the paths by which it got into the mainline. Work on 5.7 arrived in the form of 13,901 non-merge changesets contributed by 1,878 developers; that makes it rather busier than the 5.6 cycle was. It's notable that 281 of those developers made their first contribution to the kernel for 5.7, the highest number since 5.0; that is a distinct contrast from 5.6, which saw the lowest number of new contributors since 2013. Perhaps being made to stay at home has inspired more people to put together and send in that first kernel patch.

A possible end to the FSGSBASE saga

The FSGSBASE patch series is up to its thirteenth version as of late May. It enables some "new" instructions for the x86 architecture, opening the way for a number of significant performance improvements. One might think that such a patch series would be a shoo-in, but FSGSBASE has had a troubled history; meanwhile, the delays in getting it merged may have led to a number of users installing root holes on their Linux systems in the hope of improving security.

"Segments" are a holdover from ancient versions of the x86 architecture; they once were distinct regions of memory used to get around the addressing limitations of that era. Virtual memory has done away with the need for segments, but the concept persists; x86_64 processors only implement two of the original segments (called "FS" and "GS"). In these processors, a "segment" is really just an offset into virtual memory with little other meaning; their remaining value comes from the segment-based addressing mode supported by the CPU.

Historic or not, these segment registers are still used. A common use for FS in user space is thread-local storage; each thread has a unique value of the FS base register pointing to its own storage area. Code running in threads can then use segment-based addressing to access local storage without having to worry about where that storage is. The kernel, instead, uses GS in a similar way for per-CPU data. There are some relics of the kernel's one-time use of FS to indicate the address range accessible to user space, but the kernel's get_fs() and set_fs() functions no longer use that segment.

Modifying the segment registers has always been a privileged operation. There is value, though, in letting user space make use of the FS and GS base registers, so the kernel provides that functionality via the arch_prctl() system call. Since the base registers are actually set by the kernel, privileged code can count on knowing what their contents will be (and that said contents make sense).
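To make the arch_prctl() path a little more concrete, here is a minimal Python sketch that asks the kernel for the calling thread's FS base. It is an illustration rather than anything taken from the patch series: the helper name read_fs_base() is hypothetical, the numeric constants are the x86_64 values for the arch_prctl system call and its ARCH_GET_FS operation, and the function simply returns None on any other platform or if the call fails.

```python
import ctypes
import platform
import sys

# x86_64 Linux ABI values: the arch_prctl system call number and the
# ARCH_GET_FS operation code from <asm/prctl.h>.
SYS_ARCH_PRCTL = 158
ARCH_GET_FS = 0x1003


def read_fs_base():
    """Return the calling thread's FS base address, or None if unavailable.

    Hypothetical helper for illustration; it only attempts the call on
    64-bit x86 Linux, where the constants above are valid.
    """
    on_x86_64_linux = (
        sys.platform.startswith("linux")
        and platform.machine() == "x86_64"
        and sys.maxsize > 2**32
    )
    if not on_x86_64_linux:
        return None
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        syscall = libc.syscall
    except (OSError, AttributeError):
        return None
    syscall.restype = ctypes.c_long
    base = ctypes.c_ulong(0)
    # The kernel writes the FS base into 'base' on success.
    rc = syscall(
        ctypes.c_long(SYS_ARCH_PRCTL),
        ctypes.c_long(ARCH_GET_FS),
        ctypes.byref(base),
    )
    return base.value if rc == 0 else None


if __name__ == "__main__":
    print(read_fs_base())
```

Calling read_fs_base() from two different threads would normally report two different addresses, which is the per-thread storage pointer described above. Every such query is a round trip into the kernel.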

Capacity awareness for the deadline scheduler

The Linux deadline scheduler supports realtime systems where applications need to be sure of getting their work done within a specific period of time. It allocates CPU time to deadline tasks in such a way as to ensure that each task's specific timing constraints are met. However, the current implementation does not work well on asymmetric CPU configurations like Arm's big.LITTLE. Dietmar Eggemann recently posted a patch set to address this problem by adding the notion of CPU capacity to the deadline scheduler.

In realtime systems, tasks need to meet certain timing requirements. The Linux kernel includes two realtime scheduling classes to meet the needs of these systems: POSIX realtime (often called just "realtime") and deadline. The POSIX realtime scheduler uses task priorities as the basis of its decisions; the task with the highest priority will be run first. The deadline scheduler, instead, dispenses with priorities and describes tasks using three parameters: the run time, period, and deadline. The run time is the CPU time that the task requires to finish its immediate work, the period defines the time between two activations of the task, and the deadline is the time by which the task must be able to use its CPU time. Interested readers can find more explanation of the theory behind the Linux realtime schedulers and the differences between them in an earlier article.
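The three parameters, and the reason a slower core matters, can be sketched in a few lines of Python. This is an illustrative model only: the DeadlineTask class, the fits_capacity() rule, and the 0-to-1024 capacity scale are assumptions made for the example and are not quoted from the patch set.

```python
from dataclasses import dataclass

# Assumed scale for the example: the fastest CPU has capacity 1024 and
# slower CPUs have proportionally smaller values.
CAPACITY_SCALE = 1024


@dataclass(frozen=True)
class DeadlineTask:
    """The three deadline-scheduler parameters, in nanoseconds."""
    runtime_ns: int   # CPU time needed per activation
    deadline_ns: int  # time by which that CPU time must have been available
    period_ns: int    # time between two activations

    def utilization(self) -> float:
        """Fraction of a full-speed CPU the task needs over the long run."""
        return self.runtime_ns / self.period_ns


def fits_capacity(task: DeadlineTask, cpu_capacity: int) -> bool:
    """Hypothetical fit test: can a CPU of this capacity deliver the run
    time before the deadline? A CPU at half capacity is modelled as doing
    half as much work in the same wall-clock time."""
    return task.runtime_ns * CAPACITY_SCALE <= task.deadline_ns * cpu_capacity


if __name__ == "__main__":
    task = DeadlineTask(runtime_ns=8_000_000, deadline_ns=10_000_000,
                        period_ns=20_000_000)
    print(task.utilization())
    print(fits_capacity(task, 1024), fits_capacity(task, 446))
```

Under these assumptions, a task that needs 8 ms of work within a 10 ms deadline fits on a full-capacity core but not on one with less than half that capacity, even though its long-run utilization is only 40%. A scheduler that ignores capacity cannot see that difference.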

Cgroup v2 Checkpoint

With the release of UEK5 in 2018, Oracle embarked on the long journey to fully transition to cgroup v2. UEK6 is the latest major milestone on the path to this significant upgrade. In UEK5, we added the cpu, cpuset, io, memory, pids, and rdma cgroup v2 controllers. While no new controllers were added for UEK6, emphasis was placed on reliability, usability, and security. Furthermore, we continue to focus on defining and implementing a holistic solution that once adopted by applications will allow them to seamlessly operate on a cgroup-v1 system or a cgroup-v2 system. [...]

Cgroup v1 was a jack-of-all-trades and master-of-none solution. It provided the user with tremendous flexibility and a myriad of configuration options. This came at the cost of complexity, performance, and (at least within the kernel code itself) maintainability. In practice most users only utilized cgroup v1 in a couple different fashions, yet the kernel still needed to support the possibility of the many, many other quirky and now nonstandard v1 configurations. With cgroup v2, these nonstandard and unintuitive usages were removed, and a much more streamlined hierarchy was established.
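An application that wants to run on either kind of system first has to find out which one it is on. The Python sketch below shows one way that check might look, and it is not Oracle's solution: it relies on the cgroup.controllers file that a cgroup v2 hierarchy exposes at its root, assumes the conventional /sys/fs/cgroup mount point, and uses hypothetical helper names.

```python
from pathlib import Path

# Assumed mount point; distributions conventionally mount cgroups here.
DEFAULT_CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_controllers(text):
    """Split the space-separated contents of cgroup.controllers."""
    return text.split()


def cgroup_v2_controllers(root=DEFAULT_CGROUP_ROOT):
    """Return the v2 controllers available at 'root', or None when no
    cgroup v2 hierarchy is found there (for example on a v1-only system)."""
    try:
        return parse_controllers((Path(root) / "cgroup.controllers").read_text())
    except OSError:
        return None


if __name__ == "__main__":
    controllers = cgroup_v2_controllers()
    if controllers is None:
        print("no cgroup v2 hierarchy at", DEFAULT_CGROUP_ROOT)
    else:
        print("cgroup v2 controllers:", ", ".join(controllers))
```

On a system with the controllers listed above enabled, the returned list would be expected to include names such as cpu, cpuset, io, memory, pids, and rdma. A None result tells the application to fall back to its v1 code path.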