Manuel Stoeckl and Eric Anholt on Graphics
Date: 2019-07-24
Diff selection, AArch64 Pi4, caching
The result of all the diff construction testing and sample implementations ... is an unchanged diff format. In my opinion, the scenario that most needs optimization is when a small change is made to a large buffer, and detailed damage tracking is unavailable. Then the main source of delay in the program is the time needed to scan through the unchanged portion of the buffer. On the other hand, when most of the buffer has changed, the data transfer time (and any compression/decompression stages) will take enough time that a 5% increase in diff runtime will be hidden by the other operations. In essence, it's better to optimize the diff for text input than for games. Based on the target scenario, the bitset format was ruled out, because a 1/64th control data overhead is huge when only 1/1000th of the buffer changed, and the time needed to convert the bitset to another format added too much slowdown, even in the case where no data changed. The split variation on the standard diff format was also discarded, as it required a bit more complexity to manage two buffers, while not significantly improving performance.
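To put the bitset overhead in perspective, here is a rough back-of-the-envelope check that uses only the two fractions quoted above. The 4 MiB buffer size is an assumption picked for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

CONTROL_OVERHEAD = Fraction(1, 64)    # bitset control data per buffer byte, per the text
CHANGED_FRACTION = Fraction(1, 1000)  # share of the buffer that actually changed


def bitset_cost(buffer_bytes):
    """Return (control bytes, changed bytes, control/changed ratio)."""
    control = buffer_bytes * CONTROL_OVERHEAD
    payload = buffer_bytes * CHANGED_FRACTION
    return control, payload, control / payload


# Assumed example buffer: 4 MiB.
control, payload, ratio = bitset_cost(4 * 1024 * 1024)
print(f"control data: {float(control):.0f} bytes")
print(f"changed data: {float(payload):.0f} bytes")
print(f"control/changed ratio: {float(ratio):.3f}")
```

The ratio does not depend on the buffer size: under these two fractions the control data is 1000/64, or roughly 15.6, times larger than the data that changed.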
A key optimization used by the standard diff method is "windowing": small unchanged gaps in the data stream are still copied into the diff. This speeds up diff application by reducing both the number of chunks that must be memcpy'd and the number of branch mispredictions, and it limits how often the diff construction routine must switch between copying data and not copying data. The maximum size of an unchanged gap that is still copied is kept relatively small, to minimize both the total diff size and the total amount of data written. (It's currently at 256 bytes, and can't go any lower than 64 bytes without breaking a key optimization for the SIMD diff routines.)
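The windowing idea can be illustrated with a minimal sketch. This is not the actual diff routine or diff format discussed above: the `(offset, data)` chunk layout, the function names, and the byte-by-byte comparison are all assumptions made for illustration. Only the 256-byte default window comes from the text.

```python
def windowed_diff(old, new, window=256):
    """Return a list of (offset, data) chunks covering every changed byte.

    Changed bytes separated by an unchanged gap of at most `window` bytes
    end up in the same chunk, so the gap itself is copied into the diff.
    """
    if len(old) != len(new):
        raise ValueError("buffers must have the same length")
    chunks = []
    start = None      # start offset of the chunk currently being built
    last_diff = None  # offset of the most recent differing byte
    for i, (a, b) in enumerate(zip(old, new)):
        if a == b:
            continue
        if start is None:
            start = i
        elif i - last_diff - 1 > window:
            # The unchanged gap is too large: close the current chunk.
            chunks.append((start, bytes(new[start:last_diff + 1])))
            start = i
        last_diff = i
    if start is not None:
        chunks.append((start, bytes(new[start:last_diff + 1])))
    return chunks


def apply_diff(base, chunks):
    """Apply (offset, data) chunks to a copy of `base`."""
    out = bytearray(base)
    for offset, data in chunks:
        out[offset:offset + len(data)] = data
    return bytes(out)


# Hypothetical example: three changed bytes in a 1000-byte buffer.
old = bytes(1000)
new = bytearray(old)
new[10], new[100], new[600] = 1, 2, 3

merged = windowed_diff(old, new)               # default 256-byte window
unmerged = windowed_diff(old, new, window=0)   # no windowing
```

In this example the 89-byte gap between offsets 10 and 100 is within the window, so those two changes share one chunk, while the 499-byte gap before offset 600 starts a new one. With `window=0` each changed byte gets its own chunk, which means more chunks to copy when the diff is applied.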
Broadcom's VC4/V3D Driver Developer Parts Ways To Join Google
Eric Anholt, who has been nearly single-handedly developing the V3D driver stack (formerly known as "VC5") for use by the Raspberry Pi 4 and other newer Broadcom boards, as well as maintaining the mature VC4 driver stack he developed for previous Raspberry Pi boards, has left Broadcom. But Broadcom's loss is Google's open-source gain.
Eric Anholt had been working for Broadcom for the past five years, first on the VC4 driver stack (a Mesa Gallium3D driver paired with an in-kernel DRM/KMS driver) and more recently on the V3D driver stack, which has been mainline in Mesa and the Linux kernel for months. The V3D driver stack is now used most notably by the recently launched Raspberry Pi 4.
Eric Anholt: Raspberry Pi 4, moving on
Recently the Raspberry Pi Foundation released the Raspberry Pi 4, which shipped with the V3D driver I wrote as its GLES driver.
I’m pretty proud of the work I did on the project. I was a solo developer building a GLES3 graphics driver based on Mesa, splitting my time between the new V3D and maintaining VC4, while also fixing issues in the X server and building a kernel driver. I didn’t finish everything (the hardware should be able to do GLES 3.2, and I almost made it to CTS-complete on 3.1 before shipping), but I feel like this is clear proof of how productive graphics driver developers can be working on the Mesa stack.
