Linux KMS testing improvements

Graphics support in Linux has come a long way. When the Direct Rendering Manager (DRM) framework was first introduced, its main purpose was to allow concurrent submission of rendering jobs to a GPU.

Since then, it has gained:

  • A display API and framework, Kernel Mode-Setting (KMS);
  • Memory management capabilities through Translation Table Maps (TTM) and Graphics Execution Manager (GEM);
  • An atomic display API.

In doing so, it also deprecated earlier mechanisms such as fbdev or X.org User Mode-Setting, offering a single framework to deal with GPUs.

On embedded systems, and ARM in particular, the last decade has been even better. We started it with no upstream GPU drivers and with competing display APIs, some of them vendor or project specific. We now have upstream GPU drivers for all families, sometimes with direct vendor involvement. We are starting to get upstream NPU support. The competing display interfaces are no longer relevant, and all users now target KMS. Even the small, secondary displays that projects such as fbtft used to support now have KMS drivers, and analog display outputs got some love too.

Linux 3.18, released 10 years ago, had fewer than 30 DRM drivers. Linux 6.13 has 85.

That growth was sustainable thanks to a policy of aggressively reusing code between drivers and documenting things. It also brought a few challenges that I feel we need to discuss, and eventually fix.

Indeed, it looks to me like we have two categories of drivers, roughly based on contributor demographics. On one side, there are the “big” well-maintained drivers, with entire teams working almost only on that driver. In that category, we can find the AMD, Intel and nouveau drivers, and to a lesser extent msm (Qualcomm) and vc4 (Raspberry Pi). On the other side, we have drivers maintained by people who are doing it as a part-time task.

Due to the current organisation of most teams dedicated to contributing support for embedded-ish platforms, the drivers for these platforms tend to fall into the latter category.

As I’ve alluded to before, it’s great that this category exists in the first place. But KMS is a pretty large framework and, despite the great documentation, it has its corner cases, obscure features, and undocumented behaviour.

I’m confident most of these will gradually reduce over time, but until then, it creates subtle issues that only the experience of the people maintaining that driver can solve. And if you only have a couple of contributors maintaining a driver on the side, that experience isn’t going to accumulate and spread as much as we’d hope.

There are two possible answers to that:

  • Wishing that the industry was different and we had large and committed teams to contribute hardware support to Linux;
  • Or reducing the experience needed for a contributor to write a reasonably functional driver.

I believe the latter to be the more practical solution. And while we have done well already, we can do much better, and improve the general quality of the KMS drivers along the way.

Lowering the bar

The complexity of a KMS driver, compared to other Linux drivers, comes from different factors:

  • Display is a complicated field, even more so if you want it to be fast;
  • Hardware is hard, and there are many design variations;
  • The framework is massive;
  • ARM systems typically run a single workload, e.g. Android, which will exercise a single use-case and code path;
  • And thus, it’s difficult to discover and exercise all the features.

The first ones are constraints more than they are problems. We will not be able to solve them, but we can mitigate them. The last one is solvable.

And we can do so through a few complementary efforts.

Improving the Common Infrastructure

Historically speaking, KMS started on Intel’s i915 driver. The infrastructure was non-existent then. As we added new drivers, we added helpers to deal with the boilerplate needed to create a driver. I believe this part is working well nowadays, and we can have a functional driver with around 200-300 lines of code if the hardware is simple enough.

But we kind of stopped there.

If you want to make a somewhat modern HDMI driver for example, then you need to deal with things like scrambling, InfoFrames, YUV or hot-plug in your driver. The specification is large enough that it creates many variations, subtle bugs, unhandled cases, etc. in every driver. For example, i915 was using a given algorithm to select its output format, vc4 was using a slightly different one, and most other HDMI drivers have their own logic.

It’s hard to get right for a driver author, even an experienced one, and it creates inconsistencies across systems for user-space. And it’s a shame, because most of the behaviour comes from either the HDMI specification or the KMS API/ABI. Either way, both are stable, predictable and driver-agnostic.
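To make the kind of logic involved more concrete, here is a minimal sketch of an output format selection for RGB only. It is not the algorithm used by i915, vc4 or any other driver: the structure and function names are hypothetical, and a real implementation would also have to consider YUV formats, sink capabilities from the EDID, and user-space properties. The only fact it relies on is that, for RGB 4:4:4, the TMDS character rate is the pixel clock scaled by bpc / 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what the source and the sink can both handle. */
struct hdmi_limits {
	uint64_t max_tmds_char_rate_hz; /* lowest of the source and sink limits */
	unsigned int max_bpc;           /* highest bits per component allowed */
};

/* Hypothetical result of the selection. */
struct hdmi_output {
	unsigned int bpc;
	uint64_t tmds_char_rate_hz;
};

/* RGB 4:4:4 TMDS character rate: pixel clock scaled by bpc / 8. */
uint64_t rgb_tmds_char_rate(uint64_t pixel_clock_hz, unsigned int bpc)
{
	return pixel_clock_hz * bpc / 8;
}

/*
 * Try RGB from the deepest colour depth down to 8 bpc and keep the first
 * configuration whose TMDS character rate fits within the limit.
 * Returns false if even 8 bpc does not fit.
 */
bool hdmi_pick_rgb_output(const struct hdmi_limits *limits,
			  uint64_t pixel_clock_hz,
			  struct hdmi_output *out)
{
	static const unsigned int candidates[] = { 12, 10, 8 };

	assert(limits != NULL && out != NULL);

	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
		unsigned int bpc = candidates[i];
		uint64_t rate;

		/* Skip depths above what the source and sink allow. */
		if (bpc > limits->max_bpc)
			continue;

		rate = rgb_tmds_char_rate(pixel_clock_hz, bpc);
		if (rate <= limits->max_tmds_char_rate_hz) {
			out->bpc = bpc;
			out->tmds_char_rate_hz = rate;
			return true;
		}
	}

	return false;
}
```

With a 340 MHz limit and a 12 bpc cap, a 148.5 MHz mode would come out at 12 bpc, while a 297 MHz mode would fall back to 8 bpc. Even in this reduced form, the order in which candidates are tried is a policy decision, which is exactly where drivers tend to diverge.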

I started to work on a set of HDMI helpers to make it easier to write an HDMI driver. I initially focused on HDMI 1.4 video only, and used the occasion to provide tons of unit tests to make sure it works, and keeps working, as we expect. Since then, Dmitry Baryshkov from Linaro has been working to extend that interface to more drivers, and to add audio and CEC support.

HDMI is just the beginning: DisplayPort, and to a lesser extent MIPI-DSI, are in a similar situation and would benefit from such helpers.

Providing Hints

As we already alluded to, KMS relies heavily on helpers to keep the drivers boilerplate (and bug!) free. As the framework grew, we often gained a few variants of those helpers with subtle differences. We try to keep the documentation updated, but deprecated variants are not always documented as such. Similarly, we have identified some dark patterns over the years, which are not really documented anywhere or, worse, not known by everyone. Like any other large, old enough code base, we have a large number of drivers that are still using deprecated helpers or patterns.

As such, figuring out the best approach when writing a new driver can be difficult, and we cannot expect reviewers to know all those issues by heart and catch them every time.

This is typically done in other projects and languages by using a set of lints that authors, reviewers and CI can run to make sure new patches follow a set of project-specific conventions. Unfortunately, Linux doesn’t really use a linter (officially, anyway). Coccinelle could fit that bill though, and I believe providing a set of Coccinelle scripts to match known dark patterns and deprecated functions would be a great addition.

Improving Testing

DRM and KMS are arguably well tested already, even more so compared to the rest of the Linux kernel. We have unit tests, integration tests, and a test suite for the user-space API. We also have CI farms making sure these tests run.

But none of these test suites makes it any easier for someone to write a new driver, or fix an existing one. The unit tests and integration tests often target the framework itself and its helpers. They prove that the functions the driver will rely on are working well, but we don’t know whether the driver itself behaves properly.

igt-gpu-tools (IGT), the user-space test suite I was mentioning earlier, could play that role. But it’s a pretty large test suite: there are tens of thousands of tests. To make things worse, not all tests are generic, and there’s no canonical list of tests that should pass on any given driver. Thus, figuring out the tests to run is a full-time job by itself. And so far, with the exception of Intel’s drivers, its main use is regression testing, i.e., making sure the driver keeps working as it used to, not making sure it works correctly.

Compliance Testing

The first helpful test improvement would be to work on a compliance test suite. By that, I mean a test suite that, instead of checking the relative quality of a driver like regression testing does, will test its absolute quality. In other words, a test suite that assesses whether a driver behaves like we expect a Linux KMS driver to behave.

It’s nothing new: most standards have these. Khronos has some for OpenGL and Vulkan, and the v4l2 media framework has a v4l2-compliance tool that must be run for any new driver.

The benefits such a tool would offer are twofold: it allows a driver author to easily test their new driver for any bug or mistake, and it allows end users to quickly assess if the driver will work.

Now, we first have to agree on how a KMS driver has to behave. We never had this kind of discussion and agreement, and while we mostly agree, that “mostly” is load-bearing. After discussing it with different people, it seems reasonable to start by looking at the code path used by an established user such as Mutter, and start writing a test suite for it. That test suite should be part of IGT, and the best driver to start that effort would be VKMS. Once we’re there, we can then build on top of that test suite, expand the testing to other features and APIs, and make more drivers compliant.

Output Testing

Another thing worth testing is the actual output. IGT, unit tests and regression tests only ever test the software side of things. Drivers are the interface between the hardware and the software, so how the hardware behaves is important.

In other words, you could have a perfect driver, validated by every test I described, and yet if it doesn’t program the hardware correctly, it’s not going to work.

Some bugs are also pretty difficult to notice. I’ve had bugs where the HDMI output was offset by one pixel to the right because of an internal timing bug. Or another one where some frames would end up displayed out of order.
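A one-pixel offset is invisible to the eye but easy to catch once the output can be captured and compared to the frame that was submitted. The following is a rough sketch of such a check, not code from IGT or any existing tool: the function name is hypothetical, and it assumes both frames are 8-bit single-channel buffers of the same size, which a real capture pipeline would not give you for free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the horizontal offset, in pixels, that best aligns the captured
 * frame with the reference frame. A positive value means the captured
 * content is shifted to the right. Offsets in [-max_shift, max_shift] are
 * tried and the one with the smallest sum of absolute differences wins,
 * with ties resolved in favour of the smallest offset.
 */
int detect_horizontal_offset(const uint8_t *ref, const uint8_t *cap,
			     size_t width, size_t height, int max_shift)
{
	uint64_t best_sad = UINT64_MAX;
	int best_shift = 0;

	assert(ref != NULL && cap != NULL);
	assert(max_shift >= 0 && (size_t)(2 * max_shift) < width);

	/* Try 0 first, then -1, +1, -2, +2, and so on. */
	for (int mag = 0; mag <= max_shift; mag++) {
		for (int sign = -1; sign <= 1; sign += 2) {
			int shift = sign * mag;
			uint64_t sad = 0;

			/*
			 * Only compare columns that stay inside the frame for
			 * every candidate, so each offset is scored over the
			 * same number of pixels.
			 */
			for (size_t y = 0; y < height; y++) {
				for (size_t x = (size_t)max_shift;
				     x < width - (size_t)max_shift; x++) {
					size_t cx = (size_t)((long)x + shift);
					int d = (int)ref[y * width + x] -
						(int)cap[y * width + cx];

					sad += (uint64_t)(d < 0 ? -d : d);
				}
			}

			if (sad < best_sad) {
				best_sad = sad;
				best_shift = shift;
			}
		}
	}

	return best_shift;
}
```

A test would display a non-repeating pattern, capture it, and fail if the returned offset is anything other than zero. Flat or periodic patterns would make several offsets score equally, so the choice of test image matters as much as the comparison itself.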

This is pretty difficult to test, since we need dedicated hardware to do so. A device that has been out for a while is Google’s Chamelium, and IGT has support for it. It has a few shortcomings though. First, it’s expensive and pretty difficult to get. It’s also based on an FPGA, so extending its capabilities isn’t the easiest for people with a CS background.

We have other options though. One is the Auvidea B102 HDMI to MIPI-CSI bridge. It allows any system with a MIPI-CSI controller to capture an HDMI input, while controlling the EDIDs, hot-plug, InfoFrames, and other important features. MIPI-CSI is pretty ubiquitous these days on embedded platforms, so I’ve been able to build a prototype based on a Raspberry Pi 4. It’s entirely based on Linux, and thus on drivers, interfaces, languages and tools Linux developers are familiar with. Its main limitation is its maximum resolution: it can only capture up to 1080p at 60Hz.

Another candidate is the Rockchip RK3588, which has an internal HDMI capture controller that can capture 4k streams. Collabora is working on a driver for that controller, so this is also a promising platform to examine and base our work on.

Conclusion

Any of these items would be valuable additions to KMS already. But if we want to make KMS grow into a more robust, consistent framework for all our drivers, we’ll need them all.

Some of them are already a work in progress, but we’ll need as many people as possible to cross the finish line.