
Adventures In Linux


I just received the first closure of an issue I had with GNOME. The conclusion is that the GNOME project only supports a small subset of functioning Linux kernels. I figured it was time to start documenting all the edge cases I'm encountering by simply rolling my own kernels.

But Why?

Because of a motherboard failure on my main desktop, I was forced onto an old frankenstein of a machine: a Core2Duo rig from 2007. It was rather poky. To ease the slow boot-ups, I spent a little time playing with kernel builds and initramfs stuff. I cut my boot times by half or more.

The Status Quo

The current status quo in Linux land is a little weird to me. Distribution kernels are designed to support the maximum number of potential cases. To achieve this, Linux has developed an oddly split solution where a little bit of the OS is pre-loaded by the bootloader, then that little bit brings up the rest of the system. This early stage goes by two names, initramfs and initrd, which distributions tend to use interchangeably. (Strictly speaking, initrd is the older mechanism based on a ramdisk image, and initramfs is the newer one based on a cpio archive.) Additionally, the Linux kernel supports loadable modules. The practical upshot of all this, simply put, is unnecessary "bloat". It's not like the distributions can really avoid the bloat, especially if they're looking at producing a package that works everywhere.

The more surprising part is that the conversation ends there. Because this is the status quo for kernel builds, very few people ever venture beyond it. The situation is so extreme that the GNOME project does not support Bluetooth on kernels with RFKILL support disabled. All I had to do to make their code work was comment out the part of the logic that disabled Bluetooth support. There is not even any interest in dealing with such a kernel. Patching the software to work right doesn't seem that hard in theory, but working with D-Bus is a challenge until you know how to think in D-Bus.
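To see whether a given kernel is one of these RFKILL-less builds, you can look the symbol up in its .config. The following is a minimal Python sketch, not anything GNOME or the kernel ships: the function name config_value and the sample text are made up for illustration. It assumes you have the config as text (for example the .config in your build tree) and that a symbol missing from the file counts as disabled.

```python
def config_value(config_text: str, symbol: str) -> str:
    """Return the value of a kernel config symbol ('y', 'm', ...) or 'n'.

    Assumption: a symbol that does not appear in the text is treated as
    disabled, the same as an explicit "# CONFIG_FOO is not set" line.
    """
    name = symbol if symbol.startswith("CONFIG_") else "CONFIG_" + symbol
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {name} is not set":
            return "n"
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return "n"


# Hypothetical excerpt of a custom kernel's .config.
SAMPLE_CONFIG = """\
CONFIG_BT=m
# CONFIG_RFKILL is not set
"""

if config_value(SAMPLE_CONFIG, "RFKILL") == "n":
    print("RFKILL is disabled in this kernel config")
```

On a real system you would read the text from your build directory's .config instead of the sample string.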

What Can Be Done

The Linux kernel is really constructed around the idea that people will build their own kernels. It provides a lot of functionality so that the skilled hacker can do surprising things. The problem is that the documentation is not there, or when it is, it's aimed at people who have highly specialized, domain-specific knowledge. Part of the point of this page is to try to put all of this together into a more approachable context for users with less expertise who want to run a highly customized system. Note that part of the fun, at least for me, is discovering these weird corner cases. You will continue to run into these! This morning's issue was documented here: libbpf error on booting a custom kernel.

But what do I get out of it? Blazing fast load times, though these are increasingly irrelevant. Removal of complicated boot-time logic, which is largely irrelevant after boot. Now that I've got my old desktop running again, my kernel is loaded directly by the UEFI firmware. I can add and select new kernels from the firmware's boot menu now. The kernels I've been building are more compact than the initramfs that Arch and Debian use to help bootstrap their bloaty kernels! I also get a new source of weird error messages, all of which are tractable and enlightening to explore.

My Approach

As you might have gathered from all of this, I am a hardline minimalist. I want to minimize the complexity of the kernel I run while retaining as much functionality as I desire. I am not averse to tracking down weird edge-case bugs. Such bugs help me learn things I wouldn't have known otherwise!

Technical Details

Preemption Model

This does not work as advertised on the box. The contemporary situation is very different from how things were when this construct was developed. There are three choices: Server, Desktop, and Low-Latency Desktop. The help text for the Desktop option reads:

This option reduces the latency of the kernel by adding more "explicit preemption points" to the kernel code. These new preemption points have been selected to reduce the maximum latency of rescheduling, providing faster application reactions, at the cost of slightly lower throughput.

This gives you a clearer picture of what's going on when you select this option. The whole setting is about cases where the kernel (and just the kernel) takes longer than expected. These situations are rare!

The proper low-complexity option here is Server. Unfortunately, in a system-taxing scenario like building a large software package, it causes media stutters.
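If you are not sure which of the three models a kernel was built with, the choice is recorded in its .config. Below is a small Python sketch under some assumptions: the function name preemption_model is invented for illustration, and it only knows the three classic symbols (CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, CONFIG_PREEMPT). Newer kernels have additional preemption options that this sketch ignores.

```python
# Map the classic Kconfig symbols to the labels shown in the menu.
MODELS = {
    "CONFIG_PREEMPT_NONE": "Server",
    "CONFIG_PREEMPT_VOLUNTARY": "Desktop",
    "CONFIG_PREEMPT": "Low-Latency Desktop",
}


def preemption_model(config_text: str) -> str:
    """Return the menu label of the preemption model set to 'y', or 'unknown'."""
    enabled = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.endswith("=y"):
            # Keep only the symbol name, e.g. "CONFIG_PREEMPT_NONE".
            enabled.add(line.split("=", 1)[0])
    for symbol, label in MODELS.items():
        if symbol in enabled:
            return label
    return "unknown"


# Hypothetical excerpt of a .config built with the Server model.
SAMPLE_CONFIG = """\
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
"""

print(preemption_model(SAMPLE_CONFIG))
```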

LVM Drive Failure

I recently had an LVM drive failure. The correct way to handle this is apparently to create a new PV with pvcreate, giving it the same UUID as the old one. I didn't do this, to my chagrin.
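For reference, here is roughly what that path looks like. This Python sketch only builds and prints the command lines; it does not run them. The --uuid and --restorefile options of pvcreate and the --file option of vgcfgrestore are real, but everything else is a placeholder assumption: the UUID, the metadata archive path, the device name, and the volume group name are all hypothetical, and the function name is made up for illustration. Check the commands against your own metadata backup before running anything like this.

```python
import shlex


def pv_replacement_commands(uuid, restorefile, new_device, vg_name):
    """Build (but do not run) the commands for recreating a lost PV.

    uuid        -- UUID of the failed PV, as recorded in the LVM metadata
    restorefile -- LVM metadata backup/archive file describing the VG
    new_device  -- replacement block device
    vg_name     -- volume group the PV belonged to
    """
    return [
        # Recreate the PV label on the new disk with the old PV's UUID.
        ["pvcreate", "--uuid", uuid, "--restorefile", restorefile, new_device],
        # Restore the volume group metadata from the same file.
        ["vgcfgrestore", "--file", restorefile, vg_name],
    ]


# All four values below are placeholders, not values from a real system.
commands = pv_replacement_commands(
    uuid="PLACEHOLDER-PV-UUID",
    restorefile="/etc/lvm/backup/vg0",
    new_device="/dev/sdX",
    vg_name="vg0",
)
for argv in commands:
    print(shlex.join(argv))
```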

Additionally, I took my server down improperly: I forgot to deactivate two volumes and then bumped a SATA cable, so two of the five RAID6 drives got marked as failed for those logical volumes. Fortunately, RAID6 tolerates two failed drives. Anyhow, after purging the old disk, I was able to restore the missing info with the lvconvert --repair VG/LV command.

There was some corruption on an XFS volume of mine. xfs_repair took serious persuasion to restore my filesystem. It would hang during the process. Eventually, I found a solution: the -P flag disables prefetching, which can cause the hang I experienced. Even with that flag, xfs_repair failed the first time and had no counsel for me except to rerun it. I reran it and it finally completed successfully.
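The "run it with -P, and run it again if it fails" routine can be written down as a small Python sketch around xfs_repair, the XFS repair tool whose -P flag disables prefetching. The function name, the default of two attempts, and the injectable run parameter are illustrative assumptions; the filesystem must be unmounted first, and the device path in the usage note is a placeholder.

```python
import subprocess


def repair_with_retries(device, attempts=2, run=subprocess.run):
    """Run `xfs_repair -P DEVICE` up to `attempts` times.

    Returns the number of the attempt that exited with status 0,
    or None if every attempt failed. `run` is injectable so the
    logic can be exercised without touching a real disk.
    """
    for attempt in range(1, attempts + 1):
        # -P disables prefetching of inode and directory blocks.
        result = run(["xfs_repair", "-P", device])
        if result.returncode == 0:
            return attempt
    return None
```

On a real system you would call something like repair_with_retries("/dev/vg0/data") as root, with the volume unmounted. The retry count is arbitrary; if the tool keeps failing, rerunning it blindly is not a substitute for reading its output.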