Language Selection

English French German Italian Portuguese Spanish

Understanding a Kernel OOPS

Filed under
Linux
HowTos

Understanding a kernel panic and doing forensics to catch the bug is considered a hacker‘s job. This is one of those complex tasks that requires a sound knowledge of the architecture you are working on, and the internals of the Linux kernel. Depending the on the type of error detected by the kernel, panics in the Linux kernel are classified as hard panics (Aiee!) and soft panics (Oops!). This article explains a sample Linux kernel oops, and helps to create a simple oops and debug it. It is mainly intended for developers getting into kernel development, who need to debug the panics that the kernel throws at them. Knowledge of the Linux kernel, and C programming, is assumed.

An oops is what the kernel throws at us when it finds something faulty, or an exception, in the kernel code. Oopses are somewhat like the segfaults of user-space. An oops dumps its message on the console; it contains processor status, and CPU registers when the fault occurred. The offending process which triggered this oops gets killed without releasing locks or cleaning up structures. The system may not even resume its normal operations sometimes; this is called an unstable state. Once an oops has occurred, the system cannot be trusted any further.

Let’s try to generate an oops message with sample code, and try to understand the dump.

Setting up the machine to capture an oops

The running kernel should be compiled with CONFIG_DEBUG_INFO, and syslogd should be running. To generate and understand an oops message, let’s throw together a sample kernel module, oops.c:

rest here




More in Tux Machines

Leftovers: Gaming

Leftovers: Software

today's howtos

ACPI, kernels and contracts with firmware

This ends up being a pain in the neck in the x86 world, but it could be much worse. Way back in 2008 I wrote something about why the Linux kernel reports itself to firmware as "Windows" but refuses to identify itself as Linux. The short version is that "Linux" doesn't actually identify the behaviour of the kernel in a meaningful way. "Linux" doesn't tell you whether the kernel can deal with buffers being passed when the spec says it should be a package. "Linux" doesn't tell you whether the OS knows how to deal with an HPET. "Linux" doesn't tell you whether the OS can reinitialise graphics hardware. Read more