Understanding a Kernel OOPS

Understanding a kernel panic and doing the forensics to catch the bug is considered a hacker's job. It is a complex task that requires sound knowledge of both the architecture you are working on and the internals of the Linux kernel. Depending on the type of error detected by the kernel, panics in the Linux kernel are classified as hard panics (Aiee!) and soft panics (Oops!). This article explains a sample Linux kernel oops, and shows how to create a simple oops and debug it. It is mainly intended for developers getting into kernel development who need to debug the panics that the kernel throws at them. Knowledge of the Linux kernel and of C programming is assumed.

An oops is what the kernel throws at us when it detects a fault, or an exception, in kernel code. Oopses are somewhat like the segfaults of user space. An oops dumps its message on the console; the message contains the processor status and the contents of the CPU registers at the time of the fault. The offending process that triggered the oops gets killed without releasing locks or cleaning up structures. The system may not even resume normal operation afterwards; this is called an unstable state. Once an oops has occurred, the system cannot be trusted any further.
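To make the segfault analogy concrete, here is a minimal user-space sketch (not kernel code, and not taken from the article). It forks a child that writes through a NULL pointer and lets the parent report which signal killed it. The helper names `trigger_fault` and `fault_signal_in_child` are hypothetical, chosen only for this illustration.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write through a NULL pointer to force a fault.
 * The pointer variable is volatile so the compiler cannot prove it is
 * NULL and optimise the store away. */
static void trigger_fault(void)
{
    volatile int *volatile p = NULL;
    *p = 0;
}

/* Hypothetical helper: fork a child that faults, then return the number
 * of the signal that killed it, 0 if it exited normally, -1 on error. */
static int fault_signal_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        trigger_fault();
        _exit(0);               /* reached only if no fault occurred */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system, calling `fault_signal_in_child()` should return `SIGSEGV`: only the faulting child dies, and the parent carries on. The kernel has no such outer supervisor for its own code, which is one way to see why the state after an oops cannot be trusted.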

Let’s try to generate an oops message with sample code, and try to understand the dump.

Setting up the machine to capture an oops

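The running kernel should be compiled with CONFIG_DEBUG_INFO enabled, so that debugging symbols are available when analysing the dump, and syslogd should be running so that the oops message is logged. To generate and understand an oops message, let's throw together a sample kernel module, oops.c: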

rest here