Add the promised "Making the most out of a kernel panic" section based
on one of Bill Paul's postings on -current.

Hope I didn't screw up too badly since this is my first contact with
SGML apart from HTML :)

Encouraged-By:	jkh
Dag-Erling Smørgrav 1998-09-22 22:09:54 +00:00
parent cee8640c93
commit 43688d005c
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=3534


@@ -1,4 +1,4 @@
<!-- $Id: hackers.sgml,v 1.5 1998-09-06 10:54:05 wosch Exp $ -->
<!-- $Id: hackers.sgml,v 1.6 1998-09-22 22:09:54 des Exp $ -->
<!-- The FreeBSD Documentation Project -->
<sect>
@@ -288,4 +288,182 @@
<p>Kirk McKusick, September 1998</p>
<sect1>
<heading>Making the most of a kernel panic</heading>
<p>
<em>[This section was extracted from a mail written by <url
url="mailto:wpaul@FreeBSD.ORG" name="Bill Paul"> on the
freebsd-current <ref id="mailing" name="mailing list"> by <url
url="mailto:des@FreeBSD.ORG" name="Dag-Erling Co&iuml;dan
Sm&oslash;rgrav">, who fixed a few typos and added the bracketed
comments]</em>
<p>
<verb>
From: Bill Paul &lt;wpaul@skynet.ctr.columbia.edu&gt;
Subject: Re: the fs fun never stops
To: ben@rosengart.com
Date: Sun, 20 Sep 1998 15:22:50 -0400 (EDT)
Cc: current@FreeBSD.ORG
</verb>
<p>
<em>[&lt;ben@rosengart.com&gt; posted the following panic
message]</em>
<verb>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0x40
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xf014a7e5
                            ^^^^^^^^^^
> stack pointer = 0x10:0xf4ed6f24
> frame pointer = 0x10:0xf4ed6f28
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 80 (mount)
> interrupt mask =
> trap number = 12
> panic: page fault
</verb>
<p> [When] you see a message like this, it's not enough to just
reproduce it and send it in. The instruction pointer value that
I highlighted up there is important; unfortunately, it's also
configuration dependent. In other words, the value varies
depending on the exact kernel image that you're using. If you're
using a GENERIC kernel image from one of the snapshots, then
it's possible for somebody else to track down the offending
function, but if you're running a custom kernel then only
<em/you/ can tell us where the fault occurred.
<p> What you should do is this:
<itemize>
<item>Write down the instruction pointer value. Note that the
<tt/0x8:/ part at the beginning is not significant in this case:
it's the <tt/0xf0xxxxxx/ part that we want.
<item>When the system reboots, do the following:
<verb>
% nm /kernel.that.caused.the.panic | grep f0xxxxxx
</verb>
where <tt/f0xxxxxx/ is the instruction pointer value. The
odds are you will not get an exact match since the symbols
in the kernel symbol table are for the entry points of
functions and the instruction pointer address will be
somewhere inside a function, not at the start. If you don't
get an exact match, omit the last digit from the instruction
pointer value and try again, i.e.:
<verb>
% nm /kernel.that.caused.the.panic | grep f0xxxxx
</verb>
If that doesn't yield any results, chop off another digit.
Repeat until you get some sort of output. The result will be
a possible list of functions which caused the panic. This is
a less than exact mechanism for tracking down the point of
failure, but it's better than nothing.
</itemize>
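<p> The digit-chopping trick works because <tt/nm(1)/ lists only the
start address of each function. A slightly more precise version of
the same idea is to pick the symbol with the highest address that is
still at or below the instruction pointer. The following is a minimal
<tt/sh(1)/ sketch of that variant, not part of the procedure described
above. <tt/ksym/ is a hypothetical helper name. The sketch assumes
that <tt/nm/ prints lines of the form <tt/address type name/ with
lower-case hexadecimal addresses, and that the address you pass is no
wider than the ones <tt/nm/ prints.
<verb>
#!/bin/sh
# ksym ADDRESS
# Reads nm(1) output on standard input and prints the symbol with the
# highest start address that is still at or below ADDRESS, i.e. the
# function that most likely contains ADDRESS.  "ksym" is a hypothetical
# helper name, not a FreeBSD utility.
ksym() {
    # Drop an optional 0x prefix and use lower case, as nm prints it.
    want=$(printf '%s\n' "$1" | sed 's/^0[xX]//' | tr 'ABCDEF' 'abcdef')
    awk -v want="$want" '
        # Undefined symbols have no address column: skip them.
        NF != 3 { next }
        {
            # Zero-pad ADDRESS to the width nm uses, so that comparing
            # the hex digits as strings orders them like numbers.
            w = want
            while (length(w) < length($1)) w = "0" w
            addr = $1 ""
            if (addr <= w && addr > best) { best = addr; line = $0 }
        }
        END {
            if (line == "") exit 1    # nothing at or below ADDRESS
            print line
        }
    '
}
</verb>
<p> For example, after defining the function in <tt/sh(1)/, run
<tt>nm /kernel.that.caused.the.panic | ksym f014a7e5</tt>. As with
the <tt/grep/ approach, treat the result as a strong hint rather
than proof.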
<p> I see people constantly show panic messages like this but
rarely do I see someone take the time to match up the
instruction pointer with a function in the kernel symbol table.
<p> The best way to track down the cause of a panic is by
capturing a crash dump, then using <tt/gdb(1)/ to do a stack
trace on the crash dump. Of course, this depends on <tt/gdb(1)/
in -current working correctly, which I can't guarantee (I recall
somebody saying that the new ELF-ized <tt/gdb(1)/ didn't handle
kernel crash dumps correctly: somebody should check this before
3.0 goes out of beta or there'll be a lot of red faces after the
CDs ship).
<p>
In any case, the method I normally use is this:
<itemize>
<item>Set up a kernel config file, optionally adding 'options DDB' if you
think you need the kernel debugger for something. (I use this mainly
for setting breakpoints if I suspect an infinite loop condition of
some kind.)
<item>Use <tt/config -g KERNELCONFIG/ to set up the build directory.
<item><tt>cd /sys/compile/KERNELCONFIG; make</tt>
<item>Wait for kernel to finish compiling.
<item><tt/cp kernel kernel.debug/
<item><tt/strip -d kernel/
<item><tt/mv /kernel /kernel.orig/
<item><tt>cp kernel /</tt>
<item>reboot
</itemize>
<p> <em>[Note: currently, on 3.0-BETA, you must use <tt/strip
-aout -d/ instead of <tt/strip -d/]</em>
<p> Note that YOU DO <em/NOT/ WANT TO ACTUALLY BOOT THE KERNEL
WITH ALL THE DEBUG SYMBOLS IN IT. A kernel compiled with <tt/-g/
can easily be close to 10MB in size. You don't have to actually
boot this massive image: you only need it later for <tt/gdb(1)/
(<tt/gdb(1)/ wants the symbol table). Instead, you want to keep
a copy of the full image and create a second image with the
debug symbols stripped out using <tt/strip -d/. It is this
second stripped image that you want to boot.
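<p> The order of the copy, strip and install steps matters: the full
image has to be saved as <tt/kernel.debug/ before <tt/strip -d/
removes the debug symbols. Below is a minimal <tt/sh(1)/ sketch of
just those steps, in the same order as the list. It does not run
<tt/config/ or <tt/make/, and it does not reboot.
<tt/install_debug_kernel/ and the <tt/STRIP/ variable are
hypothetical names. The root directory is a parameter only so the
steps can be tried out on scratch directories first.
<verb>
#!/bin/sh
# install_debug_kernel BUILD_DIR [ROOT]
# Hypothetical helper: performs the cp/strip/mv/cp steps from the list.
# BUILD_DIR is e.g. /sys/compile/KERNELCONFIG; ROOT defaults to the real
# root directory.  STRIP defaults to "strip -d" (on 3.0-BETA set
# STRIP="strip -aout -d", per the note above).
install_debug_kernel() {
    build=${1:?missing BUILD_DIR}
    root=${2:-}
    strip_cmd=${STRIP:-strip -d}

    # 1. Keep the full image with debug symbols for gdb(1) later.
    cp "$build/kernel" "$build/kernel.debug" || return 1
    # 2. Strip the debug symbols from the copy that will be booted.
    #    $strip_cmd is deliberately unquoted so "strip -d" splits into words.
    $strip_cmd "$build/kernel" || return 1
    # 3. Save the currently installed kernel ...
    mv "$root/kernel" "$root/kernel.orig" || return 1
    # 4. ... and install the stripped one in its place.
    cp "$build/kernel" "$root/kernel" || return 1
}
</verb>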
<p> To make sure you capture a crash dump, you need to edit
<tt>/etc/rc.conf</tt> and set <tt/dumpdev/ to point to your swap
partition. This will cause the <tt/rc(8)/ scripts to use the
<tt/dumpon(8)/ command to enable crash dumps. You can also run
<tt/dumpon(8)/ manually. After a panic, the crash dump can be
recovered using <tt/savecore(8)/; if <tt/dumpdev/ is set in
<tt>/etc/rc.conf</tt>, the <tt/rc(8)/ scripts will run
<tt/savecore(8)/ automatically and put the crash dump in
<tt>/var/crash</tt>.
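<p> As a sketch, the relevant <tt>/etc/rc.conf</tt> line and the
equivalent manual commands might look like the following. The device
name <tt>/dev/wd0s1b</tt> is only an assumed example of an IDE swap
partition. Use your own swap device, which is listed in
<tt>/etc/fstab</tt>, and whatever target directory suits you for
<tt/savecore(8)/.
<verb>
# /etc/rc.conf fragment (sketch): /dev/wd0s1b is an assumed swap
# partition, not a value to copy blindly.
dumpdev="/dev/wd0s1b"

# The same thing done by hand, as root:
#   dumpon /dev/wd0s1b        # enable crash dumps to the swap partition
#   savecore /var/crash       # after the reboot, copy the dump out of swap
</verb>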
<p> NOTE: FreeBSD crash dumps are usually the same size as the
physical RAM size of your machine. That is, if you have 64MB of
RAM, you will get a 64MB crash dump. Therefore you must make sure
there's enough space in <tt>/var/crash</tt> to hold the dump.
Alternatively, you can run <tt/savecore(8)/ manually and have it
recover the crash dump to another directory where you have more
room. It's possible to limit the size of the crash dump by using
<tt/options MAXMEM=(foo)/ to set the amount of memory the kernel
will use to something a little more sensible. For example, if
you have 128MB of RAM, you can limit the kernel's memory usage
to 16MB so that your crash dump size will be 16MB instead of
128MB.
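<p> The dump is roughly as large as physical memory, or as the
<tt/MAXMEM/ limit if you set one, so it can be worth checking the
free space in advance. The following <tt/sh(1)/ function is a minimal
sketch of that arithmetic. <tt/dump_fits/ is a hypothetical name. The
live-system example in its comments assumes two things: the FreeBSD
<tt/hw.physmem/ sysctl reports physical memory in bytes, and
<tt/df -k/ prints each file system on a single line with the free
space in the fourth column.
<verb>
#!/bin/sh
# dump_fits MEM_BYTES AVAIL_KB
# Succeeds if a crash dump of about MEM_BYTES bytes (physical RAM, or the
# MAXMEM limit) fits in AVAIL_KB kilobytes of free space.
# "dump_fits" is a hypothetical helper name.
dump_fits() {
    need_kb=$(( $1 / 1024 ))
    [ "$2" -ge "$need_kb" ]
}

# Example on a live system (assumptions as described above):
#   dump_fits "$(sysctl -n hw.physmem)" \
#       "$(df -k /var/crash | awk 'NR == 2 { print $4 }')" ||
#       echo "not enough room in /var/crash for a crash dump"
</verb>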
<p> Once you have recovered the crash dump, you can get a stack
trace with <tt/gdb(1)/ as follows:
<p>
<verb>
% gdb -k /sys/compile/KERNELCONFIG/kernel.debug /var/crash/vmcore.0
(gdb) where
</verb>
<p> Note that there may be several screens' worth of information;
ideally you should use <tt/script(1)/ to capture all of them.
Using the unstripped kernel image with all the debug symbols
should show the exact line of kernel source code where the panic
occurred. Usually you have to read the stack trace from the
bottom up in order to trace the exact sequence of events that
led to the crash. You can also use <tt/gdb(1)/ to print out the
contents of various variables or structures in order to examine
the system state at the time of the crash.
<p> Now, if you're really insane and have a second computer, you
can also configure <tt/gdb(1)/ to do remote debugging such that
you can use <tt/gdb(1)/ on one system to debug the kernel on
another system, including setting breakpoints, single-stepping
through the kernel code, just like you can do with a normal
user-mode program. I haven't played with this yet as I don't
often have the chance to set up two machines side by side for
debugging purposes.
<p> <em>[Bill adds: "I forgot to mention one thing: if you have
DDB enabled and the kernel drops into the debugger, you can
force a panic (and a crash dump) just by typing 'panic' at the
ddb prompt. It may stop in the debugger again during the panic
phase. If it does, type 'continue' and it will finish the crash
dump." -ed]</em>
</sect>