Update Question 18.14, a major rework:
- Dissolve the original e-mail style description
- Add a procedure on how to make (install|build)kernel
- Recommend kgdb(1) instead of gdb(1) (based on the Developer's Handbook)
- Improve markup (suggested by gabor)
- Update path names
- Turn "FreeBSD" into &os;
- Merge wpaul's original comment into the answer

Reviewed by: trhodes
Approved by: gabor
parent
0411abd4e6
commit
b909f782f9
Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=32488
1 changed files with 107 additions and 112 deletions
|
@ -10688,42 +10688,28 @@ hint.sio.7.irq="12"</programlisting>
|
|||
</question>
|
||||
|
||||
<answer>
|
||||
<para><emphasis>[This section was extracted from a mail
|
||||
written by &a.wpaul; on the freebsd-current
|
||||
<link linkend="mailing">mailing list</link> by &a.des;, who
|
||||
fixed a few typos and added the bracketed comments]
|
||||
</emphasis></para>
|
||||
<para>Here is a typical kernel panic:</para>
|
||||
|
||||
<programlisting>From: Bill Paul <wpaul@skynet.ctr.columbia.edu>
|
||||
Subject: Re: the fs fun never stops
|
||||
To: Ben Rosengart
|
||||
Date: Sun, 20 Sep 1998 15:22:50 -0400 (EDT)
|
||||
Cc: current@FreeBSD.org</programlisting>
|
||||
<programlisting>Fatal trap 12: page fault while in kernel mode
|
||||
fault virtual address = 0x40
|
||||
fault code = supervisor read, page not present
|
||||
instruction pointer = 0x8:0xf014a7e5
|
||||
stack pointer = 0x10:0xf4ed6f24
|
||||
frame pointer = 0x10:0xf4ed6f28
|
||||
code segment = base 0x0, limit 0xfffff, type 0x1b
|
||||
= DPL 0, pres 1, def32 1, gran 1
|
||||
processor eflags = interrupt enabled, resume, IOPL = 0
|
||||
current process = 80 (mount)
|
||||
interrupt mask =
|
||||
trap number = 12
|
||||
panic: page fault</programlisting>
|
||||
|
||||
<para><emphasis>[Ben Rosengart posted the following
|
||||
panic message]</emphasis></para>
|
||||
|
||||
<programlisting>> Fatal trap 12: page fault while in kernel mode
|
||||
> fault virtual address = 0x40
|
||||
> fault code = supervisor read, page not present
|
||||
> instruction pointer = 0x8:0xf014a7e5
|
||||
^^^^^^^^^^
|
||||
> stack pointer = 0x10:0xf4ed6f24
|
||||
> frame pointer = 0x10:0xf4ed6f28
|
||||
> code segment = base 0x0, limit 0xfffff, type 0x1b
|
||||
> = DPL 0, pres 1, def32 1, gran 1
|
||||
> processor eflags = interrupt enabled, resume, IOPL = 0
|
||||
> current process = 80 (mount)
|
||||
> interrupt mask =
|
||||
> trap number = 12
|
||||
> panic: page fault</programlisting>
|
||||
|
||||
<para>[When] you see a message like this, it is not enough to just
|
||||
reproduce it and send it in. The instruction pointer value that
|
||||
<para>When you see a message like this, it is not enough to just
|
||||
reproduce it and send it in. The instruction pointer value that
|
||||
I highlighted up there is important; unfortunately, it is also
|
||||
configuration dependent. In other words, the value varies
|
||||
depending on the exact kernel image that you are using. If
|
||||
you are using a GENERIC kernel image from one of the snapshots,
|
||||
configuration dependent. In other words, the value varies
|
||||
depending on the exact kernel image that you are using. If
|
||||
you are using a <filename>GENERIC</filename> kernel image from one of the snapshots,
|
||||
then it is possible for somebody else to track down the
|
||||
offending function, but if you are running a custom kernel then
|
||||
only <emphasis>you</emphasis> can tell us where the fault
|
||||
|
@ -10733,93 +10719,98 @@ Cc: current@FreeBSD.org</programlisting>
|
|||
|
||||
<procedure>
|
||||
<step>
|
||||
<para>Write down the instruction pointer value. Note that
|
||||
<para>Write down the instruction pointer value. Note that
|
||||
the <literal>0x8:</literal> part at the beginning is not
|
||||
significant in this case: it is the
|
||||
<literal>0xf0xxxxxx</literal> part that we want.</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>When the system reboots, do the following:
|
||||
<para>When the system reboots, do the following:</para>
|
||||
|
||||
<screen>&prompt.user; <userinput>nm -n /kernel.that.caused.the.panic | grep f0xxxxxx</userinput></screen>
|
||||
<screen>&prompt.user; <userinput><command>nm</command> <option>-n</option> <replaceable>kernel.that.caused.the.panic</replaceable> | <command>grep</command> f0xxxxxx</userinput></screen>
|
||||
|
||||
where <literal>f0xxxxxx</literal> is the instruction
|
||||
pointer value. The odds are you will not get an exact
|
||||
<para>where <literal>f0xxxxxx</literal> is the instruction
|
||||
pointer value. The odds are you will not get an exact
|
||||
match since the symbols in the kernel symbol table are
|
||||
for the entry points of functions and the instruction
|
||||
pointer address will be somewhere inside a function, not
|
||||
at the start. If you do not get an exact match, omit the
|
||||
last digit from the instruction pointer value and try
|
||||
again, i.e.:
|
||||
again, i.e.:</para>
|
||||
|
||||
<screen>&prompt.user; <userinput>nm -n /kernel.that.caused.the.panic | grep f0xxxxx</userinput></screen>
|
||||
<screen>&prompt.user; <userinput><command>nm</command> <option>-n</option> <replaceable>kernel.that.caused.the.panic</replaceable> | <command>grep</command> f0xxxxx</userinput></screen>
|
||||
|
||||
If that does not yield any results, chop off another
|
||||
digit. Repeat until you get some sort of output. The
|
||||
<para>If that does not yield any results, chop off another
|
||||
digit. Repeat until you get some sort of output. The
|
||||
result will be a list of functions, one of which probably caused
|
||||
the panic. This is a less than exact mechanism for
|
||||
the panic. This is a less than exact mechanism for
|
||||
tracking down the point of failure, but it is better than
|
||||
nothing.</para>
|
||||
</step>
|
||||
</procedure>
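The lookup described in the procedure above can also be done without trimming digits by hand. The following is a minimal Python sketch and is not part of the FAQ. It assumes the output of `nm -n` (one `address type name` triple per line) has already been captured as text. The helper names `parse_instruction_pointer` and `enclosing_symbol` are hypothetical, and the symbol table excerpt is made up for illustration. It picks the last symbol whose address is at or below the instruction pointer, which is the function the faulting address most likely falls inside.

```python
import bisect


def parse_instruction_pointer(field):
    """Return the offset part of a 'selector:offset' value such as 0x8:0xf014a7e5."""
    # The part before the colon (0x8) is the code segment selector and is ignored.
    return int(field.split(":")[-1], 16)


def enclosing_symbol(nm_output, address):
    """Return the name of the last symbol at or below address, or None."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols have no address column, so only 3-field lines are kept.
        if len(parts) != 3:
            continue
        try:
            symbols.append((int(parts[0], 16), parts[2]))
        except ValueError:
            continue
    symbols.sort()
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    return symbols[index][1] if index >= 0 else None


# Made-up symbol table excerpt for illustration only.
sample = """\
f014a6b0 T example_mount
f014a7c0 T example_lookup
f014a900 T example_unmount
"""
ip = parse_instruction_pointer("0x8:0xf014a7e5")
print(enclosing_symbol(sample, ip))
```

On a real system the text would come from running `nm -n` against the kernel image that panicked. This is still only an approximation of where the fault occurred; a stack trace from a crash dump remains the more reliable source.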
|
||||
|
||||
<para>I see people constantly show panic messages like this
|
||||
but rarely do I see someone take the time to match up the
|
||||
instruction pointer with a function in the kernel symbol
|
||||
table.</para>
|
||||
|
||||
<para>The best way to track down the cause of a panic is by
|
||||
capturing a crash dump, then using &man.gdb.1; to generate
|
||||
<para>However, the best way to track down the cause of a panic is by
|
||||
capturing a crash dump, then using &man.kgdb.1; to generate
|
||||
a stack trace on the crash dump.</para>
|
||||
|
||||
<para>In any case, the method I normally use is this:</para>
|
||||
<para>In any case, the method is this:</para>
|
||||
|
||||
<procedure>
|
||||
<step>
|
||||
<para>Set up a kernel config file, optionally adding
|
||||
<literal>options DDB</literal> if you think you need
|
||||
the kernel debugger for something. (I use this mainly
|
||||
for setting breakpoints if I suspect an infinite loop
|
||||
condition of some kind.)</para>
|
||||
</step>
|
||||
<step>
|
||||
<para>Make sure that the following line is included in
|
||||
your kernel configuration file
|
||||
(/usr/src/sys/<replaceable>arch</replaceable>/conf/<replaceable>MYKERNEL</replaceable>):</para>
|
||||
|
||||
<step>
|
||||
<para>Use <command>config -g
|
||||
<replaceable>KERNELCONFIG</replaceable></command> to set
|
||||
up the build directory.</para>
|
||||
</step>
|
||||
<programlisting>makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols</programlisting>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para><command>cd /sys/compile/<replaceable>KERNELCONFIG</replaceable>; make</command></para>
|
||||
</step>
|
||||
<step>
|
||||
<para>Change to the <filename
|
||||
role="directory">/usr/src</filename> directory:</para>
|
||||
|
||||
<step>
|
||||
<para>Wait for kernel to finish compiling.</para>
|
||||
</step>
|
||||
<screen>&prompt.root; <command>cd</command> <filename role="directory">/usr/src</filename></screen>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para><command>make install</command></para>
|
||||
</step>
|
||||
<step>
|
||||
<para>Compile the kernel:</para>
|
||||
|
||||
<step>
|
||||
<para>reboot</para>
|
||||
</step>
|
||||
<screen>&prompt.root; <command>make</command> <maketarget>buildkernel</maketarget> <makevar>KERNCONFIG</makevar>=<replaceable>MYKERNEL</replaceable></screen>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>Wait for &man.make.1; to finish compiling.</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<screen>&prompt.root; <command>make</command> <maketarget>installkernel</maketarget> <makevar>KERNCONFIG</makevar>=<replaceable>MYKERNEL</replaceable></screen>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>Reboot.</para>
|
||||
</step>
|
||||
</procedure>
|
||||
|
||||
<para>The &man.make.1; process will have built two kernels.
|
||||
<filename>kernel</filename> and
|
||||
<filename>kernel.debug</filename>.
|
||||
<filename>kernel</filename> was installed as
|
||||
<filename>/kernel</filename>, while
|
||||
<filename>kernel.debug</filename> can be used as the
|
||||
source of debugging symbols for &man.gdb.1;.</para>
|
||||
<note>
|
||||
<para>If you do not use the <makevar>KERNCONFIG</makevar>
|
||||
make variable, a <filename>GENERIC</filename> kernel will
|
||||
be built and installed.</para>
|
||||
</note>
|
||||
|
||||
<para>To make sure you capture a crash dump, you need to edit
|
||||
<para>The &man.make.1; process will have built two kernels:
|
||||
<filename>/usr/obj/usr/src/sys/<replaceable>MYKERNEL</replaceable>/kernel</filename>
|
||||
and
|
||||
<filename>/usr/obj/usr/src/sys/<replaceable>MYKERNEL</replaceable>/kernel.debug</filename>.
|
||||
<filename>kernel</filename> was installed as
|
||||
<filename>/boot/kernel/kernel</filename>, while
|
||||
<filename>kernel.debug</filename> can be used as the source
|
||||
of debugging symbols for &man.kgdb.1;.</para>
|
||||
|
||||
<para>To make sure you capture a crash dump, you need to edit
|
||||
<filename>/etc/rc.conf</filename> and set
|
||||
<literal>dumpdev</literal> to point to your swap
|
||||
partition. This will cause the &man.rc.8; scripts to use
|
||||
the &man.dumpon.8; command to enable crash dumps. You can
|
||||
partition (or <literal>AUTO</literal>). This will cause the &man.rc.8; scripts to use
|
||||
the &man.dumpon.8; command to enable crash dumps. You can
|
||||
also run &man.dumpon.8; manually. After a panic, the
|
||||
crash dump can be recovered using &man.savecore.8;; if
|
||||
<literal>dumpdev</literal> is set in
|
||||
|
@ -10828,27 +10819,28 @@ Cc: current@FreeBSD.org</programlisting>
|
|||
dump in <filename>/var/crash</filename>.</para>
|
||||
|
||||
<note>
|
||||
<para>FreeBSD crash dumps are usually the same size as the
|
||||
physical RAM size of your machine. That is, if you have
|
||||
64MB of RAM, you will get a 64MB crash dump. Therefore you
|
||||
<para>&os; crash dumps are usually the same size as the
|
||||
physical RAM size of your machine. That is, if you have
|
||||
512 MB of RAM, you will get a 512 MB crash dump. Therefore you
|
||||
must make sure there is enough space in
|
||||
<filename>/var/crash</filename> to hold the dump.
|
||||
Alternatively, you can run &man.savecore.8;
|
||||
manually and have it recover the crash dump to another
|
||||
directory where you have more room. It is possible to limit
|
||||
directory where you have more room. It is possible to limit
|
||||
the size of the crash dump by using <literal>options
|
||||
MAXMEM=(foo)</literal> to set the amount of memory the
|
||||
kernel will use to something a little more sensible. For
|
||||
example, if you have 128MB of RAM, you can limit the
|
||||
kernel's memory usage to 16MB so that your crash dump size
|
||||
will be 16MB instead of 128MB.</para>
|
||||
MAXMEM=<replaceable>N</replaceable></literal> where
|
||||
<replaceable>N</replaceable> is the size of the kernel's memory
|
||||
usage in KB.
|
||||
For example, if you have 1 GB of RAM, you can limit the
|
||||
kernel's memory usage to 128 MB this way, so that your crash dump size
|
||||
will be 128 MB instead of 1 GB.</para>
|
||||
</note>
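As a worked example of the arithmetic in the note above, here is a small Python sketch that converts a memory limit in megabytes into the kilobyte value given to `MAXMEM`. The function name `maxmem_option` is hypothetical, and the sketch assumes 1 MB = 1024 KB.

```python
def maxmem_option(limit_mb):
    """Format a kernel config line limiting memory to limit_mb megabytes."""
    # MAXMEM is given in kilobytes, so multiply megabytes by 1024.
    return "options MAXMEM=%d" % (limit_mb * 1024)


print(maxmem_option(128))  # options MAXMEM=131072
```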
|
||||
|
||||
<para>Once you have recovered the crash dump, you can get a
|
||||
stack trace with &man.gdb.1; as follows:</para>
|
||||
stack trace with &man.kgdb.1; as follows:</para>
|
||||
|
||||
<screen>&prompt.user; <userinput>gdb -k /sys/compile/KERNELCONFIG/kernel.debug /var/crash/vmcore.0</userinput>
|
||||
<prompt>(gdb)</prompt> <userinput>where</userinput></screen>
|
||||
<screen>&prompt.user; <userinput><command>kgdb</command> <filename>/usr/obj/usr/src/sys/<replaceable>MYKERNEL</replaceable>/kernel.debug</filename> <filename>/var/crash/<replaceable>vmcore.0</replaceable></filename></userinput>
|
||||
<prompt>(kgdb)</prompt> <userinput>backtrace</userinput></screen>
|
||||
|
||||
<para>Note that there may be several screens worth of
|
||||
information; ideally you should use
|
||||
|
@ -10857,25 +10849,28 @@ Cc: current@FreeBSD.org</programlisting>
|
|||
the exact line of kernel source code where the panic occurred.
|
||||
Usually you have to read the stack trace from the bottom up in
|
||||
order to trace the exact sequence of events that led to the
|
||||
crash. You can also use &man.gdb.1; to print out
|
||||
crash. You can also use &man.kgdb.1; to print out
|
||||
the contents of various variables or structures in order to
|
||||
examine the system state at the time of the crash.</para>
|
||||
|
||||
<para>Now, if you are really insane and have a second computer,
|
||||
you can also configure &man.gdb.1; to do remote
|
||||
debugging such that you can use &man.gdb.1; on
|
||||
one system to debug the kernel on another system, including
|
||||
setting breakpoints, single-stepping through the kernel code,
|
||||
just like you can do with a normal user-mode program. I have not
|
||||
played with this yet as I do not often have the chance to set up
|
||||
two machines side by side for debugging purposes.</para>
|
||||
<tip>
|
||||
<para>Now, if you are really insane and have a second
|
||||
computer, you can also configure &man.kgdb.1; to do remote
|
||||
debugging such that you can use &man.kgdb.1; on one system
|
||||
to debug the kernel on another system, including setting
|
||||
breakpoints and single-stepping through the kernel code, just
|
||||
as you can with a normal user-mode program.</para>
|
||||
</tip>
|
||||
|
||||
<para><emphasis>[Bill adds: "I forgot to mention one thing: if
|
||||
you have DDB enabled and the kernel drops into the debugger,
|
||||
you can force a panic (and a crash dump) just by typing 'panic'
|
||||
at the ddb prompt. It may stop in the debugger again during the
|
||||
panic phase. If it does, type 'continue' and it will finish the
|
||||
crash dump." -ed]</emphasis></para>
|
||||
<note>
|
||||
<para>If you have <literal>DDB</literal> enabled and the
|
||||
kernel drops into the debugger, you can force a panic (and a
|
||||
crash dump) just by typing <literal>panic</literal> at the
|
||||
<literal>ddb</literal> prompt. It may stop in the
|
||||
debugger again during the panic phase. If it does, type
|
||||
<literal>continue</literal> and it will finish the crash
|
||||
dump.</para>
|
||||
</note>
|
||||
</answer>
|
||||
</qandaentry>
|
||||
|
||||
|
|