<?xml version="1.0" encoding="ISO8859-1" standalone="no"?>
|
|
<!--
|
|
The FreeBSD Documentation Project
|
|
|
|
$FreeBSD$
|
|
-->
|
|
|
|
<chapter id="kerneldebug">
|
|
<chapterinfo>
|
|
<authorgroup>
|
|
<author>
|
|
<firstname>Paul</firstname>
|
|
<surname>Richards</surname>
|
|
<contrib>Contributed by </contrib>
|
|
</author>
|
|
<author>
|
|
<firstname>Jörg</firstname>
|
|
<surname>Wunsch</surname>
|
|
</author>
|
|
<author>
|
|
<firstname>Robert</firstname>
|
|
<surname>Watson</surname>
|
|
</author>
|
|
</authorgroup>
|
|
</chapterinfo>
|
|
|
|
<title>Kernel Debugging</title>
|
|
|
|
<sect1 id="kerneldebug-obtain">
|
|
<title>Obtaining a Kernel Crash Dump</title>
|
|
|
|
<para>When running a development kernel (e.g., &os.current;), running a
kernel under extreme conditions (e.g., very high load averages,
tens of thousands of connections, an exceedingly high number of
concurrent users, hundreds of &man.jail.8;s, etc.), or using a
new feature or device driver on &os.stable; (e.g.,
<acronym>PAE</acronym>), a kernel will sometimes panic. This
chapter demonstrates how to extract useful information from
such a crash.</para>
|
|
|
|
<para>A system reboot is inevitable once a kernel panics. Once a
system is rebooted, the contents of its physical memory
(<acronym>RAM</acronym>) are lost, as well as any bits that were
on the swap device before the panic. To preserve the bits in
physical memory, the kernel uses the swap device as a
temporary place to store the contents of RAM across a
reboot after a crash. When &os; then boots after a
crash, a kernel image can be extracted and debugging can
take place.</para>
|
|
|
|
<note><para>A swap device that has been configured as a dump
|
|
device still acts as a swap device. Dumps to non-swap devices
|
|
(such as tapes or CDRWs) are not supported at this time. A
|
|
<quote>swap device</quote> is synonymous with a <quote>swap
|
|
partition.</quote></para></note>
|
|
|
|
<para>Several types of kernel crash dumps are available: full memory
|
|
dumps, which hold the complete contents of physical memory,
|
|
minidumps, which hold only memory pages in use by the kernel
|
|
(&os; 6.2 and higher), and textdumps, which hold captured
|
|
scripted or interactive debugger output (&os; 7.1 and higher).
|
|
Minidumps are the default dump type as of &os; 7.0, and in most
cases will capture all necessary information present in a full
memory dump, as most problems can be isolated using only kernel
state.</para>
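<para>As a hedged illustration of choosing between the first two dump
types: on releases that support minidumps, the
<varname>debug.minidump</varname> sysctl is understood to select
whether a minidump or a full memory dump is written. The sysctl
name comes from &os; itself rather than from this chapter, so
confirm that it exists on the release in use before relying on
it:</para>

<screen>&prompt.root; <userinput>sysctl debug.minidump</userinput>
&prompt.root; <userinput>sysctl debug.minidump=0</userinput></screen>

<para>Setting the value to <literal>0</literal> requests a full
memory dump for the next crash; setting it back to
<literal>1</literal> restores minidumps.</para>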
|
|
|
|
<sect2 id="config-dumpdev">
|
|
<title>Configuring the Dump Device</title>
|
|
|
|
<para>Before the kernel will dump the contents of its physical
|
|
memory to a dump device, a dump device must be configured. A
|
|
dump device is specified by using the &man.dumpon.8; command
|
|
to tell the kernel where to save kernel crash dumps. The
|
|
&man.dumpon.8; program must be called after the swap partition
|
|
has been configured with &man.swapon.8;. This is normally
|
|
handled by setting the <varname>dumpdev</varname> variable in
|
|
&man.rc.conf.5; to the path of the swap device (the
|
|
recommended way to extract a kernel dump) or
|
|
<literal>AUTO</literal> to use the first configured swap
|
|
device. The default for <varname>dumpdev</varname> is
|
|
<literal>AUTO</literal> in HEAD, and changed to
|
|
<literal>NO</literal> on RELENG_* branches (except for RELENG_7,
|
|
which was left set to <literal>AUTO</literal>).
|
|
On &os; 9.0-RELEASE and later versions,
|
|
<application>bsdinstall</application> will ask whether crash dumps
|
|
should be enabled on the target system during the install process.</para>
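<para>As a minimal sketch, the settings described above might appear
in <filename>/etc/rc.conf</filename> as follows. The device name
<filename>/dev/ad0s1b</filename> is only an example and must be
replaced by the actual swap device:</para>

<programlisting># Use the first configured swap device as the dump device ...
dumpdev="AUTO"
# ... or name a swap device explicitly instead:
#dumpdev="/dev/ad0s1b"
# Directory where savecore(8) places extracted dumps
dumpdir="/var/crash"</programlisting>

<para>The dump device can also be set by hand on a running system
with &man.dumpon.8;; a setting made this way does not persist
across reboots:</para>

<screen>&prompt.root; <userinput>dumpon -v /dev/ad0s1b</userinput></screen>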
|
|
|
|
<tip><para>Check <filename>/etc/fstab</filename> or
|
|
&man.swapinfo.8; for a list of swap devices.</para></tip>
|
|
|
|
<important><para>Make sure the <varname>dumpdir</varname>
|
|
specified in &man.rc.conf.5; exists before a kernel
|
|
crash!</para>
|
|
|
|
<screen>&prompt.root; <userinput>mkdir /var/crash</userinput>
|
|
&prompt.root; <userinput>chmod 700 /var/crash</userinput></screen>
|
|
|
|
<para>Also, remember that the contents of
<filename>/var/crash</filename> are sensitive and very likely
contain confidential information such as passwords.</para>
|
|
</important>
|
|
</sect2>
|
|
|
|
<sect2 id="extract-dump">
|
|
<title>Extracting a Kernel Dump</title>
|
|
|
|
<para>Once a dump has been written to a dump device, the dump
|
|
must be extracted before the swap device is mounted.
|
|
To extract a dump
|
|
from a dump device, use the &man.savecore.8; program. If
|
|
<varname>dumpdev</varname> has been set in &man.rc.conf.5;,
|
|
&man.savecore.8; will be called automatically on the first
|
|
multi-user boot after the crash and before the swap device
|
|
is mounted. The extracted core is placed in the directory named
by the &man.rc.conf.5; variable <varname>dumpdir</varname> (by
default <filename>/var/crash</filename>) and will be named
<filename>vmcore.0</filename>.</para>
|
|
|
|
<para>In the event that there is already a file called
|
|
<filename>vmcore.0</filename> in
|
|
<filename>/var/crash</filename> (or whatever
|
|
<varname>dumpdir</varname> is set to), &man.savecore.8; will
increment the trailing number for every crash to avoid
overwriting an existing <filename>vmcore</filename> (e.g.,
<filename>vmcore.1</filename>). While debugging, you will most
likely want the <filename>vmcore</filename> with the highest
number in <filename>/var/crash</filename>.</para>
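<para>A quick way to find the most recent dump, sketched here under
the assumption that <varname>dumpdir</varname> is the default
<filename>/var/crash</filename>, is to list the dumps sorted by
modification time so that the newest one appears first:</para>

<screen>&prompt.root; <userinput>ls -lt /var/crash/vmcore.*</userinput></screen>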
|
|
|
|
<tip>
|
|
<para>If you are testing a new kernel but need to boot a different one in
|
|
order to get your system up and running again, boot it only into single
|
|
user mode using the <option>-s</option> flag at the boot prompt, and
|
|
then perform the following steps:</para>
|
|
|
|
<screen>&prompt.root; <userinput>fsck -p</userinput>
|
|
&prompt.root; <userinput>mount -a -t ufs</userinput> # make sure /var/crash is writable
|
|
&prompt.root; <userinput>savecore /var/crash /dev/ad0s1b</userinput>
|
|
&prompt.root; <userinput>exit</userinput> # exit to multi-user</screen>
|
|
|
|
<para>This instructs &man.savecore.8; to extract a kernel dump
|
|
from <filename>/dev/ad0s1b</filename> and place the contents in
|
|
<filename>/var/crash</filename>. Do not forget to make sure the
|
|
destination directory <filename>/var/crash</filename> has enough
|
|
space for the dump. Also, do not forget to specify the correct path to your swap
|
|
device as it is likely different than
|
|
<filename>/dev/ad0s1b</filename>!</para></tip>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="kerneldebug-gdb">
|
|
<title>Debugging a Kernel Crash Dump with <command>kgdb</command></title>
|
|
|
|
<note>
|
|
<para>This section covers &man.kgdb.1; as found in &os; 5.3
|
|
and later. In previous versions, one must use
|
|
<command>gdb -k</command> to read a core dump file.</para>
|
|
</note>
|
|
|
|
<para>Once a dump has been obtained, getting useful information
|
|
out of the dump is relatively easy for simple problems. Before
|
|
launching into the internals of &man.kgdb.1; to debug
|
|
the crash dump, locate the debug version of your kernel
|
|
(normally called <filename>kernel.debug</filename>) and the path
|
|
to the source files used to build your kernel (normally
|
|
<filename>/usr/obj/usr/src/sys/<replaceable>KERNCONF</replaceable></filename>,
|
|
where <filename><replaceable>KERNCONF</replaceable></filename>
|
|
is the <varname>ident</varname> specified in a kernel
|
|
&man.config.5;). With those two pieces of info, let the
|
|
debugging commence!</para>
|
|
|
|
<para>To enter into the debugger and begin getting information
|
|
from the dump, the following steps are required at a minimum:</para>
|
|
|
|
<screen>&prompt.root; <userinput>cd /usr/obj/usr/src/sys/<replaceable>KERNCONF</replaceable></userinput>
|
|
&prompt.root; <userinput>kgdb kernel.debug /var/crash/vmcore.0</userinput></screen>
|
|
|
|
<para>You can debug the crash dump using the kernel sources just like
|
|
you can for any other program.</para>
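<para>A typical first look at a dump, sketched here without output,
uses ordinary <command>gdb</command> commands. The frame number
<literal>9</literal> and the variable name
<varname>vector</varname> are placeholders; pick whichever frame
and variable the backtrace of the dump at hand points to:</para>

<screen><prompt>(kgdb)</prompt> <userinput>backtrace</userinput>
<prompt>(kgdb)</prompt> <userinput>frame 9</userinput>
<prompt>(kgdb)</prompt> <userinput>list</userinput>
<prompt>(kgdb)</prompt> <userinput>info locals</userinput>
<prompt>(kgdb)</prompt> <userinput>print vector</userinput></screen>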
|
|
|
|
<para>This first dump is from a 5.2-BETA kernel and the crash
|
|
comes from deep within the kernel. The output below has been
|
|
modified to include line numbers on the left. This first trace
|
|
inspects the instruction pointer and obtains a back trace. The
|
|
address that is used on line 41 for the <command>list</command>
|
|
command is the instruction pointer and can be found on line
|
|
17. Most developers will request having at least this
|
|
information sent to them if you are unable to debug the problem
|
|
yourself. If, however, you do solve the problem, make sure that
|
|
your patch winds its way into the source tree via a problem
|
|
report, mailing lists, or by being able to commit it!</para>
|
|
|
|
<screen> 1:&prompt.root; <userinput>cd /usr/obj/usr/src/sys/<replaceable>KERNCONF</replaceable></userinput>
|
|
2:&prompt.root; <userinput>kgdb kernel.debug /var/crash/vmcore.0</userinput>
|
|
3:GNU gdb 5.2.1 (FreeBSD)
|
|
4:Copyright 2002 Free Software Foundation, Inc.
|
|
5:GDB is free software, covered by the GNU General Public License, and you are
|
|
6:welcome to change it and/or distribute copies of it under certain conditions.
|
|
7:Type "show copying" to see the conditions.
|
|
8:There is absolutely no warranty for GDB. Type "show warranty" for details.
|
|
9:This GDB was configured as "i386-undermydesk-freebsd"...
|
|
10:panic: page fault
|
|
11:panic messages:
|
|
12:---
|
|
13:Fatal trap 12: page fault while in kernel mode
|
|
14:cpuid = 0; apic id = 00
|
|
15:fault virtual address = 0x300
|
|
16:fault code: = supervisor read, page not present
|
|
17:instruction pointer = 0x8:0xc0713860
|
|
18:stack pointer = 0x10:0xdc1d0b70
|
|
19:frame pointer = 0x10:0xdc1d0b7c
|
|
20:code segment = base 0x0, limit 0xfffff, type 0x1b
|
|
21: = DPL 0, pres 1, def32 1, gran 1
|
|
22:processor eflags = resume, IOPL = 0
|
|
23:current process = 14394 (uname)
|
|
24:trap number = 12
|
|
25:panic: page fault
|
|
26:cpuid = 0;
|
|
27:Stack backtrace:
|
|
28:
|
|
29:syncing disks, buffers remaining... 2199 2199 panic: mi_switch: switch in a critical section
|
|
30:cpuid = 0;
|
|
31:Uptime: 2h43m19s
|
|
32:Dumping 255 MB
|
|
33: 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240
|
|
34:---
|
|
35:Reading symbols from /boot/kernel/snd_maestro3.ko...done.
|
|
36:Loaded symbols for /boot/kernel/snd_maestro3.ko
|
|
37:Reading symbols from /boot/kernel/snd_pcm.ko...done.
|
|
38:Loaded symbols for /boot/kernel/snd_pcm.ko
|
|
39:#0 doadump () at /usr/src/sys/kern/kern_shutdown.c:240
|
|
40:240 dumping++;
|
|
41:<prompt>(kgdb)</prompt> <userinput>list *0xc0713860</userinput>
|
|
42:0xc0713860 is in lapic_ipi_wait (/usr/src/sys/i386/i386/local_apic.c:663).
|
|
43:658 incr = 0;
|
|
44:659 delay = 1;
|
|
45:660 } else
|
|
46:661 incr = 1;
|
|
47:662 for (x = 0; x &lt; delay; x += incr) {
48:663 if ((lapic->icr_lo &amp; APIC_DELSTAT_MASK) == APIC_DELSTAT_IDLE)
|
|
49:664 return (1);
|
|
50:665 ia32_pause();
|
|
51:666 }
|
|
52:667 return (0);
|
|
53:<prompt>(kgdb)</prompt> <userinput>backtrace</userinput>
|
|
54:#0 doadump () at /usr/src/sys/kern/kern_shutdown.c:240
|
|
55:#1 0xc055fd9b in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:372
|
|
56:#2 0xc056019d in panic () at /usr/src/sys/kern/kern_shutdown.c:550
|
|
57:#3 0xc0567ef5 in mi_switch () at /usr/src/sys/kern/kern_synch.c:470
|
|
58:#4 0xc055fa87 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:312
|
|
59:#5 0xc056019d in panic () at /usr/src/sys/kern/kern_shutdown.c:550
|
|
60:#6 0xc0720c66 in trap_fatal (frame=0xdc1d0b30, eva=0)
|
|
61: at /usr/src/sys/i386/i386/trap.c:821
|
|
62:#7 0xc07202b3 in trap (frame=
|
|
63: {tf_fs = -1065484264, tf_es = -1065484272, tf_ds = -1065484272, tf_edi = 1, tf_esi = 0, tf_ebp = -602076292, tf_isp = -602076324, tf_ebx = 0, tf_edx = 0, tf_ecx = 1000000, tf_eax = 243, tf_trapno = 12, tf_err = 0, tf_eip = -1066321824, tf_cs = 8, tf_eflags = 65671, tf_esp = 243, tf_ss = 0})
|
|
64: at /usr/src/sys/i386/i386/trap.c:250
|
|
65:#8 0xc070c9f8 in calltrap () at {standard input}:94
|
|
66:#9 0xc07139f3 in lapic_ipi_vectored (vector=0, dest=0)
|
|
67: at /usr/src/sys/i386/i386/local_apic.c:733
|
|
68:#10 0xc0718b23 in ipi_selected (cpus=1, ipi=1)
|
|
69: at /usr/src/sys/i386/i386/mp_machdep.c:1115
|
|
70:#11 0xc057473e in kseq_notify (ke=0xcc05e360, cpu=0)
|
|
71: at /usr/src/sys/kern/sched_ule.c:520
|
|
72:#12 0xc0575cad in sched_add (td=0xcbcf5c80)
|
|
73: at /usr/src/sys/kern/sched_ule.c:1366
|
|
74:#13 0xc05666c6 in setrunqueue (td=0xcc05e360)
|
|
75: at /usr/src/sys/kern/kern_switch.c:422
|
|
76:#14 0xc05752f4 in sched_wakeup (td=0xcbcf5c80)
|
|
77: at /usr/src/sys/kern/sched_ule.c:999
|
|
78:#15 0xc056816c in setrunnable (td=0xcbcf5c80)
|
|
79: at /usr/src/sys/kern/kern_synch.c:570
|
|
80:#16 0xc0567d53 in wakeup (ident=0xcbcf5c80)
|
|
81: at /usr/src/sys/kern/kern_synch.c:411
|
|
82:#17 0xc05490a8 in exit1 (td=0xcbcf5b40, rv=0)
|
|
83: at /usr/src/sys/kern/kern_exit.c:509
|
|
84:#18 0xc0548011 in sys_exit () at /usr/src/sys/kern/kern_exit.c:102
|
|
85:#19 0xc0720fd0 in syscall (frame=
|
|
86: {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = -1, tf_ebp = -1077940712, tf_isp = -602075788, tf_ebx = 672411944, tf_edx = 10, tf_ecx = 672411600, tf_eax = 1, tf_trapno = 12, tf_err = 2, tf_eip = 671899563, tf_cs = 31, tf_eflags = 642, tf_esp = -1077940740, tf_ss = 47})
|
|
87: at /usr/src/sys/i386/i386/trap.c:1010
|
|
88:#20 0xc070ca4d in Xint0x80_syscall () at {standard input}:136
|
|
89:---Can't read userspace from dump, or kernel process---
|
|
90:<prompt>(kgdb)</prompt> <userinput>quit</userinput></screen>
|
|
|
|
|
|
<para>This next trace is an older dump from the FreeBSD 2 time
frame, but is more involved and demonstrates more of the
features of <command>gdb</command>. Long lines have been folded
to improve readability, and the lines are numbered for
reference. Apart from that, it is a real-world error trace taken
during the development of the pcvt console driver.</para>
|
|
|
|
<screen> 1:Script started on Fri Dec 30 23:15:22 1994
|
|
2:&prompt.root; <userinput>cd /sys/compile/URIAH</userinput>
|
|
3:&prompt.root; <userinput>gdb -k kernel /var/crash/vmcore.1</userinput>
|
|
4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel
|
|
...done.
|
|
5:IdlePTD 1f3000
|
|
6:panic: because you said to!
|
|
7:current pcb at 1e3f70
|
|
8:Reading in symbols for ../../i386/i386/machdep.c...done.
|
|
9:<prompt>(kgdb)</prompt> <userinput>backtrace</userinput>
|
|
10:#0 boot (arghowto=256) (../../i386/i386/machdep.c line 767)
|
|
11:#1 0xf0115159 in panic ()
|
|
12:#2 0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698)
|
|
13:#3 0xf010185e in db_fncall ()
|
|
14:#4 0xf0101586 in db_command (-266509132, -266509516, -267381073)
|
|
15:#5 0xf0101711 in db_command_loop ()
|
|
16:#6 0xf01040a0 in db_trap ()
|
|
17:#7 0xf0192976 in kdb_trap (12, 0, -272630436, -266743723)
|
|
18:#8 0xf019d2eb in trap_fatal (...)
|
|
19:#9 0xf019ce60 in trap_pfault (...)
|
|
20:#10 0xf019cb2f in trap (...)
|
|
21:#11 0xf01932a1 in exception:calltrap ()
|
|
22:#12 0xf0191503 in cnopen (...)
|
|
23:#13 0xf0132c34 in spec_open ()
|
|
24:#14 0xf012d014 in vn_open ()
|
|
25:#15 0xf012a183 in open ()
|
|
26:#16 0xf019d4eb in syscall (...)
|
|
27:<prompt>(kgdb)</prompt> <userinput>up 10</userinput>
|
|
28:Reading in symbols for ../../i386/i386/trap.c...done.
|
|
29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\
|
|
30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\
|
|
31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\
|
|
32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\
|
|
33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\
|
|
34:ss = -266427884}) (../../i386/i386/trap.c line 283)
|
|
35:283 (void) trap_pfault(&amp;frame, FALSE);
|
|
36:<prompt>(kgdb)</prompt> <userinput>frame frame->tf_ebp frame->tf_eip</userinput>
|
|
37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done.
|
|
38:#0 0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\
|
|
39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403)
|
|
40:403 return ((*linesw[tp->t_line].l_open)(dev, tp));
|
|
41:<prompt>(kgdb)</prompt> <userinput>list</userinput>
|
|
42:398
|
|
43:399 tp->t_state |= TS_CARR_ON;
|
|
44:400 tp->t_cflag |= CLOCAL; /* cannot be a modem (:-) */
|
|
45:401
|
|
46:402 #if PCVT_NETBSD || (PCVT_FREEBSD >= 200)
|
|
47:403 return ((*linesw[tp->t_line].l_open)(dev, tp));
|
|
48:404 #else
|
|
49:405 return ((*linesw[tp->t_line].l_open)(dev, tp, flag));
|
|
50:406 #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */
|
|
51:407 }
|
|
52:<prompt>(kgdb)</prompt> <userinput>print tp</userinput>
|
|
53:Reading in symbols for ../../i386/i386/cons.c...done.
|
|
54:$1 = (struct tty *) 0x1bae
|
|
55:<prompt>(kgdb)</prompt> <userinput>print tp->t_line</userinput>
|
|
56:$2 = 1767990816
|
|
57:<prompt>(kgdb)</prompt> <userinput>up</userinput>
|
|
58:#1 0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\
|
|
59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126)
|
|
60: return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
|
|
61:<prompt>(kgdb)</prompt> <userinput>up</userinput>
|
|
62:#2 0xf0132c34 in spec_open ()
|
|
63:<prompt>(kgdb)</prompt> <userinput>up</userinput>
|
|
64:#3 0xf012d014 in vn_open ()
|
|
65:<prompt>(kgdb)</prompt> <userinput>up</userinput>
|
|
66:#4 0xf012a183 in open ()
|
|
67:<prompt>(kgdb)</prompt> <userinput>up</userinput>
|
|
68:#5 0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\
|
|
69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\
|
|
70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \
|
|
71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \
|
|
72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673)
|
|
73:673 error = (*callp->sy_call)(p, args, rval);
|
|
74:<prompt>(kgdb)</prompt> <userinput>up</userinput>
|
|
75:Initial frame selected; you cannot go up.
|
|
76:<prompt>(kgdb)</prompt> <userinput>quit</userinput></screen>
|
|
<para>Comments to the above script:</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>line 6:</term>
|
|
|
|
<listitem>
|
|
<para>This is a dump taken from within DDB (see below), hence the
panic comment <quote>because you said to!</quote> and a rather
long stack trace; the initial reason for going into DDB was a
page fault trap, though.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>line 20:</term>
|
|
|
|
<listitem>
|
|
<para>This is the location of function <function>trap()</function>
|
|
in the stack trace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>line 36:</term>
|
|
|
|
<listitem>
|
|
<para>Force usage of a new stack frame; this is no longer necessary.
|
|
The stack frames are supposed to point to the right
|
|
locations now, even in case of a trap.
|
|
From looking at the code in source line 403, there is a
|
|
high probability that either the pointer access for
|
|
<quote>tp</quote> was messed up, or the array access was out of
|
|
bounds.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>line 52:</term>
|
|
|
|
<listitem>
|
|
<para>The pointer looks suspicious, but happens to be a valid
|
|
address.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>line 56:</term>
|
|
|
|
<listitem>
|
|
<para>However, it obviously points to garbage, so we have found our
|
|
error! (For those unfamiliar with that particular piece of code:
|
|
<literal>tp->t_line</literal> refers to the line discipline of
|
|
the console device here, which must be a rather small integer
|
|
number.)</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<tip><para>If your system is crashing regularly and you are running
|
|
out of disk space, deleting old <filename>vmcore</filename>
|
|
files in <filename>/var/crash</filename> could save a
|
|
considerable amount of disk space!</para></tip>
|
|
</sect1>
|
|
|
|
<sect1 id="kerneldebug-ddd">
|
|
<title>Debugging a Crash Dump with DDD</title>
|
|
|
|
<para>Examining a kernel crash dump with a graphical debugger like
|
|
<command>ddd</command> is also possible (you will need to install
|
|
the <filename role="package">devel/ddd</filename> port in order to use the
|
|
<command>ddd</command> debugger). Add the
<option>--debugger kgdb</option> option to the
<command>ddd</command> command line you would normally use.
For example:</para>
|
|
|
|
<screen>&prompt.root; <userinput>ddd --debugger kgdb kernel.debug /var/crash/vmcore.0</userinput></screen>
|
|
|
|
<para>You should then be able to go about looking at the crash dump using
|
|
<command>ddd</command>'s graphical interface.</para>
|
|
</sect1>
|
|
|
|
<sect1 id="kerneldebug-online-ddb">
|
|
<title>On-Line Kernel Debugging Using DDB</title>
|
|
|
|
<para>While <command>kgdb</command> as an off-line debugger provides a very
high-level user interface, there are some things it cannot do. The
most important ones are breakpointing and single-stepping kernel
code.</para>
|
|
|
|
<para>If you need to do low-level debugging on your kernel, there is an
|
|
on-line debugger available called DDB. It allows setting of
|
|
breakpoints, single-stepping kernel functions, examining and changing
|
|
kernel variables, etc. However, it cannot access kernel source files,
|
|
and only has access to the global and static symbols, not to the full
|
|
debug information like <command>gdb</command> does.</para>
|
|
|
|
<para>To configure your kernel to include DDB, add the options
|
|
|
|
<programlisting>options KDB</programlisting>
|
|
<programlisting>options DDB</programlisting>
|
|
|
|
to your config file, and rebuild. (See <ulink
|
|
url="&url.books.handbook;/index.html">The FreeBSD Handbook</ulink> for details on
|
|
configuring the FreeBSD kernel).</para>
|
|
|
|
<note>
|
|
<para>If you have an older version of the boot blocks, your
|
|
debugger symbols might not be loaded at all. Update the boot blocks;
|
|
the recent ones load the DDB symbols automatically.</para>
|
|
</note>
|
|
|
|
<para>Once your DDB kernel is running, there are several ways to enter
|
|
DDB. The first, and earliest, way is to type the boot flag
|
|
<option>-d</option> right at the boot prompt. The kernel will start up
|
|
in debug mode and enter DDB prior to any device probing. Hence you can
|
|
even debug the device probe/attach functions. Users of &os.current;
|
|
will need to use the boot menu option, six, to escape to a command
|
|
prompt.</para>
|
|
|
|
<para>The second scenario is to drop to the debugger once the
|
|
system has booted. There are two simple ways to accomplish
|
|
this. If you would like to break to the debugger from the
|
|
command prompt, simply type the command:</para>
|
|
|
|
<screen>&prompt.root; <userinput>sysctl debug.kdb.enter=1</userinput></screen>
|
|
<note>
|
|
<para>To force a panic on the fly, issue the following command:</para>
|
|
<screen>&prompt.root; <userinput>sysctl debug.kdb.panic=1</userinput></screen>
|
|
</note>
|
|
|
|
<para>Alternatively, if you are at the system console, you may use
|
|
a hot-key on the keyboard. The default break-to-debugger
|
|
sequence is <keycombo action="simul"><keycap>Ctrl</keycap>
|
|
<keycap>Alt</keycap><keycap>ESC</keycap></keycombo>. For
|
|
syscons, this sequence can be remapped and some of the
|
|
distributed maps out there do this, so check to make sure you
|
|
know the right sequence to use. There is an option available
|
|
for serial consoles that allows the use of a serial line BREAK on the
|
|
console line to enter DDB (<literal>options BREAK_TO_DEBUGGER</literal>
|
|
in the kernel config file). It is not the default since there are a lot
|
|
of serial adapters around that gratuitously generate a BREAK
|
|
condition, for example when pulling the cable.</para>
|
|
|
|
<para>The third way is that any panic condition will branch to DDB if the
|
|
kernel is configured to use it. For this reason, it is not wise to
|
|
configure a kernel with DDB for a machine running unattended.</para>
|
|
|
|
<para>To obtain the unattended functionality, add:</para>
|
|
|
|
<programlisting>options KDB_UNATTENDED</programlisting>
|
|
|
|
<para>to the kernel configuration file and rebuild/reinstall.</para>
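<para>Taken together, the debugger-related options mentioned in this
section would appear in a kernel configuration file roughly as
sketched below. Whether <literal>KDB_UNATTENDED</literal> is
wanted depends on the machine; the fragment is an illustration,
not a recommended configuration:</para>

<programlisting>options 	KDB		# kernel debugger framework
options 	DDB		# interactive in-kernel debugger
options 	KDB_UNATTENDED	# do not stop in the debugger on panic</programlisting>

<para>On many &os; versions the same behaviour can reportedly be
changed at run time through the
<varname>debug.debugger_on_panic</varname> sysctl. That name is
not taken from this chapter, so check that it is present on the
running system first:</para>

<screen>&prompt.root; <userinput>sysctl debug.debugger_on_panic</userinput></screen>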
|
|
|
|
<para>The DDB commands roughly resemble some <command>gdb</command>
|
|
commands. The first thing you probably need to do is to set a
|
|
breakpoint:</para>
|
|
|
|
<screen><userinput>break function-name address</userinput></screen>
|
|
|
|
<para>Numbers are taken to be hexadecimal by default. To make them distinct
from symbol names, hexadecimal numbers starting with the letters
<literal>a-f</literal> need to be preceded with <literal>0x</literal>
(this is optional for other numbers). Simple expressions are allowed,
for example: <literal>function-name + 0x103</literal>.</para>
|
|
|
|
<para>To exit the debugger and continue execution,
|
|
type:</para>
|
|
|
|
<screen><userinput>continue</userinput></screen>
|
|
|
|
<para>To get a stack trace, use:</para>
|
|
|
|
<screen><userinput>trace</userinput></screen>
|
|
|
|
<note>
|
|
<para>When entering DDB via a hot-key, the kernel is currently
servicing an interrupt, so the stack trace might not be of much use
to you.</para>
|
|
</note>
|
|
|
|
<para>If you want to remove a breakpoint, use:</para>
|
|
|
|
<screen><userinput>del</userinput>
|
|
<userinput>del address-expression</userinput></screen>
|
|
|
|
<para>The first form will be accepted immediately after a breakpoint hit,
|
|
and deletes the current breakpoint. The second form can remove any
|
|
breakpoint, but you need to specify the exact address; this can be
|
|
obtained from:</para>
|
|
|
|
<screen><userinput>show b</userinput></screen>
|
|
|
|
<para>or:</para>
|
|
|
|
<screen><userinput>show break</userinput></screen>
|
|
|
|
<para>To single-step the kernel, try:</para>
|
|
|
|
<screen><userinput>s</userinput></screen>
|
|
|
|
<para>This will step into functions, but you can make DDB trace them until
|
|
the matching return statement is reached by:</para>
|
|
|
|
<screen><userinput>n</userinput></screen>
|
|
|
|
<note>
|
|
<para>This is different from <command>gdb</command>'s
|
|
<command>next</command> statement; it is like <command>gdb</command>'s
|
|
<command>finish</command>. Pressing <keycap>n</keycap> more than once
|
|
will cause a continue.</para>
|
|
</note>
|
|
|
|
<para>To examine data from memory, use (for example):
|
|
|
|
<screen><userinput>x/wx 0xf0133fe0,40</userinput>
|
|
<userinput>x/hd db_symtab_space</userinput>
|
|
<userinput>x/bc termbuf,10</userinput>
|
|
<userinput>x/s stringbuf</userinput></screen>
|
|
|
|
for word/halfword/byte access, and hexadecimal/decimal/character/string
|
|
display. The number after the comma is the object count. To display
|
|
the next 0x10 items, simply use:</para>
|
|
|
|
<screen><userinput>x ,10</userinput></screen>
|
|
|
|
<para>Similarly, use
|
|
|
|
<screen><userinput>x/ia foofunc,10</userinput></screen>
|
|
|
|
to disassemble the first 0x10 instructions of
|
|
<function>foofunc</function>, and display them along with their offset
|
|
from the beginning of <function>foofunc</function>.</para>
|
|
|
|
<para>To modify memory, use the write command:</para>
|
|
|
|
<screen><userinput>w/b termbuf 0xa 0xb 0</userinput>
|
|
<userinput>w/w 0xf0010030 0 0</userinput></screen>
|
|
|
|
<para>The command modifier
|
|
(<literal>b</literal>/<literal>h</literal>/<literal>w</literal>)
|
|
specifies the size of the data to be written, the first following
|
|
expression is the address to write to and the remainder is interpreted
|
|
as data to write to successive memory locations.</para>
|
|
|
|
<para>If you need to know the current registers, use:</para>
|
|
|
|
<screen><userinput>show reg</userinput></screen>
|
|
|
|
<para>Alternatively, you can display a single register value with, for example:
|
|
|
|
<screen><userinput>p $eax</userinput></screen>
|
|
|
|
and modify it by:</para>
|
|
|
|
<screen><userinput>set $eax new-value</userinput></screen>
|
|
|
|
<para>Should you need to call some kernel functions from DDB, simply
|
|
say:</para>
|
|
|
|
<screen><userinput>call func(arg1, arg2, ...)</userinput></screen>
|
|
|
|
<para>The return value will be printed.</para>
|
|
|
|
<para>For a &man.ps.1; style summary of all running processes, use:</para>
|
|
|
|
<screen><userinput>ps</userinput></screen>
|
|
|
|
<para>Now you have examined why your kernel failed, and you wish to
|
|
reboot. Remember that, depending on the severity of previous
|
|
malfunctioning, not all parts of the kernel might still be working as
|
|
expected. Perform one of the following actions to shut down and reboot
|
|
your system:</para>
|
|
|
|
<screen><userinput>panic</userinput></screen>
|
|
|
|
<para>This will cause your kernel to dump core and reboot, so you can
|
|
later analyze the core on a higher level with <command>gdb</command>.
|
|
This command
|
|
usually must be followed by another <command>continue</command>
|
|
statement.</para>
|
|
|
|
<screen><userinput>call boot(0)</userinput></screen>
|
|
|
|
<para>This might be a good way to cleanly shut down the running system,
<function>sync()</function> all disks, and finally, in some cases,
reboot. As long as
the disk and filesystem interfaces of the kernel are not damaged, this
gives an almost clean shutdown.</para>
|
|
|
|
<screen><userinput>call cpu_reset()</userinput></screen>
|
|
|
|
<para>This is the final way out of disaster and almost the same as hitting the
|
|
Big Red Button.</para>
|
|
|
|
<para>If you need a short command summary, simply type:</para>
|
|
|
|
<screen><userinput>help</userinput></screen>
|
|
|
|
<para>It is highly recommended to have a printed copy of the
|
|
&man.ddb.4; manual page ready for a debugging
|
|
session. Remember that it is hard to read the on-line manual while
|
|
single-stepping the kernel.</para>
|
|
</sect1>
|
|
|
|
<sect1 id="kerneldebug-online-gdb">
|
|
<title>On-Line Kernel Debugging Using Remote GDB</title>
|
|
|
|
<para>This feature has been supported since FreeBSD 2.2, and it is
|
|
actually a very neat one.</para>
|
|
|
|
<para>GDB has supported <emphasis>remote debugging</emphasis> for
a long time. This is done using a very simple protocol along a serial
|
|
line. Unlike the other methods described above, you will need two
|
|
machines for doing this. One is the host providing the debugging
|
|
environment, including all the sources, and a copy of the kernel binary
|
|
with all the symbols in it, and the other one is the target machine that
|
|
simply runs a similar copy of the very same kernel (but stripped of the
|
|
debugging information).</para>
|
|
|
|
<para>You should configure the kernel in question with <command>config
|
|
-g</command> if building the <quote>traditional</quote> way. If
|
|
building the <quote>new</quote> way, make sure that
|
|
<literal>makeoptions DEBUG=-g</literal> is in the configuration.
|
|
In both cases, include <option>DDB</option> in the configuration, and
|
|
compile it as usual. This gives a large binary, due to the
|
|
debugging information. Copy this kernel to the target machine, strip
|
|
the debugging symbols off with <command>strip -x</command>, and boot it
|
|
using the <option>-d</option> boot option. Connect the serial line
|
|
of the target machine that has <quote>flags 080</quote> set on its sio device
|
|
to any serial line of the debugging host. See &man.sio.4; for
|
|
information on how to set the flags on an sio device.
|
|
Now, on the debugging machine, go to the compile directory of the target
|
|
kernel, and start <command>gdb</command>:</para>
|
|
|
|
<screen>&prompt.user; <userinput>kgdb kernel</userinput>
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd),
Copyright 1996 Free Software Foundation, Inc...
<prompt>(kgdb)</prompt> </screen>

<para>Initialize the remote debugging session (assuming the first serial
port is being used) by:</para>

<screen><prompt>(kgdb)</prompt> <userinput>target remote /dev/cuaa0</userinput></screen>

<para>Now, on the target host (the one that entered DDB just before
starting the device probe), type:</para>

<screen>Debugger("Boot flags requested debugger")
Stopped at Debugger+0x35: movb $0, edata+0x51bc
<prompt>db></prompt> <userinput>gdb</userinput></screen>

<para>DDB will respond with:</para>

<screen>Next trap will enter GDB remote protocol mode</screen>

<para>Every time you type <command>gdb</command>, the mode is toggled
between remote GDB and local DDB. To force the next trap
immediately, simply type <command>s</command> (step). The GDB session on
the debugging host will now gain control over the target kernel:</para>

<screen>Remote debugging using /dev/cuaa0
Debugger (msg=0xf01b0383 "Boot flags requested debugger")
at ../../i386/i386/db_interface.c:257
<prompt>(kgdb)</prompt></screen>

<para>You can use this session much like any other GDB session, including
full access to the source and running it in gud-mode inside an Emacs window
(which gives you an automatic source code display in another Emacs
window).</para>
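
<para>As a recap of the preparation described at the start of this
section, the relevant pieces can be sketched as follows. This is only
an illustration under stated assumptions: the kernel configuration
lines come from the text above, while the
<filename>device.hints</filename> line assumes a kernel that
configures &man.sio.4; through hints and that the first serial port
(unit 0) is the one wired to the debugging host. Consult &man.sio.4;
for the authoritative syntax on the release in use.</para>

<programlisting># Kernel configuration file of the kernel being debugged
makeoptions DEBUG=-g    # build with debugging symbols
options DDB             # in-kernel debugger; its gdb command hands over to remote GDB

# /boot/device.hints on the target
# (assumption: hints-based sio configuration, first serial port)
hint.sio.0.flags="0x80"</programlisting>

<para>Once the <prompt>(kgdb)</prompt> prompt appears on the debugging
host, ordinary GDB commands apply. For example (output omitted):</para>

<screen><prompt>(kgdb)</prompt> <userinput>bt</userinput>
<prompt>(kgdb)</prompt> <userinput>list</userinput>
<prompt>(kgdb)</prompt> <userinput>continue</userinput></screen>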

</sect1>

<sect1 id="kerneldebug-console">
<title>Debugging a Console Driver</title>

<para>Since you need a console driver to run DDB on, things are more
complicated if the console driver itself is failing. In that case,
consider using a serial console (either with modified boot blocks, or by
specifying <option>-h</option> at the <prompt>Boot:</prompt> prompt),
and hook up a standard terminal to your first serial port. DDB works
on any configured console driver, including a serial
console.</para>
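
<para>One way to make the <option>-h</option> choice persistent, rather
than typing it at every boot, is to place it in
<filename>/boot.config</filename>. This is a sketch that assumes boot
blocks which read that file (see &man.boot.8;); terminal speed and
cabling are not covered here.</para>

<screen>&prompt.root; <userinput>echo "-h" > /boot.config</userinput></screen>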

</sect1>

<sect1 id="kerneldebug-deadlocks">
<title>Debugging Deadlocks</title>

<para>You may experience so-called deadlocks, a situation in which
a system stops doing useful work. To provide a helpful bug report
in this situation, use &man.ddb.4; as described above.
Include the output of <command>ps</command> and
<command>trace</command> for suspected processes in the
report.</para>
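
<para>A minimal sketch of such a session at the DDB prompt might look
like this. The process ID <literal>1234</literal> is hypothetical;
substitute the ID of a suspect process taken from the
<command>ps</command> listing. Output is omitted.</para>

<screen><prompt>db></prompt> <userinput>ps</userinput>
<prompt>db></prompt> <userinput>trace 1234</userinput></screen>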

<para>If possible, consider doing further investigation. The recipe
below is especially useful if you suspect that a deadlock occurs in the
VFS layer. Add the following options
<programlisting>makeoptions DEBUG=-g
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC</programlisting>

to the kernel configuration file. When a deadlock occurs, in addition to the
output of the <command>ps</command> command, provide the output
of the <command>show pcpu</command>, <command>show allpcpu</command>,
<command>show locks</command>, <command>show alllocks</command>,
<command>show lockedvnods</command> and <command>alltrace</command>
commands.</para>
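
<para>Entered one after another at the DDB prompt, the commands named
above look as follows. This is only a sketch: output is omitted, and
the suggestion to capture it over a serial console is an assumption
about the setup rather than a requirement.</para>

<screen><prompt>db></prompt> <userinput>ps</userinput>
<prompt>db></prompt> <userinput>show pcpu</userinput>
<prompt>db></prompt> <userinput>show allpcpu</userinput>
<prompt>db></prompt> <userinput>show locks</userinput>
<prompt>db></prompt> <userinput>show alllocks</userinput>
<prompt>db></prompt> <userinput>show lockedvnods</userinput>
<prompt>db></prompt> <userinput>alltrace</userinput></screen>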

<para>To obtain meaningful backtraces for threaded processes, use
<command>thread <replaceable>thread-id</replaceable></command> to switch to the thread
stack, and do a backtrace with <command>where</command>.</para>
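
<para>For example, at the DDB prompt and assuming a hypothetical thread
ID of <literal>100042</literal> (substitute one reported by the
debugger for the process of interest; output is omitted):</para>

<screen><prompt>db></prompt> <userinput>thread 100042</userinput>
<prompt>db></prompt> <userinput>where</userinput></screen>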

</sect1>

<sect1 id="kerneldebug-options">
<title>Glossary of Kernel Options for Debugging</title>

<para>This section provides a brief glossary of compile-time kernel
options used for debugging:</para>

<itemizedlist>
<listitem>
<para><literal>options KDB</literal>: compiles in the kernel
debugger framework. Required for <literal>options DDB</literal>
and <literal>options GDB</literal>. Little or no performance
overhead. By default, the debugger will be entered on panic
instead of an automatic reboot.</para>
</listitem>

<listitem>
<para><literal>options KDB_UNATTENDED</literal>: change the default
value of the <literal>debug.debugger_on_panic</literal> sysctl,
which controls whether the debugger is entered on panic, to 0. When
<literal>options KDB</literal> is not compiled into the kernel, the
behavior is to automatically reboot on panic; when it is compiled
into the kernel, the default behavior is to drop into the debugger
unless <literal>options KDB_UNATTENDED</literal> is compiled in.
Use this option to leave the kernel debugger compiled into the kernel
while still having the system come back up when nobody is on hand to use
the debugger for diagnostics.</para>
</listitem>

<listitem>
<para><literal>options KDB_TRACE</literal>: change the default value
of the <literal>debug.trace_on_panic</literal> sysctl, which
controls whether the debugger automatically prints a stack trace
on panic, to 1. Especially when running with <literal>options
KDB_UNATTENDED</literal>, this can be helpful for gathering basic
debugging information on the serial or firewire console while
still rebooting to recover.</para>
</listitem>

<listitem>
<para><literal>options DDB</literal>: compile in support for the
console debugger, DDB. This interactive debugger runs on whatever
the active low-level console of the system is, which includes the
video console, serial console, or firewire console. It provides
basic integrated debugging facilities, such as stack tracing,
process and thread listing, and dumping of lock state, VM state, file
system state, and kernel memory management. DDB does not require
software running on a second machine, the ability to generate a
core dump, or full debugging kernel symbols, and provides detailed
diagnostics of the kernel at run-time. Many bugs can be fully
diagnosed using only DDB output. This option depends on
<literal>options KDB</literal>.</para>
</listitem>

<listitem>
<para><literal>options GDB</literal>: compile in support for the
remote debugger, GDB, which can operate over serial cable or
firewire. When the debugger is entered, GDB may be attached to
inspect structure contents, generate stack traces, etc. A second
machine running the debugger is required, and some
kernel state is more awkward to access than in DDB, which is able
to generate useful summaries of kernel state automatically, such
as automatically walking lock debugging or kernel memory
management structures. On the other hand, GDB combines information from
the kernel source and full debugging symbols, is aware of full
data structure definitions and local variables, and is scriptable.
This option is not required to run GDB on a kernel core dump.
This option depends on <literal>options KDB</literal>.</para>
</listitem>

<listitem>
<para><literal>options BREAK_TO_DEBUGGER</literal>, <literal>options
ALT_BREAK_TO_DEBUGGER</literal>: allow a break signal or
alternative signal on the console to enter the debugger. If the
system hangs without a panic, this is a useful way to reach the
debugger. Due to the current kernel locking, a break signal
generated on a serial console is significantly more reliable at
getting into the debugger, and is generally recommended. These
options have little or no performance impact.</para>
</listitem>

<listitem>
<para><literal>options INVARIANTS</literal>: compile into the kernel
a large number of run-time assertion checks and tests, which
constantly test the integrity of kernel data structures and the
invariants of kernel algorithms. These tests can be expensive, so
they are not compiled in by default, but they help provide useful
<quote>fail stop</quote>
behavior, in which certain classes of undesired behavior enter the
debugger before kernel data corruption occurs, making them easier
to debug. Tests include memory scrubbing and use-after-free
testing, which is one of the more significant sources of overhead.
This option depends on <literal>options INVARIANT_SUPPORT</literal>.</para>
</listitem>

<listitem>
<para><literal>options INVARIANT_SUPPORT</literal>: many of the tests
present in <literal>options INVARIANTS</literal> require modified
data structures or additional kernel symbols to be defined; this
option compiles in that support.</para>
</listitem>

<listitem>
<para><literal>options WITNESS</literal>: this option enables run-time
lock order tracking and verification, and is an invaluable tool for
deadlock diagnosis. WITNESS maintains a graph of acquired lock
orders by lock type, and checks the graph at each acquire for
cycles (implicit or explicit). If a cycle is detected, a warning
and stack trace are printed to the console, indicating that a
potential deadlock has been found. WITNESS is required in
order to use the <command>show locks</command>, <command>show
witness</command> and <command>show alllocks</command> DDB
commands. This debug option has significant performance overhead,
which may be somewhat mitigated through the use of <literal>options
WITNESS_SKIPSPIN</literal>. Detailed documentation may be found in
&man.witness.4;.</para>
</listitem>

<listitem>
<para><literal>options WITNESS_SKIPSPIN</literal>: disable run-time
checking of spin lock order with WITNESS. As spin locks are
acquired most frequently in the scheduler, and scheduler events
occur often, this option can significantly speed up systems
running with WITNESS. This option depends on <literal>options
WITNESS</literal>.</para>
</listitem>

<listitem>
<para><literal>options WITNESS_KDB</literal>: change the default
value of the <literal>debug.witness.kdb</literal> sysctl to 1,
which causes WITNESS to enter the debugger when a lock order
violation is detected, rather than simply printing a warning. This
option depends on <literal>options WITNESS</literal>.</para>
</listitem>

<listitem>
<para><literal>options SOCKBUF_DEBUG</literal>: perform extensive
run-time consistency checking on socket buffers, which can be
useful for debugging both socket bugs and race conditions in
protocols and device drivers that interact with sockets. This
option significantly impacts network performance, and may change
the timing in device driver races.</para>
</listitem>

<listitem>
<para><literal>options DEBUG_VFS_LOCKS</literal>: track lock
acquisition points for lockmgr/vnode locks, expanding the amount
of information displayed by <command>show lockedvnods</command>
in DDB. This option has a measurable performance impact.</para>
</listitem>

<listitem>
<para><literal>options DEBUG_MEMGUARD</literal>: a replacement for
the &man.malloc.9; kernel memory allocator that uses the VM system
to detect reads from or writes to allocated memory after it has been
freed. Details may be found in &man.memguard.9;. This option has a
significant performance impact, but can be very helpful in
debugging kernel memory corruption bugs.</para>
</listitem>

<listitem>
<para><literal>options DIAGNOSTIC</literal>: enable additional, more
expensive diagnostic tests along the lines of <literal>options
INVARIANTS</literal>.</para>
</listitem>

</itemizedlist>
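
<para>As an illustration of how the dependencies listed above fit
together, a kernel configuration for an unattended test machine might
contain the following lines. The selection is only an example, not a
recommendation; every option name is taken from the glossary
above.</para>

<programlisting>options KDB                # debugger framework, needed by DDB and GDB
options KDB_UNATTENDED     # reboot on panic instead of entering the debugger
options KDB_TRACE          # print a stack trace on panic
options DDB                # console debugger
options INVARIANTS         # run-time assertion checks
options INVARIANT_SUPPORT  # required by INVARIANTS
options WITNESS            # lock order verification
options WITNESS_SKIPSPIN   # skip spin locks to reduce WITNESS overhead</programlisting>

<para>On a kernel built with <literal>options KDB</literal>, the
sysctls whose defaults these options change can also be inspected or
set on a running system with &man.sysctl.8;. A short sketch, with
output omitted:</para>

<screen>&prompt.root; <userinput>sysctl debug.debugger_on_panic</userinput>
&prompt.root; <userinput>sysctl debug.debugger_on_panic=0</userinput>
&prompt.root; <userinput>sysctl debug.trace_on_panic=1</userinput></screen>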

</sect1>

</chapter>