<!-- $Id: kerneldebug.sgml,v 1.5 1995-10-22 00:42:10 jfieber Exp $ -->
<!-- The FreeBSD Documentation Project -->

<chapt><heading>Kernel Debugging<label id="kerneldebug"></heading>

<p><em>Contributed by &a.paul; and &a.joerg;</em>

<sect><heading>Debugging a kernel crash dump with kgdb</heading>

  <p>Here are some instructions for getting kernel debugging
  working on a crash dump.  They assume that you have enough swap
  space for a crash dump.  If you have multiple swap
  partitions and the first one is too small to hold the dump,
  you can configure your kernel to use an alternate dump device
  (in the <tt>config kernel</tt> line), or
  you can specify an alternate using the <tt>dumpon(8)</tt> command.
  Dumps to non-swap devices,
  tapes for example, are currently not supported.  Configure your
  kernel using <tt>config -g</tt>.
  See <ref id="kernelconfig" name="Kernel Configuration"> for
  details on configuring the FreeBSD kernel.

  Use the <tt>dumpon(8)</tt> command to tell the kernel where to dump
  to (note that this has to be done after configuring the
  partition in question as swap space via <tt>swapon(8)</tt>).  This is
  normally arranged via <tt>/etc/sysconfig</tt> and <tt>/etc/rc</tt>.
  Alternatively, you can
  hard-code the dump device via the `dump' clause in the `config' line
  of your kernel config file.  This is deprecated; use it only if you
  want a crash dump from a kernel that crashes during booting.

  <em><bf>Note:</bf> In the following, the term `<tt>kgdb</tt>' refers
  to <tt>gdb</tt> run in `kernel debug mode'.  This can be accomplished by
  either starting <tt>gdb</tt> with the option <tt>-k</tt>, or by linking
  and starting it under the name <tt>kgdb</tt>.  This link is not
  created by default, however.</em>

  When the kernel has been built, make a copy of it, say
  <tt>kernel.debug</tt>, and then run <tt>strip -x</tt> on the
  original.  Install the original as normal.  You may also install
  the unstripped kernel, but symbol table lookup time for some
  programs will drastically increase, and since
  the whole kernel is loaded at boot time and cannot be
  swapped out later, several megabytes of
  physical memory will be wasted.

  If you are testing a new kernel, for example by typing the new
  kernel's name at the boot prompt, but need to boot a different
  one in order to get your system up and running again, boot it
  only into single-user mode using the <tt>-s</tt> flag at the
  boot prompt, and then perform the following steps:
<tscreen><verb>
  fsck -p
  mount -a -t ufs       # so your file system for /var/crash is writable
  savecore -N /kernel.panicked /var/crash
  exit                  # ...to multi-user
</verb></tscreen>
  This instructs <tt>savecore(8)</tt> to use another kernel for symbol name
  extraction.  It would otherwise default to the currently running kernel
  and most likely not do anything at all, since the crash dump and the
  kernel symbols differ.

  Now, after a crash dump, go to <tt>/sys/compile/WHATEVER</tt> and run
  <tt>kgdb</tt>.  From <tt>kgdb</tt> do:
<tscreen><verb>
  symbol-file kernel.debug
  exec-file /var/crash/system.0
  core-file /var/crash/ram.0
</verb></tscreen>
  and voila, you can debug the crash dump using the kernel sources
  just like you can for any other program.

  Here's a script log of a <tt>kgdb</tt> session illustrating the
  procedure.  Long
  lines have been folded to improve readability, and the lines are
  numbered for reference.  Despite this, it's a real-world error
  trace taken during the development of the pcvt console driver.
<tscreen><verb>
   1:Script started on Fri Dec 30 23:15:22 1994
   2:uriah # cd /sys/compile/URIAH
   3:uriah # kgdb kernel /var/crash/vmcore.1
   4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel...done.
   5:IdlePTD 1f3000
   6:panic: because you said to!
   7:current pcb at 1e3f70
   8:Reading in symbols for ../../i386/i386/machdep.c...done.
   9:(kgdb) where
  10:#0  boot (arghowto=256) (../../i386/i386/machdep.c line 767)
  11:#1  0xf0115159 in panic ()
  12:#2  0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698)
  13:#3  0xf010185e in db_fncall ()
  14:#4  0xf0101586 in db_command (-266509132, -266509516, -267381073)
  15:#5  0xf0101711 in db_command_loop ()
  16:#6  0xf01040a0 in db_trap ()
  17:#7  0xf0192976 in kdb_trap (12, 0, -272630436, -266743723)
  18:#8  0xf019d2eb in trap_fatal (...)
  19:#9  0xf019ce60 in trap_pfault (...)
  20:#10 0xf019cb2f in trap (...)
  21:#11 0xf01932a1 in exception:calltrap ()
  22:#12 0xf0191503 in cnopen (...)
  23:#13 0xf0132c34 in spec_open ()
  24:#14 0xf012d014 in vn_open ()
  25:#15 0xf012a183 in open ()
  26:#16 0xf019d4eb in syscall (...)
  27:(kgdb) up 10
  28:Reading in symbols for ../../i386/i386/trap.c...done.
  29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\
  30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\
  31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\
  32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\
  33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\
  34:ss = -266427884}) (../../i386/i386/trap.c line 283)
  35:283                             (void) trap_pfault(&frame, FALSE);
  36:(kgdb) frame frame->tf_ebp frame->tf_eip
  37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done.
  38:#0  0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\
  39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403)
  40:403             return ((*linesw[tp->t_line].l_open)(dev, tp));
  41:(kgdb) list
  42:398
  43:399             tp->t_state |= TS_CARR_ON;
  44:400             tp->t_cflag |= CLOCAL;  /* cannot be a modem (:-) */
  45:401
  46:402     #if PCVT_NETBSD || (PCVT_FREEBSD >= 200)
  47:403             return ((*linesw[tp->t_line].l_open)(dev, tp));
  48:404     #else
  49:405             return ((*linesw[tp->t_line].l_open)(dev, tp, flag));
  50:406     #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */
  51:407     }
  52:(kgdb) print tp
  53:Reading in symbols for ../../i386/i386/cons.c...done.
  54:$1 = (struct tty *) 0x1bae
  55:(kgdb) print tp->t_line
  56:$2 = 1767990816
  57:(kgdb) up
  58:#1  0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\
  59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126)
  60:       return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
  61:(kgdb) up
  62:#2  0xf0132c34 in spec_open ()
  63:(kgdb) up
  64:#3  0xf012d014 in vn_open ()
  65:(kgdb) up
  66:#4  0xf012a183 in open ()
  67:(kgdb) up
  68:#5  0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\
  69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\
  70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \
  71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \
  72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673)
  73:673             error = (*callp->sy_call)(p, args, rval);
  74:(kgdb) up
  75:Initial frame selected; you cannot go up.
  76:(kgdb) quit
  77:uriah # exit
  78:exit
  79:
  80:Script done on Fri Dec 30 23:18:04 1994
</verb></tscreen>
  Comments to the above script:

<descrip>
<tag/line 6:/  This is a dump taken from within DDB (see below), hence the
            panic comment ``because you said to!'', and a rather long
            stack trace; the initial reason for going into DDB was
            a page fault trap, though.
<tag/line 20:/  This is the location of function <tt>trap()</tt>
            in the stack trace.
<tag/line 36:/  Force usage of a new stack frame; this is no longer
            necessary.  The stack frames are supposed to point to
            the right locations now, even in case of a trap.
            (I don't have a new core dump handy &lt;g&gt;, my kernel
            has not panicked for quite a while.)
            From looking at the code in source line 403,
            there's a high probability that either the pointer
            access for ``tp'' was messed up, or the array access was
            out of bounds.
<tag/line 52:/  The pointer looks suspicious, but happens to be a valid
            address.
<tag/line 56:/  However, it obviously points to garbage, so we have found our
            error!  (For those unfamiliar with that particular piece
            of code: <tt>tp->t_line</tt> refers to the line discipline
            of the console device here, which must be a rather small
            integer.)
</descrip>
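
  One quick way to see what kind of garbage <tt>tp->t_line</tt> holds
  is to look at its raw bytes.  The following Python sketch is not part
  of the original session; it merely reinterprets the value printed on
  line 56 as four little-endian bytes, the order the i386 stores them
  in.  The function name <tt>as_le_bytes</tt> is made up for this
  illustration.

```python
def as_le_bytes(value, width=4):
    """Return the bytes of an unsigned integer in little-endian (i386) order."""
    return value.to_bytes(width, "little")


if __name__ == "__main__":
    t_line = 1767990816          # value printed for tp->t_line on line 56
    raw = as_le_bytes(t_line)
    print(hex(t_line), raw)      # prints: 0x69616620 b' fai'
```

  A supposed line discipline index that reads as printable text
  suggests that <tt>tp</tt> points into string data rather than at a
  <tt>struct tty</tt>, which fits the conclusion drawn above.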


<sect><heading>Post-mortem analysis of a dump</heading>

<p>What do you do if a kernel dumped core but you did not expect
  it, and it's therefore not compiled using <tt>config -g</tt>?
  Not everything is lost here.  Don't panic!

  Of course, you still need to enable crash dumps.  See above
  for the options available to do this.

  Go to your kernel compile directory, and edit the line
  containing <tt>COPTFLAGS?=-O</tt>.  Add the <tt>-g</tt> option
  there (but <em>don't</em> change the level of
  optimization).  If you already know roughly the probable
  location of the failing piece of code (e.g., the <tt>pcvt</tt>
  driver in the example above), remove all the object files for
  this code.  Rebuild the kernel.  Due to the time stamp change on
  the Makefile, some other object files will be rebuilt as well,
  for example <tt>trap.o</tt>.  With a bit of luck, the added
  <tt>-g</tt> option won't change anything in the generated
  code, so you'll finally get a new kernel with the same code as
  the faulting one, plus some debugging symbols.  You should at
  least verify the old and new sizes with the <tt>size(1)</tt> command.  If
  there is a mismatch, you probably need to give up here.

  Go and examine the dump as described above.  The debugging
  symbols might be incomplete in some places, as can be seen in
  the stack trace in the example above, where some functions are
  displayed without line numbers and argument lists.  If you need
  more debugging symbols, remove the appropriate object files,
  rebuild, and repeat the <tt>kgdb</tt> session until you know enough.

  All this is not guaranteed to work, but it will work fine in
  most cases.

<sect><heading>On-line kernel debugging using DDB</heading>

<p>While <tt>kgdb</tt> as an offline debugger provides a very
  high-level user interface, there are some things it cannot do.
  The most important ones are breakpointing and single-stepping
  kernel code.

  If you need to do low-level debugging on your kernel, there's
  an on-line debugger available called DDB.  It allows
  setting breakpoints, single-stepping kernel functions, examining
  and changing kernel variables, etc.  However, it cannot
  access kernel source files, and only has access to the global
  and static symbols, not to the full debug information like
  <tt>kgdb</tt>.

  To configure your kernel to include DDB, add the option line
<tscreen><verb>
        options DDB
</verb></tscreen>
  to your config file, and rebuild.  (See <ref id="kernelconfig"
  name="Kernel Configuration"> for details on configuring the
  FreeBSD kernel.  Note that if you have an older version of the
  boot blocks, your debugger symbols might not be loaded at all.
  Update the boot blocks; the recent ones load the DDB symbols
  automagically.)

  Once your DDB kernel is running, there are several ways to
  enter DDB.  The first, and earliest, way is to type the boot
  flag <tt>-d</tt> right at the boot prompt.  The kernel will
  start up in debug mode and enter DDB prior to any device
  probing.  Hence you are even able to debug the device
  probe/attach functions.

  The second way is a hot-key on the keyboard, usually
  Ctrl-Alt-ESC.  For syscons, this can be remapped, and some of
  the distributed maps do this, so watch out.
  There's an option
  available for serial consoles
  that allows the use of a serial line BREAK on the console line to
  enter DDB (``<tt>options BREAK_TO_DEBUGGER</tt>''
  in the kernel config file).  It is not the default, since there are a lot of
  crappy serial adapters around that gratuitously generate a
  BREAK condition, for example when pulling the cable.

  The third way is that any panic condition will branch to DDB if
  the kernel is configured to use it.
  For this reason, it is not wise to
  configure a kernel with DDB for a machine running unattended.

  The DDB commands roughly resemble some <tt>gdb</tt> commands.  The first
  thing you probably need to do is set a breakpoint:
<tscreen><verb>
  b function-name
  b address
</verb></tscreen>

  Numbers are taken as hexadecimal by default, but to make them
  distinct from symbol names, hexadecimal numbers starting with the
  letters <tt>a</tt>-<tt>f</tt> need to be preceded with
  <tt>0x</tt> (for other numbers, this is optional).  Simple
  expressions are allowed, for example: <tt>function-name + 0x103</tt>.
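
  The number rule can be summarised in a few lines of Python.  This is
  only a sketch of the rule as stated above, not DDB's actual lexer;
  the function name <tt>classify_token</tt> and its return format are
  invented for the illustration.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def classify_token(token):
    """Classify a token following the rule described in the text.

    Returns ("number", value) or ("symbol", name).  Illustration only;
    this is not how DDB itself is implemented.
    """
    lowered = token.lower()
    if lowered.startswith("0x"):
        return ("number", int(lowered, 16))
    # Without the 0x prefix, only tokens that start with a decimal digit
    # are read as (hexadecimal) numbers.
    if token[:1].isdigit() and all(c in HEX_DIGITS for c in token):
        return ("number", int(token, 16))
    return ("symbol", token)


if __name__ == "__main__":
    for tok in ("10", "0xf0", "f0", "1f"):
        print(tok, classify_token(tok))
```

  Here <tt>10</tt> is read as the number 0x10, while <tt>f0</tt>
  without the prefix is taken as a symbol name.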

  To continue the operation of an interrupted kernel, simply type
<tscreen><verb>
  c
</verb></tscreen>
  To get a stack trace, use
<tscreen><verb>
  trace
</verb></tscreen>
  Note that when entering DDB via a hot-key, the kernel is currently
  servicing an interrupt, so the stack trace might not be of much use
  to you.

  If you want to remove a breakpoint, use
<tscreen><verb>
  del
  del address-expression
</verb></tscreen>
  The first form will be accepted immediately after a breakpoint hit,
  and deletes the current breakpoint.  The second form can remove any
  breakpoint, but you need to specify the exact address, as it can be
  obtained from
<tscreen><verb>
  show b
</verb></tscreen>
  To single-step the kernel, try
<tscreen><verb>
  s
</verb></tscreen>
  This will step into functions, but you can make DDB trace them until
  the matching return statement is reached by
<tscreen><verb>
  n
</verb></tscreen>
  <bf>Note:</bf> this is different from <tt>gdb</tt>'s `next' command; it is like
  <tt>gdb</tt>'s `finish'.

  To examine data from memory, use (for example):
<tscreen><verb>
  x/wx 0xf0133fe0,40
  x/hd db_symtab_space
  x/bc termbuf,10
  x/s stringbuf
</verb></tscreen>
  for word/halfword/byte access, and hexadecimal/decimal/character/
  string display.  The number after the comma is the object count.
  To display the next 0x10 items, simply use
<tscreen><verb>
  x ,10
</verb></tscreen>
  Similarly, use
<tscreen><verb>
  x/ia foofunc,10
</verb></tscreen>
  to disassemble the first 0x10 instructions of <tt>foofunc</tt>, and display
  them along with their offset from the beginning of <tt>foofunc</tt>.
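
  Since the object count is hexadecimal too, it is easy to misjudge
  how much memory a data-examining command covers.  The following
  Python sketch works it out for the first example above.  It assumes
  the usual i386 unit sizes (byte 1, halfword 2, word 4); the helper
  name <tt>examine_span</tt> is hypothetical.

```python
# Unit sizes in bytes, assumed here for the i386.
UNIT_SIZE = {"b": 1, "h": 2, "w": 4}


def examine_span(size_modifier, address, count):
    """Return (first, end) addresses covered by an x/ command; end is exclusive."""
    nbytes = UNIT_SIZE[size_modifier] * count
    return address, address + nbytes


if __name__ == "__main__":
    # x/wx 0xf0133fe0,40  -> 0x40 words of 4 bytes each
    first, end = examine_span("w", 0xf0133fe0, 0x40)
    print(hex(first), hex(end), end - first)
```

  Under these assumptions, <tt>x/wx 0xf0133fe0,40</tt> covers 0x40
  words, that is 256 bytes, up to but not including address
  <tt>0xf01340e0</tt>.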

  To modify memory, use the write command:
<tscreen><verb>
  w/b termbuf 0xa 0xb 0
  w/w 0xf0010030 0 0
</verb></tscreen>
  The command modifier (<tt>b</tt>/<tt>h</tt>/<tt>w</tt>)
  specifies the size of the data to be written.  The first
  following expression is the address to write to; the remainder
  is interpreted as data to write to successive memory locations.
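
  The effect of ``successive memory locations'' can be modelled on a
  plain byte buffer.  The following Python sketch only simulates the
  behaviour described above on a <tt>bytearray</tt>; it does not talk
  to DDB.  The name <tt>ddb_write</tt>, the unit sizes, and the
  little-endian byte order (as on the i386) are assumptions made for
  the illustration.

```python
# Unit sizes in bytes, assumed here for the i386.
UNIT_SIZE = {"b": 1, "h": 2, "w": 4}


def ddb_write(memory, size_modifier, offset, values):
    """Store each value at successive locations of `memory`, little-endian."""
    size = UNIT_SIZE[size_modifier]
    for i, value in enumerate(values):
        start = offset + i * size
        memory[start:start + size] = value.to_bytes(size, "little")
    return memory


if __name__ == "__main__":
    termbuf = bytearray(8)                     # stand-in for the kernel buffer
    ddb_write(termbuf, "b", 0, [0xa, 0xb, 0])  # models: w/b termbuf 0xa 0xb 0
    print(termbuf)
```

  With the <tt>b</tt> modifier each value occupies one byte; with
  <tt>w</tt>, each value would advance the address by four bytes.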

  If you need to know the current registers, use
<tscreen><verb>
  show reg
</verb></tscreen>
  Alternatively, you can display a single register value by e.g.
<tscreen><verb>
  p $eax
</verb></tscreen>
  and modify it by
<tscreen><verb>
  set $eax new-value
</verb></tscreen>

  Should you need to call some kernel functions from DDB, simply
  say
<tscreen><verb>
  call func(arg1, arg2, ...)
</verb></tscreen>
  The return value will be printed.

  For a <tt>ps(1)</tt> style summary of all running processes, use
<tscreen><verb>
  ps
</verb></tscreen>

  Now you have examined why your kernel failed, and you wish to
  reboot.  Remember that, depending on the severity of previous
  malfunctioning, not all parts of the kernel might still be working
  as expected.  Perform one of the following actions to shut down and
  reboot your system:
<tscreen><verb>
  call diediedie()
</verb></tscreen>

  will cause your kernel to dump core and reboot, so you can
  later analyze the core on a higher level with <tt>kgdb</tt>.  This
  command usually must be followed by another
  `<tt>continue</tt>' statement.
  There is now an alias for this: `<tt>panic</tt>'.

<tscreen><verb>
  call boot(0)
</verb></tscreen>
  shuts down the running system, <tt>sync()</tt>s
  all disks, and finally reboots.  As long as the disk and file system
  interfaces of the kernel are not damaged, this might be a good way
  to get an almost clean shutdown.

<tscreen><verb>
  call cpu_reset()
</verb></tscreen>
  is the final way out of disaster and almost the same as hitting
  the Big Red Button.

  If you need a short command summary, simply type
<tscreen><verb>
  help
</verb></tscreen>
  However, it's highly recommended to have a printed copy of the
  <tt>ddb(4)</tt> manual page ready for a debugging session.
  Remember that it's hard to read the on-line manual while
  single-stepping the kernel.


<sect><heading>Debugging a console driver</heading>

<p>Since you need a console driver to run DDB on, things are more
  complicated if the console driver itself is failing.  In that case,
  remember that you can use a serial console (either with modified boot
  blocks, or by specifying <tt><bf>-h</bf></tt> at the <tt>Boot:</tt>
  prompt), and hook up a standard
  terminal to your first serial port.  DDB works on any configured
  console driver, including a serial console.
