Various updates to the kernel debugging chapter.

- Document vmcore.last and describe it as the way to find the most recent dump rather than the highest numbered dump.
- Document crashinfo and that it automatically runs to generate a core.txt.N file if core dumps are enabled in rc.conf.
- Add a section on testing kernel dumps via the debug.kdb.panic sysctl. Remove a later note about debug.kdb.panic from the DDB section.
- Remove any mention of gdb -k (for pre-5.3 kernels) and just talk about kgdb.
- Remove the paragraph that talks about trying to find the kernel.debug file. Instead, recommend 'kgdb -n <N>', which does this lookup automatically, and specifically recommend 'kgdb -n last' to open the most recent crash dump. Mention the fallback of specifying the kernel and vmcore directly if needed.
- Remove the example dump from FreeBSD 2. It is generally no longer relevant. It used gdb -k, which uses a different stack trace format and includes a 'frame' command that doesn't exist in kgdb. (kgdb instead lets you switch to different threads and processes.)
- Remove mention of old boot blocks that don't load debug symbols. I think this was last relevant in FreeBSD 2.x or 3.x.
- Rework the description of 'boot -d' to assume the boot menu and explicitly mention 'boot -d' at the loader prompt.
- Document how to get stack traces of other threads in DDB.
- Fix a few references to gdb to reference kgdb instead.
- Replace 'call cpu_reset' with 'reset' for DDB.

Differential Revision: https://reviews.freebsd.org/D14711
svn path=/head/; revision=51572
2018-04-18 23:48:42 +00:00 · 2018-04-18 23:48:42 +00:00 · d26b8ed87a · 2020-12-08 03:00:23 +00:00
commit d26b8ed87a
parent ba78e991f8
1 changed files with 66 additions and 218 deletions
--- a/en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.xml
+++ b/en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.xml
@@ -136,11 +136,19 @@
          <varname>dumpdir</varname> is set to), the kernel will
          increment the trailing number for every crash to avoid
          overwriting an existing <filename>vmcore</filename> (e.g.,
-          <filename>vmcore.1</filename>).  While debugging, it is
-          highly likely that you will want to use the highest version
-          <filename>vmcore</filename> in
-          <filename>/var/crash</filename> when searching for the right
-          <filename>vmcore</filename>.</para>
+          <filename>vmcore.1</filename>).  &man.savecore.8; will always
+	  create a symbolic link named <filename>vmcore.last</filename>
+	  in <filename>/var/crash</filename> after a dump is saved.
+	  This symbolic link can be used to locate the name of the most
+	  recent dump.</para>
+
+	<para>The &man.crashinfo.8; utility generates a text file
+	  containing a summary of information from a full memory dump
+	  or minidump.  If <varname>dumpdev</varname> has been set in
+	  &man.rc.conf.5;, &man.crashinfo.8; will be invoked
+	  automatically after &man.savecore.8;.  The output is saved
+	  to a file in <varname>dumpdir</varname> named
+	  <filename>core.txt.<replaceable>N</replaceable></filename>.</para>

    <tip>
      <para>If you are testing a new kernel but need to boot a different one in
@@ -161,45 +169,61 @@
      device as it is likely different than
      <filename>/dev/ad0s1b</filename>!</para></tip>
    </sect2>
+
+    <sect2>
+      <title>Testing Kernel Dump Configuration</title>
+   
+      <para>The kernel includes a &man.sysctl.8; node that requests a
+       kernel panic.  This can be used to verify that your system is
+       properly configured to save kernel crash dumps.  You may wish
+       to remount existing file systems as read-only in single user
+       mode before triggering the crash to avoid data loss.</para>
+
+       <screen>&prompt.root; <userinput>shutdown now</userinput>
+...
+Enter full pathname of shell or RETURN for /bin/sh:
+&prompt.root; <userinput>mount -a -u -r</userinput>
+&prompt.root; <userinput>sysctl debug.kdb.panic=1</userinput>
+debug.kdb.panic:panic: kdb_sysctl_panic
+...</screen>
+
+      <para>After rebooting, your system should save a dump in
+        <filename>/var/crash</filename> along with a matching summary
+        from &man.crashinfo.8;.</para>
+    </sect2>
  </sect1>

  <sect1 xml:id="kerneldebug-gdb">
    <title>Debugging a Kernel Crash Dump with <command>kgdb</command></title>

    <note>
-      <para>This section covers &man.kgdb.1; as found in &os;&nbsp;5.3
-	and later.  In previous versions, one must use
-	<command>gdb -k</command> to read a core dump file.
-	Since &os;&nbsp;12 kgdb is acquired by installing
-	<package>devel/gdb</package>.</para>
+      <para>This section covers &man.kgdb.1;.  The latest version is
+        included in the <package>devel/gdb</package> package.  An older version
+        is also present in &os;&nbsp;11 and earlier.</para>
    </note>

-    <para>Once a dump has been obtained, getting useful information
-      out of the dump is relatively easy for simple problems.  Before
-      launching into the internals of &man.kgdb.1; to debug
-      the crash dump, locate the debug version of your kernel
-      (normally called <filename>kernel.debug</filename>) and the path
-      to the source files used to build your kernel (normally
-      <filename>/usr/obj/usr/src/sys/<replaceable>KERNCONF</replaceable></filename>
-      or
-      <filename>/usr/obj/usr/src/<replaceable>amd64.amd64</replaceable>/sys/<replaceable>KERNCONF</replaceable></filename>,
-      where <filename><replaceable>amd64.amd64</replaceable></filename>
-      is the architecture and
-      <filename><replaceable>KERNCONF</replaceable></filename>
-      is the <varname>ident</varname> specified in a kernel
-      &man.config.5;).  With those two pieces of info, let the
-      debugging commence!</para>
-
    <para>To enter into the debugger and begin getting information
-      from the dump, the following steps are required at a minimum:</para>
+      from the dump, start kgdb:</para>

-    <screen>&prompt.root; <userinput>cd /usr/obj/usr/src/sys/<replaceable>KERNCONF</replaceable></userinput>
-&prompt.root; <userinput>kgdb kernel.debug /var/crash/vmcore.0</userinput></screen>
+    <screen>&prompt.root; <userinput>kgdb -n <replaceable>N</replaceable></userinput></screen>
+
+    <para>Where <replaceable>N</replaceable> is the suffix of the
+     <filename>vmcore.<replaceable>N</replaceable></filename> to
+     examine.  To open the most recent dump use:</para>
+
+    <screen>&prompt.root; <userinput>kgdb -n last</userinput></screen>
+
+    <para>Normally, &man.kgdb.1; should be able to locate the kernel
+     running at the time the dump was generated.  If it is not able to
+     locate the correct kernel, pass the pathname of the kernel and
+     dump as two arguments to kgdb:</para>
+
+    <screen>&prompt.root; <userinput>kgdb /boot/kernel/kernel /var/crash/vmcore.0</userinput></screen>

    <para>You can debug the crash dump using the kernel sources just like
      you can for any other program.</para>

-    <para>This first dump is from a 5.2-BETA kernel and the crash
+    <para>This dump is from a 5.2-BETA kernel and the crash
      comes from deep within the kernel.  The output below has been
      modified to include line numbers on the left.  This first trace
      inspects the instruction pointer and obtains a back trace.  The
@@ -301,173 +325,12 @@
 88:#20 0xc070ca4d in Xint0x80_syscall () at {standard input}:136
 89:---Can't read userspace from dump, or kernel process---
 90:<prompt>(kgdb)</prompt> <userinput>quit</userinput></screen>
-
-
-    <para>This next trace is an older dump from the FreeBSD 2 time
-      frame, but is more involved and demonstrates more of the
-      features of <command>gdb</command>.  Long lines have been folded
-      to improve readability, and the lines are numbered for
-      reference. Despite this, it is a real-world error trace taken
-      during the development of the pcvt console driver.</para>
-
-<screen> 1:Script started on Fri Dec 30 23:15:22 1994
- 2:&prompt.root; <userinput>cd /sys/compile/URIAH</userinput>
- 3:&prompt.root; <userinput>gdb -k kernel /var/crash/vmcore.1</userinput>
- 4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel
-...done.
- 5:IdlePTD 1f3000
- 6:panic: because you said to!
- 7:current pcb at 1e3f70
- 8:Reading in symbols for ../../i386/i386/machdep.c...done.
- 9:<prompt>(kgdb)</prompt> <userinput>backtrace</userinput>
-10:#0  boot (arghowto=256) (../../i386/i386/machdep.c line 767)
-11:#1  0xf0115159 in panic ()
-12:#2  0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698)
-13:#3  0xf010185e in db_fncall ()
-14:#4  0xf0101586 in db_command (-266509132, -266509516, -267381073)
-15:#5  0xf0101711 in db_command_loop ()
-16:#6  0xf01040a0 in db_trap ()
-17:#7  0xf0192976 in kdb_trap (12, 0, -272630436, -266743723)
-18:#8  0xf019d2eb in trap_fatal (...)
-19:#9  0xf019ce60 in trap_pfault (...)
-20:#10 0xf019cb2f in trap (...)
-21:#11 0xf01932a1 in exception:calltrap ()
-22:#12 0xf0191503 in cnopen (...)
-23:#13 0xf0132c34 in spec_open ()
-24:#14 0xf012d014 in vn_open ()
-25:#15 0xf012a183 in open ()
-26:#16 0xf019d4eb in syscall (...)
-27:<prompt>(kgdb)</prompt> <userinput>up 10</userinput>
-28:Reading in symbols for ../../i386/i386/trap.c...done.
-29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\
-30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\
-31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\
-32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\
-33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\
-34:ss = -266427884}) (../../i386/i386/trap.c line 283)
-35:283                             (void) trap_pfault(&amp;frame, FALSE);
-36:<prompt>(kgdb)</prompt> <userinput>frame frame-&gt;tf_ebp frame-&gt;tf_eip</userinput>
-37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done.
-38:#0  0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\
-39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403)
-40:403             return ((*linesw[tp-&gt;t_line].l_open)(dev, tp));
-41:<prompt>(kgdb)</prompt> <userinput>list</userinput>
-42:398
-43:399             tp-&gt;t_state |= TS_CARR_ON;
-44:400             tp-&gt;t_cflag |= CLOCAL;  /* cannot be a modem (:-) */
-45:401
-46:402     #if PCVT_NETBSD || (PCVT_FREEBSD &gt;= 200)
-47:403             return ((*linesw[tp-&gt;t_line].l_open)(dev, tp));
-48:404     #else
-49:405             return ((*linesw[tp-&gt;t_line].l_open)(dev, tp, flag));
-50:406     #endif /* PCVT_NETBSD || (PCVT_FREEBSD &gt;= 200) */
-51:407     }
-52:<prompt>(kgdb)</prompt> <userinput>print tp</userinput>
-53:Reading in symbols for ../../i386/i386/cons.c...done.
-54:$1 = (struct tty *) 0x1bae
-55:<prompt>(kgdb)</prompt> <userinput>print tp-&gt;t_line</userinput>
-56:$2 = 1767990816
-57:<prompt>(kgdb)</prompt> <userinput>up</userinput>
-58:#1  0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\
-59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126)
-60:       return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
-61:<prompt>(kgdb)</prompt> <userinput>up</userinput>
-62:#2  0xf0132c34 in spec_open ()
-63:<prompt>(kgdb)</prompt> <userinput>up</userinput>
-64:#3  0xf012d014 in vn_open ()
-65:<prompt>(kgdb)</prompt> <userinput>up</userinput>
-66:#4  0xf012a183 in open ()
-67:<prompt>(kgdb)</prompt> <userinput>up</userinput>
-68:#5  0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\
-69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\
-70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \
-71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \
-72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673)
-73:673             error = (*callp-&gt;sy_call)(p, args, rval);
-74:<prompt>(kgdb)</prompt> <userinput>up</userinput>
-75:Initial frame selected; you cannot go up.
-76:<prompt>(kgdb)</prompt> <userinput>quit</userinput></screen>
-    <para>Comments to the above script:</para>
-
-    <variablelist>
-      <varlistentry>
-	<term>line 6:</term>
-
-	<listitem>
-	  <para>This is a dump taken from within DDB (see below), hence the
-	    panic comment <quote>because you said to!</quote>, and a rather
-	    long stack trace; the initial reason for going into DDB has been a
-	    page fault trap though.</para>
-	</listitem>
-      </varlistentry>
-
-      <varlistentry>
-	<term>line 20:</term>
-
-	<listitem>
-	  <para>This is the location of function <function>trap()</function>
-	    in the stack trace.</para>
-	</listitem>
-      </varlistentry>
-
-      <varlistentry>
-	<term>line 36:</term>
-
-	<listitem>
-	  <para>Force usage of a new stack frame; this is no longer necessary.
-	    The stack frames are supposed to point to the right
-	    locations now, even in case of a trap.
-	    From looking at the code in source line 403, there is a
-	    high probability that either the pointer access for
-	    <quote>tp</quote> was messed up, or the array access was out of
-	    bounds.</para>
-	</listitem>
-      </varlistentry>
-
-      <varlistentry>
-	<term>line 52:</term>
-
-	<listitem>
-	  <para>The pointer looks suspicious, but happens to be a valid
-	    address.</para>
-	</listitem>
-      </varlistentry>
-
-      <varlistentry>
-	<term>line 56:</term>
-
-	<listitem>
-	  <para>However, it obviously points to garbage, so we have found our
-	    error! (For those unfamiliar with that particular piece of code:
-	    <literal>tp-&gt;t_line</literal> refers to the line discipline  of
-	    the console device here, which must be a rather small integer
-	    number.)</para>
-	</listitem>
-      </varlistentry>
-    </variablelist>
-
    <tip><para>If your system is crashing regularly and you are running
      out of disk space, deleting old <filename>vmcore</filename>
      files in <filename>/var/crash</filename> could save a
      considerable amount of disk space!</para></tip>
  </sect1>

-  <sect1 xml:id="kerneldebug-ddd">
-    <title>Debugging a Crash Dump with DDD</title>
-
-    <para>Examining a kernel crash dump with a graphical debugger like
-      <command>ddd</command> is also possible (you will need to install
-      the <package>devel/ddd</package> port in order to use the
-      <command>ddd</command> debugger).  Add the <option>-k</option>
-      option to the <command>ddd</command> command line you would use
-      normally.  For example;</para>
-
-    <screen>&prompt.root; <userinput>ddd --debugger kgdb kernel.debug /var/crash/vmcore.0</userinput></screen>
-
-    <para>You should then be able to go about looking at the crash dump using
-      <command>ddd</command>'s graphical interface.</para>
-  </sect1>
-
  <sect1 xml:id="kerneldebug-online-ddb">
    <title>On-Line Kernel Debugging Using DDB</title>

@@ -481,7 +344,7 @@
      breakpoints, single-stepping kernel functions, examining and changing
      kernel variables, etc.  However, it cannot access kernel source files,
      and only has access to the global and static symbols, not to the full
-      debug information like <command>gdb</command> does.</para>
+      debug information like <command>kgdb</command> does.</para>

    <para>To configure your kernel to include DDB, add the options

@@ -491,19 +354,13 @@
      to your config file, and rebuild.  (See <link xlink:href="&url.books.handbook;/index.html">The FreeBSD Handbook</link> for details on
      configuring the FreeBSD kernel).</para>

-    <note>
-      <para>If you have an older version of the boot blocks, your
-	debugger symbols might not be loaded at all.  Update the boot blocks;
-	the recent ones load the DDB symbols automatically.</para>
-    </note>
-
    <para>Once your DDB kernel is running, there are several ways to enter
-      DDB.  The first, and earliest way is to type the boot flag
-      <option>-d</option> right at the boot prompt.  The kernel will start up
+      DDB.  The first, and earliest, way is to use the boot flag
+      <option>-d</option>.  The kernel will start up
      in debug mode and enter DDB prior to any device probing.  Hence you can
-      even debug the device probe/attach functions.  Users of &os.current;
-      will need to use the boot menu option, six, to escape to a command
-      prompt.</para>
+      even debug the device probe/attach functions.  To use this, exit
+      the loader's boot menu and enter <command>boot -d</command> at
+      the loader prompt.</para>

    <para>The second scenario is to drop to the debugger once the
      system has booted.  There are two simple ways to accomplish
@@ -511,10 +368,6 @@
      command prompt, simply type the command:</para>

    <screen>&prompt.root; <userinput>sysctl debug.kdb.enter=1</userinput></screen>
-    <note>
-      <para>To force a panic on the fly, issue the following command:</para>
-      <screen>&prompt.root; <userinput>sysctl debug.kdb.panic=1</userinput></screen>
-    </note>

    <para>Alternatively, if you are at the system console, you may use
      a hot-key on the keyboard.  The default break-to-debugger
@@ -556,15 +409,13 @@

    <screen><userinput>continue</userinput></screen>

-    <para>To get a stack trace, use:</para>
+    <para>To get a stack trace of the current thread, use:</para>

    <screen><userinput>trace</userinput></screen>

-    <note>
-      <para>Note that when entering DDB via a hot-key, the kernel is currently
-	servicing an interrupt, so the stack trace might be not of much use
-	to you.</para>
-    </note>
+    <para>To get a stack trace of an arbitrary thread, specify a
+      process ID or thread ID as a second argument to
+      <command>trace</command>.</para>

    <para>If you want to remove a breakpoint, use</para>

@@ -662,10 +513,7 @@
    <screen><userinput>panic</userinput></screen>

    <para>This will cause your kernel to dump core and reboot, so you can
-      later analyze the core on a higher level with <command>gdb</command>.
-      This command
-      usually must be followed by another <command>continue</command>
-      statement.</para>
+      later analyze the core on a higher level with &man.kgdb.1;.</para>

    <screen><userinput>call boot(0)</userinput></screen>

@@ -675,7 +523,7 @@
      the disk and filesystem interfaces of the kernel are not damaged, this
      could be a good way for an almost clean shutdown.</para>

-    <screen><userinput>call cpu_reset()</userinput></screen>
+    <screen><userinput>reset</userinput></screen>

    <para>This is the final way out of disaster and almost the same as hitting the
      Big Red Button.</para>
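
Reviewer note: the change above documents vmcore.last as the way to find the most recent dump. A minimal sketch of how a script might use that link follows. The helper name `latest_dump` is hypothetical and not part of FreeBSD. The sketch assumes the link target may be either relative to the dump directory or absolute, and handles both.

```shell
#!/bin/sh
# Hypothetical helper (not a FreeBSD utility): print the path of the most
# recent kernel dump by reading the vmcore.last symbolic link that the
# chapter says savecore(8) creates after a dump is saved.
latest_dump() {
    dir="${1:-/var/crash}"      # dump directory; /var/crash is the location named in the chapter
    link="$dir/vmcore.last"

    # No link means no dump has been saved in this directory yet.
    [ -L "$link" ] || return 1

    target=$(readlink "$link") || return 1
    case "$target" in
        /*) printf '%s\n' "$target" ;;       # absolute target: print as-is
        *)  printf '%s\n' "$dir/$target" ;;  # relative target: resolve against the dump directory
    esac
}
```

The printed path could then be passed as the dump argument in the two-argument form shown in the chapter, for example `kgdb /boot/kernel/kernel "$(latest_dump)"`. For interactive use, `kgdb -n last` remains the simpler route.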