
<!-- $Id: hackers.sgml,v 1.17 1999-07-28 20:26:06 nik Exp $ -->
<!-- The FreeBSD Documentation Project -->

  <sect>
    <heading>For serious FreeBSD hackers only<label id="hackers"></heading>

    <sect1>
      <heading>
        What are SNAPs and RELEASEs?
      </heading>

      <p>There are currently three active/semi-active branches in the FreeBSD
      <url url="http://www.FreeBSD.org/cgi/cvsweb.cgi" name="CVS Repository">:

      <itemize>
        <item><bf/RELENG_2_2/   AKA <bf/2.2-stable/ AKA <bf/"2.2 branch"/
        <item><bf/RELENG_3/     AKA <bf/3.x-stable/ AKA <bf/"3.0 branch"/
        <item><bf/HEAD/         AKA <bf/-current/ AKA <bf/4.0-current/
      </itemize>

      <p>Unlike the other two, <bf/HEAD/ is not an actual branch tag; it is
      simply a symbolic constant for
      <em/"the current, non-branched development stream"/, which we
      refer to as <bf/-current/.

      <p>Right now, <bf/-current/ is the 4.0 development stream; the
      <bf/3.x-stable/ branch, <bf/RELENG_3/, forked off from
      <bf/-current/ in January 1999.

      <p>The <bf/2.2-stable/ branch, <bf/RELENG_2_2/, departed -current in
      November 1996.

      <p>The <bf/2.1-stable/ branch, <bf/RELENG_2_1_0/, departed -current in
      September of 1994.  This branch has been fully retired.

    <sect1>
      <heading>
        How do I make my own custom release?<label id="custrel">
      </heading>

      <p>To make a release you need to do three things: First, you need to
      be running a kernel with the <htmlurl
      url="http://www.FreeBSD.org/cgi/man.cgi?vn" name="vn"> driver configured
      in.  Add this to your kernel config file and build a new kernel:

      <verb>
        pseudo-device vn         #Vnode driver (turns a file into a device)
      </verb>

      <p>Second, you need the whole CVS repository at hand.
      You can get it with <url url="../handbook/synching.html#CVSUP" name="CVSUP">,
      but in your supfile set the release name to <tt/cvs/ and remove any tag or
      date fields:

      <verb>
        *default prefix=/home/ncvs
        *default base=/a
        *default host=cvsup.FreeBSD.org
        *default release=cvs
        *default delete compress use-rel-suffix

        ## Main Source Tree
        src-all
        src-eBones
        src-secure

        # Other stuff
        ports-all
        www
        doc-all
      </verb>

      <p>Then run <tt/cvsup -g supfile/ to suck all the good bits onto your
      box...

      <p>Finally, you need a chunk of empty space to build into. Let's
      say it's in <tt>/some/big/filesystem</tt>, and from the example
      above you've got the CVS repository in <tt>/home/ncvs</tt>:

      <verb>
        setenv CVSROOT /home/ncvs        # or export CVSROOT=/home/ncvs
        cd /usr/src/release
        make release BUILDNAME=3.0-MY-SNAP CHROOTDIR=/some/big/filesystem/release
      </verb>

      <p>An entire release will be built in
      <tt>/some/big/filesystem/release</tt> and you will have a full FTP-type
      installation in <tt>/some/big/filesystem/release/R/ftp</tt> when you're
      done.  If you want to build your SNAP along some branch other than
      -current, you can also add <tt/RELEASETAG=SOMETAG/ to
      the make release command line above, e.g. <tt/RELEASETAG=RELENG_2_2/
      would build an up-to-the-minute 2.2-STABLE snapshot.

    <sect1>
      <heading>How do I create customized installation disks?</heading>

      <p>The entire process of creating installation disks and source and
      binary archives is automated by various targets in
      <tt>/usr/src/release/Makefile</tt>.  The information there should
      be enough to get you started.  However, it should be said that this
      involves doing a ``make world'' and will therefore take up a lot of
      time and disk space.

    <sect1>
      <heading>``make world'' clobbers my existing installed binaries.</heading>

      <p>Yes, this is the general idea; as its name might suggest,
      ``make world'' rebuilds every system binary from scratch, so you can be
      certain of having a clean and consistent environment at the end (which
      is why it takes so long).

      <p>If the environment variable <tt/DESTDIR/ is defined while running
      ``<tt/make world/'' or ``<tt/make install/'', the newly-created
      binaries will be deposited in a directory tree identical to the
      installed one, rooted at <tt>&dollar;&lcub;DESTDIR&rcub;</tt>.
      Some combinations of shared library modifications and
      program rebuilds can cause this to fail in ``<tt/make world/'',
      however.

    <sect1>
      <heading>
        When my system boots, it says ``(bus speed defaulted)''.
      </heading>

      <p>The Adaptec 1542 SCSI host adapters allow the user to configure
      their bus access speed in software.  Previous versions of the
      1542 driver tried to determine the fastest usable speed and set
      the adapter to that.  We found that this breaks some users'
      systems, so you now have to define the ``<tt/TUNE&lowbar;1542/'' kernel
      configuration option in order to have this take place.  Using it
      on those systems where it works may make your disks run faster,
      but on those systems where it doesn't, your data could be
      corrupted.

    <sect1>
      <heading>
        Can I follow current with limited Internet access?<label id="ctm">
      </heading>

      <p>Yes, you can do this <em/without/ downloading the whole source tree
      by using the <url url="../handbook/synching.html#CTM" name="CTM facility">.

    <sect1>
      <heading>How did you split the distribution into 240k files?</heading>

      <p>Newer BSD-based systems have a ``<tt/-b/'' option to <tt/split/ that
      allows them to split files on arbitrary byte boundaries.

      <p>Here is an example from <tt>/usr/src/Makefile</tt>.

      <verb>
        bin-tarball:
                (cd $&lcub;DISTDIR&rcub;; \
                tar cf - . | \
                gzip --no-name -9 -c | \
                split -b 240640 - \
                $&lcub;RELEASEDIR&rcub;/tarballs/bindist/bin_tgz.)
      </verb>
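
      <p>As a rough illustration of what the <tt/split -b 240640/ step does,
      here is a minimal Python sketch that cuts a byte stream into
      240640-byte pieces named the way <tt/split/ names them by default
      (<tt/aa/, <tt/ab/, and so on).  The function and variable names are
      made up for this example and are not part of any FreeBSD Makefile.

      <verb>
```python
import io
import string

CHUNK_SIZE = 240640  # bytes per piece, the value given to split -b


def suffixes():
    """Yield aa, ab, ..., zz, the default two-letter suffixes of split."""
    for first in string.ascii_lowercase:
        for second in string.ascii_lowercase:
            yield first + second


def split_stream(stream, prefix, chunk_size=CHUNK_SIZE):
    """Read a binary stream and return a list of (name, data) pieces."""
    pieces = []
    names = suffixes()
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        pieces.append((prefix + next(names), data))
    return pieces


if __name__ == "__main__":
    # Stand-in for the gzipped tar stream: two full pieces plus 5 bytes.
    fake_archive = io.BytesIO(b"x" * (2 * CHUNK_SIZE + 5))
    for name, data in split_stream(fake_archive, "bin_tgz."):
        print(name, len(data))
```
      </verb>

      <p>Because the names sort in creation order, the pieces can later be
      concatenated in name order to recover the original compressed archive.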

    <sect1>
      <heading>I've written a kernel extension, who do I send it to?</heading>

      <p>Please take a look at <url url="../handbook/contrib.html"
      name="the Handbook entry on how to submit code">.

      <p>And thanks for the thought!

    <sect1>
      <heading>How are Plug N Play ISA cards detected and initialized?</heading>

      <p>By: <url url="mailto:uhclem@nemesis.lonestar.org"
      name="Frank Durda IV">

      <p>In a nutshell, there are a few I/O ports that all of the PnP boards
      respond to when the host asks if anyone is out there.  So when
      the PnP probe routine starts, it asks if there are any PnP boards
      present, and all the PnP boards respond with their model &num; to
      an I/O read of the same port, so the probe routine gets a wired-OR
      ``yes'' to that question.  At least one bit will be on in that
      reply.  Then the probe code is able to cause boards with board
      model IDs (assigned by Microsoft/Intel) lower than X to go
      ``off-line''.  It then looks to see if any boards are still
      responding to the query.  If the answer was ``<tt/0/'', then
      there are no boards with IDs above X.  Now probe asks if there
      are any boards below ``X''.  If so, probe knows there are boards
      with model numbers below X.  Probe then asks for boards greater
      than X-(limit/4) to go off-line.  It repeats the query.  By
      repeating this semi-binary search of IDs-in-range enough times,
      the probing code will eventually identify all PnP boards present
      in a given machine with a number of iterations that is much lower
      than what 2&circ;64 would take.

      <p>The IDs are two 32-bit fields (hence 2&circ;64) plus an 8-bit checksum.
      The first 32 bits are a vendor identifier.  They never come out
      and say it, but it appears to be assumed that different types of
      boards from the same vendor could have different 32-bit vendor
      IDs.  The idea of needing 32 bits just for unique manufacturers
      is a bit excessive.

      <p>The lower 32 bits are a serial &num;, Ethernet address, or something
      else that makes this one board unique.  The vendor must never produce
      a second board that has the same lower 32 bits unless the upper
      32 bits are also different.  So you can have multiple boards of
      the same type in the machine and the full 64 bits will still be
      unique.

      <p>The 32-bit groups can never be all zero.  This allows the
      wired-OR to show non-zero bits during the initial binary search.
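
      <p>The range-narrowing search can be modelled in a few lines of Python.
      This is only a toy model of the description above: it assumes each
      board has a 64-bit ID and that the host can ask ``does any board with
      an ID in this range answer?''.  It is not the real port-level protocol,
      and every name in it is invented for the example.

      <verb>
```python
ID_BITS = 64  # two 32-bit fields, as described above


def any_board_in(boards, low, high):
    """Model the wired-OR reply: True if any board ID lies in [low, high)."""
    return any(low <= board < high for board in boards)


def find_boards(boards, low=0, high=2 ** ID_BITS):
    """Return the IDs present, found by halving every range that answers."""
    if not any_board_in(boards, low, high):
        return []            # silence: nobody in this range
    if high - low == 1:
        return [low]         # range narrowed down to a single ID
    middle = (low + high) // 2
    return (find_boards(boards, low, middle)
            + find_boards(boards, middle, high))


if __name__ == "__main__":
    # Three made-up board IDs.
    present = {0x0000000500000001, 0x1234567800000042, 0xFFFFFFFF00000007}
    for board_id in find_boards(present):
        print(hex(board_id))
```
      </verb>

      <p>In this model each board costs roughly two range queries per ID
      bit, rather than one query per possible ID.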

      <p>Once the system has identified all the board IDs present, it will
      reactivate each board, one at a time (via the same I/O ports),
      and find out what resources the given board needs, what interrupt
      choices are available, etc.  A scan is made over all the boards
      to collect this information.

      <p>This info is then combined with info from any ECU files on the
      hard disk or wired into the MLB BIOS.  The ECU and BIOS PnP
      support for hardware on the MLB is usually synthetic, and the
      peripherals don't really do genuine PnP.  However, by examining
      the BIOS info plus the ECU info, the probe routines can cause the
      devices that are PnP to avoid those devices the probe code cannot
      relocate.

      <p>Then the PnP devices are visited once more and given their I/O,
      DMA, IRQ and Memory-map address assignments.  The devices will
      then appear at those locations and remain there until the next
      reboot, although there is nothing that says you can't move them
      around whenever you want.

      <p>There is a lot of oversimplification above, but you should get
      the general idea.

      <p>Microsoft took over some of the primary printer status ports to
      do PnP, on the logic that no boards decoded those addresses for
      the opposing I/O cycles.  I found a genuine IBM printer board
      that did decode writes of the status port during the early PnP
      proposal review period, but MS said ``tough''.  So they do a
      write to the printer status port for setting addresses, plus they
      use that address + <tt/0x800/, and a third I/O port for reading
      that can be located anywhere between <tt/0x200/ and <tt/0x3ff/.

    <sect1>
      <heading>Does FreeBSD support architectures other than the x86?</heading>

      <p>Several groups of people have expressed interest in working on
      multi-architecture ports for FreeBSD.  The FreeBSD/AXP (ALPHA)
      port is one such effort which has been quite successful, and is now
      available in 3.0 SNAPshot release form at <url
      url="ftp://ftp.FreeBSD.org/pub/FreeBSD/alpha/"
      name="ftp://ftp.FreeBSD.org/pub/FreeBSD/alpha">.  The ALPHA
      port currently runs on a growing number of ALPHA machine
      types, among them the AlphaStation, AXPpci, PC164, Miata and Multia
      models.  This port is not yet considered a full release and won't be
      until a full complement of system installation tools and a distribution
      on CDROM installation media are available, including a reasonable
      number of working ports and packages.
      FreeBSD/AXP should be considered BETA quality software at this
      time.  For status information, please join the
      <tt>&lt;freebsd-alpha@FreeBSD.org&gt;</tt>
      <ref id="mailing" name="mailing list">.

      <p>Interest has also been expressed in a port of FreeBSD to
      the SPARC architecture; join the
      <tt>&lt;freebsd-sparc@FreeBSD.org&gt;</tt>
      <ref id="mailing" name="mailing list"> if you would like to
      take part in that project.  For general discussion on new architectures,
      join the <tt>&lt;freebsd-platforms@FreeBSD.org&gt;</tt>
      <ref id="mailing" name="mailing list">.

    <sect1>
      <heading>I need a major number for a device driver I've written.</heading>

      <p>This depends on whether or not you plan on making the driver
      publicly available.  If you do, then please send us a copy of the
      driver source code, plus the appropriate modifications to
      <tt>files.i386</tt>, a sample configuration file entry, and the
      appropriate <htmlurl url="http://www.FreeBSD.org/cgi/man.cgi?MAKEDEV"
      name="MAKEDEV"> code to create any special files your device uses.  If
      you do not, or are unable to because of licensing restrictions, then
      character major number 32 and block major number 8 have been reserved
      specifically for this purpose; please use them.  In any case, we'd
      appreciate hearing about your driver on
      <tt>&lt;freebsd-hackers@FreeBSD.org&gt;</tt>.


    <sect1>
      <heading>Alternative layout policies for directories</heading>

      <p>
      In answer to the question of alternative layout policies for
      directories, the scheme that is currently in use is unchanged
      from what I wrote in 1983. I wrote that policy for the original
      fast filesystem, and never revisited it. It works well at keeping
      cylinder groups from filling up. As several of you have noted,
      it works poorly for find. Most filesystems are created from
      archives that were created by a depth first search (aka ftw).
      These directories end up being striped across the cylinder groups
      thus creating a worst possible scenario for future depth first
      searches. If one knew the total number of directories to be
      created, the solution would be to create (total / fs_ncg) per
      cylinder group before moving on. Obviously, one would have to
      create some heuristic to guess at this number. Even using a
      small fixed number like say 10 would make an order of magnitude
      improvement. To differentiate restores from normal operation
      (when the current algorithm is probably more sensible), you
      could use the clustering of up to 10 if they were all done
      within a ten second window. Anyway, my conclusion is that this
      is an area ripe for experimentation.</p>

      <p>Kirk McKusick, September 1998</p>
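
      <p>The clustering idea suggested above can be sketched as a toy model
      in Python.  The sketch assumes one reading of the suggestion: keep up
      to 10 directories in the same cylinder group as long as they are all
      created within ten seconds of the first one, and otherwise move on to
      another group.  Stepping to the next group in turn is only a stand-in
      for the real allocation policy, and the class and its names are
      hypothetical.

      <verb>
```python
CLUSTER_LIMIT = 10      # directories kept together, from the text
WINDOW_SECONDS = 10.0   # time window suggested in the text


class DirPlacer:
    """Pick a cylinder group for each new directory (toy model)."""

    def __init__(self, ncg):
        self.ncg = ncg          # number of cylinder groups
        self.group = 0          # group used for the current cluster
        self.count = 0          # directories placed in that cluster
        self.started = None     # creation time of the cluster's first dir

    def place(self, now):
        """Return the cylinder group for a directory created at time now."""
        in_window = (self.started is not None
                     and now - self.started < WINDOW_SECONDS)
        if in_window and self.count < CLUSTER_LIMIT:
            self.count += 1
            return self.group
        # Start a new cluster; every cluster after the first moves on
        # to the next group (stand-in for the real policy).
        if self.started is not None:
            self.group = (self.group + 1) % self.ncg
        self.started = now
        self.count = 1
        return self.group


if __name__ == "__main__":
    restore = DirPlacer(ncg=4)
    # 25 directories created 0.1 seconds apart, as during a restore.
    print([restore.place(step * 0.1) for step in range(25)])
    normal = DirPlacer(ncg=4)
    # Directories created 20 seconds apart, as in normal operation.
    print([normal.place(step * 20.0) for step in range(3)])
```
      </verb>

      <p>In this model a burst of creations is packed ten to a group, while
      directories created far apart in time are still spread out one per
      group.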

    <sect1>
      <heading>Making the most of a kernel panic</heading>

      <p>
      <em>[This section was extracted from a mail written by <url
      url="mailto:wpaul@FreeBSD.org" name="Bill Paul"> on the
      freebsd-current <ref id="mailing" name="mailing list"> by <url
      url="mailto:des@FreeBSD.org" name="Dag-Erling Co&iuml;dan
      Sm&oslash;rgrav">, who fixed a few typos and added the bracketed
      comments]</em>

      <p>
      <verb>
From: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Subject: Re: the fs fun never stops
To: ben@rosengart.com
Date: Sun, 20 Sep 1998 15:22:50 -0400 (EDT)
Cc: current@FreeBSD.org
      </verb>

      <p>
      <em>[&lt;ben@rosengart.com&gt; posted the following panic
      message]</em>
      <verb>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x40
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xf014a7e5
                                ^^^^^^^^^^
> stack pointer           = 0x10:0xf4ed6f24
> frame pointer           = 0x10:0xf4ed6f28
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 80 (mount)
> interrupt mask          =
> trap number             = 12
> panic: page fault
      </verb>

      <p> [When] you see a message like this, it's not enough to just
      reproduce it and send it in. The instruction pointer value that
      I highlighted up there is important; unfortunately, it's also
      configuration dependent. In other words, the value varies
      depending on the exact kernel image that you're using. If you're
      using a GENERIC kernel image from one of the snapshots, then
      it's possible for somebody else to track down the offending
      function, but if you're running a custom kernel then only
      <em/you/ can tell us where the fault occurred.

      <p> What you should do is this:

      <itemize>
        <item>Write down the instruction pointer value. Note that the
        <tt/0x8:/ part at the beginning is not significant in this case:
        it's the <tt/0xf0xxxxxx/ part that we want.
	<item>When the system reboots, do the following:
	  <verb>
% nm /kernel.that.caused.the.panic | grep f0xxxxxx
          </verb>
	  where <tt/f0xxxxxx/ is the instruction pointer value. The
	  odds are you will not get an exact match since the symbols
	  in the kernel symbol table are for the entry points of
	  functions and the instruction pointer address will be
	  somewhere inside a function, not at the start. If you don't
	  get an exact match, omit the last digit from the instruction
	  pointer value and try again, i.e.:
	  <verb>
% nm /kernel.that.caused.the.panic | grep f0xxxxx
	  </verb>
	  If that doesn't yield any results, chop off another digit.
	  Repeat until you get some sort of output. The result will be
	  a list of possible functions which caused the panic. This is
	  a less than exact mechanism for tracking down the point of
	  failure, but it's better than nothing.
      </itemize>
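
      <p>The digit-chopping search above approximates ``find the function
      whose start address is the closest one at or below the instruction
      pointer''.  As a minimal sketch, assuming <tt/nm/ output in the usual
      ``address type name'' form, that lookup can be done directly in
      Python.  The symbol names and addresses in the sample are invented,
      and the helper names are hypothetical.

      <verb>
```python
def parse_nm(text):
    """Return sorted (address, name) pairs for the text symbols in nm output."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        # Function entry points look like "f014a7c0 T name" (or "t").
        if len(fields) == 3 and fields[1] in ("T", "t"):
            symbols.append((int(fields[0], 16), fields[2]))
    return sorted(symbols)


def enclosing_symbol(symbols, address):
    """Return the (start, name) pair with the highest start at or below address."""
    best = None
    for start, name in symbols:
        if start > address:
            break
        best = (start, name)
    return best


if __name__ == "__main__":
    # Invented nm output; real symbol names and addresses will differ.
    sample = """\
f014a700 T example_lookup
f014a7c0 T example_mount
f014a900 T example_unmount
         U example_undefined
f0250000 D example_data
"""
    start, name = enclosing_symbol(parse_nm(sample), 0xf014a7e5)
    print(name, "+", hex(0xf014a7e5 - start))
```
      </verb>

      <p>To use it on a real panic, feed it the output of <tt/nm/ run on
      the exact kernel image that panicked, together with the
      <tt/0xf0xxxxxx/ part of the instruction pointer.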

      <p> I see people constantly show panic messages like this but
      rarely do I see someone take the time to match up the
      instruction pointer with a function in the kernel symbol table.

      <p> The best way to track down the cause of a panic is by
      capturing a crash dump, then using <tt/gdb(1)/ to do a stack
      trace on the crash dump. Of course, this depends on <tt/gdb(1)/
      in -current working correctly, which I can't guarantee (I recall
      somebody saying that the new ELF-ized <tt/gdb(1)/ didn't handle
      kernel crash dumps correctly: somebody should check this before
      3.0 goes out of beta or there'll be a lot of red faces after the
      CDs ship).

      <p>
      In any case, the method I normally use is this:

      <itemize>
        <item>Set up a kernel config file, optionally adding <tt/options DDB/ if you
        think you need the kernel debugger for something. (I use this mainly
        for setting breakpoints if I suspect an infinite loop condition of
        some kind.)
        <item>Use <tt/config -g KERNELCONFIG/ to set up the build directory.
        <item><tt>cd /sys/compile/KERNELCONFIG; make</tt>
        <item>Wait for kernel to finish compiling.
        <item><tt/cp kernel kernel.debug/
        <item><tt/strip -d kernel/
        <item><tt>mv /kernel /kernel.orig</tt>
        <item><tt>cp kernel /</tt>
        <item>reboot
      </itemize>

      <p> <em>[Note: Now that FreeBSD 3.x kernels are ELF by default,
      you should use <tt/strip -g/ instead of <tt/strip -d/. If for some
      reason your kernel is still a.out, use <tt/strip -aout -d/.]</em>

      <p> Note that YOU DO <em/NOT/ WANT TO ACTUALLY BOOT THE KERNEL
      WITH ALL THE DEBUG SYMBOLS IN IT. A kernel compiled with <tt/-g/
      can easily be close to 10MB in size. You don't have to actually
      boot this massive image: you only need it later for <tt/gdb(1)/
      (<tt/gdb(1)/ wants the symbol table). Instead, you want to keep
      a copy of the full image and create a second image with the
      debug symbols stripped out using <tt/strip -d/. It is this
      second stripped image that you want to boot.

      <p> To make sure you capture a crash dump, you need to edit
      <tt>/etc/rc.conf</tt> and set <tt/dumpdev/ to point to your swap
      partition. This will cause the <tt/rc(8)/ scripts to use the
      <tt/dumpon(8)/ command to enable crash dumps. You can also run
      <tt/dumpon(8)/ manually. After a panic, the crash dump can be
      recovered using <tt/savecore(8)/; if <tt/dumpdev/ is set in
      <tt>/etc/rc.conf</tt>, the <tt/rc(8)/ scripts will run
      <tt/savecore(8)/ automatically and put the crash dump in
      <tt>/var/crash</tt>.

      <p> NOTE: FreeBSD crash dumps are usually the same size as the
      physical RAM size of your machine. That is, if you have 64MB of
      RAM, you will get a 64MB crash dump. Therefore you must make sure
      there's enough space in <tt>/var/crash</tt> to hold the dump.
      Alternatively, you can run <tt/savecore(8)/ manually and have it
      recover the crash dump to another directory where you have more
      room. It's possible to limit the size of the crash dump by using
      <tt/options MAXMEM=(foo)/ to set the amount of memory the kernel
      will use to something a little more sensible. For example, if
      you have 128MB of RAM, you can limit the kernel's memory usage
      to 16MB so that your crash dump size will be 16MB instead of
      128MB.

      <p> Once you have recovered the crash dump, you can get a stack
      trace with <tt/gdb(1)/ as follows:

      <p>
      <verb>
% gdb -k /sys/compile/KERNELCONFIG/kernel.debug /var/crash/vmcore.0
(gdb) where
      </verb>

      <p> Note that there may be several screens worth of information;
      ideally you should use <tt/script(1)/ to capture all of them.
      Using the unstripped kernel image with all the debug symbols
      should show the exact line of kernel source code where the panic
      occurred. Usually you have to read the stack trace from the
      bottom up in order to trace the exact sequence of events that
      led to the crash. You can also use <tt/gdb(1)/ to print out the
      contents of various variables or structures in order to examine
      the system state at the time of the crash.

      <p> Now, if you're really insane and have a second computer, you
      can also configure <tt/gdb(1)/ to do remote debugging such that
      you can use <tt/gdb(1)/ on one system to debug the kernel on
      another system, including setting breakpoints, single-stepping
      through the kernel code, just like you can do with a normal
      user-mode program. I haven't played with this yet as I don't
      often have the chance to set up two machines side by side for
      debugging purposes.

      <p> <em>[Bill adds: "I forgot to mention one thing: if you have
      DDB enabled and the kernel drops into the debugger, you can
      force a panic (and a crash dump) just by typing 'panic' at the
      ddb prompt. It may stop in the debugger again during the panic
      phase. If it does, type 'continue' and it will finish the crash
      dump." -ed]</em>

    <sect1>
      <heading>dlsym() stopped working for ELF executables!</heading>

      <p>The ELF toolchain does not, by default, make the symbols
      defined in an executable visible to the dynamic linker.
      Consequently <tt>dlsym()</tt> searches on handles obtained
      from calls to <tt>dlopen(NULL, flags)</tt> will fail to find
      such symbols.

      <p>If you want to search, using <tt>dlsym()</tt>, for symbols
      present in the main executable of a process, you need to link
      the executable using the <tt>-export-dynamic</tt> option to the
      <htmlurl url="http://www.FreeBSD.org/cgi/man.cgi?ld"
      name="ELF linker">.


    <sect1>
      <heading>Increasing or reducing the kernel address space</heading>

      <p>
      By default, the kernel address space is 256 MB on FreeBSD 3.x
      and 1 GB on FreeBSD 4.x. If you run a network-intensive server
      (e.g. a large FTP or HTTP server), you might find that 256 MB is
      not enough.

      <p>
      So how do you increase the address space? There are two aspects
      to this. First, you need to tell the kernel to reserve a larger
      portion of the address space for itself. Second, since the
      kernel is loaded at the top of the address space, you need to
      lower the load address so it doesn't bump its head against the
      ceiling.

      <p>
      The first goal is achieved by increasing the value of
      <tt/NKPDE/ in <tt>src/sys/i386/include/pmap.h</tt>. Here's what
      it looks like for a 1 GB address space:

      <verb>
#ifndef NKPDE
#ifdef SMP
#define NKPDE                   254     /* addressable number of page tables/pde's */
#else
#define NKPDE                   255     /* addressable number of page tables/pde's */
#endif  /* SMP */
#endif
      </verb>

      <p>
      To find the correct value of <tt/NKPDE/, divide the desired
      address space size (in megabytes) by four, then subtract one for
      UP and two for SMP.
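
      <p>The rule above is simple arithmetic; here it is as a minimal Python
      sketch.  The function name is made up for this example, and the check
      for a multiple of four megabytes reflects the constraint noted at the
      end of this answer.

      <verb>
```python
def nkpde(size_mb, smp=False):
    """Return NKPDE for a kernel address space of size_mb megabytes."""
    if size_mb <= 0 or size_mb % 4 != 0:
        raise ValueError("size must be a positive multiple of 4 MB")
    # One page directory entry maps 4 MB; subtract 1 for UP, 2 for SMP.
    return size_mb // 4 - (2 if smp else 1)


if __name__ == "__main__":
    for size_mb in (256, 1024):
        print(size_mb, "MB:", "UP", nkpde(size_mb),
              "SMP", nkpde(size_mb, smp=True))
```
      </verb>

      <p>For a 1 GB address space this gives 255 for UP and 254 for SMP,
      matching the <tt/pmap.h/ excerpt above.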

      <p>
      To achieve the second goal, you need to compute the correct load
      address: simply subtract the address space size (in bytes) from
      0x100100000; the result is 0xc0100000 for a 1 GB address space.
      Set <tt/LOAD_ADDRESS/ in <tt>src/sys/i386/conf/Makefile.i386</tt>
      to that value; then set the location counter in the beginning of
      the section listing in <tt>src/sys/i386/conf/kernel.script</tt>
      to the same value, as follows:

      <verb>
OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(btext)
SEARCH_DIR(/usr/lib); SEARCH_DIR(/usr/obj/elf/home/src/tmp/usr/i386-unknown-freebsdelf/lib);
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  . = 0xc0100000 + SIZEOF_HEADERS;
  .interp     : { *(.interp)    }
      </verb>
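
      <p>The load address calculation can be checked with a couple of lines
      of Python.  This is just the subtraction described above, using the
      constant 0x100100000 from the text; the function and constant names
      are invented for the example.

      <verb>
```python
TOP_PLUS_OFFSET = 0x100100000  # constant given in the text above


def load_address(size_mb):
    """Return the kernel load address for an address space of size_mb MB."""
    return TOP_PLUS_OFFSET - size_mb * 1024 * 1024


if __name__ == "__main__":
    for size_mb in (256, 1024):
        print(size_mb, "MB:", hex(load_address(size_mb)))
```
      </verb>

      <p>For a 1 GB address space the result is 0xc0100000, the value used
      in the linker script excerpt above.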

      <p>
      Then reconfigure and rebuild your kernel. You will probably have
      problems with <tt/ps(1)/, <tt/top(1)/ and the like; <tt/make
      world/ should take care of it (or a manual rebuild of
      <tt/libkvm/, <tt/ps/ and <tt/top/ after copying the patched
      <tt/pmap.h/ to <tt>/usr/include/vm/</tt>).

      <p>
      NOTE: the size of the kernel address space must be a multiple of
      four megabytes.

      <p>
      [<url url="mailto:dg@FreeBSD.org" name="David Greenman">
      adds: <em>I think the kernel address space needs to be a power
      of two, but I'm not certain about that. The old(er) boot code
      used to monkey with the high order address bits and I think
      expected at least 256MB granularity.</em>]

  </sect>