
<!-- $Id: troubleshoot.sgml,v 1.9 1998-12-05 00:24:11 dwhite Exp $ -->
<!-- The FreeBSD Documentation Project -->

  <sect>
    <heading>Troubleshooting<label id="troubleshoot"></heading>

    <sect1>
      <heading>I have bad blocks on my hard drive!<label id="awre"></heading>

      <p>With SCSI drives, the drive should be capable of re-mapping
      these automatically.  However, many drives are shipped with
      this feature disabled, for some mysterious reason...

      <p>To enable this, you'll need to edit the first device mode page,
      which can be done on FreeBSD by giving the command (as root)

      <verb>
        scsi -f /dev/rsd0c -m 1 -e -P 3
      </verb>

      <p>and changing the values of AWRE and ARRE from 0 to 1:

      <verb>
        AWRE (Auto Write Reallocation Enbld):  1
        ARRE (Auto Read Reallocation Enbld):  1
      </verb>

      <p>The following paragraphs were submitted by
      <url url="mailto:tedm@toybox.placo.com" name="Ted Mittelstaedt">:

      <p>For IDE drives, any bad block is usually a sign of potential trouble.
      All modern IDE drives come with internal bad-block remapping turned
      on.  All IDE hard drive manufacturers today offer extensive
      warranties and will replace drives with bad blocks on them.

      <p>If you still want to attempt to rescue an IDE drive with bad blocks,
      you can attempt to download the IDE drive manufacturer's IDE diagnostic
      program, and run this against the drive.  Sometimes these programs can
      be set to force the drive electronics to rescan the drive for bad blocks
      and lock them out.

      <p>For ESDI, RLL and MFM drives, bad blocks are a normal part of the
      drive and are no sign of trouble, generally.  With a PC, the disk drive
      controller card and BIOS handle the task of locking out bad sectors.
      This is fine for operating systems like DOS that use BIOS code to
      access the disk.  However, FreeBSD's disk driver does not go through
      the BIOS, so a mechanism called bad144 exists to replace this
      functionality.  bad144 works only with the wd driver;
      it can NOT be used with SCSI.  bad144 works by entering all bad
      sectors found into a special file.

      <p>One caveat with bad144 - the bad block special file is placed on the
      last track of the disk.  As this file may possibly contain a listing for
      a bad sector that would occur near the beginning of the disk, where the
      /kernel file might be located, it therefore must be accessible to the
      bootstrap program that uses BIOS calls to read the kernel file.  This
      means that the disk with bad144 used on it must not exceed 1024
      cylinders, 16 heads, and 63 sectors.  This places an effective limit
      of 500MB on a disk that is mapped with bad144.
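
      <p>As a rough check of that figure, assuming the usual 512-byte
      sectors (the sector size is an assumption, not stated above), the
      arithmetic works out as follows:

      <verb>
        1024 cylinders * 16 heads * 63 sectors = 1032192 sectors
        1032192 sectors * 512 bytes            = 528482304 bytes
        528482304 bytes / 1048576              = 504 MB
      </verb>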

      <p>To use bad144, simply set the "Bad Block" scanning to ON in the
      FreeBSD fdisk screen during the initial install.  This works up through
      FreeBSD 2.2.7. The disk must have fewer than 1024 cylinders.  It is
      generally recommended that the disk drive has been in operation for at
      least 4 hours prior to this to allow for thermal expansion and track
      wandering.

      <p>If the disk has more than 1024 cylinders (such as a large ESDI drive)
      the ESDI controller uses a special translation mode to make it work
      under DOS. The wd driver understands these translation modes,
      IF you enter the "translated" geometry with the "set geometry" command
      in fdisk.  You must also NOT use the "dangerously dedicated" mode of
      creating the FreeBSD partition, as this ignores the geometry.  Also,
      even though fdisk will use your overridden geometry, it still knows the
      true size of the disk, and will attempt to create a FreeBSD partition
      that is too large.  If the disk geometry is changed to the translated
      geometry, the partition MUST be manually created with the number of
      blocks.

      <p>A quick trick to use is to set up the large ESDI disk with the ESDI
      controller, boot it with a DOS disk and format it with a DOS partition.
      Then, boot the FreeBSD install and in the fdisk screen, read off and
      write down the blocksize and block numbers for the DOS partition.  Then,
      reset the geometry to the same that DOS uses, delete the DOS partition,
      and create a "cooperative" FreeBSD partition using the blocksize you
      recorded earlier.  Then, set the partition bootable and turn on bad
      block scanning.  During the actual install, bad144 will run first,
      before any filesystems are created.  (You can view this with Alt-F2.)
      If it has any trouble creating the badsector file, you have set too
      large a disk geometry; reboot the system and start all over again
      (including repartitioning and reformatting with DOS).

      <p>If remapping is enabled and you are seeing bad blocks, consider
      replacing the drive. The bad blocks will only get worse as time goes on.

    <sect1>
      <heading>FreeBSD does not recognize my Bustek 742a EISA SCSI!</heading>

      <p>This info is specific to the 742a but may also cover other
      Buslogic cards.  (Bustek = Buslogic)

      <p>There are 2 general ``versions'' of the 742a card.  They are
      hardware revisions A-G, and revisions H - onwards.  The revision
      letter is located after the Assembly number on the edge of the
      card.  The 742a has 2 ROM chips on it: one is the BIOS chip and
      the other is the Firmware chip.  FreeBSD doesn't care what
      version of BIOS chip you have, but it does care about the version
      of the firmware chip.  Buslogic will send upgrade ROMs out if you
      call their tech support dept.  The BIOS and Firmware chips are
      shipped as a matched pair.  You must have the most current
      Firmware ROM in your adapter card for your hardware revision.

      <p>The REV A-G cards can only accept BIOS/Firmware sets up to
      2.41/2.21.  The REV H- up cards can accept the most current
      BIOS/Firmware sets of 4.70/3.37. The difference between the
      firmware sets is that the 3.37 firmware supports ``round robin''.

      <p>The Buslogic cards also have a serial number on them.  If you
      have an old hardware revision card you can call the Buslogic RMA
      department and give them the serial number and attempt to
      exchange the card for a newer hardware revision.  If the card is
      young enough they will do so.

      <p>FreeBSD 2.1 only supports Firmware revisions 2.21 onward.  If you
      have a Firmware revision older than this your card will not be
      recognized as a Buslogic card.  It may be recognized as an
      Adaptec 1540, however.  The early Buslogic firmware contains an
      AHA1540 ``emulation'' mode.  This is not a good thing for an EISA
      card, however.

      <p>If you have an old hardware revision card and you obtain the 2.21
      firmware for it, you will need to change the position of jumper W1
      to B-C; the default is A-B.

      <p>The 742a EISA cards never had the ``&gt;16MB'' problem mentioned in
      the section <ref id="bigram" name="on &gt;16 MB machines">. This is a
      problem that occurs with the Vesa-Local Buslogic SCSI cards.

    <sect1>
      <heading>
        My HP Netserver's SCSI controller is not detected!
      </heading>

      <p>This is basically a known problem.  The EISA on-board SCSI controller
      in the HP Netserver machines occupies EISA slot number 11, so all
      the ``true'' EISA slots are in front of it.  Alas, the address space
      for EISA slots >= 10 collides with the address space assigned to PCI,
      and FreeBSD's auto-configuration currently cannot handle this
      situation very well.

      <p>So now, the best you can do is to pretend there is no address
      range clash :), by bumping the kernel option <tt/EISA_SLOTS/
      to a value of 12.
      Configure and compile a kernel, as described in the
      <url url="../handbook/kernelconfig.html"
      name="Handbook entry on configuring the kernel">.
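
      <p>As a sketch, and assuming the same quoted <tt>options</tt> syntax
      used elsewhere in this FAQ, the kernel configuration line would look
      something like:

      <verb>
        options "EISA_SLOTS=12"
      </verb>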

      <p>Of course, this does present you with a chicken-and-egg problem when
      installing on such a machine.  In order to work around this
      problem, a special hack is available inside <em>UserConfig</em>.
      Do not use the ``visual'' interface, but the plain command-line
      interface there.  Simply type

      <verb>
        eisa 12
        quit
      </verb>

      <p>at the prompt, and install your system as usual.  While it's
      recommended you compile and install a custom kernel anyway,
      <htmlurl url="http://www.freebsd.org/cgi/man.cgi?dset" name="dset">
      now also knows how to save this value.

      <p>Hopefully, future versions will have a proper fix for this problem.

      <p><tt/NOTE:/ You can not use a <bf/dangerously dedicated/ disk with
      an HP Netserver. See <ref id="dedicate" name="this note"> for
      more info.

    <sect1>
      <heading>What's up with this CMD640 IDE controller?</heading>

      <p>It's broken.  It cannot handle commands on both channels
      simultaneously.

      <p>There's a workaround available now and it is enabled automatically
      if your system uses this chip. For the details refer to the
      manual page of the disk driver (man 4 wd).

      <p>If you're already running FreeBSD 2.2.1 or 2.2.2 with a
      CMD640 IDE controller and you want to use the second channel,
      build a new kernel with <tt/options "CMD640"/ enabled. This
      is the default for 2.2.5 and later.

    <sect1>
      <heading>I keep seeing messages like ``<tt/ed1: timeout/''.</heading>

      <p>This is usually caused by an interrupt conflict (e.g., two boards
      using the same IRQ).  FreeBSD prior to 2.0.5R used to be tolerant
      of this, and the network driver would still function in the
      presence of IRQ conflicts.  However, with 2.0.5R and later, IRQ
      conflicts are no longer tolerated.  Boot with the -c option and
      change the ed0/de0/... entry to match your board.
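
      <p>For illustration only, a UserConfig session that moves
      <tt>ed0</tt> to a free IRQ and I/O port might look like the
      following.  The values 10 and 0x300 are placeholders; use the
      settings your board is actually jumpered or configured for:

      <verb>
        irq ed0 10
        port ed0 0x300
        quit
      </verb>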

      <p>If you're using the BNC connector on your network card, you may
      also see device timeouts because of bad termination.  To check this,
      attach a terminator directly to the NIC (with no cable) and see if
      the error messages go away.

      <p>Some NE2000 compatible cards will give this error if there is
      no link on the UTP port or if the cable is disconnected.

    <sect1>
      <heading>When I mount a CDROM, I get ``Incorrect super block''.</heading>

      <p>You have to tell
      <htmlurl url="http://www.freebsd.org/cgi/man.cgi?mount" name="mount">
      the type of the device that you want to mount.  By default,
      <htmlurl url="http://www.freebsd.org/cgi/man.cgi?mount" name="mount">
      will assume the filesystem is of type ``<tt/ufs/''.  You want to mount
      a CDROM filesystem, and you do this by specifying the ``<tt/-t cd9660/''
      option to <htmlurl url="http://www.freebsd.org/cgi/man.cgi?mount"
      name="mount">.  This does, of course, assume that the
      CDROM contains an ISO 9660 filesystem, which is what most CDROMs
      have.  As of 1.1R, FreeBSD automatically understands the Rock Ridge
      (long filename) extensions as well.

      <p>As an example, if you want to mount the CDROM device,
      ``<tt>/dev/cd0c</tt>'', under <tt>/mnt</tt>, you would execute:

      <verb>
        mount -t cd9660 /dev/cd0c /mnt
      </verb>

      <p>Note that your device name (``<tt>/dev/cd0c</tt>'' in this
      example) could be different, depending on the CDROM interface.
      Note that the ``<tt/-t cd9660/'' option just causes the
      ``<tt/mount&lowbar;cd9660/'' command to be executed, and so the
      above example could be shortened to:

      <verb>
        mount_cd9660 /dev/cd0c /mnt
      </verb>

    <sect1>
      <heading>When I mount a CDROM, I get ``Device not configured''.</heading>

      <p>This generally means that there is no CDROM in the CDROM drive,
      or the drive is not visible on the bus. Feed the drive
      something, and/or check its master/slave status if it is
      IDE (ATAPI). It can take a couple of seconds for a CDROM drive
      to notice that it's been fed, so be patient.

      <p>Sometimes a SCSI CD-ROM may be missed because it did not have enough
      time to answer the bus reset. If you have a SCSI CD-ROM, try adding
      the following option to your kernel configuration file
      and recompiling.

      <verb>
        options "SCSI_DELAY=15"
      </verb>

    <sect1>
      <heading>My printer is ridiculously slow. What can I do?</heading>

      <p>If it's parallel, and the only problem is that it's terribly
      slow, try setting your printer port into ``polled'' mode:

      <verb>
        lptcontrol -p
      </verb>

      <p>Some newer HP printers are claimed not to work correctly in
      interrupt mode, apparently due to some (not yet exactly
      understood) timing problem.

    <sect1>
      <heading>My programs occasionally die with ``Signal 11'' errors.</heading>

      <p>This can be caused by bad hardware (memory, motherboard, etc.).
      Try running a memory-testing program on your PC.  Note that even
      if every memory-testing program you try reports your
      memory as being fine, it's possible for slightly marginal memory
      to pass all memory tests, yet fail under operating conditions
      (such as during bus mastering DMA from a SCSI controller like the
      Adaptec 1542, when you're beating on memory by compiling a kernel,
      or just when the system's running particularly hot).

      <p>The SIG11 FAQ (listed below) points to slow memory as being the
      most common problem. Increase the number of wait states in your
      BIOS setup, or get faster memory.

      <p>For me the guilty party has been bad cache RAM or a bad on-board
      cache controller. Try disabling the on-board (secondary) cache in
      the BIOS setup and see if that solves the problem.

      <p>There's an extensive FAQ on this at
      <url url="http://www.bitwizard.nl/sig11/" name="the SIG11 problem FAQ">.

    <sect1>
      <heading>When I boot, the screen goes black and loses sync!</heading>

      <p>This is a known problem with the ATI Mach 64 video card.
      The problem is that this card uses address <tt/2e8/, and
      the fourth serial port does too. Due to a bug (feature?) in the
      <htmlurl url="http://www.freebsd.org/cgi/man.cgi?sio" name="sio.c">
      driver it will touch this port even if you don't have the
      fourth serial port, and <bf/even/ if you disable sio3 (the fourth
      port) which normally uses this address.

      <p>Until the bug has been fixed, you can use this workaround:

      <enum>
        <item>Enter <tt/-c/ at the boot prompt.  (This will put the kernel
        into configuration mode.)

        <item>Disable <tt/sio0/, <tt/sio1/, <tt/sio2/ and <tt/sio3/
        (all of them).  This way the sio driver is never activated,
        so the problem does not occur.

        <item>Type <tt/exit/ to continue booting.
      </enum>
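
      <p>In the command-line configuration mode, the steps above might
      be entered roughly as follows (a sketch; the exact prompt and
      device list depend on your kernel):

      <verb>
        disable sio0
        disable sio1
        disable sio2
        disable sio3
        exit
      </verb>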

      <p>If you want to be able to use your serial ports,
      you'll have to build a new kernel with the following
      modification: in <tt>/usr/src/sys/i386/isa/sio.c</tt> find the
      one occurrence of the string <tt/0x2e8/ and remove that string
      and the preceding comma (keep the trailing comma).  Now follow
      the normal procedure of building a new kernel.

      <p>Even after applying these workarounds, you may still find that
      X Window does not work properly.  Some newer ATI Mach 64 video
      cards (notably ATI Mach Xpression) do not run with the current
      version of <tt/XFree86/; the screen goes black when you start
      X Window, or it works with strange problems. You can get
      a beta-version of a new X-server that works better, by looking at
      <url url="http://www.xfree86.org" name="the XFree86 site">
      and following the links to the new beta release. Get the
      following files:

      <p><tt>AccelCards, BetaReport, Cards, Devices, FILES, README.ati,
      README.FreeBSD, README.Mach64, RELNOTES, VGADriver.Doc,
      X312BMa64.tgz</tt>

      <p>Replace the older files with the new versions and make sure you
      run <htmlurl
      url="http://www.freebsd.org/cgi/man.cgi?manpath=xfree86&amp;query=xf86config"
      name="xf86config"> again.

    <sect1>
      <heading>
        I have 128 MB of RAM but the system only uses 64 MB.
        <label id="reallybigram">
      </heading>

      <p>Due to the manner in which FreeBSD gets the memory size from the
      BIOS, it can only detect 16 bits worth of Kbytes in size (65535
      Kbytes = 64MB) (or less... some BIOSes peg the memory size to 16M).
      If you have more than 64MB, FreeBSD will attempt to detect it;
      however, the attempt may fail.

      <p>To work around this problem, you need to use the
      kernel option specified below. There is a way to get complete
      memory information from the BIOS, but we don't have room in the
      bootblocks to do it. Someday when lack of room in the bootblocks
      is fixed, we'll use the extended BIOS functions to get the full
      memory information...but for now we're stuck with the kernel
      option.

      <p><tt>options "MAXMEM=&lt;n>"</tt>

      <p>Where <tt/n/ is your memory in Kilobytes. For a 128 MB machine,
      you'd want to use <tt/131072/.
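
      <p>Worked through for the 128 MB case, the value and the resulting
      configuration line would be along these lines:

      <verb>
        128 MB * 1024 = 131072 KB

        options "MAXMEM=131072"
      </verb>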

    <sect1>
      <heading>FreeBSD 2.0 panics with ``kmem_map too small!''</heading>

      <p><tt/Note:/ The message may also be ``mb_map too small!''

      <p>The panic indicates that the system ran out of virtual memory for
      network buffers (specifically, mbuf clusters). You can increase
      the amount of VM available for mbuf clusters by adding:

      <p><tt>options "NMBCLUSTERS=&lt;n>"</tt>

      <p>to your kernel config file, where &lt;n&gt; is a number in the
      range 512-4096, depending on the number of concurrent TCP
      connections you need to support. I'd recommend trying 2048 - this
      should get rid of the panic completely. You can monitor the
      number of mbuf clusters allocated/in use on the system with
      <htmlurl url="http://www.freebsd.org/cgi/man.cgi?netstat"
      name="netstat -m">.  The default value for NMBCLUSTERS is
      <tt/512 + MAXUSERS * 16/.
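
      <p>As a worked example, assuming a hypothetical kernel configured
      with <tt>maxusers 32</tt>, the default comes to 1024 clusters, and
      the suggested override would look something like the second line:

      <verb>
        512 + 32 * 16 = 1024

        options "NMBCLUSTERS=2048"
      </verb>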

    <sect1>
      <heading>``CMAP busy panic'' when rebooting with a new kernel.</heading>

      <p>The logic that attempts to detect out-of-date
      <tt>/var/db/kvm_*.db</tt> files sometimes fails, and using a
      mismatched file can lead to panics.

      <p>If this happens, reboot single-user and do:

      <verb>
        rm /var/db/kvm_*.db
      </verb>

    <sect1>
      <heading>ahc0: brkadrint,  Illegal Host Access at seqaddr 0x0</heading>

      <p>This is a conflict with an Ultrastor SCSI Host Adapter.

      <p>During the boot process enter the kernel configuration menu and
      disable <htmlurl url="http://www.freebsd.org/cgi/man.cgi?uha(4)"
      name="uha0">, which is causing the problem.

    <sect1>
      <heading>Sendmail says ``mail loops back to myself''</heading>

      <p>This is answered in the sendmail FAQ as follows:

      <verb>
        * I'm getting "Local configuration error" messages, such as:

        553 relay.domain.net config error: mail loops back to myself
        554 <user@domain.net>... Local configuration error

        How can I solve this problem?

        You have asked mail to the domain (e.g., domain.net) to be
        forwarded to a specific host (in this case, relay.domain.net)
        by using an MX record, but the relay machine doesn't recognize
        itself as domain.net.  Add domain.net to /etc/sendmail.cw
        (if you are using FEATURE(use_cw_file)) or add "Cw domain.net"
        to /etc/sendmail.cf.
      </verb>

      <p>The current version of the <url
      url="ftp://rtfm.mit.edu/pub/usenet/news.answers/mail/sendmail-faq"
      name="sendmail FAQ"> is no longer maintained with the sendmail
      release.  It is however regularly posted to
      <url url="news:comp.mail.sendmail" name="comp.mail.sendmail">,
      <url url="news:comp.mail.misc" name="comp.mail.misc">,
      <url url="news:comp.mail.smail" name="comp.mail.smail">,
      <url url="news:comp.answers" name="comp.answers">, and
      <url url="news:news.answers" name="news.answers">.
      You can also receive a copy via email by sending a message to
      <url url="mailto:mail-server@rtfm.mit.edu"
      name="mail-server@rtfm.mit.edu"> with the command "send
      usenet/news.answers/mail/sendmail-faq" as the body of the
      message.

    <sect1>
      <heading>Full screen applications on remote machines misbehave!
      </heading>
      <p>The remote machine may be setting your terminal type
	to something other than the <tt>cons25</tt> terminal type used
	by the FreeBSD console.
      <p>There are a number of work-arounds for this problem:
	<itemize>
	<item>After logging on to the remote machine, set your TERM shell
	  variable to either <tt>ansi</tt> or <tt>sco</tt>.
	<item>Use a VT100 emulator like <htmlurl
	 url="http://www.freebsd.org/cgi/ports.cgi?screen-" name="screen">
	 locally.  <tt>screen</tt> offers you the ability to run
         multiple concurrent sessions from one terminal, and is a neat
	 program in its own right.
	<item>Install the <tt>cons25</tt> terminal database entry on
	  the remote machine.
	<item>Fire up X and login to the remote machine from an
	  <tt>xterm</tt>.
	</itemize>
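
      <p>As a minimal sketch of the first work-around, assuming a
      Bourne-style shell on the remote machine (csh and tcsh users would
      use <tt>setenv TERM ansi</tt> instead):

      <verb>
        TERM=ansi; export TERM
      </verb>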

     <sect1>
       <heading>My machine prints "calcru: negative time..."</heading>
       <p>This can be caused by various hardware and/or software ailments
          relating to interrupts.  It may be due to bugs but can also happen
          by nature of certain devices.  Running TCP/IP over the parallel
          port using a large MTU is one good way to provoke this problem.
          Graphics accelerators can also get you here, in which case you
          should check the interrupt setting of the card first.

       <p>A side effect of this problem is processes dying with the
          message "SIGXCPU exceeded cpu time limit".

       <p>For FreeBSD 3.0 and later from Nov 29, 1998 forward: If the
          problem cannot be fixed otherwise, the solution is to set
          this sysctl variable:
<verb>
               sysctl -w kern.timecounter.method=1
</verb>
       <p> This means a performance impact, but considering the cause of
           this problem, you probably will not notice.  If the problem
           persists, keep the sysctl set to one and set the "NTIMECOUNTER"
           option in your kernel to increasingly large values.  If by the
           time you have reached "NTIMECOUNTER=20" the problem isn't
           solved, interrupts are too hosed on your machine for reliable
           timekeeping.
  </sect>