doc/FAQ/troubleshoot.sgml
Doug White 9f73002a0c Add questions for:
1.  Adding more pty's
2.  Solutions to the 'calcru' problems under -current
    Submitted by:	Poul-Henning Kamp <phk@FreeBSD.ORG>
1998-12-05 00:24:11 +00:00


<!-- $Id: troubleshoot.sgml,v 1.9 1998-12-05 00:24:11 dwhite Exp $ -->
<!-- The FreeBSD Documentation Project -->
<sect>
<heading>Troubleshooting<label id="troubleshoot"></heading>
<sect1>
<heading>I have bad blocks on my hard drive!<label id="awre"></heading>
<p>With SCSI drives, the drive should be capable of re-mapping
these automatically. However, many drives are shipped with
this feature disabled, for some mysterious reason...
<p>To enable this, you'll need to edit the first device mode page,
which can be done on FreeBSD by giving the command (as root)
<verb>
scsi -f /dev/rsd0c -m 1 -e -P 3
</verb>
<p>and changing the values of AWRE and ARRE from 0 to 1:-
<verb>
AWRE (Auto Write Reallocation Enbld): 1
ARRE (Auto Read Reallocation Enbld): 1
</verb>
<p>The following paragraphs were submitted by
<url url="mailto:tedm@toybox.placo.com" name="Ted Mittelstaedt">:
<p>For IDE drives, any bad block is usually a sign of potential trouble.
All modern IDE drives come with internal bad-block remapping turned
on. All IDE hard drive manufacturers today offer extensive
warranties and will replace drives with bad blocks on them.
<p>If you still want to attempt to rescue an IDE drive with bad blocks,
you can attempt to download the IDE drive manufacturer's IDE diagnostic
program, and run this against the drive. Sometimes these programs can
be set to force the drive electronics to rescan the drive for bad blocks
and lock them out.
<p>For ESDI, RLL and MFM drives, bad blocks are a normal part of the
drive and are no sign of trouble, generally. With a PC, the disk drive
controller card and BIOS handle the task of locking out bad sectors.
This is fine for operating systems like DOS that use BIOS code to
access the disk. However, FreeBSD's disk driver does not go through
BIOS, so a mechanism, bad144, exists that replaces this
functionality. bad144 only works with the wd driver;
it CANNOT be used with SCSI. bad144 works by entering all bad
sectors found into a special file.
<p>One caveat with bad144: the bad block special file is placed on the
last track of the disk. As this file may contain a listing for
a bad sector that occurs near the beginning of the disk, where the
/kernel file might be located, it must be accessible to the
bootstrap program that uses BIOS calls to read the kernel file. This
means that the disk with bad144 used on it must not exceed 1024
cylinders, 16 heads, and 63 sectors. This places an effective limit
of about 500MB on a disk that is mapped with bad144.
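<p>The 500MB figure follows directly from the geometry limit. As a rough
check, the shell sketch below multiplies the limits out, assuming the
usual 512-byte sectors. The function name <tt>chs_capacity_mb</tt> is
only illustrative and is not part of FreeBSD.

```sh
#!/bin/sh
# Capacity in MB (2^20 bytes) reachable through BIOS CHS addressing.
# Assumes 512-byte sectors; chs_capacity_mb is a made-up helper name.
chs_capacity_mb() {
    # $1 = cylinders, $2 = heads, $3 = sectors per track
    echo $(( $1 * $2 * $3 * 512 / 1048576 ))
}

chs_capacity_mb 1024 16 63    # prints 504
```

<p>The same function can be given a drive's translated geometry to see
whether it stays under the limit.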
<p>To use bad144, simply set the "Bad Block" scanning to ON in the
FreeBSD fdisk screen during the initial install. This works up through
FreeBSD 2.2.7. The disk must have fewer than 1024 cylinders. It is
generally recommended that the disk drive be in operation for at
least 4 hours before the scan to allow for thermal expansion and track
wandering.
<p>If the disk has more than 1024 cylinders (such as a large ESDI drive)
the ESDI controller uses a special translation mode to make it work
under DOS. The wd driver understands these translation modes,
IF you enter the "translated" geometry with the "set geometry" command
in fdisk. You must also NOT use the "dangerously dedicated" mode of
creating the FreeBSD partition, as this ignores the geometry. Also,
even though fdisk will use your overridden geometry, it still knows the
true size of the disk, and will attempt to create a FreeBSD partition
that is too large. If the disk geometry is changed to the translated
geometry, the partition MUST be manually created with the number of blocks.
<p>A quick trick is to set up the large ESDI disk with the ESDI
controller, boot it with a DOS disk and format it with a DOS partition.
Then, boot the FreeBSD install and in the fdisk screen, read off and
write down the blocksize and block numbers for the DOS partition. Then,
reset the geometry to the one that DOS uses, delete the DOS partition,
and create a "cooperative" FreeBSD partition using the blocksize you
recorded earlier. Then, set the partition bootable and turn on bad
block scanning. During the actual install, bad144 will run first,
before any filesystems are created. (You can view this with Alt-F2.)
If it has any trouble creating the badsector file, you have set too
large a disk geometry. Reboot the system and start all over again
(including repartitioning and reformatting with DOS).
<p>If remapping is enabled and you are seeing bad blocks, consider
replacing the drive. The bad blocks will only get worse as time goes on.
<sect1>
<heading>FreeBSD does not recognize my Bustek 742a EISA SCSI!</heading>
<p>This info is specific to the 742a but may also cover other
Buslogic cards. (Bustek = Buslogic)
<p>There are 2 general ``versions'' of the 742a card. They are
hardware revisions A-G, and revisions H - onwards. The revision
letter is located after the Assembly number on the edge of the
card. The 742a has 2 ROM chips on it: one is the BIOS chip and
the other is the Firmware chip. FreeBSD doesn't care what
version of BIOS chip you have, but it does care what version
of Firmware chip you have. Buslogic will send upgrade ROMs out if you
call their tech support dept. The BIOS and Firmware chips are
shipped as a matched pair. You must have the most current
Firmware ROM in your adapter card for your hardware revision.
<p>The REV A-G cards can only accept BIOS/Firmware sets up to
2.41/2.21. The REV H- up cards can accept the most current
BIOS/Firmware sets of 4.70/3.37. The difference between the
firmware sets is that the 3.37 firmware supports ``round robin''.
<p>The Buslogic cards also have a serial number on them. If you
have an old hardware revision card you can call the Buslogic RMA
department and give them the serial number and attempt to
exchange the card for a newer hardware revision. If the card is
young enough they will do so.
<p>FreeBSD 2.1 only supports Firmware revisions 2.21 onward. If you
have a Firmware revision older than this your card will not be
recognized as a Buslogic card. It may be recognized as an
Adaptec 1540, however. The early Buslogic firmware contains an
AHA1540 ``emulation'' mode. This is not a good thing for an EISA
card, however.
<p>If you have an old hardware revision card and you obtain the 2.21
firmware for it, you will need to make sure jumper W1 is set
to B-C; the default is A-B.
<p>The 742a EISA cards never had the ``&gt;16MB'' problem mentioned in
the section <ref id="bigram" name="on &gt;16 MB machines">. This is a
problem that occurs with the Vesa-Local Buslogic SCSI cards.
<sect1>
<heading>
My HP Netserver's SCSI controller is not detected!
</heading>
<p>This is basically a known problem. The EISA on-board SCSI controller
in the HP Netserver machines occupies EISA slot number 11, so all
the ``true'' EISA slots are in front of it. Alas, the address space
for EISA slots >= 10 collides with the address space assigned to PCI,
and FreeBSD's auto-configuration currently cannot handle this
situation very well.
<p>So now, the best you can do is to pretend there is no address
range clash :), by bumping the kernel option <tt/EISA_SLOTS/
to a value of 12.
Configure and compile a kernel, as described in the
<url url="../handbook/kernelconfig.html"
name="Handbook entry on configuring the kernel">.
<p>Of course, this does present you with a chicken-and-egg problem when
installing on such a machine. In order to work around this
problem, a special hack is available inside <em>UserConfig</em>.
Do not use the ``visual'' interface, but the plain command-line
interface there. Simply type
<verb>
eisa 12
quit
</verb>
<p>at the prompt, and install your system as usual. While it's
recommended you compile and install a custom kernel anyway,
<htmlurl url="http://www.freebsd.org/cgi/man.cgi?dset" name="dset">
now also knows how to save this value.
<p>Hopefully, future versions will have a proper fix for this problem.
<p><tt/NOTE:/ You cannot use a <bf/dangerously dedicated/ disk with
an HP Netserver. See <ref id="dedicate" name="this note"> for
more info.
<sect1>
<heading>What's up with this CMD640 IDE controller?</heading>
<p>It's broken. It cannot handle commands on both channels
simultaneously.
<p>There's a workaround available now and it is enabled automatically
if your system uses this chip. For the details refer to the
manual page of the disk driver (man 4 wd).
<p>If you're already running FreeBSD 2.2.1 or 2.2.2 with a
CMD640 IDE controller and you want to use the second channel,
build a new kernel with <tt/options "CMD640"/ enabled. This
is the default for 2.2.5 and later.
<sect1>
<heading>I keep seeing messages like ``<tt/ed1: timeout/''.</heading>
<p>This is usually caused by an interrupt conflict (e.g., two boards
using the same IRQ). FreeBSD prior to 2.0.5R used to be tolerant
of this, and the network driver would still function in the
presence of IRQ conflicts. However, with 2.0.5R and later, IRQ
conflicts are no longer tolerated. Boot with the -c option and
change the ed0/de0/... entry to match your board.
<p>If you're using the BNC connector on your network card, you may
also see device timeouts because of bad termination. To check this,
attach a terminator directly to the NIC (with no cable) and see if
the error messages go away.
<p>Some NE2000 compatible cards will give this error if there is
no link on the UTP port or if the cable is disconnected.
<sect1>
<heading>When I mount a CDROM, I get ``Incorrect super block''.</heading>
<p>You have to tell
<htmlurl url="http://www.freebsd.org/cgi/man.cgi?mount" name="mount">
the type of the device that you want to mount. By default,
<htmlurl url="http://www.freebsd.org/cgi/man.cgi?mount" name="mount">
will assume the filesystem is of type ``<tt/ufs/''. You want to mount
a CDROM filesystem, and you do this by specifying the ``<tt/-t cd9660/''
option to <htmlurl url="http://www.freebsd.org/cgi/man.cgi?mount"
name="mount">. This does, of course, assume that the
CDROM contains an ISO 9660 filesystem, which is what most CDROMs
have. As of 1.1R, FreeBSD automatically understands the Rock Ridge
(long filename) extensions as well.
<p>As an example, if you want to mount the CDROM device,
``<tt>/dev/cd0c</tt>'', under <tt>/mnt</tt>, you would execute:
<verb>
mount -t cd9660 /dev/cd0c /mnt
</verb>
<p>Note that your device name (``<tt>/dev/cd0c</tt>'' in this
example) could be different, depending on the CDROM interface.
Also, the ``<tt/-t cd9660/'' option just causes the
``<tt/mount&lowbar;cd9660/'' command to be executed, so the
above example could be shortened to:
<verb>
mount_cd9660 /dev/cd0c /mnt
</verb>
<sect1>
<heading>When I mount a CDROM, I get ``Device not configured''.</heading>
<p>This generally means that there is no CDROM in the CDROM drive,
or the drive is not visible on the bus. Feed the drive
something, and/or check its master/slave status if it is
IDE (ATAPI). It can take a couple of seconds for a CDROM drive
to notice that it's been fed, so be patient.
<p>Sometimes a SCSI CD-ROM may be missed because it did not have enough
time to answer the bus reset. If you have a SCSI CD-ROM, try
adding the following option to your kernel configuration file
and recompiling.
<verb>
options "SCSI_DELAY=15"
</verb>
<sect1>
<heading>My printer is ridiculously slow. What can I do?</heading>
<p>If it's parallel, and the only problem is that it's terribly
slow, try setting your printer port into ``polled'' mode:
<verb>
lptcontrol -p
</verb>
<p>Some newer HP printers are claimed not to work correctly in
interrupt mode, apparently due to some (not yet exactly
understood) timing problem.
<sect1>
<heading>My programs occasionally die with ``Signal 11'' errors.</heading>
<p>This can be caused by bad hardware (memory, motherboard, etc.).
Try running a memory-testing program on your PC. Note that even
if every memory-testing program you try reports your
memory as fine, slightly marginal memory can still
pass all memory tests, yet fail under operating conditions
(such as during bus mastering DMA from a SCSI controller like the
Adaptec 1542, when you're beating on memory by compiling a kernel,
or just when the system's running particularly hot).
<p>The SIG11 FAQ (listed below) points to slow memory as being the
most common problem. Increase the number of wait states in your
BIOS setup, or get faster memory.
<p>For me the guilty party has been bad cache RAM or a bad on-board
cache controller. Try disabling the on-board (secondary) cache in
the BIOS setup and see if that solves the problem.
<p>There's an extensive FAQ on this at
<url url="http://www.bitwizard.nl/sig11/" name="the SIG11 problem FAQ">.
<sect1>
<heading>When I boot, the screen goes black and loses sync!</heading>
<p>This is a known problem with the ATI Mach 64 video card.
The problem is that this card uses address <tt/2e8/, and
the fourth serial port does too. Due to a bug (feature?) in the
<htmlurl url="http://www.freebsd.org/cgi/man.cgi?sio" name="sio.c">
driver it will touch this port even if you don't have the
fourth serial port, and <bf/even/ if you disable sio3 (the fourth
port) which normally uses this address.
<p>Until the bug has been fixed, you can use this workaround:
<enum>
<item>Enter <tt/-c/ at the boot prompt. (This will put the kernel
into configuration mode.)
<item>Disable <tt/sio0/, <tt/sio1/, <tt/sio2/ and <tt/sio3/
(all of them). This way the sio driver is never activated,
so the problem does not occur.
<item>Type <tt/exit/ to continue booting.
</enum>
<p>If you want to be able to use your serial ports,
you'll have to build a new kernel with the following
modification: in <tt>/usr/src/sys/i386/isa/sio.c</tt> find the
one occurrence of the string <tt/0x2e8/ and remove that string
and the preceding comma (keep the trailing comma). Now follow
the normal procedure of building a new kernel.
<p>Even after applying these workarounds, you may still find that
X Window does not work properly. Some newer ATI Mach 64 video
cards (notably ATI Mach Xpression) do not run with the current
version of <tt/XFree86/; the screen goes black when you start
X Window, or it works with strange problems. You can get
a beta-version of a new X-server that works better, by looking at
<url url="http://www.xfree86.org" name="the XFree86 site">
and following the links to the new beta release. Get the
following files:
<p><tt>AccelCards, BetaReport, Cards, Devices, FILES, README.ati,
README.FreeBSD, README.Mach64, RELNOTES, VGADriver.Doc,
X312BMa64.tgz</tt>
<p>Replace the older files with the new versions and make sure you
run <htmlurl
url="http://www.freebsd.org/cgi/man.cgi?manpath=xfree86&amp;query=xf86config"
name="xf86config"> again.
<sect1>
<heading>
I have 128 MB of RAM but the system only uses 64 MB.
<label id="reallybigram">
</heading>
<p>Due to the manner in which FreeBSD gets the memory size from the
BIOS, it can only detect 16 bits worth of Kbytes in size (65535
Kbytes = 64MB) (or less... some BIOSes peg the memory size to 16M).
If you have more than 64MB, FreeBSD will attempt to detect it;
however, the attempt may fail.
<p>To work around this problem, you need to use the
kernel option specified below. There is a way to get complete
memory information from the BIOS, but we don't have room in the
bootblocks to do it. Someday when lack of room in the bootblocks
is fixed, we'll use the extended BIOS functions to get the full
memory information...but for now we're stuck with the kernel
option.
<p><tt>options "MAXMEM=&lt;n&gt;"</tt>
<p>Where <tt/n/ is your memory in Kilobytes. For a 128 MB machine,
you'd want to use <tt/131072/.
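<p>The value is simply the memory size in megabytes multiplied by 1024.
A minimal shell sketch of that conversion follows. The helper name
<tt>maxmem_option</tt> is hypothetical, and the output merely mirrors
the option line shown above.

```sh
#!/bin/sh
# Convert installed RAM in megabytes to the kilobyte value MAXMEM expects.
# maxmem_option is a hypothetical helper, not a FreeBSD tool.
maxmem_option() {
    # $1 = installed memory in MB
    printf 'options "MAXMEM=%d"\n' $(( $1 * 1024 ))
}

maxmem_option 128    # prints: options "MAXMEM=131072"
```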
<sect1>
<heading>FreeBSD 2.0 panics with ``kmem_map too small!''</heading>
<p><tt/Note:/ The message may also be ``mb_map too small!''
<p>The panic indicates that the system ran out of virtual memory for
network buffers (specifically, mbuf clusters). You can increase
the amount of VM available for mbuf clusters by adding:
<p><tt>options "NMBCLUSTERS=&lt;n>"</tt>
<p>to your kernel config file, where &lt;n&gt; is a number in the
range 512-4096, depending on the number of concurrent TCP
connections you need to support. I'd recommend trying 2048 - this
should get rid of the panic completely. You can monitor the
number of mbuf clusters allocated/in use on the system with
<htmlurl url="http://www.freebsd.org/cgi/man.cgi?netstat"
name="netstat -m">. The default value for NMBCLUSTERS is
<tt/512 + MAXUSERS * 16/.
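<p>To see what default a given kernel would get before overriding it,
the formula above can be evaluated by hand or with a short shell
sketch. The function name <tt>default_nmbclusters</tt> and the MAXUSERS
values used here are illustrative assumptions only.

```sh
#!/bin/sh
# Default NMBCLUSTERS as given by the formula 512 + MAXUSERS * 16.
# default_nmbclusters is a made-up helper name for this illustration.
default_nmbclusters() {
    # $1 = MAXUSERS from the kernel config file
    echo $(( 512 + $1 * 16 ))
}

default_nmbclusters 32    # prints 1024
default_nmbclusters 96    # prints 2048
```

<p>If the result is already at or above the value you intended to set,
raising NMBCLUSTERS to that value will not change anything.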
<sect1>
<heading>``CMAP busy panic'' when rebooting with a new kernel.</heading>
<p>The logic that attempts to detect out-of-date
<tt>/var/db/kvm_*.db</tt> files sometimes fails, and using a
mismatched file can lead to panics.
<p>If this happens, reboot single-user and do:
<verb>
rm /var/db/kvm_*.db
</verb>
<sect1>
<heading>ahc0: brkadrint, Illegal Host Access at seqaddr 0x0</heading>
<p>This is a conflict with an Ultrastor SCSI Host Adapter.
<p>During the boot process enter the kernel configuration menu and
disable <htmlurl url="http://www.freebsd.org/cgi/man.cgi?uha(4)"
name="uha0">, which is causing the problem.
<sect1>
<heading>Sendmail says ``mail loops back to myself''</heading>
<p>This is answered in the sendmail FAQ as follows:-
<verb>
* I'm getting "Local configuration error" messages, such as:
553 relay.domain.net config error: mail loops back to myself
554 <user@domain.net>... Local configuration error
How can I solve this problem?
You have asked mail to the domain (e.g., domain.net) to be
forwarded to a specific host (in this case, relay.domain.net)
by using an MX record, but the relay machine doesn't recognize
itself as domain.net. Add domain.net to /etc/sendmail.cw
(if you are using FEATURE(use_cw_file)) or add "Cw domain.net"
to /etc/sendmail.cf.
</verb>
<p>The current version of the <url
url="ftp://rtfm.mit.edu/pub/usenet/news.answers/mail/sendmail-faq"
name="sendmail FAQ"> is no longer maintained with the sendmail
release. It is however regularly posted to
<url url="news:comp.mail.sendmail" name="comp.mail.sendmail">,
<url url="news:comp.mail.misc" name="comp.mail.misc">,
<url url="news:comp.mail.smail" name="comp.mail.smail">,
<url url="news:comp.answers" name="comp.answers">, and
<url url="news:news.answers" name="news.answers">.
You can also receive a copy via email by sending a message to
<url url="mailto:mail-server@rtfm.mit.edu"
name="mail-server@rtfm.mit.edu"> with the command "send
usenet/news.answers/mail/sendmail-faq" as the body of the
message.
<sect1>
<heading>Full screen applications on remote machines misbehave!
</heading>
<p>The remote machine may be setting your terminal type
to something other than the <tt>cons25</tt> terminal type used
by the FreeBSD console.
<p>There are a number of work-arounds for this problem:
<itemize>
<item>After logging on to the remote machine, set your TERM shell
variable to either <tt>ansi</tt> or <tt>sco</tt>.
<item>Use a VT100 emulator like <htmlurl
url="http://www.freebsd.org/cgi/ports.cgi?screen-" name="screen">
locally. <tt>screen</tt> offers you the ability to run
multiple concurrent sessions from one terminal, and is a neat
program in its own right.
<item>Install the <tt>cons25</tt> terminal database entry on
the remote machine.
<item>Fire up X and log in to the remote machine from an
<tt>xterm</tt>.
</itemize>
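<p>The first work-around can be automated. The sketch below assumes a
Bourne-style login shell on the remote machine and could be placed in
that shell's startup file. The function name <tt>fix_term</tt> is
hypothetical, and <tt>ansi</tt> could equally be replaced by
<tt>sco</tt>.

```sh
#!/bin/sh
# Map the FreeBSD console terminal type to one the remote host is more
# likely to have in its terminal database. fix_term is a made-up name.
fix_term() {
    case "$TERM" in
        cons25) TERM=ansi ;;    # leave every other terminal type alone
    esac
    export TERM
}

fix_term
```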
<sect1>
<heading>My machine prints "calcru: negative time..."</heading>
<p>This can be caused by various hardware and/or software ailments
relating to interrupts. It may be due to bugs but can also arise
from the nature of certain devices. Running TCP/IP over the parallel
port using a large MTU is one good way to provoke this problem.
Graphics accelerators can also cause it, in which case you
should check the interrupt setting of the card first.
<p>A side effect of this problem is processes dying with the
message "SIGXCPU exceeded cpu time limit".
<p>For FreeBSD 3.0 and later from Nov 29, 1998 forward: If the
problem cannot be fixed otherwise the solution is to set
this sysctl variable:
<verb>
sysctl -w kern.timecounter.method=1
</verb>
<p>This has a performance impact, but considering the cause of
this problem, you probably will not notice. If the problem
persists, keep the sysctl set to one and set the "NTIMECOUNTER"
option in your kernel to increasingly large values. If by the
time you have reached "NTIMECOUNTER=20" the problem isn't
solved, interrupts are too hosed on your machine for reliable
timekeeping.
</sect>