Bring in ACPI debugging documentation.

In collaboration with:	njl, Peter Schultz
Tom Rhodes 2004-02-16 22:04:11 +00:00
parent dfb4b35189
commit f6bef84959
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=20054

@ -2251,7 +2251,7 @@ device_probe_and_attach: cbb0 attach returned 12</screen>
BIOS did not have enough room to implement a sophisticated power
policy, or one that can adapt very well to the purpose of the
machine.</para>
<para><emphasis>Plug and Play BIOS (PNPBIOS)</emphasis> was
unreliable in many situations. PNPBIOS is 16-bit technology,
so the OS has to use 16-bit emulation in order to
@ -2293,36 +2293,518 @@ device_probe_and_attach: cbb0 attach returned 12</screen>
<para>The other options are available. Check out the &man.acpiconf.8;
manual page for more information.</para>
</sect2>
</sect1>
<sect2 id="acpi-debug">
<title>Debugging and Disabling <acronym>ACPI</acronym></title>
<sect1 id="ACPI-debug">
<sect1info>
<authorgroup>
<author>
<firstname>Nate</firstname>
<surname>Lawson</surname>
<contrib>Written by </contrib>
</author>
</authorgroup>
<authorgroup>
<author>
<firstname>Peter</firstname>
<surname>Schultz</surname>
<contrib>With contributions from </contrib>
</author>
</authorgroup>
<authorgroup>
<author>
<firstname>Tom</firstname>
<surname>Rhodes</surname>
<contrib>And </contrib>
</author>
</authorgroup>
</sect1info>
<para>Almost everything in <acronym>ACPI</acronym> is transparent, until
it does not work. That is usually when you as a user will know there
is something not working properly. The &man.acpi.4; driver
supports many debugging options; it is even possible to
selectively disable some parts of the <acronym>ACPI</acronym>
system. For more information about debugging facilities, read
the &man.acpi.4; manual page.</para>
<title>Using and Debugging &os; <acronym>ACPI</acronym></title>
<para>Sometimes for various reasons, the
<filename>acpi.ko</filename> module must be unloaded. This
can only be done at boot time by the &man.loader.8;. You can
type at the &man.loader.8; prompt the command
<command>unset acpi_load</command> each time you boot the
system, or to stop the autoloading of the
&man.acpi.4; driver add the following line to the
<filename>/boot/loader.conf</filename> file:</para>
<para><acronym>ACPI</acronym> is a fundamentally new way of
discovering devices, managing power usage, and providing
standardized access to various hardware previously managed
by the <acronym>BIOS</acronym>. Progress is being made toward
<acronym>ACPI</acronym> working on all systems, but bugs in some
motherboards' <acronym>AML</acronym> bytecode, incompleteness in
&os;'s kernel subsystems, and bugs in the Intel
<acronym>ACPI-CA</acronym> interpreter continue to appear.</para>
<programlisting>exec="unset acpi_load"</programlisting>
<para>This document is intended to help you assist the &os;
<acronym>ACPI</acronym> maintainers in identifying the culprit
of problems you observe and debugging and developing a solution.
Thanks for reading this and we hope we can solve your system's
problems.</para>
<para>&os;&nbsp;5.1-RELEASE and later come with a boot-time menu
that controls how &os; is booted. One of the proposed options
is to turn off <acronym>ACPI</acronym>. So to disable
<acronym>ACPI</acronym> just select
<guimenuitem>2. Boot &os; with ACPI disabled</guimenuitem>
in the menu.</para>
<sect2 id="ACPI-submitdebug">
<title>Submitting Debugging Information</title>
<para>For those of you who want to submit a problem right away,
please send the following information to
<ulink url="mailto:acpi-jp@jp.freebsd.org">
acpi-jp@jp.freebsd.org</ulink>:</para>
<itemizedlist>
<listitem>
<para>Description of the buggy behavior, including system type
and model and anything that causes the bug to appear. Also,
please note as accurately as possible when the bug began
occurring if it is new for you.</para>
</listitem>
<listitem>
<para>The dmesg output after <quote>boot
<option>-v</option></quote>, including any error messages
generated by you exercising the bug.</para>
</listitem>
<listitem>
<para>dmesg output from <quote>boot
<option>-v</option></quote> with <acronym>ACPI</acronym>
disabled, if disabling it helps fix the problem.</para>
</listitem>
<listitem>
<para>Output from <quote>sysctl hw.acpi</quote>. This is also
a good way of figuring out what features your system
offers.</para>
</listitem>
<listitem>
<para><acronym>URL</acronym> where your <acronym>ASL</acronym>
can be found. Do <emphasis>not</emphasis> send the
<acronym>ASL</acronym> directly to the list as it can be
very large. Generate a copy of your <acronym>ASL</acronym>
by running this command:</para>
<screen>&prompt.root; <userinput>acpidump -t -d &gt; $NAME-$SYSTEM.asl</userinput></screen>
<para>(Substitute your login name for
<filename>$NAME</filename> and manufacturer/model for
<filename>$SYSTEM</filename>. Example:
<filename>njl-FooCo6000.asl</filename>)</para>
</listitem>
</itemizedlist>
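<para>As a rough sketch, the items above could be gathered with
commands like the following. The output file names are
placeholders chosen only for illustration. Run the
<command>dmesg</command> step once after each
<quote>boot <option>-v</option></quote> (with and without
<acronym>ACPI</acronym>) so that both logs are captured:</para>
<screen>&prompt.root; <userinput>dmesg &gt; dmesg-acpi.txt</userinput>
&prompt.root; <userinput>sysctl hw.acpi &gt; sysctl-hw-acpi.txt</userinput>
&prompt.root; <userinput>acpidump -t -d &gt; njl-FooCo6000.asl</userinput></screen>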
<para>Most of the developers watch the freebsd-current mailing
list but please submit problems to acpi-jp to be sure it is
seen. Please be patient; all of us have full-time jobs
elsewhere. If your bug is not immediately apparent, we will
probably ask you to submit a <acronym>PR</acronym> via
&man.send-pr.1;. When entering a <acronym>PR</acronym>, please
include the same information as requested above. This will help
us track the problem and resolve it. Do not send a
<acronym>PR</acronym> without emailing acpi-jp first as we use
<acronym>PR</acronym>s as reminders of existing problems, not a
reporting mechanism.</para>
</sect2>
<sect2 id="ACPI-background">
<title>Background</title>
<para><acronym>ACPI</acronym> is present in all modern computers
that conform to the ia32 (x86), ia64 (Itanium), and amd64 (AMD)
architectures. The full standard has many features including
<acronym>CPU</acronym> performance management, power planes
control, thermal zones, various battery systems, embedded
controllers, and bus enumeration. Most systems implement less
than the full standard. For instance, a desktop system usually
only implements the bus enumeration parts while a laptop might
have a lot of cooling and battery management support as well.
Laptops also have suspend and resume, with their own associated
complexity.</para>
<para>An <acronym>ACPI</acronym>-compliant system has various
components. The <acronym>BIOS</acronym> and chipset vendors
provide various fixed tables (e.g., <acronym>FADT</acronym>)
in memory that specify things like the <acronym>APIC</acronym>
map (used for <acronym>SMP</acronym>), config registers, and
simple configuration values. Additionally, a table of bytecode
(the <acronym>DSDT</acronym>) is provided that specifies a
tree-like name space of devices and methods.</para>
<para>The <acronym>ACPI</acronym> driver must parse the fixed
tables, implement an interpreter for the bytecode, and modify
device drivers and the kernel to accept information from the
<acronym>ACPI</acronym> subsystem. For &os;, Intel has
provided an interpreter (<acronym>ACPI-CA</acronym>) that is
shared with Linux and NetBSD. The path to
<acronym>ACPI-CA</acronym> is
<filename>src/sys/contrib/dev/acpica</filename>. The glue code
that allows <acronym>ACPI-CA</acronym> to work on &os; is in
<filename>src/sys/dev/acpica/Osd</filename>. Finally, drivers
that implement various <acronym>ACPI</acronym> devices are found
in
<filename role="directory">src/sys/dev/acpica</filename>.</para>
</sect2>
<sect2 id="ACPI-comprob">
<title>Common Problems</title>
<para>For <acronym>ACPI</acronym> to work correctly, all the parts
have to work correctly. Here are some common problems, in order
of frequency of appearance, and some possible workarounds or
fixes.</para>
<sect3>
<title>Suspend/Resume</title>
<para><acronym>ACPI</acronym> has three suspend to
<acronym>RAM</acronym> (<acronym>STR</acronym>) states,
<literal>S1</literal>-<literal>S3</literal>, and one suspend
to disk state (<acronym>STD</acronym>), called
<literal>S4</literal>. <literal>S5</literal> is
<quote>soft off</quote> and is the normal state your system
is in when plugged in but not powered up.
<literal>S4</literal> can actually be implemented two separate
ways. <literal>S4</literal><acronym>BIOS</acronym> is a
<acronym>BIOS</acronym>-assisted suspend to disk.
<literal>S4</literal><acronym>OS</acronym> is implemented
entirely by the operating system.</para>
<para>Start by checking <command>sysctl</command>
<option>hw.acpi</option> for the suspend-related items. Here
are the results for my Thinkpad:</para>
<screen>hw.acpi.supported_sleep_state: S3 S4 S5</screen>
<screen>hw.acpi.s4bios: 0</screen>
<para>This means that I can use <quote>acpiconf
<option>-s</option></quote> to test <literal>S3</literal>,
<literal>S4</literal><acronym>OS</acronym>, and
<literal>S5</literal>. If <option>s4bios</option> were one
(1), I would have <literal>S4</literal><acronym>BIOS</acronym>
instead of <literal>S4</literal><acronym>OS</acronym>.</para>
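<para>For illustration, assuming a system that reports
<literal>S3</literal> as above, checking the supported states and
then requesting that state might look like the following. The
number passed to <option>-s</option> selects the sleep state;
see &man.acpiconf.8; for the values your version accepts:</para>
<screen>&prompt.root; <userinput>sysctl hw.acpi.supported_sleep_state</userinput>
hw.acpi.supported_sleep_state: S3 S4 S5
&prompt.root; <userinput>acpiconf -s 3</userinput></screen>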
<para>When testing suspend/resume, start with
<literal>S1</literal>, if supported. This state is most
likely to work since it doesn't require much driver support.
No one has implemented <literal>S2</literal> but if you have
it, it's similar to <literal>S1</literal>. The next thing
to try is <literal>S3</literal>. This is the deepest
<acronym>STR</acronym> state and requires a lot of driver
support to properly reinitialize your hardware. If you have
problems resuming, feel free to email the acpi-jp list but
do not expect the problem to be resolved since there are a lot
of drivers/hardware that need more testing and work.</para>
<para>To help isolate the problem, remove as many drivers from
your kernel as possible. If it works, you can narrow down
which driver is the problem by loading drivers until it fails
again. Typically binary drivers like
<filename>nvidia.ko</filename>, <application>X11</application>
display drivers, and <acronym>USB</acronym> will have the most
problems while Ethernet interfaces usually work fine. If you
can load/unload the drivers ok, you can automate this by
putting the appropriate commands in
<filename>/etc/rc.suspend</filename> and
<filename>/etc/rc.resume</filename>. There is a
commented-out example for unloading and loading a driver. Try
setting <option>hw.acpi.reset_video</option> to zero (0) if
your display is messed up after resume. Try setting longer or
shorter values for <option>hw.acpi.sleep_delay</option> to see
if that helps.</para>
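<para>As a sketch, and assuming both variables can be changed at
run time on your version, they can be adjusted with
&man.sysctl.8; before the next suspend attempt. The delay value
of 3 is only an illustrative guess, not a recommended
setting:</para>
<screen>&prompt.root; <userinput>sysctl hw.acpi.reset_video=0</userinput>
&prompt.root; <userinput>sysctl hw.acpi.sleep_delay=3</userinput></screen>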
<para>Another thing to try is to load a recent Linux distribution
with <acronym>ACPI</acronym> support and test their
suspend/resume support on the same hardware. If it works
on Linux, it's likely a &os; driver problem and narrowing down
which driver causes the problems will help us fix the problem.
Note that the <acronym>ACPI</acronym> maintainers do not
usually maintain other drivers (e.g., sound,
<acronym>ATA</acronym>, etc.) so any work done on tracking
down a driver problem should probably eventually be posted
to the freebsd-current list and mailed to the driver
maintainer. If you are feeling adventurous, go ahead and
start putting some debugging &man.printf.3;s in a problematic
driver to track down where in its resume function it
hangs.</para>
<para>Finally, try disabling <acronym>ACPI</acronym> and
enabling <acronym>APM</acronym> instead. If suspend/resume
works with <acronym>APM</acronym>, you may be better off
sticking with <acronym>APM</acronym>, especially on older
hardware (pre-2000). It took vendors a while to get
<acronym>ACPI</acronym> support correct and older hardware is
more likely to have <acronym>BIOS</acronym> problems with
<acronym>ACPI</acronym>.</para>
</sect3>
<sect3>
<title>System Hangs (temporary or permanent)</title>
<para>Most system hangs are a result of lost interrupts or an
interrupt storm. Chipsets have a lot of problems based on how
the <acronym>BIOS</acronym> configures interrupts before boot,
correctness of the <acronym>APIC</acronym>
(<acronym>MADT</acronym>) table, and routing of the
<acronym>SCI</acronym>. There are several patches and ad-hoc
workarounds that may help.</para>
<para>Interrupt storms can be distinguished from lost interrupts
by checking the output of <command>vmstat</command>
<option>-i</option> and looking at the line that has
<quote>acpi0</quote>. If the counter is increasing at more
than a couple per second, you have an interrupt storm. If the
system appears hung, try breaking to <acronym>DDB</acronym>
(<keycombo action="simul"><keycap>CTRL</keycap>
<keycap>ALT</keycap><keycap>ESC</keycap></keycombo> on
console) and type <command>show interrupts</command>. Patches
to try if you get an interrupt storm are:</para>
</sect3>
<sect3>
<title>Panics</title>
<para>Panics are relatively rare for <acronym>ACPI</acronym> and
are the top priority to be fixed. The first step is to
isolate the steps to reproduce the panic (if possible)
and get a backtrace. Follow the advice for enabling
<option>options DDB</option> and setting up a serial console
or setting up a &man.dump.8; partition. You can get a
backtrace in <acronym>DDB</acronym> with
<option>tr</option>. If you have to handwrite the
backtrace, be sure to at least get the lowest five (5) and top
five (5) lines in the trace.</para>
<para>Then, try to isolate the problem by booting with
<acronym>ACPI</acronym> disabled. If that works, you can
isolate the <acronym>ACPI</acronym> subsystem by using various
values of <option>debug.acpi.disable</option>. See the
&man.acpi.4; manual page for some examples.</para>
</sect3>
<sect3>
<title>Other Problems</title>
<para>If you have other problems with <acronym>ACPI</acronym>
(working with a docking station, devices not detected, etc.),
please email a description to the mailing list as well;
however, some of these issues may be related to unfinished
parts of the <acronym>ACPI</acronym> subsystem so they might
take a while to be implemented. Please be patient and
prepared to test patches we may send you.</para>
</sect3>
</sect2>
<sect2 id="ACPI-aslanddump">
<title><acronym>ASL</acronym>, <command>acpidump</command>, and
<acronym>IASL</acronym></title>
<para>The most common problem is the <acronym>BIOS</acronym>
vendors providing incorrect (or outright buggy!) bytecode. This
is usually manifested by kernel console messages like
this:</para>
<screen>ACPI-1287: *** Error: Method execution failed</screen>
<screen>[\\_SB_.PCI0.LPC0.FIGD._STA] (Node 0xc3f6d160), AE_NOT_FOUND</screen>
<para>Often, you can resolve these problems by updating your
<acronym>BIOS</acronym> to the latest revision. Most console
messages are harmless but if you have other problems like
battery status not working, they're a good place to start
looking for problems in the <acronym>AML</acronym>. The
bytecode, known as <acronym>AML</acronym>, is compiled from a
source language called <acronym>ASL</acronym>. The
<acronym>AML</acronym> is found in the table known as the
<acronym>DSDT</acronym>. To get a copy of your
<acronym>ASL</acronym>, use &man.acpidump.8;. You should use
both the <option>-t</option> (show contents of the fixed tables)
and <option>-d</option> (disassemble <acronym>AML</acronym> to
<acronym>ASL</acronym>) options. See the
<link linkend="ACPI-submitdebug">Submitting Debugging
Information</link> section for an example syntax.</para>
<para>The simplest first check you can do is to recompile your
<acronym>ASL</acronym> to check for errors. Warnings can
usually be ignored but errors are bugs that will usually prevent
<acronym>ACPI</acronym> from working correctly. To recompile
your <acronym>ASL</acronym>, issue the following command:</para>
<screen>&prompt.root; <userinput>iasl your.asl</userinput></screen>
</sect2>
<sect2 id="ACPI-fixasl">
<title>Fixing Your <acronym>ASL</acronym></title>
<para>In the long run, our goal is for almost everyone to have
<acronym>ACPI</acronym> work without any user intervention. At
this point, however, we are still developing workarounds for
common mistakes made by the <acronym>BIOS</acronym> vendors.
The Microsoft interpreter (<filename>acpi.sys</filename> and
<filename>acpiec.sys</filename>) does not strictly check for
adherence to the standard, and thus many <acronym>BIOS</acronym>
vendors who only test <acronym>ACPI</acronym> under Windows
never fix their <acronym>ASL</acronym>. We hope to continue to
identify and document exactly what non-standard behavior is
allowed by Microsoft's interpreter and replicate it so &os; can
work without forcing users to fix the <acronym>ASL</acronym>.
As a workaround and to help us identify behavior, you can fix
the <acronym>ASL</acronym> manually. If this works for you,
please send a &man.diff.1; of the old and new
<acronym>ASL</acronym> so we can possibly work around the buggy
behavior in <acronym>ACPI-CA</acronym> and thus make your fix
unnecessary.</para>
<para>Here is a list of common error messages, their cause, and
how to fix them:</para>
<sect3>
<title>_OS dependencies</title>
<para>Some <acronym>AML</acronym> assumes the world consists of
various Windows versions. You can tell &os; to claim it is
any <acronym>OS</acronym> to see if this fixes problems you
may have. An easy way to override this is to set
<option>hw.acpi.osname</option>=<quote>Windows 2001</quote>
in <filename>/boot/loader.conf</filename> or other similar
strings you find in the <acronym>ASL</acronym>. You could
also manually change references to <literal>_OS</literal> or
<literal>_OS_</literal> (they are the same thing) to check for
<quote>&os;</quote>. For example:</para>
<programlisting>If (MCTH (\_OS, "Microsoft Windows NT"))
{
    Return (PIC1)
}
Else
{
    Return (PIC0)
}</programlisting>
<para>Modified to look for &os;:</para>
<programlisting>If (MCTH (\_OS, "FreeBSD"))
{
    Return (PIC1)
}
Else
{
    Return (PIC0)
}</programlisting>
</sect3>
<sect3>
<title>Missing Return statements</title>
<para>Explanation of these things.</para>
</sect3>
<sect3>
<title>Overriding the Default <acronym>AML</acronym></title>
<para>After you customize <filename>your.asl</filename>, it
will need to be rebuilt. Run:</para>
<screen>&prompt.root; <userinput>iasl your.asl</userinput></screen>
<para>You can add the <option>-f</option> flag to force creation
of the <acronym>AML</acronym>, even if there are errors during
compilation. Remember that some errors (e.g., missing Return
statements) are automatically worked around by the
interpreter.</para>
<para><filename>DSDT.aml</filename> is the default output
filename for <command>iasl</command>. You can load this
instead of your <acronym>BIOS</acronym>'s buggy copy (which
is still present in flash memory) by editing
<filename>/boot/loader.conf</filename> as follows:</para>
<programlisting>acpi_dsdt_load="YES"
acpi_dsdt_name="/boot/DSDT.aml"</programlisting>
<para>Be sure to copy your <filename>DSDT.aml</filename> to the
<filename>/boot</filename> directory.</para>
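<para>Put together, and assuming the edited source is named
<filename>your.asl</filename> as above, the rebuild and install
steps might look like the following sketch. The
<option>-f</option> flag is only needed if the compile reports
errors you have decided to accept:</para>
<screen>&prompt.root; <userinput>iasl -f your.asl</userinput>
&prompt.root; <userinput>cp DSDT.aml /boot/DSDT.aml</userinput></screen>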
</sect3>
</sect2>
<sect2 id="ACPI-debugoutput">
<title>Getting Debugging Output From
<acronym>ACPI</acronym></title>
<para>The <acronym>ACPI</acronym> driver has a very flexible
debugging facility. It allows you to specify a set of subsystems
as well as the level of verbosity. The subsystems you wish to
debug are specified as <quote>layers</quote> and are broken down
into <acronym>ACPI-CA</acronym> components (ACPI_ALL_COMPONENTS)
and <acronym>ACPI</acronym> hardware support (ACPI_ALL_DRIVERS).
The verbosity of debugging output is specified as the
<quote>level</quote> and ranges from ACPI_LV_ERROR (just report
errors) to ACPI_LV_VERBOSE (everything). In practice, you will
want to use a serial console to log the output if it is so long
it flushes the console message buffer. A full list of the
individual layers and levels is found in the &man.acpi.4; manual
page.</para>
<para>Debugging output is not enabled by default. To enable it,
add <option>options ACPI_DEBUG</option> to your kernel config
if <acronym>ACPI</acronym> is compiled into the kernel. You can
add <option>ACPI_DEBUG=1</option> to your
<filename>/etc/make.conf</filename> to enable it globally. If
it is a module, you can recompile just your
<filename>acpi.ko</filename> module as follows:</para>
<screen>&prompt.root; <userinput>cd /sys/modules/acpi/acpi &amp;&amp; make clean &amp;&amp; make ACPI_DEBUG=1</userinput></screen>
<para>Install <filename>acpi.ko</filename> in
<filename>/boot/kernel</filename> and add your desired level and
layer to <filename>loader.conf</filename>. This example enables
debug messages for all <acronym>ACPI-CA</acronym> components and
all <acronym>ACPI</acronym> hardware drivers
(<acronym>CPU</acronym>, <acronym>LID</acronym>, etc.). It will
only output error messages, the least verbose level.</para>
<programlisting>debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS"
debug.acpi.level="ACPI_LV_ERROR"</programlisting>
<para>If the information you want is triggered by a specific event
(say, a suspend and then resume), you can leave out changes to
<filename>loader.conf</filename> and instead use
<command>sysctl</command> to specify the layer and level after
booting and preparing your system for the specific event. The
<command>sysctl</command>s are named the same as the tunables
in <filename>loader.conf</filename>.</para>
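<para>For example, to turn on full output just before exercising
a suspend and resume, the same names can be set from a running
system. This is a sketch only; the verbose level is chosen for
illustration and can produce a great deal of output:</para>
<screen>&prompt.root; <userinput>sysctl debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS"</userinput>
&prompt.root; <userinput>sysctl debug.acpi.level="ACPI_LV_VERBOSE"</userinput></screen>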
</sect2>
<sect2 id="ACPI-References">
<title>References</title>
<para>More information about <acronym>ACPI</acronym> may be found
in the following locations:</para>
<itemizedlist>
<listitem>
<para>The <acronym>ACPI</acronym> Mailing List
<ulink url="mailto:acpi-jp@jp.freebsd.org">
acpi-jp@jp.freebsd.org</ulink></para>
</listitem>
<listitem>
<para>The <acronym>ACPI</acronym> Mailing List Archives
<ulink url="http://home.jp.freebsd.org/mail-list/acpi-jp/">
http://home.jp.freebsd.org/mail-list/acpi-jp/</ulink></para>
</listitem>
<listitem>
<para>The <acronym>ACPI</acronym> 2.0 Specification
<ulink url="http://acpi.info/spec.htm/">
http://acpi.info/spec.htm/</ulink></para>
</listitem>
<listitem>
<para>&os; Manual pages: &man.acpi.4;,
&man.acpi.thermal.4;, &man.acpidump.8;, &man.iasl.8;,
&man.acpidb.8;</para>
</listitem>
<listitem>
<para><ulink
url="http://www.cpqlinux.com/acpi-howto.html#fix_broken_dsdt">
<acronym>DSDT</acronym> debugging resource</ulink>.
(Uses Compaq as an example but useful.)</para>
</listitem>
</itemizedlist>
</sect2>
</sect1>
</chapter>