<?xml version="1.0" encoding="ISO8859-1" standalone="no"?> <!-- The FreeBSD Documentation Project $FreeBSD$ --> <chapter id="config-tuning"> <chapterinfo> <authorgroup> <author> <firstname>Chern</firstname> <surname>Lee</surname> <contrib>Written by </contrib> </author> </authorgroup> <authorgroup> <author> <firstname>Mike</firstname> <surname>Smith</surname> <contrib>Based on a tutorial written by </contrib> </author> </authorgroup> <authorgroup> <author> <firstname>Matt</firstname> <surname>Dillon</surname> <contrib>Also based on tuning(7) written by </contrib> </author> </authorgroup> </chapterinfo> <title>Configuration and Tuning</title> <sect1 id="config-synopsis"> <title>Synopsis</title> <indexterm><primary>system configuration</primary></indexterm> <indexterm><primary>system optimization</primary></indexterm> <para>One of the important aspects of &os; is system configuration. Correct system configuration will help prevent headaches during future upgrades. This chapter will explain much of the &os; configuration process, including some of the parameters which can be set to tune a &os; system.</para> <para>After reading this chapter, you will know:</para> <itemizedlist> <listitem> <para>How to efficiently work with file systems and swap partitions.</para> </listitem> <listitem> <para>The basics of <filename>rc.conf</filename> configuration and <filename class="directory">/usr/local/etc/rc.d</filename> startup systems.</para> </listitem> <listitem> <para>How to configure and test a network card.</para> </listitem> <listitem> <para>How to configure virtual hosts on your network devices.</para> </listitem> <listitem> <para>How to use the various configuration files in <filename class="directory">/etc</filename>.</para> </listitem> <listitem> <para>How to tune &os; using <command>sysctl</command> variables.</para> </listitem> <listitem> <para>How to tune disk performance and modify kernel limitations.</para> </listitem> </itemizedlist> <para>Before reading this 
chapter, you should:</para> <itemizedlist> <listitem> <para>Understand &unix; and &os; basics (<xref linkend="basics"/>).</para> </listitem> <listitem> <para>Be familiar with the basics of kernel configuration/compilation (<xref linkend="kernelconfig"/>).</para> </listitem> </itemizedlist> </sect1> <sect1 id="configtuning-initial"> <title>Initial Configuration</title> <sect2> <title>Partition Layout</title> <indexterm><primary>partition layout</primary></indexterm> <indexterm> <primary><filename class="directory">/etc</filename></primary> </indexterm> <indexterm> <primary><filename class="directory">/var</filename></primary> </indexterm> <indexterm> <primary><filename class="directory">/usr</filename></primary> </indexterm> <sect3> <title>Base Partitions</title> <para>When laying out file systems with &man.bsdlabel.8; or &man.sysinstall.8;, remember that hard drives transfer data faster from the outer tracks than from the inner ones. Thus smaller and more heavily accessed file systems should be closer to the outside of the drive, while larger partitions like <filename class="directory">/usr</filename> should be placed toward the inner parts of the disk. It is a good idea to create partitions in an order similar to: root, swap, <filename class="directory">/var</filename>, <filename class="directory">/usr</filename>.</para> <para>The size of the <filename class="directory">/var</filename> partition reflects the intended machine usage. The <filename class="directory">/var</filename> file system is used to hold mailboxes, log files, and printer spools. Mailboxes and log files can grow to unexpected sizes depending on how many users exist and how long log files are kept. Most users will rarely need more than about a gigabyte of free disk space in <filename class="directory">/var</filename>.</para> <note> <para>Occasionally, a lot of disk space is required in <filename class="directory">/var/tmp</filename>. 
When new software is installed with &man.pkg.add.1;, the packaging tools extract a temporary copy of the packages under <filename class="directory">/var/tmp</filename>. Large software packages, like <application>Firefox</application>, <application>OpenOffice</application> or <application>LibreOffice</application>, may be tricky to install if there is not enough disk space under <filename class="directory">/var/tmp</filename>.</para> </note> <para>The <filename class="directory">/usr</filename> partition holds many of the files required to support the system, including the &man.ports.7; collection (recommended) and the source code (optional). Both the ports and the sources of the base system are optional at install time, but we recommend at least 2 gigabytes for this partition.</para> <para>When selecting partition sizes, keep the space requirements in mind. Running out of space in one partition while barely using another can be a hassle.</para> <note> <para>Some users have found that &man.sysinstall.8;'s <literal>Auto-defaults</literal> partition sizer will sometimes select smaller than adequate <filename class="directory">/var</filename> and <filename class="directory">/</filename> partitions. Partition wisely and generously.</para> </note> </sect3> <sect3 id="swap-design"> <title>Swap Partition</title> <indexterm><primary>swap sizing</primary></indexterm> <indexterm><primary>swap partition</primary></indexterm> <para>As a rule of thumb, the swap partition should be about double the size of system memory (RAM). For example, if the machine has 128 megabytes of memory, the swap partition should be 256 megabytes. Systems with less memory may perform better with more swap. Less than 256 megabytes of swap is not recommended and memory expansion should be considered. The kernel's VM paging algorithms are tuned to perform best when the swap partition is at least two times the size of main memory. 
Configuring too little swap can lead to inefficiencies in the VM page scanning code and might create issues later if more memory is added.</para> <para>On larger systems with multiple SCSI disks (or multiple IDE disks operating on different controllers), it is recommended that swap be configured on each drive (up to four drives). The swap partitions should be approximately the same size. The kernel can handle arbitrary sizes but internal data structures scale to 4 times the largest swap partition. Keeping the swap partitions near the same size will allow the kernel to optimally stripe swap space across disks. Large swap sizes are fine, even if swap is not used much. It might be easier to recover from a runaway program before being forced to reboot.</para> </sect3> <sect3> <title>Why Partition?</title> <para>Some users think a single large partition will be fine, but there are several reasons why this is a bad idea. First, each partition has different operational characteristics and separating them allows the file system to tune accordingly. For example, the root and <filename class="directory">/usr</filename> partitions are read-mostly, without much writing, while a lot of reading and writing could occur in <filename class="directory">/var</filename> and <filename class="directory">/var/tmp</filename>.</para> <para>By properly partitioning a system, fragmentation introduced in the smaller write-heavy partitions will not bleed over into the mostly-read partitions. Keeping the write-loaded partitions closer to the disk's edge will increase I/O performance in the partitions where it occurs the most. While I/O performance in the larger partitions may also be needed, shifting them more toward the edge of the disk will not lead to a significant performance improvement over moving <filename class="directory">/var</filename> to the edge. Finally, there are safety concerns. 
A smaller, neater root partition which is mostly read-only has a greater chance of surviving a bad crash.</para> </sect3> </sect2> </sect1> <sect1 id="configtuning-core-configuration"> <title>Core Configuration</title> <indexterm> <primary>rc files</primary> <secondary><filename>rc.conf</filename></secondary> </indexterm> <para>The principal location for system configuration information is within <filename>/etc/rc.conf</filename>. This file contains a wide range of configuration information, principally used at system startup to configure the system. Its name directly implies this; it is configuration information for the <filename>rc*</filename> files.</para> <para>An administrator should make entries in the <filename>rc.conf</filename> file to override the default settings from <filename>/etc/defaults/rc.conf</filename>. The defaults file should not be copied verbatim to <filename class="directory">/etc</filename> - it contains default values, not examples. All system-specific changes should be made in the <filename>rc.conf</filename> file itself.</para> <para>A number of strategies may be applied in clustered applications to separate site-wide configuration from system-specific configuration in order to keep administration overhead down. The recommended approach is to place system-specific configuration into the <filename>/etc/rc.conf.local</filename> file. 
For example:</para> <itemizedlist> <listitem> <para><filename>/etc/rc.conf</filename>:</para> <programlisting>sshd_enable="YES"
keyrate="fast"
defaultrouter="10.1.1.254"</programlisting> </listitem> <listitem> <para><filename>/etc/rc.conf.local</filename>:</para> <programlisting>hostname="node1.example.org"
ifconfig_fxp0="inet 10.1.1.1/8"</programlisting> </listitem> </itemizedlist> <para>The <filename>rc.conf</filename> file can then be distributed to every system using <command>rsync</command> or a similar program, while the <filename>rc.conf.local</filename> file remains unique.</para> <para>Upgrading the system using &man.sysinstall.8; or <command>make world</command> will not overwrite the <filename>rc.conf</filename> file, so system configuration information will not be lost.</para> <tip> <para>The <filename>/etc/rc.conf</filename> configuration file is parsed by &man.sh.1;. This allows system operators to add a certain amount of logic to this file, which may help to create very complex configuration scenarios. Please see &man.rc.conf.5; for further information on this topic.</para> </tip> </sect1> <sect1 id="configtuning-appconfig"> <title>Application Configuration</title> <para>Typically, installed applications have their own configuration files, with their own syntax, etc. It is important that these files be kept separate from the base system, so that they may be easily located and managed by the package management tools.</para> <indexterm><primary>/usr/local/etc</primary></indexterm> <para>Typically, these files are installed in <filename class="directory">/usr/local/etc</filename>. In the case where an application has a large number of configuration files, a subdirectory will be created to hold them.</para> <para>Normally, when a port or package is installed, sample configuration files are also installed. These are usually identified with a <filename>.default</filename> suffix. 
If there are no existing configuration files for the application, they will be created by copying the <filename>.default</filename> files.</para> <para>For example, consider the contents of the directory <filename class="directory">/usr/local/etc/apache</filename>:</para> <literallayout class="monospaced">-rw-r--r--   1 root  wheel   2184 May 20  1998 access.conf
-rw-r--r--   1 root  wheel   2184 May 20  1998 access.conf.default
-rw-r--r--   1 root  wheel   9555 May 20  1998 httpd.conf
-rw-r--r--   1 root  wheel   9555 May 20  1998 httpd.conf.default
-rw-r--r--   1 root  wheel  12205 May 20  1998 magic
-rw-r--r--   1 root  wheel  12205 May 20  1998 magic.default
-rw-r--r--   1 root  wheel   2700 May 20  1998 mime.types
-rw-r--r--   1 root  wheel   2700 May 20  1998 mime.types.default
-rw-r--r--   1 root  wheel   7980 May 20  1998 srm.conf
-rw-r--r--   1 root  wheel   7933 May 20  1998 srm.conf.default</literallayout> <para>The file sizes show that only the <filename>srm.conf</filename> file has been changed. A later update of the <application>Apache</application> port would not overwrite this changed file.</para> </sect1> <sect1 id="configtuning-starting-services"> <sect1info> <authorgroup> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> <contrib>Contributed by </contrib> </author> </authorgroup> </sect1info> <title>Starting Services</title> <indexterm><primary>services</primary></indexterm> <para>Many users choose to install third party software on &os; from the Ports Collection. In many of these situations it may be necessary to configure the software in a manner which will allow it to be started upon system initialization. Services such as <filename role="package">mail/postfix</filename> or <filename role="package">www/apache22</filename> are just two of the many software packages which may be started during system initialization. 
This section explains the procedures available for starting third party software.</para> <para>In &os;, most included services, such as &man.cron.8;, are started through the system startup scripts. These scripts may differ depending on &os; or vendor version; however, the most important aspect to consider is that their startup configuration can be handled through simple startup scripts.</para> <sect2> <title>Extended Application Configuration</title> <para>Now that &os; includes <filename>rc.d</filename>, configuration of application startup has become easier and more featureful. Using the key words discussed in the <link linkend="configtuning-rcd">rc.d</link> section, applications can be set to start after certain other services, for example <acronym>DNS</acronym>, and extra flags can be passed through <filename>rc.conf</filename> instead of being hard coded in the startup script. A basic script may look similar to the following:</para> <programlisting>#!/bin/sh
#
# PROVIDE: utility
# REQUIRE: DAEMON
# KEYWORD: shutdown

. /etc/rc.subr

name=utility
rcvar=utility_enable

command="/usr/local/sbin/utility"

load_rc_config $name

#
# DO NOT CHANGE THESE DEFAULT VALUES HERE
# SET THEM IN THE /etc/rc.conf FILE
#
utility_enable=${utility_enable-"NO"}
pidfile=${utility_pidfile-"/var/run/utility.pid"}

run_rc_command "$1"</programlisting> <para>This script will ensure that the provided <application>utility</application> will be started after the <literal>DAEMON</literal> pseudo-service. 
It also provides a method for setting and tracking the <acronym>PID</acronym>, or process <acronym>ID</acronym> file.</para> <para>This application could then have the following line placed in <filename>/etc/rc.conf</filename>:</para> <programlisting>utility_enable="YES"</programlisting> <para>This method also allows for easier manipulation of the command line arguments, inclusion of the default functions provided in <filename>/etc/rc.subr</filename>, compatibility with the &man.rcorder.8; utility and provides for easier configuration via the <filename>rc.conf</filename> file.</para> </sect2> <sect2> <title>Using Services to Start Services</title> <para>Other services, such as <acronym>POP</acronym>3 server daemons, <acronym>IMAP</acronym>, etc. could be started using &man.inetd.8;. This involves installing the service utility from the Ports Collection with a configuration line added to the <filename>/etc/inetd.conf</filename> file, or by uncommenting one of the current configuration lines. Working with <application>inetd</application> and its configuration is described in depth in the <link linkend="network-inetd">inetd</link> section.</para> <para>In some cases it may make more sense to use the &man.cron.8; daemon to start system services. This approach has a number of advantages because <command>cron</command> runs these processes as the <filename>crontab</filename>'s file owner. This allows regular users to start and maintain some applications.</para> <para>The <command>cron</command> utility provides a unique feature, <literal>@reboot</literal>, which may be used in place of the time specification. 
This will cause the job to be run when &man.cron.8; is started, normally during system initialization.</para> </sect2> </sect1> <sect1 id="configtuning-cron"> <sect1info> <authorgroup> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> <contrib>Contributed by </contrib> <!-- 20 May 2003 --> </author> </authorgroup> </sect1info> <title>Configuring the <command>cron</command> Utility</title> <indexterm><primary>cron</primary> <secondary>configuration</secondary></indexterm> <para>One of the most useful utilities in &os; is &man.cron.8;. The <command>cron</command> utility runs in the background and constantly checks the <filename>/etc/crontab</filename> file. The <command>cron</command> utility also checks the <filename class="directory">/var/cron/tabs</filename> directory, in search of new <filename>crontab</filename> files. These <filename>crontab</filename> files store information about specific functions which <command>cron</command> is supposed to perform at certain times.</para> <para>The <command>cron</command> utility uses two different types of configuration files, the system crontab and user crontabs. These formats only differ in the sixth field and later. In the system crontab, <command>cron</command> will run the command as the user specified in the sixth field. In a user crontab, all commands run as the user who created the crontab, so the sixth field is the last field; this is an important security feature. The final field is always the command to run.</para> <note> <para>User crontabs allow individual users to schedule tasks without the need for <username>root</username> privileges. Commands in a user's crontab run with the permissions of the user who owns the crontab.</para> <para>The <username>root</username> user can have a user crontab just like any other user. The <username>root</username> user crontab is separate from <filename>/etc/crontab</filename> (the system crontab). 
Because the system crontab effectively invokes the specified commands as root, there is usually no need to create a user crontab for <username>root</username>.</para> </note> <para>Let us take a look at the <filename>/etc/crontab</filename> file (the system crontab):</para> <programlisting># /etc/crontab - root's crontab for &os;
#
# $&os;: src/etc/crontab,v 1.32 2002/11/22 16:13:39 tom Exp $
# <co id="co-comments"/>
#
SHELL=/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin <co id="co-env"/>
HOME=/var/log
#
#
#minute hour    mday    month   wday    who     command <co id="co-field-descr"/>
#
#
*/5     *       *       *       *       root    /usr/libexec/atrun <co id="co-main"/></programlisting> <calloutlist> <callout arearefs="co-comments"> <para>Like most &os; configuration files, the <literal>#</literal> character represents a comment. A comment can be placed in the file as a reminder of what and why a desired action is performed. Comments cannot be on the same line as a command or else they will be interpreted as part of the command; they must be on a new line. Blank lines are ignored.</para> </callout> <callout arearefs="co-env"> <para>First, the environment must be defined. The equals (<literal>=</literal>) character is used to define any environment settings, as with this example where it is used for the <envar>SHELL</envar>, <envar>PATH</envar>, and <envar>HOME</envar> options. If the shell line is omitted, <command>cron</command> will use the default, which is <command>sh</command>. If the <envar>PATH</envar> variable is omitted, no default will be used and file locations will need to be absolute. If <envar>HOME</envar> is omitted, <command>cron</command> will use the invoking user's home directory.</para> </callout> <callout arearefs="co-field-descr"> <para>This line defines a total of seven fields. 
Listed here are the values <literal>minute</literal>, <literal>hour</literal>, <literal>mday</literal>, <literal>month</literal>, <literal>wday</literal>, <literal>who</literal>, and <literal>command</literal>. These are almost all self explanatory. <literal>minute</literal> is the time in minutes the command will be run. <literal>hour</literal> is similar to the <literal>minute</literal> option, just in hours. <literal>mday</literal> stands for day of the month. <literal>month</literal> is similar to <literal>hour</literal> and <literal>minute</literal>, as it designates the month. The <literal>wday</literal> option stands for day of the week. All these fields must be numeric values, and follow the twenty-four hour clock. The <literal>who</literal> field is special, and only exists in the <filename>/etc/crontab</filename> file. This field specifies which user the command should be run as. The last field is the command to be executed.</para> </callout> <callout arearefs="co-main"> <para>This last line will define the values discussed above. Notice here we have a <literal>*/5</literal> listing, followed by several more <literal>*</literal> characters. These <literal>*</literal> characters mean <quote>first-last</quote>, and can be interpreted as <emphasis>every</emphasis> time. So, judging by this line, it is apparent that the <command>atrun</command> command is to be invoked by <username>root</username> every five minutes regardless of what day or month it is. For more information on the <command>atrun</command> command, see the &man.atrun.8; manual page.</para> <para>Commands can have any number of flags passed to them; however, commands which extend to multiple lines need to be broken with the backslash <quote>\</quote> continuation character.</para> </callout> </calloutlist> <para>This is the basic setup for every <filename>crontab</filename> file, although there is one thing different about this one. 
Field number six, where we specified the username, only exists in the system <filename>/etc/crontab</filename> file. This field should be omitted for individual user <filename>crontab</filename> files.</para> <sect2 id="configtuning-installcrontab"> <title>Installing a Crontab</title> <important> <para>Do not use the procedure described here to edit and install the system crontab, <filename>/etc/crontab</filename>. Just use your favorite editor: the <command>cron</command> utility will notice that the file has changed and immediately begin using the updated version. See <ulink url="&url.books.faq;/admin.html#ROOT-NOT-FOUND-CRON-ERRORS"> this FAQ entry</ulink> for more information.</para> </important> <para>To install a freshly written user <filename>crontab</filename>, first use your favorite editor to create a file in the proper format, and then use the <command>crontab</command> utility. The most common usage is:</para> <screen>&prompt.user; <userinput>crontab crontab-file</userinput></screen> <para>In this example, <filename>crontab-file</filename> is the filename of a <filename>crontab</filename> that was previously created.</para> <para>There is also an option to list installed <filename>crontab</filename> files: just pass the <option>-l</option> option to <command>crontab</command> and look over the output.</para> <para>For users who wish to begin their own crontab file from scratch, without the use of a template, the <command>crontab -e</command> option is available. This will invoke the selected editor with an empty file. 
When the file is saved, it will be automatically installed by the <command>crontab</command> command.</para> <para>In order to remove a user <filename>crontab</filename> completely, use <command>crontab</command> with the <option>-r</option> option.</para> </sect2> </sect1> <sect1 id="configtuning-rcd"> <sect1info> <authorgroup> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> <contrib>Contributed by </contrib> <!-- 16 May 2003 --> </author> </authorgroup> </sect1info> <title>Using &man.rc.8; Under &os;</title> <para>In 2002 &os; integrated the NetBSD <filename>rc.d</filename> system for system initialization. Users should notice the files listed in the <filename class="directory">/etc/rc.d</filename> directory. Many of these files are for basic services which can be controlled with the <option>start</option>, <option>stop</option>, and <option>restart</option> options. For instance, &man.sshd.8; can be restarted with the following command:</para> <screen>&prompt.root; <userinput>/etc/rc.d/sshd restart</userinput></screen> <para>This procedure is similar for other services. Of course, services are usually started automatically at boot time as specified in &man.rc.conf.5;. For example, enabling the Network Address Translation daemon at startup is as simple as adding the following line to <filename>/etc/rc.conf</filename>:</para> <programlisting>natd_enable="YES"</programlisting> <para>If a <option>natd_enable="NO"</option> line is already present, then simply change the <option>NO</option> to <option>YES</option>. The rc scripts will automatically load any other dependent services during the next reboot, as described below.</para> <para>Since the <filename>rc.d</filename> system is primarily intended to start/stop services at system startup/shutdown time, the standard <option>start</option>, <option>stop</option> and <option>restart</option> options will only perform their action if the appropriate <filename>/etc/rc.conf</filename> variables are set. 
For instance, the above <command>sshd restart</command> command will only work if <varname>sshd_enable</varname> is set to <option>YES</option> in <filename>/etc/rc.conf</filename>. To <option>start</option>, <option>stop</option> or <option>restart</option> a service regardless of the settings in <filename>/etc/rc.conf</filename>, the commands should be prefixed with <quote>one</quote>. For instance, to restart <command>sshd</command> regardless of the current <filename>/etc/rc.conf</filename> setting, execute the following command:</para> <screen>&prompt.root; <userinput>/etc/rc.d/sshd onerestart</userinput></screen> <para>It is easy to check if a service is enabled in <filename>/etc/rc.conf</filename> by running the appropriate <filename>rc.d</filename> script with the option <option>rcvar</option>. Thus, an administrator can check that <command>sshd</command> is in fact enabled in <filename>/etc/rc.conf</filename> by running:</para> <screen>&prompt.root; <userinput>/etc/rc.d/sshd rcvar</userinput>
# sshd
$sshd_enable=YES</screen> <note> <para>The second line (<literal># sshd</literal>) is the output from the <command>sshd</command> command, not a <username>root</username> console.</para> </note> <para>To determine if a service is running, a <option>status</option> option is available. For instance, to verify that <command>sshd</command> is actually started:</para> <screen>&prompt.root; <userinput>/etc/rc.d/sshd status</userinput>
sshd is running as pid 433.</screen> <para>In some cases it is also possible to <option>reload</option> a service. This will attempt to send a signal to an individual service, forcing the service to reload its configuration files. In most cases this means sending the service a <literal>SIGHUP</literal> signal. Support for this feature is not included for every service.</para> <para>The <filename>rc.d</filename> system is not only used for network services, it also contributes to most of the system initialization. 
For instance, consider the <filename>bgfsck</filename> file. When this script is executed, it will print out the following message:</para> <screen>Starting background file system checks in 60 seconds.</screen> <para>Therefore this file is used for background file system checks, which are done only during system initialization.</para> <para>Many system services depend on other services to function properly. For example, NIS and other RPC-based services may fail to start until after the <command>rpcbind</command> (portmapper) service has started. To resolve this issue, information about dependencies and other meta-data is included in the comments at the top of each startup script. The &man.rcorder.8; program is then used to parse these comments during system initialization to determine the order in which system services should be invoked to satisfy the dependencies.</para> <para>The following words must be included in all startup scripts (they are required by &man.rc.subr.8; to <quote>enable</quote> the startup script):</para> <itemizedlist> <listitem> <para><literal>PROVIDE</literal>: Specifies the services this file provides.</para> </listitem> </itemizedlist> <para>The following words may be included at the top of each startup file. They are not strictly necessary, but they are useful as hints to &man.rcorder.8;:</para> <itemizedlist> <listitem> <para><literal>REQUIRE</literal>: Lists services which are required for this service. This file will run <emphasis>after</emphasis> the specified services.</para> </listitem> <listitem> <para><literal>BEFORE</literal>: Lists services which depend on this service. 
This file will run <emphasis>before</emphasis> the specified services.</para> </listitem> </itemizedlist> <para>By carefully setting these keywords for each startup script, an administrator has a very fine-grained level of control of the startup order of the scripts, without the hassle of <quote>runlevels</quote> like some other &unix; operating systems.</para> <para>Additional information about the <filename>rc.d</filename> system can be found in the &man.rc.8; and &man.rc.subr.8; manual pages. If you are interested in writing your own <filename>rc.d</filename> scripts or improving the existing ones, you may also find <ulink url="&url.articles.rc-scripting;">this article</ulink> useful.</para> </sect1> <sect1 id="config-network-setup"> <sect1info> <authorgroup> <author> <firstname>Marc</firstname> <surname>Fonvieille</surname> <contrib>Contributed by </contrib> <!-- 6 October 2002 --> </author> </authorgroup> </sect1info> <title>Setting Up Network Interface Cards</title> <indexterm> <primary>network cards</primary> <secondary>configuration</secondary> </indexterm> <para>Nowadays we cannot think about a computer without thinking about a network connection. Adding and configuring a network card is a common task for any &os; administrator.</para> <sect2> <title>Locating the Correct Driver</title> <indexterm> <primary>network cards</primary> <secondary>driver</secondary> </indexterm> <para>Before you begin, you should know the model of the card you have, the chip it uses, and whether it is a PCI or ISA card. &os; supports a wide variety of both PCI and ISA cards. Check the Hardware Compatibility List for your release to see if your card is supported.</para> <para>Once you are sure your card is supported, you need to determine the proper driver for the card. 
<filename>/usr/src/sys/conf/NOTES</filename> and <filename>/usr/src/sys/<replaceable>arch</replaceable>/conf/NOTES</filename> will give you the list of network interface drivers with some information about the supported chipsets/cards. If you have doubts about which driver is the correct one, read the manual page of the driver. The manual page will give you more information about the supported hardware and even the possible problems that could occur.</para> <para>If you own a common card, most of the time you will not have to look very hard for a driver. Drivers for common network cards are present in the <filename>GENERIC</filename> kernel, so your card should show up during boot, like so:</para> <screen>dc0: &lt;82c169 PNIC 10/100BaseTX&gt; port 0xa000-0xa0ff mem 0xd3800000-0xd38000ff irq 15 at device 11.0 on pci0
miibus0: &lt;MII bus&gt; on dc0
bmtphy0: &lt;BCM5201 10/100baseTX PHY&gt; PHY 1 on miibus0
bmtphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc0: Ethernet address: 00:a0:cc:da:da:da
dc0: [ITHREAD]
dc1: &lt;82c169 PNIC 10/100BaseTX&gt; port 0x9800-0x98ff mem 0xd3000000-0xd30000ff irq 11 at device 12.0 on pci0
miibus1: &lt;MII bus&gt; on dc1
bmtphy1: &lt;BCM5201 10/100baseTX PHY&gt; PHY 1 on miibus1
bmtphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc1: Ethernet address: 00:a0:cc:da:da:db
dc1: [ITHREAD]</screen> <para>In this example, we see that two cards using the &man.dc.4; driver are present on the system.</para> <para>If the driver for your NIC is not present in <filename>GENERIC</filename>, you will need to load the proper driver to use your NIC. This may be accomplished in one of two ways:</para> <itemizedlist> <listitem> <para>The easiest way is to simply load a kernel module for your network card with &man.kldload.8;, or automatically at boot time by adding the appropriate line to the file <filename>/boot/loader.conf</filename>. 
Not all NIC drivers are available as modules; notable examples of devices for which modules do not exist are ISA cards.</para> </listitem> <listitem> <para>Alternatively, you may statically compile the support for your card into your kernel. Check <filename>/usr/src/sys/conf/NOTES</filename>, <filename>/usr/src/sys/<replaceable>arch</replaceable>/conf/NOTES</filename> and the manual page of the driver to learn what to add to your kernel configuration file. For more information about recompiling your kernel, please see <xref linkend="kernelconfig"/>. If your card was detected at boot by your kernel (<filename>GENERIC</filename>), you do not have to build a new kernel.</para> </listitem> </itemizedlist> <sect3 id="config-network-ndis"> <title>Using &windows; NDIS Drivers</title> <indexterm><primary>NDIS</primary></indexterm> <indexterm><primary>NDISulator</primary></indexterm> <indexterm><primary>&windows; drivers</primary></indexterm> <indexterm><primary>Microsoft Windows</primary></indexterm> <indexterm> <primary>Microsoft Windows</primary> <secondary>device drivers</secondary> </indexterm> <indexterm> <primary>KLD (kernel loadable object)</primary> </indexterm> <!-- We should probably omit the expanded name, and add a <see> entry for it. Whatever is done must also be done to the same indexterm in linuxemu/chapter.sgml --> <para>Unfortunately, there are still many vendors that do not provide schematics for their drivers to the open source community because they regard such information as trade secrets. Consequently, the developers of &os; and other operating systems are left with two choices: develop the drivers by a long and painstaking process of reverse engineering, or use the existing driver binaries available for the &microsoft.windows; platforms.
Most developers, including those involved with &os;, have taken the latter approach.</para> <para>Thanks to the contributions of Bill Paul (wpaul) there is <quote>native</quote> support for the Network Driver Interface Specification (NDIS). The &os; NDISulator (otherwise known as Project Evil) takes a &windows; driver binary and basically tricks it into thinking it is running on &windows;. Because the &man.ndis.4; driver is using a &windows; binary, it only runs on &i386; and amd64 systems. PCI, CardBus, PCMCIA (PC-Card), and USB devices are supported.</para> <para>To use the NDISulator, three things are needed:</para> <orderedlist> <listitem> <para>Kernel sources</para> </listitem> <listitem> <para>&windowsxp; driver binary (<filename>.SYS</filename> extension)</para> </listitem> <listitem> <para>&windowsxp; driver configuration file (<filename>.INF</filename> extension)</para> </listitem> </orderedlist> <para>Locate the files for your specific card. Generally, they can be found on the included CDs or at the vendor's website. In the following examples, we will use <filename>W32DRIVER.SYS</filename> and <filename>W32DRIVER.INF</filename>.</para> <para>The driver bit width must match the version of &os;. For &os;/i386, use a &windows; 32-bit driver. For &os;/amd64, a &windows; 64-bit driver is needed.</para> <para>The next step is to compile the driver binary into a loadable kernel module. As <username>root</username>, use &man.ndisgen.8;:</para> <screen>&prompt.root; <userinput>ndisgen <replaceable>/path/to/W32DRIVER.INF</replaceable> <replaceable>/path/to/W32DRIVER.SYS</replaceable></userinput></screen> <para>&man.ndisgen.8; is interactive and prompts for any extra information it requires. A new kernel module is written in the current directory. 
Use &man.kldload.8; to load the new module:</para> <screen>&prompt.root; <userinput>kldload <replaceable>./W32DRIVER_SYS.ko</replaceable></userinput></screen> <para>In addition to the generated kernel module, you must load the <filename>ndis.ko</filename> and <filename>if_ndis.ko</filename> modules. This should be done automatically when you load any module that depends on &man.ndis.4;. If you want to load them manually, use the following commands:</para> <screen>&prompt.root; <userinput>kldload ndis</userinput>
&prompt.root; <userinput>kldload if_ndis</userinput></screen> <para>The first command loads the NDIS miniport driver wrapper; the second loads the actual network interface.</para> <para>Now, check &man.dmesg.8; to see if there were any errors loading. If all went well, you should get output resembling the following:</para> <screen>ndis0: &lt;Wireless-G PCI Adapter&gt; mem 0xf4100000-0xf4101fff irq 3 at device 8.0 on pci1
ndis0: NDIS API version: 5.0
ndis0: Ethernet address: 0a:b1:2c:d3:4e:f5
ndis0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
ndis0: 11g rates: 6Mbps 9Mbps 12Mbps 18Mbps 36Mbps 48Mbps 54Mbps</screen> <para>From here you can treat the <devicename>ndis0</devicename> device like any other network interface (e.g., <devicename>dc0</devicename>).</para> <para>You can configure the system to load the NDIS modules at boot time in the same way as with any other module. First, copy the generated module, <filename>W32DRIVER_SYS.ko</filename>, to the <filename class="directory">/boot/modules</filename> directory. Then, add the following line to <filename>/boot/loader.conf</filename>:</para> <programlisting>W32DRIVER_SYS_load="YES"</programlisting> </sect3> </sect2> <sect2> <title>Configuring the Network Card</title> <indexterm> <primary>network cards</primary> <secondary>configuration</secondary> </indexterm> <para>Once the right driver is loaded for the network card, the card needs to be configured.
As with many other things, the network card may have been configured at installation time by <application>sysinstall</application>.</para> <para>To display the configuration for the network interfaces on your system, enter the following command:</para> <screen>&prompt.user; <userinput>ifconfig</userinput>
dc0: flags=8843&lt;UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST&gt; metric 0 mtu 1500
        options=80008&lt;VLAN_MTU,LINKSTATE&gt;
        ether 00:a0:cc:da:da:da
        inet 192.168.1.3 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (100baseTX &lt;full-duplex&gt;)
        status: active
dc1: flags=8802&lt;UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST&gt; metric 0 mtu 1500
        options=80008&lt;VLAN_MTU,LINKSTATE&gt;
        ether 00:a0:cc:da:da:db
        inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
        media: Ethernet 10baseT/UTP
        status: no carrier
plip0: flags=8810&lt;POINTOPOINT,SIMPLEX,MULTICAST&gt; metric 0 mtu 1500
lo0: flags=8049&lt;UP,LOOPBACK,RUNNING,MULTICAST&gt; metric 0 mtu 16384
        options=3&lt;RXCSUM,TXCSUM&gt;
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=3&lt;PERFORMNUD,ACCEPT_RTADV&gt;</screen> <para>In this example, the following devices were displayed:</para> <itemizedlist> <listitem> <para><devicename>dc0</devicename>: The first Ethernet interface</para> </listitem> <listitem> <para><devicename>dc1</devicename>: The second Ethernet interface</para> </listitem> <listitem> <para><devicename>plip0</devicename>: The parallel port interface (if a parallel port is present on the machine)</para> </listitem> <listitem> <para><devicename>lo0</devicename>: The loopback device</para> </listitem> </itemizedlist> <para>&os; names a network card using the driver name followed by the order in which the card is detected at kernel boot. For example, <devicename>sis2</devicename> would be the third network card on the system using the &man.sis.4; driver.</para> <para>In this example, the <devicename>dc0</devicename> device is up and running.
The key indicators are:</para> <orderedlist> <listitem> <para><literal>UP</literal> means that the card is configured and ready.</para> </listitem> <listitem> <para>The card has an Internet (<literal>inet</literal>) address (in this case <hostid role="ipaddr">192.168.1.3</hostid>).</para> </listitem> <listitem> <para>It has a valid subnet mask (<literal>netmask</literal>; <hostid role="netmask">0xffffff00</hostid> is the same as <hostid role="netmask">255.255.255.0</hostid>).</para> </listitem> <listitem> <para>It has a valid broadcast address (in this case, <hostid role="ipaddr">192.168.1.255</hostid>).</para> </listitem> <listitem> <para>The MAC address of the card (<literal>ether</literal>) is <hostid role="mac">00:a0:cc:da:da:da</hostid>.</para> </listitem> <listitem> <para>The physical media selection is in autoselection mode (<literal>media: Ethernet autoselect (100baseTX &lt;full-duplex&gt;)</literal>). We see that <devicename>dc1</devicename> was configured to run with <literal>10baseT/UTP</literal> media. For more information on available media types for a driver, please refer to its manual page.</para> </listitem> <listitem> <para>The status of the link (<literal>status</literal>) is <literal>active</literal>, i.e., the carrier is detected. For <devicename>dc1</devicename>, we see <literal>status: no carrier</literal>. This is normal when an Ethernet cable is not plugged into the card.</para> </listitem> </orderedlist> <para>If the &man.ifconfig.8; output had shown something similar to:</para> <screen>dc0: flags=8843&lt;BROADCAST,SIMPLEX,MULTICAST&gt; metric 0 mtu 1500
        options=80008&lt;VLAN_MTU,LINKSTATE&gt;
        ether 00:a0:cc:da:da:da
        media: Ethernet autoselect (100baseTX &lt;full-duplex&gt;)
        status: active</screen> <para>it would indicate the card has not been configured.</para> <para>To configure your card, you need <username>root</username> privileges.
The network card configuration can be done from the command line with &man.ifconfig.8;, but you would have to repeat it after each reboot of the system. To make the configuration persistent, add it to <filename>/etc/rc.conf</filename>.</para> <para>Open <filename>/etc/rc.conf</filename> in your favorite editor. You need to add a line for each network card present on the system. In our case, we added these lines:</para> <programlisting>ifconfig_dc0="inet 192.168.1.3 netmask 255.255.255.0"
ifconfig_dc1="inet 10.0.0.1 netmask 255.255.255.0 media 10baseT/UTP"</programlisting> <para>You have to replace <devicename>dc0</devicename>, <devicename>dc1</devicename>, and so on, with the correct device for your cards, and the addresses with the proper ones. You should read the card driver and &man.ifconfig.8; manual pages for more details about the allowed options, and the &man.rc.conf.5; manual page for more information on the syntax of <filename>/etc/rc.conf</filename>.</para> <para>If you configured the network during installation, some lines about the network card(s) may already be present. Double check <filename>/etc/rc.conf</filename> before adding any lines.</para> <para>You will also have to edit the file <filename>/etc/hosts</filename> to add the names and the IP addresses of various machines of the LAN, if they are not already there.
For more information please refer to &man.hosts.5; and to <filename>/usr/share/examples/etc/hosts</filename>.</para> <note> <para>If the machine will access the Internet, you must also manually set up the default gateway and the nameserver:</para> <screen>&prompt.root; <userinput>echo 'defaultrouter="<replaceable>your_default_router</replaceable>"' &gt;&gt; /etc/rc.conf</userinput>
&prompt.root; <userinput>echo 'nameserver <replaceable>your_DNS_server</replaceable>' &gt;&gt; /etc/resolv.conf</userinput></screen> </note> </sect2> <sect2> <title>Testing and Troubleshooting</title> <para>Once you have made the necessary changes in <filename>/etc/rc.conf</filename>, you should reboot your system. This will allow the change(s) to the interface(s) to be applied, and verify that the system restarts without any configuration errors. Alternatively, you can just relaunch the networking system:</para> <screen>&prompt.root; <userinput>/etc/rc.d/netif restart</userinput></screen> <note> <para>If a default gateway has been set in <filename>/etc/rc.conf</filename>, also use this command:</para> <screen>&prompt.root; <userinput>/etc/rc.d/routing restart</userinput></screen> </note> <para>Once the networking system has been relaunched, you should test the network interfaces.</para> <sect3> <title>Testing the Ethernet Card</title> <indexterm> <primary>network cards</primary> <secondary>testing</secondary> </indexterm> <para>To verify that an Ethernet card is configured correctly, you have to try two things.
First, ping the interface itself, and then ping another machine on the LAN.</para> <para>First test the local interface:</para> <screen>&prompt.user; <userinput>ping -c5 192.168.1.3</userinput>
PING 192.168.1.3 (192.168.1.3): 56 data bytes
64 bytes from 192.168.1.3: icmp_seq=0 ttl=64 time=0.082 ms
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.074 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from 192.168.1.3: icmp_seq=3 ttl=64 time=0.108 ms
64 bytes from 192.168.1.3: icmp_seq=4 ttl=64 time=0.076 ms

--- 192.168.1.3 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.074/0.083/0.108/0.013 ms</screen> <para>Now we have to ping another machine on the LAN:</para> <screen>&prompt.user; <userinput>ping -c5 192.168.1.2</userinput>
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=64 time=0.726 ms
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.766 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.700 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.747 ms
64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.704 ms

--- 192.168.1.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.700/0.729/0.766/0.025 ms</screen> <para>You could also use the machine name instead of <hostid role="ipaddr">192.168.1.2</hostid> if you have set up the <filename>/etc/hosts</filename> file.</para> </sect3> <sect3> <title>Troubleshooting</title> <indexterm> <primary>network cards</primary> <secondary>troubleshooting</secondary> </indexterm> <para>Troubleshooting hardware and software configurations is always a pain, and a pain which can be alleviated by checking the simple things first. Is your network cable plugged in? Have you properly configured the network services? Did you configure the firewall correctly? Is the card you are using supported by &os;?
Always check the hardware notes before sending off a bug report. Update your version of &os; to the latest STABLE version. Check the mailing list archives, or perhaps search the Internet.</para> <para>If the card works, yet performance is poor, it would be worthwhile to read over the &man.tuning.7; manual page. You can also check the network configuration as incorrect network settings can cause slow connections.</para> <para>Some users experience one or two <errorname>device timeout</errorname> messages, which is normal for some cards. If they continue, or are bothersome, you may wish to be sure the device is not conflicting with another device. Double check the cable connections. Perhaps you may just need to get another card.</para> <para>At times, users see a few <errorname>watchdog timeout</errorname> errors. The first thing to do here is to check your network cable. Many cards require a PCI slot which supports Bus Mastering. On some old motherboards, only one PCI slot allows it (usually slot 0). Check the network card and the motherboard documentation to determine if that may be the problem.</para> <para><errorname>No route to host</errorname> messages occur if the system is unable to route a packet to the destination host. This can happen if no default route is specified, or if a cable is unplugged. Check the output of <command>netstat -rn</command> and make sure there is a valid route to the host you are trying to reach. If there is not, read on to <xref linkend="advanced-networking"/>.</para> <para><errorname>ping: sendto: Permission denied</errorname> error messages are often caused by a misconfigured firewall. If <command>ipfw</command> is enabled in the kernel but no rules have been defined, then the default policy is to deny all traffic, even ping requests! Read on to <xref linkend="firewalls"/> for more information.</para> <para>Sometimes performance of the card is poor, or below average. 
In these cases it is best to change the media selection mode from <literal>autoselect</literal> to the correct media type. While this usually works for most hardware, it may not resolve this issue for everyone. Again, check all the network settings, and read over the &man.tuning.7; manual page.</para> </sect3> </sect2> </sect1> <sect1 id="configtuning-virtual-hosts"> <title>Virtual Hosts</title> <indexterm><primary>virtual hosts</primary></indexterm> <indexterm><primary>IP aliases</primary></indexterm> <para>A very common use of &os; is virtual site hosting, where one server appears to the network as many servers. This is achieved by assigning multiple network addresses to a single interface.</para> <para>A given network interface has one <quote>real</quote> address, and may have any number of <quote>alias</quote> addresses. These aliases are normally added by placing alias entries in <filename>/etc/rc.conf</filename>.</para> <para>An alias entry for the interface <devicename>fxp0</devicename> looks like:</para> <programlisting>ifconfig_fxp0_alias0="inet xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx"</programlisting> <para>Note that alias entries must start with <literal>alias0</literal> and proceed upwards in order (for example, <literal>_alias1</literal>, <literal>_alias2</literal>, and so on). The configuration process will stop at the first missing number.</para> <para>The calculation of alias netmasks is important, but fortunately quite simple. For a given interface, there must be one address which correctly represents the network's netmask.
Any other addresses which fall within this network must have a netmask of all <literal>1</literal>s (expressed as either <hostid role="netmask">255.255.255.255</hostid> or <hostid role="netmask">0xffffffff</hostid>).</para> <para>For example, consider the case where the <devicename>fxp0</devicename> interface is connected to two networks, the <hostid role="ipaddr">10.1.1.0</hostid> network with a netmask of <hostid role="netmask">255.255.255.0</hostid> and the <hostid role="ipaddr">202.0.75.16</hostid> network with a netmask of <hostid role="netmask">255.255.255.240</hostid>. We want the system to appear at <hostid role="ipaddr">10.1.1.1</hostid> through <hostid role="ipaddr">10.1.1.5</hostid> and at <hostid role="ipaddr">202.0.75.17</hostid> through <hostid role="ipaddr">202.0.75.20</hostid>. As noted above, only the first address in a given network range (in this case, <hostid role="ipaddr">10.1.1.1</hostid> and <hostid role="ipaddr">202.0.75.17</hostid>) should have a real netmask; all the rest (<hostid role="ipaddr">10.1.1.2</hostid> through <hostid role="ipaddr">10.1.1.5</hostid> and <hostid role="ipaddr">202.0.75.18</hostid> through <hostid role="ipaddr">202.0.75.20</hostid>) must be configured with a netmask of <hostid role="netmask">255.255.255.255</hostid>.</para> <para>The following <filename>/etc/rc.conf</filename> entries configure the adapter correctly for this arrangement:</para> <programlisting>ifconfig_fxp0="inet 10.1.1.1 netmask 255.255.255.0"
ifconfig_fxp0_alias0="inet 10.1.1.2 netmask 255.255.255.255"
ifconfig_fxp0_alias1="inet 10.1.1.3 netmask 255.255.255.255"
ifconfig_fxp0_alias2="inet 10.1.1.4 netmask 255.255.255.255"
ifconfig_fxp0_alias3="inet 10.1.1.5 netmask 255.255.255.255"
ifconfig_fxp0_alias4="inet 202.0.75.17 netmask 255.255.255.240"
ifconfig_fxp0_alias5="inet 202.0.75.18 netmask 255.255.255.255"
ifconfig_fxp0_alias6="inet 202.0.75.19 netmask 255.255.255.255"
ifconfig_fxp0_alias7="inet 202.0.75.20 netmask 255.255.255.255"</programlisting> </sect1> <sect1 id="configtuning-syslog"> <sect1info> <authorgroup> <author> <firstname>Niclas</firstname> <surname>Zeising</surname> <contrib>Contributed by </contrib> </author> </authorgroup> </sect1info> <title>Configuring the system logger <application>syslogd</application></title> <indexterm><primary>system logging</primary></indexterm> <indexterm><primary>syslog</primary></indexterm> <indexterm><primary>syslogd</primary></indexterm> <para>System logging is an important aspect of system administration. It is used to detect hardware and software issues and errors in the system. It also plays a very important role in security auditing and incident response. System daemons without a controlling terminal also usually log information to a system logging facility or other log file.</para> <para>This section will describe how to configure and use the &os; system logger, &man.syslogd.8;, as well as discuss log rotation and log management using &man.newsyslog.8;. Focus will be on setting up and using <command>syslogd</command> on a local machine. For more advanced setups using a separate loghost, see <xref linkend="network-syslogd"/>.</para> <sect2> <title>Using <application>syslogd</application></title> <para>In the default &os; configuration &man.syslogd.8; is started at boot. This is controlled by the variable <literal>syslogd_enable</literal> in <filename>/etc/rc.conf</filename>. There are numerous application arguments that affect the behavior of &man.syslogd.8;. To change them, use <literal>syslogd_flags</literal> in <filename>/etc/rc.conf</filename>.
Refer to &man.syslogd.8; for more information on the arguments, and &man.rc.conf.5;, <xref linkend="configtuning-core-configuration"/> and <xref linkend="configtuning-rcd"/> for more information about <filename>/etc/rc.conf</filename> and the &man.rc.8; subsystem.</para> </sect2> <sect2> <title>Configuring <application>syslogd</application></title> <indexterm><primary>syslog.conf</primary></indexterm> <para>The configuration file, by default <filename>/etc/syslog.conf</filename>, controls what &man.syslogd.8; does with the log entries once they are received. There are several parameters to control the handling of incoming events, of which the most basic are <firstterm>facility</firstterm> and <firstterm>level</firstterm>. The facility describes which subsystem generated the message, such as the kernel or a daemon, and the level describes the severity of the event that occurred. This makes it possible to log the message to different log files, or discard it, depending on the facility and level. It is also possible to take action depending on the application that sent the message, and in the case of remote logging, also the hostname of the machine generating the logging event.</para> <para>Configuring &man.syslogd.8; is quite straightforward. The configuration file contains one line per action, and the syntax for each line is a selector field followed by an action field. The syntax of the selector field is <replaceable>facility.level</replaceable>, which will match log messages from <replaceable>facility</replaceable> at level <replaceable>level</replaceable> or higher. It is also possible to add an optional comparison flag before the level to specify more precisely what is logged. Multiple selector fields can be used for the same action, and are separated with a semicolon (<literal>;</literal>). Using <literal>*</literal> will match everything. The action field denotes where to send the log message, such as a file or a remote log host.
As an example, here is the default <filename>syslog.conf</filename> from &os;:</para> <programlisting># $&os;$
#
#       Spaces ARE valid field separators in this file. However,
#       other *nix-like systems still insist on using tabs as field
#       separators. If you are sharing this file between systems, you
#       may want to use only tabs as field separators here.
#       Consult the &man.syslog.conf.5; manpage.
*.err;kern.warning;auth.notice;mail.crit                /dev/console <co id="co-syslog-many-match"/>
*.notice;authpriv.none;kern.debug;lpr.info;mail.crit;news.err   /var/log/messages
security.*                                      /var/log/security
auth.info;authpriv.info                         /var/log/auth.log
mail.info                                       /var/log/maillog <co id="co-syslog-one-match"/>
lpr.info                                        /var/log/lpd-errs
ftp.info                                        /var/log/xferlog
cron.*                                          /var/log/cron
*.=debug                                        /var/log/debug.log <co id="co-syslog-comparison"/>
*.emerg                                         *
# uncomment this to log all writes to /dev/console to /var/log/console.log
#console.info                                   /var/log/console.log
# uncomment this to enable logging of all log messages to /var/log/all.log
# touch /var/log/all.log and chmod it to mode 600 before it will work
#*.*                                            /var/log/all.log
# uncomment this to enable logging to a remote loghost named loghost
#*.*                                            @loghost
# uncomment these if you're running inn
# news.crit                                     /var/log/news/news.crit
# news.err                                      /var/log/news/news.err
# news.notice                                   /var/log/news/news.notice
!ppp <co id="co-syslog-prog-spec"/>
*.*                                             /var/log/ppp.log
!*</programlisting> <calloutlist> <callout arearefs="co-syslog-many-match"> <para>Match all messages with a level of <literal>err</literal> or higher, as well as <literal>kern.warning</literal>, <literal>auth.notice</literal> and <literal>mail.crit</literal>, and send these log messages to the console (<filename>/dev/console</filename>).</para> </callout> <callout arearefs="co-syslog-one-match"> <para>Match all messages from the <literal>mail</literal> facility at level <literal>info</literal> or above, and log the messages to <filename>/var/log/maillog</filename>.</para> </callout> <callout
arearefs="co-syslog-comparison"> <para>This line uses a comparison flag, <literal>=</literal>, to match only messages at level <literal>debug</literal>, and logs them in <filename>/var/log/debug.log</filename>.</para> </callout> <callout arearefs="co-syslog-prog-spec"> <para>Here is an example usage of a <emphasis>program specification</emphasis>. This makes the rules that follow apply only to the program named in the program specification. In this case, this line and the following one make all messages from <command>ppp</command>, but no other programs, end up in <filename>/var/log/ppp.log</filename>.</para> </callout> </calloutlist> <para>This example shows that there are plenty of levels and subsystems. The levels are, in order from most to least critical: <literal>emerg</literal>, <literal>alert</literal>, <literal>crit</literal>, <literal>err</literal>, <literal>warning</literal>, <literal>notice</literal>, <literal>info</literal> and <literal>debug</literal>.</para> <para>The facilities are, in no particular order: <literal>auth</literal>, <literal>authpriv</literal>, <literal>console</literal>, <literal>cron</literal>, <literal>daemon</literal>, <literal>ftp</literal>, <literal>kern</literal>, <literal>lpr</literal>, <literal>mail</literal>, <literal>mark</literal>, <literal>news</literal>, <literal>security</literal>, <literal>syslog</literal>, <literal>user</literal>, <literal>uucp</literal> and <literal>local0</literal> through <literal>local7</literal>. Be aware that other operating systems might have different facilities.</para> <para>With this knowledge it is easy to add a new line to <filename>/etc/syslog.conf</filename> to log everything from the different daemons on level <literal>notice</literal> and higher to <filename>/var/log/daemon.log</filename>.
Just add the following:</para> <programlisting>daemon.notice /var/log/daemon.log</programlisting> <para>For more information about the different levels and facilities, refer to &man.syslog.3; and &man.syslogd.8;. For more information about <filename>syslog.conf</filename>, its syntax, and more advanced usage examples, see &man.syslog.conf.5; and <xref linkend="network-syslogd"/>.</para> </sect2> <sect2> <title>Log management and rotation with <application>newsyslog</application></title> <indexterm><primary>newsyslog</primary></indexterm> <indexterm><primary>newsyslog.conf</primary></indexterm> <indexterm><primary>log rotation</primary></indexterm> <indexterm><primary>log management</primary></indexterm> <para>Log files tend to grow quickly and accumulate steadily. This leads to the files being full of less immediately useful information, as well as filling up the hard drive. To mitigate this, log management comes into play. In &os;, &man.newsyslog.8; is the tool used to manage log files. This program is used to periodically rotate and compress log files, as well as optionally create missing log files and signal programs when log files are moved. The log files do not necessarily have to come from syslog; &man.newsyslog.8; works with any logs written from any program. It is important to note that <command>newsyslog</command> is normally run from &man.cron.8; and is not a system daemon. In the default configuration it is run every hour.</para> <sect3> <title>Configuring <application>newsyslog</application></title> <para>To know what actions to take, &man.newsyslog.8; reads its configuration file, by default <filename>/etc/newsyslog.conf</filename>. This configuration file contains one line for each file that &man.newsyslog.8; manages. Each line states the file owner, permissions, and when to rotate that file, as well as optional flags that affect the log rotation (such as compression) and programs to signal when the log is rotated. 
As an example, here is the default configuration in &os;:</para> <programlisting># configuration file for newsyslog
# $&os;$
#
# Entries which do not specify the '/pid_file' field will cause the
# syslogd process to be signalled when that log file is rotated. This
# action is only appropriate for log files which are written to by the
# syslogd process (ie, files listed in /etc/syslog.conf). If there
# is no process which needs to be signalled when a given log file is
# rotated, then the entry for that file should include the 'N' flag.
#
# The 'flags' field is one or more of the letters: BCDGJNUXZ or a '-'.
#
# Note: some sites will want to select more restrictive protections than the
# defaults. In particular, it may be desirable to switch many of the 644
# entries to 640 or 600. For example, some sites will consider the
# contents of maillog, messages, and lpd-errs to be confidential. In the
# future, these defaults may change to more conservative ones.
#
# logfilename          [owner:group]    mode count size when  flags [/pid_file] [sig_num]
/var/log/all.log                        600  7     *    @T00  J
/var/log/amd.log                        644  7     100  *     J
/var/log/auth.log                       600  7     100  @0101T JC
/var/log/console.log                    600  5     100  *     J
/var/log/cron                           600  3     100  *     JC
/var/log/daily.log                      640  7     *    @T00  JN
/var/log/debug.log                      600  7     100  *     JC
/var/log/init.log                       644  3     100  *     J
/var/log/kerberos.log                   600  7     100  *     J
/var/log/lpd-errs                       644  7     100  *     JC
/var/log/maillog                        640  7     *    @T00  JC
/var/log/messages                       644  5     100  @0101T JC
/var/log/monthly.log                    640  12    *    $M1D0 JN
/var/log/pflog                          600  3     100  *     JB    /var/run/pflogd.pid
/var/log/ppp.log        root:network    640  3     100  *     JC
/var/log/security                       600  10    100  *     JC
/var/log/sendmail.st                    640  10    *    168   B
/var/log/utx.log                        644  3     *    @01T05 B
/var/log/weekly.log                     640  5     1    $W6D0 JN
/var/log/xferlog                        600  7     100  *     JC</programlisting> <para>Each line starts with the name of the file to be rotated, optionally followed by an owner and group for both rotated and newly created files.
The next field, <literal>mode</literal>, is the mode of the files, and <literal>count</literal> denotes how many rotated log files should be kept. The <literal>size</literal> and <literal>when</literal> fields tell <command>newsyslog</command> when to rotate the file. A log file is rotated when either its size is larger than the <literal>size</literal> field, or when the time in the <literal>when</literal> field has passed. <literal>*</literal> means that this field is ignored. The <replaceable>flags</replaceable> field gives &man.newsyslog.8; further instructions, such as how to compress the rotated file, or to create the log file if it is missing. The last two fields are optional, and specify the <acronym role="Process Identifier">PID</acronym>-file of a process and a signal number to send to that process when the file is rotated. For more information on all fields, valid flags and how to specify the rotation time, refer to &man.newsyslog.conf.5;. Remember that <command>newsyslog</command> is run from &man.cron.8; and cannot rotate files more often than it is run.</para> </sect3> </sect2> </sect1> <sect1 id="configtuning-configfiles"> <title>Configuration Files</title> <sect2> <title><filename class="directory">/etc</filename> Layout</title> <para>There are a number of directories in which configuration information is kept.
These include:</para> <informaltable frame="none" pgwide="1"> <tgroup cols="2"> <colspec colwidth="1*"/> <colspec colwidth="2*"/> <tbody> <row> <entry><filename class="directory">/etc</filename></entry> <entry>Generic system configuration information; data here is system-specific.</entry> </row> <row> <entry><filename class="directory">/etc/defaults</filename></entry> <entry>Default versions of system configuration files.</entry> </row> <row> <entry><filename class="directory">/etc/mail</filename></entry> <entry>Extra &man.sendmail.8; configuration, other MTA configuration files.</entry> </row> <row> <entry><filename class="directory">/etc/ppp</filename></entry> <entry>Configuration for both user- and kernel-ppp programs.</entry> </row> <row> <entry><filename class="directory">/etc/namedb</filename></entry> <entry>Default location for &man.named.8; data. Normally <filename>named.conf</filename> and zone files are stored here.</entry> </row> <row> <entry><filename class="directory">/usr/local/etc</filename></entry> <entry>Configuration files for installed applications. 
May contain per-application subdirectories.</entry> </row> <row> <entry><filename class="directory">/usr/local/etc/rc.d</filename></entry> <entry>Start/stop scripts for installed applications.</entry> </row> <row> <entry><filename class="directory">/var/db</filename></entry> <entry>Automatically generated system-specific database files, such as the package database, the locate database, and so on</entry> </row> </tbody> </tgroup> </informaltable> </sect2> <sect2> <title>Hostnames</title> <indexterm><primary>hostname</primary></indexterm> <indexterm><primary>DNS</primary></indexterm> <sect3> <title><filename>/etc/resolv.conf</filename></title> <indexterm> <primary><filename>resolv.conf</filename></primary> </indexterm> <para><filename>/etc/resolv.conf</filename> dictates how &os;'s resolver accesses the Internet Domain Name System (DNS).</para> <para>The most common entries to <filename>resolv.conf</filename> are:</para> <informaltable frame="none" pgwide="1"> <tgroup cols="2"> <colspec colwidth="1*"/> <colspec colwidth="2*"/> <tbody> <row> <entry><literal>nameserver</literal></entry> <entry>The IP address of a name server the resolver should query. The servers are queried in the order listed with a maximum of three.</entry> </row> <row> <entry><literal>search</literal></entry> <entry>Search list for hostname lookup. 
This is normally determined by the domain of the local hostname.</entry> </row> <row> <entry><literal>domain</literal></entry> <entry>The local domain name.</entry> </row> </tbody> </tgroup> </informaltable> <para>A typical <filename>resolv.conf</filename>:</para> <programlisting>search example.com
nameserver 147.11.1.11
nameserver 147.11.100.30</programlisting> <note> <para>Only one of the <literal>search</literal> and <literal>domain</literal> options should be used.</para> </note> <para>If you are using DHCP, &man.dhclient.8; usually rewrites <filename>resolv.conf</filename> with information received from the DHCP server.</para> </sect3> <sect3> <title><filename>/etc/hosts</filename></title> <indexterm><primary>hosts</primary></indexterm> <para><filename>/etc/hosts</filename> is a simple text database reminiscent of the old Internet. It works in conjunction with DNS and NIS, providing name to IP address mappings. Local computers connected via a LAN can be placed in here for simple naming purposes instead of setting up a &man.named.8; server. Additionally, <filename>/etc/hosts</filename> can be used to provide a local record of Internet names, reducing the need to query externally for commonly accessed names.</para> <programlisting># $&os;$
#
#
# Host Database
#
# This file should contain the addresses and aliases for local hosts that
# share this file. Replace 'my.domain' below with the domainname of your
# machine.
#
# In the presence of the domain name service or NIS, this file may
# not be consulted at all; see /etc/nsswitch.conf for the resolution order.
#
#
::1                     localhost localhost.my.domain
127.0.0.1               localhost localhost.my.domain
#
# Imaginary network.
#10.0.0.2               myname.my.domain myname
#10.0.0.3               myfriend.my.domain myfriend
#
# According to RFC 1918, you can use the following IP networks for
# private nets which will never be connected to the Internet:
#
#       10.0.0.0        -   10.255.255.255
#       172.16.0.0      -   172.31.255.255
#       192.168.0.0     -   192.168.255.255
#
# In case you want to be able to connect to the Internet, you need
# real official assigned numbers. Do not try to invent your own network
# numbers but instead get one from your network provider (if any) or
# from your regional registry (ARIN, APNIC, LACNIC, RIPE NCC, or AfriNIC.)
#</programlisting> <para><filename>/etc/hosts</filename> takes on the simple format of:</para> <programlisting>[Internet address] [official hostname] [alias1] [alias2] ...</programlisting> <para>For example:</para> <programlisting>10.0.0.1 myRealHostname.example.com myRealHostname foobar1 foobar2</programlisting> <para>Consult &man.hosts.5; for more information.</para> </sect3> </sect2> <sect2 id="configtuning-sysctlconf"> <title><filename>sysctl.conf</filename></title> <indexterm><primary>sysctl.conf</primary></indexterm> <indexterm><primary>sysctl</primary></indexterm> <para><filename>sysctl.conf</filename> looks much like <filename>rc.conf</filename>. Values are set in a <literal>variable=value</literal> form. The specified values are set after the system goes into multi-user mode. Not all variables are settable in this mode.</para> <para>To turn off logging of fatal signal exits and prevent users from seeing processes started by other users, the following tunables can be set in <filename>sysctl.conf</filename>:</para> <programlisting># Do not log fatal signal exits (e.g., sig 11)
kern.logsigexit=0
# Prevent users from seeing information about processes that
# are being run under another UID.
security.bsd.see_other_uids=0</programlisting> </sect2> </sect1> <sect1 id="configtuning-sysctl"> <title>Tuning with &man.sysctl.8;</title> <indexterm><primary>sysctl</primary></indexterm> <indexterm> <primary>tuning</primary> <secondary>with sysctl</secondary> </indexterm> <para>&man.sysctl.8; is an interface that allows you to make changes to a running &os; system. This includes many advanced options of the TCP/IP stack and virtual memory system that can dramatically improve performance for an experienced system administrator. Over five hundred system variables can be read and set using &man.sysctl.8;.</para> <para>At its core, &man.sysctl.8; serves two functions: to read and to modify system settings.</para> <para>To view all readable variables:</para> <screen>&prompt.user; <userinput>sysctl -a</userinput></screen> <para>To read a particular variable, for example, <varname>kern.maxproc</varname>:</para> <screen>&prompt.user; <userinput>sysctl kern.maxproc</userinput>
kern.maxproc: 1044</screen> <para>To set a particular variable, use the intuitive <replaceable>variable</replaceable>=<replaceable>value</replaceable> syntax:</para> <screen>&prompt.root; <userinput>sysctl kern.maxfiles=5000</userinput>
kern.maxfiles: 2088 -> 5000</screen> <para>Settings of sysctl variables are usually either strings, numbers, or booleans (a boolean being <literal>1</literal> for yes or a <literal>0</literal> for no).</para> <para>To set some variables automatically each time the machine boots, add them to <filename>/etc/sysctl.conf</filename>. 
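</para> <para>As a minimal sketch, assuming the <varname>kern.maxfiles</varname> value from the interactive example above is the one to keep across reboots (the number is illustrative, not a recommendation), the corresponding entry would look like this:</para> <programlisting># /etc/sysctl.conf
# Illustrative value only, copied from the interactive example above.
kern.maxfiles=5000</programlisting> <para>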
For more information see the &man.sysctl.conf.5; manual page and <xref linkend="configtuning-sysctlconf"/>.</para> <sect2 id="sysctl-readonly"> <sect2info> <authorgroup> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> <contrib>Contributed by </contrib> <!-- 31 January 2003 --> </author> </authorgroup> </sect2info> <title>&man.sysctl.8; Read-only</title> <para>In some cases it may be desirable to modify read-only &man.sysctl.8; values. While this is sometimes unavoidable, it can only be done on (re)boot.</para> <para>For instance, on some laptop models the &man.cardbus.4; device will not probe memory ranges, and fails with errors similar to:</para> <screen>cbb0: Could not map register memory
device_probe_and_attach: cbb0 attach returned 12</screen> <para>Cases like the one above usually require the modification of some default &man.sysctl.8; settings which are read-only. To overcome these situations a user can put &man.sysctl.8; <quote>OIDs</quote> in their local <filename>/boot/loader.conf</filename>. Default settings are located in the <filename>/boot/defaults/loader.conf</filename> file.</para> <para>Fixing the problem mentioned above would require a user to set <option>hw.pci.allow_unsupported_io_range=1</option> in <filename>/boot/loader.conf</filename>. After a reboot, &man.cardbus.4; should work properly.</para> </sect2> </sect1> <sect1 id="configtuning-disk"> <title>Tuning Disks</title> <sect2> <title>Sysctl Variables</title> <sect3> <title><varname>vfs.vmiodirenable</varname></title> <indexterm> <primary><varname>vfs.vmiodirenable</varname></primary> </indexterm> <para>The <varname>vfs.vmiodirenable</varname> sysctl variable may be set to either 0 (off) or 1 (on); it is 1 by default. This variable controls how directories are cached by the system. Most directories are small, using just a single fragment (typically 1 K) in the file system and less (typically 512 bytes) in the buffer cache. 
With this variable turned off (set to 0), the buffer cache will only cache a fixed number of directories even if you have a huge amount of memory. When turned on (set to 1), this sysctl allows the buffer cache to use the VM Page Cache to cache the directories, making all the memory available for caching directories. However, the minimum in-core memory used to cache a directory is the physical page size (typically 4 K) rather than 512 bytes. We recommend keeping this option on if you are running any services which manipulate large numbers of files. Such services can include web caches, large mail systems, and news systems. Keeping this option on will generally not reduce performance even with the wasted memory, but you should experiment to find out.</para> </sect3> <sect3> <title><varname>vfs.write_behind</varname></title> <indexterm> <primary><varname>vfs.write_behind</varname></primary> </indexterm> <para>The <varname>vfs.write_behind</varname> sysctl variable defaults to <literal>1</literal> (on). This tells the file system to issue media writes as full clusters are collected, which typically occurs when writing large sequential files. The idea is to avoid saturating the buffer cache with dirty buffers when it would not benefit I/O performance. However, this may stall processes and under certain circumstances you may wish to turn it off.</para> </sect3> <sect3> <title><varname>vfs.hirunningspace</varname></title> <indexterm> <primary><varname>vfs.hirunningspace</varname></primary> </indexterm> <para>The <varname>vfs.hirunningspace</varname> sysctl variable determines how much outstanding write I/O may be queued to disk controllers system-wide at any given instant. The default is usually sufficient, but on machines with lots of disks you may want to bump it up to four or five <emphasis>megabytes</emphasis>. Note that setting too high a value (exceeding the buffer cache's write threshold) can lead to extremely bad clustering performance. 
Do not set this value arbitrarily high! Higher write values may add latency to reads occurring at the same time.</para> <para>There are various other buffer-cache and VM page cache related sysctls. We do not recommend modifying these values, as the VM system does an extremely good job of automatically tuning itself.</para> </sect3> <sect3> <title><varname>vm.swap_idle_enabled</varname></title> <indexterm> <primary><varname>vm.swap_idle_enabled</varname></primary> </indexterm> <para>The <varname>vm.swap_idle_enabled</varname> sysctl variable is useful in large multi-user systems where you have lots of users entering and leaving the system and lots of idle processes. Such systems tend to generate a great deal of continuous pressure on free memory reserves. Turning this feature on and tweaking the swapout hysteresis (in idle seconds) via <varname>vm.swap_idle_threshold1</varname> and <varname>vm.swap_idle_threshold2</varname> allows you to depress the priority of memory pages associated with idle processes more quickly than the normal pageout algorithm. This gives a helping hand to the pageout daemon. Do not turn this option on unless you need it, because the tradeoff you are making is essentially to pre-page memory sooner rather than later, thus eating more swap and disk bandwidth. In a small system this option will have a determinable effect, but in a large system that is already doing moderate paging this option allows the VM system to stage whole processes into and out of memory easily.</para> </sect3> <sect3> <title><varname>hw.ata.wc</varname></title> <indexterm> <primary><varname>hw.ata.wc</varname></primary> </indexterm> <para>&os; 4.3 flirted with turning off IDE write caching. This reduced write bandwidth to IDE disks but was considered necessary due to serious data consistency issues introduced by hard drive vendors. The problem is that IDE drives lie about when a write completes. 
With IDE write caching turned on, IDE hard drives not only write data to disk out of order, but will sometimes delay writing some blocks indefinitely when under heavy disk loads. A crash or power failure may cause serious file system corruption. &os;'s default was changed to be safe. Unfortunately, the result was such a huge performance loss that we changed write caching back to on by default after the release. You should check the default on your system by observing the <varname>hw.ata.wc</varname> sysctl variable. If IDE write caching is turned off, you can turn it back on by setting the kernel variable back to 1. This must be done from the boot loader at boot time. Attempting to do it after the kernel boots will have no effect.</para> <para>For more information, please see &man.ata.4;.</para> </sect3> <sect3> <title><literal>SCSI_DELAY</literal> (<varname>kern.cam.scsi_delay</varname>)</title> <indexterm> <primary><varname>kern.cam.scsi_delay</varname></primary> </indexterm> <indexterm> <primary>kernel options</primary> <secondary><literal>SCSI_DELAY</literal></secondary> </indexterm> <para>The <literal>SCSI_DELAY</literal> kernel config may be used to reduce system boot times. The defaults are fairly high and can be responsible for <literal>15</literal> seconds of delay in the boot process. Reducing it to <literal>5</literal> seconds usually works (especially with modern drives). The <varname>kern.cam.scsi_delay</varname> boot time tunable should be used. The tunable, and kernel config option accept values in terms of <emphasis>milliseconds</emphasis> and <emphasis>not</emphasis> <emphasis>seconds</emphasis>.</para> </sect3> </sect2> <sect2 id="soft-updates"> <title>Soft Updates</title> <indexterm><primary>Soft Updates</primary></indexterm> <indexterm><primary>tunefs</primary></indexterm> <para>The &man.tunefs.8; program can be used to fine-tune a file system. 
This program has many different options, but for now we are only concerned with toggling Soft Updates on and off, which is done by:</para> <screen>&prompt.root; <userinput>tunefs -n enable /filesystem</userinput>
&prompt.root; <userinput>tunefs -n disable /filesystem</userinput></screen> <para>A filesystem cannot be modified with &man.tunefs.8; while it is mounted. A good time to enable Soft Updates is before any partitions have been mounted, in single-user mode.</para> <para>Soft Updates drastically improves meta-data performance, mainly file creation and deletion, through the use of a memory cache. We recommend using Soft Updates on all of your file systems. There are two downsides to Soft Updates that you should be aware of. First, Soft Updates guarantees filesystem consistency in the case of a crash, but the physical disk could very easily be several seconds (even a minute!) behind. If your system crashes you may lose more work than otherwise. Secondly, Soft Updates delays the freeing of filesystem blocks. If you have a filesystem (such as the root filesystem) which is almost full, performing a major update, such as <command>make installworld</command>, can cause the filesystem to run out of space and the update to fail.</para> <sect3> <title>More Details About Soft Updates</title> <indexterm> <primary>Soft Updates</primary> <secondary>details</secondary> </indexterm> <para>There are two traditional approaches to writing a file system's meta-data back to disk. (Meta-data updates are updates to non-content data like inodes or directories.)</para> <para>Historically, the default behavior was to write out meta-data updates synchronously. If a directory had been changed, the system waited until the change was actually written to disk. The file data buffers (file contents) were passed through the buffer cache and backed up to disk later on asynchronously. The advantage of this implementation is that it operates safely. 
If there is a failure during an update, the meta-data are always in a consistent state. A file is either created completely or not at all. If the data blocks of a file did not find their way out of the buffer cache onto the disk by the time of the crash, &man.fsck.8; is able to recognize this and repair the filesystem by setting the file length to 0. Additionally, the implementation is clear and simple. The disadvantage is that meta-data changes are slow. An <command>rm -r</command>, for instance, touches all the files in a directory sequentially, but each directory change (deletion of a file) will be written synchronously to the disk. This includes updates to the directory itself, to the inode table, and possibly to indirect blocks allocated by the file. Similar considerations apply for unrolling large hierarchies (<command>tar -x</command>).</para> <para>The second case is asynchronous meta-data updates. This is the default for Linux/ext2fs and <command>mount -o async</command> for *BSD ufs. All meta-data updates are simply being passed through the buffer cache too, that is, they will be intermixed with the updates of the file content data. The advantage of this implementation is there is no need to wait until each meta-data update has been written to disk, so all operations which cause huge amounts of meta-data updates work much faster than in the synchronous case. Also, the implementation is still clear and simple, so there is a low risk for bugs creeping into the code. The disadvantage is that there is no guarantee at all for a consistent state of the filesystem. If there is a failure during an operation that updated large amounts of meta-data (like a power failure, or someone pressing the reset button), the filesystem will be left in an unpredictable state. 
There is no opportunity to examine the state of the filesystem when the system comes up again; the data blocks of a file could already have been written to the disk while the updates of the inode table or the associated directory were not. It is actually impossible to implement a <command>fsck</command> which is able to clean up the resulting chaos (because the necessary information is not available on the disk). If the filesystem has been damaged beyond repair, the only choice is to use &man.newfs.8; on it and restore it from backup.</para> <para>The usual solution for this problem was to implement <emphasis>dirty region logging</emphasis>, which is also referred to as <emphasis>journaling</emphasis>, although that term is not used consistently and is occasionally applied to other forms of transaction logging as well. Meta-data updates are still written synchronously, but only into a small region of the disk. Later on they will be moved to their proper location. Because the logging area is a small, contiguous region on the disk, there are no long distances for the disk heads to move, even during heavy operations, so these operations are quicker than synchronous updates. Additionally the complexity of the implementation is fairly limited, so the risk of bugs being present is low. A disadvantage is that all meta-data are written twice (once into the logging region and once to the proper location) so for normal work, a performance <quote>pessimization</quote> might result. On the other hand, in case of a crash, all pending meta-data operations can be quickly either rolled-back or completed from the logging area after the system comes up again, resulting in a fast filesystem startup.</para> <para>Kirk McKusick, the developer of Berkeley FFS, solved this problem with Soft Updates: all pending meta-data updates are kept in memory and written out to disk in a sorted sequence (<quote>ordered meta-data updates</quote>). 
This has the effect that, in case of heavy meta-data operations, later updates to an item <quote>catch</quote> the earlier ones if the earlier ones are still in memory and have not already been written to disk. So all operations on, say, a directory are generally performed in memory before the update is written to disk (the data blocks are sorted according to their position so that they will not be on the disk ahead of their meta-data). If the system crashes, this causes an implicit <quote>log rewind</quote>: all operations which did not find their way to the disk appear as if they had never happened. A consistent filesystem state is maintained that appears to be the one of 30 to 60 seconds earlier. The algorithm used guarantees that all resources in use are marked as such in their appropriate bitmaps: blocks and inodes. After a crash, the only resource allocation error that occurs is that resources are marked as <quote>used</quote> which are actually <quote>free</quote>. &man.fsck.8; recognizes this situation, and frees the resources that are no longer used. It is safe to ignore the dirty state of the filesystem after a crash by forcibly mounting it with <command>mount -f</command>. In order to free resources that may be unused, &man.fsck.8; needs to be run at a later time. This is the idea behind the <emphasis>background fsck</emphasis>: at system startup time, only a <emphasis>snapshot</emphasis> of the filesystem is recorded. The <command>fsck</command> can be run later on. All file systems can then be mounted <quote>dirty</quote>, so the system startup proceeds in multiuser mode. Then, background <command>fsck</command>s will be scheduled for all file systems where this is required, to free resources that may be unused. 
(File systems that do not use Soft Updates still need the usual foreground <command>fsck</command> though.)</para> <para>The advantage is that meta-data operations are nearly as fast as asynchronous updates (i.e., faster than with <emphasis>logging</emphasis>, which has to write the meta-data twice). The disadvantages are the complexity of the code (implying a higher risk for bugs in an area that is highly sensitive regarding loss of user data), and a higher memory consumption. Additionally there are some idiosyncrasies one has to get used to. After a crash, the state of the filesystem appears to be somewhat <quote>older</quote>. In situations where the standard synchronous approach would have caused some zero-length files to remain after the <command>fsck</command>, these files do not exist at all with a Soft Updates filesystem because neither the meta-data nor the file contents have ever been written to disk. Disk space is not released until the updates have been written to disk, which may take place some time after running <command>rm</command>. This may cause problems when installing large amounts of data on a filesystem that does not have enough free space to hold all the files twice.</para> </sect3> </sect2> </sect1> <sect1 id="configtuning-kernel-limits"> <title>Tuning Kernel Limits</title> <indexterm> <primary>tuning</primary> <secondary>kernel limits</secondary> </indexterm> <sect2 id="file-process-limits"> <title>File/Process Limits</title> <sect3 id="kern-maxfiles"> <title><varname>kern.maxfiles</varname></title> <indexterm> <primary><varname>kern.maxfiles</varname></primary> </indexterm> <para><varname>kern.maxfiles</varname> can be raised or lowered based upon your system requirements. This variable indicates the maximum number of file descriptors on your system. 
When the file descriptor table is full, <errorname>file: table is full</errorname> will show up repeatedly in the system message buffer, which can be viewed with the <command>dmesg</command> command.</para> <para>Each open file, socket, or fifo uses one file descriptor. A large-scale production server may easily require many thousands of file descriptors, depending on the kind and number of services running concurrently.</para> <para>In older FreeBSD releases, the default value of <varname>kern.maxfiles</varname> is derived from the <option>maxusers</option> option in your kernel configuration file. <varname>kern.maxfiles</varname> grows proportionally to the value of <option>maxusers</option>. When compiling a custom kernel, it is a good idea to set this kernel configuration option according to the uses of your system. From this number, the kernel is given most of its pre-defined limits. Even though a production machine may not actually have 256 users connected at once, the resources needed may be similar to a high-scale web server.</para> <para>The variable <varname>kern.maxusers</varname> is automatically sized at boot based on the amount of memory available in the system, and may be determined at run-time by inspecting the value of the read-only <varname>kern.maxusers</varname> sysctl. Some sites will require larger or smaller values of <varname>kern.maxusers</varname> and may set it as a loader tunable; values of 64, 128, and 256 are not uncommon. 
We do not recommend going above 256 unless you need a huge number of file descriptors; many of the tunable values set to their defaults by <varname>kern.maxusers</varname> may be individually overridden at boot-time or run-time in <filename>/boot/loader.conf</filename> (see the &man.loader.conf.5; manual page or the <filename>/boot/defaults/loader.conf</filename> file for some hints) or as described elsewhere in this document.</para> <para>In older releases, the system will auto-tune <literal>maxusers</literal> for you if you explicitly set it to <literal>0</literal> <footnote><para>The auto-tuning algorithm sets <literal>maxusers</literal> equal to the amount of memory in the system, with a minimum of 32, and a maximum of 384.</para></footnote>. When setting this option, you will want to set <literal>maxusers</literal> to at least 4, especially if you are using the X Window System or compiling software. The reason is that the most important table set by <literal>maxusers</literal> is the maximum number of processes, which is set to <literal>20 + 16 * maxusers</literal>, so if you set <literal>maxusers</literal> to 1, then you can only have 36 simultaneous processes, including the 18 or so that the system starts up at boot time and the 15 or so you will probably create when you start the X Window System. Even a simple task like reading a manual page will start up nine processes to filter, decompress, and view it. Setting <literal>maxusers</literal> to 64 will allow you to have up to 1044 simultaneous processes, which should be enough for nearly all uses. 
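</para> <para>The process limit formula quoted above, <literal>20 + 16 * maxusers</literal>, can be checked with a short worked calculation. The first two rows reproduce the figures in the text; the third uses 256 purely as an illustrative value taken from the range mentioned earlier:</para> <programlisting>maxusers =   1:  20 + 16 *   1 =   36 processes
maxusers =  64:  20 + 16 *  64 = 1044 processes
maxusers = 256:  20 + 16 * 256 = 4116 processes</programlisting> <para>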
If, however, you see the dreaded <errortype>proc table full</errortype> error when trying to start another program, or are running a server with a large number of simultaneous users (like <hostid role="fqdn">ftp.FreeBSD.org</hostid>), you can always increase the number and rebuild.</para> <note> <para><literal>maxusers</literal> does <emphasis>not</emphasis> limit the number of users which can log into your machine. It simply sets various table sizes to reasonable values considering the maximum number of users you will likely have on your system and how many processes each of them will be running.</para> </note> </sect3> <sect3> <title><varname>kern.ipc.somaxconn</varname></title> <indexterm> <primary><varname>kern.ipc.somaxconn</varname></primary> </indexterm> <para>The <varname>kern.ipc.somaxconn</varname> sysctl variable limits the size of the listen queue for accepting new TCP connections. The default value of <literal>128</literal> is typically too low for robust handling of new connections in a heavily loaded web server environment. For such environments, it is recommended to increase this value to <literal>1024</literal> or higher. The service daemon may itself limit the listen queue size (e.g., &man.sendmail.8;, or <application>Apache</application>) but will often have a directive in its configuration file to adjust the queue size. Large listen queues also do a better job of avoiding Denial of Service (<abbrev>DoS</abbrev>) attacks.</para> </sect3> </sect2> <sect2 id="nmbclusters"> <title>Network Limits</title> <para>The <literal>NMBCLUSTERS</literal> kernel configuration option dictates the amount of network Mbufs available to the system. A heavily-trafficked server with a low number of Mbufs will hinder &os;'s ability. Each cluster represents approximately 2 K of memory, so a value of 1024 represents 2 megabytes of kernel memory reserved for network buffers. A simple calculation can be done to figure out how many are needed. 
If you have a web server which maxes out at 1000 simultaneous connections, and each connection eats a 16 K receive and 16 K send buffer, you need approximately 32 MB worth of network buffers (1000 x 32 K) to cover the web server. A good rule of thumb is to multiply by 2, so 2 x 32 MB / 2 KB = 64 MB / 2 KB = 32768. We recommend values between 4096 and 32768 for machines with greater amounts of memory. Under no circumstances should you specify an arbitrarily high value for this parameter as it could lead to a boot time crash. The <option>-m</option> option to &man.netstat.1; may be used to observe network cluster use.</para> <para>The <varname>kern.ipc.nmbclusters</varname> loader tunable should be used to tune this at boot time. Only older versions of &os; will require you to use the <literal>NMBCLUSTERS</literal> kernel &man.config.8; option.</para> <para>For busy servers that make extensive use of the &man.sendfile.2; system call, it may be necessary to increase the number of &man.sendfile.2; buffers via the <literal>NSFBUFS</literal> kernel configuration option or by setting its value in <filename>/boot/loader.conf</filename> (see &man.loader.8; for details). A common indicator that this parameter needs to be adjusted is when processes are seen in the <literal>sfbufa</literal> state. The sysctl variable <varname>kern.ipc.nsfbufs</varname> is a read-only glimpse at the kernel configured variable. 
This parameter nominally scales with <varname>kern.maxusers</varname>; however, it may be necessary to tune it accordingly.</para> <important> <para>Even though a socket has been marked as non-blocking, calling &man.sendfile.2; on the non-blocking socket may result in the &man.sendfile.2; call blocking until enough <literal>struct sf_buf</literal> structures are made available.</para> </important> <sect3> <title><varname>net.inet.ip.portrange.*</varname></title> <indexterm> <primary>net.inet.ip.portrange.*</primary> </indexterm> <para>The <varname>net.inet.ip.portrange.*</varname> sysctl variables control the port number ranges automatically bound to TCP and UDP sockets. There are three ranges: a low range, a default range, and a high range. Most network programs use the default range, which is controlled by <varname>net.inet.ip.portrange.first</varname> and <varname>net.inet.ip.portrange.last</varname>, which default to 1024 and 5000, respectively. Bound port ranges are used for outgoing connections, and it is possible to run the system out of ports under certain circumstances. This most commonly occurs when you are running a heavily loaded web proxy. The port range is not an issue when running servers which handle mainly incoming connections, such as a normal web server, or which have a limited number of outgoing connections, such as a mail relay. For situations where you may run yourself out of ports, it is recommended to increase <varname>net.inet.ip.portrange.last</varname> modestly. A value of <literal>10000</literal>, <literal>20000</literal> or <literal>30000</literal> may be reasonable. You should also consider firewall effects when changing the port range. 
Some firewalls may block large ranges of ports (usually low-numbered ports) and expect systems to use higher ranges of ports for outgoing connections. For this reason, it is not recommended that <varname>net.inet.ip.portrange.first</varname> be lowered.</para> </sect3> <sect3> <title>TCP Bandwidth Delay Product</title> <indexterm> <primary>TCP Bandwidth Delay Product Limiting</primary> <secondary><varname>net.inet.tcp.inflight.enable</varname></secondary> </indexterm> <para>TCP Bandwidth Delay Product Limiting is similar to TCP/Vegas in NetBSD. It can be enabled by setting the <varname>net.inet.tcp.inflight.enable</varname> sysctl variable to <literal>1</literal>. The system will attempt to calculate the bandwidth delay product for each connection and limit the amount of data queued to the network to just the amount required to maintain optimum throughput.</para> <para>This feature is useful if you are serving data over modems, Gigabit Ethernet, or even high speed WAN links (or any other link with a high bandwidth delay product), especially if you are also using window scaling or have configured a large send window. If you enable this option, you should also be sure to set <varname>net.inet.tcp.inflight.debug</varname> to <literal>0</literal> (disable debugging), and for production use setting <varname>net.inet.tcp.inflight.min</varname> to at least <literal>6144</literal> may be beneficial. However, note that setting high minimums may effectively disable bandwidth limiting depending on the link. The limiting feature reduces the amount of data built up in intermediate router and switch packet queues as well as the amount of data built up in the local host's interface queue. With fewer packets queued up, interactive connections, especially over slow modems, will also be able to operate with lower <emphasis>Round Trip Times</emphasis>. However, note that this feature only affects data transmission (uploading / server side). 
It has no effect on data reception (downloading).</para> <para>Adjusting <varname>net.inet.tcp.inflight.stab</varname> is <emphasis>not</emphasis> recommended. This parameter defaults to 20, representing 2 maximal packets added to the bandwidth delay product window calculation. The additional window is required to stabilize the algorithm and improve responsiveness to changing conditions, but it can also result in higher ping times over slow links (though still much lower than you would get without the inflight algorithm). In such cases, you may wish to try reducing this parameter to 15, 10, or 5, and you may also have to reduce <varname>net.inet.tcp.inflight.min</varname> (for example, to 3500) to get the desired effect. Reducing these parameters should be done as a last resort only.</para> </sect3> </sect2> <sect2> <title>Virtual Memory</title> <sect3> <title><varname>kern.maxvnodes</varname></title> <para>A vnode is the internal representation of a file or directory. Increasing the number of vnodes available to the operating system cuts down on disk I/O. Normally this is handled by the operating system and does not need to be changed. In some cases where disk I/O is a bottleneck and the system is running out of vnodes, this setting will need to be increased. The amount of inactive and free RAM will need to be taken into account.</para> <para>To see the current number of vnodes in use:</para> <screen>&prompt.root; <userinput>sysctl vfs.numvnodes</userinput>
vfs.numvnodes: 91349</screen> <para>To see the maximum number of vnodes:</para> <screen>&prompt.root; <userinput>sysctl kern.maxvnodes</userinput>
kern.maxvnodes: 100000</screen> <para>If the current vnode usage is near the maximum, increasing <varname>kern.maxvnodes</varname> by a value of 1,000 is probably a good idea. Keep an eye on <varname>vfs.numvnodes</varname>. If it climbs up to the maximum again, <varname>kern.maxvnodes</varname> will need to be increased further. 
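As a minimal sketch, assuming the maximum of 100000 shown above, the limit could be raised by 1,000 at run time with &man.sysctl.8;. The value <literal>101000</literal> is only illustrative: <screen>&prompt.root; <userinput>sysctl kern.maxvnodes=101000</userinput></screen>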
A shift in your memory usage as reported by &man.top.1; should be visible. More memory should be active.</para> </sect3> </sect2> </sect1> <sect1 id="adding-swap-space"> <title>Adding Swap Space</title> <para>No matter how well you plan, sometimes a system does not run as you expect. If you find you need more swap space, it is simple enough to add. You have three ways to increase swap space: adding a new hard drive, enabling swap over NFS, and creating a swap file on an existing partition.</para> <para>For information on how to encrypt swap space, what options for this task exist and why it should be done, please refer to <xref linkend="swap-encrypting"/> of the Handbook.</para> <sect2 id="new-drive-swap"> <title>Swap on a New or Existing Hard Drive</title> <para>Adding a new hard drive for swap gives better performance than adding a partition on an existing drive. Setting up partitions and hard drives is explained in <xref linkend="disks-adding"/>. <xref linkend="configtuning-initial"/> discusses partition layouts and swap partition size considerations.</para> <para>Use &man.swapon.8; to add a swap partition to the system. For example:</para> <screen>&prompt.root; <userinput>swapon<replaceable> /dev/ada1s1b</replaceable></userinput></screen> <warning> <para>It is possible to use any partition not currently mounted, even if it already contains data. Using &man.swapon.8; on a partition that contains data will overwrite and destroy that data. 
Make sure that the partition to be added as swap is really the intended partition before running &man.swapon.8;.</para> </warning> <para>To automatically add this swap partition on boot, add an entry to <filename>/etc/fstab</filename> for the partition:</para> <programlisting><replaceable>/dev/ada1s1b</replaceable> none swap sw 0 0</programlisting> <para>See &man.fstab.5; for an explanation of the entries in <filename>/etc/fstab</filename>.</para> </sect2> <sect2 id="nfs-swap"> <title>Swapping over NFS</title> <para>Swapping over NFS is only recommended if you do not have a local hard disk to swap to; NFS swapping will be limited by the available network bandwidth and puts an additional burden on the NFS server.</para> </sect2> <sect2 id="create-swapfile"> <title>Swapfiles</title> <para>You can create a file of a specified size to use as a swap file. In our example here we will use a 64MB file called <filename>/usr/swap0</filename>. You can use any name you want, of course.</para> <example> <title>Creating a Swapfile on &os;</title> <orderedlist> <listitem> <para>The <filename>GENERIC</filename> kernel already includes the memory disk driver (&man.md.4;) required for this operation. 
When building a custom kernel, make sure to include the following line in your custom configuration file:</para> <programlisting>device md</programlisting> <para>For information on building your own kernel, please refer to <xref linkend="kernelconfig"/>.</para> </listitem> <listitem> <para>Create a swapfile (<filename>/usr/swap0</filename>):</para> <screen>&prompt.root; <userinput>dd if=/dev/zero of=/usr/swap0 bs=1024k count=64</userinput></screen> </listitem> <listitem> <para>Set proper permissions on <filename>/usr/swap0</filename>:</para> <screen>&prompt.root; <userinput>chmod 0600 /usr/swap0</userinput></screen> </listitem> <listitem> <para>Enable the swap file in <filename>/etc/rc.conf</filename>:</para> <programlisting>swapfile="/usr/swap0" # Set to name of swapfile if aux swapfile desired.</programlisting> </listitem> <listitem> <para>Reboot the machine or, to enable the swap file immediately, type:</para> <screen>&prompt.root; <userinput>mdconfig -a -t vnode -f /usr/swap0 -u 0 &amp;&amp; swapon /dev/md0</userinput></screen> </listitem> </orderedlist> </example> </sect2> </sect1> <sect1 id="acpi-overview"> <sect1info> <authorgroup> <author> <firstname>Hiten</firstname> <surname>Pandya</surname> <contrib>Written by </contrib> </author> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> </author> </authorgroup> </sect1info> <title>Power and Resource Management</title> <para>It is important to utilize hardware resources in an efficient manner. Before <acronym>ACPI</acronym> was introduced, it was difficult for operating systems to manage the power usage and thermal properties of a system. The hardware was managed by the <acronym>BIOS</acronym> and thus the user had less control over, and visibility into, the power management settings. Some limited configurability was available via <emphasis>Advanced Power Management (APM)</emphasis>. Power and resource management is one of the key components of a modern operating system. 
For example, you may want an operating system to monitor system limits (and possibly alert you) in case your system temperature increases unexpectedly.</para> <para>In this section of the &os; Handbook, we will provide comprehensive information about <acronym>ACPI</acronym>. References will be provided for further reading at the end.</para> <sect2 id="acpi-intro"> <title>What Is ACPI?</title> <indexterm> <primary>ACPI</primary> </indexterm> <indexterm> <primary>APM</primary> </indexterm> <para>Advanced Configuration and Power Interface (<acronym>ACPI</acronym>) is a standard written by an alliance of vendors to provide a standard interface for hardware resources and power management (hence the name). It is a key element in <emphasis>Operating System-directed configuration and Power Management</emphasis>; that is, it provides more control and flexibility to the operating system (<acronym>OS</acronym>). Modern systems <quote>stretched</quote> the limits of the then-current Plug and Play interfaces prior to the introduction of <acronym>ACPI</acronym>. <acronym>ACPI</acronym> is the direct successor to <acronym>APM</acronym> (Advanced Power Management).</para> </sect2> <sect2 id="acpi-old-spec"> <title>Shortcomings of Advanced Power Management (APM)</title> <para>The <emphasis>Advanced Power Management (APM)</emphasis> facility controls the power usage of a system based on its activity. The APM BIOS is supplied by the (system) vendor and it is specific to the hardware platform. An APM driver in the OS mediates access to the <emphasis>APM Software Interface</emphasis>, which allows management of power levels. APM should still be used for systems manufactured in or before the year 2000.</para> <para>There are four major problems in APM. Firstly, power management is done by the (vendor-specific) BIOS, and the OS does not have any knowledge of it. 
One example of this is when the user sets idle-time values for a hard drive in the APM BIOS: once they are exceeded, the BIOS spins down the hard drive without the consent of the OS. Secondly, the APM logic is embedded in the BIOS, and it operates outside the scope of the OS. This means users can only fix problems in their APM BIOS by flashing a new one into the ROM, which is a very dangerous procedure with the potential to leave the system in an unrecoverable state if it fails. Thirdly, APM is a vendor-specific technology, which means that there is a lot of duplicated effort, and bugs found in one vendor's BIOS may not be solved in others. Last but not least, the APM BIOS did not have enough room to implement a sophisticated power policy, or one that can adapt very well to the purpose of the machine.</para> <para><emphasis>Plug and Play BIOS (PNPBIOS)</emphasis> was unreliable in many situations. PNPBIOS is 16-bit technology, so the OS has to use 16-bit emulation in order to <quote>interface</quote> with PNPBIOS methods.</para> <para>The &os; <acronym>APM</acronym> driver is documented in the &man.apm.4; manual page.</para> </sect2> <sect2 id="acpi-config"> <title>Configuring <acronym>ACPI</acronym></title> <para>The <filename>acpi.ko</filename> driver is loaded by default at start up by the &man.loader.8; and should <emphasis>not</emphasis> be compiled into the kernel. The reasoning behind this is that modules are easier to work with, for example when switching to another <filename>acpi.ko</filename> without doing a kernel rebuild. This has the advantage of making testing easier. Another reason is that starting <acronym>ACPI</acronym> after a system has been brought up often does not work well. If you are experiencing problems, you can disable <acronym>ACPI</acronym> altogether. This driver should not and cannot be unloaded because the system bus uses it for various hardware interactions. 
<acronym>ACPI</acronym> can be disabled by setting <literal>hint.acpi.0.disabled="1"</literal> in <filename>/boot/loader.conf</filename> or at the &man.loader.8; prompt.</para> <note> <para><acronym>ACPI</acronym> and <acronym>APM</acronym> cannot coexist and should be used separately. The last one to load will terminate if the driver notices the other running.</para> </note> <para><acronym>ACPI</acronym> can be used to put the system into a sleep mode with &man.acpiconf.8;, the <option>-s</option> flag, and a <literal>1-5</literal> option. Most users will only need <literal>1</literal> or <literal>3</literal> (suspend to RAM). Option <literal>5</literal> will do a soft-off which is the same action as:</para> <screen>&prompt.root; <userinput>halt -p</userinput></screen> <para>Other options are available via &man.sysctl.8;. Check out the &man.acpi.4; and &man.acpiconf.8; manual pages for more information.</para> </sect2> </sect1> <sect1 id="ACPI-debug"> <sect1info> <authorgroup> <author> <firstname>Nate</firstname> <surname>Lawson</surname> <contrib>Written by </contrib> </author> </authorgroup> <authorgroup> <author> <firstname>Peter</firstname> <surname>Schultz</surname> <contrib>With contributions from </contrib> </author> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> </author> </authorgroup> </sect1info> <title>Using and Debugging &os; <acronym>ACPI</acronym></title> <indexterm> <primary>ACPI</primary> <secondary>problems</secondary> </indexterm> <para><acronym>ACPI</acronym> is a fundamentally new way of discovering devices, managing power usage, and providing standardized access to various hardware previously managed by the <acronym>BIOS</acronym>. 
Progress is being made toward <acronym>ACPI</acronym> working on all systems, but bugs in some motherboards' <firstterm><acronym>ACPI</acronym> Machine Language</firstterm> (<acronym>AML</acronym>) bytecode, incompleteness in &os;'s kernel subsystems, and bugs in the &intel; <acronym>ACPI-CA</acronym> interpreter continue to appear.</para> <para>This document is intended to help you assist the &os; <acronym>ACPI</acronym> maintainers in identifying the root cause of problems you observe and debugging and developing a solution. Thanks for reading this and we hope we can solve your system's problems.</para> <sect2 id="ACPI-submitdebug"> <title>Submitting Debugging Information</title> <note> <para>Before submitting a problem, be sure you are running the latest <acronym>BIOS</acronym> version and, if available, embedded controller firmware version.</para> </note> <para>For those of you that want to submit a problem right away, please send the following information to <ulink url="mailto:freebsd-acpi@FreeBSD.org"> freebsd-acpi@FreeBSD.org</ulink>:</para> <itemizedlist> <listitem> <para>Description of the buggy behavior, including system type and model and anything that causes the bug to appear. Also, please note as accurately as possible when the bug began occurring if it is new for you.</para> </listitem> <listitem> <para>The &man.dmesg.8; output after <command>boot -v</command>, including any error messages generated by you exercising the bug.</para> </listitem> <listitem> <para>The &man.dmesg.8; output from <command>boot -v</command> with <acronym>ACPI</acronym> disabled, if disabling it helps fix the problem.</para> </listitem> <listitem> <para>Output from <command>sysctl hw.acpi</command>. This is also a good way of figuring out what features your system offers.</para> </listitem> <listitem> <para><acronym>URL</acronym> where your <firstterm><acronym>ACPI</acronym> Source Language</firstterm> (<acronym>ASL</acronym>) can be found. 
Do <emphasis>not</emphasis> send the <acronym>ASL</acronym> directly to the list as it can be very large. Generate a copy of your <acronym>ASL</acronym> by running this command:</para> <screen>&prompt.root; <userinput>acpidump -dt &gt; <replaceable>name</replaceable>-<replaceable>system</replaceable>.asl</userinput></screen> <para>(Substitute your login name for <replaceable>name</replaceable> and manufacturer/model for <replaceable>system</replaceable>. Example: <filename>njl-FooCo6000.asl</filename>)</para> </listitem> </itemizedlist> <para>Most of the developers watch the &a.current;, but please submit problems to &a.acpi.name; to be sure they are seen. Please be patient; all of us have full-time jobs elsewhere. If your bug is not immediately apparent, we will probably ask you to submit a <acronym>PR</acronym> via &man.send-pr.1;. When entering a <acronym>PR</acronym>, please include the same information as requested above. This will help us track the problem and resolve it. Do not send a <acronym>PR</acronym> without emailing &a.acpi.name; first as we use <acronym>PR</acronym>s as reminders of existing problems, not a reporting mechanism. It is likely that your problem has been reported by someone before.</para> </sect2> <sect2 id="ACPI-background"> <title>Background</title> <indexterm> <primary>ACPI</primary> </indexterm> <para><acronym>ACPI</acronym> is present in all modern computers that conform to the ia32 (x86), ia64 (Itanium), and amd64 (AMD) architectures. The full standard has many features including <acronym>CPU</acronym> performance management, power planes control, thermal zones, various battery systems, embedded controllers, and bus enumeration. Most systems implement less than the full standard. For instance, a desktop system usually only implements the bus enumeration parts while a laptop might have cooling and battery management support as well. 
Laptops also have suspend and resume, with their own associated complexity.</para> <para>An <acronym>ACPI</acronym>-compliant system has various components. The <acronym>BIOS</acronym> and chipset vendors provide various fixed tables (e.g., <acronym>FADT</acronym>) in memory that specify things like the <acronym>APIC</acronym> map (used for <acronym>SMP</acronym>), config registers, and simple configuration values. Additionally, a table of bytecode (the <firstterm>Differentiated System Description Table</firstterm>, <acronym>DSDT</acronym>) is provided that specifies a tree-like namespace of devices and methods.</para> <para>The <acronym>ACPI</acronym> driver must parse the fixed tables, implement an interpreter for the bytecode, and modify device drivers and the kernel to accept information from the <acronym>ACPI</acronym> subsystem. For &os;, &intel; has provided an interpreter (<acronym>ACPI-CA</acronym>) that is shared with Linux and NetBSD. The path to the <acronym>ACPI-CA</acronym> source code is <filename class="directory">src/sys/contrib/dev/acpica</filename>. The glue code that allows <acronym>ACPI-CA</acronym> to work on &os; is in <filename class="directory">src/sys/dev/acpica/Osd</filename>. Finally, drivers that implement various <acronym>ACPI</acronym> devices are found in <filename class="directory">src/sys/dev/acpica</filename>.</para> </sect2> <sect2 id="ACPI-comprob"> <title>Common Problems</title> <indexterm> <primary>ACPI</primary> <secondary>problems</secondary> </indexterm> <para>For <acronym>ACPI</acronym> to work correctly, all the parts have to work correctly. Here are some common problems, in order of frequency of appearance, and some possible workarounds or fixes.</para> <sect3> <title>Mouse Issues</title> <para>In some cases, resuming from a suspend operation will cause the mouse to fail. A known workaround is to add <literal>hint.psm.0.flags="0x3000"</literal> to the <filename>/boot/loader.conf</filename> file. 
If this does not work, please consider sending a bug report as described above.</para> </sect3> <sect3> <title>Suspend/Resume</title> <para><acronym>ACPI</acronym> has three suspend to <acronym>RAM</acronym> (<acronym>STR</acronym>) states, <literal>S1</literal>-<literal>S3</literal>, and one suspend to disk state (<literal>STD</literal>), called <literal>S4</literal>. <literal>S5</literal> is <quote>soft off</quote> and is the normal state your system is in when plugged in but not powered up. <literal>S4</literal> can actually be implemented in two separate ways. <literal>S4</literal><acronym>BIOS</acronym> is a <acronym>BIOS</acronym>-assisted suspend to disk. <literal>S4</literal><acronym>OS</acronym> is implemented entirely by the operating system.</para> <para>Start by checking <command>sysctl hw.acpi</command> for the suspend-related items. Here are the results for a Thinkpad:</para> <screen>hw.acpi.supported_sleep_state: S3 S4 S5
hw.acpi.s4bios: 0</screen> <para>This means that we can use <command>acpiconf -s</command> to test <literal>S3</literal>, <literal>S4</literal><acronym>OS</acronym>, and <literal>S5</literal>. If <option>s4bios</option> were one (<literal>1</literal>), we would have <literal>S4</literal><acronym>BIOS</acronym> support instead of <literal>S4</literal><acronym>OS</acronym>.</para> <para>When testing suspend/resume, start with <literal>S1</literal>, if supported. This state is most likely to work since it does not require much driver support. No one has implemented <literal>S2</literal> but if you have it, it is similar to <literal>S1</literal>. The next thing to try is <literal>S3</literal>. This is the deepest <acronym>STR</acronym> state and requires a lot of driver support to properly reinitialize your hardware. 
If you have problems resuming, feel free to email the &a.acpi.name; list, but do not expect the problem to be resolved since there are a lot of drivers/hardware that need more testing and work.</para> <para>A common problem with suspend/resume is that many device drivers do not save, restore, or reinitialize their firmware, registers, or device memory properly. As a first attempt at debugging the problem, try:</para> <screen>&prompt.root; <userinput>sysctl debug.bootverbose=1</userinput>
&prompt.root; <userinput>sysctl debug.acpi.suspend_bounce=1</userinput>
&prompt.root; <userinput>acpiconf -s 3</userinput></screen> <para>This test emulates the suspend/resume cycle of all device drivers without actually entering the <literal>S3</literal> state. In some cases, you can easily catch problems with this method (e.g., losing firmware state, device watchdog time out, and retrying forever). Note that the system will not really enter the <literal>S3</literal> state, which means devices may not lose power, and many will work fine even if suspend/resume methods are totally missing, unlike in the real <literal>S3</literal> state.</para> <para>Harder cases require additional hardware, i.e., serial port/cable for serial console or Firewire port/cable for &man.dcons.4;, and kernel debugging skills.</para> <para>To help isolate the problem, remove as many drivers from your kernel as possible. If it works, you can narrow down which driver is the problem by loading drivers until it fails again. Typically binary drivers like <filename>nvidia.ko</filename>, X11 display drivers, and <acronym>USB</acronym> will have the most problems while Ethernet interfaces usually work fine. If you can properly load/unload the drivers, you can automate this by putting the appropriate commands in <filename>/etc/rc.suspend</filename> and <filename>/etc/rc.resume</filename>. There is a commented-out example for unloading and loading a driver. 
Try setting <option>hw.acpi.reset_video</option> to zero (<literal>0</literal>) if your display is messed up after resume. Try setting longer or shorter values for <option>hw.acpi.sleep_delay</option> to see if that helps.</para> <para>Another thing to try is to load a recent Linux distribution with <acronym>ACPI</acronym> support and test its suspend/resume support on the same hardware. If it works on Linux, it is likely a &os; driver problem, and narrowing down which driver causes the problems will help us fix the problem. Note that the <acronym>ACPI</acronym> maintainers do not usually maintain other drivers (e.g., sound or <acronym>ATA</acronym>), so any work done on tracking down a driver problem should probably eventually be posted to the &a.current.name; list and mailed to the driver maintainer. If you are feeling adventurous, go ahead and start putting some debugging &man.printf.3; calls in a problematic driver to track down where in its resume function it hangs.</para> <para>Finally, try disabling <acronym>ACPI</acronym> and enabling <acronym>APM</acronym> instead. If suspend/resume works with <acronym>APM</acronym>, you may be better off sticking with <acronym>APM</acronym>, especially on older hardware (pre-2000). It took vendors a while to get <acronym>ACPI</acronym> support correct and older hardware is more likely to have <acronym>BIOS</acronym> problems with <acronym>ACPI</acronym>.</para> </sect3> <sect3> <title>System Hangs (Temporary or Permanent)</title> <para>Most system hangs are a result of lost interrupts or an interrupt storm. 
Chipsets have a lot of problems based on how the <acronym>BIOS</acronym> configures interrupts before boot, correctness of the <acronym>APIC</acronym> (<acronym>MADT</acronym>) table, and routing of the <firstterm>System Control Interrupt</firstterm> (<acronym>SCI</acronym>).</para> <indexterm> <primary>interrupt storms</primary> </indexterm> <para>Interrupt storms can be distinguished from lost interrupts by checking the output of <command>vmstat -i</command> and looking at the line that has <literal>acpi0</literal>. If the counter is increasing at more than a couple per second, you have an interrupt storm. If the system appears hung, try breaking to <acronym>DDB</acronym> (<keycombo action="simul"> <keycap>CTRL</keycap> <keycap>ALT</keycap> <keycap>ESC</keycap> </keycombo> on console) and type <literal>show interrupts</literal>.</para> <indexterm> <primary>APIC</primary> <secondary>disabling</secondary> </indexterm> <para>Your best hope when dealing with interrupt problems is to try disabling <acronym>APIC</acronym> support with <literal>hint.apic.0.disabled="1"</literal> in <filename>loader.conf</filename>.</para> </sect3> <sect3> <title>Panics</title> <para>Panics are relatively rare for <acronym>ACPI</acronym> and are the top priority to be fixed. The first step is to isolate the steps to reproduce the panic (if possible) and get a backtrace. Follow the advice for enabling <literal>options DDB</literal> and setting up a serial console (see <xref linkend="serialconsole-ddb"/>) or setting up a &man.dump.8; partition. You can get a backtrace in <acronym>DDB</acronym> with <literal>tr</literal>. If you have to handwrite the backtrace, be sure to at least get the lowest five (5) and top five (5) lines in the trace.</para> <para>Then, try to isolate the problem by booting with <acronym>ACPI</acronym> disabled. If that works, you can isolate the <acronym>ACPI</acronym> subsystem by using various values of <option>debug.acpi.disable</option>. 
See the &man.acpi.4; manual page for some examples.</para> </sect3> <sect3> <title>System Powers Up After Suspend or Shutdown</title> <para>First, try setting <literal>hw.acpi.disable_on_poweroff="0"</literal> in &man.loader.conf.5;. This keeps <acronym>ACPI</acronym> from disabling various events during the shutdown process. Some systems need this value set to <literal>1</literal> (the default) for the same reason. This usually fixes the problem of a system powering up spontaneously after a suspend or poweroff.</para> </sect3> <sect3> <title>Other Problems</title> <para>If you have other problems with <acronym>ACPI</acronym> (working with a docking station, devices not detected, etc.), please email a description to the mailing list as well; however, some of these issues may be related to unfinished parts of the <acronym>ACPI</acronym> subsystem so they might take a while to be implemented. Please be patient and prepared to test patches we may send you.</para> </sect3> </sect2> <sect2 id="ACPI-aslanddump"> <title><acronym>ASL</acronym>, <command>acpidump</command>, and <acronym>IASL</acronym></title> <indexterm> <primary>ACPI</primary> <secondary>ASL</secondary> </indexterm> <para>The most common problem is the <acronym>BIOS</acronym> vendors providing incorrect (or outright buggy!) bytecode. This is usually manifested by kernel console messages like this:</para> <screen>ACPI-1287: *** Error: Method execution failed [\\_SB_.PCI0.LPC0.FIGD._STA] \\ (Node 0xc3f6d160), AE_NOT_FOUND</screen> <para>Often, you can resolve these problems by updating your <acronym>BIOS</acronym> to the latest revision. Most console messages are harmless but if you have other problems like battery status not working, they are a good place to start looking for problems in the <acronym>AML</acronym>. The bytecode, known as <acronym>AML</acronym>, is compiled from a source language called <acronym>ASL</acronym>. 
The <acronym>AML</acronym> is found in the table known as the <acronym>DSDT</acronym>. To get a copy of your <acronym>ASL</acronym>, use &man.acpidump.8;. You should use both the <option>-t</option> (show contents of the fixed tables) and <option>-d</option> (disassemble <acronym>AML</acronym> to <acronym>ASL</acronym>) options. See the <link linkend="ACPI-submitdebug">Submitting Debugging Information</link> section for an example syntax.</para> <para>The simplest first check you can do is to recompile your <acronym>ASL</acronym> to check for errors. Warnings can usually be ignored but errors are bugs that will usually prevent <acronym>ACPI</acronym> from working correctly. To recompile your <acronym>ASL</acronym>, issue the following command:</para> <screen>&prompt.root; <userinput>iasl your.asl</userinput></screen> </sect2> <sect2 id="ACPI-fixasl"> <title>Fixing Your <acronym>ASL</acronym></title> <indexterm> <primary>ACPI</primary> <secondary>ASL</secondary> </indexterm> <para>In the long run, our goal is for almost everyone to have <acronym>ACPI</acronym> work without any user intervention. At this point, however, we are still developing workarounds for common mistakes made by the <acronym>BIOS</acronym> vendors. The &microsoft; interpreter (<filename>acpi.sys</filename> and <filename>acpiec.sys</filename>) does not strictly check for adherence to the standard, and thus many <acronym>BIOS</acronym> vendors who only test <acronym>ACPI</acronym> under &windows; never fix their <acronym>ASL</acronym>. We hope to continue to identify and document exactly what non-standard behavior is allowed by &microsoft;'s interpreter and replicate it so &os; can work without forcing users to fix the <acronym>ASL</acronym>. As a workaround and to help us identify behavior, you can fix the <acronym>ASL</acronym> manually. 
If this works for you, please send a &man.diff.1; of the old and new <acronym>ASL</acronym> so we can possibly work around the buggy behavior in <acronym>ACPI-CA</acronym> and thus make your fix unnecessary.</para> <indexterm> <primary>ACPI</primary> <secondary>error messages</secondary> </indexterm> <para>Here is a list of common error messages, their causes, and how to fix them:</para> <sect3> <title>_OS Dependencies</title> <para>Some <acronym>AML</acronym> assumes the world consists of various &windows; versions. You can tell &os; to claim it is any <acronym>OS</acronym> to see if this fixes problems you may have. An easy way to override this is to set <literal>hw.acpi.osname="Windows 2001"</literal>, or another similar string you find in the <acronym>ASL</acronym>, in <filename>/boot/loader.conf</filename>.</para> </sect3> <sect3> <title>Missing Return Statements</title> <para>Some methods do not explicitly return a value as the standard requires. While <acronym>ACPI-CA</acronym> does not handle this, &os; has a workaround that allows it to return the value implicitly. You can also add explicit Return statements where required if you know what value should be returned. To force <command>iasl</command> to compile the <acronym>ASL</acronym>, use the <option>-f</option> flag.</para> </sect3> <sect3> <title>Overriding the Default <acronym>AML</acronym></title> <para>After you customize <filename>your.asl</filename>, compile it by running:</para> <screen>&prompt.root; <userinput>iasl your.asl</userinput></screen> <para>You can add the <option>-f</option> flag to force creation of the <acronym>AML</acronym>, even if there are errors during compilation. Remember that some errors (e.g., missing Return statements) are automatically worked around by the interpreter.</para> <para><filename>DSDT.aml</filename> is the default output filename for <command>iasl</command>. 
You can load this instead of your <acronym>BIOS</acronym>'s buggy copy (which is still present in flash memory) by editing <filename>/boot/loader.conf</filename> as follows:</para> <programlisting>acpi_dsdt_load="YES"
acpi_dsdt_name="/boot/DSDT.aml"</programlisting> <para>Be sure to copy your <filename>DSDT.aml</filename> to the <filename class="directory">/boot</filename> directory.</para> </sect3> </sect2> <sect2 id="ACPI-debugoutput"> <title>Getting Debugging Output from <acronym>ACPI</acronym></title> <indexterm> <primary>ACPI</primary> <secondary>problems</secondary> </indexterm> <indexterm> <primary>ACPI</primary> <secondary>debugging</secondary> </indexterm> <para>The <acronym>ACPI</acronym> driver has a very flexible debugging facility. It allows you to specify a set of subsystems as well as the level of verbosity. The subsystems you wish to debug are specified as <quote>layers</quote> and are broken down into <acronym>ACPI-CA</acronym> components (ACPI_ALL_COMPONENTS) and <acronym>ACPI</acronym> hardware support (ACPI_ALL_DRIVERS). The verbosity of debugging output is specified as the <quote>level</quote> and ranges from ACPI_LV_ERROR (just report errors) to ACPI_LV_VERBOSE (everything). The <quote>level</quote> is a bitmask so multiple options can be set at once, separated by spaces. In practice, use a serial console to log the output if it is long enough to overflow the console message buffer. A full list of the individual layers and levels is found in the &man.acpi.4; manual page.</para> <para>Debugging output is not enabled by default. To enable it, add <literal>options ACPI_DEBUG</literal> to your kernel configuration file if <acronym>ACPI</acronym> is compiled into the kernel. You can add <literal>ACPI_DEBUG=1</literal> to your <filename>/etc/make.conf</filename> to enable it globally. 
If it is a module, you can recompile just your <filename>acpi.ko</filename> module as follows:</para> <screen>&prompt.root; <userinput>cd /sys/modules/acpi/acpi &amp;&amp; make clean &amp;&amp; make ACPI_DEBUG=1</userinput></screen> <para>Install <filename>acpi.ko</filename> in <filename class="directory">/boot/kernel</filename> and add your desired level and layer to <filename>loader.conf</filename>. This example enables debug messages for all <acronym>ACPI-CA</acronym> components and all <acronym>ACPI</acronym> hardware drivers (<acronym>CPU</acronym>, <acronym>LID</acronym>, etc.). It outputs only error messages, the least verbose level.</para> <programlisting>debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS"
debug.acpi.level="ACPI_LV_ERROR"</programlisting> <para>If the information you want is triggered by a specific event (say, a suspend and then resume), you can leave out changes to <filename>loader.conf</filename> and instead use <command>sysctl</command> to specify the layer and level after booting and preparing your system for the specific event. 
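</para> <para>As a minimal sketch, assuming a kernel or <filename>acpi.ko</filename> module built with <literal>ACPI_DEBUG</literal> and reusing the layer and level values from the <filename>loader.conf</filename> example above, the same settings could be applied at runtime just before triggering the event:</para> <screen>&prompt.root; <userinput>sysctl debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS"</userinput>
&prompt.root; <userinput>sysctl debug.acpi.level="ACPI_LV_ERROR"</userinput></screen> <para>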
The <command>sysctl</command>s are named the same as the tunables in <filename>loader.conf</filename>.</para> </sect2> <sect2 id="ACPI-References"> <title>References</title> <para>More information about <acronym>ACPI</acronym> may be found in the following locations:</para> <itemizedlist> <listitem> <para>The &a.acpi;</para> </listitem> <listitem> <para>The <acronym>ACPI</acronym> Mailing List Archives <ulink url="http://lists.freebsd.org/pipermail/freebsd-acpi/"></ulink></para> </listitem> <listitem> <para>The old <acronym>ACPI</acronym> Mailing List Archives <ulink url="http://home.jp.FreeBSD.org/mail-list/acpi-jp/"></ulink></para> </listitem> <listitem> <para>The <acronym>ACPI</acronym> 2.0 Specification <ulink url="http://acpi.info/spec.htm"></ulink></para> </listitem> <listitem> <para>&os; Manual pages: &man.acpi.4;, &man.acpi.thermal.4;, &man.acpidump.8;, &man.iasl.8;, &man.acpidb.8;</para> </listitem> <listitem> <para><ulink url="http://www.cpqlinux.com/acpi-howto.html#fix_broken_dsdt"> <acronym>DSDT</acronym> debugging resource</ulink>. (Uses Compaq as an example but generally useful.)</para> </listitem> </itemizedlist> </sect2> </sect1> </chapter>