Whitespace-only fixes, translators please ignore.

Warren Block 2014-03-14 02:13:48 +00:00
parent 823ae79c06
commit 55fdd66f9b
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=44228


@@ -4,264 +4,295 @@
    $FreeBSD$
-->
<chapter xmlns="http://docbook.org/ns/docbook"
  xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
  xml:id="vm">
  <info>
    <title>Virtual Memory System</title>

    <authorgroup>
      <author>
        <personname>
          <firstname>Matthew</firstname>
          <surname>Dillon</surname>
        </personname>
        <contrib>Contributed by </contrib>
      </author>
    </authorgroup>
  </info>

<sect1 xml:id="vm-physmem">
<title>Management of Physical
Memory&mdash;<literal>vm_page_t</literal></title>
<sect1 xml:id="vm-physmem"> <indexterm><primary>virtual memory</primary></indexterm>
<title>Management of Physical <indexterm><primary>physical memory</primary></indexterm>
Memory&mdash;<literal>vm_page_t</literal></title> <indexterm>
<primary><literal>vm_page_t</literal> structure</primary>
</indexterm>
<indexterm><primary>virtual memory</primary></indexterm> <para>Physical memory is managed on a page-by-page basis through
<indexterm><primary>physical memory</primary></indexterm> the <literal>vm_page_t</literal> structure. Pages of physical
<indexterm><primary><literal>vm_page_t</literal> structure</primary></indexterm> memory are categorized through the placement of their respective
<para>Physical memory is managed on a page-by-page basis through the <literal>vm_page_t</literal> structures on one of several paging
<literal>vm_page_t</literal> structure. Pages of physical memory are queues.</para>
categorized through the placement of their respective
<literal>vm_page_t</literal> structures on one of several paging
queues.</para>
    <para>A page can be in a wired, active, inactive, cache, or free
      state.  Except for the wired state, the page is typically placed
      in a doubly linked list queue representing the state that it is
      in.  Wired pages are not placed on any queue.</para>

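    <para>As a minimal sketch (illustrative only, not the kernel's
      actual code), the program below models this arrangement with one
      doubly linked queue per non-wired state, using the
      <literal>TAILQ</literal> macros from
      <filename>sys/queue.h</filename>; all structure and function
      names are invented for the example.</para>

    <programlisting><![CDATA[/*
 * Illustrative sketch only, not kernel source: each non-wired page
 * state has a doubly linked queue, built here with the TAILQ macros
 * from <sys/queue.h>.  Wired pages never appear on a queue.
 */
#include <sys/queue.h>
#include <stdio.h>

enum pstate { WIRED, ACTIVE, INACTIVE, CACHE, FREE, NSTATES };

struct page {
    enum pstate state;
    TAILQ_ENTRY(page) listq;        /* queue linkage; unused while WIRED */
};

/* One queue per state; the WIRED slot simply stays empty. */
static TAILQ_HEAD(pgqueue, page) queues[NSTATES];

static void
page_set_state(struct page *pg, enum pstate ns)
{
    if (pg->state != WIRED)         /* leave the old state's queue */
        TAILQ_REMOVE(&queues[pg->state], pg, listq);
    pg->state = ns;
    if (ns != WIRED)                /* enter the new state's queue */
        TAILQ_INSERT_TAIL(&queues[ns], pg, listq);
}

int
main(void)
{
    struct page pg = { WIRED, { NULL, NULL } };
    int i;

    for (i = 0; i < NSTATES; i++)
        TAILQ_INIT(&queues[i]);
    page_set_state(&pg, ACTIVE);    /* wired -> tail of active queue */
    page_set_state(&pg, INACTIVE);  /* moved between queues */
    printf("final state: %d\n", pg.state);
    return (0);
}]]></programlisting>
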
    <para>FreeBSD implements a more involved paging queue for cached
      and free pages in order to implement page coloring.  Each of
      these states involves multiple queues arranged according to the
      size of the processor's L1 and L2 caches.  When a new page needs
      to be allocated, FreeBSD attempts to obtain one that is
      reasonably well aligned from the point of view of the L1 and L2
      caches relative to the VM object the page is being allocated
      for.</para>

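    <para>The arithmetic behind a page color can be sketched as
      follows: a page's color is its physical page index modulo the
      number of page-sized slots in the cache, so pages of the same
      color contend for the same cache lines.  The cache and page
      sizes below are assumptions for the example, not FreeBSD's
      actual constants.</para>

    <programlisting><![CDATA[/*
 * Illustrative sketch: a page "color" is the page's index modulo the
 * number of page-sized slots in the cache.  Sizes are assumptions.
 */
#include <stdio.h>

#define PAGE_SHIFT  12                      /* 4 KB pages (assumed) */
#define L2_SIZE     (256 * 1024)            /* 256 KB L2 cache (assumed) */
#define NCOLORS     (L2_SIZE >> PAGE_SHIFT) /* 64 colors */

static unsigned
page_color(unsigned long paddr)
{
    return ((paddr >> PAGE_SHIFT) & (NCOLORS - 1));
}

int
main(void)
{
    /* Pages exactly L2_SIZE bytes apart map to the same color. */
    printf("%u %u %u\n", page_color(0x1000), page_color(0x2000),
        page_color(0x1000 + L2_SIZE));      /* prints: 1 2 1 */
    return (0);
}]]></programlisting>
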
    <para>Additionally, a page may be held with a reference count or
      locked with a busy count.  The VM system also implements an
      <quote>ultimate locked</quote> state for a page using the
      <literal>PG_BUSY</literal> bit in the page's flags.</para>

    <para>In general terms, each of the paging queues operates in an
      LRU fashion.  A page is typically placed in a wired or active
      state initially.  When wired, the page is usually associated
      with a page table somewhere.  The VM system ages the page by
      scanning pages in a more active paging queue (LRU) in order to
      move them to a less-active paging queue.  Pages that get moved
      into the cache are still associated with a VM object but are
      candidates for immediate reuse.  Pages in the free queue are
      truly free.  FreeBSD attempts to minimize the number of pages in
      the free queue, but a certain minimum number of truly free pages
      must be maintained in order to accommodate page allocation at
      interrupt time.</para>

    <para>If a process attempts to access a page that does not exist
      in its page table but does exist in one of the paging queues
      (such as the inactive or cache queues), a relatively inexpensive
      page reactivation fault occurs which causes the page to be
      reactivated.  If the page does not exist in system memory at
      all, the process must block while the page is brought in from
      disk.</para>

    <indexterm><primary>paging queues</primary></indexterm>

    <para>FreeBSD dynamically tunes its paging queues and attempts to
      maintain reasonable ratios of pages in the various queues as
      well as attempts to maintain a reasonable breakdown of clean
      versus dirty pages.  The amount of rebalancing that occurs
      depends on the system's memory load.  This rebalancing is
      implemented by the pageout daemon and involves laundering dirty
      pages (syncing them with their backing store), noticing when
      pages are actively referenced (resetting their position in the
      LRU queues or moving them between queues), migrating pages
      between queues when the queues are out of balance, and so forth.
      FreeBSD's VM system is willing to take a reasonable number of
      reactivation page faults to determine how active or how idle a
      page actually is.  This leads to better decisions being made as
      to when to launder or swap-out a page.</para>
  </sect1>

<sect1 xml:id="vm-cache"> <sect1 xml:id="vm-cache">
<title>The Unified Buffer <title>The Unified Buffer
Cache&mdash;<literal>vm_object_t</literal></title> Cache&mdash;<literal>vm_object_t</literal></title>
<indexterm><primary>unified buffer cache</primary></indexterm> <indexterm><primary>unified buffer cache</primary></indexterm>
<indexterm><primary><literal>vm_object_t</literal> structure</primary></indexterm> <indexterm>
<primary><literal>vm_object_t</literal> structure</primary>
</indexterm>
    <para>FreeBSD implements the idea of a generic
      <quote>VM object</quote>.  VM objects can be associated with
      backing store of various types&mdash;unbacked, swap-backed,
      physical device-backed, or file-backed storage.  Since the
      filesystem uses the same VM objects to manage in-core data
      relating to files, the result is a unified buffer cache.</para>

    <para>VM objects can be <emphasis>shadowed</emphasis>.  That is,
      they can be stacked on top of each other.  For example, you
      might have a swap-backed VM object stacked on top of a
      file-backed VM object in order to implement a
      <literal>MAP_PRIVATE</literal> mmap()ing.  This stacking is also
      used to implement various sharing properties, including
      copy-on-write, for forked address spaces.</para>

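    <para>The following toy model shows how a lookup can walk a
      shadow chain: the first object in the chain that holds the page
      wins, and a private write installs a copy in the top-level
      shadow object.  This is not the kernel's
      <function>vm_fault</function> code; every name and the
      fixed-size page table are assumptions made for the
      example.</para>

    <programlisting><![CDATA[/*
 * Illustrative sketch, not kernel source: page lookup walks a chain
 * of shadow objects until it finds the page.  All names and the toy
 * fixed-size page table are assumptions made for the example.
 */
#include <stdio.h>
#include <stddef.h>

#define NPAGES 4

struct vm_page { char data; };

struct vm_object {
    struct vm_object *backing_object;   /* the object this one shadows */
    struct vm_page *pages[NPAGES];      /* toy resident-page table */
};

static struct vm_page *
shadow_chain_lookup(struct vm_object *obj, size_t pindex)
{
    for (; obj != NULL; obj = obj->backing_object)
        if (pindex < NPAGES && obj->pages[pindex] != NULL)
            return (obj->pages[pindex]); /* first hit wins */
    return (NULL);      /* nothing resident anywhere in the chain */
}

int
main(void)
{
    struct vm_page file_pg = { 'f' }, priv_pg = { 'p' };
    struct vm_object file_obj = { NULL, { &file_pg } };
    struct vm_object shadow = { &file_obj, { NULL } };

    /* Reads fall through the empty shadow to the file object. */
    printf("%c\n", shadow_chain_lookup(&shadow, 0)->data);  /* f */

    /* A private write installs a copy in the shadow (copy-on-write). */
    shadow.pages[0] = &priv_pg;
    printf("%c\n", shadow_chain_lookup(&shadow, 0)->data);  /* p */
    return (0);
}]]></programlisting>
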
    <para>It should be noted that a <literal>vm_page_t</literal> can
      only be associated with one VM object at a time.  The VM object
      shadowing implements the perceived sharing of the same page
      across multiple instances.</para>
  </sect1>

<sect1 xml:id="vm-fileio"> <sect1 xml:id="vm-fileio">
<title>Filesystem I/O&mdash;<literal>struct buf</literal></title> <title>Filesystem I/O&mdash;<literal>struct buf</literal></title>
<indexterm><primary>vnode</primary></indexterm> <indexterm><primary>vnode</primary></indexterm>
    <para>vnode-backed VM objects, such as file-backed objects,
      generally need to maintain their own clean/dirty info
      independent from the VM system's idea of clean/dirty.  For
      example, when the VM system decides to synchronize a physical
      page to its backing store, the VM system needs to mark the page
      clean before the page is actually written to its backing store.
      Additionally, filesystems need to be able to map portions of a
      file or file metadata into KVM in order to operate on it.</para>

    <para>The entities used to manage this are known as filesystem
      buffers, <literal>struct buf</literal>'s, or
      <literal>bp</literal>'s.  When a filesystem needs to operate on
      a portion of a VM object, it typically maps part of the object
      into a struct buf and then maps the pages in the struct buf into
      KVM.  In the same manner, disk I/O is typically issued by
      mapping portions of objects into buffer structures and then
      issuing the I/O on the buffer structures.  The underlying
      <literal>vm_page_t</literal>'s are typically busied for the
      duration of the I/O.  Filesystem buffers also have their own
      notion of being busy, which is useful to filesystem driver code
      which would rather operate on filesystem buffers instead of hard
      VM pages.</para>

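    <para>As a rough sketch of the relationship just described
      (field and function names are assumptions, not the kernel's
      real <literal>struct buf</literal>), a buffer can carry its own
      busy flag while busying the underlying pages for the duration
      of the I/O:</para>

    <programlisting><![CDATA[/* Illustrative sketch only; names are assumptions, not kernel source. */
#include <stddef.h>

#define MAXPAGES 16

struct vm_page {
    int busy;                           /* page-level busy count */
};

struct buf {
    int b_busy;                         /* buffer-level busy flag */
    size_t b_npages;
    struct vm_page *b_pages[MAXPAGES];  /* pages mapped into KVM */
};

/* Busy the underlying pages for the duration of the buffer's I/O. */
static void
buf_start_io(struct buf *bp)
{
    size_t i;

    bp->b_busy = 1;
    for (i = 0; i < bp->b_npages; i++)
        bp->b_pages[i]->busy++;
}

static void
buf_finish_io(struct buf *bp)
{
    size_t i;

    for (i = 0; i < bp->b_npages; i++)
        bp->b_pages[i]->busy--;
    bp->b_busy = 0;
}

int
main(void)
{
    struct vm_page pg = { 0 };
    struct buf bp = { 0, 1, { &pg } };

    buf_start_io(&bp);  /* pg.busy == 1 while I/O is in flight */
    buf_finish_io(&bp); /* pg.busy back to 0 */
    return (pg.busy);
}]]></programlisting>
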
    <para>FreeBSD reserves a limited amount of KVM to hold mappings
      from struct bufs, but it should be made clear that this KVM is
      used solely to hold mappings and does not limit the ability to
      cache data.  Physical data caching is strictly a function of
      <literal>vm_page_t</literal>'s, not filesystem buffers.
      However, since filesystem buffers are used as placeholders for
      I/O, they do inherently limit the amount of concurrent I/O
      possible.  As there are usually a few thousand filesystem
      buffers available, this is not usually a problem.</para>
  </sect1>

<sect1 xml:id="vm-pagetables"> <sect1 xml:id="vm-pagetables">
<title>Mapping Page Tables&mdash;<literal>vm_map_t, vm_entry_t</literal></title> <title>Mapping Page Tables&mdash;<literal>vm_map_t,
vm_entry_t</literal></title>
<indexterm><primary>page tables</primary></indexterm> <indexterm><primary>page tables</primary></indexterm>
    <para>FreeBSD separates the physical page table topology from the
      VM system.  All hard per-process page tables can be
      reconstructed on the fly and are usually considered throwaway.
      Special page tables such as those managing KVM are typically
      permanently preallocated.  These page tables are not
      throwaway.</para>

    <para>FreeBSD associates portions of vm_objects with address
      ranges in virtual memory through <literal>vm_map_t</literal> and
      <literal>vm_entry_t</literal> structures.  Page tables are
      directly synthesized from the
      <literal>vm_map_t</literal>/<literal>vm_entry_t</literal>/
      <literal>vm_object_t</literal> hierarchy.  Recall that I
      mentioned that physical pages are only directly associated with
      a <literal>vm_object</literal>; that is not quite true.
      <literal>vm_page_t</literal>'s are also linked into page tables
      that they are actively associated with.  One
      <literal>vm_page_t</literal> can be linked into several
      <emphasis>pmaps</emphasis>, as page tables are called.  However,
      the hierarchical association holds, so all references to the
      same page in the same object reference the same
      <literal>vm_page_t</literal> and thus give us buffer cache
      unification across the board.</para>

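    <para>A simplified model of this many-to-one relationship follows;
      the names are assumptions loosely modeled on the
      <quote>pv entry</quote> idea, not the kernel's actual pmap
      code.  One page keeps a list of every pmap (and virtual address)
      that currently maps it:</para>

    <programlisting><![CDATA[/*
 * Illustrative sketch only: one vm_page_t can be entered into several
 * pmaps, each entry recording where.  Names are assumptions.
 */
#include <sys/queue.h>
#include <stdio.h>

struct pmap { const char *name; };

struct pv_entry {                       /* one mapping of the page */
    struct pmap *pmap;
    unsigned long va;
    SLIST_ENTRY(pv_entry) link;
};

struct vm_page {
    SLIST_HEAD(, pv_entry) pv_list;     /* all pmaps mapping this page */
};

int
main(void)
{
    struct pmap p1 = { "proc1" }, p2 = { "proc2" };
    struct pv_entry e1 = { &p1, 0x1000 }, e2 = { &p2, 0x7000 };
    struct vm_page pg = { SLIST_HEAD_INITIALIZER(pg.pv_list) };
    struct pv_entry *pv;

    /* The same physical page, mapped by two address spaces. */
    SLIST_INSERT_HEAD(&pg.pv_list, &e1, link);
    SLIST_INSERT_HEAD(&pg.pv_list, &e2, link);
    SLIST_FOREACH(pv, &pg.pv_list, link)
        printf("%s maps the page at 0x%lx\n", pv->pmap->name, pv->va);
    return (0);
}]]></programlisting>
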
  </sect1>

<sect1 xml:id="vm-kvm"> <sect1 xml:id="vm-kvm">
<title>KVM Memory Mapping</title> <title>KVM Memory Mapping</title>
<para>FreeBSD uses KVM to hold various kernel structures. The single <para>FreeBSD uses KVM to hold various kernel structures. The
largest entity held in KVM is the filesystem buffer cache. That is, single largest entity held in KVM is the filesystem buffer
mappings relating to <literal>struct buf</literal> entities.</para> cache. That is, mappings relating to
<literal>struct buf</literal> entities.</para>
    <para>Unlike Linux, FreeBSD does <emphasis>not</emphasis> map all
      of physical memory into KVM.  This means that FreeBSD can handle
      memory configurations up to 4 GB on 32-bit platforms.  In fact,
      if the MMU were capable of it, FreeBSD could theoretically
      handle memory configurations up to 8 TB on a 32-bit platform.
      However, since most 32-bit platforms are only capable of mapping
      4 GB of RAM, this is a moot point.</para>

    <para>KVM is managed through several mechanisms.  The main
      mechanism used to manage KVM is the
      <emphasis>zone allocator</emphasis>.  The zone allocator takes a
      chunk of KVM and splits it up into constant-sized blocks of
      memory in order to allocate a specific type of structure.  You
      can use <command>vmstat -m</command> to get an overview of
      current KVM utilization broken down by zone.</para>

  </sect1>

<sect1 xml:id="vm-tuning"> <sect1 xml:id="vm-tuning">
<title>Tuning the FreeBSD VM System</title> <title>Tuning the FreeBSD VM System</title>
<para>A concerted effort has been made to make the FreeBSD kernel <para>A concerted effort has been made to make the FreeBSD kernel
dynamically tune itself. Typically you do not need to mess with dynamically tune itself. Typically you do not need to mess with
anything beyond the <option>maxusers</option> and anything beyond the <option>maxusers</option> and
<option>NMBCLUSTERS</option> kernel config options. That is, kernel <option>NMBCLUSTERS</option> kernel config options. That is,
compilation options specified in (typically) kernel compilation options specified in (typically)
<filename>/usr/src/sys/i386/conf/<replaceable>CONFIG_FILE</replaceable></filename>. <filename>/usr/src/sys/i386/conf/<replaceable>CONFIG_FILE</replaceable></filename>.
A description of all available kernel configuration options can be A description of all available kernel configuration options can
found in <filename>/usr/src/sys/i386/conf/LINT</filename>.</para> be found in
<filename>/usr/src/sys/i386/conf/LINT</filename>.</para>
    <para>In a large system configuration you may wish to increase
      <option>maxusers</option>.  Values typically range from 10 to
      128.  Note that raising <option>maxusers</option> too high can
      cause the system to overflow available KVM, resulting in
      unpredictable operation.  It is better to leave
      <option>maxusers</option> at some reasonable number and add
      other options, such as <option>NMBCLUSTERS</option>, to increase
      specific resources.</para>

    <para>If your system is going to use the network heavily, you may
      want to increase <option>NMBCLUSTERS</option>.  Typical values
      range from 1024 to 4096.</para>

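    <para>As a purely illustrative kernel configuration fragment (the
      values are examples within the ranges mentioned above, not
      recommendations for any particular machine):</para>

    <programlisting>maxusers	64
options 	NMBCLUSTERS=4096	# mbuf clusters for heavy networking</programlisting>
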
    <para>The <literal>NBUF</literal> parameter is also traditionally
      used to scale the system.  This parameter determines the amount
      of KVA the system can use to map filesystem buffers for I/O.
      Note that this parameter has nothing whatsoever to do with the
      unified buffer cache!  This parameter is dynamically tuned in
      3.0-CURRENT and later kernels and should generally not be
      adjusted manually.  We recommend that you
      <emphasis>not</emphasis> try to specify an
      <literal>NBUF</literal> parameter.  Let the system pick it.  Too
      small a value can result in extremely inefficient filesystem
      operation while too large a value can starve the page queues by
      causing too many pages to become wired down.</para>

    <para>By default, FreeBSD kernels are not optimized.  You can set
      debugging and optimization flags with the
      <literal>makeoptions</literal> directive in the kernel
      configuration.  Note that you should not use <option>-g</option>
      unless you can accommodate the large (typically 7 MB+) kernels
      that result.</para>

    <programlisting>makeoptions	DEBUG="-g"
makeoptions	COPTFLAGS="-O -pipe"</programlisting>

    <para>Sysctl provides a way to tune kernel parameters at run-time.
      You typically do not need to mess with any of the sysctl
      variables, especially the VM-related ones.</para>

    <para>Run-time VM and system tuning is relatively straightforward.
      First, use Soft Updates on your UFS/FFS filesystems whenever
      possible.
      <filename>/usr/src/sys/ufs/ffs/README.softupdates</filename>
      contains instructions (and restrictions) on how to configure
      it.</para>

    <indexterm><primary>swap partition</primary></indexterm>

    <para>Second, configure sufficient swap.  You should have a swap
      partition configured on each physical disk, up to four, even on
      your <quote>work</quote> disks.  You should have at least twice
      as much swap space as you have main memory, and possibly even
      more if you do not have a lot of memory.  You should also size
      your swap partition based on the maximum memory configuration
      you ever intend to put on the machine so you do not have to
      repartition your disks later on.  If you want to be able to
      accommodate a crash dump, your first swap partition must be at
      least as large as main memory and <filename>/var/crash</filename>
      must have sufficient free space to hold the dump.</para>

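    <para>As a hypothetical worked example: a machine with 4 GB of
      main memory and two disks might carry a 4 GB swap partition on
      each, giving 8 GB of swap in total and satisfying the 2x
      guideline.  The device names in the
      <filename>/etc/fstab</filename> fragment below are assumptions,
      not a prescription:</para>

    <programlisting># Hypothetical fstab entries: 4 GB of swap on each of two disks.
/dev/da0b	none	swap	sw	0	0
/dev/da1b	none	swap	sw	0	0</programlisting>
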
    <para>NFS-based swap is perfectly acceptable on 4.X or later
      systems, but you must be aware that the NFS server will take the
      brunt of the paging load.</para>
  </sect1>
</chapter>