s/FreeBSD/&os;/

Suggested by:   Benjamin Lukas (qavvap att googlemail dott com)
Johann Kois 2010-06-17 09:19:58 +00:00
parent 3a6c136eee
commit 77f5bcecff
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=35896


@ -8,7 +8,7 @@
<article>
<articleinfo>
<title>Design elements of the FreeBSD VM system</title>
<title>Design elements of the &os; VM system</title>
<authorgroup>
<author>
@ -36,7 +36,7 @@
<para>The title is really just a fancy way of saying that I am going to
attempt to describe the whole VM enchilada, hopefully in a way that
everyone can follow. For the last year I have concentrated on a number
of major kernel subsystems within FreeBSD, with the VM and Swap
of major kernel subsystems within &os;, with the VM and Swap
subsystems being the most interesting and NFS being <quote>a necessary
chore</quote>. I rewrote only small portions of the code. In the VM
arena the only major rewrite I have done is to the swap subsystem.
@ -53,7 +53,7 @@
<para>This article was originally published in the January 2000 issue of
<ulink url="http://www.daemonnews.org/">DaemonNews</ulink>. This
version of the article may include updates from Matt and other authors
to reflect changes in FreeBSD's VM implementation.</para>
to reflect changes in &os;'s VM implementation.</para>
</legalnotice>
</articleinfo>
@ -71,7 +71,7 @@
operating system by some people, those of us who work on it tend to view
it more as a <quote>mature</quote> codebase which has various components
modified, extended, or replaced with modern code. It has evolved, and
FreeBSD is at the bleeding edge no matter how old some of the code might
&os; is at the bleeding edge no matter how old some of the code might
be. This is an important distinction to make and one that is
unfortunately lost to many people. The biggest error a programmer can
make is to not learn from history, and this is precisely the error that
@ -89,13 +89,13 @@
right because our marketing department says so</quote>. I have little
tolerance for anyone who cannot learn from history.</para>
<para>Much of the apparent complexity of the FreeBSD design, especially in
<para>Much of the apparent complexity of the &os; design, especially in
the VM/Swap subsystem, is a direct result of having to solve serious
performance issues that occur under various conditions. These issues
are not due to bad algorithmic design but instead arise from
environmental factors. In any direct comparison between platforms,
these issues become most apparent when system resources begin to get
stressed. As I describe FreeBSD's VM/Swap subsystem the reader should
stressed. As I describe &os;'s VM/Swap subsystem the reader should
always keep two points in mind. First, the most important aspect of
performance design is what is known as <quote>Optimizing the Critical
Path</quote>. It is often the case that performance optimizations add a
@ -117,7 +117,7 @@
<sect1 id="vm-objects">
<title>VM Objects</title>
<para>The best way to begin describing the FreeBSD VM system is to look at
<para>The best way to begin describing the &os; VM system is to look at
it from the perspective of a user-level process. Each user process sees
a single, private, contiguous VM address space containing several types
of memory objects. These objects have various characteristics. Program
@ -157,7 +157,7 @@
(parent and child) expects their own personal post-fork modifications to
remain private to themselves and not affect the other.</para>
<para>FreeBSD manages all of this with a layered VM Object model. The
<para>&os; manages all of this with a layered VM Object model. The
original binary program file winds up being the lowest VM Object layer.
A copy-on-write layer is pushed on top of that to hold those pages which
had to be copied from the original file. If the program modifies a data
@ -235,7 +235,7 @@
The original page in B is now completely hidden since both C1 and C2
have a copy and B could theoretically be destroyed if it does not
represent a <quote>real</quote> file; however, this sort of optimization is not
trivial to make because it is so fine-grained. FreeBSD does not make
trivial to make because it is so fine-grained. &os; does not make
this optimization. Now, suppose (as is often the case) that the child
process does an <function>exec()</function>. Its current address space
is usually replaced by a new address space representing a new file. In
@ -274,7 +274,7 @@
get their own private copies of the page and the original page in B is
no longer accessible by anyone. That page in B can be freed.</para>
<para>FreeBSD solves the deep layering problem with a special optimization
<para>&os; solves the deep layering problem with a special optimization
called the <quote>All Shadowed Case</quote>. This case occurs if either
C1 or C2 takes sufficient COW faults to completely shadow all pages in B.
Let us say that C1 achieves this. C1 can now bypass B entirely, so rather
@ -303,7 +303,7 @@
copying need take place. The disadvantage is that you can build a
relatively complex VM Object layering that slows page fault handling
down a little, and you spend memory managing the VM Object structures.
The optimizations FreeBSD makes prove to reduce the problems enough
The optimizations &os; makes prove to reduce the problems enough
that they can be ignored, leaving no real disadvantage.</para>
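<para>The layering described in this section can be sketched in a few
lines of Python. This is only an illustrative model, not kernel code:
the class <literal>VMObject</literal>, its methods, and the page
contents are hypothetical names chosen for the example. Only the layer
names B, C1, and C2 come from the text. A lookup walks down the chain
until some layer holds the page, writes always land in the topmost
layer, and a layer that holds every page of its backing layer is in the
position the text calls the <quote>All Shadowed Case</quote>.</para>

```python
class VMObject:
    """One layer in a hypothetical shadow chain (illustrative names only)."""

    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing      # next layer down, or None for the lowest layer
        self.pages = {}             # page index -> page contents held by this layer

    def lookup(self, index):
        """Walk down the chain and return (layer, contents) for the first hit."""
        layer = self
        while layer is not None:
            if index in layer.pages:
                return layer, layer.pages[index]
            layer = layer.backing
        return None, None

    def write(self, index, contents):
        """Copy-on-write: a modification always lands in the topmost layer."""
        self.pages[index] = contents

    def all_shadowed(self):
        """True if this layer holds every page that its backing layer holds."""
        return self.backing is not None and all(
            i in self.pages for i in self.backing.pages
        )


# The program file is the lowest layer; B is the copy-on-write layer above it.
program_file = VMObject("file")
program_file.pages = {0: "text", 1: "data"}
b = VMObject("B", backing=program_file)
b.write(1, "data modified before fork")

# After the fork, parent and child each push a private layer on top of B.
c1 = VMObject("C1", backing=b)
c2 = VMObject("C2", backing=b)
c1.write(1, "private copy made by the parent")

found_in_c1 = c1.lookup(1)[0].name   # the parent sees its own copy
found_in_c2 = c2.lookup(1)[0].name   # the child still sees the page in B
c1_shadows_b = c1.all_shadowed()     # C1 now covers every page held by B
```

<para>In this model C1 could be relinked directly to the layer below B
once <literal>all_shadowed()</literal> is true, which is the kind of
bypass the text describes. The sketch deliberately omits reference
counting and page freeing.</para>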
</sect1>
@ -315,12 +315,12 @@
backing object (usually a file) can no longer be used to save a copy of
the page when the VM system needs to reuse it for other purposes. This
is where SWAP comes in. SWAP is allocated to create backing store for
memory that does not otherwise have it. FreeBSD allocates the swap
memory that does not otherwise have it. &os; allocates the swap
management structure for a VM Object only when it is actually needed.
However, the swap management structure has had problems
historically.</para>
<para>Under FreeBSD 3.X the swap management structure preallocates an
<para>Under &os; 3.X the swap management structure preallocates an
array that encompasses the entire object requiring swap backing
store&mdash;even if only a few pages of that object are swap-backed.
This creates a kernel memory fragmentation problem when large objects
@ -337,7 +337,7 @@
fly for additional swap management structures when a swapout occurs. It
is evident that there was plenty of room for improvement.</para>
<para>For FreeBSD 4.X, I completely rewrote the swap subsystem. With this
<para>For &os; 4.X, I completely rewrote the swap subsystem. With this
rewrite, swap management structures are allocated through a hash table
rather than a linear array, giving them a fixed allocation size and much
finer granularity. Rather than using a linearly linked list to keep
@ -373,7 +373,7 @@
hundreds of thousands of CPU cycles and a noticeable stall of the
affected processes, so we are willing to endure a significant amount of
overhead in order to be sure that the right page is chosen. This is why
FreeBSD tends to outperform other systems when memory resources become
&os; tends to outperform other systems when memory resources become
stressed.</para>
<para>The free page determination algorithm is built upon a history of the
@ -403,10 +403,10 @@
then have to go to disk.</para>
</sidebar>
<para>FreeBSD makes use of several page queues to further refine the
<para>&os; makes use of several page queues to further refine the
selection of pages to reuse as well as to determine when dirty pages
must be flushed to their backing store. Since page tables are dynamic
entities under FreeBSD, it costs virtually nothing to unmap a page from
entities under &os;, it costs virtually nothing to unmap a page from
the address space of any processes using it. When a page candidate has
been chosen based on the page-use counter, this is precisely what is
done. The system must make a distinction between clean pages which can
@ -423,7 +423,7 @@
in an LRU (least-recently used) fashion when the system needs to
allocate new memory.</para>
<para>It is important to note that the FreeBSD VM system attempts to
<para>It is important to note that the &os; VM system attempts to
separate clean and dirty pages for the express reason of avoiding
unnecessary flushes of dirty pages (which eats I/O bandwidth), nor does
it move pages between the various page queues gratuitously when the
@ -433,8 +433,8 @@
becomes more stressed, it makes a greater effort to maintain the various
page queues at the levels determined to be the most effective. An urban
myth has circulated for years that Linux did a better job avoiding
swapouts than FreeBSD, but this in fact is not true. What was actually
occurring was that FreeBSD was proactively paging out unused pages in
swapouts than &os;, but this in fact is not true. What was actually
occurring was that &os; was proactively paging out unused pages in
order to make room for more disk cache while Linux was keeping unused
pages in core and leaving less memory available for cache and process
pages. I do not know whether this is still true today.</para>
@ -451,9 +451,9 @@
not mapped into the page table, then all the pages that will be accessed
by the program will have to be faulted in every time the program is run.
This is unnecessary when the pages in question are already in the VM
Cache, so FreeBSD will attempt to pre-populate a process's page tables
Cache, so &os; will attempt to pre-populate a process's page tables
with those pages that are already in the VM Cache. One thing that
FreeBSD does not yet do is pre-copy-on-write certain pages on exec. For
&os; does not yet do is pre-copy-on-write certain pages on exec. For
example, if you run the &man.ls.1; program while running <command>vmstat
1</command> you will notice that it always takes a certain number of
page faults, even when you run it over and over again. These are
@ -480,7 +480,7 @@
<title>Page Table Optimizations</title>
<para>The page table optimizations make up the most contentious part of
the FreeBSD VM design and they have shown some strain with the advent of
the &os; VM design and they have shown some strain with the advent of
serious use of <function>mmap()</function>. I think this is actually a
feature of most BSDs though I am not sure when it was first introduced.
There are two major optimizations. The first is that hardware page
@ -488,23 +488,23 @@
any time with only a minor amount of management overhead. The second is
that every active page table entry in the system has a governing
<literal>pv_entry</literal> structure which is tied into the
<literal>vm_page</literal> structure. FreeBSD can simply iterate
<literal>vm_page</literal> structure. &os; can simply iterate
through those mappings that are known to exist while Linux must check
all page tables that <emphasis>might</emphasis> contain a specific
mapping to see if it does, which can achieve O(n^2) overhead in certain
situations. It is because of this that FreeBSD tends to make better
situations. It is because of this that &os; tends to make better
choices on which pages to reuse or swap when memory is stressed, giving
it better performance under load. However, FreeBSD requires kernel
it better performance under load. However, &os; requires kernel
tuning to accommodate large-shared-address-space situations such as
those that can occur in a news system because it may run out of
<literal>pv_entry</literal> structures.</para>
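<para>The difference between iterating over known mappings and checking
every page table that might hold one can be illustrated with a toy
model. Everything here is an assumption made for the example: the
process names, the dictionary layout, and the helper functions are
hypothetical and do not correspond to kernel data structures. The
reverse map plays the role the text assigns to
<literal>pv_entry</literal>.</para>

```python
from collections import defaultdict

# Hypothetical model: each process has a page table, virtual page -> physical page.
page_tables = {
    "p1": {0: 100, 1: 101},
    "p2": {0: 100},
    "p3": {5: 200},
}

# pv_entry-style reverse map: physical page -> list of (process, virtual page).
pv_entries = defaultdict(list)
for proc, table in page_tables.items():
    for vpage, ppage in table.items():
        pv_entries[ppage].append((proc, vpage))


def find_by_scanning(ppage):
    """Without a reverse map, every page table that might map the page is checked."""
    checked = 0
    hits = []
    for proc, table in page_tables.items():
        checked += 1
        for vpage, mapped in table.items():
            if mapped == ppage:
                hits.append((proc, vpage))
    return checked, hits


def unmap_with_pv(ppage):
    """Visit only the mappings that are known to exist for this physical page."""
    visited = 0
    for proc, vpage in pv_entries.pop(ppage, []):
        del page_tables[proc][vpage]
        visited += 1
    return visited


tables_checked, mappings_found = find_by_scanning(100)
entries_visited = unmap_with_pv(100)
```

<para>The scan touches all three page tables to find two mappings, while
the reverse map touches exactly the two entries that exist. The cost is
one extra record per active mapping, which is the resource the text says
can run out under massive sharing.</para>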
<para>Both Linux and FreeBSD need work in this area. FreeBSD is trying to
<para>Both Linux and &os; need work in this area. &os; is trying to
maximize the advantage of a potentially sparse active-mapping model (not
all processes need to map all pages of a shared library, for example),
whereas Linux is trying to simplify its algorithms. FreeBSD generally
whereas Linux is trying to simplify its algorithms. &os; generally
has the performance advantage here at the cost of wasting a little extra
memory, but FreeBSD breaks down in the case where a large file is
memory, but &os; breaks down in the case where a large file is
massively shared across hundreds of processes. Linux, on the other hand,
breaks down in the case where many processes are sparsely-mapping the
same shared library and also runs non-optimally when trying to determine
@ -530,7 +530,7 @@
even with multi-way set-associative caches (though the effect is
mitigated somewhat).</para>
<para>FreeBSD's memory allocation code implements page coloring
<para>&os;'s memory allocation code implements page coloring
optimizations, which means that the memory allocation code will attempt
to locate free pages that are contiguous from the point of view of the
cache. For example, if page 16 of physical memory is assigned to page 0
@ -554,7 +554,7 @@
modular and algorithmic approach that BSD has historically taken allows
us to study and understand the current implementation as well as
relatively cleanly replace large sections of the code. There have been a
number of improvements to the FreeBSD VM system in the last several
number of improvements to the &os; VM system in the last several
years, and work is ongoing.</para>
</sect1>
@ -566,23 +566,23 @@
<qandaentry>
<question>
<para>What is <quote>the interleaving algorithm</quote> that you
refer to in your listing of the ills of the FreeBSD 3.X swap
refer to in your listing of the ills of the &os; 3.X swap
arrangements?</para>
</question>
<answer>
<para>FreeBSD uses a fixed swap interleave which defaults to 4. This
means that FreeBSD reserves space for four swap areas even if you
<para>&os; uses a fixed swap interleave which defaults to 4. This
means that &os; reserves space for four swap areas even if you
only have one, two, or three. Since swap is interleaved the linear
address space representing the <quote>four swap areas</quote> will be
fragmented if you do not actually have four swap areas. For
example, if you have two swap areas A and B FreeBSD's address
example, if you have two swap areas A and B &os;'s address
space representation for that swap area will be interleaved in
blocks of 16 pages:</para>
<literallayout>A B C D A B C D A B C D A B C D</literallayout>
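<para>The fragmentation this layout causes can be worked through with a
short sketch. The interleave of 4 and the 16-page block size come from
the answer above. The exact mapping from a linear swap page to an
offset within a swap area, and the names <literal>swap_slot</literal>
and <literal>usable_runs</literal>, are assumptions made for
illustration only.</para>

```python
STRIPE_PAGES = 16      # pages per interleave block, from the text
INTERLEAVE = 4         # fixed number of swap area slots, from the text
SLOTS = "ABCD"


def swap_slot(page):
    """Return (slot letter, page offset inside that area) for a linear swap page."""
    block, within = divmod(page, STRIPE_PAGES)
    stripe, slot = divmod(block, INTERLEAVE)
    return SLOTS[slot], stripe * STRIPE_PAGES + within


# Only two swap areas are actually configured in this example.
configured = {"A", "B"}


def usable_runs(total_pages):
    """List (start, length) runs of linear swap pages that land on configured areas."""
    runs = []
    start = None
    for page in range(total_pages):
        usable = swap_slot(page)[0] in configured
        if usable and start is None:
            start = page
        elif not usable and start is not None:
            runs.append((start, page - start))
            start = None
    if start is not None:
        runs.append((start, total_pages - start))
    return runs


# Two full passes over the A B C D pattern: 8 blocks of 16 pages.
runs = usable_runs(128)
```

<para>With only A and B present, the 128-page linear range splits into
two usable runs of 32 pages separated by holes where C and D would be,
so the usable space can never be described as a single contiguous
region.</para>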
<para>FreeBSD 3.X uses a <quote>sequential list of free
<para>&os; 3.X uses a <quote>sequential list of free
regions</quote> approach to accounting for the free swap areas.
The idea is that large blocks of free linear space can be
represented with a single list node
@ -626,7 +626,7 @@
<para>I do not get the following:</para>
<blockquote>
<para>It is important to note that the FreeBSD VM system attempts
<para>It is important to note that the &os; VM system attempts
to separate clean and dirty pages for the express reason of
avoiding unnecessary flushes of dirty pages (which eats I/O
bandwidth), nor does it move pages between the various page
@ -649,7 +649,7 @@
separate the pages but the reality is that if we are not in a
memory crunch, we do not really have to.</para>
<para>What this means is that FreeBSD will not try very hard to
<para>What this means is that &os; will not try very hard to
separate out dirty pages (inactive queue) from clean pages (cache
queue) when the system is not being stressed, nor will it try to
deactivate pages (active queue -> inactive queue) when the system
@ -663,14 +663,14 @@
would not some of the page faults be data page faults (COW from
executable file to private page)? I.e., I would expect the page
faults to be some zero-fill and some program data. Or are you
implying that FreeBSD does do pre-COW for the program data?</para>
implying that &os; does do pre-COW for the program data?</para>
</question>
<answer>
<para>A COW fault can be either zero-fill or program-data. The
mechanism is the same either way because the backing program-data
is almost certainly already in the cache. I am indeed lumping the
two together. FreeBSD does not pre-COW program data or zero-fill,
two together. &os; does not pre-COW program data or zero-fill,
but it <emphasis>does</emphasis> pre-map pages that exist in its
cache.</para>
</answer>
@ -685,7 +685,7 @@
McKusick, Bostic, Karel, Quarterman)? Specifically, what kind of
operation/reaction would require scanning the mappings?</para>
<para>How does Linux do in the case where FreeBSD breaks down
<para>How does Linux do in the case where &os; breaks down
(sharing a large file mapping over many processes)?</para>
</question>
@ -717,7 +717,7 @@
index into the page table for each of those 50 processes even if
only 10 of them have actually mapped the page. So Linux is
trading off the simplicity of its design against performance.
Many VM algorithms which are O(1) or (small N) under FreeBSD wind
Many VM algorithms which are O(1) or (small N) under &os; wind
up being O(N), O(N^2), or worse under Linux. Since the pte's
representing a particular page in an object tend to be at the same
offset in all the page tables they are mapped in, reducing the
@ -725,12 +725,12 @@
will often avoid blowing away the L1 cache line for that offset,
which can lead to better performance.</para>
<para>FreeBSD has added complexity (the <literal>pv_entry</literal>
<para>&os; has added complexity (the <literal>pv_entry</literal>
scheme) in order to increase performance (to limit page table
accesses to <emphasis>only</emphasis> those pte's that need to be
modified).</para>
<para>But FreeBSD has a scaling problem that Linux does not in that
<para>But &os; has a scaling problem that Linux does not in that
there are a limited number of <literal>pv_entry</literal>
structures and this causes problems when you have massive sharing
of data. In this case you may run out of
@ -744,10 +744,10 @@
<literal>pv_entry</literal> scheme: Linux uses
<quote>permanent</quote> page tables that are not throw away, but
does not need a <literal>pv_entry</literal> for each potentially
mapped pte. FreeBSD uses <quote>throw away</quote> page tables but
mapped pte. &os; uses <quote>throw away</quote> page tables but
adds in a <literal>pv_entry</literal> structure for each
actually-mapped pte. I think memory utilization winds up being
about the same, giving FreeBSD an algorithmic advantage with its
about the same, giving &os; an algorithmic advantage with its
ability to throw away page tables at will with very low
overhead.</para>
</answer>