s/FreeBSD/&os;/
Suggested by: Benjamin Lukas (qavvap att googlemail dott com)
parent 3a6c136eee
commit 77f5bcecff
Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=35896
1 changed file with 46 additions and 46 deletions
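The subject line reads as a sed-style substitution. Taken literally in sed, an unescaped `&` in the replacement stands for the matched text, so the subject is probably best read as shorthand for replacing the literal string `FreeBSD` with the DocBook entity `&os;`. A minimal Python sketch of that replacement follows, assuming a plain string substitution; the helper name `replace_product_name` is hypothetical, and the commit does not say which tool was actually used.

```python
# Illustrative sketch only: the function name and the sample string handling
# are assumptions, not the tooling used for this commit.

def replace_product_name(text: str) -> str:
    """Swap the literal string "FreeBSD" for the DocBook entity "&os;"."""
    # str.replace treats "&" as an ordinary character, so no escaping is needed.
    return text.replace("FreeBSD", "&os;")


if __name__ == "__main__":
    before = "<title>Design elements of the FreeBSD VM system</title>"
    print(replace_product_name(before))
```

Applied to the old side of each hunk below, a replacement of this kind would yield the new side, including lines where the name occurs twice.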
@@ -8,7 +8,7 @@
 
 <article>
 <articleinfo>
-<title>Design elements of the FreeBSD VM system</title>
+<title>Design elements of the &os; VM system</title>
 
 <authorgroup>
 <author>
@@ -36,7 +36,7 @@
 <para>The title is really just a fancy way of saying that I am going to
 attempt to describe the whole VM enchilada, hopefully in a way that
 everyone can follow. For the last year I have concentrated on a number
-of major kernel subsystems within FreeBSD, with the VM and Swap
+of major kernel subsystems within &os;, with the VM and Swap
 subsystems being the most interesting and NFS being <quote>a necessary
 chore</quote>. I rewrote only small portions of the code. In the VM
 arena the only major rewrite I have done is to the swap subsystem.
@@ -53,7 +53,7 @@
 <para>This article was originally published in the January 2000 issue of
 <ulink url="http://www.daemonnews.org/">DaemonNews</ulink>. This
 version of the article may include updates from Matt and other authors
-to reflect changes in FreeBSD's VM implementation.</para>
+to reflect changes in &os;'s VM implementation.</para>
 </legalnotice>
 </articleinfo>
 
@@ -71,7 +71,7 @@
 operating system by some people, those of us who work on it tend to view
 it more as a <quote>mature</quote> codebase which has various components
 modified, extended, or replaced with modern code. It has evolved, and
-FreeBSD is at the bleeding edge no matter how old some of the code might
+&os; is at the bleeding edge no matter how old some of the code might
 be. This is an important distinction to make and one that is
 unfortunately lost to many people. The biggest error a programmer can
 make is to not learn from history, and this is precisely the error that
@@ -89,13 +89,13 @@
 right because our marketing department says so</quote>. I have little
 tolerance for anyone who cannot learn from history.</para>
 
-<para>Much of the apparent complexity of the FreeBSD design, especially in
+<para>Much of the apparent complexity of the &os; design, especially in
 the VM/Swap subsystem, is a direct result of having to solve serious
 performance issues that occur under various conditions. These issues
 are not due to bad algorithmic design but instead rise from
 environmental factors. In any direct comparison between platforms,
 these issues become most apparent when system resources begin to get
-stressed. As I describe FreeBSD's VM/Swap subsystem the reader should
+stressed. As I describe &os;'s VM/Swap subsystem the reader should
 always keep two points in mind. First, the most important aspect of
 performance design is what is known as <quote>Optimizing the Critical
 Path</quote>. It is often the case that performance optimizations add a
@@ -117,7 +117,7 @@
 <sect1 id="vm-objects">
 <title>VM Objects</title>
 
-<para>The best way to begin describing the FreeBSD VM system is to look at
+<para>The best way to begin describing the &os; VM system is to look at
 it from the perspective of a user-level process. Each user process sees
 a single, private, contiguous VM address space containing several types
 of memory objects. These objects have various characteristics. Program
@@ -157,7 +157,7 @@
 (parent and child) expects their own personal post-fork modifications to
 remain private to themselves and not effect the other.</para>
 
-<para>FreeBSD manages all of this with a layered VM Object model. The
+<para>&os; manages all of this with a layered VM Object model. The
 original binary program file winds up being the lowest VM Object layer.
 A copy-on-write layer is pushed on top of that to hold those pages which
 had to be copied from the original file. If the program modifies a data
@@ -235,7 +235,7 @@
 The original page in B is now completely hidden since both C1 and C2
 have a copy and B could theoretically be destroyed if it does not
 represent a <quote>real</quote> file; however, this sort of optimization is not
-trivial to make because it is so fine-grained. FreeBSD does not make
+trivial to make because it is so fine-grained. &os; does not make
 this optimization. Now, suppose (as is often the case) that the child
 process does an <function>exec()</function>. Its current address space
 is usually replaced by a new address space representing a new file. In
@@ -274,7 +274,7 @@
 get their own private copies of the page and the original page in B is
 no longer accessible by anyone. That page in B can be freed.</para>
 
-<para>FreeBSD solves the deep layering problem with a special optimization
+<para>&os; solves the deep layering problem with a special optimization
 called the <quote>All Shadowed Case</quote>. This case occurs if either
 C1 or C2 take sufficient COW faults to completely shadow all pages in B.
 Lets say that C1 achieves this. C1 can now bypass B entirely, so rather
@@ -303,7 +303,7 @@
 copying need take place. The disadvantage is that you can build a
 relatively complex VM Object layering that slows page fault handling
 down a little, and you spend memory managing the VM Object structures.
-The optimizations FreeBSD makes proves to reduce the problems enough
+The optimizations &os; makes proves to reduce the problems enough
 that they can be ignored, leaving no real disadvantage.</para>
 </sect1>
 
@@ -315,12 +315,12 @@
 backing object (usually a file) can no longer be used to save a copy of
 the page when the VM system needs to reuse it for other purposes. This
 is where SWAP comes in. SWAP is allocated to create backing store for
-memory that does not otherwise have it. FreeBSD allocates the swap
+memory that does not otherwise have it. &os; allocates the swap
 management structure for a VM Object only when it is actually needed.
 However, the swap management structure has had problems
 historically.</para>
 
-<para>Under FreeBSD 3.X the swap management structure preallocates an
+<para>Under &os; 3.X the swap management structure preallocates an
 array that encompasses the entire object requiring swap backing
 store—even if only a few pages of that object are swap-backed.
 This creates a kernel memory fragmentation problem when large objects
@@ -337,7 +337,7 @@
 fly for additional swap management structures when a swapout occurs. It
 is evident that there was plenty of room for improvement.</para>
 
-<para>For FreeBSD 4.X, I completely rewrote the swap subsystem. With this
+<para>For &os; 4.X, I completely rewrote the swap subsystem. With this
 rewrite, swap management structures are allocated through a hash table
 rather than a linear array giving them a fixed allocation size and much
 finer granularity. Rather then using a linearly linked list to keep
@@ -373,7 +373,7 @@
 hundreds of thousands of CPU cycles and a noticeable stall of the
 affected processes, so we are willing to endure a significant amount of
 overhead in order to be sure that the right page is chosen. This is why
-FreeBSD tends to outperform other systems when memory resources become
+&os; tends to outperform other systems when memory resources become
 stressed.</para>
 
 <para>The free page determination algorithm is built upon a history of the
@@ -403,10 +403,10 @@
 then have to go to disk.</para>
 </sidebar>
 
-<para>FreeBSD makes use of several page queues to further refine the
+<para>&os; makes use of several page queues to further refine the
 selection of pages to reuse as well as to determine when dirty pages
 must be flushed to their backing store. Since page tables are dynamic
-entities under FreeBSD, it costs virtually nothing to unmap a page from
+entities under &os;, it costs virtually nothing to unmap a page from
 the address space of any processes using it. When a page candidate has
 been chosen based on the page-use counter, this is precisely what is
 done. The system must make a distinction between clean pages which can
@@ -423,7 +423,7 @@
 in an LRU (least-recently used) fashion when the system needs to
 allocate new memory.</para>
 
-<para>It is important to note that the FreeBSD VM system attempts to
+<para>It is important to note that the &os; VM system attempts to
 separate clean and dirty pages for the express reason of avoiding
 unnecessary flushes of dirty pages (which eats I/O bandwidth), nor does
 it move pages between the various page queues gratuitously when the
@@ -433,8 +433,8 @@
 becomes more stressed, it makes a greater effort to maintain the various
 page queues at the levels determined to be the most effective. An urban
 myth has circulated for years that Linux did a better job avoiding
-swapouts than FreeBSD, but this in fact is not true. What was actually
-occurring was that FreeBSD was proactively paging out unused pages in
+swapouts than &os;, but this in fact is not true. What was actually
+occurring was that &os; was proactively paging out unused pages in
 order to make room for more disk cache while Linux was keeping unused
 pages in core and leaving less memory available for cache and process
 pages. I do not know whether this is still true today.</para>
@@ -451,9 +451,9 @@
 not mapped into the page table, then all the pages that will be accessed
 by the program will have to be faulted in every time the program is run.
 This is unnecessary when the pages in question are already in the VM
-Cache, so FreeBSD will attempt to pre-populate a process's page tables
+Cache, so &os; will attempt to pre-populate a process's page tables
 with those pages that are already in the VM Cache. One thing that
-FreeBSD does not yet do is pre-copy-on-write certain pages on exec. For
+&os; does not yet do is pre-copy-on-write certain pages on exec. For
 example, if you run the &man.ls.1; program while running <command>vmstat
 1</command> you will notice that it always takes a certain number of
 page faults, even when you run it over and over again. These are
@@ -480,7 +480,7 @@
 <title>Page Table Optimizations</title>
 
 <para>The page table optimizations make up the most contentious part of
-the FreeBSD VM design and they have shown some strain with the advent of
+the &os; VM design and they have shown some strain with the advent of
 serious use of <function>mmap()</function>. I think this is actually a
 feature of most BSDs though I am not sure when it was first introduced.
 There are two major optimizations. The first is that hardware page
@@ -488,23 +488,23 @@
 any time with only a minor amount of management overhead. The second is
 that every active page table entry in the system has a governing
 <literal>pv_entry</literal> structure which is tied into the
-<literal>vm_page</literal> structure. FreeBSD can simply iterate
+<literal>vm_page</literal> structure. &os; can simply iterate
 through those mappings that are known to exist while Linux must check
 all page tables that <emphasis>might</emphasis> contain a specific
 mapping to see if it does, which can achieve O(n^2) overhead in certain
-situations. It is because of this that FreeBSD tends to make better
+situations. It is because of this that &os; tends to make better
 choices on which pages to reuse or swap when memory is stressed, giving
-it better performance under load. However, FreeBSD requires kernel
+it better performance under load. However, &os; requires kernel
 tuning to accommodate large-shared-address-space situations such as
 those that can occur in a news system because it may run out of
 <literal>pv_entry</literal> structures.</para>
 
-<para>Both Linux and FreeBSD need work in this area. FreeBSD is trying to
+<para>Both Linux and &os; need work in this area. &os; is trying to
 maximize the advantage of a potentially sparse active-mapping model (not
 all processes need to map all pages of a shared library, for example),
-whereas Linux is trying to simplify its algorithms. FreeBSD generally
+whereas Linux is trying to simplify its algorithms. &os; generally
 has the performance advantage here at the cost of wasting a little extra
-memory, but FreeBSD breaks down in the case where a large file is
+memory, but &os; breaks down in the case where a large file is
 massively shared across hundreds of processes. Linux, on the other hand,
 breaks down in the case where many processes are sparsely-mapping the
 same shared library and also runs non-optimally when trying to determine
@@ -530,7 +530,7 @@
 even with multi-way set-associative caches (though the effect is
 mitigated somewhat).</para>
 
-<para>FreeBSD's memory allocation code implements page coloring
+<para>&os;'s memory allocation code implements page coloring
 optimizations, which means that the memory allocation code will attempt
 to locate free pages that are contiguous from the point of view of the
 cache. For example, if page 16 of physical memory is assigned to page 0
@@ -554,7 +554,7 @@
 modular and algorithmic approach that BSD has historically taken allows
 us to study and understand the current implementation as well as
 relatively cleanly replace large sections of the code. There have been a
-number of improvements to the FreeBSD VM system in the last several
+number of improvements to the &os; VM system in the last several
 years, and work is ongoing.</para>
 </sect1>
 
@@ -566,23 +566,23 @@
 <qandaentry>
 <question>
 <para>What is <quote>the interleaving algorithm</quote> that you
-refer to in your listing of the ills of the FreeBSD 3.X swap
+refer to in your listing of the ills of the &os; 3.X swap
 arrangements?</para>
 </question>
 
 <answer>
-<para>FreeBSD uses a fixed swap interleave which defaults to 4. This
-means that FreeBSD reserves space for four swap areas even if you
+<para>&os; uses a fixed swap interleave which defaults to 4. This
+means that &os; reserves space for four swap areas even if you
 only have one, two, or three. Since swap is interleaved the linear
 address space representing the <quote>four swap areas</quote> will be
 fragmented if you do not actually have four swap areas. For
-example, if you have two swap areas A and B FreeBSD's address
+example, if you have two swap areas A and B &os;'s address
 space representation for that swap area will be interleaved in
 blocks of 16 pages:</para>
 
 <literallayout>A B C D A B C D A B C D A B C D</literallayout>
 
-<para>FreeBSD 3.X uses a <quote>sequential list of free
+<para>&os; 3.X uses a <quote>sequential list of free
 regions</quote> approach to accounting for the free swap areas.
 The idea is that large blocks of free linear space can be
 represented with a single list node
@@ -626,7 +626,7 @@
 <para>I do not get the following:</para>
 
 <blockquote>
-<para>It is important to note that the FreeBSD VM system attempts
+<para>It is important to note that the &os; VM system attempts
 to separate clean and dirty pages for the express reason of
 avoiding unnecessary flushes of dirty pages (which eats I/O
 bandwidth), nor does it move pages between the various page
@@ -649,7 +649,7 @@
 separate the pages but the reality is that if we are not in a
 memory crunch, we do not really have to.</para>
 
-<para>What this means is that FreeBSD will not try very hard to
+<para>What this means is that &os; will not try very hard to
 separate out dirty pages (inactive queue) from clean pages (cache
 queue) when the system is not being stressed, nor will it try to
 deactivate pages (active queue -> inactive queue) when the system
@@ -663,14 +663,14 @@
 would not some of the page faults be data page faults (COW from
 executable file to private page)? I.e., I would expect the page
 faults to be some zero-fill and some program data. Or are you
-implying that FreeBSD does do pre-COW for the program data?</para>
+implying that &os; does do pre-COW for the program data?</para>
 </question>
 
 <answer>
 <para>A COW fault can be either zero-fill or program-data. The
 mechanism is the same either way because the backing program-data
 is almost certainly already in the cache. I am indeed lumping the
-two together. FreeBSD does not pre-COW program data or zero-fill,
+two together. &os; does not pre-COW program data or zero-fill,
 but it <emphasis>does</emphasis> pre-map pages that exist in its
 cache.</para>
 </answer>
@@ -685,7 +685,7 @@
 McKusick, Bostic, Karel, Quarterman)? Specifically, what kind of
 operation/reaction would require scanning the mappings?</para>
 
-<para>How does Linux do in the case where FreeBSD breaks down
+<para>How does Linux do in the case where &os; breaks down
 (sharing a large file mapping over many processes)?</para>
 </question>
 
@@ -717,7 +717,7 @@
 index into the page table for each of those 50 processes even if
 only 10 of them have actually mapped the page. So Linux is
 trading off the simplicity of its design against performance.
-Many VM algorithms which are O(1) or (small N) under FreeBSD wind
+Many VM algorithms which are O(1) or (small N) under &os; wind
 up being O(N), O(N^2), or worse under Linux. Since the pte's
 representing a particular page in an object tend to be at the same
 offset in all the page tables they are mapped in, reducing the
@@ -725,12 +725,12 @@
 will often avoid blowing away the L1 cache line for that offset,
 which can lead to better performance.</para>
 
-<para>FreeBSD has added complexity (the <literal>pv_entry</literal>
+<para>&os; has added complexity (the <literal>pv_entry</literal>
 scheme) in order to increase performance (to limit page table
 accesses to <emphasis>only</emphasis> those pte's that need to be
 modified).</para>
 
-<para>But FreeBSD has a scaling problem that Linux does not in that
+<para>But &os; has a scaling problem that Linux does not in that
 there are a limited number of <literal>pv_entry</literal>
 structures and this causes problems when you have massive sharing
 of data. In this case you may run out of
@@ -744,10 +744,10 @@
 <literal>pv_entry</literal> scheme: Linux uses
 <quote>permanent</quote> page tables that are not throw away, but
 does not need a <literal>pv_entry</literal> for each potentially
-mapped pte. FreeBSD uses <quote>throw away</quote> page tables but
+mapped pte. &os; uses <quote>throw away</quote> page tables but
 adds in a <literal>pv_entry</literal> structure for each
 actually-mapped pte. I think memory utilization winds up being
-about the same, giving FreeBSD an algorithmic advantage with its
+about the same, giving &os; an algorithmic advantage with its
 ability to throw away page tables at will with very low
 overhead.</para>
 </answer>
 