This is Matt Dillon's VM Design article from DaemonNews. It's here

a) Because it's a cool piece of documentation
b) It's a real world example of using images in the documentation.

It's not turned on in the upper level Makefile yet, as I expect the specifics of the toolchain to change over the next week or so as people play around with this, and I don't want the doc build mirrors to have to suddenly update the ports they have installed. Once this has stabilised it can be turned on.
parent 3197343458
commit 68b9d2851a
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=8116
12 changed files with 2678 additions and 0 deletions
16
en_US.ISO8859-1/articles/vm-design/Makefile
Normal file
@@ -0,0 +1,16 @@
# $FreeBSD: doc/en_US.ISO_8859-1/articles/mh/Makefile,v 1.8 1999/09/06 06:52:37 peter Exp $

DOC?= article

FORMATS?= html

IMAGES= fig1.eps fig2.eps fig3.eps fig4.eps

INSTALL_COMPRESSED?=gz
INSTALL_ONLY_COMPRESSED?=

SRCS= article.sgml

DOC_PREFIX?= ${.CURDIR}/../../..

.include "${DOC_PREFIX}/share/mk/doc.project.mk"
838
en_US.ISO8859-1/articles/vm-design/article.sgml
Normal file
@@ -0,0 +1,838 @@
<!-- $FreeBSD: doc/en_US.ISO_8859-1/articles/mh/article.sgml,v 1.7 1999/10/10 20:20:38 jhb Exp $ -->
<!-- FreeBSD Documentation Project -->

<!DOCTYPE ARTICLE PUBLIC "-//FreeBSD//DTD DocBook V3.1-Based Extension//EN" [
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
%man;
]>

<article>
<artheader>
<title>Design elements of the FreeBSD VM system</title>

<authorgroup>
<author>
<firstname>Matthew</firstname>

<surname>Dillon</surname>

<affiliation>
<address>
<email>dillon@apollo.backplane.com</email>
</address>
</affiliation>
</author>
</authorgroup>

<abstract>
<para>The title is really just a fancy way of saying that I am going to
attempt to describe the whole VM enchilada, hopefully in a way that
everyone can follow. For the last year I have concentrated on a number
of major kernel subsystems within FreeBSD, with the VM and Swap
subsystems being the most interesting and NFS being &lsquo;a necessary
chore&rsquo;. I rewrote only small portions of the code. In the VM
arena the only major rewrite I have done is to the swap subsystem.
Most of my work was cleanup and maintenance, with only moderate code
rewriting and no major algorithmic adjustments within the VM
subsystem. The bulk of the VM subsystem's theoretical base remains
unchanged and a lot of the credit for the modernization effort in the
last few years belongs to John Dyson and David Greenman. Not being a
historian like Kirk I will not attempt to tag all the various features
with people's names, since I will invariably get it wrong.</para>
</abstract>

<legalnotice>
<para>This article was originally published in the January 2000 issue of
<ulink url="http://www.daemonnews.org/">DaemonNews</ulink>. This
version of the article may include updates from Matt and other authors
to reflect changes in FreeBSD's VM implementation.</para>
</legalnotice>
</artheader>

<sect1>
<title>Introduction</title>

<para>Before moving along to the actual design let's spend a little time
on the necessity of maintaining and modernizing any long-living
codebase. In the programming world, algorithms tend to be more
important than code and it is precisely due to BSD's academic roots that
a great deal of attention was paid to algorithm design from the
beginning. More attention paid to the design generally leads to a clean
and flexible codebase that can be fairly easily modified, extended, or
replaced over time. While BSD is considered an &lsquo;old&rsquo;
operating system by some people, those of us who work on it tend to view
it more as a &lsquo;mature&rsquo; codebase which has various components
modified, extended, or replaced with modern code. It has evolved, and
FreeBSD is at the bleeding edge no matter how old some of the code might
be. This is an important distinction to make and one that is
unfortunately lost on many people. The biggest error a programmer can
make is to not learn from history, and this is precisely the error that
many other modern operating systems have made. NT is the best example
of this, and the consequences have been dire. Linux also makes this
mistake to some degree&mdash;enough that we BSD folk can make small
jokes about it every once in a while, anyway. Linux's problem is simply
one of a lack of experience and history to compare ideas against, a
problem that is easily and rapidly being addressed by the Linux
community in the same way it has been addressed in the BSD
community&mdash;by continuous code development. The NT folk, on the
other hand, repeatedly make the same mistakes solved by UNIX decades ago
and then spend years fixing them. Over and over again. They have a
severe case of &lsquo;not designed here&rsquo; and &lsquo;we are always
right because our marketing department says so&rsquo;. I have little
tolerance for anyone who cannot learn from history.</para>

<para>Much of the apparent complexity of the FreeBSD design, especially in
the VM/Swap subsystem, is a direct result of having to solve serious
performance issues that occur under various conditions. These issues
are not due to bad algorithmic design but instead arise from
environmental factors. In any direct comparison between platforms,
these issues become most apparent when system resources begin to get
stressed. As I describe FreeBSD's VM/Swap subsystem the reader should
always keep two points in mind. First, the most important aspect of
performance design is what is known as &ldquo;Optimizing the Critical
Path&rdquo;. It is often the case that performance optimizations add a
little bloat to the code in order to make the critical path perform
better. Second, a solid, generalized design outperforms a
heavily-optimized design over the long run. While a generalized design
may end up being slower than a heavily-optimized design when they are
first implemented, the generalized design tends to be easier to adapt to
changing conditions and the heavily-optimized design winds up having to
be thrown away. Any codebase that will survive and be maintainable for
years must therefore be designed properly from the beginning even if it
costs some performance. Twenty years ago people were still arguing that
programming in assembly was better than programming in a high-level
language because it produced code that was ten times as fast. Today,
the fallacy of that argument is obvious&mdash;as are the parallels
to algorithmic design and code generalization.</para>
</sect1>

<sect1>
<title>VM Objects</title>

<para>The best way to begin describing the FreeBSD VM system is to look at
it from the perspective of a user-level process. Each user process sees
a single, private, contiguous VM address space containing several types
of memory objects. These objects have various characteristics. Program
code and program data are effectively a single memory-mapped file (the
binary file being run), but program code is read-only while program data
is copy-on-write. Program BSS is just memory allocated and filled with
zeros on demand, called demand zero page fill. Arbitrary files can be
memory-mapped into the address space as well, which is how the shared
library mechanism works. Such mappings can require modifications to
remain private to the process making them. The fork system call adds an
entirely new dimension to the VM management problem on top of the
complexity already given.</para>

<para>A program binary data page (which is a basic copy-on-write page)
illustrates the complexity. A program binary contains a preinitialized
data section which is initially mapped directly from the program file.
When a program is loaded into a process's VM space, this area is
initially memory-mapped and backed by the program binary itself,
allowing the VM system to free/reuse the page and later load it back in
from the binary. The moment a process modifies this data, however, the
VM system must make a private copy of the page for that process. Since
the private copy has been modified, the VM system may no longer free it,
because there is no longer any way to restore it later on.</para>

<para>You will notice immediately that what was originally a simple file
mapping has become much more complex. Data may be modified on a
page-by-page basis whereas the file mapping encompasses many pages at
once. The complexity further increases when a process forks. When a
process forks, the result is two processes&mdash;each with its own
private address space, including any modifications made by the original
process prior to the call to <function>fork()</function>. It would be
silly for the VM system to make a complete copy of the data at the time
of the <function>fork()</function> because it is quite possible that at
least one of the two processes will only need to read from that page
from then on, allowing the original page to continue to be used. What
was a private page is made copy-on-write again, since each process
(parent and child) expects its own personal post-fork modifications to
remain private to itself and not affect the other.</para>

<para>FreeBSD manages all of this with a layered VM Object model. The
original binary program file winds up being the lowest VM Object layer.
A copy-on-write layer is pushed on top of that to hold those pages which
had to be copied from the original file. If the program modifies a data
page belonging to the original file the VM system takes a fault and
makes a copy of the page in the higher layer. When a process forks,
additional VM Object layers are pushed on. This might make a little
more sense with a fairly basic example. A <function>fork()</function>
is a common operation for any *BSD system, so this example will consider
a program that starts up, and forks. When the process starts, the VM
system creates an object layer, let's call this A:</para>

<mediaobject>
<imageobject>
<imagedata fileref="fig1">
</imageobject>

<textobject>
<literallayout>+---------------+
|       A       |
+---------------+</literallayout>
</textobject>

<textobject>
<phrase>A picture</phrase>
</textobject>
</mediaobject>

<para>A represents the file&mdash;pages may be paged in and out of the
file's physical media as necessary. Paging in from the disk is
reasonable for a program, but we really don't want to page back out and
overwrite the executable. The VM system therefore creates a second
layer, B, that will be physically backed by swap space:</para>

<mediaobject>
<imageobject>
<imagedata fileref="fig2">
</imageobject>

<textobject>
<literallayout>+---------------+
|       B       |
+---------------+
|       A       |
+---------------+</literallayout>
</textobject>
</mediaobject>

<para>On the first write to a page after this, a new page is created in B,
and its contents are initialized from A. All pages in B can be paged in
or out to a swap device. When the program forks, the VM system creates
two new object layers&mdash;C1 for the parent, and C2 for the
child&mdash;that rest on top of B:</para>

<mediaobject>
<imageobject>
<imagedata fileref="fig3">
</imageobject>

<textobject>
<literallayout>+-------+-------+
|  C1   |  C2   |
+-------+-------+
|       B       |
+---------------+
|       A       |
+---------------+</literallayout>
</textobject>
</mediaobject>

<para>In this case, let's say a page in B is modified by the original
parent process. The process will take a copy-on-write fault and
duplicate the page in C1, leaving the original page in B untouched.
Now, let's say the same page in B is modified by the child process. The
process will take a copy-on-write fault and duplicate the page in C2.
The original page in B is now completely hidden since both C1 and C2
have a copy, and B could theoretically be destroyed (if it does not
represent a 'real' file). However, this sort of optimization is not
trivial to make because it is so fine-grained. FreeBSD does not make
this optimization. Now, suppose (as is often the case) that the child
process does an <function>exec()</function>. Its current address space
is usually replaced by a new address space representing a new file. In
this case, the C2 layer is destroyed:</para>

<mediaobject>
<imageobject>
<imagedata fileref="fig4">
</imageobject>

<textobject>
<literallayout>+-------+
|  C1   |
+-------+-------+
|       B       |
+---------------+
|       A       |
+---------------+</literallayout>
</textobject>
</mediaobject>

<para>In this case, the number of children of B drops to one, and all
accesses to B now go through C1. This means that B and C1 can be
collapsed together. Any pages in B that also exist in C1 are deleted
from B during the collapse. Thus, even though the optimization in the
previous step could not be made, we can recover the dead pages when
either of the processes exits or calls
<function>exec()</function>.</para>
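
<para>The layering and collapse described above can be pictured with a
small sketch. It is an illustration only and is not taken from the
FreeBSD sources: the <literal>VMObject</literal> class, its
<literal>read</literal> and <literal>write</literal> methods, and the
<literal>collapse</literal> helper are hypothetical names, and a page is
reduced to a single value keyed by its index.</para>

```python
class VMObject:
    """One layer in a shadow chain; `backing` is the layer underneath."""

    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing
        self.pages = {}          # page index -> contents held by this layer

    def read(self, index):
        """Walk down the chain until some layer holds the page."""
        layer = self
        while layer is not None:
            if index in layer.pages:
                return layer.pages[index]
            layer = layer.backing
        return 0                 # no layer has it: treat as zero-fill

    def write(self, index, value):
        """Copy-on-write: the modified page always lands in the top layer."""
        self.pages[index] = value


def collapse(top):
    """Merge top's backing layer into top once top is its only child."""
    lower = top.backing
    for index, value in lower.pages.items():
        top.pages.setdefault(index, value)   # top's own copy wins
    top.backing = lower.backing


a = VMObject("A")
a.pages = {0: "file0", 1: "file1"}   # the program file
b = VMObject("B", a)                 # swap-backed layer
c1 = VMObject("C1", b)               # parent after fork
c2 = VMObject("C2", b)               # child after fork
b_before = dict(b.pages)
c1.write(0, "parent0")               # COW fault in the parent
c2.write(0, "child0")                # COW fault in the child
# The child execs, C2 goes away, and C1 is now B's only child.
collapse(c1)
```

<para>After the collapse, C1 rests directly on A. Had B held a copy of
page 0, it would have been discarded, because C1 already shadows
it.</para>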

<para>This model creates a number of potential problems. The first is that
you can wind up with a relatively deep stack of layered VM Objects which
can cost scanning time and memory when you take a fault. Deep
layering can occur when processes fork and then fork again (either
parent or child). The second problem is that you can wind up with dead,
inaccessible pages deep in the stack of VM Objects. In our last example
if both the parent and child processes modify the same page, they both
get their own private copies of the page and the original page in B is
no longer accessible by anyone. That page in B can be freed.</para>

<para>FreeBSD solves the deep layering problem with a special optimization
called the &ldquo;All Shadowed Case&rdquo;. This case occurs if either
C1 or C2 takes sufficient COW faults to completely shadow all pages in B.
Let's say that C1 achieves this. C1 can now bypass B entirely, so rather
than have C1->B->A and C2->B->A we now have C1->A and C2->B->A. But
look what also happened&mdash;now B has only one reference (C2), so we
can collapse B and C2 together. The end result is that B is deleted
entirely and we have C1->A and C2->A. It is often the case that B will
contain a large number of pages and neither C1 nor C2 will be able to
completely overshadow it. If we fork again and create a set of D
layers, however, it is much more likely that one of the D layers will
eventually be able to completely overshadow the much smaller dataset
represented by C1 or C2. The same optimization will work at any point in
the graph and the grand result of this is that even on a heavily forked
machine VM Object stacks tend to not get much deeper than 4. This is
true of both the parent and the children and true whether the parent is
doing the forking or whether the children cascade forks.</para>
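
<para>The bypass-then-collapse sequence can be traced with a toy model.
This is a sketch under simplifying assumptions, not FreeBSD code: each
layer is reduced to the set of page indices it holds, the
<literal>chain</literal> and <literal>pages</literal> tables and the
<literal>can_bypass</literal> and <literal>depth</literal> helpers are
hypothetical, and &ldquo;all pages in B&rdquo; is taken to mean the pages
B currently holds.</para>

```python
def can_bypass(top_pages, backing_pages):
    """True when every page held by the backing layer is shadowed on top."""
    return set(backing_pages) <= set(top_pages)


# chain maps each layer to the layer beneath it
chain = {"C1": "B", "C2": "B", "B": "A", "A": None}
pages = {"A": {0, 1, 2, 3}, "B": {0, 1}, "C1": {0, 1}, "C2": {1}}


def depth(layer):
    """Number of layers a fault may have to search, starting at `layer`."""
    count = 0
    while layer is not None:
        count += 1
        layer = chain[layer]
    return count


depth_before = depth("C1")            # C1 -> B -> A

# All Shadowed Case: C1 holds every page B holds, so C1 can skip B.
if can_bypass(pages["C1"], pages["B"]):
    chain["C1"] = chain["B"]          # C1 -> A

# B is now referenced only by C2, so B can be folded into C2.
children_of_b = [name for name, below in chain.items() if below == "B"]
if children_of_b == ["C2"]:
    pages["C2"] |= pages.pop("B")     # C2 keeps its own copies and inherits the rest
    chain["C2"] = chain.pop("B")      # C2 -> A
```

<para>Both chains end up two layers deep instead of three, and B no
longer exists.</para>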

<para>The dead page problem still exists in the case where C1 or C2 do not
completely overshadow B. Due to our other optimizations this case does
not represent much of a problem and we simply allow the pages to be
dead. If the system runs low on memory it will swap them out, eating a
little swap, but that's it.</para>

<para>The advantage to the VM Object model is that
<function>fork()</function> is extremely fast, since no real data
copying need take place. The disadvantage is that you can build a
relatively complex VM Object layering that slows page fault handling
down a little, and you spend memory managing the VM Object structures.
The optimizations FreeBSD makes prove to reduce the problems enough
that they can be ignored, leaving no real disadvantage.</para>
</sect1>

<sect1>
<title>SWAP Layers</title>

<para>Private data pages are initially either copy-on-write or zero-fill
pages. When a change, and therefore a copy, is made, the original
backing object (usually a file) can no longer be used to save a copy of
the page when the VM system needs to reuse it for other purposes. This
is where SWAP comes in. SWAP is allocated to create backing store for
memory that does not otherwise have it. FreeBSD allocates the swap
management structure for a VM Object only when it is actually needed.
However, the swap management structure has had problems
historically.</para>

<para>Under FreeBSD 3.x the swap management structure preallocates an
array that encompasses the entire object requiring swap backing
store&mdash;even if only a few pages of that object are swap-backed.
This creates a kernel memory fragmentation problem when large objects
are mapped, or processes with large runsizes (RSS) fork. Also, in order
to keep track of swap space, a &lsquo;list of holes&rsquo; is kept in
kernel memory, and this tends to get severely fragmented as well. Since
the &lsquo;list of holes&rsquo; is a linear list, the swap allocation and freeing
performance is a non-optimal O(n)-per-page. It also requires kernel
memory allocations to take place during the swap freeing process, and
that creates low memory deadlock problems. The problem is further
exacerbated by holes created due to the interleaving algorithm. Also,
the swap block map can become fragmented fairly easily resulting in
non-contiguous allocations. Kernel memory must also be allocated on the
fly for additional swap management structures when a swapout occurs. It
is evident that there was plenty of room for improvement.</para>

<para>For FreeBSD 4.x, I completely rewrote the swap subsystem. With this
rewrite, swap management structures are allocated through a hash table
rather than a linear array, giving them a fixed allocation size and much
finer granularity. Rather than using a linearly linked list to keep
track of swap space reservations, it now uses a bitmap of swap blocks
arranged in a radix tree structure with free-space hinting in the radix
node structures. This effectively makes swap allocation and freeing an
O(1) operation. The entire radix tree bitmap is also preallocated in
order to avoid having to allocate kernel memory during critical low
memory swapping operations. After all, the system tends to swap when it
is low on memory so we should avoid allocating kernel memory at such
times in order to avoid potential deadlocks. Finally, to reduce
fragmentation the radix tree is capable of allocating large contiguous
chunks at once, skipping over smaller fragmented chunks. I did not take
the final step of having an 'allocating hint pointer' that would trundle
through a portion of swap as allocations were made in order to further
guarantee contiguous allocations or at least locality of reference, but
I ensured that such an addition could be made.</para>
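
<para>To give a feel for how free-space hinting lets an allocator skip
over fragmented regions, here is a deliberately flattened sketch. It is
not the FreeBSD implementation: it uses a single level of leaf bitmaps
rather than a real radix tree, and <literal>SwapMap</literal>,
<literal>Leaf</literal>, and <literal>LEAF_BLOCKS</literal> are
hypothetical names. Everything is built up front, so neither allocating
nor freeing needs new memory.</para>

```python
LEAF_BLOCKS = 8   # blocks per leaf bitmap (hypothetical, kept tiny for readability)


class Leaf:
    def __init__(self):
        self.free = [True] * LEAF_BLOCKS
        self.hint = LEAF_BLOCKS          # upper bound on the largest free run

    def alloc(self, count):
        """Return the start of a contiguous free run of `count` blocks, or None."""
        run = 0
        best = 0
        for i, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            best = max(best, run)
            if run == count:
                start = i - count + 1
                for j in range(start, i + 1):
                    self.free[j] = False
                return start
        self.hint = best                 # remember the failure so later scans skip this leaf
        return None

    def release(self, start, count):
        for j in range(start, start + count):
            self.free[j] = True
        self.hint = LEAF_BLOCKS          # be optimistic again


class SwapMap:
    def __init__(self, leaves):
        self.leaves = [Leaf() for _ in range(leaves)]   # preallocated once

    def alloc(self, count):
        for n, leaf in enumerate(self.leaves):
            if leaf.hint < count:        # the hint rules this leaf out without scanning it
                continue
            start = leaf.alloc(count)
            if start is not None:
                return n * LEAF_BLOCKS + start
        return None

    def free(self, block, count):
        self.leaves[block // LEAF_BLOCKS].release(block % LEAF_BLOCKS, count)
```

<para>A request for four blocks that does not fit in the two blocks left
in the first leaf lowers that leaf's hint to 2, so the next large request
goes straight to the second leaf.</para>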
</sect1>

<sect1>
<title>When to free a page</title>

<para>Since the VM system uses all available memory for disk caching,
there are usually very few truly-free pages. The VM system depends on
being able to properly choose pages which are not in use to reuse for
new allocations. Selecting the optimal pages to free is possibly the
single most important function any VM system can perform because if it
makes a poor selection, the VM system may be forced to unnecessarily
retrieve pages from disk, seriously degrading system performance.</para>

<para>How much overhead are we willing to suffer in the critical path to
avoid freeing the wrong page? Each wrong choice we make will cost us
hundreds of thousands of CPU cycles and a noticeable stall of the
affected processes, so we are willing to endure a significant amount of
overhead in order to be sure that the right page is chosen. This is why
FreeBSD tends to outperform other systems when memory resources become
stressed.</para>

<para>The free page determination algorithm is built upon a history of the
use of memory pages. To acquire this history, the system takes advantage
of a page-used bit feature that most hardware page tables have.</para>

<para>In any case, the page-used bit is cleared and at some later point
the VM system comes across the page again and sees that the page-used
bit has been set. This indicates that the page is still being actively
used. If the bit is still clear it is an indication that the page is not
being actively used. By testing this bit periodically, a use history (in
the form of a counter) for the physical page is developed. When the VM
system later needs to free up some pages, checking this history becomes
the cornerstone of determining the best candidate page to reuse.</para>
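
<para>A minimal sketch of how such a counter could be maintained follows.
The <literal>Page</literal> class, the <literal>scan</literal> and
<literal>best_victim</literal> helpers, and the three constants are
hypothetical names and values chosen for illustration; the text above
does not specify how much the counter moves on each pass.</para>

```python
ACT_MAX = 64       # hypothetical ceiling for the activity counter
ACT_ADVANCE = 3    # hypothetical bonus when the used bit was found set
ACT_DECLINE = 1    # hypothetical penalty when it was found clear


class Page:
    def __init__(self):
        self.used = False      # stands in for the hardware page-used bit
        self.act_count = 0     # accumulated use history


def scan(page):
    """One periodic pass: fold the used bit into the history, then clear it."""
    if page.used:
        page.act_count = min(ACT_MAX, page.act_count + ACT_ADVANCE)
    else:
        page.act_count = max(0, page.act_count - ACT_DECLINE)
    page.used = False


def best_victim(pages):
    """The page with the weakest use history is the best candidate for reuse."""
    return min(pages, key=lambda p: p.act_count)
```

<para>A page that is touched between every pair of scans climbs toward
the ceiling, while an idle page decays to zero and is chosen
first.</para>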

<sidebar>
<title>What if the hardware has no page-used bit?</title>

<para>For those platforms that do not have this feature, the system
actually emulates a page-used bit. It unmaps or protects a page,
forcing a page fault if the page is accessed again. When the page
fault is taken, the system simply marks the page as having been used
and unprotects the page so that it may be used. While taking such page
faults just to determine if a page is being used appears to be an
expensive proposition, it is much less expensive than reusing the page
for some other purpose only to find that a process needs it back and
then have to go to disk.</para>
</sidebar>

<para>FreeBSD makes use of several page queues to further refine the
selection of pages to reuse as well as to determine when dirty pages
must be flushed to their backing store. Since page tables are dynamic
entities under FreeBSD, it costs virtually nothing to unmap a page from
the address space of any processes using it. When a page candidate has
been chosen based on the page-use counter, this is precisely what is
done. The system must make a distinction between clean pages which can
theoretically be freed up at any time, and dirty pages which must first
be written to their backing store before being reusable. When a page
candidate has been found it is moved to the inactive queue if it is
dirty, or the cache queue if it is clean. A separate algorithm based on
the dirty-to-clean page ratio determines when dirty pages in the
inactive queue must be flushed to disk. Once this is accomplished, the
flushed pages are moved from the inactive queue to the cache queue. At
this point, pages in the cache queue can still be reactivated by a VM
fault at relatively low cost. However, pages in the cache queue are
considered to be &lsquo;immediately freeable&rsquo; and will be reused
in an LRU (least-recently used) fashion when the system needs to
allocate new memory.</para>
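
<para>The movement of a page between the queues can be summarized in a
few lines. This is a simplified sketch with hypothetical helper names
(<literal>deactivate</literal>, <literal>launder</literal>,
<literal>reactivate</literal>, <literal>allocate</literal>); it represents
a page as a dictionary with a <literal>dirty</literal> flag and leaves out
the ratio-based decision of when to flush.</para>

```python
from collections import deque

active, inactive, cache = deque(), deque(), deque()


def deactivate(page):
    """A reuse candidate leaves the active queue: dirty -> inactive, clean -> cache."""
    active.remove(page)
    (inactive if page["dirty"] else cache).append(page)


def launder():
    """Flush dirty inactive pages, then move them to the cache queue."""
    while inactive:
        page = inactive.popleft()
        page["dirty"] = False      # stands in for the write to backing store
        cache.append(page)


def reactivate(page):
    """A fault on a cached page pulls it back without any disk I/O."""
    cache.remove(page)
    active.append(page)


def allocate():
    """Reuse the oldest page on the cache queue, or None if the queue is empty."""
    return cache.popleft() if cache else None
```

<para>Only pages that have reached the cache queue are handed out by
<literal>allocate</literal>, so a dirty page is never reused before it
has been written out.</para>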

<para>It is important to note that the FreeBSD VM system attempts to
separate clean and dirty pages for the express reason of avoiding
unnecessary flushes of dirty pages (which eat I/O bandwidth), and that
it does not move pages between the various page queues gratuitously when the
memory subsystem is not being stressed. This is why you will see some
systems with very low cache queue counts and high active queue counts
when doing a <command>systat -vm</command> command. As the VM system
becomes more stressed, it makes a greater effort to maintain the various
page queues at the levels determined to be the most effective. An urban
myth has circulated for years that Linux did a better job avoiding
swapouts than FreeBSD, but this in fact is not true. What was actually
occurring was that FreeBSD was proactively paging out unused pages in
order to make room for more disk cache while Linux was keeping unused
pages in core and leaving less memory available for cache and process
pages. I don't know whether this is still true today.</para>
</sect1>

<sect1>
<title>Pre-Faulting and Zeroing Optimizations</title>

<para>Taking a VM fault is not expensive if the underlying page is already
in core and can simply be mapped into the process, but it can become
expensive if you take a whole lot of them on a regular basis. A good
example of this is running a program such as &man.ls.1; or &man.ps.1;
over and over again. If the program binary is mapped into memory but
not mapped into the page table, then all the pages that will be accessed
by the program will have to be faulted in every time the program is run.
This is unnecessary when the pages in question are already in the VM
Cache, so FreeBSD will attempt to pre-populate a process's page tables
with those pages that are already in the VM Cache. One thing that
FreeBSD does not yet do is pre-copy-on-write certain pages on exec. For
example, if you run the &man.ls.1; program while running <command>vmstat
1</command> you will notice that it always takes a certain number of
page faults, even when you run it over and over again. These are
zero-fill faults, not program code faults (which were pre-faulted in
already). Pre-copying pages on exec or fork is an area that could use
more study.</para>

<para>A large percentage of page faults that occur are zero-fill faults.
You can usually see this by observing the <command>vmstat -s</command>
output. These occur when a process accesses pages in its BSS area. The
BSS area is expected to be initially zero but the VM system does not
bother to allocate any memory at all until the process actually accesses
it. When a fault occurs the VM system must not only allocate a new page,
it must zero it as well. To optimize the zeroing operation the VM system
has the ability to pre-zero pages and mark them as such, and to request
pre-zeroed pages when zero-fill faults occur. The pre-zeroing occurs
whenever the CPU is idle but the number of pages the system pre-zeros is
limited in order to avoid blowing away the memory caches. This is an
excellent example of adding complexity to the VM system in order to
optimize the critical path.</para>
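
<para>The idea can be sketched as two small routines, one run at idle
time and one on the fault path. The names
(<literal>idle_prezero</literal>, <literal>zero_fill_fault</literal>) and
the <literal>PREZERO_LIMIT</literal> value are hypothetical and chosen
only for illustration; pages are modelled as byte arrays.</para>

```python
PAGE_SIZE = 4096
PREZERO_LIMIT = 2      # hypothetical cap on pages zeroed ahead of time

free_pages = []        # free pages that still hold stale contents
zeroed_pages = []      # free pages already cleared while the CPU was idle


def idle_prezero():
    """Called when the CPU is idle: zero one free page unless the cap is reached."""
    if free_pages and len(zeroed_pages) < PREZERO_LIMIT:
        page = free_pages.pop()
        page[:] = bytes(PAGE_SIZE)
        zeroed_pages.append(page)
        return True
    return False


def zero_fill_fault():
    """Satisfy a zero-fill fault, preferring a page that was zeroed in advance."""
    if zeroed_pages:
        return zeroed_pages.pop(), "prezeroed"
    page = free_pages.pop()
    page[:] = bytes(PAGE_SIZE)         # no pre-zeroed page left: pay the cost now
    return page, "zeroed-on-fault"
```

<para>The extra bookkeeping lives outside the fault path; the fault
itself only does the zeroing work when no pre-zeroed page is
available.</para>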
</sect1>

<sect1>
<title>Page Table Optimizations</title>

<para>The page table optimizations make up the most contentious part of
the FreeBSD VM design and they have shown some strain with the advent of
serious use of <function>mmap()</function>. I think this is actually a
feature of most BSDs though I am not sure when it was first introduced.
There are two major optimizations. The first is that hardware page
tables do not contain persistent state but instead can be thrown away at
any time with only a minor amount of management overhead. The second is
that every active page table entry in the system has a governing
<literal>pv_entry</literal> structure which is tied into the
<literal>vm_page</literal> structure. FreeBSD can simply iterate
through those mappings that are known to exist while Linux must check
all page tables that <emphasis>might</emphasis> contain a specific
mapping to see if it does, which can incur O(n^2) overhead in certain
situations. It is because of this that FreeBSD tends to make better
choices on which pages to reuse or swap when memory is stressed, giving
it better performance under load. However, FreeBSD requires kernel
tuning to accommodate large-shared-address-space situations such as
those that can occur in a news system because it may run out of
<literal>pv_entry</literal> structures.</para>
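
<para>The benefit of keeping a per-page list of mappings can be shown
with a toy reverse map. This sketch only borrows the idea; the
<literal>pv_list</literal> and <literal>page_tables</literal> tables and
the <literal>enter</literal> and <literal>remove_all</literal> helpers are
hypothetical and do not mirror the kernel's actual structures or
function names.</para>

```python
from collections import defaultdict

# physical page -> list of (address space, virtual address) pairs, one per
# live mapping; each pair plays the role of a pv_entry tied to a vm_page
pv_list = defaultdict(list)
# address space name -> {virtual address: physical page}
page_tables = defaultdict(dict)


def enter(space, va, phys):
    """Create a mapping and record it on the physical page's list."""
    page_tables[space][va] = phys
    pv_list[phys].append((space, va))


def remove_all(phys):
    """Unmap a physical page everywhere by visiting only its known mappings."""
    removed = 0
    for space, va in pv_list.pop(phys, []):
        del page_tables[space][va]
        removed += 1
    return removed
```

<para>Unmapping a page touches exactly as many page table entries as the
page has mappings, at the price of one list entry per mapping, which is
the memory cost the paragraph above refers to.</para>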

<para>Both Linux and FreeBSD need work in this area. FreeBSD is trying to
maximize the advantage of a potentially sparse active-mapping model (not
all processes need to map all pages of a shared library, for example),
whereas Linux is trying to simplify its algorithms. FreeBSD generally
has the performance advantage here at the cost of wasting a little extra
memory, but FreeBSD breaks down in the case where a large file is
massively shared across hundreds of processes. Linux, on the other hand,
breaks down in the case where many processes are sparsely-mapping the
same shared library and also runs non-optimally when trying to determine
whether a page can be reused or not.</para>
</sect1>

<sect1>
<title>Page Coloring</title>

<para>We'll end with the page coloring optimizations. Page coloring is a
performance optimization designed to ensure that accesses to contiguous
pages in virtual memory make the best use of the processor cache. In
ancient times (i.e. 10+ years ago) processor caches tended to map
virtual memory rather than physical memory. This led to a huge number of
problems including having to clear the cache on every context switch in
some cases, and problems with data aliasing in the cache. Modern
processor caches map physical memory precisely to solve those problems.
This means that two side-by-side pages in a process's address space may
not correspond to two side-by-side pages in the cache. In fact, if you
aren't careful side-by-side pages in virtual memory could wind up using
the same page in the processor cache&mdash;leading to cacheable data
being thrown away prematurely and reducing CPU performance. This is true
even with multi-way set-associative caches (though the effect is
mitigated somewhat).</para>
|
||||
|
||||
<para>FreeBSD's memory allocation code implements page coloring
optimizations, which means that the memory allocation code will attempt
to locate free pages that are contiguous from the point of view of the
cache. For example, if page 16 of physical memory is assigned to page 0
of a process's virtual memory and the cache can hold 4 pages, the page
coloring code will not assign page 20 of physical memory to page 1 of a
process's virtual memory. It would, instead, assign page 21 of physical
memory. The page coloring code attempts to avoid assigning page 20
because this maps over the same cache memory as page 16 and would result
in non-optimal caching. This code adds a significant amount of
complexity to the VM memory allocation subsystem as you can well
imagine, but the result is well worth the effort. Page coloring makes VM
memory as deterministic as physical memory with regard to cache
performance.</para>
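<para>The arithmetic in the example above can be sketched as follows. This
is a toy model only, assuming a cache that holds 4 pages and that the
wanted color is simply the virtual page number modulo the cache size.
The names <literal>page_color</literal> and
<literal>pick_physical_page</literal> are made up for illustration and
are not FreeBSD kernel functions.</para>

```python
# Toy model of page coloring; not FreeBSD code.
CACHE_PAGES = 4  # the cache in the example holds 4 pages


def page_color(page_number, cache_pages=CACHE_PAGES):
    """Cache slot ("color") that a page number falls on."""
    return page_number % cache_pages


def pick_physical_page(virtual_page, free_pages, cache_pages=CACHE_PAGES):
    """Prefer a free physical page whose color matches the virtual page.

    Falls back to the first free page when no matching color is free,
    and returns None when there are no free pages at all.
    """
    wanted = page_color(virtual_page, cache_pages)
    for candidate in free_pages:
        if page_color(candidate, cache_pages) == wanted:
            return candidate
    return free_pages[0] if free_pages else None
```

<para>With physical pages 20 and 21 free,
<literal>pick_physical_page(1, [20, 21])</literal> picks page 21, because
page 20 has the same color as page 16.</para>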
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Conclusion</title>
|
||||
|
||||
<para>Virtual memory in modern operating systems must address a number of
different issues efficiently and for many different usage patterns. The
modular and algorithmic approach that BSD has historically taken allows
us to study and understand the current implementation as well as
relatively cleanly replace large sections of the code. There have been a
number of improvements to the FreeBSD VM system in the last several
years, and work is ongoing.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Bonus QA session by Allen Briggs
|
||||
<email>briggs@ninthwonder.com</email></title>
|
||||
|
||||
<qandaset>
|
||||
<qandaentry>
|
||||
<question>
<para>What is “the interleaving algorithm” that you
refer to in your listing of the ills of the FreeBSD 3.x swap
arrangements?</para>
</question>
|
||||
|
||||
<answer>
|
||||
<para>FreeBSD uses a fixed swap interleave which defaults to 4. This
means that FreeBSD reserves space for four swap areas even if you
only have one, two, or three. Since swap is interleaved, the linear
address space representing the ‘four swap areas’ will be
fragmented if you don't actually have four swap areas. For
example, if you have two swap areas A and B, FreeBSD's address
space representation for that swap area will be interleaved in
blocks of 16 pages:</para>

<literallayout>A B C D A B C D A B C D A B C D</literallayout>
|
||||
|
||||
<para>FreeBSD 3.x uses a ‘sequential list of free
regions’ approach to accounting for the free swap areas.
The idea is that large blocks of free linear space can be
represented with a single list node
(<filename>kern/subr_rlist.c</filename>). But due to the
fragmentation the sequential list winds up being insanely
fragmented. In the above example, completely unused swap will
have A and B shown as ‘free’ and C and D shown as
‘all allocated’. Each A-B sequence requires a list
node to account for because C and D are holes, so the list node
cannot be combined with the next A-B sequence.</para>
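<para>To see how quickly the list fragments, here is a small model of the
interleaved address space. It assumes the interleave of 4 and the
16-page blocks mentioned above; the function names are hypothetical and
the code is not taken from <filename>kern/subr_rlist.c</filename>.</para>

```python
# Toy model of the 3.x swap interleave; not FreeBSD code.
INTERLEAVE = 4     # swap areas the address space reserves room for
STRIPE_PAGES = 16  # pages per interleave block


def area_of(page, interleave=INTERLEAVE, stripe=STRIPE_PAGES):
    """Index (0 = A, 1 = B, ...) of the swap area backing a linear page."""
    return (page // stripe) % interleave


def free_runs(total_pages, configured_areas):
    """Free regions as (start, length) runs, one per list node.

    Pages that fall on an unconfigured area are holes and end a run.
    """
    runs = []
    start = None
    for page in range(total_pages):
        usable = configured_areas > area_of(page)
        if usable and start is None:
            start = page
        elif not usable and start is not None:
            runs.append((start, page - start))
            start = None
    if start is not None:
        runs.append((start, total_pages - start))
    return runs
```

<para>For 256 pages of completely unused swap,
<literal>free_runs(256, 4)</literal> yields a single run, while
<literal>free_runs(256, 2)</literal> yields one run per A-B sequence,
so the number of list nodes grows with the size of the swap space.</para>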
|
||||
|
||||
<para>Why do we interleave our swap space instead of just tacking swap
areas onto the end and doing something fancier? Because it's a whole
lot easier to allocate linear swaths of an address space and have
the result automatically be interleaved across multiple disks than
it is to try to put that sophistication elsewhere.</para>
|
||||
|
||||
<para>The fragmentation causes other problems. Being a linear list
under 3.x, and having such a huge amount of inherent
fragmentation, allocating and freeing swap winds up being an O(N)
algorithm instead of an O(1) algorithm. Combine that with other
factors (heavy swapping) and you start getting into O(N^2) and
O(N^3) levels of overhead, which is bad. The 3.x system may also
need to allocate KVM during a swap operation to create a new list
node, which can lead to a deadlock if the system is trying to
page out pages in a low-memory situation.</para>
|
||||
|
||||
<para>Under 4.x we do not use a sequential list. Instead we use a
radix tree and bitmaps of swap blocks rather than ranged list
nodes. We take the hit of preallocating all the bitmaps required
for the entire swap area up front but it winds up wasting less
memory due to the use of a bitmap (one bit per block) instead of a
linked list of nodes. The use of a radix tree instead of a
sequential list gives us nearly O(1) performance no matter how
fragmented the tree becomes.</para>
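<para>The idea can be illustrated with a much simplified two-level
structure: preallocated bitmap words with one bit per block, plus a
per-word free count that lets the allocator skip full words. This is
only a sketch under those assumptions. The class name and the 32-bit
leaf size are invented here, and a real radix tree would use more
levels so that the search does not scan every leaf.</para>

```python
# Toy two-level free-block bitmap; not the FreeBSD 4.x implementation.
LEAF_BITS = 32  # blocks tracked by one bitmap word


class SwapBitmap:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        nleaves = (nblocks + LEAF_BITS - 1) // LEAF_BITS
        # One bit per block, preallocated up front; a set bit means free.
        self.leaves = [0] * nleaves
        # Upper level: free count per leaf, so full leaves are skipped.
        self.free_in_leaf = [0] * nleaves
        for block in range(nblocks):
            self.free(block)

    def free(self, block):
        """Mark a block free; freeing never allocates memory."""
        leaf, bit = divmod(block, LEAF_BITS)
        mask = 1 << bit
        if not self.leaves[leaf] & mask:
            self.leaves[leaf] |= mask
            self.free_in_leaf[leaf] += 1

    def alloc(self):
        """Return a free block number, or None when swap is full."""
        for leaf, count in enumerate(self.free_in_leaf):
            if count == 0:
                continue
            word = self.leaves[leaf]
            bit = (word & -word).bit_length() - 1  # lowest set bit
            self.leaves[leaf] = word & ~(1 << bit)
            self.free_in_leaf[leaf] -= 1
            return leaf * LEAF_BITS + bit
        return None
```

<para>Note that neither <literal>alloc</literal> nor
<literal>free</literal> creates new nodes, so the amount of bookkeeping
memory stays fixed however fragmented the free space becomes.</para>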
|
||||
</answer>
|
||||
</qandaentry>
|
||||
|
||||
<qandaentry>
|
||||
<question>
<para>I don't get the following:</para>

<blockquote>
<para>It is important to note that the FreeBSD VM system attempts
to separate clean and dirty pages for the express reason of
avoiding unnecessary flushes of dirty pages (which eats I/O
bandwidth), nor does it move pages between the various page
queues gratuitously when the memory subsystem is not being
stressed. This is why you will see some systems with very low
cache queue counts and high active queue counts when doing a
<command>systat -vm</command> command.</para>
</blockquote>

<para>How is the separation of clean and dirty (inactive) pages
related to the situation where you see low cache queue counts and
high active queue counts in <command>systat -vm</command>? Do the
systat stats roll the active and dirty pages together for the
active queue count?</para>
</question>
|
||||
|
||||
<answer>
<para>Yes, that is confusing. The relationship is
“goal” versus “reality”. Our goal is to
separate the pages but the reality is that if we are not in a
memory crunch, we don't really have to.</para>

<para>What this means is that FreeBSD will not try very hard to
separate out dirty pages (inactive queue) from clean pages (cache
queue) when the system is not being stressed, nor will it try to
deactivate pages (active queue -> inactive queue) when the system
is not being stressed, even if they aren't being used.</para>
</answer>
|
||||
</qandaentry>
|
||||
|
||||
<qandaentry>
|
||||
<question>
<para>In the &man.ls.1; / <command>vmstat 1</command> example,
wouldn't some of the page faults be data page faults (COW from
executable file to private page)? I.e., I would expect the page
faults to be some zero-fill and some program data. Or are you
implying that FreeBSD does do pre-COW for the program data?</para>
</question>
|
||||
|
||||
<answer>
<para>A COW fault can be either zero-fill or program-data. The
mechanism is the same either way because the backing program-data
is almost certainly already in the cache. I am indeed lumping the
two together. FreeBSD does not pre-COW program data or zero-fill,
but it <emphasis>does</emphasis> pre-map pages that exist in its
cache.</para>
</answer>
|
||||
</qandaentry>
|
||||
|
||||
<qandaentry>
|
||||
<question>
<para>In your section on page table optimizations, can you give a
little more detail about <literal>pv_entry</literal> and
<literal>vm_page</literal> (or should <literal>vm_page</literal> be
<literal>vm_pmap</literal>, as in 4.4, cf. pp. 180-181 of
McKusick, Bostic, Karels, Quarterman)? Specifically, what kind of
operation/reaction would require scanning the mappings?</para>

<para>How does Linux do in the case where FreeBSD breaks down
(sharing a large file mapping over many processes)?</para>
</question>
|
||||
|
||||
<answer>
|
||||
<para>A <literal>vm_page</literal> represents an (object,index#)
tuple. A <literal>pv_entry</literal> represents a hardware page
table entry (pte). If you have five processes sharing the same
physical page, and three of those processes' page tables actually
map the page, that page will be represented by a single
<literal>vm_page</literal> structure and three
<literal>pv_entry</literal> structures.</para>
|
||||
|
||||
<para><literal>pv_entry</literal> structures only represent pages
mapped by the MMU (one <literal>pv_entry</literal> represents one
pte). This means that when we need to remove all hardware
references to a <literal>vm_page</literal> (in order to reuse the
page for something else, page it out, clear it, dirty it, and so
forth) we can simply scan the linked list of
<literal>pv_entry</literal>'s associated with that
<literal>vm_page</literal> to remove or modify the pte's from
their page tables.</para>
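<para>A rough model of that bookkeeping, using the five-process example
above, might look like this. The classes and functions are illustrative
stand-ins only and do not correspond to the kernel's actual
<literal>vm_page</literal> or <literal>pv_entry</literal>
definitions.</para>

```python
# Toy model of reverse mappings; names are illustrative, not kernel APIs.


class PageTable:
    """Stands in for one process's hardware page table."""

    def __init__(self):
        self.ptes = {}  # virtual address -> physical page number


class Page:
    """Stands in for a vm_page with its list of pv_entry records."""

    def __init__(self, number):
        self.number = number
        self.pv_list = []  # one (page_table, va) entry per mapped pte


def map_page(table, va, page):
    """Install a pte and record it on the page's pv list."""
    table.ptes[va] = page.number
    page.pv_list.append((table, va))


def remove_all_mappings(page):
    """Drop every pte for the page; returns how many tables were touched."""
    touched = 0
    while page.pv_list:
        table, va = page.pv_list.pop()
        del table.ptes[va]
        touched += 1
    return touched
```

<para>With five page tables of which three map the page,
<literal>remove_all_mappings</literal> visits exactly three tables and
never looks at the other two.</para>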
|
||||
|
||||
<para>Under Linux there is no such linked list. In order to remove
all the hardware page table mappings for a
<literal>vm_page</literal>, Linux must index into every VM object
that <emphasis>might</emphasis> have mapped the page. For
example, if you have 50 processes all mapping the same shared
library and want to get rid of page X in that library, you need to
index into the page table for each of those 50 processes even if
only 10 of them have actually mapped the page. So Linux is
trading off the simplicity of its design against performance.
Many VM algorithms which are O(1) or (small N) under FreeBSD wind
up being O(N), O(N^2), or worse under Linux. Since the pte's
representing a particular page in an object tend to be at the same
offset in all the page tables they are mapped in, reducing the
number of accesses into the page tables at the same pte offset
will often avoid blowing away the L1 cache line for that offset,
which can lead to better performance.</para>
|
||||
|
||||
<para>FreeBSD has added complexity (the <literal>pv_entry</literal>
scheme) in order to increase performance (to limit page table
accesses to <emphasis>only</emphasis> those pte's that need to be
modified).</para>
|
||||
|
||||
<para>But FreeBSD has a scaling problem that Linux does not in that
there are a limited number of <literal>pv_entry</literal>
structures and this causes problems when you have massive sharing
of data. In this case you may run out of
<literal>pv_entry</literal> structures even though there is plenty
of free memory available. This can be fixed easily enough by
bumping up the number of <literal>pv_entry</literal> structures in
the kernel config, but we really need to find a better way to do
it.</para>
|
||||
|
||||
<para>With regard to the memory overhead of a page table versus the
<literal>pv_entry</literal> scheme: Linux uses
‘permanent’ page tables that are not thrown away, but
does not need a <literal>pv_entry</literal> for each potentially
mapped pte. FreeBSD uses ‘throw away’ page tables but
adds in a <literal>pv_entry</literal> structure for each
actually-mapped pte. I think memory utilization winds up being
about the same, giving FreeBSD an algorithmic advantage with its
ability to throw away page tables at will with very low
overhead.</para>
|
||||
</answer>
|
||||
</qandaentry>
|
||||
|
||||
<qandaentry>
|
||||
<question>
<para>Finally, in the page coloring section, it might help to have a
little more description of what you mean here. I didn't quite
follow it.</para>
</question>
|
||||
|
||||
<answer>
|
||||
<para>Do you know how an L1 hardware memory cache works? I'll
explain: Consider a machine with 16MB of main memory but only 128K
of L1 cache. Generally the way this cache works is that each 128K
block of main memory uses the <emphasis>same</emphasis> 128K of
cache. If you access offset 0 in main memory and then
offset 128K in main memory you can wind up throwing away the
cached data you read from offset 0!</para>
|
||||
|
||||
<para>Now, I am simplifying things greatly. What I just described
is what is called a ‘direct mapped’ hardware memory
cache. Most modern caches are what are called
2-way-set-associative or 4-way-set-associative caches. The
set associativity allows you to access up to N different memory
regions that overlap the same cache memory without destroying the
previously cached data. But only N.</para>
|
||||
|
||||
<para>So if I have a 4-way set associative cache I can access offset
0, offset 128K, 256K and offset 384K and still be able to access
offset 0 again and have it come from the L1 cache. If I then
access offset 512K, however, one of the four previously cached
data objects will be thrown away by the cache.</para>
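<para>The access pattern just described can be replayed against a tiny
simulation. It follows the simplified picture used in this answer, in
which offsets 128K apart compete for the same cache slot, and it
assumes a least-recently-used eviction policy, which real hardware may
or may not use. <literal>ToyCache</literal> is a made-up name.</para>

```python
# Toy cache model following the simplified description; not real hardware.
from collections import OrderedDict

CACHE_SPAN = 128 * 1024  # offsets this far apart share a cache slot


class ToyCache:
    def __init__(self, ways):
        self.ways = ways  # 1 = direct mapped, 4 = 4-way set associative
        self.slots = {}   # slot index -> OrderedDict of resident tags

    def access(self, offset):
        """Touch an offset; return True on a hit, False on a miss."""
        slot, tag = offset % CACHE_SPAN, offset // CACHE_SPAN
        resident = self.slots.setdefault(slot, OrderedDict())
        if tag in resident:
            resident.move_to_end(tag)  # mark as most recently used
            return True
        if len(resident) == self.ways:
            resident.popitem(last=False)  # evict least recently used
        resident[tag] = True
        return False
```

<para>With <literal>ToyCache(ways=4)</literal>, offsets 0, 128K, 256K and
384K all miss once and then offset 0 hits. Accessing 512K afterwards
evicts one of the earlier entries. With <literal>ToyCache(ways=1)</literal>
offset 0 and offset 128K keep evicting each other.</para>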
|
||||
|
||||
<para>It is extremely important…
<emphasis>extremely</emphasis> important for most of a processor's
memory accesses to be able to come from the L1 cache, because the
L1 cache operates at the processor frequency. The moment you have
an L1 cache miss and have to go to the L2 cache or to main memory,
the processor will stall and potentially sit twiddling its fingers
for <emphasis>hundreds</emphasis> of instructions worth of time
waiting for a read from main memory to complete. Main memory (the
dynamic RAM you stuff into a computer) is
<emphasis>slow</emphasis>, when compared to the speed of a modern
processor core.</para>
|
||||
|
||||
<para>Ok, so now onto page coloring: All modern memory caches are
what are known as <emphasis>physical</emphasis> caches. They
cache physical memory addresses, not virtual memory addresses.
This allows the cache to be left alone across a process context
switch, which is very important.</para>
|
||||
|
||||
<para>But in the UNIX world you are dealing with virtual address
spaces, not physical address spaces. Any program you write will
see the virtual address space given to it. The actual
<emphasis>physical</emphasis> pages underlying that virtual
address space are not necessarily physically contiguous! In fact,
you might have two pages that are side by side in a process's
address space which wind up being at offset 0 and offset 128K in
<emphasis>physical</emphasis> memory.</para>
|
||||
|
||||
<para>A program normally assumes that two side-by-side pages will be
optimally cached. That is, that you can access data objects in
both pages without having them blow away each other's cache entry.
But this is only true if the physical pages underlying the virtual
address space are contiguous (insofar as the cache is
concerned).</para>
|
||||
|
||||
<para>This is what page coloring does. Instead of assigning
<emphasis>random</emphasis> physical pages to virtual addresses,
which may result in non-optimal cache performance, page coloring
assigns <emphasis>reasonably-contiguous</emphasis> physical pages
to virtual addresses. Thus programs can be written under the
assumption that the characteristics of the underlying hardware
cache are the same for their virtual address space as they would
be if the program had been run directly in a physical address
space.</para>
|
||||
|
||||
<para>Note that I say ‘reasonably’ contiguous rather
than simply ‘contiguous’. From the point of view of a
128K direct mapped cache, the physical address 0 is the same as
the physical address 128K. So two side-by-side pages in your
virtual address space may wind up being offset 128K and offset
132K in physical memory, but could also easily be offset 128K and
offset 4K in physical memory and still retain the same cache
performance characteristics. So page-coloring does
<emphasis>not</emphasis> have to assign truly contiguous pages of
physical memory to contiguous pages of virtual memory, it just
needs to make sure it assigns contiguous pages from the point of
view of cache performance and operation.</para>
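<para>The equivalence is easy to check numerically. The helper below is a
hypothetical illustration that assumes the 128K direct mapped cache from
the example and a 4K page size (implied by the 128K and 132K offsets
being neighbours). It reports which cache page slot each physical offset
lands on.</para>

```python
# Toy check of "reasonably contiguous"; the page size is an assumption.
CACHE_SIZE = 128 * 1024  # direct mapped cache size from the example
PAGE_SIZE = 4 * 1024     # assumed page size


def cache_slots(physical_offsets):
    """Cache page slot that each physical page offset lands on."""
    return [(off % CACHE_SIZE) // PAGE_SIZE for off in physical_offsets]
```

<para>Both <literal>cache_slots([128 * 1024, 132 * 1024])</literal> and
<literal>cache_slots([128 * 1024, 4 * 1024])</literal> give slots 0 and
1, so the two placements behave identically as far as the cache is
concerned, whereas offsets 0 and 128K would both land on slot 0.</para>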
|
||||
</answer>
|
||||
</qandaentry>
|
||||
</qandaset>
|
||||
</sect1>
|
||||
</article>
|
104
en_US.ISO8859-1/articles/vm-design/fig1.eps
Normal file
|
@ -0,0 +1,104 @@
|
|||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: fig1.eps
|
||||
%%Creator: fig2dev Version 3.2.3 Patchlevel
|
||||
%%CreationDate: Sun Oct 8 19:54:25 2000
|
||||
%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
|
||||
%%BoundingBox: 0 0 119 65
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
newpath 0 65 moveto 0 0 lineto 119 0 lineto 119 65 lineto closepath clip newpath
|
||||
-143.0 298.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
|
||||
$F2psBegin
|
||||
%%Page: 1 1
|
||||
10 setmiterlimit
|
||||
0.06000 0.06000 sc
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 2400 4200 m 4050 4200 l 4050 4950 l 2400 4950 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4050 4200 m
|
||||
4350 3900 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2400 4200 m 2700 3900 l 4350 3900 l 4350 4650 l
|
||||
4050 4950 l gs col0 s gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3225 4650 m
|
||||
gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
$F2psEnd
|
||||
rs
|
115
en_US.ISO8859-1/articles/vm-design/fig2.eps
Normal file
|
@ -0,0 +1,115 @@
|
|||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: fig2.eps
|
||||
%%Creator: fig2dev Version 3.2.3 Patchlevel
|
||||
%%CreationDate: Sun Oct 8 19:55:31 2000
|
||||
%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
|
||||
%%BoundingBox: 0 0 120 110
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
newpath 0 110 moveto 0 0 lineto 120 0 lineto 120 110 lineto closepath clip newpath
|
||||
-174.0 370.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
|
||||
$F2psBegin
|
||||
%%Page: 1 1
|
||||
10 setmiterlimit
|
||||
0.06000 0.06000 sc
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5100 m
|
||||
gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 4871 5100 m 4879 5100 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4575 4650 m
|
||||
4875 4350 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 3225 4350 l 4875 4350 l 4875 5100 l
|
||||
4575 5400 l gs col0 s gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5850 m
|
||||
gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
n 4875 5100 m 4875 5850 l
|
||||
4575 6150 l gs col0 s gr
|
||||
$F2psEnd
|
||||
rs
|
133
en_US.ISO8859-1/articles/vm-design/fig3.eps
Normal file
|
@ -0,0 +1,133 @@
|
|||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: fig3.eps
|
||||
%%Creator: fig2dev Version 3.2.3 Patchlevel
|
||||
%%CreationDate: Sun Oct 8 19:53:51 2000
|
||||
%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
|
||||
%%BoundingBox: 0 0 120 155
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
newpath 0 155 moveto 0 0 lineto 120 0 lineto 120 155 lineto closepath clip newpath
|
||||
-174.0 370.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
|
||||
$F2psBegin
|
||||
%%Page: 1 1
|
||||
10 setmiterlimit
|
||||
0.06000 0.06000 sc
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
4125 4350 m
|
||||
gs 1 -1 sc (C2) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 4871 5100 m 4879 5100 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4575 4650 m
|
||||
4875 4350 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4875 3600 m 4875 5100 l
|
||||
4575 5400 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 2925 3900 l 3225 3600 l
|
||||
4875 3600 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 3900 m 4425 3900 l 4575 3900 l
|
||||
4875 3600 l gs col0 s gr
|
||||
% Polyline
|
||||
n 4575 4650 m
|
||||
4575 3900 l gs col0 s gr
|
||||
% Polyline
|
||||
n 3750 4650 m 3750 3900 l
|
||||
4050 3600 l gs col0 s gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5850 m
|
||||
gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5100 m
|
||||
gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3375 4350 m
|
||||
gs 1 -1 sc (C1) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
n 4875 5100 m 4875 5850 l
|
||||
4575 6150 l gs col0 s gr
|
||||
$F2psEnd
|
||||
rs
|
133
en_US.ISO8859-1/articles/vm-design/fig4.eps
Normal file
|
@ -0,0 +1,133 @@
|
|||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: fig4.eps
|
||||
%%Creator: fig2dev Version 3.2.3 Patchlevel
|
||||
%%CreationDate: Sun Oct 8 19:55:53 2000
|
||||
%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
|
||||
%%BoundingBox: 0 0 120 155
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
newpath 0 155 moveto 0 0 lineto 120 0 lineto 120 155 lineto closepath clip newpath
|
||||
-174.0 370.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
|
||||
$F2psBegin
|
||||
%%Page: 1 1
|
||||
10 setmiterlimit
|
||||
0.06000 0.06000 sc
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3375 4350 m
|
||||
gs 1 -1 sc (C1) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 4871 5100 m 4879 5100 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4575 4650 m
|
||||
4875 4350 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4875 4350 m 4875 5100 l
|
||||
4575 5400 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 2925 3900 l 3225 3600 l
|
||||
4050 3600 l gs col0 s gr
|
||||
% Polyline
|
||||
n 3750 4650 m 3750 3900 l
|
||||
4050 3600 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 3900 m
|
||||
3750 3900 l gs col0 s gr
|
||||
% Polyline
|
||||
n 3750 4650 m 4050 4350 l
|
||||
4875 4350 l gs col0 s gr
|
||||
% Polyline
|
||||
n 4050 4350 m
|
||||
4050 3600 l gs col0 s gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5850 m
|
||||
gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5100 m
|
||||
gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
n 4875 5100 m 4875 5850 l
|
||||
4575 6150 l gs col0 s gr
|
||||
$F2psEnd
|
||||
rs
|
16
en_US.ISO_8859-1/articles/vm-design/Makefile
Normal file
|
@ -0,0 +1,16 @@
|
|||
# $FreeBSD: doc/en_US.ISO_8859-1/articles/mh/Makefile,v 1.8 1999/09/06 06:52:37 peter Exp $
|
||||
|
||||
DOC?= article
|
||||
|
||||
FORMATS?= html
|
||||
|
||||
IMAGES= fig1.eps fig2.eps fig3.eps fig4.eps
|
||||
|
||||
INSTALL_COMPRESSED?=gz
|
||||
INSTALL_ONLY_COMPRESSED?=
|
||||
|
||||
SRCS= article.sgml
|
||||
|
||||
DOC_PREFIX?= ${.CURDIR}/../../..
|
||||
|
||||
.include "${DOC_PREFIX}/share/mk/doc.project.mk"
|
838
en_US.ISO_8859-1/articles/vm-design/article.sgml
Normal file
|
@ -0,0 +1,838 @@
|
|||
<!-- $FreeBSD: doc/en_US.ISO_8859-1/articles/mh/article.sgml,v 1.7 1999/10/10 20:20:38 jhb Exp $ -->
|
||||
<!-- FreeBSD Documentation Project -->
|
||||
|
||||
<!DOCTYPE ARTICLE PUBLIC "-//FreeBSD//DTD DocBook V3.1-Based Extension//EN" [
|
||||
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
|
||||
%man;
|
||||
]>
|
||||
|
||||
<article>
|
||||
<artheader>
|
||||
<title>Design elements of the FreeBSD VM system</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Matthew</firstname>
|
||||
|
||||
<surname>Dillon</surname>
|
||||
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>dillon@apollo.backplane.com</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<abstract>
|
||||
<para>The title is really just a fancy way of saying that I am going to
|
||||
attempt to describe the whole VM enchilada, hopefully in a way that
|
||||
everyone can follow. For the last year I have concentrated on a number
|
||||
of major kernel subsystems within FreeBSD, with the VM and Swap
|
||||
subsystems being the most interesting and NFS being ‘a necessary
|
||||
chore’. I rewrote only small portions of the code. In the VM
|
||||
arena the only major rewrite I have done is to the swap subsystem.
|
||||
Most of my work was cleanup and maintenance, with only moderate code
|
||||
rewriting and no major algorithmic adjustments within the VM
|
||||
subsystem. The bulk of the VM subsystem's theoretical base remains
|
||||
unchanged and a lot of the credit for the modernization effort in the
|
||||
last few years belongs to John Dyson and David Greenman. Not being a
|
||||
historian like Kirk I will not attempt to tag all the various features
|
||||
with people's names, since I will invariably get it wrong.</para>
|
||||
</abstract>
|
||||
|
||||
<legalnotice>
|
||||
<para>This article was originally published in the January 2000 issue of
|
||||
<ulink url="http://www.daemonnews.org/">DaemonNews</ulink>. This
|
||||
version of the article may include updates from Matt and other authors
|
||||
to reflect changes in FreeBSD's VM implementation.</para>
|
||||
</legalnotice>
|
||||
</artheader>
|
||||
|
||||
<sect1>
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>Before moving along to the actual design let's spend a little time
|
||||
on the necessity of maintaining and modernizing any long-living
|
||||
codebase. In the programming world, algorithms tend to be more
|
||||
important than code and it is precisely due to BSD's academic roots that
|
||||
a great deal of attention was paid to algorithm design from the
|
||||
beginning. More attention paid to the design generally leads to a clean
|
||||
and flexible codebase that can be fairly easily modified, extended, or
|
||||
replaced over time. While BSD is considered an ‘old’
|
||||
operating system by some people, those of us who work on it tend to view
|
||||
it more as a ‘mature’ codebase which has various components
|
||||
modified, extended, or replaced with modern code. It has evolved, and
|
||||
FreeBSD is at the bleeding edge no matter how old some of the code might
|
||||
be. This is an important distinction to make and one that is
|
||||
unfortunately lost to many people. The biggest error a programmer can
|
||||
make is to not learn from history, and this is precisely the error that
|
||||
many other modern operating systems have made. NT is the best example
|
||||
of this, and the consequences have been dire. Linux also makes this
|
||||
mistake to some degree—enough that we BSD folk can make small
|
||||
jokes about it every once in a while, anyway. Linux's problem is simply
|
||||
one of a lack of experience and history to compare ideas against, a
|
||||
problem that is easily and rapidly being addressed by the Linux
|
||||
community in the same way it has been addressed in the BSD
|
||||
community—by continuous code development. The NT folk, on the
|
||||
other hand, repeatedly make the same mistakes solved by UNIX decades ago
|
||||
and then spend years fixing them. Over and over again. They have a
|
||||
severe case of ‘not designed here’ and ‘we are always
|
||||
right because our marketing department says so’. I have little
|
||||
tolerance for anyone who cannot learn from history.</para>
|
||||
|
||||
<para>Much of the apparent complexity of the FreeBSD design, especially in
|
||||
the VM/Swap subsystem, is a direct result of having to solve serious
|
||||
performance issues that occur under various conditions. These issues
|
||||
are not due to bad algorithmic design but instead arise from
|
||||
environmental factors. In any direct comparison between platforms,
|
||||
these issues become most apparent when system resources begin to get
|
||||
stressed. As I describe FreeBSD's VM/Swap subsystem the reader should
|
||||
always keep two points in mind. First, the most important aspect of
|
||||
performance design is what is known as “Optimizing the Critical
|
||||
Path”. It is often the case that performance optimizations add a
|
||||
little bloat to the code in order to make the critical path perform
|
||||
better. Second, a solid, generalized design outperforms a
|
||||
heavily-optimized design over the long run. While a generalized design
|
||||
may end up being slower than a heavily-optimized design when they are
|
||||
first implemented, the generalized design tends to be easier to adapt to
|
||||
changing conditions and the heavily-optimized design winds up having to
|
||||
be thrown away. Any codebase that will survive and be maintainable for
|
||||
years must therefore be designed properly from the beginning even if it
|
||||
costs some performance. Twenty years ago people were still arguing that
|
||||
programming in assembly was better than programming in a high-level
|
||||
language because it produced code that was ten times as fast. Today,
|
||||
the fallibility of that argument is obvious—as are the parallels
|
||||
to algorithmic design and code generalization.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>VM Objects</title>
|
||||
|
||||
<para>The best way to begin describing the FreeBSD VM system is to look at
|
||||
it from the perspective of a user-level process. Each user process sees
|
||||
a single, private, contiguous VM address space containing several types
|
||||
of memory objects. These objects have various characteristics. Program
|
||||
code and program data are effectively a single memory-mapped file (the
|
||||
binary file being run), but program code is read-only while program data
|
||||
is copy-on-write. Program BSS is just memory allocated and filled with
|
||||
zeros on demand, called demand zero page fill. Arbitrary files can be
|
||||
memory-mapped into the address space as well, which is how the shared
|
||||
library mechanism works. Such mappings can require modifications to
|
||||
remain private to the process making them. The fork system call adds an
|
||||
entirely new dimension to the VM management problem on top of the
|
||||
complexity already given.</para>
|
||||
|
||||
<para>A program binary data page (which is a basic copy-on-write page)
|
||||
illustrates the complexity. A program binary contains a preinitialized
|
||||
data section which is initially mapped directly from the program file.
|
||||
When a program is loaded into a process's VM space, this area is
|
||||
initially memory-mapped and backed by the program binary itself,
|
||||
allowing the VM system to free/reuse the page and later load it back in
|
||||
from the binary. The moment a process modifies this data, however, the
|
||||
VM system must make a private copy of the page for that process. Since
|
||||
the private copy has been modified, the VM system may no longer free it,
|
||||
because there is no longer any way to restore it later on.</para>
|
||||
|
||||
<para>You will notice immediately that what was originally a simple file
|
||||
mapping has become much more complex. Data may be modified on a
|
||||
page-by-page basis whereas the file mapping encompasses many pages at
|
||||
once. The complexity further increases when a process forks. When a
|
||||
process forks, the result is two processes—each with their own
|
||||
private address spaces, including any modifications made by the original
|
||||
process prior to the call to <function>fork()</function>. It would be
|
||||
silly for the VM system to make a complete copy of the data at the time
|
||||
of the <function>fork()</function> because it is quite possible that at
|
||||
least one of the two processes will only need to read from that page
|
||||
from then on, allowing the original page to continue to be used. What
|
||||
was a private page is made copy-on-write again, since each process
|
||||
(parent and child) expects their own personal post-fork modifications to
|
||||
remain private to themselves and not affect the other.</para>
|
||||
|
||||
<para>FreeBSD manages all of this with a layered VM Object model. The
|
||||
original binary program file winds up being the lowest VM Object layer.
|
||||
A copy-on-write layer is pushed on top of that to hold those pages which
|
||||
had to be copied from the original file. If the program modifies a data
|
||||
page belonging to the original file the VM system takes a fault and
|
||||
makes a copy of the page in the higher layer. When a process forks,
|
||||
additional VM Object layers are pushed on. This might make a little
|
||||
more sense with a fairly basic example. A <function>fork()</function>
|
||||
is a common operation for any *BSD system, so this example will consider
|
||||
a program that starts up, and forks. When the process starts, the VM
|
||||
system creates an object layer; let's call this A:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject>
|
||||
<imagedata fileref="fig1">
|
||||
</imageobject>
|
||||
|
||||
<textobject>
|
||||
<literallayout>+---------------+
|
||||
| A |
|
||||
+---------------+</literallayout>
|
||||
</textobject>
|
||||
|
||||
<textobject>
|
||||
<phrase>A picture</phrase>
|
||||
</textobject>
|
||||
</mediaobject>
|
||||
|
||||
<para>A represents the file—pages may be paged in and out of the
|
||||
file's physical media as necessary. Paging in from the disk is
|
||||
reasonable for a program, but we really don't want to page back out and
|
||||
overwrite the executable. The VM system therefore creates a second
|
||||
layer, B, that will be physically backed by swap space:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject>
|
||||
<imagedata fileref="fig2">
|
||||
</imageobject>
|
||||
|
||||
<textobject>
|
||||
<literallayout>+---------------+
|
||||
| B |
|
||||
+---------------+
|
||||
| A |
|
||||
+---------------+</literallayout>
|
||||
</textobject>
|
||||
</mediaobject>
|
||||
|
||||
<para>On the first write to a page after this, a new page is created in B,
|
||||
and its contents are initialized from A. All pages in B can be paged in
|
||||
or out to a swap device. When the program forks, the VM system creates
|
||||
two new object layers—C1 for the parent, and C2 for the
|
||||
child—that rest on top of B:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject>
|
||||
<imagedata fileref="fig3">
|
||||
</imageobject>
|
||||
|
||||
<textobject>
|
||||
<literallayout>+-------+-------+
|
||||
| C1 | C2 |
|
||||
+-------+-------+
|
||||
| B |
|
||||
+---------------+
|
||||
| A |
|
||||
+---------------+</literallayout>
|
||||
</textobject>
|
||||
</mediaobject>
|
||||
|
||||
<para>In this case, let's say a page in B is modified by the original
|
||||
parent process. The process will take a copy-on-write fault and
|
||||
duplicate the page in C1, leaving the original page in B untouched.
|
||||
Now, let's say the same page in B is modified by the child process. The
|
||||
process will take a copy-on-write fault and duplicate the page in C2.
|
||||
The original page in B is now completely hidden since both C1 and C2
|
||||
have a copy and B could theoretically be destroyed (if it does not
|
||||
represent a 'real' file). However, this sort of optimization is not
|
||||
trivial to make because it is so fine-grained. FreeBSD does not make
|
||||
this optimization. Now, suppose (as is often the case) that the child
|
||||
process does an <function>exec()</function>. Its current address space
|
||||
is usually replaced by a new address space representing a new file. In
|
||||
this case, the C2 layer is destroyed:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject>
|
||||
<imagedata fileref="fig4">
|
||||
</imageobject>
|
||||
|
||||
<textobject>
|
||||
<literallayout>+-------+
|
||||
| C1 |
|
||||
+-------+-------+
|
||||
| B |
|
||||
+---------------+
|
||||
| A |
|
||||
+---------------+</literallayout>
|
||||
</textobject>
|
||||
</mediaobject>
|
||||
|
||||
<para>In this case, the number of children of B drops to one, and all
|
||||
accesses to B now go through C1. This means that B and C1 can be
|
||||
collapsed together. Any pages in B that also exist in C1 are deleted
|
||||
from B during the collapse. Thus, even though the optimization in the
|
||||
previous step could not be made, we can recover the dead pages when
|
||||
either of the processes exits or calls <function>exec()</function>.</para>
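<para>The layering, copy-on-write, and collapse rules described so far can
be sketched as a small userland model. This is an illustration in Python,
not kernel code: the <literal>VMObject</literal> class, its fields, and the
<literal>collapse()</literal> helper are hypothetical names invented for
this sketch, and a dictionary stands in for the pages resident in a
layer.</para>

```python
class VMObject:
    """Toy model of one VM Object layer (not the kernel's data structure)."""

    def __init__(self, name, backing=None, is_file=False):
        self.name = name
        self.pages = {}          # page index -> contents resident in this layer
        self.backing = backing   # next layer down, or None at the bottom
        self.is_file = is_file   # a 'real' file layer is never collapsed away
        self.children = 0        # layers stacked directly on top of this one
        if backing is not None:
            backing.children += 1

    def lookup(self, index):
        """Read path: walk down the stack until some layer holds the page."""
        layer = self
        while layer is not None:
            if index in layer.pages:
                return layer.pages[index]
            layer = layer.backing
        return None  # a real system would page in or zero-fill here

    def write(self, index, data):
        """Write path: the copy-on-write fault always lands in the top layer."""
        self.pages[index] = data

    def release(self):
        """Drop this layer, as when its process calls exec() or exits."""
        if self.backing is not None:
            self.backing.children -= 1
            self.backing = None
        self.pages.clear()


def collapse(top):
    """Merge top's backing layer into top when top is its only child.

    Pages already present in top shadow the lower copy, so the lower
    copy is simply dropped; this is where dead pages are recovered.
    """
    lower = top.backing
    if lower is None or lower.is_file or lower.children != 1:
        return False
    for index, data in lower.pages.items():
        top.pages.setdefault(index, data)
    lower.pages.clear()
    top.backing = lower.backing   # top takes over lower's reference
    lower.backing = None
    return True
```

<para>In this model, building A, B, C1 and C2 as in the figures, releasing
C2, and then calling <literal>collapse()</literal> on C1 leaves C1 resting
directly on A, with the hidden copy in B discarded.</para>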
|
||||
|
||||
<para>This model creates a number of potential problems. The first is that
|
||||
you can wind up with a relatively deep stack of layered VM Objects which
|
||||
can cost scanning time and memory when you take a fault. Deep
|
||||
layering can occur when processes fork and then fork again (either
|
||||
parent or child). The second problem is that you can wind up with dead,
|
||||
inaccessible pages deep in the stack of VM Objects. In our last example
|
||||
if both the parent and child processes modify the same page, they both
|
||||
get their own private copies of the page and the original page in B is
|
||||
no longer accessible by anyone. That page in B can be freed.</para>
|
||||
|
||||
<para>FreeBSD solves the deep layering problem with a special optimization
|
||||
called the “All Shadowed Case”. This case occurs if either
|
||||
C1 or C2 takes sufficient COW faults to completely shadow all pages in B.
|
||||
Let's say that C1 achieves this. C1 can now bypass B entirely, so rather
|
||||
than have C1->B->A and C2->B->A we now have C1->A and C2->B->A. But
|
||||
look what also happened—now B has only one reference (C2), so we
|
||||
can collapse B and C2 together. The end result is that B is deleted
|
||||
entirely and we have C1->A and C2->A. It is often the case that B will
|
||||
contain a large number of pages and neither C1 nor C2 will be able to
|
||||
completely overshadow it. If we fork again and create a set of D
|
||||
layers, however, it is much more likely that one of the D layers will
|
||||
eventually be able to completely overshadow the much smaller dataset
|
||||
represented by C1 or C2. The same optimization will work at any point in
|
||||
the graph and the grand result of this is that even on a heavily forked
|
||||
machine VM Object stacks tend to not get much deeper than 4. This is
|
||||
true of both the parent and the children and true whether the parent is
|
||||
doing the forking or whether the children cascade forks.</para>
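<para>The bypass step of the “All Shadowed Case” can be
illustrated with a minimal sketch. It is written in Python as a userland
model under simplifying assumptions: <literal>Layer</literal>,
<literal>chain()</literal> and <literal>bypass_if_all_shadowed()</literal>
are hypothetical names, and "all shadowed" is reduced to "the upper layer
holds its own copy of every page resident in the layer below".</para>

```python
class Layer:
    """Toy VM Object layer: resident pages plus a pointer to the layer below."""

    def __init__(self, name, backing=None):
        self.name = name
        self.pages = {}
        self.backing = backing


def chain(layer):
    """Render a stack in the article's notation, e.g. 'C1->B->A'."""
    names = []
    while layer is not None:
        names.append(layer.name)
        layer = layer.backing
    return "->".join(names)


def bypass_if_all_shadowed(top):
    """Let top skip the layer below once top shadows all of its pages."""
    lower = top.backing
    if lower is None or lower.backing is None:
        return False  # nothing to bypass, or lower is the bottom (file) layer
    if all(index in top.pages for index in lower.pages):
        top.backing = lower.backing
        return True
    return False
```

<para>Once C1 bypasses B in this model, B is referenced only by C2, which
is exactly the single-child situation in which B and C2 can be collapsed
together.</para>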
|
||||
|
||||
<para>The dead page problem still exists in the case where C1 or C2 do not
|
||||
completely overshadow B. Due to our other optimizations this case does
|
||||
not represent much of a problem and we simply allow the pages to be
|
||||
dead. If the system runs low on memory it will swap them out, eating a
|
||||
little swap, but that's it.</para>
|
||||
|
||||
<para>The advantage to the VM Object model is that
|
||||
<function>fork()</function> is extremely fast, since no real data
|
||||
copying need take place. The disadvantage is that you can build a
|
||||
relatively complex VM Object layering that slows page fault handling
|
||||
down a little, and you spend memory managing the VM Object structures.
|
||||
The optimizations FreeBSD makes prove to reduce the problems enough
|
||||
that they can be ignored, leaving no real disadvantage.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>SWAP Layers</title>
|
||||
|
||||
<para>Private data pages are initially either copy-on-write or zero-fill
|
||||
pages. When a change, and therefore a copy, is made, the original
|
||||
backing object (usually a file) can no longer be used to save a copy of
|
||||
the page when the VM system needs to reuse it for other purposes. This
|
||||
is where SWAP comes in. SWAP is allocated to create backing store for
|
||||
memory that does not otherwise have it. FreeBSD allocates the swap
|
||||
management structure for a VM Object only when it is actually needed.
|
||||
However, the swap management structure has had problems
|
||||
historically.</para>
|
||||
|
||||
<para>Under FreeBSD 3.x the swap management structure preallocates an
|
||||
array that encompasses the entire object requiring swap backing
|
||||
store—even if only a few pages of that object are swap-backed.
|
||||
This creates a kernel memory fragmentation problem when large objects
|
||||
are mapped, or processes with large runsizes (RSS) fork. Also, in order
|
||||
to keep track of swap space, a ‘list of holes’ is kept in
|
||||
kernel memory, and this tends to get severely fragmented as well. Since
|
||||
the 'list of holes' is a linear list, the swap allocation and freeing
|
||||
performance is a non-optimal O(n)-per-page. It also requires kernel
|
||||
memory allocations to take place during the swap freeing process, and
|
||||
that creates low memory deadlock problems. The problem is further
|
||||
exacerbated by holes created due to the interleaving algorithm. Also,
|
||||
the swap block map can become fragmented fairly easily resulting in
|
||||
non-contiguous allocations. Kernel memory must also be allocated on the
|
||||
fly for additional swap management structures when a swapout occurs. It
|
||||
is evident that there was plenty of room for improvement.</para>
|
||||
|
||||
<para>For FreeBSD 4.x, I completely rewrote the swap subsystem. With this
|
||||
rewrite, swap management structures are allocated through a hash table
|
||||
rather than a linear array, giving them a fixed allocation size and much
|
||||
finer granularity. Rather than using a linearly linked list to keep
|
||||
track of swap space reservations, it now uses a bitmap of swap blocks
|
||||
arranged in a radix tree structure with free-space hinting in the radix
|
||||
node structures. This effectively makes swap allocation and freeing an
|
||||
O(1) operation. The entire radix tree bitmap is also preallocated in
|
||||
order to avoid having to allocate kernel memory during critical low
|
||||
memory swapping operations. After all, the system tends to swap when it
|
||||
is low on memory so we should avoid allocating kernel memory at such
|
||||
times in order to avoid potential deadlocks. Finally, to reduce
|
||||
fragmentation the radix tree is capable of allocating large contiguous
|
||||
chunks at once, skipping over smaller fragmented chunks. I did not take
|
||||
the final step of having an 'allocating hint pointer' that would trundle
|
||||
through a portion of swap as allocations were made in order to further
|
||||
guarantee contiguous allocations or at least locality of reference, but
|
||||
I ensured that such an addition could be made.</para>
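<para>The idea of a preallocated bitmap with free-space hints can be
sketched in a few lines. The Python model below is a deliberate
simplification and not the kernel allocator: it uses a single level of
leaf bitmaps instead of a multi-level radix tree, so it does not reach the
near-constant cost described above, and the names
<literal>SwapBitmap</literal> and <literal>LEAF_BITS</literal> as well as
the leaf size of 32 blocks are assumptions made for illustration.</para>

```python
LEAF_BITS = 32  # swap blocks tracked per leaf bitmap (assumed size)


class SwapBitmap:
    """Toy swap-block allocator: leaf bitmaps plus a per-leaf size hint."""

    def __init__(self, nblocks):
        nleaves = (nblocks + LEAF_BITS - 1) // LEAF_BITS
        # Everything is allocated up front; alloc() and free() never grow it.
        self.leaves = [0] * nleaves   # bit set = block is free
        self.hints = [0] * nleaves    # upper bound on the largest free run
        self.free(0, nblocks)

    def alloc(self, count):
        """Return the first block of a contiguous free run, or None."""
        if not 1 <= count <= LEAF_BITS:
            raise ValueError("count must fit inside one leaf")
        mask = (1 << count) - 1
        for leaf, bits in enumerate(self.leaves):
            if self.hints[leaf] < count:
                continue  # hint says no run this large lives here: skip it
            for shift in range(LEAF_BITS - count + 1):
                if (bits >> shift) & mask == mask:
                    self.leaves[leaf] = bits & ~(mask << shift)
                    return leaf * LEAF_BITS + shift
            self.hints[leaf] = count - 1  # remember the failed search
        return None

    def free(self, start, count):
        """Mark blocks free again and reset the hint conservatively."""
        for block in range(start, start + count):
            leaf, bit = divmod(block, LEAF_BITS)
            self.leaves[leaf] |= 1 << bit
            self.hints[leaf] = LEAF_BITS
```

<para>The hint is only an upper bound: a failed search lowers it so that
later large requests skip the fragmented leaf, and a free raises it again
at the cost of one possibly wasted scan.</para>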
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>When to free a page</title>
|
||||
|
||||
<para>Since the VM system uses all available memory for disk caching,
|
||||
there are usually very few truly-free pages. The VM system depends on
|
||||
being able to properly choose pages which are not in use to reuse for
|
||||
new allocations. Selecting the optimal pages to free is possibly the
|
||||
single most important function any VM system can perform because if it
|
||||
makes a poor selection, the VM system may be forced to unnecessarily
|
||||
retrieve pages from disk, seriously degrading system performance.</para>
|
||||
|
||||
<para>How much overhead are we willing to suffer in the critical path to
|
||||
avoid freeing the wrong page? Each wrong choice we make will cost us
|
||||
hundreds of thousands of CPU cycles and a noticeable stall of the
|
||||
affected processes, so we are willing to endure a significant amount of
|
||||
overhead in order to be sure that the right page is chosen. This is why
|
||||
FreeBSD tends to outperform other systems when memory resources become
|
||||
stressed.</para>
|
||||
|
||||
<para>The free page determination algorithm is built upon a history of the
|
||||
use of memory pages. To acquire this history, the system takes advantage
|
||||
of a page-used bit feature that most hardware page tables have.</para>
|
||||
|
||||
<para>In any case, the page-used bit is cleared and at some later point
|
||||
the VM system comes across the page again and sees that the page-used
|
||||
bit has been set. This indicates that the page is still being actively
|
||||
used. If the bit is still clear it is an indication that the page is not
|
||||
being actively used. By testing this bit periodically, a use history (in
|
||||
the form of a counter) for the physical page is developed. When the VM
|
||||
system later needs to free up some pages, checking this history becomes
|
||||
the cornerstone of determining the best candidate page to reuse.</para>
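<para>A minimal sketch of how periodic sampling of the page-used bit turns
into a use history follows. It is a Python model, not kernel code: the
<literal>Page</literal> class, the <literal>scan()</literal> and
<literal>pick_victim()</literal> helpers, and the particular step sizes
and ceiling are assumptions chosen for illustration.</para>

```python
class Page:
    """Toy physical page with a page-used bit and a use-history counter."""

    def __init__(self, name):
        self.name = name
        self.referenced = False  # stands in for the hardware page-used bit
        self.act_count = 0       # use history built up by scan()


def scan(pages, advance=3, decline=1, ceiling=64):
    """One periodic pass: fold each page-used bit into the counter, then clear it."""
    for page in pages:
        if page.referenced:
            page.act_count = min(page.act_count + advance, ceiling)
        else:
            page.act_count = max(page.act_count - decline, 0)
        page.referenced = False


def pick_victim(pages):
    """Best reuse candidate: the page with the weakest use history."""
    return min(pages, key=lambda page: page.act_count)
```

<para>A page touched between every pair of scans climbs toward the
ceiling, while a page touched once and then left alone decays back toward
zero and becomes the preferred candidate for reuse.</para>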
|
||||
|
||||
<sidebar>
|
||||
<title>What if the hardware has no page-used bit?</title>
|
||||
|
||||
<para>For those platforms that do not have this feature, the system
|
||||
actually emulates a page-used bit. It unmaps or protects a page,
|
||||
forcing a page fault if the page is accessed again. When the page
|
||||
fault is taken, the system simply marks the page as having been used
|
||||
and unprotects the page so that it may be used. While taking such page
|
||||
faults just to determine if a page is being used appears to be an
|
||||
expensive proposition, it is much less expensive than reusing the page
|
||||
for some other purpose only to find that a process needs it back and
|
||||
then have to go to disk.</para>
|
||||
</sidebar>
|
||||
|
||||
<para>FreeBSD makes use of several page queues to further refine the
|
||||
selection of pages to reuse as well as to determine when dirty pages
|
||||
must be flushed to their backing store. Since page tables are dynamic
|
||||
entities under FreeBSD, it costs virtually nothing to unmap a page from
|
||||
the address space of any processes using it. When a page candidate has
|
||||
been chosen based on the page-use counter, this is precisely what is
|
||||
done. The system must make a distinction between clean pages which can
|
||||
theoretically be freed up at any time, and dirty pages which must first
|
||||
be written to their backing store before being reusable. When a page
|
||||
candidate has been found it is moved to the inactive queue if it is
|
||||
dirty, or the cache queue if it is clean. A separate algorithm based on
|
||||
the dirty-to-clean page ratio determines when dirty pages in the
|
||||
inactive queue must be flushed to disk. Once this is accomplished, the
|
||||
flushed pages are moved from the inactive queue to the cache queue. At
|
||||
this point, pages in the cache queue can still be reactivated by a VM
|
||||
fault at relatively low cost. However, pages in the cache queue are
|
||||
considered to be ‘immediately freeable’ and will be reused
|
||||
in an LRU (least-recently used) fashion when the system needs to
|
||||
allocate new memory.</para>
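<para>The movement of pages between the queues can be modelled roughly as
follows. This Python sketch is an assumption-laden illustration, not the
kernel's implementation: <literal>PageQueues</literal> and its method
names are hypothetical, and <literal>flush()</literal> writes everything
at once rather than following the dirty-to-clean ratio policy described
above.</para>

```python
from collections import deque


class Page:
    """Toy page; compared by identity so queue removal is unambiguous."""

    def __init__(self, name, dirty=False):
        self.name = name
        self.dirty = dirty


class PageQueues:
    def __init__(self):
        self.active = []
        self.inactive = deque()  # dirty candidates awaiting a flush
        self.cache = deque()     # clean, immediately freeable, oldest first

    def add(self, page):
        self.active.append(page)

    def deactivate(self, page):
        """A chosen candidate goes to inactive if dirty, to cache if clean."""
        self.active.remove(page)
        if page.dirty:
            self.inactive.append(page)
        else:
            self.cache.append(page)

    def flush(self):
        """Pretend to write dirty pages to backing store, then cache them."""
        while self.inactive:
            page = self.inactive.popleft()
            page.dirty = False
            self.cache.append(page)

    def reactivate(self, page):
        """A fault on a cached page brings it back cheaply."""
        self.cache.remove(page)
        self.active.append(page)

    def allocate(self):
        """Reuse the least recently cached page, if there is one."""
        return self.cache.popleft() if self.cache else None
```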
|
||||
|
||||
<para>It is important to note that the FreeBSD VM system attempts to
|
||||
separate clean and dirty pages for the express reason of avoiding
|
||||
unnecessary flushes of dirty pages (which eat I/O bandwidth), and it does
|
||||
not move pages between the various page queues gratuitously when the
|
||||
memory subsystem is not being stressed. This is why you will see some
|
||||
systems with very low cache queue counts and high active queue counts
|
||||
when running the <command>systat -vm</command> command. As the VM system
|
||||
becomes more stressed, it makes a greater effort to maintain the various
|
||||
page queues at the levels determined to be the most effective. An urban
|
||||
myth has circulated for years that Linux did a better job avoiding
|
||||
swapouts than FreeBSD, but this in fact is not true. What was actually
|
||||
occurring was that FreeBSD was proactively paging out unused pages in
|
||||
order to make room for more disk cache while Linux was keeping unused
|
||||
pages in core and leaving less memory available for cache and process
|
||||
pages. I don't know whether this is still true today.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Pre-Faulting and Zeroing Optimizations</title>
|
||||
|
||||
<para>Taking a VM fault is not expensive if the underlying page is already
|
||||
in core and can simply be mapped into the process, but it can become
|
||||
expensive if you take a whole lot of them on a regular basis. A good
|
||||
example of this is running a program such as &man.ls.1; or &man.ps.1;
|
||||
over and over again. If the program binary is mapped into memory but
|
||||
not mapped into the page table, then all the pages that will be accessed
|
||||
by the program will have to be faulted in every time the program is run.
|
||||
This is unnecessary when the pages in question are already in the VM
|
||||
Cache, so FreeBSD will attempt to pre-populate a process's page tables
|
||||
with those pages that are already in the VM Cache. One thing that
|
||||
FreeBSD does not yet do is pre-copy-on-write certain pages on exec. For
|
||||
example, if you run the &man.ls.1; program while running <command>vmstat
|
||||
1</command> you will notice that it always takes a certain number of
|
||||
page faults, even when you run it over and over again. These are
|
||||
zero-fill faults, not program code faults (which were pre-faulted in
|
||||
already). Pre-copying pages on exec or fork is an area that could use
|
||||
more study.</para>
|
||||
|
||||
<para>A large percentage of page faults that occur are zero-fill faults.
|
||||
You can usually see this by observing the <command>vmstat -s</command>
|
||||
output. These occur when a process accesses pages in its BSS area. The
|
||||
BSS area is expected to be initially zero but the VM system does not
|
||||
bother to allocate any memory at all until the process actually accesses
|
||||
it. When a fault occurs the VM system must not only allocate a new page,
|
||||
it must zero it as well. To optimize the zeroing operation the VM system
|
||||
has the ability to pre-zero pages and mark them as such, and to request
|
||||
pre-zeroed pages when zero-fill faults occur. The pre-zeroing occurs
|
||||
whenever the CPU is idle but the number of pages the system pre-zeros is
|
||||
limited in order to avoid blowing away the memory caches. This is an
|
||||
excellent example of adding complexity to the VM system in order to
|
||||
optimize the critical path.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Page Table Optimizations</title>
|
||||
|
||||
<para>The page table optimizations make up the most contentious part of
|
||||
the FreeBSD VM design and they have shown some strain with the advent of
|
||||
serious use of <function>mmap()</function>. I think this is actually a
|
||||
feature of most BSDs though I am not sure when it was first introduced.
|
||||
There are two major optimizations. The first is that hardware page
|
||||
tables do not contain persistent state but instead can be thrown away at
|
||||
any time with only a minor amount of management overhead. The second is
|
||||
that every active page table entry in the system has a governing
|
||||
<literal>pv_entry</literal> structure which is tied into the
|
||||
<literal>vm_page</literal> structure. FreeBSD can simply iterate
|
||||
through those mappings that are known to exist while Linux must check
|
||||
all page tables that <emphasis>might</emphasis> contain a specific
|
||||
mapping to see if it does, which can incur O(n^2) overhead in certain
|
||||
situations. It is because of this that FreeBSD tends to make better
|
||||
choices on which pages to reuse or swap when memory is stressed, giving
|
||||
it better performance under load. However, FreeBSD requires kernel
|
||||
tuning to accommodate large-shared-address-space situations such as
|
||||
those that can occur in a news system because it may run out of
|
||||
<literal>pv_entry</literal> structures.</para>
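<para>The benefit of keeping a per-page record of active mappings can be
shown with a small model. The Python sketch below is illustrative only:
<literal>Pmap</literal>, <literal>enter()</literal> and
<literal>remove_all()</literal> are hypothetical names for this example,
and a set of (process, address) pairs stands in for the chain of
<literal>pv_entry</literal> structures attached to a page.</para>

```python
class Pmap:
    """Toy model: per-process page tables plus a per-page list of mappings."""

    def __init__(self):
        self.page_tables = {}  # pid -> {virtual address -> physical page}
        self.pv_lists = {}     # physical page -> {(pid, virtual address)}

    def enter(self, pid, vaddr, page):
        """Map a page and record the mapping on the page's own list."""
        table = self.page_tables.setdefault(pid, {})
        old = table.get(vaddr)
        if old is not None:
            self.pv_lists[old].discard((pid, vaddr))
        table[vaddr] = page
        self.pv_lists.setdefault(page, set()).add((pid, vaddr))

    def remove_all(self, page):
        """Unmap a page everywhere, visiting only mappings known to exist."""
        removed = 0
        for pid, vaddr in self.pv_lists.pop(page, set()):
            del self.page_tables[pid][vaddr]
            removed += 1
        return removed
```

<para>The cost of <literal>remove_all()</literal> depends on how many
mappings the page actually has, not on how many processes or page tables
exist. The price is one tracking record per active mapping, which is the
memory overhead mentioned above.</para>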
|
||||
|
||||
<para>Both Linux and FreeBSD need work in this area. FreeBSD is trying to
|
||||
maximize the advantage of a potentially sparse active-mapping model (not
|
||||
all processes need to map all pages of a shared library, for example),
|
||||
whereas Linux is trying to simplify its algorithms. FreeBSD generally
|
||||
has the performance advantage here at the cost of wasting a little extra
|
||||
memory, but FreeBSD breaks down in the case where a large file is
|
||||
massively shared across hundreds of processes. Linux, on the other hand,
|
||||
breaks down in the case where many processes are sparsely-mapping the
|
||||
same shared library and also runs non-optimally when trying to determine
|
||||
whether a page can be reused or not.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Page Coloring</title>
|
||||
|
||||
<para>We'll end with the page coloring optimizations. Page coloring is a
|
||||
performance optimization designed to ensure that accesses to contiguous
|
||||
pages in virtual memory make the best use of the processor cache. In
|
||||
ancient times (i.e. 10+ years ago) processor caches tended to map
|
||||
virtual memory rather than physical memory. This led to a huge number of
|
||||
problems including having to clear the cache on every context switch in
|
||||
some cases, and problems with data aliasing in the cache. Modern
|
||||
processor caches map physical memory precisely to solve those problems.
|
||||
This means that two side-by-side pages in a process's address space may
|
||||
not correspond to two side-by-side pages in the cache. In fact, if you
|
||||
aren't careful side-by-side pages in virtual memory could wind up using
|
||||
the same page in the processor cache—leading to cacheable data
|
||||
being thrown away prematurely and reducing CPU performance. This is true
|
||||
even with multi-way set-associative caches (though the effect is
|
||||
mitigated somewhat).</para>
|
||||
|
||||
<para>FreeBSD's memory allocation code implements page coloring
|
||||
optimizations, which means that the memory allocation code will attempt
|
||||
to locate free pages that are contiguous from the point of view of the
|
||||
cache. For example, if page 16 of physical memory is assigned to page 0
|
||||
of a process's virtual memory and the cache can hold 4 pages, the page
|
||||
coloring code will not assign page 20 of physical memory to page 1 of a
|
||||
process's virtual memory. It would, instead, assign page 21 of physical
|
||||
memory. The page coloring code attempts to avoid assigning page 20
|
||||
because this maps over the same cache memory as page 16 and would result
|
||||
in non-optimal caching. This code adds a significant amount of
|
||||
complexity to the VM memory allocation subsystem as you can well
|
||||
imagine, but the result is well worth the effort. Page coloring makes VM
|
||||
memory as deterministic as physical memory in regards to cache
|
||||
performance.</para>
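<para>The page 16 / page 20 / page 21 example above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the FreeBSD allocator: the helper names are hypothetical, the cache is assumed to hold exactly four pages as in the example, and the free pages are assumed to sit in a plain array.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the cache "color" of a physical page is its
 * page number modulo the number of pages the cache can hold. */
static unsigned page_color(unsigned phys_page, unsigned cache_pages)
{
    assert(cache_pages > 0);
    return phys_page % cache_pages;
}

/* Pick a free physical page for the virtual page that follows one
 * backed by prev_phys.  Prefer a page whose color is the next color
 * after prev_phys; fall back to the first free page if none matches.
 * Returns -1 when there are no free pages at all. */
static long pick_colored_page(const unsigned *free_pages, size_t nfree,
                              unsigned prev_phys, unsigned cache_pages)
{
    unsigned want = (page_color(prev_phys, cache_pages) + 1) % cache_pages;
    size_t i;

    for (i = 0; i < nfree; i++)
        if (page_color(free_pages[i], cache_pages) == want)
            return (long)free_pages[i];
    return nfree > 0 ? (long)free_pages[0] : -1L;
}
```

<para>With free pages 20, 21 and 22 and a previous page of 16, the sketch skips page 20 (same color as page 16) and returns page 21.</para>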
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Conclusion</title>
|
||||
|
||||
<para>Virtual memory in modern operating systems must address a number of
|
||||
different issues efficiently and for many different usage patterns. The
|
||||
modular and algorithmic approach that BSD has historically taken allows
|
||||
us to study and understand the current implementation as well as
|
||||
relatively cleanly replace large sections of the code. There have been a
|
||||
number of improvements to the FreeBSD VM system in the last several
|
||||
years, and work is ongoing.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Bonus QA session by Allen Briggs
|
||||
<email>briggs@ninthwonder.com</email></title>
|
||||
|
||||
<qandaset>
|
||||
<qandaentry>
|
||||
<question>
|
||||
<para>What is “the interleaving algorithm” that you
|
||||
refer to in your listing of the ills of the FreeBSD 3.x swap
|
||||
arrangements?</para>
|
||||
</question>
|
||||
|
||||
<answer>
|
||||
<para>FreeBSD uses a fixed swap interleave which defaults to 4. This
|
||||
means that FreeBSD reserves space for four swap areas even if you
|
||||
only have one, two, or three. Since swap is interleaved the linear
|
||||
address space representing the ‘four swap areas’ will be
|
||||
fragmented if you don't actually have four swap areas. For
|
||||
example, if you have two swap areas A and B, FreeBSD's address
|
||||
space representation for that swap area will be interleaved in
|
||||
blocks of 16 pages:</para>
|
||||
|
||||
<literallayout>A B C D A B C D A B C D A B C D</literallayout>
|
||||
|
||||
<para>FreeBSD 3.x uses a ‘sequential list of free
|
||||
regions’ approach to accounting for the free swap areas.
|
||||
The idea is that large blocks of free linear space can be
|
||||
represented with a single list node
|
||||
(<filename>kern/subr_rlist.c</filename>). But due to the
|
||||
fragmentation the sequential list winds up being insanely
|
||||
fragmented. In the above example, completely unused swap will
|
||||
have A and B shown as ‘free’ and C and D shown as
|
||||
‘all allocated’. Each A-B sequence requires a list
|
||||
node to account for because C and D are holes, so the list node
|
||||
cannot be combined with the next A-B sequence.</para>
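<para>The fragmentation described above can be counted with a small sketch. It assumes the interleave of 4 and the 16-page blocks given in the text; the function names are hypothetical and this is not the code in kern/subr_rlist.c. It walks a completely unused linear swap space and counts how many separate free runs a range list would have to record.</para>

```c
#include <assert.h>
#include <stddef.h>

#define SWAP_INTERLEAVE 4   /* fixed number of swap areas, per the text */
#define STRIPE_PAGES    16  /* pages per interleave block, per the text */

/* Which swap area (0 = A, 1 = B, ...) a linear swap page falls in. */
static unsigned swap_area_of(unsigned long page)
{
    return (unsigned)((page / STRIPE_PAGES) % SWAP_INTERLEAVE);
}

/* Count the contiguous free runs in a completely unused swap space of
 * total_pages pages when only the first nconfigured areas exist.
 * Pages belonging to missing areas are holes that split the runs. */
static size_t free_runs(unsigned long total_pages, unsigned nconfigured)
{
    size_t runs = 0;
    int in_run = 0;
    unsigned long p;

    assert(nconfigured <= SWAP_INTERLEAVE);
    for (p = 0; p < total_pages; p++) {
        int is_free = swap_area_of(p) < nconfigured;

        if (is_free && !in_run)
            runs++;             /* a new list node would start here */
        in_run = is_free;
    }
    return runs;
}
```

<para>For the sixteen blocks shown in the layout above (256 pages), two configured areas need four list nodes, whereas four configured areas need only one.</para>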
|
||||
|
||||
<para>Why do we interleave our swap space instead of just tacking swap
|
||||
areas onto the end and do something fancier? Because it's a whole
|
||||
lot easier to allocate linear swaths of an address space and have
|
||||
the result automatically be interleaved across multiple disks than
|
||||
it is to try to put that sophistication elsewhere.</para>
|
||||
|
||||
<para>The fragmentation causes other problems. Being a linear list
|
||||
under 3.x, and having such a huge amount of inherent
|
||||
fragmentation, allocating and freeing swap winds up being an O(N)
|
||||
algorithm instead of an O(1) algorithm. Combine this with other
|
||||
factors (heavy swapping) and you start getting into O(N^2) and
|
||||
O(N^3) levels of overhead, which is bad. The 3.x system may also
|
||||
need to allocate KVM during a swap operation to create a new list
|
||||
node which can lead to a deadlock if the system is trying to
|
||||
pageout pages in a low-memory situation.</para>
|
||||
|
||||
<para>Under 4.x we do not use a sequential list. Instead we use a
|
||||
radix tree and bitmaps of swap blocks rather than ranged list
|
||||
nodes. We take the hit of preallocating all the bitmaps required
|
||||
for the entire swap area up front but it winds up wasting less
|
||||
memory due to the use of a bitmap (one bit per block) instead of a
|
||||
linked list of nodes. The use of a radix tree instead of a
|
||||
sequential list gives us nearly O(1) performance no matter how
|
||||
fragmented the tree becomes.</para>
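<para>To make the bitmap idea concrete, here is a deliberately simplified two-level sketch. It is an assumption-laden illustration rather than the 4.x implementation: the real structure is a deeper radix tree, and every name and size below is hypothetical. Each leaf is a 32-bit bitmap with one bit per swap block, and a per-leaf summary count lets the allocator skip leaves that are full.</para>

```c
#include <assert.h>
#include <stdint.h>

#define LEAF_BITS 32u
#define NLEAVES   8u                  /* 8 * 32 = 256 swap blocks */
#define NO_BLOCK  (-1L)

struct swap_map {
    uint32_t leaf[NLEAVES];           /* one bit per block, 1 = free */
    uint32_t nfree[NLEAVES];          /* summary: free blocks per leaf */
};

/* All storage is set up once, up front; nothing is allocated later. */
static void swap_map_init(struct swap_map *m)
{
    unsigned i;

    for (i = 0; i < NLEAVES; i++) {
        m->leaf[i] = UINT32_MAX;
        m->nfree[i] = LEAF_BITS;
    }
}

/* Allocate the lowest-numbered free block, or NO_BLOCK if none is left. */
static long swap_map_alloc(struct swap_map *m)
{
    unsigned i, bit;

    for (i = 0; i < NLEAVES; i++) {
        if (m->nfree[i] == 0)
            continue;                 /* summary lets us skip a full leaf */
        for (bit = 0; bit < LEAF_BITS; bit++) {
            if (m->leaf[i] & (UINT32_C(1) << bit)) {
                m->leaf[i] &= ~(UINT32_C(1) << bit);
                m->nfree[i]--;
                return (long)(i * LEAF_BITS + bit);
            }
        }
    }
    return NO_BLOCK;
}

/* Free a previously allocated block: just set its bit again. */
static void swap_map_free(struct swap_map *m, long block)
{
    unsigned i, bit;

    assert(block >= 0 && block < (long)(NLEAVES * LEAF_BITS));
    i = (unsigned)block / LEAF_BITS;
    bit = (unsigned)block % LEAF_BITS;
    assert((m->leaf[i] & (UINT32_C(1) << bit)) == 0); /* must be in use */
    m->leaf[i] |= UINT32_C(1) << bit;
    m->nfree[i]++;
}
```

<para>The point of the sketch is that freeing a block only flips a bit in memory that already exists, so no new node has to be allocated during a swap operation, however fragmented the map is.</para>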
|
||||
</answer>
|
||||
</qandaentry>
|
||||
|
||||
<qandaentry>
|
||||
<question>
|
||||
<para>I don't get the following:</para>
|
||||
|
||||
<blockquote>
|
||||
<para>It is important to note that the FreeBSD VM system attempts
|
||||
to separate clean and dirty pages for the express reason of
|
||||
avoiding unnecessary flushes of dirty pages (which eats I/O
|
||||
bandwidth), nor does it move pages between the various page
|
||||
queues gratuitously when the memory subsystem is not being
|
||||
stressed. This is why you will see some systems with very low
|
||||
cache queue counts and high active queue counts when doing a
|
||||
<command>systat -vm</command> command.</para>
|
||||
</blockquote>
|
||||
|
||||
<para>How is the separation of clean and dirty (inactive) pages
|
||||
related to the situation where you see low cache queue counts and
|
||||
high active queue counts in <command>systat -vm</command>? Do the
|
||||
systat stats roll the active and dirty pages together for the
|
||||
active queue count?</para>
|
||||
</question>
|
||||
|
||||
<answer>
|
||||
<para>Yes, that is confusing. The relationship is
|
||||
“goal” versus “reality”. Our goal is to
|
||||
separate the pages but the reality is that if we are not in a
|
||||
memory crunch, we don't really have to.</para>
|
||||
|
||||
<para>What this means is that FreeBSD will not try very hard to
|
||||
separate out dirty pages (inactive queue) from clean pages (cache
|
||||
queue) when the system is not being stressed, nor will it try to
|
||||
deactivate pages (active queue -> inactive queue) when the system
|
||||
is not being stressed, even if they aren't being used.</para>
|
||||
</answer>
|
||||
</qandaentry>
|
||||
|
||||
<qandaentry>
|
||||
<question>
|
||||
<para>In the &man.ls.1; / <command>vmstat 1</command> example,
|
||||
wouldn't some of the page faults be data page faults (COW from
|
||||
executable file to private page)? I.e., I would expect the page
|
||||
faults to be some zero-fill and some program data. Or are you
|
||||
implying that FreeBSD does do pre-COW for the program data?</para>
|
||||
</question>
|
||||
|
||||
<answer>
|
||||
<para>A COW fault can be either zero-fill or program-data. The
|
||||
mechanism is the same either way because the backing program-data
|
||||
is almost certainly already in the cache. I am indeed lumping the
|
||||
two together. FreeBSD does not pre-COW program data or zero-fill,
|
||||
but it <emphasis>does</emphasis> pre-map pages that exist in its
|
||||
cache.</para>
|
||||
</answer>
|
||||
</qandaentry>
|
||||
|
||||
<qandaentry>
|
||||
<question>
|
||||
<para>In your section on page table optimizations, can you give a
|
||||
little more detail about <literal>pv_entry</literal> and
|
||||
<literal>vm_page</literal> (or should vm_page be
|
||||
<literal>vm_pmap</literal>—as in 4.4, cf. pp. 180-181 of
|
||||
McKusick, Bostic, Karel, Quarterman)? Specifically, what kind of
|
||||
operation/reaction would require scanning the mappings?</para>
|
||||
|
||||
<para>How does Linux do in the case where FreeBSD breaks down
|
||||
(sharing a large file mapping over many processes)?</para>
|
||||
</question>
|
||||
|
||||
<answer>
|
||||
<para>A <literal>vm_page</literal> represents an (object,index#)
|
||||
tuple. A <literal>pv_entry</literal> represents a hardware page
|
||||
table entry (pte). If you have five processes sharing the same
|
||||
physical page, and three of those processes' page tables actually
|
||||
map the page, that page will be represented by a single
|
||||
<literal>vm_page</literal> structure and three
|
||||
<literal>pv_entry</literal> structures.</para>
|
||||
|
||||
<para><literal>pv_entry</literal> structures only represent pages
|
||||
mapped by the MMU (one <literal>pv_entry</literal> represents one
|
||||
pte). This means that when we need to remove all hardware
|
||||
references to a <literal>vm_page</literal> (in order to reuse the
|
||||
page for something else, page it out, clear it, dirty it, and so
|
||||
forth) we can simply scan the linked list of
|
||||
<literal>pv_entry</literal>'s associated with that
|
||||
<literal>vm_page</literal> to remove or modify the pte's from
|
||||
their page tables.</para>
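<para>A minimal sketch of that linked-list walk follows. The structure names echo the ones used in this answer, but the fields are hypothetical simplifications and do not match the kernel's definitions; a pte is modelled as a plain integer where 0 means "not mapped".</para>

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pte_t;          /* hypothetical pte: 0 = not mapped */

struct pv_entry {
    pte_t *pte;                       /* the one hardware pte this tracks */
    struct pv_entry *next;            /* next mapping of the same page */
};

struct vm_page {
    struct pv_entry *pv_list;         /* every pte currently mapping it */
};

/* Record that *pte now maps page m; pv is caller-supplied storage. */
static void pv_insert(struct vm_page *m, struct pv_entry *pv, pte_t *pte)
{
    assert(m != NULL && pv != NULL && pte != NULL);
    pv->pte = pte;
    pv->next = m->pv_list;
    m->pv_list = pv;
}

/* Remove every hardware mapping of the page.  Only ptes that actually
 * map the page are visited.  Returns how many ptes were cleared. */
static size_t page_remove_all(struct vm_page *m)
{
    size_t n = 0;
    struct pv_entry *pv;

    for (pv = m->pv_list; pv != NULL; pv = pv->next) {
        *pv->pte = 0;
        n++;
    }
    m->pv_list = NULL;
    return n;
}
```

<para>With five processes sharing a page and three of them mapping it, the walk touches exactly three ptes; the two page tables that never mapped the page are not looked at.</para>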
|
||||
|
||||
<para>Under Linux there is no such linked list. In order to remove
|
||||
all the hardware page table mappings for a
|
||||
<literal>vm_page</literal> Linux must index into every VM object
|
||||
that <emphasis>might</emphasis> have mapped the page. For
|
||||
example, if you have 50 processes all mapping the same shared
|
||||
library and want to get rid of page X in that library, you need to
|
||||
index into the page table for each of those 50 processes even if
|
||||
only 10 of them have actually mapped the page. So Linux is
|
||||
trading off the simplicity of its design against performance.
|
||||
Many VM algorithms which are O(1) or (small N) under FreeBSD wind
|
||||
up being O(N), O(N^2), or worse under Linux. Since the pte's
|
||||
representing a particular page in an object tend to be at the same
|
||||
offset in all the page tables they are mapped in, reducing the
|
||||
number of accesses into the page tables at the same pte offset
|
||||
will often avoid blowing away the L1 cache line for that offset,
|
||||
which can lead to better performance.</para>
|
||||
|
||||
<para>FreeBSD has added complexity (the <literal>pv_entry</literal>
|
||||
scheme) in order to increase performance (to limit page table
|
||||
accesses to <emphasis>only</emphasis> those pte's that need to be
|
||||
modified).</para>
|
||||
|
||||
<para>But FreeBSD has a scaling problem that Linux does not in that
|
||||
there are a limited number of <literal>pv_entry</literal>
|
||||
structures and this causes problems when you have massive sharing
|
||||
of data. In this case you may run out of
|
||||
<literal>pv_entry</literal> structures even though there is plenty
|
||||
of free memory available. This can be fixed easily enough by
|
||||
bumping up the number of <literal>pv_entry</literal> structures in
|
||||
the kernel config, but we really need to find a better way to do
|
||||
it.</para>
|
||||
|
||||
<para>In regards to the memory overhead of a page table versus the
|
||||
<literal>pv_entry</literal> scheme: Linux uses
|
||||
‘permanent’ page tables that are not thrown away, but
|
||||
does not need a <literal>pv_entry</literal> for each potentially
|
||||
mapped pte. FreeBSD uses ‘throw away’ page tables but
|
||||
adds in a <literal>pv_entry</literal> structure for each
|
||||
actually-mapped pte. I think memory utilization winds up being
|
||||
about the same, giving FreeBSD an algorithmic advantage with its
|
||||
ability to throw away page tables at will with very low
|
||||
overhead.</para>
|
||||
</answer>
|
||||
</qandaentry>
|
||||
|
||||
<qandaentry>
|
||||
<question>
|
||||
<para>Finally, in the page coloring section, it might help to have a
|
||||
little more description of what you mean here. I didn't quite
|
||||
follow it.</para>
|
||||
</question>
|
||||
|
||||
<answer>
|
||||
<para>Do you know how an L1 hardware memory cache works? I'll
|
||||
explain: Consider a machine with 16MB of main memory but only 128K
|
||||
of L1 cache. Generally the way this cache works is that each 128K
|
||||
block of main memory uses the <emphasis>same</emphasis> 128K of
|
||||
cache. If you access offset 0 in main memory and then offset
|
||||
128K in main memory you can wind up throwing away the
|
||||
cached data you read from offset 0!</para>
|
||||
|
||||
<para>Now, I am simplifying things greatly. What I just described
|
||||
is what is called a ‘direct mapped’ hardware memory
|
||||
cache. Most modern caches are what are called
|
||||
2-way-set-associative or 4-way-set-associative caches. The
|
||||
set-associativity allows you to access up to N different memory
|
||||
regions that overlap the same cache memory without destroying the
|
||||
previously cached data. But only N.</para>
|
||||
|
||||
<para>So if I have a 4-way set associative cache I can access offset
|
||||
0, offset 128K, 256K and offset 384K and still be able to access
|
||||
offset 0 again and have it come from the L1 cache. If I then
|
||||
access offset 512K, however, one of the four previously cached
|
||||
data objects will be thrown away by the cache.</para>
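<para>The 4-way example can be checked with a toy simulation. It follows the simplification used in this answer, namely that addresses 128K apart compete for the same cache slot; the 32-byte line size and the least-recently-used replacement policy are my assumptions, and all names are hypothetical.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAY_SPAN  (128u * 1024u)  /* addresses this far apart share a set */
#define LINE_SIZE 32u             /* assumed cache line size */
#define NSETS     (WAY_SPAN / LINE_SIZE)
#define NWAYS     4u              /* 4-way set associative */

struct cache_line {
    int valid;
    uint32_t tag;
    unsigned long last_used;      /* timestamp for LRU replacement */
};

struct cache {
    struct cache_line set[NSETS][NWAYS];
    unsigned long clock;
};

/* Access addr.  Returns 1 on a hit, 0 on a miss.  On a miss the line is
 * loaded into an empty way, or replaces the least recently used way.
 * The cache must start out zero-filled. */
static int cache_access(struct cache *c, uint32_t addr)
{
    struct cache_line *set;
    uint32_t tag = addr / WAY_SPAN;
    unsigned w, victim = 0;

    assert(c != NULL);
    set = c->set[(addr % WAY_SPAN) / LINE_SIZE];
    c->clock++;
    for (w = 0; w < NWAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {
            set[w].last_used = c->clock;
            return 1;
        }
    }
    for (w = 0; w < NWAYS; w++) {
        if (!set[w].valid) {
            victim = w;           /* an empty way is always preferred */
            break;
        }
        if (set[w].last_used < set[victim].last_used)
            victim = w;
    }
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].last_used = c->clock;
    return 0;
}
```

<para>Touching offsets 0, 128K, 256K and 384K and then offset 0 again gives a hit on the last access. A further access to offset 512K has to evict one of the four, and under the assumed LRU policy the victim is the line for offset 128K.</para>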
|
||||
|
||||
<para>It is extremely important…
|
||||
<emphasis>extremely</emphasis> important for most of a processor's
|
||||
memory accesses to be able to come from the L1 cache, because the
|
||||
L1 cache operates at the processor frequency. The moment you have
|
||||
an L1 cache miss and have to go to the L2 cache or to main memory,
|
||||
the processor will stall and potentially sit twiddling its fingers
|
||||
for <emphasis>hundreds</emphasis> of instructions worth of time
|
||||
waiting for a read from main memory to complete. Main memory (the
|
||||
dynamic ram you stuff into a computer) is
|
||||
<emphasis>slow</emphasis>, when compared to the speed of a modern
|
||||
processor core.</para>
|
||||
|
||||
<para>Ok, so now onto page coloring: All modern memory caches are
|
||||
what are known as <emphasis>physical</emphasis> caches. They
|
||||
cache physical memory addresses, not virtual memory addresses.
|
||||
This allows the cache to be left alone across a process context
|
||||
switch, which is very important.</para>
|
||||
|
||||
<para>But in the UNIX world you are dealing with virtual address
|
||||
spaces, not physical address spaces. Any program you write will
|
||||
see the virtual address space given to it. The actual
|
||||
<emphasis>physical</emphasis> pages underlying that virtual
|
||||
address space are not necessarily physically contiguous! In fact,
|
||||
you might have two pages that are side by side in a process's
|
||||
address space which wind up being at offset 0 and offset 128K in
|
||||
<emphasis>physical</emphasis> memory.</para>
|
||||
|
||||
<para>A program normally assumes that two side-by-side pages will be
|
||||
optimally cached. That is, that you can access data objects in
|
||||
both pages without having them blow away each other's cache entry.
|
||||
But this is only true if the physical pages underlying the virtual
|
||||
address space are contiguous (insofar as the cache is
|
||||
concerned).</para>
|
||||
|
||||
<para>This is what Page coloring does. Instead of assigning
|
||||
<emphasis>random</emphasis> physical pages to virtual addresses,
|
||||
which may result in non-optimal cache performance, page coloring
|
||||
assigns <emphasis>reasonably-contiguous</emphasis> physical pages
|
||||
to virtual addresses. Thus programs can be written under the
|
||||
assumption that the characteristics of the underlying hardware
|
||||
cache are the same for their virtual address space as they would
|
||||
be if the program had been run directly in a physical address
|
||||
space.</para>
|
||||
|
||||
<para>Note that I say ‘reasonably’ contiguous rather
|
||||
than simply ‘contiguous’. From the point of view of a
|
||||
128K direct mapped cache, the physical address 0 is the same as
|
||||
the physical address 128K. So two side-by-side pages in your
|
||||
virtual address space may wind up being offset 128K and offset
|
||||
132K in physical memory, but could also easily be offset 128K and
|
||||
offset 4K in physical memory and still retain the same cache
|
||||
performance characteristics. So page-coloring does
|
||||
<emphasis>not</emphasis> have to assign truly contiguous pages of
|
||||
physical memory to contiguous pages of virtual memory, it just
|
||||
needs to make sure it assigns contiguous pages from the point of
|
||||
view of cache performance and operation.</para>
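<para>The "reasonably contiguous" test can be written down directly. This sketch assumes the 128K direct-mapped cache from the example and 4K pages (implied by the 128K / 132K offsets); the function and macro names are hypothetical. It checks whether a run of physical pages backing consecutive virtual pages lands on consecutive page-sized slots of the cache, wrapping around at the cache size.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_SIZE (128u * 1024u) /* direct-mapped cache, per text */
#define SKETCH_PAGE_SIZE  (4u * 1024u)   /* assumed 4K pages */

/* phys[i] is the physical byte offset of the page backing virtual page i.
 * Returns 1 if each page falls in the cache slot immediately after the
 * previous page's slot (modulo the cache size), otherwise 0. */
static int cache_contiguous(const uint32_t *phys, size_t n)
{
    const uint32_t nslots = SKETCH_CACHE_SIZE / SKETCH_PAGE_SIZE;
    size_t i;

    assert(phys != NULL || n == 0);
    for (i = 1; i < n; i++) {
        uint32_t prev = (phys[i - 1] % SKETCH_CACHE_SIZE) / SKETCH_PAGE_SIZE;
        uint32_t cur = (phys[i] % SKETCH_CACHE_SIZE) / SKETCH_PAGE_SIZE;

        if (cur != (prev + 1) % nslots)
            return 0;
    }
    return 1;
}
```

<para>Both the pair (128K, 132K) and the pair (128K, 4K) pass the check, while the pair (0, 128K) fails because both pages fall on the same cache slot.</para>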
|
||||
</answer>
|
||||
</qandaentry>
|
||||
</qandaset>
|
||||
</sect1>
|
||||
</article>
|
104
en_US.ISO_8859-1/articles/vm-design/fig1.eps
Normal file
|
@ -0,0 +1,104 @@
|
|||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: fig1.eps
|
||||
%%Creator: fig2dev Version 3.2.3 Patchlevel
|
||||
%%CreationDate: Sun Oct 8 19:54:25 2000
|
||||
%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
|
||||
%%BoundingBox: 0 0 119 65
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
newpath 0 65 moveto 0 0 lineto 119 0 lineto 119 65 lineto closepath clip newpath
|
||||
-143.0 298.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
|
||||
$F2psBegin
|
||||
%%Page: 1 1
|
||||
10 setmiterlimit
|
||||
0.06000 0.06000 sc
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 2400 4200 m 4050 4200 l 4050 4950 l 2400 4950 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4050 4200 m
|
||||
4350 3900 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2400 4200 m 2700 3900 l 4350 3900 l 4350 4650 l
|
||||
4050 4950 l gs col0 s gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3225 4650 m
|
||||
gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
$F2psEnd
|
||||
rs
|
115
en_US.ISO_8859-1/articles/vm-design/fig2.eps
Normal file
|
@ -0,0 +1,115 @@
|
|||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: fig2.eps
|
||||
%%Creator: fig2dev Version 3.2.3 Patchlevel
|
||||
%%CreationDate: Sun Oct 8 19:55:31 2000
|
||||
%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
|
||||
%%BoundingBox: 0 0 120 110
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
newpath 0 110 moveto 0 0 lineto 120 0 lineto 120 110 lineto closepath clip newpath
|
||||
-174.0 370.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
|
||||
$F2psBegin
|
||||
%%Page: 1 1
|
||||
10 setmiterlimit
|
||||
0.06000 0.06000 sc
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5100 m
|
||||
gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 4871 5100 m 4879 5100 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4575 4650 m
|
||||
4875 4350 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 3225 4350 l 4875 4350 l 4875 5100 l
|
||||
4575 5400 l gs col0 s gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5850 m
|
||||
gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
n 4875 5100 m 4875 5850 l
|
||||
4575 6150 l gs col0 s gr
|
||||
$F2psEnd
|
||||
rs
|
133
en_US.ISO_8859-1/articles/vm-design/fig3.eps
Normal file
|
@ -0,0 +1,133 @@
|
|||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: fig3.eps
|
||||
%%Creator: fig2dev Version 3.2.3 Patchlevel
|
||||
%%CreationDate: Sun Oct 8 19:53:51 2000
|
||||
%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
|
||||
%%BoundingBox: 0 0 120 155
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
newpath 0 155 moveto 0 0 lineto 120 0 lineto 120 155 lineto closepath clip newpath
|
||||
-174.0 370.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
|
||||
$F2psBegin
|
||||
%%Page: 1 1
|
||||
10 setmiterlimit
|
||||
0.06000 0.06000 sc
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
4125 4350 m
|
||||
gs 1 -1 sc (C2) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 4871 5100 m 4879 5100 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4575 4650 m
|
||||
4875 4350 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4875 3600 m 4875 5100 l
|
||||
4575 5400 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 2925 3900 l 3225 3600 l
|
||||
4875 3600 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 3900 m 4425 3900 l 4575 3900 l
|
||||
4875 3600 l gs col0 s gr
|
||||
% Polyline
|
||||
n 4575 4650 m
|
||||
4575 3900 l gs col0 s gr
|
||||
% Polyline
|
||||
n 3750 4650 m 3750 3900 l
|
||||
4050 3600 l gs col0 s gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5850 m
|
||||
gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5100 m
|
||||
gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3375 4350 m
|
||||
gs 1 -1 sc (C1) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
n 4875 5100 m 4875 5850 l
|
||||
4575 6150 l gs col0 s gr
|
||||
$F2psEnd
|
||||
rs
|
133
en_US.ISO_8859-1/articles/vm-design/fig4.eps
Normal file
|
@ -0,0 +1,133 @@
|
|||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: fig4.eps
|
||||
%%Creator: fig2dev Version 3.2.3 Patchlevel
|
||||
%%CreationDate: Sun Oct 8 19:55:53 2000
|
||||
%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
|
||||
%%BoundingBox: 0 0 120 155
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
newpath 0 155 moveto 0 0 lineto 120 0 lineto 120 155 lineto closepath clip newpath
|
||||
-174.0 370.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
|
||||
$F2psBegin
|
||||
%%Page: 1 1
|
||||
10 setmiterlimit
|
||||
0.06000 0.06000 sc
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3375 4350 m
|
||||
gs 1 -1 sc (C1) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 4871 5100 m 4879 5100 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4575 4650 m
|
||||
4875 4350 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4875 4350 m 4875 5100 l
|
||||
4575 5400 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 4650 m 2925 3900 l 3225 3600 l
|
||||
4050 3600 l gs col0 s gr
|
||||
% Polyline
|
||||
n 3750 4650 m 3750 3900 l
|
||||
4050 3600 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2925 3900 m
|
||||
3750 3900 l gs col0 s gr
|
||||
% Polyline
|
||||
n 3750 4650 m 4050 4350 l
|
||||
4875 4350 l gs col0 s gr
|
||||
% Polyline
|
||||
n 4050 4350 m
|
||||
4050 3600 l gs col0 s gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5850 m
|
||||
gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
/Helvetica-Bold ff 180.00 scf sf
|
||||
3750 5100 m
|
||||
gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
n 4875 5100 m 4875 5850 l
|
||||
4575 6150 l gs col0 s gr
|
||||
$F2psEnd
|
||||
rs
|