This is Matt Dillon's VM Design article from DaemonNews. It's here

a) Because it's a cool piece of documentation b) It's a real world example of using images in the documentation. It's not turned on in the upper level Makefile yet, as I expect the specifics of the toolchain to change over the next week or so as people play around with this, and I don't want the doc build mirrors to have to suddenly update the ports they have installed. Once this has stabilised it can be turned on.
svn path=/head/; revision=8116
2000-10-08 19:23:10 +00:00 · 2000-10-08 19:23:10 +00:00 · 68b9d2851a · 2020-12-08 03:00:23 +00:00
commit 68b9d2851a
parent 3197343458
12 changed files with 2678 additions and 0 deletions
--- a/en_US.ISO8859-1/articles/vm-design/Makefile
+++ b/en_US.ISO8859-1/articles/vm-design/Makefile
@@ -0,0 +1,16 @@
+# $FreeBSD: doc/en_US.ISO_8859-1/articles/mh/Makefile,v 1.8 1999/09/06 06:52:37 peter Exp $
+
+DOC?= article
+
+FORMATS?= html
+
+IMAGES=	fig1.eps fig2.eps fig3.eps fig4.eps
+
+INSTALL_COMPRESSED?=gz
+INSTALL_ONLY_COMPRESSED?=
+
+SRCS= article.sgml
+
+DOC_PREFIX?= ${.CURDIR}/../../..
+
+.include "${DOC_PREFIX}/share/mk/doc.project.mk"
--- a/en_US.ISO8859-1/articles/vm-design/article.sgml
+++ b/en_US.ISO8859-1/articles/vm-design/article.sgml
@@ -0,0 +1,838 @@
+<!-- $FreeBSD: doc/en_US.ISO_8859-1/articles/mh/article.sgml,v 1.7 1999/10/10 20:20:38 jhb Exp $ -->
+<!-- FreeBSD Documentation Project -->
+
+<!DOCTYPE ARTICLE PUBLIC "-//FreeBSD//DTD DocBook V3.1-Based Extension//EN" [
+<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
+%man;
+]>
+
+<article>
+  <artheader>
+    <title>Design elements of the FreeBSD VM system</title>
+
+    <authorgroup>
+      <author>
+	<firstname>Matthew</firstname>
+
+	<surname>Dillon</surname>
+
+	<affiliation>
+	  <address>
+	    <email>dillon@apollo.backplane.com</email>
+	  </address>
+	</affiliation>
+      </author>
+    </authorgroup>
+
+    <abstract>
+      <para>The title is really just a fancy way of saying that I am going to
+	attempt to describe the whole VM enchilada, hopefully in a way that
+	everyone can follow.  For the last year I have concentrated on a number
+	of major kernel subsystems within FreeBSD, with the VM and Swap
+	subsystems being the most interesting and NFS being &lsquo;a necessary
+	chore&rsquo;.  I rewrote only small portions of the code.  In the VM
+	arena the only major rewrite I have done is to the swap subsystem.
+	Most of my work was cleanup and maintenance, with only moderate code
+	rewriting and no major algorithmic adjustments within the VM
+	subsystem.  The bulk of the VM subsystem's theoretical base remains
+	unchanged and a lot of the credit for the modernization effort in the
+	last few years belongs to John Dyson and David Greenman.  Not being a
+	historian like Kirk I will not attempt to tag all the various features
+	with people's names, since I will invariably get it wrong.</para>
+    </abstract>
+
+    <legalnotice>
+      <para>This article was originally published in the January 2000 issue of 
+	<ulink url="http://www.daemonnews.org/">DaemonNews</ulink>.  This
+	version of the article may include updates from Matt and other authors
+	to reflect changes in FreeBSD's VM implementation.</para>
+    </legalnotice>
+  </artheader>
+
+  <sect1>
+    <title>Introduction</title>
+
+    <para>Before moving along to the actual design let's spend a little time
+      on the necessity of maintaining and modernizing any long-living
+      codebase.  In the programming world, algorithms tend to be more
+      important than code and it is precisely due to BSD's academic roots that
+      a great deal of attention was paid to algorithm design from the
+      beginning.  More attention paid to the design generally leads to a clean
+      and flexible codebase that can be fairly easily modified, extended, or
+      replaced over time.  While BSD is considered an &lsquo;old&rsquo;
+      operating system by some people, those of us who work on it tend to view
+      it more as a &lsquo;mature&rsquo; codebase which has various components
+      modified, extended, or replaced with modern code.  It has evolved, and
+      FreeBSD is at the bleeding edge no matter how old some of the code might
+      be.  This is an important distinction to make and one that is
+      unfortunately lost to many people.  The biggest error a programmer can
+      make is to not learn from history, and this is precisely the error that
+      many other modern operating systems have made.  NT is the best example
+      of this, and the consequences have been dire.  Linux also makes this
+      mistake to some degree&mdash;enough that we BSD folk can make small
+      jokes about it every once in a while, anyway.  Linux's problem is simply
+      one of a lack of experience and history to compare ideas against, a
+      problem that is easily and rapidly being addressed by the Linux
+      community in the same way it has been addressed in the BSD
+      community&mdash;by continuous code development.  The NT folk, on the
+      other hand, repeatedly make the same mistakes solved by UNIX decades ago
+      and then spend years fixing them. Over and over again.  They have a
+      severe case of &lsquo;not designed here&rsquo; and &lsquo;we are always
+      right because our marketing department says so&rsquo;.  I have little
+      tolerance for anyone who cannot learn from history.</para>
+
+    <para>Much of the apparent complexity of the FreeBSD design, especially in
+      the VM/Swap subsystem, is a direct result of having to solve serious
+      performance issues that occur under various conditions.  These issues
+      are not due to bad algorithmic design but instead arise from
+      environmental factors.  In any direct comparison between platforms,
+      these issues become most apparent when system resources begin to get
+      stressed.  As I describe FreeBSD's VM/Swap subsystem the reader should
+      always keep two points in mind.  First, the most important aspect of
+      performance design is what is known as &ldquo;Optimizing the Critical
+      Path&rdquo;.  It is often the case that performance optimizations add a
+      little bloat to the code in order to make the critical path perform
+      better.  Second, a solid, generalized design outperforms a
+      heavily-optimized design over the long run.  While a generalized design
+      may end up being slower than a heavily-optimized design when they are
+      first implemented, the generalized design tends to be easier to adapt to
+      changing conditions and the heavily-optimized design winds up having to
+      be thrown away.  Any codebase that will survive and be maintainable for
+      years must therefore be designed properly from the beginning even if it
+      costs some performance.  Twenty years ago people were still arguing that
+      programming in assembly was better than programming in a high-level
+      language because it produced code that was ten times as fast.  Today,
+      the fallacy of that argument is obvious&mdash;as are the parallels
+      to algorithmic design and code generalization.</para>
+  </sect1>
+
+  <sect1>
+    <title>VM Objects</title>
+
+    <para>The best way to begin describing the FreeBSD VM system is to look at
+      it from the perspective of a user-level process.  Each user process sees
+      a single, private, contiguous VM address space containing several types
+      of memory objects.  These objects have various characteristics.  Program
+      code and program data are effectively a single memory-mapped file (the
+      binary file being run), but program code is read-only while program data
+      is copy-on-write.  Program BSS is just memory allocated and filled with
+      zeros on demand, called demand zero page fill.  Arbitrary files can be
+      memory-mapped into the address space as well, which is how the shared
+      library mechanism works.  Such mappings can require modifications to
+      remain private to the process making them.  The fork system call adds an
+      entirely new dimension to the VM management problem on top of the
+      complexity already given.</para>
+
+    <para>A program binary data page (which is a basic copy-on-write page)
+      illustrates the complexity.  A program binary contains a preinitialized
+      data section which is initially mapped directly from the program file.
+      When a program is loaded into a process's VM space, this area is
+      initially memory-mapped and backed by the program binary itself,
+      allowing the VM system to free/reuse the page and later load it back in
+      from the binary.  The moment a process modifies this data, however, the
+      VM system must make a private copy of the page for that process.  Since
+      the private copy has been modified, the VM system may no longer free it,
+      because there is no longer any way to restore it later on.</para>
+
+    <para>You will notice immediately that what was originally a simple file
+      mapping has become much more complex.  Data may be modified on a
+      page-by-page basis whereas the file mapping encompasses many pages at
+      once.  The complexity further increases when a process forks.  When a
+      process forks, the result is two processes&mdash;each with its own
+      private address space, including any modifications made by the original
+      process prior to the call to <function>fork()</function>.  It would be
+      silly for the VM system to make a complete copy of the data at the time
+      of the <function>fork()</function> because it is quite possible that at
+      least one of the two processes will only need to read from that page
+      from then on, allowing the original page to continue to be used.  What
+      was a private page is made copy-on-write again, since each process
+      (parent and child) expects its own post-fork modifications to
+      remain private and not affect the other.</para>
+
+    <para>FreeBSD manages all of this with a layered VM Object model.  The
+      original binary program file winds up being the lowest VM Object layer.
+      A copy-on-write layer is pushed on top of that to hold those pages which
+      had to be copied from the original file.  If the program modifies a data
+      page belonging to the original file the VM system takes a fault and
+      makes a copy of the page in the higher layer.  When a process forks,
+      additional VM Object layers are pushed on.  This might make a little
+      more sense with a fairly basic example.  A <function>fork()</function>
+      is a common operation for any *BSD system, so this example will consider
+      a program that starts up, and forks.  When the process starts, the VM
+      system creates an object layer, let's call this A:</para>
+
+    <mediaobject>
+      <imageobject>
+        <imagedata fileref="fig1">
+      </imageobject>
+	
+      <textobject>
+	<literallayout>+---------------+
+|       A       |
++---------------+</literallayout>
+      </textobject>
+
+      <textobject>
+	<phrase>A picture</phrase>
+      </textobject>
+    </mediaobject>
+
+    <para>A represents the file&mdash;pages may be paged in and out of the
+      file's physical media as necessary.  Paging in from the disk is
+      reasonable for a program, but we really don't want to page back out and
+      overwrite the executable.  The VM system therefore creates a second
+      layer, B, that will be physically backed by swap space:</para>
+
+    <mediaobject>
+      <imageobject>
+        <imagedata fileref="fig2">
+      </imageobject>
+
+      <textobject>
+	<literallayout>+---------------+
+|       B       |	  
++---------------+
+|       A       |
++---------------+</literallayout>
+      </textobject>
+    </mediaobject>
+
+    <para>On the first write to a page after this, a new page is created in B,
+      and its contents are initialized from A.  All pages in B can be paged in
+      or out to a swap device.  When the program forks, the VM system creates
+      two new object layers&mdash;C1 for the parent, and C2 for the
+      child&mdash;that rest on top of B:</para>
+
+    <mediaobject>
+      <imageobject>
+        <imagedata fileref="fig3">
+      </imageobject>
+      
+      <textobject>
+	<literallayout>+-------+-------+
+|   C1  |   C2  |
++-------+-------+
+|       B       |
++---------------+
+|       A       |
++---------------+</literallayout>
+      </textobject>
+    </mediaobject>
+
+    <para>In this case, let's say a page in B is modified by the original
+      parent process.  The process will take a copy-on-write fault and
+      duplicate the page in C1, leaving the original page in B untouched.
+      Now, let's say the same page in B is modified by the child process.  The
+      process will take a copy-on-write fault and duplicate the page in C2.
+      The original page in B is now completely hidden since both C1 and C2
+      have a copy, and B could theoretically be destroyed (if it does not
+      represent a 'real' file).  However, this sort of optimization is not
+      trivial to make because it is so fine-grained.  FreeBSD does not make
+      this optimization.  Now, suppose (as is often the case) that the child
+      process does an <function>exec()</function>.  Its current address space
+      is usually replaced by a new address space representing a new file.  In
+      this case, the C2 layer is destroyed:</para>
+
+    <mediaobject>
+      <imageobject>
+        <imagedata fileref="fig4">
+      </imageobject>
+
+      <textobject>
+	<literallayout>+-------+
+|   C1  |
++-------+-------+
+|       B       |
++---------------+
+|       A       |
++---------------+</literallayout>
+      </textobject>
+    </mediaobject>
+
+    <para>In this case, the number of children of B drops to one, and all
+      accesses to B now go through C1.  This means that B and C1 can be
+      collapsed together.  Any pages in B that also exist in C1 are deleted
+      from B during the collapse.  Thus, even though the optimization in the
+      previous step could not be made, we can recover the dead pages when
+      either of the processes exits or calls <function>exec()</function>.</para>
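+
+    <para>The layering walked through above can be sketched in a few lines
+      of C.  This is a toy model written for illustration only, not FreeBSD's
+      implementation: <literal>struct toy_object</literal>,
+      <function>toy_shadow()</function>, <function>toy_lookup()</function>
+      and <function>toy_write_fault()</function> are hypothetical names,
+      pages are tiny byte arrays, and swap, locking and reference counting
+      are ignored.  It only shows how a read walks down the stack of layers
+      while a write copies the page into the topmost layer.</para>
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NPAGES    4     /* pages per object in this toy model */
#define PAGE_SIZE 16    /* deliberately tiny page size */

/* Hypothetical, simplified stand-in for one VM Object layer. */
struct toy_object {
    struct toy_object *backing;       /* next layer down, or NULL */
    unsigned char     *page[NPAGES];  /* NULL: page not present in this layer */
};

/* Create a new, empty layer that sits on top of 'backing'. */
static struct toy_object *
toy_shadow(struct toy_object *backing)
{
    struct toy_object *obj = calloc(1, sizeof(*obj));

    assert(obj != NULL);
    obj->backing = backing;
    return obj;
}

/* Read access: walk down the stack until some layer holds the page. */
static const unsigned char *
toy_lookup(const struct toy_object *top, size_t idx)
{
    assert(idx < NPAGES);
    for (; top != NULL; top = top->backing)
        if (top->page[idx] != NULL)
            return top->page[idx];
    return NULL;    /* no layer has it: this would be a zero-fill fault */
}

/* Write access: copy the page into the top layer if it is not there yet. */
static unsigned char *
toy_write_fault(struct toy_object *top, size_t idx)
{
    assert(idx < NPAGES);
    if (top->page[idx] == NULL) {
        const unsigned char *src = toy_lookup(top->backing, idx);
        unsigned char *copy = calloc(1, PAGE_SIZE);

        assert(copy != NULL);
        if (src != NULL)
            memcpy(copy, src, PAGE_SIZE);   /* the copy-on-write step */
        top->page[idx] = copy;
    }
    return top->page[idx];
}
```

+
+    <para>With A holding a page, B stacked on A, and C1 and C2 both stacked
+      on B, a write through C1 lands in C1 only.  The page seen through C2,
+      B and A is untouched, which is the behaviour the fork example relies
+      on.</para>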
+
+    <para>This model creates a number of potential problems.  The first is that
+      you can wind up with a relatively deep stack of layered VM Objects which
+      can cost scanning time and memory when you take a fault.  Deep
+      layering can occur when processes fork and then fork again (either
+      parent or child).  The second problem is that you can wind up with dead,
+      inaccessible pages deep in the stack of VM Objects.  In our last example
+      if both the parent and child processes modify the same page, they both
+      get their own private copies of the page and the original page in B is
+      no longer accessible by anyone.  That page in B can be freed.</para>
+
+    <para>FreeBSD solves the deep layering problem with a special optimization
+      called the &ldquo;All Shadowed Case&rdquo;.  This case occurs if either
+      C1 or C2 takes sufficient COW faults to completely shadow all pages in B.
+      Let's say that C1 achieves this.  C1 can now bypass B entirely, so rather
+      than have C1->B->A and C2->B->A we now have C1->A and C2->B->A.  But
+      look what also happened&mdash;now B has only one reference (C2), so we
+      can collapse B and C2 together.  The end result is that B is deleted
+      entirely and we have C1->A and C2->A.  It is often the case that B will
+      contain a large number of pages and neither C1 nor C2 will be able to
+      completely overshadow it.  If we fork again and create a set of D
+      layers, however, it is much more likely that one of the D layers will
+      eventually be able to completely overshadow the much smaller dataset
+      represented by C1 or C2.  The same optimization will work at any point in
+      the graph and the grand result of this is that even on a heavily forked
+      machine VM Object stacks tend to not get much deeper than 4.  This is
+      true of both the parent and the children and true whether the parent is
+      doing the forking or whether the children cascade forks.</para>
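+
+    <para>The bypass and collapse steps can be modelled with a small amount of
+      bookkeeping.  The sketch below is an assumption-laden toy, not the
+      kernel's code: <literal>struct toy_object</literal>,
+      <literal>shadow_count</literal>, <function>toy_try_bypass()</function>
+      and <function>toy_try_collapse()</function> are hypothetical names, and
+      a boolean per page stands in for the page itself.</para>
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical toy layer: only tracks which pages it holds. */
struct toy_object {
    struct toy_object *backing;          /* next layer down, or NULL */
    int                shadow_count;     /* layers sitting directly on top */
    bool               present[NPAGES];  /* true: this layer has its own copy */
};

/* True if 'obj' has its own copy of every page that 'below' holds. */
static bool
toy_all_shadowed(const struct toy_object *obj, const struct toy_object *below)
{
    for (size_t i = 0; i < NPAGES; i++)
        if (below->present[i] && !obj->present[i])
            return false;
    return true;
}

/* All Shadowed Case: if 'obj' hides everything in its backing layer,
 * point it at the layer underneath instead. */
static bool
toy_try_bypass(struct toy_object *obj)
{
    struct toy_object *skipped = obj->backing;

    assert(obj != NULL);
    if (skipped == NULL || !toy_all_shadowed(obj, skipped))
        return false;
    obj->backing = skipped->backing;
    skipped->shadow_count--;
    if (skipped->backing != NULL)
        skipped->backing->shadow_count++;
    return true;
}

/* Collapse: if 'obj' is the only layer on top of its backing layer,
 * fold that layer's pages into 'obj' and drop the layer. */
static bool
toy_try_collapse(struct toy_object *obj)
{
    struct toy_object *victim = obj->backing;

    assert(obj != NULL);
    if (victim == NULL || victim->shadow_count != 1)
        return false;
    for (size_t i = 0; i < NPAGES; i++)
        if (victim->present[i])
            obj->present[i] = true;  /* inherit unless obj already had a copy */
    obj->backing = victim->backing;  /* obj takes over the victim's reference */
    victim->backing = NULL;
    victim->shadow_count = 0;
    return true;
}
```

+
+    <para>Starting from C1->B->A and C2->B->A, a successful
+      <function>toy_try_bypass()</function> on C1 leaves B with a single
+      layer above it, so <function>toy_try_collapse()</function> on C2 then
+      succeeds and both stacks end up two layers deep.</para>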
+
+    <para>The dead page problem still exists in the case where C1 or C2 do not
+      completely overshadow B.  Due to our other optimizations this case does
+      not represent much of a problem and we simply allow the pages to be
+      dead.  If the system runs low on memory it will swap them out, eating a
+      little swap, but that's it.</para>
+
+    <para>The advantage to the VM Object model is that
+      <function>fork()</function> is extremely fast, since no real data
+      copying need take place.  The disadvantage is that you can build a
+      relatively complex VM Object layering that slows page fault handling
+      down a little, and you spend memory managing the VM Object structures.
+      The optimizations FreeBSD makes reduce these problems enough
+      that they can be ignored, leaving no real disadvantage.</para>
+  </sect1>
+
+  <sect1>
+    <title>SWAP Layers</title>
+
+    <para>Private data pages are initially either copy-on-write or zero-fill
+      pages.  When a change, and therefore a copy, is made, the original
+      backing object (usually a file) can no longer be used to save a copy of
+      the page when the VM system needs to reuse it for other purposes.  This
+      is where SWAP comes in.  SWAP is allocated to create backing store for
+      memory that does not otherwise have it.  FreeBSD allocates the swap
+      management structure for a VM Object only when it is actually needed.
+      However, the swap management structure has had problems
+      historically.</para>
+
+    <para>Under FreeBSD 3.x the swap management structure preallocates an
+      array that encompasses the entire object requiring swap backing
+      store&mdash;even if only a few pages of that object are swap-backed.
+      This creates a kernel memory fragmentation problem when large objects
+      are mapped, or processes with large runsizes (RSS) fork.  Also, in order
+      to keep track of swap space, a &lsquo;list of holes&rsquo; is kept in
+      kernel memory, and this tends to get severely fragmented as well.  Since
+      the 'list of holes' is a linear list, the swap allocation and freeing
+      performance is a non-optimal O(n)-per-page.  It also requires kernel
+      memory allocations to take place during the swap freeing process, and
+      that creates low memory deadlock problems.  The problem is further
+      exacerbated by holes created due to the interleaving algorithm.  Also,
+      the swap block map can become fragmented fairly easily resulting in
+      non-contiguous allocations. Kernel memory must also be allocated on the
+      fly for additional swap management structures when a swapout occurs.  It
+      is evident that there was plenty of room for improvement.</para>
+
+    <para>For FreeBSD 4.x, I completely rewrote the swap subsystem.  With this
+      rewrite, swap management structures are allocated through a hash table
+      rather than a linear array, giving them a fixed allocation size and much
+      finer granularity.  Rather than using a linearly linked list to keep
+      track of swap space reservations, it now uses a bitmap of swap blocks
+      arranged in a radix tree structure with free-space hinting in the radix
+      node structures.  This effectively makes swap allocation and freeing an
+      O(1) operation.  The entire radix tree bitmap is also preallocated in
+      order to avoid having to allocate kernel memory during critical low
+      memory swapping operations.  After all, the system tends to swap when it
+      is low on memory so we should avoid allocating kernel memory at such
+      times in order to avoid potential deadlocks.  Finally, to reduce
+      fragmentation the radix tree is capable of allocating large contiguous
+      chunks at once, skipping over smaller fragmented chunks.  I did not take
+      the final step of having an 'allocating hint pointer' that would trundle
+      through a portion of swap as allocations were made in order to further
+      guarantee contiguous allocations or at least locality of reference, but
+      I ensured that such an addition could be made.</para>
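+
+    <para>To make the idea of a preallocated bitmap with free-space hints
+      more concrete, here is a deliberately small sketch.  It is not the
+      FreeBSD 4.x allocator: a single flat array of per-leaf free counts
+      stands in for the hints kept in radix tree nodes, an allocation may
+      not span two leaves, and every name
+      (<literal>struct toy_swapmap</literal>,
+      <function>toy_swap_alloc()</function>,
+      <function>toy_swap_free()</function>) is hypothetical.  Note that
+      nothing is allocated from the heap once the map exists, which is the
+      property the text says matters when memory is low.</para>
+

```c
#include <assert.h>
#include <stdint.h>

#define LEAF_BITS 32     /* swap blocks tracked per leaf bitmap */
#define NLEAVES   8      /* leaves in this toy map */
#define NO_BLOCK  (-1)

/* Hypothetical toy map: one bit per swap block, 1 = free. */
struct toy_swapmap {
    uint32_t leaf[NLEAVES];   /* preallocated bitmaps */
    int      nfree[NLEAVES];  /* hint: free blocks remaining in each leaf */
};

/* Bit mask with the low 'count' bits set (1 <= count <= LEAF_BITS). */
static uint32_t
toy_run_mask(int count)
{
    return (count == LEAF_BITS) ? UINT32_MAX : ((UINT32_C(1) << count) - 1);
}

static void
toy_swap_init(struct toy_swapmap *m)
{
    for (int i = 0; i < NLEAVES; i++) {
        m->leaf[i] = UINT32_MAX;
        m->nfree[i] = LEAF_BITS;
    }
}

/* Allocate 'count' contiguous blocks inside one leaf; return first block. */
static int
toy_swap_alloc(struct toy_swapmap *m, int count)
{
    assert(count >= 1 && count <= LEAF_BITS);
    uint32_t run = toy_run_mask(count);

    for (int i = 0; i < NLEAVES; i++) {
        if (m->nfree[i] < count)
            continue;    /* hint: this leaf cannot possibly satisfy us */
        for (int bit = 0; bit + count <= LEAF_BITS; bit++) {
            uint32_t mask = run << bit;

            if ((m->leaf[i] & mask) == mask) {
                m->leaf[i] &= ~mask;
                m->nfree[i] -= count;
                return i * LEAF_BITS + bit;
            }
        }
    }
    return NO_BLOCK;
}

/* Return 'count' blocks starting at 'block' to the map. */
static void
toy_swap_free(struct toy_swapmap *m, int block, int count)
{
    assert(block >= 0 && block < NLEAVES * LEAF_BITS);
    int i = block / LEAF_BITS;
    int bit = block % LEAF_BITS;

    assert(count >= 1 && bit + count <= LEAF_BITS);
    uint32_t mask = toy_run_mask(count) << bit;

    assert((m->leaf[i] & mask) == 0);   /* blocks must currently be in use */
    m->leaf[i] |= mask;
    m->nfree[i] += count;
}
```

+
+    <para>A request for a large run skips any leaf whose hint shows too few
+      free blocks without inspecting its bitmap, which is the effect the
+      hinting is meant to achieve.</para>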
+  </sect1>
+
+  <sect1>
+    <title>When to free a page</title>
+
+    <para>Since the VM system uses all available memory for disk caching,
+      there are usually very few truly-free pages.  The VM system depends on
+      being able to properly choose pages which are not in use to reuse for
+      new allocations.  Selecting the optimal pages to free is possibly the
+      single most important function any VM system can perform because if it
+      makes a poor selection, the VM system may be forced to unnecessarily
+      retrieve pages from disk, seriously degrading system performance.</para>
+
+    <para>How much overhead are we willing to suffer in the critical path to
+      avoid freeing the wrong page?  Each wrong choice we make will cost us
+      hundreds of thousands of CPU cycles and a noticeable stall of the
+      affected processes, so we are willing to endure a significant amount of
+      overhead in order to be sure that the right page is chosen.  This is why
+      FreeBSD tends to outperform other systems when memory resources become
+      stressed.</para>
+
+    <para>The free page determination algorithm is built upon a history of the
+      use of memory pages.  To acquire this history, the system takes advantage
+      of a page-used bit feature that most hardware page tables have.</para>
+
+    <para>In any case, the page-used bit is cleared and at some later point
+      the VM system comes across the page again and sees that the page-used
+      bit has been set.  This indicates that the page is still being actively
+      used.  If the bit is still clear it is an indication that the page is not
+      being actively used.  By testing this bit periodically, a use history (in
+      the form of a counter) for the physical page is developed.  When the VM
+      system later needs to free up some pages, checking this history becomes
+      the cornerstone of determining the best candidate page to reuse.</para>
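+
+    <para>One way to picture the use history is as a small counter that a
+      periodic scan raises when it finds the page-used bit set and lowers
+      when it finds the bit clear.  The sketch below assumes exactly that
+      scheme.  The constants and the names
+      <literal>struct toy_page</literal> and
+      <function>toy_scan_page()</function> are illustrative assumptions, not
+      values or interfaces taken from the kernel.</para>
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ACT_MAX     64   /* assumed cap on the use counter */
#define ACT_ADVANCE 3    /* assumed reward when the page was referenced */
#define ACT_DECLINE 1    /* assumed penalty when the page was idle */

/* Hypothetical per-page state. */
struct toy_page {
    bool referenced;   /* stands in for the hardware page-used bit */
    int  act_count;    /* use history built up by the scanner */
};

/* One scan of one page: sample the bit, update the history, clear the bit.
 * Returns true when the page looks idle enough to be a reuse candidate. */
static bool
toy_scan_page(struct toy_page *p)
{
    assert(p != NULL);
    if (p->referenced) {
        p->act_count += ACT_ADVANCE;
        if (p->act_count > ACT_MAX)
            p->act_count = ACT_MAX;
    } else if (p->act_count > 0) {
        p->act_count -= ACT_DECLINE;
    }
    p->referenced = false;   /* so the next scan sees only new accesses */
    return p->act_count == 0;
}
```

+
+    <para>A page that is touched between scans keeps a high counter, while a
+      page that goes untouched for several scans in a row drifts down to
+      zero and becomes a candidate for reuse.</para>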
+
+    <sidebar>
+      <title>What if the hardware has no page-used bit?</title>
+
+      <para>For those platforms that do not have this feature, the system
+	actually emulates a page-used bit.  It unmaps or protects a page,
+	forcing a page fault if the page is accessed again.  When the page
+	fault is taken, the system simply marks the page as having been used
+	and unprotects the page so that it may be used.  While taking such page
+	faults just to determine if a page is being used appears to be an
+	expensive proposition, it is much less expensive than reusing the page
+	for some other purpose only to find that a process needs it back and
+	then have to go to disk.</para>
+    </sidebar>
+
+    <para>FreeBSD makes use of several page queues to further refine the
+      selection of pages to reuse as well as to determine when dirty pages
+      must be flushed to their backing store.  Since page tables are dynamic
+      entities under FreeBSD, it costs virtually nothing to unmap a page from
+      the address space of any processes using it.  When a page candidate has
+      been chosen based on the page-use counter, this is precisely what is
+      done.  The system must make a distinction between clean pages which can
+      theoretically be freed up at any time, and dirty pages which must first
+      be written to their backing store before being reusable.  When a page
+      candidate has been found it is moved to the inactive queue if it is
+      dirty, or the cache queue if it is clean.  A separate algorithm based on
+      the dirty-to-clean page ratio determines when dirty pages in the
+      inactive queue must be flushed to disk.  Once this is accomplished, the
+      flushed pages are moved from the inactive queue to the cache queue.  At
+      this point, pages in the cache queue can still be reactivated by a VM
+      fault at relatively low cost.  However, pages in the cache queue are
+      considered to be &lsquo;immediately freeable&rsquo; and will be reused
+      in an LRU (least-recently used) fashion when the system needs to
+      allocate new memory.</para>
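+
+    <para>The queue transitions just described form a small state machine,
+      sketched below.  The enum values and function names are hypothetical,
+      and the real queues are lists of pages rather than a tag on each
+      page.  The sketch follows only the rules stated in the paragraph
+      above.</para>
+

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tags for the queues named in the text. */
enum toy_queue { Q_ACTIVE, Q_INACTIVE, Q_CACHE, Q_FREE };

struct toy_page {
    enum toy_queue queue;
    bool           dirty;   /* true: must be written out before reuse */
};

/* A reuse candidate leaves the active queue: dirty pages go to the
 * inactive queue, clean pages go straight to the cache queue. */
static void
toy_deactivate(struct toy_page *p)
{
    assert(p->queue == Q_ACTIVE);
    p->queue = p->dirty ? Q_INACTIVE : Q_CACHE;
}

/* Flushing a dirty inactive page to backing store makes it clean. */
static void
toy_flush(struct toy_page *p)
{
    assert(p->queue == Q_INACTIVE && p->dirty);
    p->dirty = false;
    p->queue = Q_CACHE;
}

/* A VM fault can still rescue a page from the inactive or cache queue. */
static bool
toy_reactivate(struct toy_page *p)
{
    if (p->queue != Q_INACTIVE && p->queue != Q_CACHE)
        return false;
    p->queue = Q_ACTIVE;
    return true;
}

/* Only cache-queue pages are immediately freeable. */
static bool
toy_reclaim(struct toy_page *p)
{
    if (p->queue != Q_CACHE)
        return false;
    p->queue = Q_FREE;
    return true;
}
```

+
+    <para>In this model a dirty page can never be reclaimed directly.  It
+      has to pass through <function>toy_flush()</function> first, which
+      mirrors the requirement that dirty pages be written to their backing
+      store before being reused.</para>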
+
+    <para>It is important to note that the FreeBSD VM system attempts to
+      separate clean and dirty pages for the express reason of avoiding
+      unnecessary flushes of dirty pages (which eat I/O bandwidth), and it
+      does not move pages between the various page queues gratuitously when the
+      memory subsystem is not being stressed.  This is why you will see some
+      systems with very low cache queue counts and high active queue counts
+      when doing a <command>systat -vm</command> command.  As the VM system
+      becomes more stressed, it makes a greater effort to maintain the various
+      page queues at the levels determined to be the most effective.  An urban
+      myth has circulated for years that Linux did a better job avoiding
+      swapouts than FreeBSD, but this in fact is not true.  What was actually
+      occurring was that FreeBSD was proactively paging out unused pages in
+      order to make room for more disk cache while Linux was keeping unused
+      pages in core and leaving less memory available for cache and process
+      pages.  I don't know whether this is still true today.</para>
+  </sect1>
+
+  <sect1>
+    <title>Pre-Faulting and Zeroing Optimizations</title>
+
+    <para>Taking a VM fault is not expensive if the underlying page is already
+      in core and can simply be mapped into the process, but it can become
+      expensive if you take a whole lot of them on a regular basis.  A good
+      example of this is running a program such as &man.ls.1; or &man.ps.1;
+      over and over again.  If the program binary is mapped into memory but
+      not mapped into the page table, then all the pages that will be accessed
+      by the program will have to be faulted in every time the program is run.
+      This is unnecessary when the pages in question are already in the VM
+      Cache, so FreeBSD will attempt to pre-populate a process's page tables
+      with those pages that are already in the VM Cache.  One thing that
+      FreeBSD does not yet do is pre-copy-on-write certain pages on exec.  For
+      example, if you run the &man.ls.1; program while running <command>vmstat
+	1</command> you will notice that it always takes a certain number of
+      page faults, even when you run it over and over again.  These are
+      zero-fill faults, not program code faults (which were pre-faulted in
+      already).  Pre-copying pages on exec or fork is an area that could use
+      more study.</para>
+
+    <para>A large percentage of page faults that occur are zero-fill faults.
+      You can usually see this by observing the <command>vmstat -s</command>
+      output.  These occur when a process accesses pages in its BSS area.  The
+      BSS area is expected to be initially zero but the VM system does not
+      bother to allocate any memory at all until the process actually accesses
+      it.  When a fault occurs the VM system must not only allocate a new page,
+      it must zero it as well.  To optimize the zeroing operation the VM system
+      has the ability to pre-zero pages and mark them as such, and to request
+      pre-zeroed pages when zero-fill faults occur.  The pre-zeroing occurs
+      whenever the CPU is idle but the number of pages the system pre-zeros is
+      limited in order to avoid blowing away the memory caches.  This is an
+      excellent example of adding complexity to the VM system in order to
+      optimize the critical path.</para>
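+
+    <para>The pre-zeroing idea can be shown with a toy page pool.  This is a
+      sketch under stated assumptions rather than kernel code: the pool
+      size, the <literal>ZERO_LIMIT</literal> cap, and the names
+      <function>toy_idle_zero()</function> and
+      <function>toy_alloc_zeroed()</function> are all invented for the
+      example.  It shows the two halves of the optimization: idle-time work
+      that zeroes a bounded number of free pages, and a zero-fill path that
+      prefers those pages and zeroes on demand only when none are
+      ready.</para>
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE  64   /* deliberately tiny page size */
#define NPOOL      4    /* pages in the toy pool */
#define ZERO_LIMIT 2    /* assumed cap on pages kept pre-zeroed */

/* Hypothetical page descriptor with its data inline. */
struct toy_page {
    unsigned char data[PAGE_SIZE];
    bool          in_use;
    bool          zeroed;   /* true: known to contain only zeros */
};

static struct toy_page toy_pool[NPOOL];

/* Idle-time work: zero free pages until ZERO_LIMIT of them are ready.
 * Returns how many pages were zeroed by this call. */
static int
toy_idle_zero(void)
{
    int ready = 0, done = 0;

    for (size_t i = 0; i < NPOOL; i++)
        if (!toy_pool[i].in_use && toy_pool[i].zeroed)
            ready++;
    for (size_t i = 0; i < NPOOL && ready < ZERO_LIMIT; i++) {
        struct toy_page *p = &toy_pool[i];

        if (p->in_use || p->zeroed)
            continue;
        memset(p->data, 0, PAGE_SIZE);
        p->zeroed = true;
        ready++;
        done++;
    }
    return done;
}

/* Zero-fill fault: prefer a pre-zeroed page, else zero one right now. */
static struct toy_page *
toy_alloc_zeroed(bool *was_prezeroed)
{
    struct toy_page *fallback = NULL;

    assert(was_prezeroed != NULL);
    for (size_t i = 0; i < NPOOL; i++) {
        struct toy_page *p = &toy_pool[i];

        if (p->in_use)
            continue;
        if (p->zeroed) {
            p->in_use = true;
            p->zeroed = false;   /* the new owner may now dirty it */
            *was_prezeroed = true;
            return p;
        }
        if (fallback == NULL)
            fallback = p;
    }
    if (fallback != NULL) {
        memset(fallback->data, 0, PAGE_SIZE);   /* cost paid at fault time */
        fallback->in_use = true;
        *was_prezeroed = false;
    }
    return fallback;    /* NULL when the pool is exhausted */
}
```

+
+    <para>The fault path is shortest when
+      <function>toy_idle_zero()</function> has already run, and the cap keeps
+      the idle loop from touching more memory than is likely to be
+      needed.</para>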
+  </sect1>
+
+  <sect1>
+    <title>Page Table Optimizations</title>
+
+    <para>The page table optimizations make up the most contentious part of
+      the FreeBSD VM design and they have shown some strain with the advent of
+      serious use of <function>mmap()</function>.  I think this is actually a
+      feature of most BSDs though I am not sure when it was first introduced.
+      There are two major optimizations.  The first is that hardware page
+      tables do not contain persistent state but instead can be thrown away at
+      any time with only a minor amount of management overhead.  The second is
+      that every active page table entry in the system has a governing
+      <literal>pv_entry</literal> structure which is tied into the
+      <literal>vm_page</literal> structure.  FreeBSD can simply iterate
+      through those mappings that are known to exist while Linux must check
+      all page tables that <emphasis>might</emphasis> contain a specific
+      mapping to see if it does, which can incur O(n^2) overhead in certain
+      situations.  It is because of this that FreeBSD tends to make better
+      choices on which pages to reuse or swap when memory is stressed, giving
+      it better performance under load. However, FreeBSD requires kernel
+      tuning to accommodate large-shared-address-space situations such as
+      those that can occur in a news system because it may run out of
+      <literal>pv_entry</literal> structures.</para>
+
+    <para>Both Linux and FreeBSD need work in this area.  FreeBSD is trying to
+      maximize the advantage of a potentially sparse active-mapping model (not
+      all processes need to map all pages of a shared library, for example),
+      whereas Linux is trying to simplify its algorithms.  FreeBSD generally
+      has the performance advantage here at the cost of wasting a little extra
+      memory, but FreeBSD breaks down in the case where a large file is
+      massively shared across hundreds of processes.  Linux, on the other hand,
+      breaks down in the case where many processes are sparsely-mapping the
+      same shared library and also runs non-optimally when trying to determine
+      whether a page can be reused or not.</para>
+  </sect1>
+
+  <sect1>
+    <title>Page Coloring</title>
+
+    <para>We'll end with the page coloring optimizations.  Page coloring is a
+      performance optimization designed to ensure that accesses to contiguous
+      pages in virtual memory make the best use of the processor cache.  In
+      ancient times (i.e. 10+ years ago) processor caches tended to map
+      virtual memory rather than physical memory.  This led to a huge number of
+      problems including having to clear the cache on every context switch in
+      some cases, and problems with data aliasing in the cache.  Modern
+      processor caches map physical memory precisely to solve those problems.
+      This means that two side-by-side pages in a process's address space may
+      not correspond to two side-by-side pages in the cache.  In fact, if you
+      aren't careful side-by-side pages in virtual memory could wind up using
+      the same page in the processor cache&mdash;leading to cacheable data
+      being thrown away prematurely and reducing CPU performance.  This is true
+      even with multi-way set-associative caches (though the effect is
+      mitigated somewhat).</para>
+
+    <para>FreeBSD's memory allocation code implements page coloring
+      optimizations, which means that the memory allocation code will attempt
+      to locate free pages that are contiguous from the point of view of the
+      cache.  For example, if page 16 of physical memory is assigned to page 0
+      of a process's virtual memory and the cache can hold 4 pages, the page
+      coloring code will not assign page 20 of physical memory to page 1 of a
+      process's virtual memory.  It would, instead, assign page 21 of physical
+      memory.  The page coloring code attempts to avoid assigning page 20
+      because this maps over the same cache memory as page 16 and would result
+      in non-optimal caching.  This code adds a significant amount of
+      complexity to the VM memory allocation subsystem as you can well
+      imagine, but the result is well worth the effort.  Page coloring makes VM
+      memory as deterministic as physical memory with regard to cache
+      performance.</para>
+  </sect1>
+
+  <sect1>
+    <title>Conclusion</title>
+
+    <para>Virtual memory in modern operating systems must address a number of
+      different issues efficiently and for many different usage patterns.  The
+      modular and algorithmic approach that BSD has historically taken allows
+      us to study and understand the current implementation as well as
+      relatively cleanly replace large sections of the code.  There have been a
+      number of improvements to the FreeBSD VM system in the last several
+      years, and work is ongoing.</para>
+  </sect1>
+
+  <sect1>
+    <title>Bonus QA session by Allen Briggs
+      <email>briggs@ninthwonder.com</email></title>
+
+    <qandaset>
+      <qandaentry>
+	<question>
+	  <para>What is &ldquo;the interleaving algorithm&rdquo; that you
+	    refer to in your listing of the ills of the FreeBSD 3.x swap
+	    arrangements?</para>
+	</question>
+
+	<answer>
+	  <para>FreeBSD uses a fixed swap interleave which defaults to 4.  This
+	    means that FreeBSD reserves space for four swap areas even if you
+	    only have one, two, or three.  Since swap is interleaved, the linear
+	    address space representing the &lsquo;four swap areas&rsquo; will be
+	    fragmented if you don't actually have four swap areas.  For
+	    example, if you have two swap areas, A and B, FreeBSD's address
+	    space representation for that swap area will be interleaved in
+	    blocks of 16 pages:</para>
+
+	  <literallayout>A B C D A B C D A B C D A B C D</literallayout>
+
+	  <para>FreeBSD 3.x uses a &lsquo;sequential list of free
+	    regions&rsquo; approach to accounting for the free swap areas.
+	    The idea is that large blocks of free linear space can be
+	    represented with a single list node
+	    (<filename>kern/subr_rlist.c</filename>).  But due to the
+	    fragmentation the sequential list winds up being insanely
+	    fragmented.  In the above example, completely unused swap will
+	    have A and B shown as &lsquo;free&rsquo; and C and D shown as
+	    &lsquo;all allocated&rsquo;.  Each A-B sequence requires a list
+	    node to account for because C and D are holes, so the list node
+	    cannot be combined with the next A-B sequence.</para>
+
+	  <para>Why do we interleave our swap space instead of just tacking swap
+	    areas onto the end and doing something fancier?  Because it's a whole
+	    lot easier to allocate linear swaths of an address space and have
+	    the result automatically be interleaved across multiple disks than
+	    it is to try to put that sophistication elsewhere.</para>
+
+	  <para>The fragmentation causes other problems.  Being a linear list
+	    under 3.x, and having such a huge amount of inherent
+	    fragmentation, allocating and freeing swap winds up being an O(N)
+	    algorithm instead of an O(1) algorithm.  Combine that with other
+	    factors (heavy swapping) and you start getting into O(N^2) and
+	    O(N^3) levels of overhead, which is bad.  The 3.x system may also
+	    need to allocate KVM during a swap operation to create a new list
+	    node which can lead to a deadlock if the system is trying to
+	    pageout pages in a low-memory situation.</para>
+
+	  <para>Under 4.x we do not use a sequential list.  Instead we use a
+	    radix tree and bitmaps of swap blocks rather than ranged list
+	    nodes.  We take the hit of preallocating all the bitmaps required
+	    for the entire swap area up front but it winds up wasting less
+	    memory due to the use of a bitmap (one bit per block) instead of a
+	    linked list of nodes.  The use of a radix tree instead of a
+	    sequential list gives us nearly O(1) performance no matter how
+	    fragmented the tree becomes.</para>
+	</answer>
+      </qandaentry>
+
+      <qandaentry>
+	<question>
+	  <para>I don't get the following:</para>
+
+	  <blockquote>
+	    <para>It is important to note that the FreeBSD VM system attempts
+	      to separate clean and dirty pages for the express reason of
+	      avoiding unnecessary flushes of dirty pages (which eats I/O
+	      bandwidth), nor does it move pages between the various page
+	      queues gratuitously when the memory subsystem is not being
+	      stressed.  This is why you will see some systems with very low
+	      cache queue counts and high active queue counts when doing a
+	      <command>systat -vm</command> command.</para>
+	  </blockquote>
+	  
+	  <para>How is the separation of clean and dirty (inactive) pages
+	    related to the situation where you see low cache queue counts and
+	    high active queue counts in <command>systat -vm</command>?  Do the
+	    systat stats roll the active and dirty pages together for the
+	    active queue count?</para>
+	</question>
+
+	<answer>
+	  <para>Yes, that is confusing.  The relationship is
+	    &ldquo;goal&rdquo; versus &ldquo;reality&rdquo;.  Our goal is to
+	    separate the pages but the reality is that if we are not in a
+	    memory crunch, we don't really have to.</para>
+
+	  <para>What this means is that FreeBSD will not try very hard to
+	    separate out dirty pages (inactive queue) from clean pages (cache
+	    queue) when the system is not being stressed, nor will it try to
+	    deactivate pages (active queue -> inactive queue) when the system
+	    is not being stressed, even if they aren't being used.</para>
+	</answer>
+      </qandaentry>
+
+      <qandaentry>
+	<question>
+	  <para>In the &man.ls.1; / <command>vmstat 1</command> example,
+	    wouldn't some of the page faults be data page faults (COW from
+	    executable file to private page)?  I.e., I would expect the page
+	    faults to be some zero-fill and some program data.  Or are you
+	    implying that FreeBSD does do pre-COW for the program data?</para>
+	</question>
+
+	<answer>
+	  <para>A COW fault can be either zero-fill or program-data.  The
+	    mechanism is the same either way because the backing program-data
+	    is almost certainly already in the cache.  I am indeed lumping the
+	    two together.  FreeBSD does not pre-COW program data or zero-fill,
+	    but it <emphasis>does</emphasis> pre-map pages that exist in its
+	    cache.</para>
+	</answer>
+      </qandaentry>
+
+      <qandaentry>
+	<question>
+	  <para>In your section on page table optimizations, can you give a
+	    little more detail about <literal>pv_entry</literal> and
+	    <literal>vm_page</literal> (or should vm_page be
+	    <literal>vm_pmap</literal>&mdash;as in 4.4, cf. pp. 180-181 of
+	    McKusick, Bostic, Karels, Quarterman)?  Specifically, what kind of
+	    operation/reaction would require scanning the mappings?</para>
+
+	  <para>How does Linux do in the case where FreeBSD breaks down
+	    (sharing a large file mapping over many processes)?</para>
+	</question>
+
+	<answer>
+	  <para>A <literal>vm_page</literal> represents an (object,index#)
+	    tuple.  A <literal>pv_entry</literal> represents a hardware page
+	    table entry (pte).  If you have five processes sharing the same
+	    physical page, and three of those processes' page tables actually
+	    map the page, that page will be represented by a single
+	    <literal>vm_page</literal> structure and three
+	    <literal>pv_entry</literal> structures.</para>
+
+	  <para><literal>pv_entry</literal> structures only represent pages
+	    mapped by the MMU (one <literal>pv_entry</literal> represents one
+	    pte).  This means that when we need to remove all hardware
+	    references to a <literal>vm_page</literal> (in order to reuse the
+	    page for something else, page it out, clear it, dirty it, and so
+	    forth) we can simply scan the linked list of
+	    <literal>pv_entry</literal>'s associated with that
+	    <literal>vm_page</literal> to remove or modify the pte's from
+	    their page tables.</para>
+
+	  <para>Under Linux there is no such linked list.  In order to remove
+	    all the hardware page table mappings for a
+	    <literal>vm_page</literal> Linux must index into every VM object
+	    that <emphasis>might</emphasis> have mapped the page.  For
+	    example, if you have 50 processes all mapping the same shared
+	    library and want to get rid of page X in that library, you need to
+	    index into the page table for each of those 50 processes even if
+	    only 10 of them have actually mapped the page.  So Linux is
+	    trading off the simplicity of its design against performance.
+	    Many VM algorithms which are O(1) or (small N) under FreeBSD wind
+	    up being O(N), O(N^2), or worse under Linux.  Since the pte's
+	    representing a particular page in an object tend to be at the same
+	    offset in all the page tables they are mapped in, reducing the
+	    number of accesses into the page tables at the same pte offset
+	    will often avoid blowing away the L1 cache line for that offset,
+	    which can lead to better performance.</para>
+
+	  <para>FreeBSD has added complexity (the <literal>pv_entry</literal>
+	    scheme) in order to increase performance (to limit page table
+	    accesses to <emphasis>only</emphasis> those pte's that need to be
+	    modified).</para>
+
+	  <para>But FreeBSD has a scaling problem that Linux does not in that
+	    there are a limited number of <literal>pv_entry</literal>
+	    structures and this causes problems when you have massive sharing
+	    of data.  In this case you may run out of
+	    <literal>pv_entry</literal> structures even though there is plenty
+	    of free memory available.  This can be fixed easily enough by
+	    bumping up the number of <literal>pv_entry</literal> structures in
+	    the kernel config, but we really need to find a better way to do
+	    it.</para>
+
+	  <para>With regard to the memory overhead of a page table versus the
+	    <literal>pv_entry</literal> scheme: Linux uses
+	    &lsquo;permanent&rsquo; page tables that are not thrown away, but
+	    does not need a <literal>pv_entry</literal> for each potentially
+	    mapped pte.  FreeBSD uses &lsquo;throw away&rsquo; page tables but
+	    adds in a <literal>pv_entry</literal> structure for each
+	    actually-mapped pte.  I think memory utilization winds up being
+	    about the same, giving FreeBSD an algorithmic advantage with its
+	    ability to throw away page tables at will with very low
+	    overhead.</para>
+	</answer>
+      </qandaentry>
+
+      <qandaentry>
+	<question>
+	  <para>Finally, in the page coloring section, it might help to have a
+	    little more description of what you mean here.  I didn't quite
+	    follow it.</para>
+	</question>
+
+	<answer>
+	  <para>Do you know how an L1 hardware memory cache works?  I'll
+	    explain: Consider a machine with 16MB of main memory but only 128K
+	    of L1 cache.  Generally the way this cache works is that each 128K
+	    block of main memory uses the <emphasis>same</emphasis> 128K of
+	    cache.  If you access offset 0 in main memory and then offset
+	    128K in main memory you can wind up throwing away the
+	    cached data you read from offset 0!</para>
+
+	  <para>Now, I am simplifying things greatly.  What I just described
+	    is what is called a &lsquo;direct mapped&rsquo; hardware memory
+	    cache.  Most modern caches are what are called
+	    2-way-set-associative or 4-way-set-associative caches.  The
+	    set-associativity allows you to access up to N different memory
+	    regions that overlap the same cache memory without destroying the
+	    previously cached data.  But only N.</para>
+
+	  <para>So if I have a 4-way set associative cache I can access offset
+	    0, offset 128K, 256K and offset 384K and still be able to access
+	    offset 0 again and have it come from the L1 cache.  If I then
+	    access offset 512K, however, one of the four previously cached
+	    data objects will be thrown away by the cache.</para>
+
+	  <para>It is extremely important&hellip;
+	    <emphasis>extremely</emphasis> important for most of a processor's
+	    memory accesses to be able to come from the L1 cache, because the
+	    L1 cache operates at the processor frequency.  The moment you have
+	    an L1 cache miss and have to go to the L2 cache or to main memory,
+	    the processor will stall and potentially sit twiddling its thumbs
+	    for <emphasis>hundreds</emphasis> of instructions worth of time
+	    waiting for a read from main memory to complete.  Main memory (the
+	    dynamic ram you stuff into a computer) is
+	    <emphasis>slow</emphasis>, when compared to the speed of a modern
+	    processor core.</para>
+
+	  <para>Ok, so now onto page coloring: All modern memory caches are
+	    what are known as <emphasis>physical</emphasis> caches.  They
+	    cache physical memory addresses, not virtual memory addresses.
+	    This allows the cache to be left alone across a process context
+	    switch, which is very important.</para>
+
+	  <para>But in the UNIX world you are dealing with virtual address
+	    spaces, not physical address spaces.  Any program you write will
+	    see the virtual address space given to it.  The actual
+	    <emphasis>physical</emphasis> pages underlying that virtual
+	    address space are not necessarily physically contiguous!  In fact,
+	    you might have two pages that are side by side in a process's
+	    address space which wind up being at offset 0 and offset 128K in
+	    <emphasis>physical</emphasis> memory.</para>
+
+	  <para>A program normally assumes that two side-by-side pages will be
+	    optimally cached.  That is, that you can access data objects in
+	    both pages without having them blow away each other's cache entry.
+	    But this is only true if the physical pages underlying the virtual
+	    address space are contiguous (insofar as the cache is
+	    concerned).</para>
+
+	  <para>This is what page coloring does.  Instead of assigning
+	    <emphasis>random</emphasis> physical pages to virtual addresses,
+	    which may result in non-optimal cache performance, page coloring
+	    assigns <emphasis>reasonably-contiguous</emphasis> physical pages
+	    to virtual addresses.  Thus programs can be written under the
+	    assumption that the characteristics of the underlying hardware
+	    cache are the same for their virtual address space as they would
+	    be if the program had been run directly in a physical address
+	    space.</para>
+
+	  <para>Note that I say &lsquo;reasonably&rsquo; contiguous rather
+	    than simply &lsquo;contiguous&rsquo;.  From the point of view of a
+	    128K direct mapped cache, the physical address 0 is the same as
+	    the physical address 128K.  So two side-by-side pages in your
+	    virtual address space may wind up being offset 128K and offset
+	    132K in physical memory, but could also easily be offset 128K and
+	    offset 4K in physical memory and still retain the same cache
+	    performance characteristics.  So page-coloring does
+	    <emphasis>not</emphasis> have to assign truly contiguous pages of
+	    physical memory to contiguous pages of virtual memory, it just
+	    needs to make sure it assigns contiguous pages from the point of
+	    view of cache performance and operation.</para>
+	</answer>
+      </qandaentry>
+    </qandaset>
+  </sect1>
+</article>
--- a/en_US.ISO8859-1/articles/vm-design/fig1.eps
+++ b/en_US.ISO8859-1/articles/vm-design/fig1.eps
@ -0,0 +1,104 @@
+%!PS-Adobe-2.0 EPSF-2.0
+%%Title: fig1.eps
+%%Creator: fig2dev Version 3.2.3 Patchlevel 
+%%CreationDate: Sun Oct  8 19:54:25 2000
+%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
+%%BoundingBox: 0 0 119 65
+%%Magnification: 1.0000
+%%EndComments
+/$F2psDict 200 dict def
+$F2psDict begin
+$F2psDict /mtrx matrix put
+/col-1 {0 setgray} bind def
+/col0 {0.000 0.000 0.000 srgb} bind def
+/col1 {0.000 0.000 1.000 srgb} bind def
+/col2 {0.000 1.000 0.000 srgb} bind def
+/col3 {0.000 1.000 1.000 srgb} bind def
+/col4 {1.000 0.000 0.000 srgb} bind def
+/col5 {1.000 0.000 1.000 srgb} bind def
+/col6 {1.000 1.000 0.000 srgb} bind def
+/col7 {1.000 1.000 1.000 srgb} bind def
+/col8 {0.000 0.000 0.560 srgb} bind def
+/col9 {0.000 0.000 0.690 srgb} bind def
+/col10 {0.000 0.000 0.820 srgb} bind def
+/col11 {0.530 0.810 1.000 srgb} bind def
+/col12 {0.000 0.560 0.000 srgb} bind def
+/col13 {0.000 0.690 0.000 srgb} bind def
+/col14 {0.000 0.820 0.000 srgb} bind def
+/col15 {0.000 0.560 0.560 srgb} bind def
+/col16 {0.000 0.690 0.690 srgb} bind def
+/col17 {0.000 0.820 0.820 srgb} bind def
+/col18 {0.560 0.000 0.000 srgb} bind def
+/col19 {0.690 0.000 0.000 srgb} bind def
+/col20 {0.820 0.000 0.000 srgb} bind def
+/col21 {0.560 0.000 0.560 srgb} bind def
+/col22 {0.690 0.000 0.690 srgb} bind def
+/col23 {0.820 0.000 0.820 srgb} bind def
+/col24 {0.500 0.190 0.000 srgb} bind def
+/col25 {0.630 0.250 0.000 srgb} bind def
+/col26 {0.750 0.380 0.000 srgb} bind def
+/col27 {1.000 0.500 0.500 srgb} bind def
+/col28 {1.000 0.630 0.630 srgb} bind def
+/col29 {1.000 0.750 0.750 srgb} bind def
+/col30 {1.000 0.880 0.880 srgb} bind def
+/col31 {1.000 0.840 0.000 srgb} bind def
+
+end
+save
+newpath 0 65 moveto 0 0 lineto 119 0 lineto 119 65 lineto closepath clip newpath
+-143.0 298.0 translate
+1 -1 scale
+
+/cp {closepath} bind def
+/ef {eofill} bind def
+/gr {grestore} bind def
+/gs {gsave} bind def
+/sa {save} bind def
+/rs {restore} bind def
+/l {lineto} bind def
+/m {moveto} bind def
+/rm {rmoveto} bind def
+/n {newpath} bind def
+/s {stroke} bind def
+/sh {show} bind def
+/slc {setlinecap} bind def
+/slj {setlinejoin} bind def
+/slw {setlinewidth} bind def
+/srgb {setrgbcolor} bind def
+/rot {rotate} bind def
+/sc {scale} bind def
+/sd {setdash} bind def
+/ff {findfont} bind def
+/sf {setfont} bind def
+/scf {scalefont} bind def
+/sw {stringwidth} bind def
+/tr {translate} bind def
+/tnt {dup dup currentrgbcolor
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
+  bind def
+/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
+  4 -2 roll mul srgb} bind def
+/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
+/$F2psEnd {$F2psEnteredState restore end} def
+
+$F2psBegin
+%%Page: 1 1
+10 setmiterlimit
+ 0.06000 0.06000 sc
+% Polyline
+7.500 slw
+n 2400 4200 m 4050 4200 l 4050 4950 l 2400 4950 l
+ cp gs col0 s gr 
+% Polyline
+n 4050 4200 m
+ 4350 3900 l gs col0 s gr 
+% Polyline
+n 2400 4200 m 2700 3900 l 4350 3900 l 4350 4650 l
+ 4050 4950 l gs col0 s gr 
+/Helvetica-Bold ff 180.00 scf sf
+3225 4650 m
+gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm  col0 sh gr
+$F2psEnd
+rs
--- a/en_US.ISO8859-1/articles/vm-design/fig2.eps
+++ b/en_US.ISO8859-1/articles/vm-design/fig2.eps
@ -0,0 +1,115 @@
+%!PS-Adobe-2.0 EPSF-2.0
+%%Title: fig2.eps
+%%Creator: fig2dev Version 3.2.3 Patchlevel 
+%%CreationDate: Sun Oct  8 19:55:31 2000
+%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
+%%BoundingBox: 0 0 120 110
+%%Magnification: 1.0000
+%%EndComments
+/$F2psDict 200 dict def
+$F2psDict begin
+$F2psDict /mtrx matrix put
+/col-1 {0 setgray} bind def
+/col0 {0.000 0.000 0.000 srgb} bind def
+/col1 {0.000 0.000 1.000 srgb} bind def
+/col2 {0.000 1.000 0.000 srgb} bind def
+/col3 {0.000 1.000 1.000 srgb} bind def
+/col4 {1.000 0.000 0.000 srgb} bind def
+/col5 {1.000 0.000 1.000 srgb} bind def
+/col6 {1.000 1.000 0.000 srgb} bind def
+/col7 {1.000 1.000 1.000 srgb} bind def
+/col8 {0.000 0.000 0.560 srgb} bind def
+/col9 {0.000 0.000 0.690 srgb} bind def
+/col10 {0.000 0.000 0.820 srgb} bind def
+/col11 {0.530 0.810 1.000 srgb} bind def
+/col12 {0.000 0.560 0.000 srgb} bind def
+/col13 {0.000 0.690 0.000 srgb} bind def
+/col14 {0.000 0.820 0.000 srgb} bind def
+/col15 {0.000 0.560 0.560 srgb} bind def
+/col16 {0.000 0.690 0.690 srgb} bind def
+/col17 {0.000 0.820 0.820 srgb} bind def
+/col18 {0.560 0.000 0.000 srgb} bind def
+/col19 {0.690 0.000 0.000 srgb} bind def
+/col20 {0.820 0.000 0.000 srgb} bind def
+/col21 {0.560 0.000 0.560 srgb} bind def
+/col22 {0.690 0.000 0.690 srgb} bind def
+/col23 {0.820 0.000 0.820 srgb} bind def
+/col24 {0.500 0.190 0.000 srgb} bind def
+/col25 {0.630 0.250 0.000 srgb} bind def
+/col26 {0.750 0.380 0.000 srgb} bind def
+/col27 {1.000 0.500 0.500 srgb} bind def
+/col28 {1.000 0.630 0.630 srgb} bind def
+/col29 {1.000 0.750 0.750 srgb} bind def
+/col30 {1.000 0.880 0.880 srgb} bind def
+/col31 {1.000 0.840 0.000 srgb} bind def
+
+end
+save
+newpath 0 110 moveto 0 0 lineto 120 0 lineto 120 110 lineto closepath clip newpath
+-174.0 370.0 translate
+1 -1 scale
+
+/cp {closepath} bind def
+/ef {eofill} bind def
+/gr {grestore} bind def
+/gs {gsave} bind def
+/sa {save} bind def
+/rs {restore} bind def
+/l {lineto} bind def
+/m {moveto} bind def
+/rm {rmoveto} bind def
+/n {newpath} bind def
+/s {stroke} bind def
+/sh {show} bind def
+/slc {setlinecap} bind def
+/slj {setlinejoin} bind def
+/slw {setlinewidth} bind def
+/srgb {setrgbcolor} bind def
+/rot {rotate} bind def
+/sc {scale} bind def
+/sd {setdash} bind def
+/ff {findfont} bind def
+/sf {setfont} bind def
+/scf {scalefont} bind def
+/sw {stringwidth} bind def
+/tr {translate} bind def
+/tnt {dup dup currentrgbcolor
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
+  bind def
+/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
+  4 -2 roll mul srgb} bind def
+/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
+/$F2psEnd {$F2psEnteredState restore end} def
+
+$F2psBegin
+%%Page: 1 1
+10 setmiterlimit
+ 0.06000 0.06000 sc
+/Helvetica-Bold ff 180.00 scf sf
+3750 5100 m
+gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+7.500 slw
+n 4871 5100 m 4879 5100 l gs col0 s gr
+% Polyline
+n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
+ cp gs col0 s gr 
+% Polyline
+n 4575 4650 m
+ 4875 4350 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
+ cp gs col0 s gr 
+% Polyline
+n 2925 4650 m 3225 4350 l 4875 4350 l 4875 5100 l
+ 4575 5400 l gs col0 s gr 
+/Helvetica-Bold ff 180.00 scf sf
+3750 5850 m
+gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+n 4875 5100 m 4875 5850 l
+ 4575 6150 l gs col0 s gr 
+$F2psEnd
+rs
--- a/en_US.ISO8859-1/articles/vm-design/fig3.eps
+++ b/en_US.ISO8859-1/articles/vm-design/fig3.eps
@ -0,0 +1,133 @@
+%!PS-Adobe-2.0 EPSF-2.0
+%%Title: fig3.eps
+%%Creator: fig2dev Version 3.2.3 Patchlevel 
+%%CreationDate: Sun Oct  8 19:53:51 2000
+%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
+%%BoundingBox: 0 0 120 155
+%%Magnification: 1.0000
+%%EndComments
+/$F2psDict 200 dict def
+$F2psDict begin
+$F2psDict /mtrx matrix put
+/col-1 {0 setgray} bind def
+/col0 {0.000 0.000 0.000 srgb} bind def
+/col1 {0.000 0.000 1.000 srgb} bind def
+/col2 {0.000 1.000 0.000 srgb} bind def
+/col3 {0.000 1.000 1.000 srgb} bind def
+/col4 {1.000 0.000 0.000 srgb} bind def
+/col5 {1.000 0.000 1.000 srgb} bind def
+/col6 {1.000 1.000 0.000 srgb} bind def
+/col7 {1.000 1.000 1.000 srgb} bind def
+/col8 {0.000 0.000 0.560 srgb} bind def
+/col9 {0.000 0.000 0.690 srgb} bind def
+/col10 {0.000 0.000 0.820 srgb} bind def
+/col11 {0.530 0.810 1.000 srgb} bind def
+/col12 {0.000 0.560 0.000 srgb} bind def
+/col13 {0.000 0.690 0.000 srgb} bind def
+/col14 {0.000 0.820 0.000 srgb} bind def
+/col15 {0.000 0.560 0.560 srgb} bind def
+/col16 {0.000 0.690 0.690 srgb} bind def
+/col17 {0.000 0.820 0.820 srgb} bind def
+/col18 {0.560 0.000 0.000 srgb} bind def
+/col19 {0.690 0.000 0.000 srgb} bind def
+/col20 {0.820 0.000 0.000 srgb} bind def
+/col21 {0.560 0.000 0.560 srgb} bind def
+/col22 {0.690 0.000 0.690 srgb} bind def
+/col23 {0.820 0.000 0.820 srgb} bind def
+/col24 {0.500 0.190 0.000 srgb} bind def
+/col25 {0.630 0.250 0.000 srgb} bind def
+/col26 {0.750 0.380 0.000 srgb} bind def
+/col27 {1.000 0.500 0.500 srgb} bind def
+/col28 {1.000 0.630 0.630 srgb} bind def
+/col29 {1.000 0.750 0.750 srgb} bind def
+/col30 {1.000 0.880 0.880 srgb} bind def
+/col31 {1.000 0.840 0.000 srgb} bind def
+
+end
+save
+newpath 0 155 moveto 0 0 lineto 120 0 lineto 120 155 lineto closepath clip newpath
+-174.0 370.0 translate
+1 -1 scale
+
+/cp {closepath} bind def
+/ef {eofill} bind def
+/gr {grestore} bind def
+/gs {gsave} bind def
+/sa {save} bind def
+/rs {restore} bind def
+/l {lineto} bind def
+/m {moveto} bind def
+/rm {rmoveto} bind def
+/n {newpath} bind def
+/s {stroke} bind def
+/sh {show} bind def
+/slc {setlinecap} bind def
+/slj {setlinejoin} bind def
+/slw {setlinewidth} bind def
+/srgb {setrgbcolor} bind def
+/rot {rotate} bind def
+/sc {scale} bind def
+/sd {setdash} bind def
+/ff {findfont} bind def
+/sf {setfont} bind def
+/scf {scalefont} bind def
+/sw {stringwidth} bind def
+/tr {translate} bind def
+/tnt {dup dup currentrgbcolor
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
+  bind def
+/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
+  4 -2 roll mul srgb} bind def
+/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
+/$F2psEnd {$F2psEnteredState restore end} def
+
+$F2psBegin
+%%Page: 1 1
+10 setmiterlimit
+ 0.06000 0.06000 sc
+/Helvetica-Bold ff 180.00 scf sf
+4125 4350 m
+gs 1 -1 sc (C2) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+7.500 slw
+n 4871 5100 m 4879 5100 l gs col0 s gr
+% Polyline
+n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
+ cp gs col0 s gr 
+% Polyline
+n 4575 4650 m
+ 4875 4350 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
+ cp gs col0 s gr 
+% Polyline
+n 4875 3600 m 4875 5100 l
+ 4575 5400 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 2925 3900 l 3225 3600 l
+ 4875 3600 l gs col0 s gr 
+% Polyline
+n 2925 3900 m 4425 3900 l 4575 3900 l
+ 4875 3600 l gs col0 s gr 
+% Polyline
+n 4575 4650 m
+ 4575 3900 l gs col0 s gr 
+% Polyline
+n 3750 4650 m 3750 3900 l
+ 4050 3600 l gs col0 s gr 
+/Helvetica-Bold ff 180.00 scf sf
+3750 5850 m
+gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm  col0 sh gr
+/Helvetica-Bold ff 180.00 scf sf
+3750 5100 m
+gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm  col0 sh gr
+/Helvetica-Bold ff 180.00 scf sf
+3375 4350 m
+gs 1 -1 sc (C1) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+n 4875 5100 m 4875 5850 l
+ 4575 6150 l gs col0 s gr 
+$F2psEnd
+rs
--- a/en_US.ISO8859-1/articles/vm-design/fig4.eps
+++ b/en_US.ISO8859-1/articles/vm-design/fig4.eps
@ -0,0 +1,133 @@
+%!PS-Adobe-2.0 EPSF-2.0
+%%Title: fig4.eps
+%%Creator: fig2dev Version 3.2.3 Patchlevel 
+%%CreationDate: Sun Oct  8 19:55:53 2000
+%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
+%%BoundingBox: 0 0 120 155
+%%Magnification: 1.0000
+%%EndComments
+/$F2psDict 200 dict def
+$F2psDict begin
+$F2psDict /mtrx matrix put
+/col-1 {0 setgray} bind def
+/col0 {0.000 0.000 0.000 srgb} bind def
+/col1 {0.000 0.000 1.000 srgb} bind def
+/col2 {0.000 1.000 0.000 srgb} bind def
+/col3 {0.000 1.000 1.000 srgb} bind def
+/col4 {1.000 0.000 0.000 srgb} bind def
+/col5 {1.000 0.000 1.000 srgb} bind def
+/col6 {1.000 1.000 0.000 srgb} bind def
+/col7 {1.000 1.000 1.000 srgb} bind def
+/col8 {0.000 0.000 0.560 srgb} bind def
+/col9 {0.000 0.000 0.690 srgb} bind def
+/col10 {0.000 0.000 0.820 srgb} bind def
+/col11 {0.530 0.810 1.000 srgb} bind def
+/col12 {0.000 0.560 0.000 srgb} bind def
+/col13 {0.000 0.690 0.000 srgb} bind def
+/col14 {0.000 0.820 0.000 srgb} bind def
+/col15 {0.000 0.560 0.560 srgb} bind def
+/col16 {0.000 0.690 0.690 srgb} bind def
+/col17 {0.000 0.820 0.820 srgb} bind def
+/col18 {0.560 0.000 0.000 srgb} bind def
+/col19 {0.690 0.000 0.000 srgb} bind def
+/col20 {0.820 0.000 0.000 srgb} bind def
+/col21 {0.560 0.000 0.560 srgb} bind def
+/col22 {0.690 0.000 0.690 srgb} bind def
+/col23 {0.820 0.000 0.820 srgb} bind def
+/col24 {0.500 0.190 0.000 srgb} bind def
+/col25 {0.630 0.250 0.000 srgb} bind def
+/col26 {0.750 0.380 0.000 srgb} bind def
+/col27 {1.000 0.500 0.500 srgb} bind def
+/col28 {1.000 0.630 0.630 srgb} bind def
+/col29 {1.000 0.750 0.750 srgb} bind def
+/col30 {1.000 0.880 0.880 srgb} bind def
+/col31 {1.000 0.840 0.000 srgb} bind def
+
+end
+save
+newpath 0 155 moveto 0 0 lineto 120 0 lineto 120 155 lineto closepath clip newpath
+-174.0 370.0 translate
+1 -1 scale
+
+/cp {closepath} bind def
+/ef {eofill} bind def
+/gr {grestore} bind def
+/gs {gsave} bind def
+/sa {save} bind def
+/rs {restore} bind def
+/l {lineto} bind def
+/m {moveto} bind def
+/rm {rmoveto} bind def
+/n {newpath} bind def
+/s {stroke} bind def
+/sh {show} bind def
+/slc {setlinecap} bind def
+/slj {setlinejoin} bind def
+/slw {setlinewidth} bind def
+/srgb {setrgbcolor} bind def
+/rot {rotate} bind def
+/sc {scale} bind def
+/sd {setdash} bind def
+/ff {findfont} bind def
+/sf {setfont} bind def
+/scf {scalefont} bind def
+/sw {stringwidth} bind def
+/tr {translate} bind def
+/tnt {dup dup currentrgbcolor
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
+  bind def
+/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
+  4 -2 roll mul srgb} bind def
+/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
+/$F2psEnd {$F2psEnteredState restore end} def
+
+$F2psBegin
+%%Page: 1 1
+10 setmiterlimit
+ 0.06000 0.06000 sc
+/Helvetica-Bold ff 180.00 scf sf
+3375 4350 m
+gs 1 -1 sc (C1) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+7.500 slw
+n 4871 5100 m 4879 5100 l gs col0 s gr
+% Polyline
+n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
+ cp gs col0 s gr 
+% Polyline
+n 4575 4650 m
+ 4875 4350 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
+ cp gs col0 s gr 
+% Polyline
+n 4875 4350 m 4875 5100 l
+ 4575 5400 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 2925 3900 l 3225 3600 l
+ 4050 3600 l gs col0 s gr 
+% Polyline
+n 3750 4650 m 3750 3900 l
+ 4050 3600 l gs col0 s gr 
+% Polyline
+n 2925 3900 m
+ 3750 3900 l gs col0 s gr 
+% Polyline
+n 3750 4650 m 4050 4350 l
+ 4875 4350 l gs col0 s gr 
+% Polyline
+n 4050 4350 m
+ 4050 3600 l gs col0 s gr 
+/Helvetica-Bold ff 180.00 scf sf
+3750 5850 m
+gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm  col0 sh gr
+/Helvetica-Bold ff 180.00 scf sf
+3750 5100 m
+gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+n 4875 5100 m 4875 5850 l
+ 4575 6150 l gs col0 s gr 
+$F2psEnd
+rs
--- a/en_US.ISO_8859-1/articles/vm-design/Makefile
+++ b/en_US.ISO_8859-1/articles/vm-design/Makefile
@ -0,0 +1,16 @@
+# $FreeBSD: doc/en_US.ISO_8859-1/articles/mh/Makefile,v 1.8 1999/09/06 06:52:37 peter Exp $
+
+DOC?= article
+
+FORMATS?= html
+
+IMAGES=	fig1.eps fig2.eps fig3.eps fig4.eps
+
+INSTALL_COMPRESSED?=gz
+INSTALL_ONLY_COMPRESSED?=
+
+SRCS= article.sgml
+
+DOC_PREFIX?= ${.CURDIR}/../../..
+
+.include "${DOC_PREFIX}/share/mk/doc.project.mk"
--- a/en_US.ISO_8859-1/articles/vm-design/article.sgml
+++ b/en_US.ISO_8859-1/articles/vm-design/article.sgml
@ -0,0 +1,838 @@
+<!-- $FreeBSD: doc/en_US.ISO_8859-1/articles/mh/article.sgml,v 1.7 1999/10/10 20:20:38 jhb Exp $ -->
+<!-- FreeBSD Documentation Project -->
+
+<!DOCTYPE ARTICLE PUBLIC "-//FreeBSD//DTD DocBook V3.1-Based Extension//EN" [
+<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
+%man;
+]>
+
+<article>
+  <artheader>
+    <title>Design elements of the FreeBSD VM system</title>
+
+    <authorgroup>
+      <author>
+	<firstname>Matthew</firstname>
+
+	<surname>Dillon</surname>
+
+	<affiliation>
+	  <address>
+	    <email>dillon@apollo.backplane.com</email>
+	  </address>
+	</affiliation>
+      </author>
+    </authorgroup>
+
+    <abstract>
+      <para>The title is really just a fancy way of saying that I am going to
+	attempt to describe the whole VM enchilada, hopefully in a way that
+	everyone can follow.  For the last year I have concentrated on a number
+	of major kernel subsystems within FreeBSD, with the VM and Swap
+	subsystems being the most interesting and NFS being &lsquo;a necessary
+	chore&rsquo;.  I rewrote only small portions of the code.  In the VM
+	arena the only major rewrite I have done is to the swap subsystem.
+	Most of my work was cleanup and maintenance, with only moderate code
+	rewriting and no major algorithmic adjustments within the VM
+	subsystem.  The bulk of the VM subsystem's theoretical base remains
+	unchanged and a lot of the credit for the modernization effort in the
+	last few years belongs to John Dyson and David Greenman.  Not being a
+	historian like Kirk I will not attempt to tag all the various features
+	with people's names, since I will invariably get it wrong.</para>
+    </abstract>
+
+    <legalnotice>
+      <para>This article was originally published in the January 2000 issue of 
+	<ulink url="http://www.daemonnews.org/">DaemonNews</ulink>.  This
+	version of the article may include updates from Matt and other authors
+	to reflect changes in FreeBSD's VM implementation.</para>
+    </legalnotice>
+  </artheader>
+
+  <sect1>
+    <title>Introduction</title>
+
+    <para>Before moving along to the actual design let's spend a little time
+      on the necessity of maintaining and modernizing any long-living
+      codebase.  In the programming world, algorithms tend to be more
+      important than code and it is precisely due to BSD's academic roots that
+      a great deal of attention was paid to algorithm design from the
+      beginning.  More attention paid to the design generally leads to a clean
+      and flexible codebase that can be fairly easily modified, extended, or
+      replaced over time.  While BSD is considered an &lsquo;old&rsquo;
+      operating system by some people, those of us who work on it tend to view
+      it more as a &lsquo;mature&rsquo; codebase which has various components
+      modified, extended, or replaced with modern code.  It has evolved, and
+      FreeBSD is at the bleeding edge no matter how old some of the code might
+      be.  This is an important distinction to make and one that is
+      unfortunately lost to many people.  The biggest error a programmer can
+      make is to not learn from history, and this is precisely the error that
+      many other modern operating systems have made.  NT is the best example
+      of this, and the consequences have been dire.  Linux also makes this
+      mistake to some degree&mdash;enough that we BSD folk can make small
+      jokes about it every once in a while, anyway.  Linux's problem is simply
+      one of a lack of experience and history to compare ideas against, a
+      problem that is easily and rapidly being addressed by the Linux
+      community in the same way it has been addressed in the BSD
+      community&mdash;by continuous code development.  The NT folk, on the
+      other hand, repeatedly make the same mistakes solved by UNIX decades ago
+      and then spend years fixing them. Over and over again.  They have a
+      severe case of &lsquo;not designed here&rsquo; and &lsquo;we are always
+      right because our marketing department says so&rsquo;.  I have little
+      tolerance for anyone who cannot learn from history.</para>
+
+    <para>Much of the apparent complexity of the FreeBSD design, especially in
+      the VM/Swap subsystem, is a direct result of having to solve serious
+      performance issues that occur under various conditions.  These issues
+      are not due to bad algorithmic design but instead arise from
+      environmental factors.  In any direct comparison between platforms,
+      these issues become most apparent when system resources begin to get
+      stressed.  As I describe FreeBSD's VM/Swap subsystem the reader should
+      always keep two points in mind.  First, the most important aspect of
+      performance design is what is known as &ldquo;Optimizing the Critical
+      Path&rdquo;.  It is often the case that performance optimizations add a
+      little bloat to the code in order to make the critical path perform
+      better.  Second, a solid, generalized design outperforms a
+      heavily-optimized design over the long run.  While a generalized design
+      may end up being slower than a heavily-optimized design when they are
+      first implemented, the generalized design tends to be easier to adapt to
+      changing conditions and the heavily-optimized design winds up having to
+      be thrown away.  Any codebase that will survive and be maintainable for
+      years must therefore be designed properly from the beginning even if it
+      costs some performance.  Twenty years ago people were still arguing that
+      programming in assembly was better than programming in a high-level
+      language because it produced code that was ten times as fast.  Today,
+      the fallibility of that argument is obvious&mdash;as are the parallels
+      to algorithmic design and code generalization.</para>
+  </sect1>
+
+  <sect1>
+    <title>VM Objects</title>
+
+    <para>The best way to begin describing the FreeBSD VM system is to look at
+      it from the perspective of a user-level process.  Each user process sees
+      a single, private, contiguous VM address space containing several types
+      of memory objects.  These objects have various characteristics.  Program
+      code and program data are effectively a single memory-mapped file (the
+      binary file being run), but program code is read-only while program data
+      is copy-on-write.  Program BSS is just memory allocated and filled with
+      zeros on demand, called demand zero page fill.  Arbitrary files can be
+      memory-mapped into the address space as well, which is how the shared
+      library mechanism works.  Such mappings can require modifications to
+      remain private to the process making them.  The fork system call adds an
+      entirely new dimension to the VM management problem on top of the
+      complexity already given.</para>
+
+    <para>A program binary data page (which is a basic copy-on-write page)
+      illustrates the complexity.  A program binary contains a preinitialized
+      data section which is initially mapped directly from the program file.
+      When a program is loaded into a process's VM space, this area is
+      initially memory-mapped and backed by the program binary itself,
+      allowing the VM system to free/reuse the page and later load it back in
+      from the binary.  The moment a process modifies this data, however, the
+      VM system must make a private copy of the page for that process.  Since
+      the private copy has been modified, the VM system may no longer free it,
+      because there is no longer any way to restore it later on.</para>
+
+    <para>You will notice immediately that what was originally a simple file
+      mapping has become much more complex.  Data may be modified on a
+      page-by-page basis whereas the file mapping encompasses many pages at
+      once.  The complexity further increases when a process forks.  When a
+      process forks, the result is two processes&mdash;each with their own
+      private address spaces, including any modifications made by the original
+      process prior to the call to <function>fork()</function>.  It would be
+      silly for the VM system to make a complete copy of the data at the time
+      of the <function>fork()</function> because it is quite possible that at
+      least one of the two processes will only need to read from that page
+      from then on, allowing the original page to continue to be used.  What
+      was a private page is made copy-on-write again, since each process
+      (parent and child) expects their own personal post-fork modifications to
+      remain private to themselves and not affect the other.</para>
+
+    <para>FreeBSD manages all of this with a layered VM Object model.  The
+      original binary program file winds up being the lowest VM Object layer.
+      A copy-on-write layer is pushed on top of that to hold those pages which
+      had to be copied from the original file.  If the program modifies a data
+      page belonging to the original file the VM system takes a fault and
+      makes a copy of the page in the higher layer.  When a process forks,
+      additional VM Object layers are pushed on.  This might make a little
+      more sense with a fairly basic example.  A <function>fork()</function>
+      is a common operation for any *BSD system, so this example will consider
+      a program that starts up, and forks.  When the process starts, the VM
+      system creates an object layer, let's call this A:</para>
+
+    <mediaobject>
+      <imageobject>
+        <imagedata fileref="fig1">
+      </imageobject>
+	
+      <textobject>
+	<literallayout>+---------------+
+|       A       |
++---------------+</literallayout>
+      </textobject>
+
+      <textobject>
+	<phrase>A picture</phrase>
+      </textobject>
+    </mediaobject>
+
+    <para>A represents the file&mdash;pages may be paged in and out of the
+      file's physical media as necessary.  Paging in from the disk is
+      reasonable for a program, but we really don't want to page back out and
+      overwrite the executable.  The VM system therefore creates a second
+      layer, B, that will be physically backed by swap space:</para>
+
+    <mediaobject>
+      <imageobject>
+        <imagedata fileref="fig2">
+      </imageobject>
+
+      <textobject>
+	<literallayout>+---------------+
+|       B       |	  
++---------------+
+|       A       |
++---------------+</literallayout>
+      </textobject>
+    </mediaobject>
+
+    <para>On the first write to a page after this, a new page is created in B,
+      and its contents are initialized from A.  All pages in B can be paged in
+      or out to a swap device.  When the program forks, the VM system creates
+      two new object layers&mdash;C1 for the parent, and C2 for the
+      child&mdash;that rest on top of B:</para>
+
+    <mediaobject>
+      <imageobject>
+        <imagedata fileref="fig3">
+      </imageobject>
+      
+      <textobject>
+	<literallayout>+-------+-------+
+|   C1  |   C2  |
++-------+-------+
+|       B       |
++---------------+
+|       A       |
++---------------+</literallayout>
+      </textobject>
+    </mediaobject>
+
+    <para>In this case, let's say a page in B is modified by the original
+      parent process.  The process will take a copy-on-write fault and
+      duplicate the page in C1, leaving the original page in B untouched.
+      Now, let's say the same page in B is modified by the child process.  The
+      process will take a copy-on-write fault and duplicate the page in C2.
+      The original page in B is now completely hidden since both C1 and C2
+      have a copy, and B could theoretically be destroyed (if it does not
+      represent a &lsquo;real&rsquo; file).  However, this sort of optimization is not
+      trivial to make because it is so fine-grained.  FreeBSD does not make
+      this optimization.  Now, suppose (as is often the case) that the child
+      process does an <function>exec()</function>.  Its current address space
+      is usually replaced by a new address space representing a new file.  In
+      this case, the C2 layer is destroyed:</para>
+
+    <mediaobject>
+      <imageobject>
+        <imagedata fileref="fig4">
+      </imageobject>
+
+      <textobject>
+	<literallayout>+-------+
+|   C1  |
++-------+-------+
+|       B       |
++---------------+
+|       A       |
++---------------+</literallayout>
+      </textobject>
+    </mediaobject>
+
+    <para>In this case, the number of children of B drops to one, and all
+      accesses to B now go through C1.  This means that B and C1 can be
+      collapsed together.  Any pages in B that also exist in C1 are deleted
+      from B during the collapse.  Thus, even though the optimization in the
+      previous step could not be made, we can recover the dead pages when
+      either of the processes exits or calls <function>exec()</function>.</para>
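+
+    <para>The layering described so far can be modelled in a few lines.  What
+      follows is only a toy sketch in Python, not the kernel's data
+      structures: the <literal>VMObject</literal> class, its fields, and the
+      <literal>collapse()</literal> helper are names invented for this
+      illustration.  Reads fall through the layers until a page is found,
+      writes always land in the top layer, and a collapse folds a
+      singly-referenced layer into the layer above it:</para>
+
+    <programlisting>class VMObject:
+    """One layer in a stack of VM Objects (a toy model)."""
+
+    def __init__(self, name, backing=None):
+        self.name = name
+        self.backing = backing   # next layer down, or None for the bottom
+        self.pages = {}          # page index to contents held by this layer
+
+    def read(self, index):
+        """Walk down the stack until some layer holds the page."""
+        layer = self
+        while layer is not None:
+            if index in layer.pages:
+                return layer.pages[index]
+            layer = layer.backing
+        return None              # no layer holds the page
+
+    def write(self, index, data):
+        """Copy-on-write: the modified page always lands in the top layer."""
+        self.pages[index] = data
+
+
+def collapse(top):
+    """Fold top.backing into top; only valid when top is its sole user."""
+    lower = top.backing
+    for index, data in lower.pages.items():
+        top.pages.setdefault(index, data)   # pages already in top win
+    top.backing = lower.backing
+
+
+a = VMObject("A")
+a.pages = {0: "file page 0", 1: "file page 1"}
+b = VMObject("B", backing=a)
+b.write(0, "modified before fork")   # the first write puts a page in B
+c1 = VMObject("C1", backing=b)       # parent after the fork
+c2 = VMObject("C2", backing=b)       # child after the fork
+c1.write(0, "parent copy")
+c2.write(0, "child copy")            # B's page 0 is now hidden from both
+c2 = None                            # the child execs, so C2 is destroyed
+collapse(c1)                         # B has a single user left
+print(c1.read(0), "|", c1.read(1))   # parent copy | file page 1</programlisting>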
+
+    <para>This model creates a number of potential problems.  The first is that
+      you can wind up with a relatively deep stack of layered VM Objects which
+      can cost scanning time and memory when you take a fault.  Deep
+      layering can occur when processes fork and then fork again (either
+      parent or child).  The second problem is that you can wind up with dead,
+      inaccessible pages deep in the stack of VM Objects.  In our last example
+      if both the parent and child processes modify the same page, they both
+      get their own private copies of the page and the original page in B is
+      no longer accessible by anyone.  That page in B can be freed.</para>
+
+    <para>FreeBSD solves the deep layering problem with a special optimization
+      called the &ldquo;All Shadowed Case&rdquo;.  This case occurs if either
+      C1 or C2 takes sufficient COW faults to completely shadow all pages in B.
+      Let's say that C1 achieves this.  C1 can now bypass B entirely, so rather
+      than have C1->B->A and C2->B->A we now have C1->A and C2->B->A.  But
+      look what also happened&mdash;now B has only one reference (C2), so we
+      can collapse B and C2 together.  The end result is that B is deleted
+      entirely and we have C1->A and C2->A.  It is often the case that B will
+      contain a large number of pages and neither C1 nor C2 will be able to
+      completely overshadow it.  If we fork again and create a set of D
+      layers, however, it is much more likely that one of the D layers will
+      eventually be able to completely overshadow the much smaller dataset
+      represented by C1 or C2.  The same optimization will work at any point in
+      the graph and the grand result of this is that even on a heavily forked
+      machine VM Object stacks tend to not get much deeper than 4.  This is
+      true of both the parent and the children and true whether the parent is
+      doing the forking or whether the children cascade forks.</para>
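+
+    <para>The bypass-then-collapse sequence can be traced with plain
+      dictionaries.  This is a hedged sketch rather than the kernel
+      algorithm: the <literal>shadows_all()</literal> helper, the
+      <literal>chains</literal> table, and the page contents are all made up
+      for the example, and reference counting is reduced to a search over
+      the chains:</para>
+
+    <programlisting>def shadows_all(top, below):
+    """True when every page held by below is also held by top."""
+    return all(index in top for index in below)
+
+
+# Each layer is a dict of page index to contents.  Each chain lists the
+# layers one process sees, top layer first.
+A = {0: "a0", 1: "a1", 2: "a2"}
+B = {0: "b0", 1: "b1"}
+C1 = {0: "p0", 1: "p1"}          # the parent has rewritten every page in B
+C2 = {1: "c1"}
+chains = {"C1": [C1, B, A], "C2": [C2, B, A]}
+
+if shadows_all(C1, B):
+    chains["C1"] = [C1, A]       # C1 bypasses B entirely
+
+users_of_b = [name for name, chain in chains.items()
+              if any(layer is B for layer in chain[1:])]
+if len(users_of_b) == 1:         # only C2 still references B
+    for index, data in B.items():
+        C2.setdefault(index, data)   # keep C2's own copies
+    chains["C2"] = [C2, A]       # B is gone
+
+print([len(chain) for chain in chains.values()])   # [2, 2]</programlisting>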
+
+    <para>The dead page problem still exists in the case where C1 or C2 does not
+      completely overshadow B.  Due to our other optimizations this case does
+      not represent much of a problem and we simply allow the pages to be
+      dead.  If the system runs low on memory it will swap them out, eating a
+      little swap, but that's it.</para>
+
+    <para>The advantage to the VM Object model is that
+      <function>fork()</function> is extremely fast, since no real data
+      copying need take place.  The disadvantage is that you can build a
+      relatively complex VM Object layering that slows page fault handling
+      down a little, and you spend memory managing the VM Object structures.
+      The optimizations FreeBSD makes prove to reduce the problems enough
+      that they can be ignored, leaving no real disadvantage.</para>
+  </sect1>
+
+  <sect1>
+    <title>SWAP Layers</title>
+
+    <para>Private data pages are initially either copy-on-write or zero-fill
+      pages.  When a change, and therefore a copy, is made, the original
+      backing object (usually a file) can no longer be used to save a copy of
+      the page when the VM system needs to reuse it for other purposes.  This
+      is where SWAP comes in.  SWAP is allocated to create backing store for
+      memory that does not otherwise have it.  FreeBSD allocates the swap
+      management structure for a VM Object only when it is actually needed.
+      However, the swap management structure has had problems
+      historically.</para>
+
+    <para>Under FreeBSD 3.x the swap management structure preallocates an
+      array that encompasses the entire object requiring swap backing
+      store&mdash;even if only a few pages of that object are swap-backed.
+      This creates a kernel memory fragmentation problem when large objects
+      are mapped, or processes with large runsizes (RSS) fork.  Also, in order
+      to keep track of swap space, a &lsquo;list of holes&rsquo; is kept in
+      kernel memory, and this tends to get severely fragmented as well.  Since
+      the 'list of holes' is a linear list, the swap allocation and freeing
+      performance is a non-optimal O(n)-per-page.  It also requires kernel
+      memory allocations to take place during the swap freeing process, and
+      that creates low memory deadlock problems.  The problem is further
+      exacerbated by holes created due to the interleaving algorithm.  Also,
+      the swap block map can become fragmented fairly easily resulting in
+      non-contiguous allocations. Kernel memory must also be allocated on the
+      fly for additional swap management structures when a swapout occurs.  It
+      is evident that there was plenty of room for improvement.</para>
+
+    <para>For FreeBSD 4.x, I completely rewrote the swap subsystem.  With this
+      rewrite, swap management structures are allocated through a hash table
+      rather than a linear array, giving them a fixed allocation size and much
+      finer granularity.  Rather than using a linearly linked list to keep
+      track of swap space reservations, it now uses a bitmap of swap blocks
+      arranged in a radix tree structure with free-space hinting in the radix
+      node structures.  This effectively makes swap allocation and freeing an
+      O(1) operation.  The entire radix tree bitmap is also preallocated in
+      order to avoid having to allocate kernel memory during critical low
+      memory swapping operations.  After all, the system tends to swap when it
+      is low on memory so we should avoid allocating kernel memory at such
+      times in order to avoid potential deadlocks.  Finally, to reduce
+      fragmentation the radix tree is capable of allocating large contiguous
+      chunks at once, skipping over smaller fragmented chunks.  I did not take
+      the final step of having an 'allocating hint pointer' that would trundle
+      through a portion of swap as allocations were made in order to further
+      guarantee contiguous allocations or at least locality of reference, but
+      I ensured that such an addition could be made.</para>
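+
+    <para>The idea of a preallocated bitmap with free-space hints can be shown
+      with a much-simplified model.  The sketch below is an assumption-laden
+      stand-in, not the 4.x code: it uses a flat list of leaves instead of a
+      real radix tree, so its search is still linear in the number of leaves,
+      and <literal>SwapMap</literal>, <literal>Leaf</literal>, and
+      <literal>LEAF_BLOCKS</literal> are invented names.  It does show the
+      two properties described above: nothing is allocated after start-up,
+      and the per-leaf hint lets the allocator skip leaves that cannot
+      satisfy a request:</para>
+
+    <programlisting>LEAF_BLOCKS = 8   # blocks per leaf bitmap, an arbitrary size for the sketch
+
+
+class Leaf:
+    def __init__(self):
+        self.used = [False] * LEAF_BLOCKS
+        self.free = LEAF_BLOCKS      # hint: free blocks left in this leaf
+
+    def find_run(self, count):
+        """Return the start of a run of count free blocks, or None."""
+        run = 0
+        for i, used in enumerate(self.used):
+            run = 0 if used else run + 1
+            if run == count:
+                return i - count + 1
+        return None
+
+
+class SwapMap:
+    def __init__(self, leaves):
+        # Everything is built up front, so alloc and release need no memory.
+        self.leaves = [Leaf() for _ in range(leaves)]
+
+    def alloc(self, count):
+        """Allocate count contiguous blocks inside one leaf, or return None."""
+        for n, leaf in enumerate(self.leaves):
+            if leaf.free >= count:           # the hint skips crowded leaves
+                start = leaf.find_run(count)
+                if start is not None:
+                    for i in range(start, start + count):
+                        leaf.used[i] = True
+                    leaf.free -= count
+                    return n * LEAF_BLOCKS + start
+        return None
+
+    def release(self, block, count):
+        """Free a run previously returned by alloc with the same count."""
+        leaf = self.leaves[block // LEAF_BLOCKS]
+        start = block % LEAF_BLOCKS
+        for i in range(start, start + count):
+            leaf.used[i] = False
+        leaf.free += count
+
+
+swap = SwapMap(leaves=2)
+first = swap.alloc(6)     # taken from leaf 0
+second = swap.alloc(4)    # leaf 0 has only 2 blocks free, so leaf 1 is used
+print(first, second)      # 0 8</programlisting>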
+  </sect1>
+
+  <sect1>
+    <title>When to free a page</title>
+
+    <para>Since the VM system uses all available memory for disk caching,
+      there are usually very few truly-free pages.  The VM system depends on
+      being able to properly choose pages which are not in use to reuse for
+      new allocations.  Selecting the optimal pages to free is possibly the
+      single most important function any VM system can perform because if it
+      makes a poor selection, the VM system may be forced to unnecessarily
+      retrieve pages from disk, seriously degrading system performance.</para>
+
+    <para>How much overhead are we willing to suffer in the critical path to
+      avoid freeing the wrong page?  Each wrong choice we make will cost us
+      hundreds of thousands of CPU cycles and a noticeable stall of the
+      affected processes, so we are willing to endure a significant amount of
+      overhead in order to be sure that the right page is chosen.  This is why
+      FreeBSD tends to outperform other systems when memory resources become
+      stressed.</para>
+
+    <para>The free page determination algorithm is built upon a history of the
+      use of memory pages.  To acquire this history, the system takes advantage
+      of a page-used bit feature that most hardware page tables have.</para>
+
+    <para>In any case, the page-used bit is cleared and at some later point
+      the VM system comes across the page again and sees that the page-used
+      bit has been set.  This indicates that the page is still being actively
+      used.  If the bit is still clear it is an indication that the page is not
+      being actively used.  By testing this bit periodically, a use history (in
+      the form of a counter) for the physical page is developed.  When the VM
+      system later needs to free up some pages, checking this history becomes
+      the cornerstone of determining the best candidate page to reuse.</para>
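+
+    <para>A minimal sketch of such a use history follows.  The constants and
+      names (<literal>ACT_MAX</literal>, <literal>ACT_ADVANCE</literal>,
+      <literal>ACT_DECLINE</literal>, <literal>scan()</literal>) are
+      assumptions chosen for the illustration, and the
+      <literal>used_bit</literal> field merely stands in for the hardware
+      page-used bit:</para>
+
+    <programlisting>ACT_MAX = 64       # assumed cap on the counter
+ACT_ADVANCE = 3    # assumed reward when the page was referenced
+ACT_DECLINE = 1    # assumed decay when the page was idle
+
+
+class Page:
+    def __init__(self, act_count=0):
+        self.used_bit = False        # stands in for the hardware bit
+        self.act_count = act_count   # the use history
+
+
+def scan(page):
+    """Sample the page-used bit, fold it into the history, then clear it."""
+    if page.used_bit:
+        page.act_count = min(ACT_MAX, page.act_count + ACT_ADVANCE)
+    else:
+        page.act_count = max(0, page.act_count - ACT_DECLINE)
+    page.used_bit = False
+
+
+def best_candidate(pages):
+    """The page with the weakest history is the best one to reuse."""
+    return min(pages, key=lambda page: page.act_count)
+
+
+busy, idle = Page(act_count=5), Page(act_count=5)
+for _ in range(4):
+    busy.used_bit = True     # the hardware sets the bit on each access
+    scan(busy)
+    scan(idle)
+print(busy.act_count, idle.act_count)        # 17 1
+print(best_candidate([busy, idle]) is idle)  # True</programlisting>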
+
+    <sidebar>
+      <title>What if the hardware has no page-used bit?</title>
+
+      <para>For those platforms that do not have this feature, the system
+	actually emulates a page-used bit.  It unmaps or protects a page,
+	forcing a page fault if the page is accessed again.  When the page
+	fault is taken, the system simply marks the page as having been used
+	and unprotects the page so that it may be used.  While taking such page
+	faults just to determine if a page is being used appears to be an
+	expensive proposition, it is much less expensive than reusing the page
+	for some other purpose only to find that a process needs it back and
+	then have to go to disk.</para>
+    </sidebar>
+
+    <para>FreeBSD makes use of several page queues to further refine the
+      selection of pages to reuse as well as to determine when dirty pages
+      must be flushed to their backing store.  Since page tables are dynamic
+      entities under FreeBSD, it costs virtually nothing to unmap a page from
+      the address space of any processes using it.  When a page candidate has
+      been chosen based on the page-use counter, this is precisely what is
+      done.  The system must make a distinction between clean pages which can
+      theoretically be freed up at any time, and dirty pages which must first
+      be written to their backing store before being reusable.  When a page
+      candidate has been found it is moved to the inactive queue if it is
+      dirty, or the cache queue if it is clean.  A separate algorithm based on
+      the dirty-to-clean page ratio determines when dirty pages in the
+      inactive queue must be flushed to disk.  Once this is accomplished, the
+      flushed pages are moved from the inactive queue to the cache queue.  At
+      this point, pages in the cache queue can still be reactivated by a VM
+      fault at relatively low cost.  However, pages in the cache queue are
+      considered to be &lsquo;immediately freeable&rsquo; and will be reused
+      in an LRU (least-recently used) fashion when the system needs to
+      allocate new memory.</para>
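+
+    <para>The queue transitions in the paragraph above can be summarised as a
+      small state machine.  This is a simplified sketch: the queues are
+      ordinary Python lists, and the function names
+      (<literal>deactivate()</literal>, <literal>launder()</literal>,
+      <literal>reuse()</literal>) are hypothetical labels for the steps
+      described, not kernel routines:</para>
+
+    <programlisting>class Page:
+    def __init__(self, name, dirty):
+        self.name = name
+        self.dirty = dirty
+
+
+queues = {"active": [], "inactive": [], "cache": []}
+
+
+def deactivate(page):
+    """A reuse candidate leaves the active queue according to its state."""
+    queues["active"].remove(page)
+    queues["inactive" if page.dirty else "cache"].append(page)
+
+
+def launder(page):
+    """Flush a dirty page to its backing store; it is then clean."""
+    queues["inactive"].remove(page)
+    page.dirty = False
+    queues["cache"].append(page)
+
+
+def reuse():
+    """Take the page that has sat in the cache queue the longest."""
+    return queues["cache"].pop(0) if queues["cache"] else None
+
+
+clean, dirty = Page("clean", False), Page("dirty", True)
+queues["active"] = [clean, dirty]
+deactivate(clean)
+deactivate(dirty)
+launder(dirty)
+print([page.name for page in queues["cache"]])   # ['clean', 'dirty']
+print(reuse().name)                              # clean</programlisting>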
+
+    <para>It is important to note that the FreeBSD VM system attempts to
+      separate clean and dirty pages for the express reason of avoiding
+      unnecessary flushes of dirty pages (which eat I/O bandwidth), and that
+      it does not move pages between the various page queues gratuitously when
+      the memory subsystem is not being stressed.  This is why you will see some
+      systems with very low cache queue counts and high active queue counts
+      when running <command>systat -vm</command>.  As the VM system
+      becomes more stressed, it makes a greater effort to maintain the various
+      page queues at the levels determined to be the most effective.  An urban
+      myth has circulated for years that Linux did a better job avoiding
+      swapouts than FreeBSD, but this in fact is not true.  What was actually
+      occurring was that FreeBSD was proactively paging out unused pages in
+      order to make room for more disk cache while Linux was keeping unused
+      pages in core and leaving less memory available for cache and process
+      pages.  I don't know whether this is still true today.</para>
+  </sect1>
+
+  <sect1>
+    <title>Pre-Faulting and Zeroing Optimizations</title>
+
+    <para>Taking a VM fault is not expensive if the underlying page is already
+      in core and can simply be mapped into the process, but it can become
+      expensive if you take a whole lot of them on a regular basis.  A good
+      example of this is running a program such as &man.ls.1; or &man.ps.1;
+      over and over again.  If the program binary is mapped into memory but
+      not mapped into the page table, then all the pages that will be accessed
+      by the program will have to be faulted in every time the program is run.
+      This is unnecessary when the pages in question are already in the VM
+      Cache, so FreeBSD will attempt to pre-populate a process's page tables
+      with those pages that are already in the VM Cache.  One thing that
+      FreeBSD does not yet do is pre-copy-on-write certain pages on exec.  For
+      example, if you run the &man.ls.1; program while running <command>vmstat
+	1</command> you will notice that it always takes a certain number of
+      page faults, even when you run it over and over again.  These are
+      zero-fill faults, not program code faults (which were pre-faulted in
+      already).  Pre-copying pages on exec or fork is an area that could use
+      more study.</para>
+
+    <para>A large percentage of page faults that occur are zero-fill faults.
+      You can usually see this by observing the <command>vmstat -s</command>
+      output.  These occur when a process accesses pages in its BSS area.  The
+      BSS area is expected to be initially zero but the VM system does not
+      bother to allocate any memory at all until the process actually accesses
+      it.  When a fault occurs the VM system must not only allocate a new page,
+      it must zero it as well.  To optimize the zeroing operation the VM system
+      has the ability to pre-zero pages and mark them as such, and to request
+      pre-zeroed pages when zero-fill faults occur.  The pre-zeroing occurs
+      whenever the CPU is idle but the number of pages the system pre-zeros is
+      limited in order to avoid blowing away the memory caches.  This is an
+      excellent example of adding complexity to the VM system in order to
+      optimize the critical path.</para>
+  </sect1>
+
+  <sect1>
+    <title>Page Table Optimizations</title>
+
+    <para>The page table optimizations make up the most contentious part of
+      the FreeBSD VM design and they have shown some strain with the advent of
+      serious use of <function>mmap()</function>.  I think this is actually a
+      feature of most BSDs though I am not sure when it was first introduced.
+      There are two major optimizations.  The first is that hardware page
+      tables do not contain persistent state but instead can be thrown away at
+      any time with only a minor amount of management overhead.  The second is
+      that every active page table entry in the system has a governing
+      <literal>pv_entry</literal> structure which is tied into the
+      <literal>vm_page</literal> structure.  FreeBSD can simply iterate
+      through those mappings that are known to exist while Linux must check
+      all page tables that <emphasis>might</emphasis> contain a specific
+      mapping to see if it does, which can incur O(n^2) overhead in certain
+      situations.  It is because of this that FreeBSD tends to make better
+      choices on which pages to reuse or swap when memory is stressed, giving
+      it better performance under load. However, FreeBSD requires kernel
+      tuning to accommodate large-shared-address-space situations such as
+      those that can occur in a news system because it may run out of
+      <literal>pv_entry</literal> structures.</para>
+
+    <para>Both Linux and FreeBSD need work in this area.  FreeBSD is trying to
+      maximize the advantage of a potentially sparse active-mapping model (not
+      all processes need to map all pages of a shared library, for example),
+      whereas Linux is trying to simplify its algorithms.  FreeBSD generally
+      has the performance advantage here at the cost of wasting a little extra
+      memory, but FreeBSD breaks down in the case where a large file is
+      massively shared across hundreds of processes.  Linux, on the other hand,
+      breaks down in the case where many processes are sparsely-mapping the
+      same shared library and also runs non-optimally when trying to determine
+      whether a page can be reused or not.</para>
+  </sect1>
+
+  <sect1>
+    <title>Page Coloring</title>
+
+    <para>We'll end with the page coloring optimizations.  Page coloring is a
+      performance optimization designed to ensure that accesses to contiguous
+      pages in virtual memory make the best use of the processor cache.  In
+      ancient times (i.e. 10+ years ago) processor caches tended to map
+      virtual memory rather than physical memory.  This led to a huge number of
+      problems including having to clear the cache on every context switch in
+      some cases, and problems with data aliasing in the cache.  Modern
+      processor caches map physical memory precisely to solve those problems.
+      This means that two side-by-side pages in a process's address space may
+      not correspond to two side-by-side pages in the cache.  In fact, if you
+      aren't careful side-by-side pages in virtual memory could wind up using
+      the same page in the processor cache&mdash;leading to cacheable data
+      being thrown away prematurely and reducing CPU performance.  This is true
+      even with multi-way set-associative caches (though the effect is
+      mitigated somewhat).</para>
+
+    <para>FreeBSD's memory allocation code implements page coloring
+      optimizations, which means that the memory allocation code will attempt
+      to locate free pages that are contiguous from the point of view of the
+      cache.  For example, if page 16 of physical memory is assigned to page 0
+      of a process's virtual memory and the cache can hold 4 pages, the page
+      coloring code will not assign page 20 of physical memory to page 1 of a
+      process's virtual memory.  It would, instead, assign page 21 of physical
+      memory.  The page coloring code attempts to avoid assigning page 20
+      because this maps over the same cache memory as page 16 and would result
+      in non-optimal caching.  This code adds a significant amount of
+      complexity to the VM memory allocation subsystem as you can well
+      imagine, but the result is well worth the effort.  Page Coloring makes VM
+      memory as deterministic as physical memory with regard to cache
+      performance.</para>
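+
+    <para>The arithmetic behind the example is a simple modulus.  The sketch
+      below assumes, as the example does, a cache that holds 4 pages; the
+      <literal>color()</literal> and <literal>pick_physical()</literal>
+      helpers are invented for the illustration and ignore everything else a
+      real allocator has to consider:</para>
+
+    <programlisting>CACHE_PAGES = 4   # cache size in pages, as in the example above
+
+
+def color(page_number):
+    """Pages with the same color compete for the same part of the cache."""
+    return page_number % CACHE_PAGES
+
+
+def pick_physical(virtual_page, free_physical):
+    """Prefer a free physical page whose color matches the virtual page."""
+    wanted = color(virtual_page)
+    for phys in free_physical:
+        if color(phys) == wanted:
+            return phys
+    # No matching color is free, so fall back to any free page.
+    return free_physical[0] if free_physical else None
+
+
+print(color(16), color(20), color(21))   # 0 0 1
+print(pick_physical(1, [20, 21]))        # 21</programlisting>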
+  </sect1>
+
+  <sect1>
+    <title>Conclusion</title>
+
+    <para>Virtual memory in modern operating systems must address a number of
+      different issues efficiently and for many different usage patterns.  The
+      modular and algorithmic approach that BSD has historically taken allows
+      us to study and understand the current implementation as well as
+      relatively cleanly replace large sections of the code.  There have been a
+      number of improvements to the FreeBSD VM system in the last several
+      years, and work is ongoing.</para>
+  </sect1>
+
+  <sect1>
+    <title>Bonus QA session by Allen Briggs
+      <email>briggs@ninthwonder.com</email></title>
+
+    <qandaset>
+      <qandaentry>
+	<question>
+	  <para>What is &ldquo;the interleaving algorithm&rdquo; that you
+	    refer to in your listing of the ills of the FreeBSD 3.x swap
+	    arrangements?</para>
+	</question>
+
+	<answer>
+	  <para>FreeBSD uses a fixed swap interleave which defaults to 4.  This
+	    means that FreeBSD reserves space for four swap areas even if you
+	    only have one, two, or three.  Since swap is interleaved, the linear
+	    address space representing the &lsquo;four swap areas&rsquo; will be
+	    fragmented if you don't actually have four swap areas.  For
+	    example, if you have two swap areas A and B, FreeBSD's address
+	    space representation for that swap area will be interleaved in
+	    blocks of 16 pages:</para>
+
+	  <literallayout>A B C D A B C D A B C D A B C D</literallayout>
+
+	  <para>FreeBSD 3.x uses a &lsquo;sequential list of free
+	    regions&rsquo; approach to accounting for the free swap areas.
+	    The idea is that large blocks of free linear space can be
+	    represented with a single list node
+	    (<filename>kern/subr_rlist.c</filename>).  But due to the
+	    fragmentation the sequential list winds up being insanely
+	    fragmented.  In the above example, completely unused swap will
+	    have A and B shown as &lsquo;free&rsquo; and C and D shown as
+	    &lsquo;all allocated&rsquo;.  Each A-B sequence requires a list
+	    node to account for because C and D are holes, so the list node
+	    cannot be combined with the next A-B sequence.</para>
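+
+	  <para>The effect on the free list can be sketched as follows.  This
+	    is a simplified model, not the code in
+	    <filename>kern/subr_rlist.c</filename>: the function name is
+	    hypothetical, and each &lsquo;stripe&rsquo; stands for one
+	    16-page interleave block.  The function coalesces adjacent free
+	    stripes into (start, length) runs, which is what a ranged list
+	    node would describe.</para>
+
```python
def free_extents(total_stripes, configured_areas, interleave=4):
    """Return (start, length) runs of free stripes in the linear swap space.

    Stripe i belongs to area i % interleave.  Areas numbered at or above
    configured_areas do not exist, so their stripes are holes that are
    permanently marked as allocated.
    """
    extents = []
    run_start = None
    for stripe in range(total_stripes):
        is_free = (stripe % interleave) < configured_areas
        if is_free and run_start is None:
            run_start = stripe                      # a new free run begins
        elif not is_free and run_start is not None:
            extents.append((run_start, stripe - run_start))
            run_start = None                        # a hole ends the run
    if run_start is not None:
        extents.append((run_start, total_stripes - run_start))
    return extents
```
+
+	  <para>For the 16 stripes shown above, two configured areas need four
+	    separate list nodes, while four configured areas collapse into a
+	    single node covering the whole space.</para>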
+
+	  <para>Why do we interleave our swap space instead of just tacking swap
+	    areas onto the end and doing something fancier?  Because it's a whole
+	    lot easier to allocate linear swaths of an address space and have
+	    the result automatically be interleaved across multiple disks than
+	    it is to try to put that sophistication elsewhere.</para>
+
+	  <para>The fragmentation causes other problems.  Being a linear list
+	    under 3.x, and having such a huge amount of inherent
+	    fragmentation, allocating and freeing swap winds up being an O(N)
+	    algorithm instead of an O(1) algorithm.  Combine that with other
+	    factors (heavy swapping) and you start getting into O(N^2) and
+	    O(N^3) levels of overhead, which is bad.  The 3.x system may also
+	    need to allocate KVM during a swap operation to create a new list
+	    node which can lead to a deadlock if the system is trying to
+	    pageout pages in a low-memory situation.</para>
+
+	  <para>Under 4.x we do not use a sequential list.  Instead we use a
+	    radix tree and bitmaps of swap blocks rather than ranged list
+	    nodes.  We take the hit of preallocating all the bitmaps required
+	    for the entire swap area up front but it winds up wasting less
+	    memory due to the use of a bitmap (one bit per block) instead of a
+	    linked list of nodes.  The use of a radix tree instead of a
+	    sequential list gives us nearly O(1) performance no matter how
+	    fragmented the tree becomes.</para>
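+
+	  <para>A minimal sketch of the idea, assuming a fixed fan-out and
+	    single-block allocations, is shown below.  It is not the 4.x
+	    implementation: the class name, the leaf size of 32 blocks and the
+	    radix of 16 are all assumptions chosen for illustration.  Leaves
+	    are bitmaps with one bit per block, and each interior level keeps a
+	    count of free blocks beneath it, so an allocation walks one path
+	    from the root whose length depends only on the depth of the
+	    tree.</para>
+
```python
class SwapRadixBitmap:
    LEAF_BITS = 32   # blocks per leaf bitmap (assumed value)
    RADIX = 16       # children per interior node (assumed value)

    def __init__(self, nblocks):
        nleaves = max(1, -(-nblocks // self.LEAF_BITS))
        # Everything is preallocated up front; a set bit means "free".
        self.leaves = [0] * nleaves
        # counts[0][i] is the number of free blocks in leaf i.  Each higher
        # level sums RADIX entries of the level below; the last level is
        # the single root count.
        self.counts = [[0] * nleaves]
        while len(self.counts[-1]) > 1:
            parents = -(-len(self.counts[-1]) // self.RADIX)
            self.counts.append([0] * parents)
        for block in range(nblocks):
            self.free(block)

    def _adjust(self, leaf, delta):
        """Propagate a change in free count from a leaf up to the root."""
        index = leaf
        for level in self.counts:
            level[index] += delta
            index //= self.RADIX

    def free(self, block):
        leaf, bit = divmod(block, self.LEAF_BITS)
        if not (self.leaves[leaf] >> bit) & 1:
            self.leaves[leaf] |= 1 << bit
            self._adjust(leaf, +1)

    def alloc(self):
        """Return the lowest free block number, or None when swap is full."""
        if self.counts[-1][0] == 0:
            return None
        index = 0
        # Descend from just below the root, always taking the first child
        # that still has free blocks underneath it.
        for level in reversed(self.counts[:-1]):
            first = index * self.RADIX
            index = next(i for i in range(first, first + self.RADIX)
                         if level[i] > 0)
        bits = self.leaves[index]
        bit = (bits & -bits).bit_length() - 1   # lowest set bit
        self.leaves[index] &= ~(1 << bit)
        self._adjust(index, -1)
        return index * self.LEAF_BITS + bit
```
+
+	  <para>Freeing a block in the middle of an otherwise full map costs
+	    the same handful of steps as freeing one at the end, because
+	    fragmentation changes only the bits and counts and never the shape
+	    of the tree.</para>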
+	</answer>
+      </qandaentry>
+
+      <qandaentry>
+	<question>
+	  <para>I don't get the following:</para>
+
+	  <blockquote>
+	    <para>It is important to note that the FreeBSD VM system attempts
+	      to separate clean and dirty pages for the express reason of
+	      avoiding unnecessary flushes of dirty pages (which eats I/O
+	      bandwidth), nor does it move pages between the various page
+	      queues gratuitously when the memory subsystem is not being
+	      stressed.  This is why you will see some systems with very low
+	      cache queue counts and high active queue counts when doing a
+	      <command>systat -vm</command> command.</para>
+	  </blockquote>
+	  
+	  <para>How is the separation of clean and dirty (inactive) pages
+	    related to the situation where you see low cache queue counts and
+	    high active queue counts in <command>systat -vm</command>?  Do the
+	    systat stats roll the active and dirty pages together for the
+	    active queue count?</para>
+	</question>
+
+	<answer>
+	  <para>Yes, that is confusing.  The relationship is
+	    &ldquo;goal&rdquo; versus &ldquo;reality&rdquo;.  Our goal is to
+	    separate the pages but the reality is that if we are not in a
+	    memory crunch, we don't really have to.</para>
+
+	  <para>What this means is that FreeBSD will not try very hard to
+	    separate out dirty pages (inactive queue) from clean pages (cache
+	    queue) when the system is not being stressed, nor will it try to
+	    deactivate pages (active queue -> inactive queue) when the system
+	    is not being stressed, even if they aren't being used.</para>
+	</answer>
+      </qandaentry>
+
+      <qandaentry>
+	<question>
+	  <para>In the &man.ls.1; / <command>vmstat 1</command> example,
+	    wouldn't some of the page faults be data page faults (COW from
+	    executable file to private page)?  I.e., I would expect the page
+	    faults to be some zero-fill and some program data.  Or are you
+	    implying that FreeBSD does do pre-COW for the program data?</para>
+	</question>
+
+	<answer>
+	  <para>A COW fault can be either zero-fill or program-data.  The
+	    mechanism is the same either way because the backing program-data
+	    is almost certainly already in the cache.  I am indeed lumping the
+	    two together.  FreeBSD does not pre-COW program data or zero-fill,
+	    but it <emphasis>does</emphasis> pre-map pages that exist in its
+	    cache.</para>
+	</answer>
+      </qandaentry>
+
+      <qandaentry>
+	<question>
+	  <para>In your section on page table optimizations, can you give a
+	    little more detail about <literal>pv_entry</literal> and
+	    <literal>vm_page</literal> (or should vm_page be
+	    <literal>vm_pmap</literal>&mdash;as in 4.4, cf. pp. 180-181 of
+	    McKusick, Bostic, Karels, Quarterman)?  Specifically, what kind of
+	    operation/reaction would require scanning the mappings?</para>
+
+	  <para>How does Linux do in the case where FreeBSD breaks down
+	    (sharing a large file mapping over many processes)?</para>
+	</question>
+
+	<answer>
+	  <para>A <literal>vm_page</literal> represents an (object,index#)
+	    tuple.  A <literal>pv_entry</literal> represents a hardware page
+	    table entry (pte).  If you have five processes sharing the same
+	    physical page, and three of those processes' page tables actually
+	    map the page, that page will be represented by a single
+	    <literal>vm_page</literal> structure and three
+	    <literal>pv_entry</literal> structures.</para>
+
+	  <para><literal>pv_entry</literal> structures only represent pages
+	    mapped by the MMU (one <literal>pv_entry</literal> represents one
+	    pte).  This means that when we need to remove all hardware
+	    references to a <literal>vm_page</literal> (in order to reuse the
+	    page for something else, page it out, clear it, dirty it, and so
+	    forth) we can simply scan the linked list of
+	    <literal>pv_entry</literal>'s associated with that
+	    <literal>vm_page</literal> to remove or modify the pte's from
+	    their page tables.</para>
+
+	  <para>Under Linux there is no such linked list.  In order to remove
+	    all the hardware page table mappings for a
+	    <literal>vm_page</literal> Linux must index into every VM object
+	    that <emphasis>might</emphasis> have mapped the page.  For
+	    example, if you have 50 processes all mapping the same shared
+	    library and want to get rid of page X in that library, you need to
+	    index into the page table for each of those 50 processes even if
+	    only 10 of them have actually mapped the page.  So Linux is
+	    trading off the simplicity of its design against performance.
+	    Many VM algorithms which are O(1) or (small N) under FreeBSD wind
+	    up being O(N), O(N^2), or worse under Linux.  Since the pte's
+	    representing a particular page in an object tend to be at the same
+	    offset in all the page tables they are mapped in, reducing the
+	    number of accesses into the page tables at the same pte offset
+	    will often avoid blowing away the L1 cache line for that offset,
+	    which can lead to better performance.</para>
+
+	  <para>FreeBSD has added complexity (the <literal>pv_entry</literal>
+	    scheme) in order to increase performance (to limit page table
+	    accesses to <emphasis>only</emphasis> those pte's that need to be
+	    modified).</para>
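+
+	  <para>The difference in the number of page table accesses can be
+	    sketched with a toy model.  Nothing here is kernel code: page
+	    tables are plain dictionaries, and the class and function names
+	    are hypothetical.  One function walks the per-page list of
+	    mappings, the other probes every page table that might hold a
+	    mapping.</para>
+
```python
class VmPage:
    """A physical page plus the list of hardware mappings that reference it."""

    def __init__(self):
        self.pv_list = []  # one (page_table, virtual_address) entry per pte


def map_page(page, page_table, va):
    """Enter a mapping and record it on the page's pv list."""
    page_table[va] = page
    page.pv_list.append((page_table, va))


def remove_all_with_pv(page):
    """Tear down every mapping by walking the page's own pv list."""
    touched = 0
    for page_table, va in page.pv_list:
        del page_table[va]
        touched += 1
    page.pv_list.clear()
    return touched      # number of page tables accessed


def remove_all_by_scanning(page, candidates):
    """Without a pv list: probe every page table that might map the page."""
    touched = 0
    for page_table, va in candidates:
        touched += 1
        if page_table.get(va) is page:
            del page_table[va]
    return touched      # number of page tables accessed
```
+
+	  <para>With 50 page tables of which 10 map the page, the pv list walk
+	    touches 10 page tables and the scan touches all 50.</para>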
+
+	  <para>But FreeBSD has a scaling problem that Linux does not in that
+	    there are a limited number of <literal>pv_entry</literal>
+	    structures and this causes problems when you have massive sharing
+	    of data.  In this case you may run out of
+	    <literal>pv_entry</literal> structures even though there is plenty
+	    of free memory available.  This can be fixed easily enough by
+	    bumping up the number of <literal>pv_entry</literal> structures in
+	    the kernel config, but we really need to find a better way to do
+	    it.</para>
+
+	  <para>In regards to the memory overhead of a page table versus the
+	    <literal>pv_entry</literal> scheme: Linux uses
+	    &lsquo;permanent&rsquo; page tables that are not thrown away, but
+	    does not need a <literal>pv_entry</literal> for each potentially
+	    mapped pte.  FreeBSD uses &lsquo;throw away&rsquo; page tables but
+	    adds in a <literal>pv_entry</literal> structure for each
+	    actually-mapped pte.  I think memory utilization winds up being
+	    about the same, giving FreeBSD an algorithmic advantage with its
+	    ability to throw away page tables at will with very low
+	    overhead.</para>
+	</answer>
+      </qandaentry>
+
+      <qandaentry>
+	<question>
+	  <para>Finally, in the page coloring section, it might help to have a
+	    little more description of what you mean here.  I didn't quite
+	    follow it.</para>
+	</question>
+
+	<answer>
+	  <para>Do you know how an L1 hardware memory cache works?  I'll
+	    explain: Consider a machine with 16MB of main memory but only 128K
+	    of L1 cache.  Generally the way this cache works is that each 128K
+	    block of main memory uses the <emphasis>same</emphasis> 128K of
+	    cache.  If you access offset 0 in main memory and then
+	    offset 128K in main memory you can wind up throwing away the
+	    cached data you read from offset 0!</para>
+
+	  <para>Now, I am simplifying things greatly.  What I just described
+	    is what is called a &lsquo;direct mapped&rsquo; hardware memory
+	    cache.  Most modern caches are what are called
+	    2-way-set-associative or 4-way-set-associative caches.  The
+	    set-associativity allows you to access up to N different memory
+	    regions that overlap the same cache memory without destroying the
+	    previously cached data.  But only N.</para>
+
+	  <para>So if I have a 4-way set associative cache I can access offset
+	    0, offset 128K, 256K and offset 384K and still be able to access
+	    offset 0 again and have it come from the L1 cache.  If I then
+	    access offset 512K, however, one of the four previously cached
+	    data objects will be thrown away by the cache.</para>
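+
+	  <para>The behaviour described above can be reproduced with a small
+	    simulation.  It follows the simplified model used in this answer,
+	    in which addresses 128K apart compete for the same cache memory.
+	    The class name, the 32-byte line size and the
+	    least-recently-used replacement policy are assumptions; real
+	    hardware may differ on all three.</para>
+
```python
from collections import OrderedDict


class SetAssociativeCache:
    def __init__(self, stride=128 * 1024, ways=1, line_size=32):
        self.stride = stride        # addresses this far apart collide
        self.ways = ways            # 1 models a direct-mapped cache
        self.line_size = line_size  # assumed line size in bytes
        self.sets = {}              # set index -> tags, oldest first

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        index = (addr % self.stride) // self.line_size
        tag = addr // self.stride
        lines = self.sets.setdefault(index, OrderedDict())
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return False
```
+
+	  <para>With <literal>ways=1</literal>, accessing offset 0, then 128K,
+	    then 0 again misses all three times.  With
+	    <literal>ways=4</literal>, offsets 0, 128K, 256K and 384K can all
+	    stay resident, and it takes a fifth colliding access at 512K to
+	    push one of them out.</para>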
+
+	  <para>It is extremely important&hellip;
+	    <emphasis>extremely</emphasis> important for most of a processor's
+	    memory accesses to be able to come from the L1 cache, because the
+	    L1 cache operates at the processor frequency.  The moment you have
+	    an L1 cache miss and have to go to the L2 cache or to main memory,
+	    the processor will stall and potentially sit twiddling its fingers
+	    for <emphasis>hundreds</emphasis> of instructions worth of time
+	    waiting for a read from main memory to complete.  Main memory (the
+	    dynamic ram you stuff into a computer) is
+	    <emphasis>slow</emphasis>, when compared to the speed of a modern
+	    processor core.</para>
+
+	  <para>Ok, so now onto page coloring: All modern memory caches are
+	    what are known as <emphasis>physical</emphasis> caches.  They
+	    cache physical memory addresses, not virtual memory addresses.
+	    This allows the cache to be left alone across a process context
+	    switch, which is very important.</para>
+
+	  <para>But in the UNIX world you are dealing with virtual address
+	    spaces, not physical address spaces.  Any program you write will
+	    see the virtual address space given to it.  The actual
+	    <emphasis>physical</emphasis> pages underlying that virtual
+	    address space are not necessarily physically contiguous! In fact,
+	    you might have two pages that are side by side in a process's
+	    address space which wind up being at offset 0 and offset 128K in
+	    <emphasis>physical</emphasis> memory.</para>
+
+	  <para>A program normally assumes that two side-by-side pages will be
+	    optimally cached.  That is, that you can access data objects in
+	    both pages without having them blow away each other's cache entry.
+	    But this is only true if the physical pages underlying the virtual
+	    address space are contiguous (insofar as the cache is
+	    concerned).</para>
+
+	  <para>This is what page coloring does.  Instead of assigning
+	    <emphasis>random</emphasis> physical pages to virtual addresses,
+	    which may result in non-optimal cache performance, page coloring
+	    assigns <emphasis>reasonably-contiguous</emphasis> physical pages
+	    to virtual addresses.  Thus programs can be written under the
+	    assumption that the characteristics of the underlying hardware
+	    cache are the same for their virtual address space as they would
+	    be if the program had been run directly in a physical address
+	    space.</para>
+
+	  <para>Note that I say &lsquo;reasonably&rsquo; contiguous rather
+	    than simply &lsquo;contiguous&rsquo;.  From the point of view of a
+	    128K direct mapped cache, the physical address 0 is the same as
+	    the physical address 128K.  So two side-by-side pages in your
+	    virtual address space may wind up being offset 128K and offset
+	    132K in physical memory, but could also easily be offset 128K and
+	    offset 4K in physical memory and still retain the same cache
+	    performance characteristics.  So page coloring does
+	    <emphasis>not</emphasis> have to assign truly contiguous pages of
+	    physical memory to contiguous pages of virtual memory; it just
+	    needs to make sure it assigns contiguous pages from the point of
+	    view of cache performance and operation.</para>
+	</answer>
+      </qandaentry>
+    </qandaset>
+  </sect1>
+</article>
--- a/en_US.ISO_8859-1/articles/vm-design/fig1.eps
+++ b/en_US.ISO_8859-1/articles/vm-design/fig1.eps
@ -0,0 +1,104 @@
+%!PS-Adobe-2.0 EPSF-2.0
+%%Title: fig1.eps
+%%Creator: fig2dev Version 3.2.3 Patchlevel 
+%%CreationDate: Sun Oct  8 19:54:25 2000
+%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
+%%BoundingBox: 0 0 119 65
+%%Magnification: 1.0000
+%%EndComments
+/$F2psDict 200 dict def
+$F2psDict begin
+$F2psDict /mtrx matrix put
+/col-1 {0 setgray} bind def
+/col0 {0.000 0.000 0.000 srgb} bind def
+/col1 {0.000 0.000 1.000 srgb} bind def
+/col2 {0.000 1.000 0.000 srgb} bind def
+/col3 {0.000 1.000 1.000 srgb} bind def
+/col4 {1.000 0.000 0.000 srgb} bind def
+/col5 {1.000 0.000 1.000 srgb} bind def
+/col6 {1.000 1.000 0.000 srgb} bind def
+/col7 {1.000 1.000 1.000 srgb} bind def
+/col8 {0.000 0.000 0.560 srgb} bind def
+/col9 {0.000 0.000 0.690 srgb} bind def
+/col10 {0.000 0.000 0.820 srgb} bind def
+/col11 {0.530 0.810 1.000 srgb} bind def
+/col12 {0.000 0.560 0.000 srgb} bind def
+/col13 {0.000 0.690 0.000 srgb} bind def
+/col14 {0.000 0.820 0.000 srgb} bind def
+/col15 {0.000 0.560 0.560 srgb} bind def
+/col16 {0.000 0.690 0.690 srgb} bind def
+/col17 {0.000 0.820 0.820 srgb} bind def
+/col18 {0.560 0.000 0.000 srgb} bind def
+/col19 {0.690 0.000 0.000 srgb} bind def
+/col20 {0.820 0.000 0.000 srgb} bind def
+/col21 {0.560 0.000 0.560 srgb} bind def
+/col22 {0.690 0.000 0.690 srgb} bind def
+/col23 {0.820 0.000 0.820 srgb} bind def
+/col24 {0.500 0.190 0.000 srgb} bind def
+/col25 {0.630 0.250 0.000 srgb} bind def
+/col26 {0.750 0.380 0.000 srgb} bind def
+/col27 {1.000 0.500 0.500 srgb} bind def
+/col28 {1.000 0.630 0.630 srgb} bind def
+/col29 {1.000 0.750 0.750 srgb} bind def
+/col30 {1.000 0.880 0.880 srgb} bind def
+/col31 {1.000 0.840 0.000 srgb} bind def
+
+end
+save
+newpath 0 65 moveto 0 0 lineto 119 0 lineto 119 65 lineto closepath clip newpath
+-143.0 298.0 translate
+1 -1 scale
+
+/cp {closepath} bind def
+/ef {eofill} bind def
+/gr {grestore} bind def
+/gs {gsave} bind def
+/sa {save} bind def
+/rs {restore} bind def
+/l {lineto} bind def
+/m {moveto} bind def
+/rm {rmoveto} bind def
+/n {newpath} bind def
+/s {stroke} bind def
+/sh {show} bind def
+/slc {setlinecap} bind def
+/slj {setlinejoin} bind def
+/slw {setlinewidth} bind def
+/srgb {setrgbcolor} bind def
+/rot {rotate} bind def
+/sc {scale} bind def
+/sd {setdash} bind def
+/ff {findfont} bind def
+/sf {setfont} bind def
+/scf {scalefont} bind def
+/sw {stringwidth} bind def
+/tr {translate} bind def
+/tnt {dup dup currentrgbcolor
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
+  bind def
+/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
+  4 -2 roll mul srgb} bind def
+/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
+/$F2psEnd {$F2psEnteredState restore end} def
+
+$F2psBegin
+%%Page: 1 1
+10 setmiterlimit
+ 0.06000 0.06000 sc
+% Polyline
+7.500 slw
+n 2400 4200 m 4050 4200 l 4050 4950 l 2400 4950 l
+ cp gs col0 s gr 
+% Polyline
+n 4050 4200 m
+ 4350 3900 l gs col0 s gr 
+% Polyline
+n 2400 4200 m 2700 3900 l 4350 3900 l 4350 4650 l
+ 4050 4950 l gs col0 s gr 
+/Helvetica-Bold ff 180.00 scf sf
+3225 4650 m
+gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm  col0 sh gr
+$F2psEnd
+rs
--- a/en_US.ISO_8859-1/articles/vm-design/fig2.eps
+++ b/en_US.ISO_8859-1/articles/vm-design/fig2.eps
@ -0,0 +1,115 @@
+%!PS-Adobe-2.0 EPSF-2.0
+%%Title: fig2.eps
+%%Creator: fig2dev Version 3.2.3 Patchlevel 
+%%CreationDate: Sun Oct  8 19:55:31 2000
+%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
+%%BoundingBox: 0 0 120 110
+%%Magnification: 1.0000
+%%EndComments
+/$F2psDict 200 dict def
+$F2psDict begin
+$F2psDict /mtrx matrix put
+/col-1 {0 setgray} bind def
+/col0 {0.000 0.000 0.000 srgb} bind def
+/col1 {0.000 0.000 1.000 srgb} bind def
+/col2 {0.000 1.000 0.000 srgb} bind def
+/col3 {0.000 1.000 1.000 srgb} bind def
+/col4 {1.000 0.000 0.000 srgb} bind def
+/col5 {1.000 0.000 1.000 srgb} bind def
+/col6 {1.000 1.000 0.000 srgb} bind def
+/col7 {1.000 1.000 1.000 srgb} bind def
+/col8 {0.000 0.000 0.560 srgb} bind def
+/col9 {0.000 0.000 0.690 srgb} bind def
+/col10 {0.000 0.000 0.820 srgb} bind def
+/col11 {0.530 0.810 1.000 srgb} bind def
+/col12 {0.000 0.560 0.000 srgb} bind def
+/col13 {0.000 0.690 0.000 srgb} bind def
+/col14 {0.000 0.820 0.000 srgb} bind def
+/col15 {0.000 0.560 0.560 srgb} bind def
+/col16 {0.000 0.690 0.690 srgb} bind def
+/col17 {0.000 0.820 0.820 srgb} bind def
+/col18 {0.560 0.000 0.000 srgb} bind def
+/col19 {0.690 0.000 0.000 srgb} bind def
+/col20 {0.820 0.000 0.000 srgb} bind def
+/col21 {0.560 0.000 0.560 srgb} bind def
+/col22 {0.690 0.000 0.690 srgb} bind def
+/col23 {0.820 0.000 0.820 srgb} bind def
+/col24 {0.500 0.190 0.000 srgb} bind def
+/col25 {0.630 0.250 0.000 srgb} bind def
+/col26 {0.750 0.380 0.000 srgb} bind def
+/col27 {1.000 0.500 0.500 srgb} bind def
+/col28 {1.000 0.630 0.630 srgb} bind def
+/col29 {1.000 0.750 0.750 srgb} bind def
+/col30 {1.000 0.880 0.880 srgb} bind def
+/col31 {1.000 0.840 0.000 srgb} bind def
+
+end
+save
+newpath 0 110 moveto 0 0 lineto 120 0 lineto 120 110 lineto closepath clip newpath
+-174.0 370.0 translate
+1 -1 scale
+
+/cp {closepath} bind def
+/ef {eofill} bind def
+/gr {grestore} bind def
+/gs {gsave} bind def
+/sa {save} bind def
+/rs {restore} bind def
+/l {lineto} bind def
+/m {moveto} bind def
+/rm {rmoveto} bind def
+/n {newpath} bind def
+/s {stroke} bind def
+/sh {show} bind def
+/slc {setlinecap} bind def
+/slj {setlinejoin} bind def
+/slw {setlinewidth} bind def
+/srgb {setrgbcolor} bind def
+/rot {rotate} bind def
+/sc {scale} bind def
+/sd {setdash} bind def
+/ff {findfont} bind def
+/sf {setfont} bind def
+/scf {scalefont} bind def
+/sw {stringwidth} bind def
+/tr {translate} bind def
+/tnt {dup dup currentrgbcolor
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
+  bind def
+/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
+  4 -2 roll mul srgb} bind def
+/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
+/$F2psEnd {$F2psEnteredState restore end} def
+
+$F2psBegin
+%%Page: 1 1
+10 setmiterlimit
+ 0.06000 0.06000 sc
+/Helvetica-Bold ff 180.00 scf sf
+3750 5100 m
+gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+7.500 slw
+n 4871 5100 m 4879 5100 l gs col0 s gr
+% Polyline
+n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
+ cp gs col0 s gr 
+% Polyline
+n 4575 4650 m
+ 4875 4350 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
+ cp gs col0 s gr 
+% Polyline
+n 2925 4650 m 3225 4350 l 4875 4350 l 4875 5100 l
+ 4575 5400 l gs col0 s gr 
+/Helvetica-Bold ff 180.00 scf sf
+3750 5850 m
+gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+n 4875 5100 m 4875 5850 l
+ 4575 6150 l gs col0 s gr 
+$F2psEnd
+rs
--- a/en_US.ISO_8859-1/articles/vm-design/fig3.eps
+++ b/en_US.ISO_8859-1/articles/vm-design/fig3.eps
@ -0,0 +1,133 @@
+%!PS-Adobe-2.0 EPSF-2.0
+%%Title: fig3.eps
+%%Creator: fig2dev Version 3.2.3 Patchlevel 
+%%CreationDate: Sun Oct  8 19:53:51 2000
+%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
+%%BoundingBox: 0 0 120 155
+%%Magnification: 1.0000
+%%EndComments
+/$F2psDict 200 dict def
+$F2psDict begin
+$F2psDict /mtrx matrix put
+/col-1 {0 setgray} bind def
+/col0 {0.000 0.000 0.000 srgb} bind def
+/col1 {0.000 0.000 1.000 srgb} bind def
+/col2 {0.000 1.000 0.000 srgb} bind def
+/col3 {0.000 1.000 1.000 srgb} bind def
+/col4 {1.000 0.000 0.000 srgb} bind def
+/col5 {1.000 0.000 1.000 srgb} bind def
+/col6 {1.000 1.000 0.000 srgb} bind def
+/col7 {1.000 1.000 1.000 srgb} bind def
+/col8 {0.000 0.000 0.560 srgb} bind def
+/col9 {0.000 0.000 0.690 srgb} bind def
+/col10 {0.000 0.000 0.820 srgb} bind def
+/col11 {0.530 0.810 1.000 srgb} bind def
+/col12 {0.000 0.560 0.000 srgb} bind def
+/col13 {0.000 0.690 0.000 srgb} bind def
+/col14 {0.000 0.820 0.000 srgb} bind def
+/col15 {0.000 0.560 0.560 srgb} bind def
+/col16 {0.000 0.690 0.690 srgb} bind def
+/col17 {0.000 0.820 0.820 srgb} bind def
+/col18 {0.560 0.000 0.000 srgb} bind def
+/col19 {0.690 0.000 0.000 srgb} bind def
+/col20 {0.820 0.000 0.000 srgb} bind def
+/col21 {0.560 0.000 0.560 srgb} bind def
+/col22 {0.690 0.000 0.690 srgb} bind def
+/col23 {0.820 0.000 0.820 srgb} bind def
+/col24 {0.500 0.190 0.000 srgb} bind def
+/col25 {0.630 0.250 0.000 srgb} bind def
+/col26 {0.750 0.380 0.000 srgb} bind def
+/col27 {1.000 0.500 0.500 srgb} bind def
+/col28 {1.000 0.630 0.630 srgb} bind def
+/col29 {1.000 0.750 0.750 srgb} bind def
+/col30 {1.000 0.880 0.880 srgb} bind def
+/col31 {1.000 0.840 0.000 srgb} bind def
+
+end
+save
+newpath 0 155 moveto 0 0 lineto 120 0 lineto 120 155 lineto closepath clip newpath
+-174.0 370.0 translate
+1 -1 scale
+
+/cp {closepath} bind def
+/ef {eofill} bind def
+/gr {grestore} bind def
+/gs {gsave} bind def
+/sa {save} bind def
+/rs {restore} bind def
+/l {lineto} bind def
+/m {moveto} bind def
+/rm {rmoveto} bind def
+/n {newpath} bind def
+/s {stroke} bind def
+/sh {show} bind def
+/slc {setlinecap} bind def
+/slj {setlinejoin} bind def
+/slw {setlinewidth} bind def
+/srgb {setrgbcolor} bind def
+/rot {rotate} bind def
+/sc {scale} bind def
+/sd {setdash} bind def
+/ff {findfont} bind def
+/sf {setfont} bind def
+/scf {scalefont} bind def
+/sw {stringwidth} bind def
+/tr {translate} bind def
+/tnt {dup dup currentrgbcolor
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
+  bind def
+/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
+  4 -2 roll mul srgb} bind def
+/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
+/$F2psEnd {$F2psEnteredState restore end} def
+
+$F2psBegin
+%%Page: 1 1
+10 setmiterlimit
+ 0.06000 0.06000 sc
+/Helvetica-Bold ff 180.00 scf sf
+4125 4350 m
+gs 1 -1 sc (C2) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+7.500 slw
+n 4871 5100 m 4879 5100 l gs col0 s gr
+% Polyline
+n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
+ cp gs col0 s gr 
+% Polyline
+n 4575 4650 m
+ 4875 4350 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
+ cp gs col0 s gr 
+% Polyline
+n 4875 3600 m 4875 5100 l
+ 4575 5400 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 2925 3900 l 3225 3600 l
+ 4875 3600 l gs col0 s gr 
+% Polyline
+n 2925 3900 m 4425 3900 l 4575 3900 l
+ 4875 3600 l gs col0 s gr 
+% Polyline
+n 4575 4650 m
+ 4575 3900 l gs col0 s gr 
+% Polyline
+n 3750 4650 m 3750 3900 l
+ 4050 3600 l gs col0 s gr 
+/Helvetica-Bold ff 180.00 scf sf
+3750 5850 m
+gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm  col0 sh gr
+/Helvetica-Bold ff 180.00 scf sf
+3750 5100 m
+gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm  col0 sh gr
+/Helvetica-Bold ff 180.00 scf sf
+3375 4350 m
+gs 1 -1 sc (C1) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+n 4875 5100 m 4875 5850 l
+ 4575 6150 l gs col0 s gr 
+$F2psEnd
+rs
--- a/en_US.ISO_8859-1/articles/vm-design/fig4.eps
+++ b/en_US.ISO_8859-1/articles/vm-design/fig4.eps
@ -0,0 +1,133 @@
+%!PS-Adobe-2.0 EPSF-2.0
+%%Title: fig4.eps
+%%Creator: fig2dev Version 3.2.3 Patchlevel 
+%%CreationDate: Sun Oct  8 19:55:53 2000
+%%For: nik@canyon.nothing-going-on.org (Nik Clayton)
+%%BoundingBox: 0 0 120 155
+%%Magnification: 1.0000
+%%EndComments
+/$F2psDict 200 dict def
+$F2psDict begin
+$F2psDict /mtrx matrix put
+/col-1 {0 setgray} bind def
+/col0 {0.000 0.000 0.000 srgb} bind def
+/col1 {0.000 0.000 1.000 srgb} bind def
+/col2 {0.000 1.000 0.000 srgb} bind def
+/col3 {0.000 1.000 1.000 srgb} bind def
+/col4 {1.000 0.000 0.000 srgb} bind def
+/col5 {1.000 0.000 1.000 srgb} bind def
+/col6 {1.000 1.000 0.000 srgb} bind def
+/col7 {1.000 1.000 1.000 srgb} bind def
+/col8 {0.000 0.000 0.560 srgb} bind def
+/col9 {0.000 0.000 0.690 srgb} bind def
+/col10 {0.000 0.000 0.820 srgb} bind def
+/col11 {0.530 0.810 1.000 srgb} bind def
+/col12 {0.000 0.560 0.000 srgb} bind def
+/col13 {0.000 0.690 0.000 srgb} bind def
+/col14 {0.000 0.820 0.000 srgb} bind def
+/col15 {0.000 0.560 0.560 srgb} bind def
+/col16 {0.000 0.690 0.690 srgb} bind def
+/col17 {0.000 0.820 0.820 srgb} bind def
+/col18 {0.560 0.000 0.000 srgb} bind def
+/col19 {0.690 0.000 0.000 srgb} bind def
+/col20 {0.820 0.000 0.000 srgb} bind def
+/col21 {0.560 0.000 0.560 srgb} bind def
+/col22 {0.690 0.000 0.690 srgb} bind def
+/col23 {0.820 0.000 0.820 srgb} bind def
+/col24 {0.500 0.190 0.000 srgb} bind def
+/col25 {0.630 0.250 0.000 srgb} bind def
+/col26 {0.750 0.380 0.000 srgb} bind def
+/col27 {1.000 0.500 0.500 srgb} bind def
+/col28 {1.000 0.630 0.630 srgb} bind def
+/col29 {1.000 0.750 0.750 srgb} bind def
+/col30 {1.000 0.880 0.880 srgb} bind def
+/col31 {1.000 0.840 0.000 srgb} bind def
+
+end
+save
+newpath 0 155 moveto 0 0 lineto 120 0 lineto 120 155 lineto closepath clip newpath
+-174.0 370.0 translate
+1 -1 scale
+
+/cp {closepath} bind def
+/ef {eofill} bind def
+/gr {grestore} bind def
+/gs {gsave} bind def
+/sa {save} bind def
+/rs {restore} bind def
+/l {lineto} bind def
+/m {moveto} bind def
+/rm {rmoveto} bind def
+/n {newpath} bind def
+/s {stroke} bind def
+/sh {show} bind def
+/slc {setlinecap} bind def
+/slj {setlinejoin} bind def
+/slw {setlinewidth} bind def
+/srgb {setrgbcolor} bind def
+/rot {rotate} bind def
+/sc {scale} bind def
+/sd {setdash} bind def
+/ff {findfont} bind def
+/sf {setfont} bind def
+/scf {scalefont} bind def
+/sw {stringwidth} bind def
+/tr {translate} bind def
+/tnt {dup dup currentrgbcolor
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add
+  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
+  bind def
+/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
+  4 -2 roll mul srgb} bind def
+/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
+/$F2psEnd {$F2psEnteredState restore end} def
+
+$F2psBegin
+%%Page: 1 1
+10 setmiterlimit
+ 0.06000 0.06000 sc
+/Helvetica-Bold ff 180.00 scf sf
+3375 4350 m
+gs 1 -1 sc (C1) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+7.500 slw
+n 4871 5100 m 4879 5100 l gs col0 s gr
+% Polyline
+n 2925 5400 m 4575 5400 l 4575 6150 l 2925 6150 l
+ cp gs col0 s gr 
+% Polyline
+n 4575 4650 m
+ 4875 4350 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 4575 4650 l 4575 5400 l 2925 5400 l
+ cp gs col0 s gr 
+% Polyline
+n 4875 4350 m 4875 5100 l
+ 4575 5400 l gs col0 s gr 
+% Polyline
+n 2925 4650 m 2925 3900 l 3225 3600 l
+ 4050 3600 l gs col0 s gr 
+% Polyline
+n 3750 4650 m 3750 3900 l
+ 4050 3600 l gs col0 s gr 
+% Polyline
+n 2925 3900 m
+ 3750 3900 l gs col0 s gr 
+% Polyline
+n 3750 4650 m 4050 4350 l
+ 4875 4350 l gs col0 s gr 
+% Polyline
+n 4050 4350 m
+ 4050 3600 l gs col0 s gr 
+/Helvetica-Bold ff 180.00 scf sf
+3750 5850 m
+gs 1 -1 sc (A) dup sw pop 2 div neg 0 rm  col0 sh gr
+/Helvetica-Bold ff 180.00 scf sf
+3750 5100 m
+gs 1 -1 sc (B) dup sw pop 2 div neg 0 rm  col0 sh gr
+% Polyline
+n 4875 5100 m 4875 5850 l
+ 4575 6150 l gs col0 s gr 
+$F2psEnd
+rs