From 77f5bcecffecac1cf84172183302dd1951e47cc0 Mon Sep 17 00:00:00 2001
From: Johann Kois
Date: Thu, 17 Jun 2010 09:19:58 +0000
Subject: [PATCH] s/FreeBSD/&os;/

Suggested by: Benjamin Lukas (qavvap att googlemail dott com)
---
 .../articles/vm-design/article.sgml | 92 +++++++++----------
 1 file changed, 46 insertions(+), 46 deletions(-)

diff --git a/en_US.ISO8859-1/articles/vm-design/article.sgml b/en_US.ISO8859-1/articles/vm-design/article.sgml
index 01fa36b51a..5fdf10537a 100644
--- a/en_US.ISO8859-1/articles/vm-design/article.sgml
+++ b/en_US.ISO8859-1/articles/vm-design/article.sgml
@@ -8,7 +8,7 @@
- Design elements of the FreeBSD VM system + Design elements of the &os; VM system @@ -36,7 +36,7 @@ The title is really just a fancy way of saying that I am going to attempt to describe the whole VM enchilada, hopefully in a way that everyone can follow. For the last year I have concentrated on a number - of major kernel subsystems within FreeBSD, with the VM and Swap + of major kernel subsystems within &os;, with the VM and Swap subsystems being the most interesting and NFS being a necessary chore. I rewrote only small portions of the code. In the VM arena the only major rewrite I have done is to the swap subsystem. @@ -53,7 +53,7 @@ This article was originally published in the January 2000 issue of DaemonNews. This version of the article may include updates from Matt and other authors - to reflect changes in FreeBSD's VM implementation. + to reflect changes in &os;'s VM implementation. @@ -71,7 +71,7 @@ operating system by some people, those of us who work on it tend to view it more as a mature codebase which has various components modified, extended, or replaced with modern code. It has evolved, and - FreeBSD is at the bleeding edge no matter how old some of the code might + &os; is at the bleeding edge no matter how old some of the code might be. This is an important distinction to make and one that is unfortunately lost to many people. The biggest error a programmer can make is to not learn from history, and this is precisely the error that @@ -89,13 +89,13 @@ right because our marketing department says so. I have little tolerance for anyone who cannot learn from history. - Much of the apparent complexity of the FreeBSD design, especially in + Much of the apparent complexity of the &os; design, especially in the VM/Swap subsystem, is a direct result of having to solve serious performance issues that occur under various conditions. These issues are not due to bad algorithmic design but instead rise from environmental factors. 
In any direct comparison between platforms, these issues become most apparent when system resources begin to get - stressed. As I describe FreeBSD's VM/Swap subsystem the reader should + stressed. As I describe &os;'s VM/Swap subsystem the reader should always keep two points in mind. First, the most important aspect of performance design is what is known as Optimizing the Critical Path. It is often the case that performance optimizations add a @@ -117,7 +117,7 @@ VM Objects - The best way to begin describing the FreeBSD VM system is to look at + The best way to begin describing the &os; VM system is to look at it from the perspective of a user-level process. Each user process sees a single, private, contiguous VM address space containing several types of memory objects. These objects have various characteristics. Program @@ -157,7 +157,7 @@ (parent and child) expects their own personal post-fork modifications to remain private to themselves and not effect the other. - FreeBSD manages all of this with a layered VM Object model. The + &os; manages all of this with a layered VM Object model. The original binary program file winds up being the lowest VM Object layer. A copy-on-write layer is pushed on top of that to hold those pages which had to be copied from the original file. If the program modifies a data @@ -235,7 +235,7 @@ The original page in B is now completely hidden since both C1 and C2 have a copy and B could theoretically be destroyed if it does not represent a real file; however, this sort of optimization is not - trivial to make because it is so fine-grained. FreeBSD does not make + trivial to make because it is so fine-grained. &os; does not make this optimization. Now, suppose (as is often the case) that the child process does an exec(). Its current address space is usually replaced by a new address space representing a new file. 
In @@ -274,7 +274,7 @@ get their own private copies of the page and the original page in B is no longer accessible by anyone. That page in B can be freed. - FreeBSD solves the deep layering problem with a special optimization + &os; solves the deep layering problem with a special optimization called the All Shadowed Case. This case occurs if either C1 or C2 take sufficient COW faults to completely shadow all pages in B. Lets say that C1 achieves this. C1 can now bypass B entirely, so rather @@ -303,7 +303,7 @@ copying need take place. The disadvantage is that you can build a relatively complex VM Object layering that slows page fault handling down a little, and you spend memory managing the VM Object structures. - The optimizations FreeBSD makes proves to reduce the problems enough + The optimizations &os; makes proves to reduce the problems enough that they can be ignored, leaving no real disadvantage. @@ -315,12 +315,12 @@ backing object (usually a file) can no longer be used to save a copy of the page when the VM system needs to reuse it for other purposes. This is where SWAP comes in. SWAP is allocated to create backing store for - memory that does not otherwise have it. FreeBSD allocates the swap + memory that does not otherwise have it. &os; allocates the swap management structure for a VM Object only when it is actually needed. However, the swap management structure has had problems historically. - Under FreeBSD 3.X the swap management structure preallocates an + Under &os; 3.X the swap management structure preallocates an array that encompasses the entire object requiring swap backing store—even if only a few pages of that object are swap-backed. This creates a kernel memory fragmentation problem when large objects @@ -337,7 +337,7 @@ fly for additional swap management structures when a swapout occurs. It is evident that there was plenty of room for improvement. - For FreeBSD 4.X, I completely rewrote the swap subsystem. 
With this + For &os; 4.X, I completely rewrote the swap subsystem. With this rewrite, swap management structures are allocated through a hash table rather than a linear array giving them a fixed allocation size and much finer granularity. Rather then using a linearly linked list to keep @@ -373,7 +373,7 @@ hundreds of thousands of CPU cycles and a noticeable stall of the affected processes, so we are willing to endure a significant amount of overhead in order to be sure that the right page is chosen. This is why - FreeBSD tends to outperform other systems when memory resources become + &os; tends to outperform other systems when memory resources become stressed. The free page determination algorithm is built upon a history of the @@ -403,10 +403,10 @@ then have to go to disk. - FreeBSD makes use of several page queues to further refine the + &os; makes use of several page queues to further refine the selection of pages to reuse as well as to determine when dirty pages must be flushed to their backing store. Since page tables are dynamic - entities under FreeBSD, it costs virtually nothing to unmap a page from + entities under &os;, it costs virtually nothing to unmap a page from the address space of any processes using it. When a page candidate has been chosen based on the page-use counter, this is precisely what is done. The system must make a distinction between clean pages which can @@ -423,7 +423,7 @@ in an LRU (least-recently used) fashion when the system needs to allocate new memory. 
- It is important to note that the FreeBSD VM system attempts to + It is important to note that the &os; VM system attempts to separate clean and dirty pages for the express reason of avoiding unnecessary flushes of dirty pages (which eats I/O bandwidth), nor does it move pages between the various page queues gratuitously when the @@ -433,8 +433,8 @@ becomes more stressed, it makes a greater effort to maintain the various page queues at the levels determined to be the most effective. An urban myth has circulated for years that Linux did a better job avoiding - swapouts than FreeBSD, but this in fact is not true. What was actually - occurring was that FreeBSD was proactively paging out unused pages in + swapouts than &os;, but this in fact is not true. What was actually + occurring was that &os; was proactively paging out unused pages in order to make room for more disk cache while Linux was keeping unused pages in core and leaving less memory available for cache and process pages. I do not know whether this is still true today. @@ -451,9 +451,9 @@ not mapped into the page table, then all the pages that will be accessed by the program will have to be faulted in every time the program is run. This is unnecessary when the pages in question are already in the VM - Cache, so FreeBSD will attempt to pre-populate a process's page tables + Cache, so &os; will attempt to pre-populate a process's page tables with those pages that are already in the VM Cache. One thing that - FreeBSD does not yet do is pre-copy-on-write certain pages on exec. For + &os; does not yet do is pre-copy-on-write certain pages on exec. For example, if you run the &man.ls.1; program while running vmstat 1 you will notice that it always takes a certain number of page faults, even when you run it over and over again. 
These are @@ -480,7 +480,7 @@ Page Table Optimizations The page table optimizations make up the most contentious part of - the FreeBSD VM design and they have shown some strain with the advent of + the &os; VM design and they have shown some strain with the advent of serious use of mmap(). I think this is actually a feature of most BSDs though I am not sure when it was first introduced. There are two major optimizations. The first is that hardware page @@ -488,23 +488,23 @@ any time with only a minor amount of management overhead. The second is that every active page table entry in the system has a governing pv_entry structure which is tied into the - vm_page structure. FreeBSD can simply iterate + vm_page structure. &os; can simply iterate through those mappings that are known to exist while Linux must check all page tables that might contain a specific mapping to see if it does, which can achieve O(n^2) overhead in certain - situations. It is because of this that FreeBSD tends to make better + situations. It is because of this that &os; tends to make better choices on which pages to reuse or swap when memory is stressed, giving - it better performance under load. However, FreeBSD requires kernel + it better performance under load. However, &os; requires kernel tuning to accommodate large-shared-address-space situations such as those that can occur in a news system because it may run out of pv_entry structures. - Both Linux and FreeBSD need work in this area. FreeBSD is trying to + Both Linux and &os; need work in this area. &os; is trying to maximize the advantage of a potentially sparse active-mapping model (not all processes need to map all pages of a shared library, for example), - whereas Linux is trying to simplify its algorithms. FreeBSD generally + whereas Linux is trying to simplify its algorithms. 
&os; generally has the performance advantage here at the cost of wasting a little extra - memory, but FreeBSD breaks down in the case where a large file is + memory, but &os; breaks down in the case where a large file is massively shared across hundreds of processes. Linux, on the other hand, breaks down in the case where many processes are sparsely-mapping the same shared library and also runs non-optimally when trying to determine @@ -530,7 +530,7 @@ even with multi-way set-associative caches (though the effect is mitigated somewhat). - FreeBSD's memory allocation code implements page coloring + &os;'s memory allocation code implements page coloring optimizations, which means that the memory allocation code will attempt to locate free pages that are contiguous from the point of view of the cache. For example, if page 16 of physical memory is assigned to page 0 @@ -554,7 +554,7 @@ modular and algorithmic approach that BSD has historically taken allows us to study and understand the current implementation as well as relatively cleanly replace large sections of the code. There have been a - number of improvements to the FreeBSD VM system in the last several + number of improvements to the &os; VM system in the last several years, and work is ongoing. @@ -566,23 +566,23 @@ What is the interleaving algorithm that you - refer to in your listing of the ills of the FreeBSD 3.X swap + refer to in your listing of the ills of the &os; 3.X swap arrangements? - FreeBSD uses a fixed swap interleave which defaults to 4. This - means that FreeBSD reserves space for four swap areas even if you + &os; uses a fixed swap interleave which defaults to 4. This + means that &os; reserves space for four swap areas even if you only have one, two, or three. Since swap is interleaved the linear address space representing the four swap areas will be fragmented if you do not actually have four swap areas. 
For - example, if you have two swap areas A and B FreeBSD's address + example, if you have two swap areas A and B &os;'s address space representation for that swap area will be interleaved in blocks of 16 pages: A B C D A B C D A B C D A B C D - FreeBSD 3.X uses a sequential list of free + &os; 3.X uses a sequential list of free regions approach to accounting for the free swap areas. The idea is that large blocks of free linear space can be represented with a single list node @@ -626,7 +626,7 @@ I do not get the following:
- It is important to note that the FreeBSD VM system attempts + It is important to note that the &os; VM system attempts to separate clean and dirty pages for the express reason of avoiding unnecessary flushes of dirty pages (which eats I/O bandwidth), nor does it move pages between the various page @@ -649,7 +649,7 @@ separate the pages but the reality is that if we are not in a memory crunch, we do not really have to. - What this means is that FreeBSD will not try very hard to + What this means is that &os; will not try very hard to separate out dirty pages (inactive queue) from clean pages (cache queue) when the system is not being stressed, nor will it try to deactivate pages (active queue -> inactive queue) when the system @@ -663,14 +663,14 @@ would not some of the page faults be data page faults (COW from executable file to private page)? I.e., I would expect the page faults to be some zero-fill and some program data. Or are you - implying that FreeBSD does do pre-COW for the program data? + implying that &os; does do pre-COW for the program data? A COW fault can be either zero-fill or program-data. The mechanism is the same either way because the backing program-data is almost certainly already in the cache. I am indeed lumping the - two together. FreeBSD does not pre-COW program data or zero-fill, + two together. &os; does not pre-COW program data or zero-fill, but it does pre-map pages that exist in its cache. @@ -685,7 +685,7 @@ McKusick, Bostic, Karel, Quarterman)? Specifically, what kind of operation/reaction would require scanning the mappings? - How does Linux do in the case where FreeBSD breaks down + How does Linux do in the case where &os; breaks down (sharing a large file mapping over many processes)? @@ -717,7 +717,7 @@ index into the page table for each of those 50 processes even if only 10 of them have actually mapped the page. So Linux is trading off the simplicity of its design against performance. 
- Many VM algorithms which are O(1) or (small N) under FreeBSD wind + Many VM algorithms which are O(1) or (small N) under &os; wind up being O(N), O(N^2), or worse under Linux. Since the pte's representing a particular page in an object tend to be at the same offset in all the page tables they are mapped in, reducing the @@ -725,12 +725,12 @@ will often avoid blowing away the L1 cache line for that offset, which can lead to better performance. - FreeBSD has added complexity (the pv_entry + &os; has added complexity (the pv_entry scheme) in order to increase performance (to limit page table accesses to only those pte's that need to be modified). - But FreeBSD has a scaling problem that Linux does not in that + But &os; has a scaling problem that Linux does not in that there are a limited number of pv_entry structures and this causes problems when you have massive sharing of data. In this case you may run out of @@ -744,10 +744,10 @@ pv_entry scheme: Linux uses permanent page tables that are not throw away, but does not need a pv_entry for each potentially - mapped pte. FreeBSD uses throw away page tables but + mapped pte. &os; uses throw away page tables but adds in a pv_entry structure for each actually-mapped pte. I think memory utilization winds up being - about the same, giving FreeBSD an algorithmic advantage with its + about the same, giving &os; an algorithmic advantage with its ability to throw away page tables at will with very low overhead.
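Note on the subject line: read literally as a sed expression, `s/FreeBSD/&os;/` would not produce this diff, because an unescaped `&` in a sed replacement stands for the matched text (the result would be `FreeBSDos;`). The following is a minimal Python sketch of the substitution the hunks above show. The helper name `replace_product_name` is hypothetical, and nothing here is claimed to be how the committer actually made the change.

```python
import re


def replace_product_name(text: str) -> str:
    """Replace the literal product name with the &os; entity reference.

    Hypothetical helper for illustration only. In Python's re.sub the
    ampersand has no special meaning in the replacement string, so "&os;"
    is inserted verbatim (unlike sed, where it would need escaping as \\&).
    """
    # \b keeps the match to the whole word, so "FreeBSD's" becomes "&os;'s".
    return re.sub(r"\bFreeBSD\b", "&os;", text)


if __name__ == "__main__":
    print(replace_product_name("to reflect changes in FreeBSD's VM implementation."))
```

A blanket substitution like this touches every occurrence, so the result would still need review by hand for places where the literal name should stay.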