From f74d50dc72a5c86b290628235b2a72e484f22bad Mon Sep 17 00:00:00 2001
From: Johann Kois
Date: Thu, 17 Jun 2010 11:24:25 +0000
Subject: [PATCH] Whitespace-change only.

Make the output a little bit more readable by

- adding some lists (orderedlist, itemizedlist)
- adding some additional paragraphs
- rearranging the Q/A section so that something useful is displayed
  in the list of Q/A.

Suggested by: Benjamin Lukas (qavvap att googlemail dott com)
---
 .../articles/vm-design/article.sgml | 160 ++++++++++++------
 1 file changed, 109 insertions(+), 51 deletions(-)

diff --git a/en_US.ISO8859-1/articles/vm-design/article.sgml b/en_US.ISO8859-1/articles/vm-design/article.sgml
index 5fdf10537a..a0ec9e0392 100644
--- a/en_US.ISO8859-1/articles/vm-design/article.sgml
+++ b/en_US.ISO8859-1/articles/vm-design/article.sgml
@@ -96,22 +96,33 @@
 environmental factors.  In any direct comparison between platforms,
 these issues become most apparent when system resources begin to get
 stressed.  As I describe &os;'s VM/Swap subsystem the reader should
- always keep two points in mind.  First, the most important aspect of
- performance design is what is known as Optimizing the Critical
- Path.  It is often the case that performance optimizations add a
- little bloat to the code in order to make the critical path perform
- better.  Second, a solid, generalized design outperforms a
- heavily-optimized design over the long run.  While a generalized design
- may end up being slower than an heavily-optimized design when they are
- first implemented, the generalized design tends to be easier to adapt to
- changing conditions and the heavily-optimized design winds up having to
- be thrown away.  Any codebase that will survive and be maintainable for
+ always keep two points in mind:
+
+
+
+ The most important aspect of performance design is what is
+ known as Optimizing the Critical Path.  It is often
+ the case that performance optimizations add a little bloat to the
+ code in order to make the critical path perform better.
+
+
+
+ A solid, generalized design outperforms a heavily-optimized
+ design over the long run.  While a generalized design may end up
+ being slower than a heavily-optimized design when they are
+ first implemented, the generalized design tends to be easier to
+ adapt to changing conditions and the heavily-optimized design
+ winds up having to be thrown away.
+
+
+
+ Any codebase that will survive and be maintainable for
 years must therefore be designed properly from the beginning even if it
 costs some performance.  Twenty years ago people were still arguing that
 programming in assembly was better than programming in a high-level
 language because it produced code that was ten times as fast.  Today,
- the fallibility of that argument is obvious—as are the parallels
- to algorithmic design and code generalization.
+ the fallibility of that argument is obvious — as are
+ the parallels to algorithmic design and code generalization.
@@ -318,40 +329,85 @@
 memory that does not otherwise have it.  &os; allocates the swap
 management structure for a VM Object only when it is actually
 needed.  However, the swap management structure has had problems
- historically.
+ historically:
- Under &os; 3.X the swap management structure preallocates an
- array that encompasses the entire object requiring swap backing
- store—even if only a few pages of that object are swap-backed.
- This creates a kernel memory fragmentation problem when large objects
- are mapped, or processes with large runsizes (RSS) fork.  Also, in order
- to keep track of swap space, a list of holes is kept in
- kernel memory, and this tends to get severely fragmented as well.  Since
- the list of holes is a linear list, the swap allocation and freeing
- performance is a non-optimal O(n)-per-page.  It also requires kernel
- memory allocations to take place during the swap freeing process, and
- that creates low memory deadlock problems.  The problem is further
- exacerbated by holes created due to the interleaving algorithm.  Also,
- the swap block map can become fragmented fairly easily resulting in
- non-contiguous allocations.  Kernel memory must also be allocated on the
- fly for additional swap management structures when a swapout occurs.  It
- is evident that there was plenty of room for improvement.
+
+
+ Under &os; 3.X the swap management structure preallocates an
+ array that encompasses the entire object requiring swap backing
+ store—even if only a few pages of that object are
+ swap-backed.  This creates a kernel memory fragmentation problem
+ when large objects are mapped, or processes with large runsizes
+ (RSS) fork.
+
- For &os; 4.X, I completely rewrote the swap subsystem.  With this
- rewrite, swap management structures are allocated through a hash table
- rather than a linear array giving them a fixed allocation size and much
- finer granularity.  Rather then using a linearly linked list to keep
- track of swap space reservations, it now uses a bitmap of swap blocks
- arranged in a radix tree structure with free-space hinting in the radix
- node structures.  This effectively makes swap allocation and freeing an
- O(1) operation.  The entire radix tree bitmap is also preallocated in
- order to avoid having to allocate kernel memory during critical low
- memory swapping operations.  After all, the system tends to swap when it
- is low on memory so we should avoid allocating kernel memory at such
- times in order to avoid potential deadlocks.  Finally, to reduce
- fragmentation the radix tree is capable of allocating large contiguous
- chunks at once, skipping over smaller fragmented chunks.  I did not take
- the final step of having an allocating hint pointer that would trundle
+
+ Also, in order to keep track of swap space, a list of
+ holes is kept in kernel memory, and this tends to get
+ severely fragmented as well.  Since the list of
+ holes is a linear list, the swap allocation and freeing
+ performance is a non-optimal O(n)-per-page.
+
+
+
+ It requires kernel memory allocations to take place during
+ the swap freeing process, and that creates low memory deadlock
+ problems.
+
+
+
+ The problem is further exacerbated by holes created due to
+ the interleaving algorithm.
+
+
+
+ Also, the swap block map can become fragmented fairly easily
+ resulting in non-contiguous allocations.
+
+
+
+ Kernel memory must also be allocated on the fly for additional
+ swap management structures when a swapout occurs.
+
+
+
+ It is evident from that list that there was plenty of room for
+ improvement.  For &os; 4.X, I completely rewrote the swap
+ subsystem:
+
+
+
+ Swap management structures are allocated through a hash
+ table rather than a linear array giving them a fixed allocation
+ size and much finer granularity.
+
+
+
+ Rather than using a linearly linked list to keep track of
+ swap space reservations, it now uses a bitmap of swap blocks
+ arranged in a radix tree structure with free-space hinting in
+ the radix node structures.  This effectively makes swap
+ allocation and freeing an O(1) operation.
+
+
+
+ The entire radix tree bitmap is also preallocated in
+ order to avoid having to allocate kernel memory during critical
+ low memory swapping operations.  After all, the system tends to
+ swap when it is low on memory so we should avoid allocating
+ kernel memory at such times in order to avoid potential
+ deadlocks.
+
+
+
+ To reduce fragmentation the radix tree is capable
+ of allocating large contiguous chunks at once, skipping over
+ smaller fragmented chunks.
+
+
+
+ I did not take the final step of having an
+ allocating hint pointer that would trundle
 through a portion of swap as allocations were made in order to further
 guarantee contiguous allocations or at least locality of reference, but
 I ensured that such an addition could be made.
@@ -431,7 +487,9 @@
 systems with very low cache queue counts and high active queue counts
 when doing a systat -vm command.  As the VM system
 becomes more stressed, it makes a greater effort to maintain the various
- page queues at the levels determined to be the most effective.  An urban
+ page queues at the levels determined to be the most effective.
+
+ An urban
 myth has circulated for years that Linux did a better job avoiding
 swapouts than &os;, but this in fact is not true.  What was actually
 occurring was that &os; was proactively paging out unused pages in
@@ -623,6 +681,12 @@
+ How is the separation of clean and dirty (inactive) pages
+ related to the situation where you see low cache queue counts and
+ high active queue counts in systat -vm?  Do the
+ systat stats roll the active and dirty pages together for the
+ active queue count?
+
 I do not get the following:
@@ -635,12 +699,6 @@
 cache queue counts and high active queue counts when doing a
 systat -vm command.
-
- How is the separation of clean and dirty (inactive) pages
- related to the situation where you see low cache queue counts and
- high active queue counts in systat -vm?  Do the
- systat stats roll the active and dirty pages together for the
- active queue count?
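The swap rewrite summarized in the second hunk above replaces a linear list of holes with a bitmap of swap blocks carrying free-space hints. The sketch below is a user-space illustration of that idea, not the kernel's code: a flat two-level bitmap in which one summary word records which leaf words still contain free bits, so allocating a block costs two find-first-set operations instead of an O(n) list walk. All names and sizes here are invented for the example; FreeBSD's actual allocator is the multi-level radix-tree blist code.

```c
/* Illustrative two-level bitmap allocator with free-space hinting.
 * 1024 "swap blocks": 32 leaf words of 32 bits each, plus a summary
 * word with one hint bit per leaf that still has free space. */
#include <assert.h>
#include <stdint.h>
#include <strings.h>                    /* ffs() */

#define LEAF_BITS 32
#define NLEAVES   32

static uint32_t leaf[NLEAVES];          /* 1 bit per free block */
static uint32_t summary;                /* 1 bit per non-full leaf */

void
swb_init(void)
{
    for (int i = 0; i < NLEAVES; i++)
        leaf[i] = 0xffffffffu;          /* everything starts free */
    summary = 0xffffffffu;
}

/* Allocate one block; returns its number, or -1 if swap is full.
 * Two ffs() calls replace an O(n) scan of a hole list. */
int
swb_alloc(void)
{
    int l = ffs((int)summary);
    if (l == 0)
        return (-1);                    /* no leaf has free space */
    l--;
    int b = ffs((int)leaf[l]) - 1;      /* hint bit guarantees b >= 0 */
    leaf[l] &= ~(1u << b);
    if (leaf[l] == 0)
        summary &= ~(1u << l);          /* leaf exhausted: clear hint */
    return (l * LEAF_BITS + b);
}

void
swb_free(int blk)
{
    int l = blk / LEAF_BITS, b = blk % LEAF_BITS;

    assert((leaf[l] & (1u << b)) == 0); /* catch double free */
    leaf[l] |= 1u << b;
    summary |= 1u << l;                 /* leaf has free space again */
}
```

The hinting is what keeps the fast path cheap: a summary bit is cleared only when its leaf becomes completely full, so a mostly-free swap area is never scanned at all.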
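The same hunk also notes that swap management structures moved from a preallocated linear array spanning the whole object to a hash table with a fixed per-entry allocation size. A hypothetical user-space sketch of that lookup scheme follows; the structure and function names are invented for illustration and do not match the kernel's.

```c
/* Illustrative hash table mapping (object, page index) to a swap
 * block.  An entry is allocated only when a page is actually
 * swap-backed, so memory use scales with swapped pages, not with
 * the size of the object. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SWHASH_SIZE 64                  /* small power of two for the demo */

struct swblk {
    void *object;                       /* stand-in for the owning VM object */
    long  pindex;                       /* page index within that object */
    int   blkno;                        /* swap block backing the page */
    struct swblk *next;                 /* hash chain */
};

static struct swblk *swhash[SWHASH_SIZE];

static unsigned
swhash_idx(void *obj, long pindex)
{
    return (unsigned)(((uintptr_t)obj >> 4) ^ (uintptr_t)pindex) &
        (SWHASH_SIZE - 1);
}

/* Record that (obj, pindex) is backed by swap block blkno. */
void
swblk_set(void *obj, long pindex, int blkno)
{
    unsigned i = swhash_idx(obj, pindex);
    struct swblk *p;

    for (p = swhash[i]; p != NULL; p = p->next)
        if (p->object == obj && p->pindex == pindex) {
            p->blkno = blkno;           /* already tracked: update */
            return;
        }
    p = malloc(sizeof(*p));             /* fixed-size, on-demand */
    assert(p != NULL);
    p->object = obj;
    p->pindex = pindex;
    p->blkno = blkno;
    p->next = swhash[i];
    swhash[i] = p;
}

/* Return the swap block for (obj, pindex), or -1 if not swapped. */
int
swblk_get(void *obj, long pindex)
{
    for (struct swblk *p = swhash[swhash_idx(obj, pindex)];
        p != NULL; p = p->next)
        if (p->object == obj && p->pindex == pindex)
            return (p->blkno);
    return (-1);
}
```

Contrast this with the 3.X scheme the patch text describes: there, mapping one swapped page of a huge object forced an array allocation covering every page of that object, which is exactly the kernel-memory fragmentation problem the list items call out.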