Whitespace-change only.

Make the output a little bit more readable by

- adding some lists (orderedlist, itemizedlist)
- adding some additional paragraphs
- rearranging the Q/A section so that something useful is displayed in the list of Q/A.

Suggested by:   Benjamin Lukas (qavvap att googlemail dott com)
Johann Kois 2010-06-17 11:24:25 +00:00
parent 77f5bcecff
commit f74d50dc72
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=35897

@@ -96,22 +96,33 @@

    environmental factors. In any direct comparison between platforms,
    these issues become most apparent when system resources begin to get
    stressed. As I describe &os;'s VM/Swap subsystem the reader should
    always keep two points in mind:</para>

  <orderedlist>
    <listitem>
      <para>The most important aspect of performance design is what is
        known as <quote>Optimizing the Critical Path</quote>. It is often
        the case that performance optimizations add a little bloat to the
        code in order to make the critical path perform better.</para>
    </listitem>

    <listitem>
      <para>A solid, generalized design outperforms a heavily-optimized
        design over the long run. While a generalized design may end up
        being slower than a heavily-optimized design when they are
        first implemented, the generalized design tends to be easier to
        adapt to changing conditions and the heavily-optimized design
        winds up having to be thrown away.</para>
    </listitem>
  </orderedlist>

  <para>Any codebase that will survive and be maintainable for
    years must therefore be designed properly from the beginning even if it
    costs some performance. Twenty years ago people were still arguing that
    programming in assembly was better than programming in a high-level
    language because it produced code that was ten times as fast. Today,
    the fallibility of that argument is obvious&nbsp;&mdash;&nbsp;as are
    the parallels to algorithmic design and code generalization.</para>
  </sect1>

  <sect1 id="vm-objects">
@@ -318,40 +329,85 @@

    memory that does not otherwise have it. &os; allocates the swap
    management structure for a VM Object only when it is actually needed.
    However, the swap management structure has had problems
    historically:</para>

  <itemizedlist>
    <listitem>
      <para>Under &os; 3.X the swap management structure preallocates an
        array that encompasses the entire object requiring swap backing
        store&mdash;even if only a few pages of that object are
        swap-backed. This creates a kernel memory fragmentation problem
        when large objects are mapped, or processes with large runsizes
        (RSS) fork.</para>
    </listitem>

    <listitem>
      <para>Also, in order to keep track of swap space, a <quote>list of
        holes</quote> is kept in kernel memory, and this tends to get
        severely fragmented as well. Since the <quote>list of
        holes</quote> is a linear list, the swap allocation and freeing
        performance is a non-optimal O(n)-per-page.</para>
    </listitem>

    <listitem>
      <para>It requires kernel memory allocations to take place during
        the swap freeing process, and that creates low memory deadlock
        problems.</para>
    </listitem>

    <listitem>
      <para>The problem is further exacerbated by holes created due to
        the interleaving algorithm.</para>
    </listitem>

    <listitem>
      <para>Also, the swap block map can become fragmented fairly easily,
        resulting in non-contiguous allocations.</para>
    </listitem>

    <listitem>
      <para>Kernel memory must also be allocated on the fly for additional
        swap management structures when a swapout occurs.</para>
    </listitem>
  </itemizedlist>

  <para>It is evident from that list that there was plenty of room for
    improvement. For &os; 4.X, I completely rewrote the swap
    subsystem:</para>

  <itemizedlist>
    <listitem>
      <para>Swap management structures are allocated through a hash
        table rather than a linear array, giving them a fixed allocation
        size and much finer granularity.</para>
    </listitem>

    <listitem>
      <para>Rather than using a linearly linked list to keep track of
        swap space reservations, it now uses a bitmap of swap blocks
        arranged in a radix tree structure with free-space hinting in
        the radix node structures. This effectively makes swap
        allocation and freeing an O(1) operation.</para>
    </listitem>

    <listitem>
      <para>The entire radix tree bitmap is also preallocated in
        order to avoid having to allocate kernel memory during critical
        low memory swapping operations. After all, the system tends to
        swap when it is low on memory, so we should avoid allocating
        kernel memory at such times in order to avoid potential
        deadlocks.</para>
    </listitem>

    <listitem>
      <para>To reduce fragmentation the radix tree is capable
        of allocating large contiguous chunks at once, skipping over
        smaller fragmented chunks.</para>
    </listitem>
  </itemizedlist>

  <para>I did not take the final step of having an
    <quote>allocating hint pointer</quote> that would trundle
    through a portion of swap as allocations were made in order to further
    guarantee contiguous allocations, or at least locality of reference, but
    I ensured that such an addition could be made.</para>
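To make the free-space-hinting idea in the list above concrete, here is a minimal C sketch of a two-level bitmap allocator in the same spirit. It is not the actual FreeBSD swap code: the names are hypothetical, and the per-leaf hint is simplified to a free-block count rather than the largest-free-run hints kept in the real radix node structures. It shows how a hint lets the allocator skip subtrees that cannot possibly satisfy a request, and why freeing never has to allocate kernel memory.

/*
 * Illustrative sketch only -- not the FreeBSD swap code. A two-level
 * "radix" bitmap: each leaf covers 32 swap blocks, and the parent keeps
 * a per-leaf hint (here simply a free-block count) so allocations can
 * skip leaves that cannot satisfy the request.
 */
#include <stdint.h>
#include <string.h>

#define LEAF_BLOCKS 32              /* swap blocks per leaf bitmap        */
#define NLEAVES     1024            /* 32K blocks total in this toy map   */

struct swap_bitmap {
    uint32_t leaf[NLEAVES];         /* bit set => swap block allocated    */
    uint8_t  hint[NLEAVES];         /* free blocks remaining in that leaf */
};

static void
swap_bitmap_init(struct swap_bitmap *m)
{
    memset(m->leaf, 0, sizeof(m->leaf));
    memset(m->hint, LEAF_BLOCKS, sizeof(m->hint));
}

/*
 * Allocate n contiguous blocks (1 <= n <= 32 in this sketch). Returns
 * the starting block number or -1. The hint keeps the common case cheap
 * by skipping leaves that cannot possibly hold the run.
 */
static int
swap_bitmap_alloc(struct swap_bitmap *m, int n)
{
    uint32_t mask = (n >= 32) ? 0xffffffffu : ((1u << n) - 1);

    for (int i = 0; i < NLEAVES; i++) {
        if (m->hint[i] < n)
            continue;               /* hint: this leaf cannot satisfy it */
        for (int bit = 0; bit + n <= LEAF_BLOCKS; bit++) {
            if ((m->leaf[i] & (mask << bit)) == 0) {
                m->leaf[i] |= mask << bit;
                m->hint[i] -= n;
                return (i * LEAF_BLOCKS + bit);
            }
        }
    }
    return (-1);                    /* swap space exhausted */
}

/*
 * Free n blocks starting at blk (a run returned by swap_bitmap_alloc(),
 * so it lies within one leaf). No memory allocation is ever needed.
 */
static void
swap_bitmap_free(struct swap_bitmap *m, int blk, int n)
{
    int i = blk / LEAF_BLOCKS;
    int bit = blk % LEAF_BLOCKS;
    uint32_t mask = ((n >= 32) ? 0xffffffffu : ((1u << n) - 1)) << bit;

    m->leaf[i] &= ~mask;
    m->hint[i] += n;
}

Because the whole map is a fixed-size structure, it can be preallocated when swap is configured, which matches the preallocation point the list above makes about avoiding low-memory deadlocks; an "allocating hint pointer" of the kind mentioned in the closing paragraph would simply be a cursor remembering where the previous search left off.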
@@ -431,7 +487,9 @@

    systems with very low cache queue counts and high active queue counts
    when doing a <command>systat -vm</command> command. As the VM system
    becomes more stressed, it makes a greater effort to maintain the various
    page queues at the levels determined to be the most effective.</para>

  <para>An urban
    myth has circulated for years that Linux did a better job avoiding
    swapouts than &os;, but this in fact is not true. What was actually
    occurring was that &os; was proactively paging out unused pages in
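As a rough illustration of what maintaining the queues "at the levels determined to be the most effective" means, the following pseudo-C sketch (hypothetical names and fields, not the actual FreeBSD pageout code) shows the kind of target comparison being described: on an unstressed system the computed shortage is zero, pages simply stay on the active queue, and systat -vm therefore shows a large active count and a small cache count; only as pressure grows does the system work harder to refill the cache and free queues.

/*
 * Illustrative sketch only, with hypothetical names -- not the real
 * FreeBSD pageout daemon. Each queue is compared against a tuned
 * target, and inactive pages are moved toward the cache/free queues
 * only when there is an actual shortage.
 */
struct page_queues {
    int active, inactive, cache, free;   /* current page counts          */
    int cache_target, free_target;       /* tuned "most effective" levels */
};

static int
page_shortage(const struct page_queues *q)
{
    int want = (q->cache_target + q->free_target) - (q->cache + q->free);

    return (want > 0 ? want : 0);
}

static void
balance_queues(struct page_queues *q)
{
    int shortage = page_shortage(q);

    /*
     * On an unstressed system the shortage is zero, nothing is moved,
     * and the active queue stays large -- matching the systat -vm
     * behavior described above.
     */
    while (shortage > 0 && q->inactive > 0) {
        q->inactive--;                   /* clean/launder one inactive page */
        q->cache++;                      /* ...and make it cheaply reclaimable */
        shortage--;
    }
}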
@@ -623,6 +681,12 @@

      <qandaentry>
        <question>
          <para>How is the separation of clean and dirty (inactive) pages
            related to the situation where you see low cache queue counts and
            high active queue counts in <command>systat -vm</command>? Do the
            systat stats roll the active and dirty pages together for the
            active queue count?</para>

          <para>I do not get the following:</para>

          <blockquote>

@@ -635,12 +699,6 @@

            cache queue counts and high active queue counts when doing a
            <command>systat -vm</command> command.</para>
          </blockquote>
        </question>

        <answer>