Make the output a little bit more readable by:
- adding some lists (orderedlist, itemizedlist)
- adding some additional paragraphs
- rearranging the Q/A section so that something useful is displayed in the list of Q/A.
Suggested by: Benjamin Lukas (qavvap att googlemail dott com)
This commit is contained in:
parent 77f5bcecff
commit f74d50dc72
Notes: svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=35897
1 changed file with 109 additions and 51 deletions
@@ -96,22 +96,33 @@
 environmental factors. In any direct comparison between platforms,
 these issues become most apparent when system resources begin to get
 stressed. As I describe &os;'s VM/Swap subsystem the reader should
-always keep two points in mind. First, the most important aspect of
-performance design is what is known as <quote>Optimizing the Critical
-Path</quote>. It is often the case that performance optimizations add a
-little bloat to the code in order to make the critical path perform
-better. Second, a solid, generalized design outperforms a
-heavily-optimized design over the long run. While a generalized design
-may end up being slower than an heavily-optimized design when they are
-first implemented, the generalized design tends to be easier to adapt to
-changing conditions and the heavily-optimized design winds up having to
-be thrown away. Any codebase that will survive and be maintainable for
+always keep two points in mind:</para>
+
+<orderedlist>
+<listitem>
+<para>The most important aspect of performance design is what is
+known as <quote>Optimizing the Critical Path</quote>. It is often
+the case that performance optimizations add a little bloat to the
+code in order to make the critical path perform better.</para>
+</listitem>
+
+<listitem>
+<para>A solid, generalized design outperforms a heavily-optimized
+design over the long run. While a generalized design may end up
+being slower than an heavily-optimized design when they are
+first implemented, the generalized design tends to be easier to
+adapt to changing conditions and the heavily-optimized design
+winds up having to be thrown away.</para>
+</listitem>
+</orderedlist>
+
+<para>Any codebase that will survive and be maintainable for
 years must therefore be designed properly from the beginning even if it
 costs some performance. Twenty years ago people were still arguing that
 programming in assembly was better than programming in a high-level
 language because it produced code that was ten times as fast. Today,
-the fallibility of that argument is obvious—as are the parallels
-to algorithmic design and code generalization.</para>
+the fallibility of that argument is obvious — as are
+the parallels to algorithmic design and code generalization.</para>
 </sect1>
 
 <sect1 id="vm-objects">
@@ -318,40 +329,85 @@
 memory that does not otherwise have it. &os; allocates the swap
 management structure for a VM Object only when it is actually needed.
 However, the swap management structure has had problems
-historically.</para>
+historically:</para>
 
-<para>Under &os; 3.X the swap management structure preallocates an
-array that encompasses the entire object requiring swap backing
-store—even if only a few pages of that object are swap-backed.
-This creates a kernel memory fragmentation problem when large objects
-are mapped, or processes with large runsizes (RSS) fork. Also, in order
-to keep track of swap space, a <quote>list of holes</quote> is kept in
-kernel memory, and this tends to get severely fragmented as well. Since
-the <quote>list of holes</quote> is a linear list, the swap allocation and freeing
-performance is a non-optimal O(n)-per-page. It also requires kernel
-memory allocations to take place during the swap freeing process, and
-that creates low memory deadlock problems. The problem is further
-exacerbated by holes created due to the interleaving algorithm. Also,
-the swap block map can become fragmented fairly easily resulting in
-non-contiguous allocations. Kernel memory must also be allocated on the
-fly for additional swap management structures when a swapout occurs. It
-is evident that there was plenty of room for improvement.</para>
+<itemizedlist>
+<listitem>
+<para>Under &os; 3.X the swap management structure preallocates an
+array that encompasses the entire object requiring swap backing
+store—even if only a few pages of that object are
+swap-backed. This creates a kernel memory fragmentation problem
+when large objects are mapped, or processes with large runsizes
+(RSS) fork.</para>
+</listitem>
 
-<para>For &os; 4.X, I completely rewrote the swap subsystem. With this
-rewrite, swap management structures are allocated through a hash table
-rather than a linear array giving them a fixed allocation size and much
-finer granularity. Rather then using a linearly linked list to keep
-track of swap space reservations, it now uses a bitmap of swap blocks
-arranged in a radix tree structure with free-space hinting in the radix
-node structures. This effectively makes swap allocation and freeing an
-O(1) operation. The entire radix tree bitmap is also preallocated in
-order to avoid having to allocate kernel memory during critical low
-memory swapping operations. After all, the system tends to swap when it
-is low on memory so we should avoid allocating kernel memory at such
-times in order to avoid potential deadlocks. Finally, to reduce
-fragmentation the radix tree is capable of allocating large contiguous
-chunks at once, skipping over smaller fragmented chunks. I did not take
-the final step of having an <quote>allocating hint pointer</quote> that would trundle
+<listitem>
+<para>Also, in order to keep track of swap space, a <quote>list of
+holes</quote> is kept in kernel memory, and this tends to get
+severely fragmented as well. Since the <quote>list of
+holes</quote> is a linear list, the swap allocation and freeing
+performance is a non-optimal O(n)-per-page.</para>
+</listitem>
+
+<listitem>
+<para>It requires kernel memory allocations to take place during
+the swap freeing process, and that creates low memory deadlock
+problems.</para>
+</listitem>
+
+<listitem>
+<para>The problem is further exacerbated by holes created due to
+the interleaving algorithm.</para>
+</listitem>
+
+<listitem>
+<para>Also, the swap block map can become fragmented fairly easily
+resulting in non-contiguous allocations.</para>
+</listitem>
+
+<listitem>
+<para>Kernel memory must also be allocated on the fly for additional
+swap management structures when a swapout occurs.</para>
+</listitem>
+</itemizedlist>
+
+<para>It is evident from that list that there was plenty of room for
+improvement. For &os; 4.X, I completely rewrote the swap
+subsystem:</para>
+
+<itemizedlist>
+<listitem>
+<para>Swap management structures are allocated through a hash
+table rather than a linear array giving them a fixed allocation
+size and much finer granularity.</para>
+</listitem>
+
+<listitem>
+<para>Rather then using a linearly linked list to keep track of
+swap space reservations, it now uses a bitmap of swap blocks
+arranged in a radix tree structure with free-space hinting in
+the radix node structures. This effectively makes swap
+allocation and freeing an O(1) operation.</para>
+</listitem>
+
+<listitem>
+<para>The entire radix tree bitmap is also preallocated in
+order to avoid having to allocate kernel memory during critical
+low memory swapping operations. After all, the system tends to
+swap when it is low on memory so we should avoid allocating
+kernel memory at such times in order to avoid potential
+deadlocks.</para>
+</listitem>
+
+<listitem>
+<para>To reduce fragmentation the radix tree is capable
+of allocating large contiguous chunks at once, skipping over
+smaller fragmented chunks.</para>
+</listitem>
+</itemizedlist>
+
+<para>I did not take the final step of having an
+<quote>allocating hint pointer</quote> that would trundle
 through a portion of swap as allocations were made in order to further
 guarantee contiguous allocations or at least locality of reference, but
 I ensured that such an addition could be made.</para>
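Editor's note: the first item in the added 4.X list above describes allocating swap metadata through a hash table of fixed-size chunks instead of one array sized to the whole object. The following is only a minimal userland sketch in C of that idea, not the actual kernel data structure; the names (swmeta, SW_META_PAGES, sw_lookup) and sizes are hypothetical.

#include <stdint.h>
#include <stdlib.h>

#define SW_META_PAGES 16u                 /* pages covered by one metadata chunk */
#define SW_HASH_SIZE  1024u               /* number of hash buckets, power of two */
#define SWAPBLK_NONE  ((uint64_t)-1)      /* marker: page has no swap block */

struct swmeta {
	struct swmeta *next;              /* hash chain link */
	const void    *object;            /* identifies the VM object */
	uint64_t       base_pindex;       /* first page index covered */
	uint64_t       blk[SW_META_PAGES];/* swap block per page, or SWAPBLK_NONE */
};

static struct swmeta *sw_hash[SW_HASH_SIZE];

static unsigned
sw_bucket(const void *object, uint64_t pindex)
{
	/* hash on the object identity and the chunk-aligned page index */
	return ((unsigned)(((uintptr_t)object ^ (pindex / SW_META_PAGES)) &
	    (SW_HASH_SIZE - 1)));
}

/*
 * Find the fixed-size chunk covering pindex, creating it on demand.
 * Every allocation is sizeof(struct swmeta), regardless of how large
 * the object is or how sparsely it is swap-backed.
 */
static struct swmeta *
sw_lookup(const void *object, uint64_t pindex, int create)
{
	unsigned b = sw_bucket(object, pindex);
	uint64_t base = pindex - (pindex % SW_META_PAGES);
	struct swmeta *m;

	for (m = sw_hash[b]; m != NULL; m = m->next)
		if (m->object == object && m->base_pindex == base)
			return (m);
	if (!create)
		return (NULL);
	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return (NULL);
	m->object = object;
	m->base_pindex = base;
	for (unsigned i = 0; i < SW_META_PAGES; i++)
		m->blk[i] = SWAPBLK_NONE;
	m->next = sw_hash[b];
	sw_hash[b] = m;
	return (m);
}

Because each chunk covers only a small, fixed page range, a large but mostly unswapped object costs a handful of these small allocations rather than one huge array, which is the fragmentation point the list item makes.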
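Editor's note: the second and third items in the added list describe a preallocated bitmap of swap blocks arranged in a radix tree with free-space hints. The sketch below is a deliberately flattened two-level approximation in C (the structure described in the article is a deeper radix tree with hints at every level, so an allocation follows a short fixed-depth path); the names blmap_alloc and free_hint are made up for illustration, and __builtin_ctzll is a GCC/Clang builtin.

#include <stdint.h>

#define LEAF_BLOCKS 64            /* swap blocks tracked by one leaf bitmap */
#define NLEAVES     64            /* leaves under the single interior node  */

struct swbitmap {
	int      free_hint[NLEAVES]; /* free blocks under each leaf (the hint) */
	uint64_t leaf[NLEAVES];      /* bit i set => block i in that leaf free */
};

static void
blmap_init(struct swbitmap *bm)
{
	for (int i = 0; i < NLEAVES; i++) {
		bm->leaf[i] = ~(uint64_t)0;     /* everything free up front */
		bm->free_hint[i] = LEAF_BLOCKS;
	}
}

/* Allocate one swap block; returns its number, or -1 if the map is full. */
static long
blmap_alloc(struct swbitmap *bm)
{
	for (int i = 0; i < NLEAVES; i++) {
		if (bm->free_hint[i] == 0)      /* hint: skip exhausted subtrees */
			continue;
		int bit = __builtin_ctzll(bm->leaf[i]);   /* lowest free block */
		bm->leaf[i] &= ~((uint64_t)1 << bit);
		bm->free_hint[i]--;
		return ((long)i * LEAF_BLOCKS + bit);
	}
	return (-1);
}

/* Free a block: flip one bit and bump the hint; no memory allocation. */
static void
blmap_free(struct swbitmap *bm, long blk)
{
	int i = (int)(blk / LEAF_BLOCKS), bit = (int)(blk % LEAF_BLOCKS);

	bm->leaf[i] |= (uint64_t)1 << bit;
	bm->free_hint[i]++;
}

Freeing is just setting a bit and incrementing the hint, and because the whole map is preallocated neither allocation nor freeing needs kernel memory at swap time, which is the deadlock-avoidance point the list makes.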
@@ -431,7 +487,9 @@
 systems with very low cache queue counts and high active queue counts
 when doing a <command>systat -vm</command> command. As the VM system
 becomes more stressed, it makes a greater effort to maintain the various
-page queues at the levels determined to be the most effective. An urban
+page queues at the levels determined to be the most effective.</para>
+
+<para>An urban
 myth has circulated for years that Linux did a better job avoiding
 swapouts than &os;, but this in fact is not true. What was actually
 occurring was that &os; was proactively paging out unused pages in
@@ -623,6 +681,12 @@
 
 <qandaentry>
 <question>
+<para>How is the separation of clean and dirty (inactive) pages
+related to the situation where you see low cache queue counts and
+high active queue counts in <command>systat -vm</command>? Do the
+systat stats roll the active and dirty pages together for the
+active queue count?</para>
+
 <para>I do not get the following:</para>
 
 <blockquote>
@@ -635,12 +699,6 @@
 cache queue counts and high active queue counts when doing a
 <command>systat -vm</command> command.</para>
 </blockquote>
-
-<para>How is the separation of clean and dirty (inactive) pages
-related to the situation where you see low cache queue counts and
-high active queue counts in <command>systat -vm</command>? Do the
-systat stats roll the active and dirty pages together for the
-active queue count?</para>
 </question>
 
 <answer>