Whitespace-change only.
Make the output a little bit more readable by:

- adding some lists (orderedlist, itemizedlist)
- adding some additional paragraphs
- rearranging the Q/A section so that something useful is displayed in the list of Q/A

Suggested by: Benjamin Lukas (qavvap att googlemail dott com)
This commit is contained in:
parent 77f5bcecff
commit f74d50dc72

Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=35897

1 changed file with 109 additions and 51 deletions
@ -96,22 +96,33 @@
    environmental factors.  In any direct comparison between platforms,
    these issues become most apparent when system resources begin to get
    stressed.  As I describe &os;'s VM/Swap subsystem the reader should
    always keep two points in mind:</para>

    <orderedlist>
      <listitem>
        <para>The most important aspect of performance design is what is
          known as <quote>Optimizing the Critical Path</quote>.  It is often
          the case that performance optimizations add a little bloat to the
          code in order to make the critical path perform better.</para>
      </listitem>

      <listitem>
        <para>A solid, generalized design outperforms a heavily-optimized
          design over the long run.  While a generalized design may end up
          being slower than a heavily-optimized design when they are
          first implemented, the generalized design tends to be easier to
          adapt to changing conditions and the heavily-optimized design
          winds up having to be thrown away.</para>
      </listitem>
    </orderedlist>

    <para>Any codebase that will survive and be maintainable for
      years must therefore be designed properly from the beginning even if it
      costs some performance.  Twenty years ago people were still arguing that
      programming in assembly was better than programming in a high-level
      language because it produced code that was ten times as fast.  Today,
      the fallibility of that argument is obvious — as are
      the parallels to algorithmic design and code generalization.</para>
  </sect1>

  <sect1 id="vm-objects">
@ -318,40 +329,85 @@
    memory that does not otherwise have it.  &os; allocates the swap
    management structure for a VM Object only when it is actually needed.
    However, the swap management structure has had problems
    historically:</para>

    <itemizedlist>
      <listitem>
        <para>Under &os; 3.X the swap management structure preallocates an
          array that encompasses the entire object requiring swap backing
          store—even if only a few pages of that object are
          swap-backed.  This creates a kernel memory fragmentation problem
          when large objects are mapped, or processes with large runsizes
          (RSS) fork.</para>
      </listitem>

      <listitem>
        <para>Also, in order to keep track of swap space, a <quote>list of
          holes</quote> is kept in kernel memory, and this tends to get
          severely fragmented as well.  Since the <quote>list of
          holes</quote> is a linear list, the swap allocation and freeing
          performance is a non-optimal O(n)-per-page.</para>
      </listitem>

      <listitem>
        <para>It requires kernel memory allocations to take place during
          the swap freeing process, and that creates low memory deadlock
          problems.</para>
      </listitem>

      <listitem>
        <para>The problem is further exacerbated by holes created due to
          the interleaving algorithm.</para>
      </listitem>

      <listitem>
        <para>Also, the swap block map can become fragmented fairly easily,
          resulting in non-contiguous allocations.</para>
      </listitem>

      <listitem>
        <para>Kernel memory must also be allocated on the fly for additional
          swap management structures when a swapout occurs.</para>
      </listitem>
    </itemizedlist>

    <para>It is evident from that list that there was plenty of room for
      improvement.  For &os; 4.X, I completely rewrote the swap
      subsystem:</para>

    <itemizedlist>
      <listitem>
        <para>Swap management structures are allocated through a hash
          table rather than a linear array, giving them a fixed allocation
          size and much finer granularity.</para>
      </listitem>

      <listitem>
        <para>Rather than using a linearly linked list to keep track of
          swap space reservations, it now uses a bitmap of swap blocks
          arranged in a radix tree structure with free-space hinting in
          the radix node structures.  This effectively makes swap
          allocation and freeing an O(1) operation.</para>
      </listitem>

      <listitem>
        <para>The entire radix tree bitmap is also preallocated in
          order to avoid having to allocate kernel memory during critical
          low memory swapping operations.  After all, the system tends to
          swap when it is low on memory, so we should avoid allocating
          kernel memory at such times in order to avoid potential
          deadlocks.</para>
      </listitem>

      <listitem>
        <para>To reduce fragmentation the radix tree is capable
          of allocating large contiguous chunks at once, skipping over
          smaller fragmented chunks.</para>
      </listitem>
    </itemizedlist>

    <para>I did not take the final step of having an
      <quote>allocating hint pointer</quote> that would trundle
      through a portion of swap as allocations were made in order to further
      guarantee contiguous allocations or at least locality of reference, but
      I ensured that such an addition could be made.</para>
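The free-space hinting idea can be sketched in miniature. What follows is a deliberately simplified, hypothetical two-level bitmap allocator, not the actual FreeBSD code: leaf bitmaps track individual swap blocks, and a per-leaf free count lets the allocator skip exhausted subtrees without scanning their bits. All names (`swap_alloc`, `swap_free`, `leaf_free`) are invented for illustration. Everything is preallocated, so neither allocation nor freeing ever needs kernel memory.

```c
#include <stdint.h>
#include <string.h>

#define LEAVES    8
#define LEAF_BITS 32
#define NBLOCKS   (LEAVES * LEAF_BITS)

static uint32_t leaf[LEAVES];      /* bit set = swap block in use   */
static int      leaf_free[LEAVES]; /* free-space hint for each leaf */

static void swap_init(void)
{
    memset(leaf, 0, sizeof(leaf));
    for (int i = 0; i < LEAVES; i++)
        leaf_free[i] = LEAF_BITS;  /* entire bitmap preallocated and free */
}

/* Allocate one swap block; returns a block number, or -1 if swap is
 * full.  The per-leaf hint lets us skip exhausted leaves outright. */
static int swap_alloc(void)
{
    for (int i = 0; i < LEAVES; i++) {
        if (leaf_free[i] == 0)
            continue;              /* hint says: nothing free here */
        for (int b = 0; b < LEAF_BITS; b++) {
            if (!(leaf[i] & (1u << b))) {
                leaf[i] |= 1u << b;
                leaf_free[i]--;
                return i * LEAF_BITS + b;
            }
        }
    }
    return -1;
}

/* Freeing only flips a bit and bumps the hint; no kernel memory
 * allocation can ever be needed on this path. */
static void swap_free(int blk)
{
    leaf[blk / LEAF_BITS] &= ~(1u << (blk % LEAF_BITS));
    leaf_free[blk / LEAF_BITS]++;
}
```

The real radix tree extends such hints upward through interior nodes, which is what makes allocation and freeing effectively O(1) and lets large contiguous requests skip fragmented subtrees; this flat sketch keeps only the leaf level for brevity.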
@ -431,7 +487,9 @@
    systems with very low cache queue counts and high active queue counts
    when doing a <command>systat -vm</command> command.  As the VM system
    becomes more stressed, it makes a greater effort to maintain the various
    page queues at the levels determined to be the most effective.</para>

    <para>An urban
      myth has circulated for years that Linux did a better job avoiding
      swapouts than &os;, but this in fact is not true.  What was actually
      occurring was that &os; was proactively paging out unused pages in
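The queue-maintenance behavior described above can be caricatured in a few lines. This is a hypothetical model, not FreeBSD's pageout daemon (the names, counts, and targets are invented, and the real code also launders dirty pages and scales its scan effort with the shortage): each pass moves just enough pages downstream (active to inactive to cache to free) to bring each queue back up to its target level.

```c
enum { ACTIVE, INACTIVE, CACHE, FREE, NQUEUES };

static int count[NQUEUES]  = { 900,  60,  10, 30 };  /* a stressed system  */
static int target[NQUEUES] = {   0, 150, 100, 50 };  /* tuned queue levels */

/* Move up to the deficit of queue dst from queue src. */
static void refill(int dst, int src)
{
    int deficit = target[dst] - count[dst];
    int moved = deficit < count[src] ? deficit : count[src];
    if (moved < 0)
        moved = 0;
    count[src] -= moved;
    count[dst] += moved;
}

/* One balancing pass: downstream queues are topped up first, so the
 * pressure propagates back onto the active queue. */
static void balance(void)
{
    refill(FREE, CACHE);
    refill(CACHE, INACTIVE);
    refill(INACTIVE, ACTIVE);
}
```

Under memory pressure almost everything sits in the active queue, so a pass like this drains the active queue to replenish the others; that is exactly the low-cache/high-active picture <command>systat -vm</command> shows on a stressed system.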
@ -623,6 +681,12 @@
        <qandaentry>
          <question>
            <para>How is the separation of clean and dirty (inactive) pages
              related to the situation where you see low cache queue counts and
              high active queue counts in <command>systat -vm</command>?  Do the
              systat stats roll the active and dirty pages together for the
              active queue count?</para>

            <para>I do not get the following:</para>

            <blockquote>
@ -635,12 +699,6 @@
              cache queue counts and high active queue counts when doing a
              <command>systat -vm</command> command.</para>
            </blockquote>
          </question>

          <answer>