Add report on vnode cache tuning from mckusick
parent 9cfe77d6fd
commit dda3e6e64d
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=47974
1 changed file with 108 additions and 0 deletions
@@ -558,4 +558,112 @@
portions and committed.</p>
</body>
</project>

<project cat='kern'>
<title>Kernel Vnode Cache Tuning</title>

<contact>
<person>
<name>
<given>Kirk</given>
<common>McKusick</common>
</name>
<email>mckusick@mckusick.com</email>
</person>

<person>
<name>
<given>Bruce</given>
<common>Evans</common>
</name>
<email>bde@FreeBSD.org</email>
</person>

<person>
<name>
<given>Konstantin</given>
<common>Belousov</common>
</name>
<email>kib@FreeBSD.org</email>
</person>

<person>
<name>
<given>Peter</given>
<common>Holm</common>
</name>
<email>pho@FreeBSD.org</email>
</person>

<person>
<name>
<given>Mateusz</given>
<common>Guzik</common>
</name>
<email>mjg@FreeBSD.org</email>
</person>
</contact>

<links>
<url href="https://reviews.FreeBSD.org/rS292895">MFC to stable/10</url>
</links>
<body>
<p>This completed project includes changes to better manage
the vnode freelist and to streamline the allocation and freeing of
vnodes.</p>

<p>Vnode cache recycling was reworked to meet targets for free and
unused vnodes. Free vnodes are rarely completely free; rather,
they are simply vnodes that are cheap to recycle. Usually they are
for files that have been stat'd but not read; these usually have
inode and namecache data attached to them. The free vnode target
is the preferred minimum size of a sub-cache consisting mostly of
such files. The system balances the size of this sub-cache against
its complement to try to prevent either from thrashing while the
other is relatively inactive. The targets express a preference
for the best balance.</p>
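<p>A minimal Python sketch of the free vnode target, assuming it is a plain percentage of the vnode limit (the 25% <tt>wantfreevnodes</tt> default quoted in this report). It also models the balance as "headroom": unused cache slots plus only those free vnodes in excess of the target, so that recycling does not drain the cheap-to-recycle sub-cache. The function names and the headroom definition are hypothetical illustrations, not code from <tt>vfs_subr.c</tt>.</p>

```python
# Toy model of the free vnode target; all names here are hypothetical.


def free_target(maxvnodes: int, percent: int = 25) -> int:
    """Preferred minimum size of the free (cheap-to-recycle) sub-cache."""
    return maxvnodes * percent // 100


def headroom(maxvnodes: int, numvnodes: int, freevnodes: int, wantfree: int) -> int:
    """Vnodes available without pushing the free sub-cache below its target.

    Unused cache slots always count. Free vnodes count only for the part
    above the target, so recycling does not drain the free sub-cache while
    its complement is relatively inactive.
    """
    unused = max(maxvnodes - numvnodes, 0)
    surplus_free = max(freevnodes - wantfree, 0)
    return unused + surplus_free


want = free_target(1000)
print(want)  # 250
# Cache not full: 100 unused slots, free list below its target.
print(headroom(1000, 900, 200, want))  # 100
# Cache exactly full: only the 50 free vnodes above the target count.
print(headroom(1000, 1000, 300, want))  # 50
```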
<p>"Above" this target there are two further targets
(watermarks) related to the recycling of free vnodes. In the
best-operating case, the cache is exactly full, the size of the free
list lies between vlowat and vhiwat above the free target, and
recycling from the free list together with normal use maintains this
state. Sometimes the free list is below vlowat or even empty, but this
state is even better for immediate use, provided the cache is not
full. Otherwise, vnlru_proc() runs to reclaim enough vnodes
(usually non-free ones) to reach one of these states. The
watermarks are currently hard-coded as 4% and 9% of the available
space, that is, the space outside the free vnode target. These, and
the default of 25% for wantfreevnodes, are too large on machines with
large memory. For example, 9% of 75% of MAXVNODES is more than
566,000 vnodes to reclaim whenever vnlru_proc() becomes active.</p>
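<p>The figures in the paragraph above can be checked with a small Python model. It assumes MAXVNODES is 8388608 (8M), a value consistent with the number quoted, computes both watermarks as percentages of the space left after the 25% free target, and treats "headroom" as unused cache slots plus free vnodes above the target. The helper names are hypothetical, and the kernel's own rounding may differ.</p>

```python
# Simplified model of the vnlru watermarks; names are hypothetical.
MAXVNODES = 8 * 1024 * 1024  # assumed 8M ceiling, see the lead-in


def watermarks(maxvnodes: int, wantfree_pct: int = 25,
               lowat_pct: int = 4, hiwat_pct: int = 9) -> tuple[int, int, int]:
    """Return (wantfree, vlowat, vhiwat) using the percentages in the report."""
    wantfree = maxvnodes * wantfree_pct // 100
    available = maxvnodes - wantfree  # space outside the free target (75%)
    vlowat = available * lowat_pct // 100
    vhiwat = available * hiwat_pct // 100
    return wantfree, vlowat, vhiwat


def vnlru_should_run(headroom: int, vlowat: int) -> bool:
    """Reclaim only when unused slots plus surplus free vnodes fall below vlowat."""
    return vlowat > headroom


def vnlru_reclaim_count(headroom: int, vhiwat: int) -> int:
    """Vnodes to reclaim (usually non-free ones) to raise headroom to vhiwat."""
    return max(vhiwat - headroom, 0)


wantfree, vlowat, vhiwat = watermarks(MAXVNODES)
print(wantfree, vlowat, vhiwat)  # 2097152 251658 566231
# Full cache with no free vnodes above the target: vnlru_proc() must run
# and reclaim all the way up to the high watermark.
print(vnlru_should_run(0, vlowat))  # True
print(vnlru_reclaim_count(0, vhiwat))  # 566231
```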
<p>The <tt>vfs.vlru_allow_cache_src</tt> sysctl has been removed.
The new code frees namecache sources only as a last resort to reach
the high watermark (vhiwat), instead of selecting source vnodes
randomly. This provides good enough behaviour to keep vn_fullpath()
working in most situations. Filesystem layouts with deep trees, which
previously required the removed knob, are thus handled
automatically.</p>
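<p>The reclaim order described above can be sketched as a two-pass scan: the first pass skips vnodes that are namecache sources (roughly, directories with cached names for entries inside them, which vn_fullpath() relies on), and only if that falls short of the target does a second pass take sources. The Vnode class, the is_nc_src flag and the reclaim() helper are hypothetical simplifications, not the kernel's data structures.</p>

```python
from dataclasses import dataclass


@dataclass
class Vnode:
    """Hypothetical stand-in for a reclaim candidate."""
    name: str
    is_nc_src: bool  # has namecache entries for names inside it


def reclaim(candidates: list[Vnode], needed: int) -> list[Vnode]:
    """Pick up to `needed` victims, taking namecache sources only as a last resort."""
    victims: list[Vnode] = []
    # Pass 1 takes ordinary vnodes, keeping sources so path lookups keep working.
    # Pass 2 takes sources, reached only if pass 1 could not meet the target.
    for take_sources in (False, True):
        for vp in candidates:
            if len(victims) >= needed:
                return victims
            if vp.is_nc_src == take_sources:
                victims.append(vp)
    return victims


pool = [Vnode("usr", True), Vnode("a.txt", False),
        Vnode("src", True), Vnode("b.txt", False)]
print([vp.name for vp in reclaim(pool, 1)])  # ['a.txt']
print([vp.name for vp in reclaim(pool, 3)])  # ['a.txt', 'b.txt', 'usr']
```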
<p>Previously, the kernel fully initialized each vnode on every
allocation and fully released it on every free. These are not
trivial costs: each allocation starts by zeroing a large structure,
then initializes a mutex, a lock manager lock, an rw lock, four lists,
and six pointers. As <tt>vfs.vnodes_created</tt> shows, these
operations are performed millions of times an hour on a busy
machine.</p>

<p>As a performance optimization, this change uses the
uma_init and uma_fini routines to do these initializations and
cleanups only as vnodes enter and leave the vnode zone. With
this change, the initializations are done <tt>kern.maxvnodes</tt>
times at system startup, and then only rarely again. The cleanups
are done only if the vnode zone shrinks, which never happens in
practice. For those curious about the avoided work, the
vnode_init() and vnode_fini() functions in sys/kern/vfs_subr.c
contain the code that has been removed from the main vnode
allocation/free path.</p>
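<p>The difference between per-allocation initialization and zone-level init/fini hooks can be illustrated with a small userland object cache. In the kernel the corresponding hooks are the init and fini callbacks passed to uma_zcreate(); the ObjectZone class and make_vnode() below are a hypothetical Python model that only counts how often the expensive setup runs, not an implementation of UMA.</p>

```python
class ObjectZone:
    """Toy object cache: setup runs when an item enters the zone, not per use."""

    def __init__(self, init, fini):
        self._init = init      # expensive setup, run when a new item is created
        self._fini = fini      # teardown, run only when the zone shrinks
        self._cached = []      # freed items, still initialized, kept for reuse
        self.init_calls = 0
        self.fini_calls = 0

    def alloc(self):
        if self._cached:
            return self._cached.pop()  # reuse without re-initializing
        self.init_calls += 1
        return self._init()

    def free(self, item):
        self._cached.append(item)  # no teardown on the hot path

    def shrink(self):
        """Release all cached items; the only place fini runs."""
        while self._cached:
            self._fini(self._cached.pop())
            self.fini_calls += 1


def make_vnode():
    # Hypothetical stand-in for zeroing the structure and setting up
    # its locks and lists.
    return {"lock": object(), "lists": [[], [], [], []]}


zone = ObjectZone(make_vnode, lambda vp: None)
for _ in range(100_000):
    vp = zone.alloc()
    zone.free(vp)
print(zone.init_calls, zone.fini_calls)  # 1 0
```

<p>This pays off only if the free path leaves each item in the state the init hook established, since the next allocation reuses it as is.</p>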
</body>
</project>
</report>