Various improvements to the use of English in the "Tuning Disks"
section.  This is based on a patch I sent to -doc on 7th February in
response to a message from Eric Ferguson.

Reviewed By:	keramida
Tom Hukins 2002-02-28 22:21:43 +00:00
parent 7978d9ab3b
commit 8430c3ef83
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=12331


@ -816,17 +816,16 @@ kern.maxfiles: 2088 -> 5000</screen>
</indexterm>
<para>The <varname>vfs.vmiodirenable</varname> sysctl variable
defaults to 1 (on) and may
be set to 0 (off) or 1 (on). This parameter controls how
may be set to either 0 (off) or 1 (on); it is 1 by default. This variable controls how
directories are cached by the system. Most directories are
small and use but a single fragment (typically 1K) in the
filesystem and even less (typically 512 bytes) in the buffer
small, using just a single fragment (typically 1K) in the
filesystem and less (typically 512 bytes) in the buffer
cache. However, when operating in the default mode the buffer
cache will only cache a fixed number of directories even if
you have a huge amount of memory. Turning on this sysctl
allows the buffer cache to use the VM Page Cache to cache the
directories. The advantage is that all of memory is now
available for caching directories. The disadvantage is that
directories, making all the memory available for caching
directories. However,
the minimum in-core memory used to cache a directory is the
physical page size (typically 4K) rather than 512 bytes. We
recommend turning this option on if you are running any
@ -847,15 +846,15 @@ kern.maxfiles: 2088 -> 5000</screen>
<para>FreeBSD 4.3 flirted with turning off IDE write caching.
This reduced write bandwidth to IDE disks but was considered
necessary due to serious data consistency issues introduced
by hard drive vendors. Basically the problem is that IDE
by hard drive vendors. The problem is that IDE
drives lie about when a write completes. With IDE write
caching turned on, IDE hard drives will not only write data
to disk out of order, they will sometimes delay some of the
caching turned on, IDE hard drives not only write data
to disk out of order, but will sometimes delay writing some
blocks indefinitely when under heavy disk loads. A crash or
power failure can result in serious filesystem corruption.
So our default was changed to be safe. Unfortunately, the
result was such a huge loss in performance that we caved in
and changed the default back to on after the release. You
power failure may cause serious filesystem corruption.
FreeBSD's default was changed to be safe. Unfortunately, the
result was such a huge performance loss that we changed
write caching back to on by default after the release. You
should check the default on your system by observing the
<varname>hw.ata.wc</varname> sysctl variable. If IDE write
caching is turned off, you can turn it back on by setting
@ -898,44 +897,44 @@ kern.maxfiles: 2088 -> 5000</screen>
updating the physical disk. If your system crashes you may lose more
work than otherwise. Secondly, Soft Updates delays the freeing of
filesystem blocks. If you have a filesystem (such as the root
filesystem) which is close to full, doing a major update of it, e.g.
<command>make installworld</command>, can run it out of space and
cause the update to fail.</para>
filesystem) which is almost full, performing a major update, such as
<command>make installworld</command>, can cause the filesystem to run
out of space and the update to fail.</para>
<sect3>
<title>More details about Soft Updates</title>
<indexterm><primary>Soft Updates (Details)</primary></indexterm>
<para>There are two classical approaches how to write metadata of
a filesystem back to disk. (Metadata updates are updates to
<para>There are two traditional approaches to writing a filesystem's meta-data
back to disk. (Metadata updates are updates to
non-content data like i-nodes or directories.)</para>
<para>Historically, the default behaviour was to write out
metadata updates synchronously. If a directory had been
changed, the system waited until the change was actually
written to disk. The file data buffers (file contents) have
been passed through the buffer cache however, and backed up
written to disk. The file data buffers (file contents) were
passed through the buffer cache and backed up
to disk later on asynchronously. The advantage of this
implementation is that it is operating very safely. If there is
a failure during an update the metadata are always in a
consistent state. A file has either been completely created
implementation is that it operates safely. If there is
a failure during an update, the meta-data are always in a
consistent state. A file is either created completely
or not at all. If the data blocks of a file did not find
their way out of the buffer cache onto the disk by the time
of the crash, &man.fsck.8; is able to recognize this and to
repair the filesystem (e. g. the file length will be set to
of the crash, &man.fsck.8; is able to recognize this and
repair the filesystem by setting the file length to
0). Additionally, the implementation is clear and simple.
The disadvantage is that metadata changes are very slow. A
<command>rm -r</command> for instance touches all files of a
directory sequentially, but every single one of these directory
changes (deletion of a file) will be written synchronously
The disadvantage is that meta-data changes are slow. An
<command>rm -r</command>, for instance, touches all the files in a
directory sequentially, but each directory
change (deletion of a file) will be written synchronously
to the disk. This includes updates to the directory itself,
to the i-node table, and possibly to indirect blocks
allocated by the file. Similar considerations apply for
unrolling large hierarchies (<command>tar -x</command>).</para>
<para>The second case are asynchronous metadata updates. This
is e. g. the default for Linux/ext2fs or achieved by
<para>The second case is asynchronous meta-data updates. This
is the default for Linux/ext2fs and
<command>mount -o async</command> for *BSD ufs. All
metadata updates are simply being passed through the buffer
cache too, that is, they will be intermixed with the updates
@ -951,32 +950,32 @@ kern.maxfiles: 2088 -> 5000</screen>
that updated large amounts of metadata (like a power
failure, or someone pressing the reset button),
the file system
will be left in an unpredictable state. There is no chance
will be left in an unpredictable state. There is no opportunity
to examine the state of the file system when the system
comes up again; the data blocks of a file could already have
been written to the disk while the updates of the i-node
table or the associated directory were not. It is actually
impossible to implement a <command>fsck</command> which is
able to clean up the resulting chaos (because the necessary
information is just not available on the disk). If the
information is not available on the disk). If the
filesystem has been damaged beyond repair, the only choice
is to <command>newfs</command> it and restore it from backup.
</para>
<para>The usual solution for this problem was to implement a
<emphasis>dirty region logging</emphasis> (sometimes also
referred to as <emphasis>journalling</emphasis>, albeit that
term has not been used consistently and occasionally applied
to other forms of transaction logging as well). Metadata
updates are still written out synchronously, but only into a
small region of the disk. Later on they will be distributed
from there to their proper location. Because the logging
area is only a small, contiguous region on the disk, there
<para>The usual solution for this problem was to implement
<emphasis>dirty region logging</emphasis>, which is also
referred to as <emphasis>journaling</emphasis>, although that
term is not used consistently and is occasionally applied
to other forms of transaction logging as well. Meta-data
updates are still written synchronously, but only into a
small region of the disk. Later on they will be moved
to their proper location. Because the logging
area is a small, contiguous region on the disk, there
are no long distances for the disk heads to move, even
during heavy operations, so these operations are accelerated
quite a bit compared to the classical synchronous updates.
during heavy operations, so these operations are quicker
than synchronous updates.
Additionally the complexity of the implementation is fairly
limited and thus the risk for bugs still low. A disadvantage
limited, so the risk of bugs being present is low. A disadvantage
is that all metadata are written twice (once into the
logging region and once to the proper location) so for
normal work, a performance <quote>pessimization</quote>
@ -985,42 +984,42 @@ kern.maxfiles: 2088 -> 5000</screen>
or completed from the logging area after the system comes
up again, resulting in a fast filesystem startup.</para>
<para>Now, Kirk McKusick's (the developer of Berkeley FFS)
solution to the problem are Soft Updates: all pending
<para>Kirk McKusick, the developer of Berkeley FFS,
solved this problem with Soft Updates: all pending
metadata updates are kept in memory and written out to disk
in a sorted sequence (<quote>ordered metadata
updates</quote>). This has the effect that, in case of
heavy metadata operations, later updates of a certain item
<quote>catch</quote> the earlier ones if those are still in
heavy meta-data operations, later updates to an item
<quote>catch</quote> the earlier ones if the earlier ones are still in
memory and have not already been written to disk. So all
operations on, say, a directory are generally done still in
operations on, say, a directory are generally performed in
memory before the update is written to disk (the data
blocks are sorted to their according position as well so
blocks are sorted according to their position so
that they will not be on the disk ahead of their metadata).
In case of a crash this causes an implicit <quote>log
If the system crashes, this causes an implicit <quote>log
rewind</quote>: all operations which did not find their way
to the disk appear as if they had never happened. A
consistent filesystem state is maintained that appears to
be the one of 30--60 seconds earlier. The
algorithm used guarantees that all actually used resources
be the one of 30 to 60 seconds earlier. The
algorithm used guarantees that all resources in use
are marked as such in their appropriate bitmaps: blocks and i-nodes.
After a crash, the only resource allocation error
that occur are that resources are
marked as <quote>used</quote> which actually are <quote>free</quote>.
&man.fsck.8; then recognizes this situation,
and free up those no longer used resources. It is safe to
ignore the dirty state of the filesystem after a crash, by
that occurs is that resources are
marked as <quote>used</quote> which are actually <quote>free</quote>.
&man.fsck.8; recognizes this situation,
and frees the resources that are no longer used. It is safe to
ignore the dirty state of the filesystem after a crash by
forcibly mounting it with <command>mount -f</command>. In
order to free up possibly unused resources, &man.fsck.8;
order to free up resources that may be unused, &man.fsck.8;
needs to be run at a later time. This is the idea behind
the <emphasis>background fsck</emphasis>: at system startup
time, only a <emphasis>snapshot</emphasis> from the
filesystem is recorded, that <command>fsck</command> can be
run against later on. All filesystems can then be mounted
<quote>dirty</quote>, and system startup proceeds to
time, only a <emphasis>snapshot</emphasis> of the
filesystem is recorded, the <command>fsck</command> can be
run later on. All filesystems can then be mounted
<quote>dirty</quote>, so the system startup proceeds in
multiuser mode. Then, background <command>fsck</command>s
will be scheduled for all filesystems that need it, to free
up possibly unused resources. (Filesystems that do not use
will be scheduled for all filesystems where this is required, to free
resources that may be unused. (Filesystems that do not use
soft updates still need the usual foreground
<command>fsck</command> though.)</para>
@ -1031,18 +1030,18 @@ kern.maxfiles: 2088 -> 5000</screen>
the code (implying a higher risk for bugs in an area that
is highly sensitive regarding loss of user data), and a
higher memory consumption. Additionally there are some
<quote>idiosyncrasies</quote> one has to get used to.
idiosyncrasies one has to get used to.
After a crash, the state of the filesystem appears to be
somewhat <quote>older</quote>; e. g. in situations where
somewhat <quote>older</quote>. In situations where
the standard synchronous approach would have caused some
zero-length files to remain after the
<command>fsck</command>, these files do not exist at all
with a soft updates filesystem because neither the metadata
nor the file contents have ever been written to disk.
After a <command>rm</command>, the released disk space is
not instantly available but only after the updates have
written to disk. This can in particular cause problems
when installing large amounts of data into a filesystem
Disk space is not released until the updates have been
written to disk, which may take place some time after
running <command>rm</command>. This may cause problems
when installing large amounts of data on a filesystem
that does not have enough free space to hold all the files
twice.</para>
</sect3>