Various improvements to the use of English in the "Tuning Disks"
section. This is based on a patch I sent to -doc on 7th February in
response to a message from Eric Ferguson.

Reviewed By: keramida
parent 7978d9ab3b
commit 8430c3ef83
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=12331

1 changed file with 71 additions and 72 deletions

		|  | @ -816,17 +816,16 @@ kern.maxfiles: 2088 -> 5000</screen> | |||
| 	</indexterm> | ||||
| 	 | ||||
| 	<para>The <varname>vfs.vmiodirenable</varname> sysctl variable | ||||
| 	  defaults to 1 (on) and may | ||||
| 	  be set to 0 (off) or 1 (on).  This parameter controls how | ||||
| 	  may be set to either 0 (off) or 1 (on); it is 1 by default.  This variable controls how | ||||
| 	  directories are cached by the system.  Most directories are | ||||
| 	  small and use but a single fragment (typically 1K) in the | ||||
| 	  filesystem and even less (typically 512 bytes) in the buffer | ||||
| 	  small, using just a single fragment (typically 1K) in the | ||||
| 	  filesystem and less (typically 512 bytes) in the buffer | ||||
| 	  cache.  However, when operating in the default mode the buffer | ||||
| 	  cache will only cache a fixed number of directories even if | ||||
| 	  you have a huge amount of memory.  Turning on this sysctl | ||||
| 	  allows the buffer cache to use the VM Page Cache to cache the | ||||
| 	  directories.  The advantage is that all of memory is now | ||||
| 	  available for caching directories.  The disadvantage is that | ||||
| 	  directories, making all the memory available for caching | ||||
| 	  directories.  However, | ||||
| 	  the minimum in-core memory used to cache a directory is the | ||||
| 	  physical page size (typically 4K) rather than 512 bytes.  We | ||||
| 	  recommend turning this option on if you are running any | ||||
|  | @ -847,15 +846,15 @@ kern.maxfiles: 2088 -> 5000</screen> | |||
| 	<para>FreeBSD 4.3 flirted with turning off IDE write caching. | ||||
| 	  This reduced write bandwidth to IDE disks but was considered | ||||
| 	  necessary due to serious data consistency issues introduced | ||||
| 	  by hard drive vendors.  Basically the problem is that IDE | ||||
| 	  by hard drive vendors.  The problem is that IDE | ||||
| 	  drives lie about when a write completes.  With IDE write | ||||
| 	  caching turned on, IDE hard drives will not only write data | ||||
| 	  to disk out of order, they will sometimes delay some of the | ||||
| 	  caching turned on, IDE hard drives not only write data | ||||
| 	  to disk out of order, but will sometimes delay writing some | ||||
| 	  blocks indefinitely when under heavy disk loads.  A crash or | ||||
| 	  power failure can result in serious filesystem corruption. | ||||
| 	  So our default was changed to be safe.  Unfortunately, the | ||||
| 	  result was such a huge loss in performance that we caved in | ||||
| 	  and changed the default back to on after the release.  You | ||||
| 	  power failure may cause serious filesystem corruption. | ||||
| 	  FreeBSD's default was changed to be safe.  Unfortunately, the | ||||
| 	  result was such a huge performance loss that we changed | ||||
| 	  write caching back to on by default after the release.  You | ||||
| 	  should check the default on your system by observing the | ||||
| 	  <varname>hw.ata.wc</varname> sysctl variable.  If IDE write | ||||
| 	  caching is turned off, you can turn it back on by setting | ||||
|  | @ -898,44 +897,44 @@ kern.maxfiles: 2088 -> 5000</screen> | |||
|         updating the physical disk.  If your system crashes you may lose more | ||||
|         work than otherwise.  Secondly, Soft Updates delays the freeing of | ||||
|         filesystem blocks.  If you have a filesystem (such as the root | ||||
|         filesystem) which is close to full, doing a major update of it, e.g. | ||||
|         <command>make installworld</command>, can run it out of space and | ||||
|         cause the update to fail.</para> | ||||
| 	filesystem) which is almost full, performing a major update, such as | ||||
|         <command>make installworld</command>, can cause the filesystem to run | ||||
| 	out of space and the update to fail.</para> | ||||
| 
 | ||||
|       <sect3> | ||||
| 	<title>More details about Soft Updates</title> | ||||
| 	 | ||||
| 	<indexterm><primary>Soft Updates (Details)</primary></indexterm> | ||||
| 
 | ||||
| 	<para>There are two classical approaches how to write metadata of | ||||
|     	  a filesystem back to disk.  (Metadata updates are updates to | ||||
| 	<para>There are two traditional approaches to writing a filesystem's meta-data | ||||
|     	  back to disk.  (Metadata updates are updates to | ||||
| 	  non-content data like i-nodes or directories.)</para> | ||||
| 	 | ||||
| 	<para>Historically, the default behaviour was to write out | ||||
| 	  metadata updates synchronously.  If a directory had been | ||||
| 	  changed, the system waited until the change was actually | ||||
| 	  written to disk.  The file data buffers (file contents) have | ||||
| 	  been passed through the buffer cache however, and backed up | ||||
| 	  written to disk.  The file data buffers (file contents) were | ||||
| 	  passed through the buffer cache and backed up | ||||
| 	  to disk later on asynchronously.  The advantage of this | ||||
| 	  implementation is that it is operating very safely.  If there is | ||||
| 	  a failure during an update the metadata are always in a | ||||
| 	  consistent state.  A file has either been completely created | ||||
| 	  implementation is that it operates safely.  If there is | ||||
| 	  a failure during an update, the meta-data are always in a | ||||
| 	  consistent state.  A file is either created completely | ||||
| 	  or not at all.  If the data blocks of a file did not find | ||||
| 	  their way out of the buffer cache onto the disk by the time | ||||
| 	  of the crash, &man.fsck.8; is able to recognize this and to | ||||
| 	  repair the filesystem (e. g. the file length will be set to | ||||
| 	  of the crash, &man.fsck.8; is able to recognize this and | ||||
| 	  repair the filesystem by setting the file length to | ||||
| 	  0).  Additionally, the implementation is clear and simple. | ||||
| 	  The disadvantage is that metadata changes are very slow.  A | ||||
| 	  <command>rm -r</command> for instance touches all files of a | ||||
| 	  directory sequentially, but every single one of these directory | ||||
| 	  changes (deletion of a file) will be written synchronously | ||||
| 	  The disadvantage is that meta-data changes are slow.  An | ||||
| 	  <command>rm -r</command>, for instance, touches all the files in a | ||||
| 	  directory sequentially, but each directory | ||||
| 	  change (deletion of a file) will be written synchronously | ||||
| 	  to the disk.  This includes updates to the directory itself, | ||||
| 	  to the i-node table, and possibly to indirect blocks | ||||
| 	  allocated by the file.  Similar considerations apply for | ||||
| 	  unrolling large hierachies (<command>tar -x</command>).</para> | ||||
| 
 | ||||
| 	<para>The second case are asynchronous metadata updates.  This | ||||
|   	  is e. g. the default for Linux/ext2fs or achieved by | ||||
| 	<para>The second case is asynchronous meta-data updates.  This | ||||
|   	  is the default for Linux/ext2fs and | ||||
|   	  <command>mount -o async</command> for *BSD ufs.  All | ||||
|   	  metadata updates are simply being passed through the buffer | ||||
|   	  cache too, that is, they will be intermixed with the updates | ||||
|  | @ -951,32 +950,32 @@ kern.maxfiles: 2088 -> 5000</screen> | |||
|   	  that updated large amounts of metadata (like a power | ||||
|   	  failure, or someone pressing the reset button), | ||||
| 	  the file system | ||||
|   	  will be left in an unpredictable state.  There is no chance | ||||
|   	  will be left in an unpredictable state.  There is no opportunity | ||||
|   	  to examine the state of the file system when the system | ||||
|   	  comes up again; the data blocks of a file could already have | ||||
|   	  been written to the disk while the updates of the i-node | ||||
|   	  table or the associated directory were not.  It is actually | ||||
|   	  impossible to implement a <command>fsck</command> which is | ||||
|   	  able to clean up the resulting chaos (because the necessary | ||||
|   	  information is just not available on the disk).  If the | ||||
|   	  information is not available on the disk).  If the | ||||
| 	  filesystem has been damaged beyond repair, the only choice | ||||
| 	  is to <command>newfs</command> it and restore it from backup. | ||||
| 	  </para> | ||||
| 
 | ||||
| 	<para>The usual solution for this problem was to implement a | ||||
| 	  <emphasis>dirty region logging</emphasis> (sometimes also | ||||
| 	  referred to as <emphasis>journalling</emphasis>, albeit that | ||||
| 	  term has not been used consistently and occasionally applied | ||||
| 	  to other forms of transaction logging as well).  Metadata | ||||
| 	  updates are still written out synchronously, but only into a | ||||
| 	  small region of the disk.  Later on they will be distributed | ||||
| 	  from there to their proper location.  Because the logging | ||||
| 	  area is only a small, contiguous region on the disk, there | ||||
| 	<para>The usual solution for this problem was to implement | ||||
| 	  <emphasis>dirty region logging</emphasis>, which is also | ||||
| 	  referred to as <emphasis>journaling</emphasis>, although that | ||||
| 	  term is not used consistently and is occasionally applied | ||||
| 	  to other forms of transaction logging as well.  Meta-data | ||||
| 	  updates are still written synchronously, but only into a | ||||
| 	  small region of the disk.  Later on they will be moved | ||||
| 	  to their proper location.  Because the logging | ||||
| 	  area is a small, contiguous region on the disk, there | ||||
| 	  are no long distances for the disk heads to move, even | ||||
| 	  during heavy operations, so these operations are accelerated | ||||
| 	  quite a bit compared to the classical synchronous updates. | ||||
| 	  during heavy operations, so these operations are quicker | ||||
| 	  than synchronous updates. | ||||
| 	  Additionally the complexity of the implementation is fairly | ||||
| 	  limited and thus the risk for bugs still low.  A disadvatage | ||||
| 	  limited, so the risk of bugs being present is low.  A disadvatage | ||||
| 	  is that all metadata are written twice (once into the | ||||
| 	  logging region and once to the proper location) so for | ||||
| 	  normal work, a performance <quote>pessimization</quote> | ||||
|  | @ -985,42 +984,42 @@ kern.maxfiles: 2088 -> 5000</screen> | |||
| 	  or completed from the logging area after the system comes | ||||
| 	  up again, resulting in a fast filesystem startup.</para> | ||||
|       | ||||
| 	<para>Now, Kirk McKusick's (the developer of Berkeley FFS) | ||||
| 	   solution to the problem are Soft Updates: all pending | ||||
| 	<para>Kirk McKusick, the developer of Berkeley FFS, | ||||
| 	   solved this problem with Soft Updates: all pending | ||||
| 	   metadata updates are kept in memory and written out to disk | ||||
| 	   in a sorted sequence (<quote>ordered metadata | ||||
| 	   updates</quote>).  This has the effect that, in case of | ||||
| 	   heavy metadata operations, later updates of a certain item | ||||
| 	   <quote>catch</quote> the earlier ones if those are still in | ||||
| 	   heavy meta-data operations, later updates to an item | ||||
| 	   <quote>catch</quote> the earlier ones if the earlier ones are still in | ||||
| 	   memory and have not already been written to disk.  So all | ||||
| 	   operations on, say, a directory are generally done still in | ||||
| 	   operations on, say, a directory are generally performed in | ||||
| 	   memory before the update is written to disk (the data | ||||
| 	   blocks are sorted to their according position as well so | ||||
| 	   blocks are sorted according to their position so | ||||
| 	   that they will not be on the disk ahead of their metadata). | ||||
| 	   In case of a crash this causes an implicit <quote>log | ||||
| 	   If the system crashes, this causes an implicit <quote>log | ||||
| 	   rewind</quote>: all operations which did not find their way | ||||
| 	   to the disk appear as if they had never happened.  A | ||||
| 	   consistent filesystem state is maintained that appears to | ||||
| 	   be the one of 30--60 seconds earlier.  The | ||||
| 	   algorithm used guarantees that all actually used resources | ||||
| 	   be the one of 30 to 60 seconds earlier.  The | ||||
| 	   algorithm used guarantees that all resources in use | ||||
| 	   are marked as such in their appropriate bitmaps: blocks and i-nodes. | ||||
| 	   After a crash, the only resource allocation error | ||||
| 	   that occur are that resources are | ||||
| 	   marked as <quote>used</quote> which actually are <quote>free</quote>. | ||||
| 	   &man.fsck.8; then recognizes this situation, | ||||
| 	   and free up those no longer used resources.  It is safe to | ||||
| 	   ignore the dirty state of the filesystem after a crash, by | ||||
| 	   that occurs is that resources are | ||||
| 	   marked as <quote>used</quote> which are actually <quote>free</quote>. | ||||
| 	   &man.fsck.8; recognizes this situation, | ||||
| 	   and frees the resources that are no longer used.  It is safe to | ||||
| 	   ignore the dirty state of the filesystem after a crash by | ||||
| 	   forcibly mounting it with <command>mount -f</command>.  In | ||||
| 	   order to free up possibly unused resources, &man.fsck.8; | ||||
| 	   order to free up resources that may be unused, &man.fsck.8; | ||||
| 	   needs to be run at a later time.  This is the idea behind | ||||
| 	   the <emphasis>background fsck</emphasis>: at system startup | ||||
| 	   time, only a <emphasis>snapshot</emphasis> from the | ||||
| 	   filesystem is recorded, that <command>fsck</command> can be | ||||
| 	   run against later on.  All filesystems can then be mounted | ||||
| 	   <quote>dirty</quote>, and system startup proceeds to | ||||
| 	   time, only a <emphasis>snapshot</emphasis> of the | ||||
| 	   filesystem is recorded, the <command>fsck</command> can be | ||||
| 	   run later on.  All filesystems can then be mounted | ||||
| 	   <quote>dirty</quote>, so the system startup proceeds in | ||||
| 	   multiuser mode.  Then, background <command>fsck</command>s | ||||
| 	   will be scheduled for all filesystems that need it, to free | ||||
| 	   up possibly unused resources.  (Filesystems that do not use | ||||
| 	   will be scheduled for all filesystems where this is required, to free | ||||
| 	   resources that may be unused.  (Filesystems that do not use | ||||
| 	   soft updates still need the usual foreground | ||||
| 	   <command>fsck</command> though.)</para> | ||||
| 
 | ||||
|  | @ -1031,18 +1030,18 @@ kern.maxfiles: 2088 -> 5000</screen> | |||
| 	   the code (implying a higher risk for bugs in an area that | ||||
| 	   is highly sensitive regarding loss of user data), and a | ||||
| 	   higher memory consumption.  Additionally there are some | ||||
| 	   <quote>idiosyncrasies</quote> one has to get used to. | ||||
| 	   idiosyncrasies one has to get used to. | ||||
| 	   After a crash, the state of the filesystem appears to be | ||||
| 	   somewhat <quote>older</quote>; e. g. in situations where | ||||
| 	   somewhat <quote>older</quote>.  In situations where | ||||
| 	   the standard synchronous approach would have caused some | ||||
| 	   zero-length files to remain after the | ||||
| 	   <command>fsck</command>, these files do not exist at all | ||||
| 	   with a soft updates filesystem because neither the metadata | ||||
| 	   nor the file contents have ever been written to disk. | ||||
| 	   After a <command>rm</command>, the released disk space is | ||||
| 	   not instantly available but only after the updates have | ||||
| 	   written to disk.  This can in particular cause problems | ||||
| 	   when installing large amounts of data into a filesystem | ||||
| 	   Disk space is not released until the updates have been | ||||
| 	   written to disk, which may take place some time after | ||||
| 	   running <command>rm</command>.  This may cause problems | ||||
| 	   when installing large amounts of data on a filesystem | ||||
| 	   that does not have enough free space to hold all the files | ||||
| 	   twice.</para> | ||||
|       </sect3> | ||||
|  |  | |||