Editorial review of first 1/2 of HAST chapter.

Sponsored by:	iXsystems
Dru Lavigne 2014-04-08 15:18:09 +00:00
parent aa33908aca
commit c9c8b80069
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=44485


@@ -3297,7 +3297,7 @@ Device 1K-blocks Used Avail Capacity
<sect1 xml:id="disks-hast">
<info>
<title>Highly Available Storage (<acronym>HAST</acronym>)</title>
<authorgroup>
<author>
@@ -3348,75 +3348,24 @@ Device 1K-blocks Used Avail Capacity
<para>High availability is one of the main requirements in
serious business applications and highly-available storage is a
key component in such environments. In &os;, the Highly Available STorage
(<acronym>HAST</acronym>)
framework allows transparent storage of
the same data across several physically separated machines
connected by a <acronym>TCP/IP</acronym> network. <acronym>HAST</acronym> can be
understood as a network-based RAID1 (mirror), and is similar to
the DRBD&reg; storage system used in the GNU/&linux;
platform. In combination with other high-availability features
of &os; like <acronym>CARP</acronym>, <acronym>HAST</acronym>
makes it possible to build a highly-available storage cluster
that is resistant to hardware failures.</para>
<para>The following are the main features of
<acronym>HAST</acronym>:</para>
<itemizedlist>
<listitem>
<para>Can be used to mask <acronym>I/O</acronym> errors on local hard
drives.</para>
</listitem>
@@ -3426,9 +3375,9 @@ Device 1K-blocks Used Avail Capacity
</listitem>
<listitem>
<para>Efficient and quick resynchronization as
only the blocks that were modified during the downtime of a
node are synchronized.</para>
</listitem>
<!--
@@ -3450,64 +3399,94 @@ Device 1K-blocks Used Avail Capacity
system.</para>
</listitem>
</itemizedlist>
<para>After reading this section, you will know:</para>
<itemizedlist>
<listitem>
<para>What <acronym>HAST</acronym> is, how it works, and
which features it provides.</para>
</listitem>
<listitem>
<para>How to set up and use <acronym>HAST</acronym> on
&os;.</para>
</listitem>
<listitem>
<para>How to integrate <acronym>CARP</acronym> and
&man.devd.8; to build a robust storage system.</para>
</listitem>
</itemizedlist>
<para>Before reading this section, you should:</para>
<itemizedlist>
<listitem>
<para>Understand &unix; and &os; basics (<xref
linkend="basics"/>).</para>
</listitem>
<listitem>
<para>Know how to configure network
interfaces and other core &os; subsystems (<xref
linkend="config-tuning"/>).</para>
</listitem>
<listitem>
<para>Have a good understanding of &os;
networking (<xref
linkend="network-communication"/>).</para>
</listitem>
</itemizedlist>
<para>The <acronym>HAST</acronym> project was sponsored by The
&os; Foundation with support from <link
xlink:href="http://www.omc.net/">http://www.omc.net/</link> and <link
xlink:href="http://www.transip.nl/">http://www.transip.nl/</link>.</para>
<sect2>
<title>HAST Operation</title>
<para><acronym>HAST</acronym> provides synchronous
block-level replication between two
physical machines:
the <emphasis>primary</emphasis>, also known as the
<emphasis>master</emphasis> node, and the
<emphasis>secondary</emphasis>, or <emphasis>slave</emphasis>,
node. These two machines together are referred to as a
cluster.</para>
<para>Since <acronym>HAST</acronym> works in a
primary-secondary configuration, it allows only one of the
cluster nodes to be active at any given time. The
primary node, also called
<emphasis>active</emphasis>, is the one which will handle all
the <acronym>I/O</acronym> requests to <acronym>HAST</acronym>-managed
devices. The secondary node is
automatically synchronized from the primary
node.</para>
<para>The physical components of the <acronym>HAST</acronym>
system are the local disk on the primary node, and the
disk on the remote, secondary node.</para>
<para><acronym>HAST</acronym> operates synchronously on a block
level, making it transparent to file systems and applications.
<acronym>HAST</acronym> provides regular GEOM providers in
<filename>/dev/hast/</filename> for use by
other tools or applications. There is no difference
between using <acronym>HAST</acronym>-provided devices and
raw disks or partitions.</para>
<para>Each write, delete, or flush operation is sent to both the
local disk and to the remote disk over <acronym>TCP/IP</acronym>. Each read
operation is served from the local disk, unless the local disk
is not up-to-date or an <acronym>I/O</acronym> error occurs. In such cases, the
read operation is sent to the secondary node.</para>
<para><acronym>HAST</acronym> tries to provide fast failure
recovery. For this reason, it is important to reduce
synchronization time after a node's outage. To provide fast
synchronization, <acronym>HAST</acronym> manages an on-disk
bitmap of dirty extents and only synchronizes those during a
@@ -3520,29 +3499,29 @@ Device 1K-blocks Used Avail Capacity
<itemizedlist>
<listitem>
<para><emphasis>memsync</emphasis>: This mode reports a write operation
as completed when the local write operation is finished
and when the remote node acknowledges data arrival, but
before actually storing the data. The data on the remote
node will be stored directly after sending the
acknowledgement. This mode is intended to reduce
latency, but still provides good
reliability.</para>
</listitem>
<listitem>
<para><emphasis>fullsync</emphasis>: This mode reports a write
operation as completed when both the local write and the
remote write complete. This is the safest and the
slowest replication mode. This mode is the
default.</para>
</listitem>
<listitem>
<para><emphasis>async</emphasis>: This mode reports a write operation as
completed when the local write completes. This is the
fastest and the most dangerous replication mode. It
should only be used when replicating to a distant node where
latency is too high for other modes.</para>
</listitem>
</itemizedlist>
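<para>The replication mode for a resource is selected with the
<literal>replication</literal> keyword in &man.hast.conf.5;.
As an illustrative sketch, a resource entry for a hypothetical
resource named <replaceable>test</replaceable> could request
<literal>memsync</literal> instead of the default
<literal>fullsync</literal>:</para>
<programlisting>resource <replaceable>test</replaceable> {
	replication memsync
	...
}</programlisting>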
@@ -3551,65 +3530,64 @@ Device 1K-blocks Used Avail Capacity
<sect2>
<title>HAST Configuration</title>
<para>The <acronym>HAST</acronym> framework consists of several
components:</para>
<itemizedlist>
<listitem>
<para>The &man.hastd.8; daemon which provides data
synchronization. When this daemon is started, it will
automatically load <varname>geom_gate.ko</varname>.</para>
</listitem>
<listitem>
<para>The userland management
utility, &man.hastctl.8;.</para>
</listitem>
<listitem>
<para>The &man.hast.conf.5; configuration file. This file
must exist before starting
<application>hastd</application>.</para>
</listitem>
</itemizedlist>
<para>Users who prefer to statically build
<literal>GEOM_GATE</literal> support into the kernel
should add this line to the custom kernel configuration
file, then rebuild the kernel using the instructions in <xref
linkend="kernelconfig"/>:</para>
<programlisting>options GEOM_GATE</programlisting>
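<para>Since &man.hastd.8; loads <varname>geom_gate.ko</varname>
automatically, loading the module by hand is optional. On a
system running the default <literal>GENERIC</literal> kernel, it
can still be loaded and checked manually, for example to confirm
that it is available before starting the daemon:</para>
<screen>&prompt.root; <userinput>kldload geom_gate</userinput>
&prompt.root; <userinput>kldstat | grep geom_gate</userinput></screen>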
<para>The following example describes how to configure two nodes
in master-slave/primary-secondary
operation using <acronym>HAST</acronym> to replicate the data
between the two. The nodes will be called
<literal>hasta</literal>, with an <acronym>IP</acronym> address of
<literal>172.16.0.1</literal>, and
<literal>hastb</literal>, with an <acronym>IP</acronym> address of
<literal>172.16.0.2</literal>. Both nodes will have a
dedicated hard drive <filename>/dev/ad6</filename> of the same
size for <acronym>HAST</acronym> operation. The
<acronym>HAST</acronym> pool, sometimes referred to as a
resource or the <acronym>GEOM</acronym> provider in
<filename class="directory">/dev/hast/</filename>, will be called
<literal>test</literal>.</para>
<para>Configuration of <acronym>HAST</acronym> is done using
<filename>/etc/hast.conf</filename>. This file should be
identical on both nodes. The simplest configuration
is:</para>
<programlisting>resource <replaceable>test</replaceable> {
	on <replaceable>hasta</replaceable> {
		local <replaceable>/dev/ad6</replaceable>
		remote <replaceable>172.16.0.2</replaceable>
	}
	on <replaceable>hastb</replaceable> {
		local <replaceable>/dev/ad6</replaceable>
		remote <replaceable>172.16.0.1</replaceable>
	}
}</programlisting>
@@ -3618,18 +3596,18 @@ Device 1K-blocks Used Avail Capacity
<tip>
<para>It is also possible to use host names in the
<literal>remote</literal> statements if
the hosts are resolvable and defined either in
<filename>/etc/hosts</filename> or in the local
<acronym>DNS</acronym>.</para>
</tip>
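<para>For example, adding these entries to
<filename>/etc/hosts</filename> on both nodes would make the
host names from this example resolvable:</para>
<programlisting>172.16.0.1	hasta
172.16.0.2	hastb</programlisting>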
<para>Once the configuration exists on both nodes,
the <acronym>HAST</acronym> pool can be created. Run these
commands on both nodes to place the initial metadata onto the
local disk and to start &man.hastd.8;:</para>
<screen>&prompt.root; <userinput>hastctl create <replaceable>test</replaceable></userinput>
&prompt.root; <userinput>service hastd onestart</userinput></screen>
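<para>To start &man.hastd.8; automatically at boot instead of
with <literal>onestart</literal>, add this line to
<filename>/etc/rc.conf</filename> on both nodes:</para>
<programlisting>hastd_enable="YES"</programlisting>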
<note>
@@ -3646,50 +3624,40 @@ Device 1K-blocks Used Avail Capacity
administrator, or software like
<application>Heartbeat</application>, using &man.hastctl.8;.
On the primary node,
<literal>hasta</literal>, issue
this command:</para>
<screen>&prompt.root; <userinput>hastctl role primary <replaceable>test</replaceable></userinput></screen>
<para>Run this command on the secondary node,
<literal>hastb</literal>:</para>
<screen>&prompt.root; <userinput>hastctl role secondary <replaceable>test</replaceable></userinput></screen>
<para>Verify the result by running <command>hastctl</command> on each
node:</para>
<screen>&prompt.root; <userinput>hastctl status <replaceable>test</replaceable></userinput></screen>
<para>Check the <literal>status</literal> line in the output.
If it says <literal>degraded</literal>,
something is wrong with the configuration file. It should say <literal>complete</literal>
on each node, meaning that the synchronization
between the nodes has started. The synchronization
completes when <command>hastctl status</command>
reports 0 bytes of <literal>dirty</literal> extents.</para>
<para>The next step is to create a file system on the
<acronym>GEOM</acronym> provider and mount it. This must be done on the
<literal>primary</literal> node. Creating
the file system can take a few minutes, depending on the size
of the hard drive. This example creates a <acronym>UFS</acronym>
file system on <filename>/dev/hast/test</filename>:</para>
<screen>&prompt.root; <userinput>newfs -U /dev/hast/<replaceable>test</replaceable></userinput>
&prompt.root; <userinput>mkdir /hast/<replaceable>test</replaceable></userinput>
&prompt.root; <userinput>mount /dev/hast/<replaceable>test</replaceable> <replaceable>/hast/test</replaceable></userinput></screen>
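<para>As a quick, illustrative check, confirm on the primary node
that the new file system is mounted and writable; the file name
used here is only an example:</para>
<screen>&prompt.root; <userinput>df -h <replaceable>/hast/test</replaceable></userinput>
&prompt.root; <userinput>touch <replaceable>/hast/test/somefile</replaceable></userinput></screen>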
<para>Once the <acronym>HAST</acronym> framework is configured
properly, the final step is to make sure that