Improve the HAST section of the Handbook.

Approved by:	bcr (mentor)
Warren Block 2011-12-05 23:46:43 +00:00
parent 2026567af5
commit d5795b93d1
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=38003


@@ -4038,7 +4038,7 @@ Device 1K-blocks Used Avail Capacity
<sect2>
<title>Synopsis</title>
-<para>High-availability is one of the main requirements in serious
+<para>High availability is one of the main requirements in serious
business applications and highly-available storage is a key
component in such environments. Highly Available STorage, or
<acronym>HAST<remark role="acronym">Highly Available
@@ -4109,7 +4109,7 @@ Device 1K-blocks Used Avail Capacity
drives.</para>
</listitem>
<listitem>
-<para>File system agnostic, thus allowing to use any file
+<para>File system agnostic; works with any file
system supported by &os;.</para>
</listitem>
<listitem>
@@ -4152,7 +4152,7 @@ Device 1K-blocks Used Avail Capacity
total.</para>
</note>
-<para>Since the <acronym>HAST</acronym> works in
+<para>Since <acronym>HAST</acronym> works in a
primary-secondary configuration, it allows only one of the
cluster nodes to be active at any given time. The
<literal>primary</literal> node, also called
@@ -4175,7 +4175,7 @@ Device 1K-blocks Used Avail Capacity
</itemizedlist>
<para><acronym>HAST</acronym> operates synchronously on a block
-level, which makes it transparent for file systems and
+level, making it transparent to file systems and
applications. <acronym>HAST</acronym> provides regular GEOM
providers in <filename class="directory">/dev/hast/</filename>
directory for use by other tools or applications, thus there is
@@ -4252,7 +4252,7 @@ Device 1K-blocks Used Avail Capacity
For stripped-down systems, make sure this module is available.
Alternatively, it is possible to build
<literal>GEOM_GATE</literal> support into the kernel
-statically, by adding the following line to the custom kernel
+statically, by adding this line to the custom kernel
configuration file:</para>
<programlisting>options GEOM_GATE</programlisting>
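For systems that keep the stock kernel, the same support can come from the loadable module instead; a minimal sketch of the loader.conf approach (assuming the stock geom_gate module name):

```shell
# /boot/loader.conf fragment: load the GEOM gate class as a module at
# boot instead of compiling GEOM_GATE into a custom kernel.
geom_gate_load="YES"
```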
@@ -4290,10 +4290,10 @@ Device 1K-blocks Used Avail Capacity
class="directory">/dev/hast/</filename>) will be called
<filename><replaceable>test</replaceable></filename>.</para>
-<para>The configuration of <acronym>HAST</acronym> is being done
+<para>Configuration of <acronym>HAST</acronym> is done
in the <filename>/etc/hast.conf</filename> file. This file
should be the same on both nodes. The simplest configuration
-possible is following:</para>
+possible is:</para>
<programlisting>resource test {
on hasta {
@@ -4317,9 +4317,9 @@ Device 1K-blocks Used Avail Capacity
alternatively in the local <acronym>DNS</acronym>.</para>
</tip>
<para>Now that the configuration exists on both nodes, it is
possible to create the <acronym>HAST</acronym> pool. Run the
following commands on both nodes to place the initial metadata
<para>Now that the configuration exists on both nodes,
the <acronym>HAST</acronym> pool can be created. Run these
commands on both nodes to place the initial metadata
onto the local disk, and start the &man.hastd.8; daemon:</para>
<screen>&prompt.root; <userinput>hastctl create test</userinput>
@@ -4334,52 +4334,52 @@ Device 1K-blocks Used Avail Capacity
available.</para>
</note>
-<para>HAST is not responsible for selecting node's role
-(<literal>primary</literal> or <literal>secondary</literal>).
-Node's role has to be configured by an administrator or other
-software like <application>Heartbeat</application> using the
+<para>A HAST node's role (<literal>primary</literal> or
+<literal>secondary</literal>) is selected by an administrator
+or other
+software like <application>Heartbeat</application> using the
&man.hastctl.8; utility. Move to the primary node
(<literal><replaceable>hasta</replaceable></literal>) and
-issue the following command:</para>
+issue this command:</para>
<screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen>
-<para>Similarly, run the following command on the secondary node
+<para>Similarly, run this command on the secondary node
(<literal><replaceable>hastb</replaceable></literal>):</para>
<screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen>
<caution>
-<para>It may happen that both of the nodes are not able to
-communicate with each other and both are configured as
-primary nodes; the consequence of this condition is called
-<literal>split-brain</literal>. In order to troubleshoot
+<para>When the nodes are unable to
+communicate with each other, and both are configured as
+primary nodes, the condition is called
+<literal>split-brain</literal>. To troubleshoot
this situation, follow the steps described in <xref
linkend="disks-hast-sb">.</para>
</caution>
-<para>It is possible to verify the result with the
+<para>Verify the result with the
&man.hastctl.8; utility on each node:</para>
<screen>&prompt.root; <userinput>hastctl status test</userinput></screen>
-<para>The important text is the <literal>status</literal> line
-from its output and it should say <literal>complete</literal>
+<para>The important text is the <literal>status</literal> line,
+which should say <literal>complete</literal>
on each of the nodes. If it says <literal>degraded</literal>,
something went wrong. At this point, the synchronization
between the nodes has already started. The synchronization
-completes when the <command>hastctl status</command> command
+completes when <command>hastctl status</command>
reports 0 bytes of <literal>dirty</literal> extents.</para>
-<para>The last step is to create a filesystem on the
+<para>The next step is to create a filesystem on the
<devicename>/dev/hast/<replaceable>test</replaceable></devicename>
-GEOM provider and mount it. This has to be done on the
-<literal>primary</literal> node (as the
+GEOM provider and mount it. This must be done on the
+<literal>primary</literal> node, as
<filename>/dev/hast/<replaceable>test</replaceable></filename>
-appears only on the <literal>primary</literal> node), and
-it can take a few minutes depending on the size of the hard
-drive:</para>
+appears only on the <literal>primary</literal> node.
+Creating the filesystem can take a few minutes, depending on the
+size of the hard drive:</para>
<screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput>
&prompt.root; <userinput>mkdir /hast/test</userinput>
@@ -4387,9 +4387,9 @@ Device 1K-blocks Used Avail Capacity
<para>Once the <acronym>HAST</acronym> framework is configured
properly, the final step is to make sure that
-<acronym>HAST</acronym> is started during the system boot time
-automatically. The following line should be added to the
-<filename>/etc/rc.conf</filename> file:</para>
+<acronym>HAST</acronym> is started automatically during the system
+boot. Add this line to
+<filename>/etc/rc.conf</filename>:</para>
<programlisting>hastd_enable="YES"</programlisting>
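The daemon can also be brought up by hand before the next reboot; a sketch, assuming the rc.conf line above is already in place and the test resource from the earlier example (these commands only do something useful on a configured HAST node):

```shell
# Start hastd through its rc script without rebooting (requires
# hastd_enable="YES" in /etc/rc.conf), then confirm the resource state.
/etc/rc.d/hastd start
hastctl status test
```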
@@ -4397,26 +4397,25 @@ Device 1K-blocks Used Avail Capacity
<title>Failover Configuration</title>
<para>The goal of this example is to build a robust storage
-system which is resistant from the failures of any given node.
-The key task here is to remedy a scenario when a
-<literal>primary</literal> node of the cluster fails. Should
-it happen, the <literal>secondary</literal> node is there to
+system which is resistant to the failure of any given node.
+The scenario is that a
+<literal>primary</literal> node of the cluster fails. If
+this happens, the <literal>secondary</literal> node is there to
take over seamlessly, check and mount the file system, and
continue to work without missing a single bit of data.</para>
-<para>In order to accomplish this task, it will be required to
-utilize another feature available under &os; which provides
+<para>To accomplish this task, another &os; feature provides
for automatic failover on the IP layer &mdash;
-<acronym>CARP</acronym>. <acronym>CARP</acronym> stands for
-Common Address Redundancy Protocol and allows multiple hosts
+<acronym>CARP</acronym>. <acronym>CARP</acronym> (Common Address
+Redundancy Protocol) allows multiple hosts
on the same network segment to share an IP address. Set up
<acronym>CARP</acronym> on both nodes of the cluster according
to the documentation available in <xref linkend="carp">.
-After completing this task, each node should have its own
+After setup, each node will have its own
<devicename>carp0</devicename> interface with a shared IP
address <replaceable>172.16.0.254</replaceable>.
-Obviously, the primary <acronym>HAST</acronym> node of the
-cluster has to be the master <acronym>CARP</acronym>
+The primary <acronym>HAST</acronym> node of the
+cluster must be the master <acronym>CARP</acronym>
node.</para>
<para>The <acronym>HAST</acronym> pool created in the previous
@@ -4430,17 +4429,17 @@ Device 1K-blocks Used Avail Capacity
<para>In the event of <acronym>CARP</acronym> interfaces going
up or down, the &os; operating system generates a &man.devd.8;
-event, which makes it possible to watch for the state changes
+event, making it possible to watch for the state changes
on the <acronym>CARP</acronym> interfaces. A state change on
the <acronym>CARP</acronym> interface is an indication that
-one of the nodes failed or came back online. In such a case,
-it is possible to run a particular script which will
-automatically handle the failover.</para>
+one of the nodes failed or came back online. These state change
+events make it possible to run a script which will
+automatically handle the HAST failover.</para>
-<para>To be able to catch the state changes on the
-<acronym>CARP</acronym> interfaces, the following
-configuration has to be added to the
-<filename>/etc/devd.conf</filename> file on each node:</para>
+<para>To be able to catch state changes on the
+<acronym>CARP</acronym> interfaces, add this
+configuration to
+<filename>/etc/devd.conf</filename> on each node:</para>
<programlisting>notify 30 {
match "system" "IFNET";
@@ -4456,12 +4455,12 @@ notify 30 {
action "/usr/local/sbin/carp-hast-switch slave";
};</programlisting>
-<para>To put the new configuration into effect, run the
-following command on both nodes:</para>
+<para>Restart &man.devd.8; on both nodes to put the new configuration
+into effect:</para>
<screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen>
-<para>In the event that the <devicename>carp0</devicename>
+<para>When the <devicename>carp0</devicename>
interface goes up or down (i.e. the interface state changes),
the system generates a notification, allowing the &man.devd.8;
subsystem to run an arbitrary script, in this case
@@ -4471,7 +4470,7 @@ notify 30 {
&man.devd.8; configuration, please consult the
&man.devd.conf.5; manual page.</para>
-<para>An example of such a script could be following:</para>
+<para>An example of such a script could be:</para>
<programlisting>#!/bin/sh
@@ -4557,13 +4556,13 @@ case "$1" in
;;
esac</programlisting>
-<para>In a nutshell, the script does the following when a node
+<para>In a nutshell, the script takes these actions when a node
becomes <literal>master</literal> /
<literal>primary</literal>:</para>
<itemizedlist>
<listitem>
-<para>Promotes the <acronym>HAST</acronym> pools as
+<para>Promotes the <acronym>HAST</acronym> pools to
primary on a given node.</para>
</listitem>
<listitem>
@@ -4571,7 +4570,7 @@ esac</programlisting>
<acronym>HAST</acronym> pool.</para>
</listitem>
<listitem>
-<para>Mounts the pools at appropriate place.</para>
+<para>Mounts the pools at an appropriate place.</para>
</listitem>
</itemizedlist>
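The actions listed above can be condensed into a minimal sketch. The `test` resource name and `/hast/test` mount point are assumptions carried over from the earlier example; the full script shown in the listing is the authoritative version, and a production handler would also need proper error handling:

```shell
#!/bin/sh
# Minimal proof-of-concept carp-hast-switch handler. Assumes the
# "test" resource and /hast/test mount point from the example above.
resources="test"

failover() {
    case "$1" in
    master)
        for r in ${resources}; do
            # Promote the pool to primary on this node.
            hastctl role primary "${r}"
            # Wait for the GEOM provider to appear, check it, mount it.
            while [ ! -c "/dev/hast/${r}" ]; do sleep 1; done
            fsck -p -y -t ufs "/dev/hast/${r}"
            mount "/dev/hast/${r}" "/hast/${r}"
        done
        ;;
    slave)
        for r in ${resources}; do
            # Release the pool so the other node can take over.
            umount -f "/hast/${r}" 2>/dev/null
            hastctl role secondary "${r}"
        done
        ;;
    esac
}

failover "$1"
```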
@@ -4590,15 +4589,15 @@ esac</programlisting>
<caution>
<para>Keep in mind that this is just an example script which
-should serve as a proof of concept solution. It does not
+should serve as a proof of concept. It does not
handle all the possible scenarios and can be extended or
altered in any way, for example it can start/stop required
-services etc.</para>
+services, etc.</para>
</caution>
<tip>
-<para>For the purpose of this example we used a standard UFS
-file system. In order to reduce the time needed for
+<para>For this example, we used a standard UFS
+file system. To reduce the time needed for
recovery, a journal-enabled UFS or ZFS file system can
be used.</para>
</tip>
@@ -4615,41 +4614,40 @@ esac</programlisting>
<sect3>
<title>General Troubleshooting Tips</title>
-<para><acronym>HAST</acronym> should be generally working
-without any issues, however as with any other software
+<para><acronym>HAST</acronym> should generally work
+without issues. However, as with any other software
product, there may be times when it does not work as
supposed. The sources of the problems may be different, but
the rule of thumb is to ensure that the time is synchronized
between all nodes of the cluster.</para>
-<para>The debugging level of the &man.hastd.8; should be
-increased when troubleshooting <acronym>HAST</acronym>
-problems. This can be accomplished by starting the
+<para>When troubleshooting <acronym>HAST</acronym> problems,
+the debugging level of &man.hastd.8; should be increased
+by starting the
&man.hastd.8; daemon with the <literal>-d</literal>
-argument. Note, that this argument may be specified
+argument. Note that this argument may be specified
multiple times to further increase the debugging level. A
-lot of useful information may be obtained this way. It
-should be also considered to use <literal>-F</literal>
-argument, which will start the &man.hastd.8; daemon in
+lot of useful information may be obtained this way. Consider
+also using the <literal>-F</literal>
+argument, which starts the &man.hastd.8; daemon in the
foreground.</para>
</sect3>
<sect3 id="disks-hast-sb">
<title>Recovering from the Split-brain Condition</title>
-<para>The consequence of a situation when both nodes of the
-cluster are not able to communicate with each other and both
-are configured as primary nodes is called
-<literal>split-brain</literal>. This is a dangerous
+<para><literal>Split-brain</literal> is when the nodes of the
+cluster are unable to communicate with each other, and both
+are configured as primary. This is a dangerous
condition because it allows both nodes to make incompatible
-changes to the data. This situation has to be handled by
-the system administrator manually.</para>
+changes to the data. This problem must be corrected
+manually by the system administrator.</para>
-<para>In order to fix this situation the administrator has to
+<para>The administrator must
decide which node has more important changes (or merge them
-manually) and let the <acronym>HAST</acronym> perform
-the full synchronization of the node which has the broken
-data. To do this, issue the following commands on the node
+manually) and let <acronym>HAST</acronym> perform
+full synchronization of the node which has the broken
+data. To do this, issue these commands on the node
which needs to be resynchronized:</para>
<screen>&prompt.root; <userinput>hastctl role init &lt;resource&gt;</userinput>