Improve the HAST section of the Handbook.
Approved by: bcr (mentor)
This commit is contained in:
parent
2026567af5
commit
d5795b93d1
Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=38003
1 changed file with 80 additions and 82 deletions
@@ -4038,7 +4038,7 @@ Device 1K-blocks Used Avail Capacity
     <sect2>
       <title>Synopsis</title>
 
-      <para>High-availability is one of the main requirements in serious
+      <para>High availability is one of the main requirements in serious
       business applications and highly-available storage is a key
       component in such environments.  Highly Available STorage, or
       <acronym>HAST<remark role="acronym">Highly Available
@@ -4109,7 +4109,7 @@ Device 1K-blocks Used Avail Capacity
           drives.</para>
         </listitem>
         <listitem>
-          <para>File system agnostic, thus allowing to use any file
+          <para>File system agnostic; works with any file
           system supported by &os;.</para>
         </listitem>
         <listitem>
@@ -4152,7 +4152,7 @@ Device 1K-blocks Used Avail Capacity
         total.</para>
       </note>
 
-      <para>Since the <acronym>HAST</acronym> works in
+      <para>Since <acronym>HAST</acronym> works in a
       primary-secondary configuration, it allows only one of the
       cluster nodes to be active at any given time.  The
       <literal>primary</literal> node, also called
@@ -4175,7 +4175,7 @@ Device 1K-blocks Used Avail Capacity
       </itemizedlist>
 
      <para><acronym>HAST</acronym> operates synchronously on a block
-      level, which makes it transparent for file systems and
+      level, making it transparent to file systems and
       applications.  <acronym>HAST</acronym> provides regular GEOM
       providers in <filename class="directory">/dev/hast/</filename>
       directory for use by other tools or applications, thus there is
@@ -4252,7 +4252,7 @@ Device 1K-blocks Used Avail Capacity
       For stripped-down systems, make sure this module is available.
       Alternatively, it is possible to build
       <literal>GEOM_GATE</literal> support into the kernel
-      statically, by adding the following line to the custom kernel
+      statically, by adding this line to the custom kernel
       configuration file:</para>
 
      <programlisting>options GEOM_GATE</programlisting>
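For readers following along, the module route can also be done at runtime. The commands below are a hedged sketch (the `kldstat -m` module name and the `loader.conf` knob are assumptions inferred from the module name, not stated in the hunk):

```shell
# Assumed commands; run as root on FreeBSD.
kldload geom_gate                                   # load the module now
kldstat -m g_gate                                   # verify it is loaded
echo 'geom_gate_load="YES"' >> /boot/loader.conf    # load it at every boot
```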
@@ -4290,10 +4290,10 @@ Device 1K-blocks Used Avail Capacity
       class="directory">/dev/hast/</filename>) will be called
       <filename><replaceable>test</replaceable></filename>.</para>
 
-      <para>The configuration of <acronym>HAST</acronym> is being done
+      <para>Configuration of <acronym>HAST</acronym> is done
       in the <filename>/etc/hast.conf</filename> file.  This file
       should be the same on both nodes.  The simplest configuration
-      possible is following:</para>
+      possible is:</para>
 
      <programlisting>resource test {
        on hasta {
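The listing above is cut off by the hunk's context window. A complete minimal `/etc/hast.conf` in the same spirit might look like the following; the `local` disk name is an assumption, and the `remote` hostnames assume the name resolution discussed in the tip that follows:

```
resource test {
	on hasta {
		local /dev/ad6
		remote hastb
	}
	on hastb {
		local /dev/ad6
		remote hasta
	}
}
```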
@@ -4317,9 +4317,9 @@ Device 1K-blocks Used Avail Capacity
         alternatively in the local <acronym>DNS</acronym>.</para>
       </tip>
 
-      <para>Now that the configuration exists on both nodes, it is
-      possible to create the <acronym>HAST</acronym> pool.  Run the
-      following commands on both nodes to place the initial metadata
+      <para>Now that the configuration exists on both nodes,
+      the <acronym>HAST</acronym> pool can be created.  Run these
+      commands on both nodes to place the initial metadata
       onto the local disk, and start the &man.hastd.8; daemon:</para>
 
      <screen>&prompt.root; <userinput>hastctl create test</userinput>
@@ -4334,52 +4334,52 @@ Device 1K-blocks Used Avail Capacity
         available.</para>
       </note>
 
-      <para>HAST is not responsible for selecting node's role
-      (<literal>primary</literal> or <literal>secondary</literal>).
-      Node's role has to be configured by an administrator or other
-      software like <application>Heartbeat</application> using the
+      <para>A HAST node's role (<literal>primary</literal> or
+      <literal>secondary</literal>) is selected by an administrator
+      or other
+      software like <application>Heartbeat</application> using the
       &man.hastctl.8; utility.  Move to the primary node
       (<literal><replaceable>hasta</replaceable></literal>) and
-      issue the following command:</para>
+      issue this command:</para>
 
      <screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen>
 
-      <para>Similarly, run the following command on the secondary node
+      <para>Similarly, run this command on the secondary node
       (<literal><replaceable>hastb</replaceable></literal>):</para>
 
      <screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen>
 
       <caution>
-        <para>It may happen that both of the nodes are not able to
-        communicate with each other and both are configured as
-        primary nodes; the consequence of this condition is called
-        <literal>split-brain</literal>.  In order to troubleshoot
+        <para>When the nodes are unable to
+        communicate with each other, and both are configured as
+        primary nodes, the condition is called
+        <literal>split-brain</literal>.  To troubleshoot
         this situation, follow the steps described in <xref
         linkend="disks-hast-sb">.</para>
       </caution>
 
-      <para>It is possible to verify the result with the
+      <para>Verify the result with the
       &man.hastctl.8; utility on each node:</para>
 
      <screen>&prompt.root; <userinput>hastctl status test</userinput></screen>
 
-      <para>The important text is the <literal>status</literal> line
-      from its output and it should say <literal>complete</literal>
+      <para>The important text is the <literal>status</literal> line,
+      which should say <literal>complete</literal>
       on each of the nodes.  If it says <literal>degraded</literal>,
       something went wrong.  At this point, the synchronization
       between the nodes has already started.  The synchronization
-      completes when the <command>hastctl status</command> command
+      completes when <command>hastctl status</command>
       reports 0 bytes of <literal>dirty</literal> extents.</para>
 
-      <para>The last step is to create a filesystem on the
+      <para>The next step is to create a filesystem on the
       <devicename>/dev/hast/<replaceable>test</replaceable></devicename>
-      GEOM provider and mount it.  This has to be done on the
-      <literal>primary</literal> node (as the
+      GEOM provider and mount it.  This must be done on the
+      <literal>primary</literal> node, as
       <filename>/dev/hast/<replaceable>test</replaceable></filename>
-      appears only on the <literal>primary</literal> node), and
-      it can take a few minutes depending on the size of the hard
-      drive:</para>
+      appears only on the <literal>primary</literal> node.
+      Creating the filesystem can take a few minutes, depending on the
+      size of the hard drive:</para>
 
      <screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput>
&prompt.root; <userinput>mkdir /hast/test</userinput>
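Waiting for the dirty counter to reach zero can be scripted. This is a minimal sketch, assuming `hastctl status` prints a line of the form `dirty: N (bytes)`; the `parse_dirty` helper is a hypothetical name of our own, not a hastctl feature:

```shell
# Extract the byte count from a "dirty: N (bytes)" line (assumed format).
parse_dirty() {
    awk '/dirty:/ { print $2; exit }'
}

# On the primary node, one could poll until synchronization finishes:
#   while [ "$(hastctl status test | parse_dirty)" -gt 0 ]; do sleep 5; done

# Demonstration on canned output:
echo "  dirty: 0 (bytes)" | parse_dirty
```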
@@ -4387,9 +4387,9 @@ Device 1K-blocks Used Avail Capacity
 
      <para>Once the <acronym>HAST</acronym> framework is configured
       properly, the final step is to make sure that
-      <acronym>HAST</acronym> is started during the system boot time
-      automatically.  The following line should be added to the
-      <filename>/etc/rc.conf</filename> file:</para>
+      <acronym>HAST</acronym> is started automatically during the system
+      boot.  Add this line to
+      <filename>/etc/rc.conf</filename>:</para>
 
      <programlisting>hastd_enable="YES"</programlisting>
 
@@ -4397,26 +4397,25 @@ Device 1K-blocks Used Avail Capacity
       <title>Failover Configuration</title>
 
      <para>The goal of this example is to build a robust storage
-      system which is resistant from the failures of any given node.
-      The key task here is to remedy a scenario when a
-      <literal>primary</literal> node of the cluster fails.  Should
-      it happen, the <literal>secondary</literal> node is there to
+      system which is resistant to the failure of any given node.
+      The scenario is that a
+      <literal>primary</literal> node of the cluster fails.  If
+      this happens, the <literal>secondary</literal> node is there to
       take over seamlessly, check and mount the file system, and
       continue to work without missing a single bit of data.</para>
 
-      <para>In order to accomplish this task, it will be required to
-      utilize another feature available under &os; which provides
+      <para>To accomplish this task, another &os; feature provides
       for automatic failover on the IP layer —
-      <acronym>CARP</acronym>.  <acronym>CARP</acronym> stands for
-      Common Address Redundancy Protocol and allows multiple hosts
+      <acronym>CARP</acronym>.  <acronym>CARP</acronym> (Common Address
+      Redundancy Protocol) allows multiple hosts
       on the same network segment to share an IP address.  Set up
       <acronym>CARP</acronym> on both nodes of the cluster according
       to the documentation available in <xref linkend="carp">.
-      After completing this task, each node should have its own
+      After setup, each node will have its own
       <devicename>carp0</devicename> interface with a shared IP
       address <replaceable>172.16.0.254</replaceable>.
-      Obviously, the primary <acronym>HAST</acronym> node of the
-      cluster has to be the master <acronym>CARP</acronym>
+      The primary <acronym>HAST</acronym> node of the
+      cluster must be the master <acronym>CARP</acronym>
       node.</para>
 
      <para>The <acronym>HAST</acronym> pool created in the previous
@@ -4430,17 +4429,17 @@ Device 1K-blocks Used Avail Capacity
 
      <para>In the event of <acronym>CARP</acronym> interfaces going
       up or down, the &os; operating system generates a &man.devd.8;
-      event, which makes it possible to watch for the state changes
+      event, making it possible to watch for the state changes
       on the <acronym>CARP</acronym> interfaces.  A state change on
       the <acronym>CARP</acronym> interface is an indication that
-      one of the nodes failed or came back online.  In such a case,
-      it is possible to run a particular script which will
-      automatically handle the failover.</para>
+      one of the nodes failed or came back online.  These state change
+      events make it possible to run a script which will
+      automatically handle the HAST failover.</para>
 
-      <para>To be able to catch the state changes on the
-      <acronym>CARP</acronym> interfaces, the following
-      configuration has to be added to the
-      <filename>/etc/devd.conf</filename> file on each node:</para>
+      <para>To be able to catch state changes on the
+      <acronym>CARP</acronym> interfaces, add this
+      configuration to
+      <filename>/etc/devd.conf</filename> on each node:</para>
 
      <programlisting>notify 30 {
        match "system" "IFNET";
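The diff context truncates the middle of the listing. Combining the visible first lines with the `carp-hast-switch slave` action shown in the next hunk, the full fragment plausibly looks like the following; the `subsystem` and `type` match values are assumptions inferred from context:

```
notify 30 {
	match "system" "IFNET";
	match "subsystem" "carp0";
	match "type" "LINK_UP";
	action "/usr/local/sbin/carp-hast-switch master";
};

notify 30 {
	match "system" "IFNET";
	match "subsystem" "carp0";
	match "type" "LINK_DOWN";
	action "/usr/local/sbin/carp-hast-switch slave";
};
```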
@@ -4456,12 +4455,12 @@ notify 30 {
        action "/usr/local/sbin/carp-hast-switch slave";
 };</programlisting>
 
-      <para>To put the new configuration into effect, run the
-      following command on both nodes:</para>
+      <para>Restart &man.devd.8; on both nodes to put the new configuration
+      into effect:</para>
 
      <screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen>
 
-      <para>In the event that the <devicename>carp0</devicename>
+      <para>When the <devicename>carp0</devicename>
       interface goes up or down (i.e. the interface state changes),
       the system generates a notification, allowing the &man.devd.8;
       subsystem to run an arbitrary script, in this case
@@ -4471,7 +4470,7 @@ notify 30 {
       &man.devd.8; configuration, please consult the
       &man.devd.conf.5; manual page.</para>
 
-      <para>An example of such a script could be following:</para>
+      <para>An example of such a script could be:</para>
 
      <programlisting>#!/bin/sh
 
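The body of the script is elided by the diff context. A minimal proof-of-concept skeleton in the same spirit could look like this; the resource list, mount point, and the `role_for` helper name are all assumptions, not the Handbook's actual script:

```shell
#!/bin/sh
# Proof-of-concept sketch of a carp-hast-switch-style failover script.
# The resource list and the /hast mount point below are assumptions.
resources="test"

# Map the argument devd passes ("master"/"slave") to the HAST role it implies.
role_for() {
    case "$1" in
        master) echo primary ;;
        slave)  echo secondary ;;
        *)      echo unknown ;;
    esac
}

role=$(role_for "${1:-}")
for disk in $resources; do
    # A real script would switch the role here:
    #   hastctl role "$role" "$disk"
    # and, when becoming primary, check and mount the file system:
    #   fsck -p -y -t ufs "/dev/hast/$disk" && mount "/dev/hast/$disk" "/hast/$disk"
    echo "would set $disk to $role"
done
```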
@@ -4557,13 +4556,13 @@ case "$1" in
        ;;
 esac</programlisting>
 
-      <para>In a nutshell, the script does the following when a node
+      <para>In a nutshell, the script takes these actions when a node
       becomes <literal>master</literal> /
       <literal>primary</literal>:</para>
 
       <itemizedlist>
        <listitem>
-          <para>Promotes the <acronym>HAST</acronym> pools as
+          <para>Promotes the <acronym>HAST</acronym> pools to
           primary on a given node.</para>
        </listitem>
        <listitem>
@@ -4571,7 +4570,7 @@ esac</programlisting>
           <acronym>HAST</acronym> pool.</para>
        </listitem>
        <listitem>
-          <para>Mounts the pools at appropriate place.</para>
+          <para>Mounts the pools at an appropriate place.</para>
        </listitem>
       </itemizedlist>
 
@@ -4590,15 +4589,15 @@ esac</programlisting>
 
       <caution>
        <para>Keep in mind that this is just an example script which
-        should serve as a proof of concept solution.  It does not
+        should serve as a proof of concept.  It does not
        handle all the possible scenarios and can be extended or
        altered in any way, for example it can start/stop required
-        services etc.</para>
+        services, etc.</para>
       </caution>
 
       <tip>
-        <para>For the purpose of this example we used a standard UFS
-        file system.  In order to reduce the time needed for
+        <para>For this example, we used a standard UFS
+        file system.  To reduce the time needed for
        recovery, a journal-enabled UFS or ZFS file system can
        be used.</para>
       </tip>
@@ -4615,41 +4614,40 @@ esac</programlisting>
     <sect3>
       <title>General Troubleshooting Tips</title>
 
-      <para><acronym>HAST</acronym> should be generally working
-      without any issues, however as with any other software
+      <para><acronym>HAST</acronym> should generally work
+      without issues.  However, as with any other software
       product, there may be times when it does not work as
       supposed.  The sources of the problems may be different, but
       the rule of thumb is to ensure that the time is synchronized
       between all nodes of the cluster.</para>
 
-      <para>The debugging level of the &man.hastd.8; should be
-      increased when troubleshooting <acronym>HAST</acronym>
-      problems.  This can be accomplished by starting the
+      <para>When troubleshooting <acronym>HAST</acronym> problems,
+      the debugging level of &man.hastd.8; should be increased
+      by starting the
       &man.hastd.8; daemon with the <literal>-d</literal>
-      argument.  Note, that this argument may be specified
+      argument.  Note that this argument may be specified
       multiple times to further increase the debugging level.  A
-      lot of useful information may be obtained this way.  It
-      should be also considered to use <literal>-F</literal>
-      argument, which will start the &man.hastd.8; daemon in
+      lot of useful information may be obtained this way.  Consider
+      also using the <literal>-F</literal>
+      argument, which starts the &man.hastd.8; daemon in the
       foreground.</para>
     </sect3>
 
     <sect3 id="disks-hast-sb">
       <title>Recovering from the Split-brain Condition</title>
 
-      <para>The consequence of a situation when both nodes of the
-      cluster are not able to communicate with each other and both
-      are configured as primary nodes is called
-      <literal>split-brain</literal>.  This is a dangerous
+      <para><literal>Split-brain</literal> is when the nodes of the
+      cluster are unable to communicate with each other, and both
+      are configured as primary.  This is a dangerous
       condition because it allows both nodes to make incompatible
-      changes to the data.  This situation has to be handled by
-      the system administrator manually.</para>
+      changes to the data.  This problem must be corrected
+      manually by the system administrator.</para>
 
-      <para>In order to fix this situation the administrator has to
+      <para>The administrator must
       decide which node has more important changes (or merge them
-      manually) and let the <acronym>HAST</acronym> perform
-      the full synchronization of the node which has the broken
-      data.  To do this, issue the following commands on the node
+      manually) and let <acronym>HAST</acronym> perform
+      full synchronization of the node which has the broken
+      data.  To do this, issue these commands on the node
       which needs to be resynchronized:</para>
 
      <screen>&prompt.root; <userinput>hastctl role init <resource></userinput>
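The command listing above is truncated by the hunk. The usual recovery sequence continues from the visible `role init` step roughly as sketched below; the concrete resource name is an assumption:

```shell
# On the node whose data will be discarded (assumption: resource "test"):
hastctl role init test        # drop the node's current metadata
hastctl create test           # re-create fresh metadata
hastctl role secondary test   # rejoin; HAST performs a full resync
```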