Finish editorial review of HAST chapter.
Sponsored by: iXsystems
parent 5368297f96
commit 1b904db935
Notes (svn2git, 2020-12-08 03:00:23 +00:00): svn path=/head/; revision=44500
1 changed file with 51 additions and 48 deletions
@@ -3675,22 +3675,22 @@ Device 1K-blocks Used Avail Capacity
 
 <para>The goal of this example is to build a robust storage
 system which is resistant to the failure of any given node.
-The scenario is that a <literal>primary</literal> node of
-the cluster fails. If this happens, the
-<literal>secondary</literal> node is there to take over
+If the primary node
+fails, the
+secondary node is there to take over
 seamlessly, check and mount the file system, and continue to
 work without missing a single bit of data.</para>
 
-<para>To accomplish this task, another &os; feature,
-<acronym>CARP</acronym>, provides for automatic failover on
-the IP layer. <acronym>CARP</acronym> (Common
-Address Redundancy Protocol) allows multiple hosts on the
-same network segment to share an IP address. Set up
+<para>To accomplish this task, the Common
+Address Redundancy Protocol
+(<acronym>CARP</acronym>) is used to provide for automatic failover at
+the <acronym>IP</acronym> layer. <acronym>CARP</acronym> allows multiple hosts on the
+same network segment to share an <acronym>IP</acronym> address. Set up
 <acronym>CARP</acronym> on both nodes of the cluster
 according to the documentation available in
-<xref linkend="carp"/>. After setup, each node will
-have its own <filename>carp0</filename> interface with a
-shared IP address of
+<xref linkend="carp"/>. In this example, each node will
+have its own management <acronym>IP</acronym> address and a
+shared <acronym>IP</acronym> address of
 <replaceable>172.16.0.254</replaceable>. The primary
 <acronym>HAST</acronym> node of the cluster must be the
 master <acronym>CARP</acronym> node.</para>
@@ -3699,7 +3699,7 @@ Device 1K-blocks Used Avail Capacity
 section is now ready to be exported to the other hosts on
 the network. This can be accomplished by exporting it
 through <acronym>NFS</acronym> or
-<application>Samba</application>, using the shared IP
+<application>Samba</application>, using the shared <acronym>IP</acronym>
 address <replaceable>172.16.0.254</replaceable>. The only
 problem which remains unresolved is an automatic failover
 should the primary node fail.</para>
@@ -3713,7 +3713,7 @@ Device 1K-blocks Used Avail Capacity
 These state change events make it possible to run a script
 which will automatically handle the HAST failover.</para>
 
-<para>To be able to catch state changes on the
+<para>To catch state changes on the
 <acronym>CARP</acronym> interfaces, add this
 configuration to
 <filename>/etc/devd.conf</filename> on each node:</para>
@@ -3732,21 +3732,27 @@ notify 30 {
 action "/usr/local/sbin/carp-hast-switch slave";
 };</programlisting>
 
+<note>
+<para>If the systems are running &os; 10 or higher,
+replace <filename>carp0</filename> with the name of the
+<acronym>CARP</acronym>-configured interface.</para>
+</note>
+
 <para>Restart &man.devd.8; on both nodes to put the new
 configuration into effect:</para>
 
 <screen>&prompt.root; <userinput>service devd restart</userinput></screen>
 
-<para>When the <filename>carp0</filename> interface state
+<para>When the specified interface state
 changes by going up or down , the system generates a
-notification, allowing the &man.devd.8; subsystem to run an
-arbitrary script, in this case
-<filename>/usr/local/sbin/carp-hast-switch</filename>. This
-script handles the automatic failover. For further
-clarification about the above &man.devd.8; configuration,
+notification, allowing the &man.devd.8; subsystem to run the
+specified automatic failover script,
+<filename>/usr/local/sbin/carp-hast-switch</filename>.
+For further
+clarification about this configuration,
 refer to &man.devd.conf.5;.</para>
 
-<para>An example of such a script could be:</para>
+<para>Here is an example of an automated failover script:</para>
 
 <programlisting>#!/bin/sh
 
@@ -3755,7 +3761,7 @@ notify 30 {
 # and Viktor Petersson <vpetersson@wireload.net>
 
 # The names of the HAST resources, as listed in /etc/hast.conf
-resources="test"
+resources="<replaceable>test</replaceable>"
 
 # delay in mounting HAST resource after becoming master
 # make your best guess
@@ -3833,13 +3839,12 @@ case "$1" in
 esac</programlisting>
 
 <para>In a nutshell, the script takes these actions when a
-node becomes <literal>master</literal> /
-<literal>primary</literal>:</para>
+node becomes master:</para>
 
 <itemizedlist>
 <listitem>
-<para>Promotes the <acronym>HAST</acronym> pools to
-primary on a given node.</para>
+<para>Promotes the <acronym>HAST</acronym> pool to
+primary on the other node.</para>
 </listitem>
 
 <listitem>
@@ -3848,41 +3853,40 @@ esac</programlisting>
 </listitem>
 
 <listitem>
-<para>Mounts the pools at an appropriate place.</para>
+<para>Mounts the pool.</para>
 </listitem>
 </itemizedlist>
 
-<para>When a node becomes <literal>backup</literal> /
-<literal>secondary</literal>:</para>
+<para>When a node becomes
+secondary:</para>
 
 <itemizedlist>
 <listitem>
-<para>Unmounts the <acronym>HAST</acronym> pools.</para>
+<para>Unmounts the <acronym>HAST</acronym> pool.</para>
 </listitem>
 
 <listitem>
-<para>Degrades the <acronym>HAST</acronym> pools to
+<para>Degrades the <acronym>HAST</acronym> pool to
 secondary.</para>
 </listitem>
 </itemizedlist>
 
 <caution>
-<para>Keep in mind that this is just an example script which
+<para>This is just an example script which
 serves as a proof of concept. It does not handle all the
 possible scenarios and can be extended or altered in any
-way, for example, to start/stop required services.</para>
+way, for example, to start or stop required services.</para>
 </caution>
 
 <tip>
-<para>For this example, a standard UFS file system was used.
+<para>For this example, a standard <acronym>UFS</acronym> file system was used.
 To reduce the time needed for recovery, a journal-enabled
-UFS or ZFS file system can be used instead.</para>
+<acronym>UFS</acronym> or <acronym>ZFS</acronym> file system can be used instead.</para>
 </tip>
 
 <para>More detailed information with additional examples can
-be found in the <link
-xlink:href="http://wiki.FreeBSD.org/HAST">HAST Wiki</link>
-page.</para>
+be found at <link
+xlink:href="http://wiki.FreeBSD.org/HAST">http://wiki.FreeBSD.org/HAST</link>.</para>
 </sect3>
 </sect2>
 
@@ -3893,22 +3897,21 @@ esac</programlisting>
 issues. However, as with any other software product, there
 may be times when it does not work as supposed. The sources
 of the problems may be different, but the rule of thumb is to
-ensure that the time is synchronized between all nodes of the
+ensure that the time is synchronized between the nodes of the
 cluster.</para>
 
-<para>When troubleshooting <acronym>HAST</acronym> problems, the
+<para>When troubleshooting <acronym>HAST</acronym>, the
 debugging level of &man.hastd.8; should be increased by
-starting &man.hastd.8; with <literal>-d</literal>. This
+starting <command>hastd</command> with <literal>-d</literal>. This
 argument may be specified multiple times to further increase
-the debugging level. A lot of useful information may be
-obtained this way. Consider also using
-<literal>-F</literal>, which starts &man.hastd.8; in the
+the debugging level. Consider also using
+<literal>-F</literal>, which starts <command>hastd</command> in the
 foreground.</para>
 
 <sect3 xml:id="disks-hast-sb">
 <title>Recovering from the Split-brain Condition</title>
 
-<para><literal>Split-brain</literal> is when the nodes of the
+<para><firstterm>Split-brain</firstterm> occurs when the nodes of the
 cluster are unable to communicate with each other, and both
 are configured as primary. This is a dangerous condition
 because it allows both nodes to make incompatible changes to
@@ -3916,15 +3919,15 @@ esac</programlisting>
 system administrator.</para>
 
 <para>The administrator must decide which node has more
-important changes (or merge them manually) and let
+important changes or merge them manually. Then, let
 <acronym>HAST</acronym> perform full synchronization of the
 node which has the broken data. To do this, issue these
 commands on the node which needs to be
 resynchronized:</para>
 
-<screen>&prompt.root; <userinput>hastctl role init <resource></userinput>
-&prompt.root; <userinput>hastctl create <resource></userinput>
-&prompt.root; <userinput>hastctl role secondary <resource></userinput></screen>
+<screen>&prompt.root; <userinput>hastctl role init <replaceable>test</replaceable></userinput>
+&prompt.root; <userinput>hastctl create <replaceable>test</replaceable></userinput>
+&prompt.root; <userinput>hastctl role secondary <replaceable>test</replaceable></userinput></screen>
 </sect3>
 </sect2>
 </sect1>
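
The split-brain recovery sequence touched by the last hunk can be sketched as a small dry-run helper. This is an illustration only and not part of the commit: the function name `resync_commands` is hypothetical, and the script only prints the three `hastctl` commands for a resource (defaulting to the example name `test` used in the patch) rather than running them.

```shell
#!/bin/sh
# Hypothetical dry-run helper, not part of the commit: prints the hastctl
# commands that the patched text lists for resynchronizing a node after
# split-brain. Nothing is executed against HAST.

resync_commands() {
    resource="$1"
    # Order follows the patched text: reset the role, run "create" for the
    # resource, then rejoin the cluster as secondary.
    for step in "role init" "create" "role secondary"; do
        echo "hastctl ${step} ${resource}"
    done
}

# "test" is the example resource name from the patch; pass another name
# as the first argument to override it.
resync_commands "${1:-test}"
```

Printing the commands first leaves room to review the sequence before running it by hand on the node that needs to be resynchronized.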