Editorial review of first 1/2 of HAST chapter.

Sponsored by:	iXsystems
Dru Lavigne 2014-04-08 15:18:09 +00:00
parent aa33908aca
commit c9c8b80069
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=44485


@@ -3297,7 +3297,7 @@ Device 1K-blocks Used Avail Capacity
<sect1 xml:id="disks-hast">
<info>
<title>Highly Available Storage (<acronym>HAST</acronym>)</title>
<authorgroup>
<author>
@@ -3348,75 +3348,24 @@ Device 1K-blocks Used Avail Capacity
<para>High availability is one of the main requirements in
serious business applications and highly-available storage is a
key component in such environments. In &os;, the Highly Available STorage
(<acronym>HAST</acronym>)
framework allows transparent storage of
the same data across several physically separated machines
connected by a <acronym>TCP/IP</acronym> network. <acronym>HAST</acronym> can be
understood as a network-based RAID1 (mirror), and is similar to
the DRBD&reg; storage system used in the GNU/&linux;
platform. In combination with other high-availability features
of &os; like <acronym>CARP</acronym>, <acronym>HAST</acronym>
makes it possible to build a highly-available storage cluster
that is resistant to hardware failures.</para>
<para>The following are the main features of
<acronym>HAST</acronym>:</para>
<itemizedlist>
<listitem>
<para>Can be used to mask <acronym>I/O</acronym> errors on local hard
drives.</para>
</listitem>
@@ -3426,9 +3375,9 @@ Device 1K-blocks Used Avail Capacity
</listitem>
<listitem>
<para>Efficient and quick resynchronization as
only the blocks that were modified during the downtime of a
node are synchronized.</para>
</listitem>
<!--
@@ -3450,64 +3399,94 @@ Device 1K-blocks Used Avail Capacity
system.</para>
</listitem>
</itemizedlist>
<para>After reading this section, you will know:</para>
<itemizedlist>
<listitem>
<para>What <acronym>HAST</acronym> is, how it works, and
which features it provides.</para>
</listitem>
<listitem>
<para>How to set up and use <acronym>HAST</acronym> on
&os;.</para>
</listitem>
<listitem>
<para>How to integrate <acronym>CARP</acronym> and
&man.devd.8; to build a robust storage system.</para>
</listitem>
</itemizedlist>
<para>Before reading this section, you should:</para>
<itemizedlist>
<listitem>
<para>Understand &unix; and &os; basics (<xref
linkend="basics"/>).</para>
</listitem>
<listitem>
<para>Know how to configure network
interfaces and other core &os; subsystems (<xref
linkend="config-tuning"/>).</para>
</listitem>
<listitem>
<para>Have a good understanding of &os;
networking (<xref
linkend="network-communication"/>).</para>
</listitem>
</itemizedlist>
<para>The <acronym>HAST</acronym> project was sponsored by The
&os; Foundation with support from <link
xlink:href="http://www.omc.net/">http://www.omc.net/</link> and <link
xlink:href="http://www.transip.nl/">http://www.transip.nl/</link>.</para>
<sect2>
<title>HAST Operation</title>
<para><acronym>HAST</acronym> provides synchronous
block-level replication between two
physical machines:
the <emphasis>primary</emphasis>, also known as the
<emphasis>master</emphasis> node, and the
<emphasis>secondary</emphasis>, or <emphasis>slave</emphasis>,
node. These two machines together are referred to as a
cluster.</para>
<para>Since <acronym>HAST</acronym> works in a
primary-secondary configuration, it allows only one of the
cluster nodes to be active at any given time. The
primary node, also called
<emphasis>active</emphasis>, is the one which will handle all
the <acronym>I/O</acronym> requests to <acronym>HAST</acronym>-managed
devices. The secondary node is
automatically synchronized from the primary
node.</para>
<para>The physical components of the <acronym>HAST</acronym>
system are the local disk on the primary node, and the
disk on the remote, secondary node.</para>
<para><acronym>HAST</acronym> operates synchronously on a block
level, making it transparent to file systems and applications.
<acronym>HAST</acronym> provides regular GEOM providers in
<filename>/dev/hast/</filename> for use by
other tools or applications. There is no difference
between using <acronym>HAST</acronym>-provided devices and
raw disks or partitions.</para>
<para>Each write, delete, or flush operation is sent to both the
local disk and to the remote disk over <acronym>TCP/IP</acronym>. Each read
operation is served from the local disk, unless the local disk
is not up-to-date or an <acronym>I/O</acronym> error occurs. In such cases, the
read operation is sent to the secondary node.</para>
<para><acronym>HAST</acronym> tries to provide fast failure
recovery. For this reason, it is important to reduce
synchronization time after a node's outage. To provide fast
synchronization, <acronym>HAST</acronym> manages an on-disk
bitmap of dirty extents and only synchronizes those during a
@@ -3520,29 +3499,29 @@ Device 1K-blocks Used Avail Capacity
<itemizedlist>
<listitem>
<para><emphasis>memsync</emphasis>: This mode reports a write operation
as completed when the local write operation is finished
and when the remote node acknowledges data arrival, but
before actually storing the data. The data on the remote
node will be stored directly after sending the
acknowledgement. This mode is intended to reduce
latency, but still provides good
reliability.</para>
</listitem>
<listitem>
<para><emphasis>fullsync</emphasis>: This mode reports a write
operation as completed when both the local write and the
remote write complete. This is the safest and the
slowest replication mode. This mode is the
default.</para>
</listitem>
<listitem>
<para><emphasis>async</emphasis>: This mode reports a write operation as
completed when the local write completes. This is the
fastest and the most dangerous replication mode. It
should only be used when replicating to a distant node where
latency is too high for other modes.</para>
</listitem>
</itemizedlist>
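<para>The replication mode for a resource is selected with the
<literal>replication</literal> keyword in &man.hast.conf.5;.
As an illustrative sketch, a resource entry for a hypothetical
resource named <replaceable>test</replaceable> could request
<literal>memsync</literal> instead of the default
<literal>fullsync</literal>:</para>
<programlisting>resource <replaceable>test</replaceable> {
	replication memsync
	...
}</programlisting>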
@@ -3551,65 +3530,64 @@ Device 1K-blocks Used Avail Capacity
<sect2>
<title>HAST Configuration</title>
<para>The <acronym>HAST</acronym> framework consists of several
components:</para>
<itemizedlist>
<listitem>
<para>The &man.hastd.8; daemon which provides data
synchronization. When this daemon is started, it will
automatically load <varname>geom_gate.ko</varname>.</para>
</listitem>
<listitem>
<para>The userland management
utility, &man.hastctl.8;.</para>
</listitem>
<listitem>
<para>The &man.hast.conf.5; configuration file. This file
must exist before starting
<application>hastd</application>.</para>
</listitem>
</itemizedlist>
<para>Users who prefer to statically build
<literal>GEOM_GATE</literal> support into the kernel
should add this line to the custom kernel configuration
file, then rebuild the kernel using the instructions in <xref
linkend="kernelconfig"/>:</para>
<programlisting>options GEOM_GATE</programlisting>
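<para>Since &man.hastd.8; loads <varname>geom_gate.ko</varname>
automatically, loading the module by hand is optional. On a
system running the default <literal>GENERIC</literal> kernel, it
can still be loaded and checked manually, for example to confirm
that it is available before starting the daemon:</para>
<screen>&prompt.root; <userinput>kldload geom_gate</userinput>
&prompt.root; <userinput>kldstat | grep geom_gate</userinput></screen>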
<para>The following example describes how to configure two nodes
in master-slave/primary-secondary
operation using <acronym>HAST</acronym> to replicate the data
between the two. The nodes will be called
<literal>hasta</literal>, with an <acronym>IP</acronym> address of
<literal>172.16.0.1</literal>, and
<literal>hastb</literal>, with an <acronym>IP</acronym> address of
<literal>172.16.0.2</literal>. Both nodes will have a
dedicated hard drive <filename>/dev/ad6</filename> of the same
size for <acronym>HAST</acronym> operation. The
<acronym>HAST</acronym> pool, sometimes referred to as a
resource or the <acronym>GEOM</acronym> provider in
<filename class="directory">/dev/hast/</filename>, will be called
<literal>test</literal>.</para>
<para>Configuration of <acronym>HAST</acronym> is done using
<filename>/etc/hast.conf</filename>. This file should be
identical on both nodes. The simplest configuration
is:</para>
<programlisting>resource <replaceable>test</replaceable> {
	on <replaceable>hasta</replaceable> {
		local <replaceable>/dev/ad6</replaceable>
		remote <replaceable>172.16.0.2</replaceable>
	}
	on <replaceable>hastb</replaceable> {
		local <replaceable>/dev/ad6</replaceable>
		remote <replaceable>172.16.0.1</replaceable>
	}
}</programlisting>
@@ -3618,18 +3596,18 @@ Device 1K-blocks Used Avail Capacity
<tip>
<para>It is also possible to use host names in the
<literal>remote</literal> statements if
the hosts are resolvable and defined either in
<filename>/etc/hosts</filename> or in the local
<acronym>DNS</acronym>.</para>
</tip>
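<para>For example, adding these entries to
<filename>/etc/hosts</filename> on both nodes would make the
host names from this example resolvable:</para>
<programlisting>172.16.0.1	hasta
172.16.0.2	hastb</programlisting>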
<para>Once the configuration exists on both nodes,
the <acronym>HAST</acronym> pool can be created. Run these
commands on both nodes to place the initial metadata onto the
local disk and to start &man.hastd.8;:</para>
<screen>&prompt.root; <userinput>hastctl create <replaceable>test</replaceable></userinput>
&prompt.root; <userinput>service hastd onestart</userinput></screen>
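<para>To start &man.hastd.8; automatically at boot instead of
with <literal>onestart</literal>, add this line to
<filename>/etc/rc.conf</filename> on both nodes:</para>
<programlisting>hastd_enable="YES"</programlisting>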
<note>
@@ -3646,50 +3624,40 @@ Device 1K-blocks Used Avail Capacity
administrator, or software like
<application>Heartbeat</application>, using &man.hastctl.8;.
On the primary node,
<literal>hasta</literal>, issue
this command:</para>
<screen>&prompt.root; <userinput>hastctl role primary <replaceable>test</replaceable></userinput></screen>
<para>Run this command on the secondary node,
<literal>hastb</literal>:</para>
<screen>&prompt.root; <userinput>hastctl role secondary <replaceable>test</replaceable></userinput></screen>
<para>Verify the result by running <command>hastctl</command> on each
node:</para>
<screen>&prompt.root; <userinput>hastctl status <replaceable>test</replaceable></userinput></screen>
<para>Check the <literal>status</literal> line in the output.
If it says <literal>degraded</literal>,
something is wrong with the configuration file. It should say <literal>complete</literal>
on each node, meaning that the synchronization
between the nodes has started. The synchronization
completes when <command>hastctl status</command>
reports 0 bytes of <literal>dirty</literal> extents.</para>
<para>The next step is to create a file system on the
<acronym>GEOM</acronym> provider and mount it. This must be done on the
<literal>primary</literal> node. Creating
the file system can take a few minutes, depending on the size
of the hard drive. This example creates a <acronym>UFS</acronym>
file system on <filename>/dev/hast/test</filename>:</para>
<screen>&prompt.root; <userinput>newfs -U /dev/hast/<replaceable>test</replaceable></userinput>
&prompt.root; <userinput>mkdir /hast/<replaceable>test</replaceable></userinput>
&prompt.root; <userinput>mount /dev/hast/<replaceable>test</replaceable> <replaceable>/hast/test</replaceable></userinput></screen>
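<para>As a quick, illustrative check, confirm on the primary node
that the new file system is mounted and writable; the file name
used here is only an example:</para>
<screen>&prompt.root; <userinput>df -h <replaceable>/hast/test</replaceable></userinput>
&prompt.root; <userinput>touch <replaceable>/hast/test/somefile</replaceable></userinput></screen>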
<para>Once the <acronym>HAST</acronym> framework is configured
properly, the final step is to make sure that