<!-- $FreeBSD$ -->
<!-- FreeBSD Documentation Project -->
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
<!ENTITY vinum.ap "<application>Vinum</application>">
%man;
]>
<article>
<articleinfo>
<title>
Bootstrapping Vinum: A Foundation for Reliable Servers
</title>
<author>
<firstname>Robert A.</firstname>
<surname>Van Valzah</surname>
</author>
<copyright>
<year>2001</year>
<holder>Robert A. Van Valzah</holder>
</copyright>
<pubdate>$Date: 2001-10-31 23:12:55 $ GMT</pubdate>
<releaseinfo>$Id: article.sgml,v 1.4 2001-10-31 23:12:55 chern Exp $</releaseinfo>
</articleinfo>
<abstract>
<para> In the most abstract sense, these instructions show how
to build a pair of disk drives where either one is adequate
to keep your server running if the other fails.
Life is better if they are both working, but your server will never die
unless both disk drives die at once.
If you choose ATAPI drives and use a fairly generic kernel, you can
be confident that either of these drives can be plugged into most any
main board to produce a working server in a pinch.
The drives need not be identical.
These techniques work as well with SCSI drives as they do with ATAPI drives,
but I will focus on ATAPI here because main boards with this interface are
ubiquitous.
After building the foundation of a reliable server as shown here, you
can expand to as many disk drives as necessary to build the
failure-resilient server of your dreams.</para>
</abstract>
<section id="Introduction">
<title>Introduction</title>
<para>Any machine that is going to provide reliable service needs
to have either redundant components on-line or a pool of
off-line spares that can be promptly swapped in. Commodity
PC hardware makes it affordable for even small organizations
to have some spare parts available that could be pressed
into service following the failure of production equipment.
In many organizations, a failed power supply, NIC, memory,
or main board could easily be swapped with a standby in a
matter of minutes and be ready to return to production work.</para>
<para>If a disk drive fails, however, it often has to be restored
from a tape backup. This may take many hours. With disk
drive capacities rising faster than tape drive capacities,
the time needed to restore a failed disk drive seems to
increase as technology progresses.</para>
<para>&vinum.ap;
is a volume manager for FreeBSD that provides a standard block
I/O layer interface to the file system code just as any hardware
device driver would.
It works by managing partitions
of type <literal>vinum</literal> and
allows you to subdivide and group the space in such
partitions into logical devices called
<firstterm>volumes</firstterm> that
can be used in the same way as disk partitions.
Volumes can
be configured for resilience, performance, or both. Experienced
system administrators will immediately recognize the benefits
of being able to configure each file system to match the way
it is most often used.</para>
<para>In some ways, <application>Vinum</application> is similar to
&man.ccd.4;, but it is far more flexible and robust in the face
of failures.
It is only slightly more difficult to set up than &man.ccd.4;.
&man.ccd.4; may meet your needs if you are only interested in
concatenation.</para>
<section id="Terminology">
<title>Terminology</title>
<para>Discussion of storage management can get very tricky
simply because of the terminology involved.
As we will see below,
the terms <firstterm>disk</firstterm>,
<firstterm>slice</firstterm>, <firstterm>partition</firstterm>,
<firstterm>subdisk</firstterm>, and <firstterm>volume</firstterm>
each refer to different things that present the same interface
to a kernel function like swapping.
The potential for confusion is compounded because the objects
that these terms represent can be nested inside each other.</para>
<para>I will refer to a physical disk drive as a
<firstterm>spindle</firstterm>.
A <firstterm>partition</firstterm> here means a BSD partition as
maintained by <command>disklabel</command>.
It does not refer to <firstterm>slices</firstterm> or
<firstterm>BIOS partitions</firstterm> as
maintained by <command>fdisk</command>.</para>
</section>
<section id="Objects">
<title>Vinum Objects</title>
<para><application>Vinum</application>
defines a hierarchy of four objects that it uses to manage storage
(see <xref linkend="arch">).
Different combinations of these objects are used to achieve
failure resilience, performance, and/or extra capacity.
I will give a whirlwind tour of the objects here--see the
<ulink url="http://www.vinumvm.org/">Vinum web site</ulink>
for a more thorough description.</para>
<figure id="arch">
<title>Vinum Objects and Architecture</title>
<mediaobject>
<imageobject>
<imagedata fileref="arch" format="EPS">
</imageobject>
<textobject>
<literallayout class="monospaced">+-----+------+------+
| UFS | swap | Etc. |
+---+-+------+----+ +
| volume | |
+ V +-------------+ +
| i plex | |
+ n +-------------+ +
| u subdisk | |
+ m +-------------+ +
| drive | |
+-----------------+ +
| Block I/O devices |
+-------------------+</literallayout>
</textobject>
<textobject>
<phrase>Vinum Objects and Architecture</phrase>
</textobject>
</mediaobject>
</figure>
<para>The top object, a vinum <firstterm>volume</firstterm>,
implements a virtual disk that
provides a standard block I/O layer
interface to other parts of the kernel.
The bottom object, a vinum <firstterm>drive</firstterm>,
uses this same interface to
request I/O from physical devices below it.</para>
<para>In between these two (from top to bottom) we have objects called
a vinum <firstterm>plex</firstterm>
and a vinum <firstterm>subdisk</firstterm>.
As you can probably guess from the name, a vinum subdisk is a
contiguous subset of the space available on a vinum drive.
It lets you subdivide a vinum drive in much the same way that
a BSD partition lets you subdivide a BIOS slice.</para>
<para>A plex allows subdisks to be grouped together making the space
of all subdisks available as a single object.</para>
<para>A plex can be organized with its constituent subdisks concatenated
or striped.
Both organizations are useful for spreading I/O requests across
spindles when the subdisks of a plex reside on distinct spindles.
A striped plex will switch spindles each time a multiple of the
stripe size is reached.
A concatenated plex will switch spindles only when the end of
a subdisk is reached.</para>
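<para>As a rough illustration of the difference, the following shell
sketch computes which subdisk would serve a given block offset within
a plex made of two subdisks.
The function names, the 512-block stripe size, and the
1,000,000-block subdisk size are assumptions chosen for the example;
they are not values that <application>Vinum</application> requires.</para>
<programlisting>
#!/bin/sh
# Illustrative only: map a block offset within a plex to a subdisk index.
# All sizes are in 512-byte blocks and are made-up example values.

# Striped plex: the subdisk changes every stripe_size blocks, round-robin.
# Arguments: offset stripe_size number_of_subdisks
striped_subdisk() {
    echo $(( ($1 / $2) % $3 ))
}

# Concatenated plex: the subdisk changes only at the end of a subdisk.
# This sketch assumes every subdisk has the same size.
# Arguments: offset subdisk_size
concat_subdisk() {
    echo $(( $1 / $2 ))
}

for offset in 0 511 512 1024 999999 1000000; do
    echo "block $offset:" \
         "striped -> subdisk $(striped_subdisk "$offset" 512 2)," \
         "concatenated -> subdisk $(concat_subdisk "$offset" 1000000)"
done
</programlisting>
<para>With these example values, block 512 already falls on the second
subdisk of the striped plex, while the concatenated plex stays on its
first subdisk until block 1,000,000.</para>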
<para>An important characteristic of a
<application>Vinum</application> volume is that it can be
made up of more than one plex.
In this case, writes go to all plexes and a read may be satisfied
by any plex.
Configuring two or more plexes on distinct spindles yields a
volume that is resilient to failure.</para>
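<para>A configuration along the following lines would describe such a
volume.
This is a sketch only, written as a shell script that saves the
description to a scratch file.
The drive names <literal>d1</literal> and <literal>d2</literal>,
the volume name <literal>mirror</literal>, the device paths, and the
512 MB subdisk size are all assumptions made for illustration.</para>
<programlisting>
#!/bin/sh
# Illustrative only: write a sample Vinum configuration describing one
# mirrored volume. Drive names, device paths, and sizes are assumptions.
conf=${TMPDIR:-/tmp}/mirror-example.conf

printf '%s\n' \
    '# Two vinum drives, one per spindle' \
    'drive d1 device /dev/ad0s1h' \
    'drive d2 device /dev/ad2s1h' \
    '# One volume with two plexes; each plex holds a full copy of the data' \
    'volume mirror' \
    '  plex org concat' \
    '    sd length 512m drive d1' \
    '  plex org concat' \
    '    sd length 512m drive d2' \
    > "$conf"

cat "$conf"
# To try it on a scratch system (as root): vinum create "$conf"
</programlisting>
<para>Because each plex has its single subdisk on a different drive,
either spindle can fail and the volume still has one complete plex
to read from and write to.</para>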
<para><application>Vinum</application> maintains a
<firstterm>configuration</firstterm>
that defines instances of the above objects and the way they
are related to each other.
This configuration is automatically written to all spindles under
<application>Vinum</application> management whenever it changes.</para>
</section>
<section id="Organizations">
<title>Vinum Volume/Plex Organization</title>
<para>Although <application>Vinum</application>
can manage any number of spindles,
I will only cover scenarios with two spindles here
for simplification.
<xref linkend=OrgCompare> shows how
two spindles organized with <application>Vinum</application>
compare to two spindles without <application>Vinum</application>.</para>
<para>
<table id=OrgCompare frame=all>
<title>Characteristics of Two Spindles Organized with Vinum</title>
<tgroup cols="5">
<thead>
<row>
<entry>Organization</entry>
<entry>Total Capacity</entry>
<entry>Failure Resilient</entry>
<entry>Peak Read Performance</entry>
<entry>Peak Write Performance</entry>
</row>
</thead>
<tbody>
<row>
<entry>Concatenated Plexes</entry>
<entry>Unchanged, but appears as a single drive</entry>
<entry>No</entry>
<entry>Unchanged</entry>
<entry>Unchanged</entry>
</row>
<row>
<entry>Striped Plexes (RAID-0)</entry>
<entry>Unchanged, but appears as a single drive</entry>
<entry>No</entry>
<entry>2x</entry>
<entry>2x</entry>
</row>
<row>
<entry>Mirrored Volumes (RAID-1)</entry>
<entry>1/2, appearing as a single drive</entry>
<entry>Yes</entry>
<entry>2x</entry>
<entry>Unchanged</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
<para><xref linkend=OrgCompare> shows that striping yields
the same capacity and lack of failure resilience
as concatenation, but it has better peak read and write performance.
Hence we will not be using concatenation in any of the examples here.
Mirrored volumes provide the benefits of improved peak read performance
and failure resilience--but this comes at a loss in capacity.</para>
<note><para>Both concatenation and striping bring their benefits over a
single spindle at the cost of increased likelihood of failure since
more than one spindle is now involved.</para></note>
<para>When three or more spindles are present,
<application>Vinum</application> also supports rotated,
block-interleaved parity (also called <firstterm>RAID-5</firstterm>)
that provides better
capacity than mirroring (but not quite as good as striping), better
read performance than both mirroring and striping,
and good failure resilience.
There is, however,
a substantial decrease in write performance with RAID-5.
Most of the benefits become more pronounced with five or more
spindles.</para>
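<para>The capacity trade-offs described above can be approximated with
a few lines of shell arithmetic.
This is only a sketch: the function names are hypothetical,
sizes are in blocks, the striped and RAID-5 cases assume equal-sized
subdisks so that each spindle contributes only as much as the smallest
one, and the space <application>Vinum</application> reserves for its
configuration is ignored.</para>
<programlisting>
#!/bin/sh
# Illustrative only: estimate usable capacity (in blocks) for each
# organization, given one size argument per spindle.

# Smallest of the given sizes.
smallest() {
    min=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then min=$size; fi
    done
    echo "$min"
}

# Concatenation: every block of every spindle is usable.
concat_capacity() {
    total=0
    for size in "$@"; do total=$(( $total + $size )); done
    echo "$total"
}

# Striping: each spindle contributes as much as the smallest spindle.
striped_capacity() { echo $(( $# * $(smallest "$@") )); }

# Mirroring with one plex per spindle: a single copy's worth of space.
mirror_capacity() { smallest "$@"; }

# RAID-5: one spindle's worth of space is used for parity.
raid5_capacity() { echo $(( ($# - 1) * $(smallest "$@") )); }
</programlisting>
<para>For two 1,000,000-block spindles,
<command>concat_capacity 1000000 1000000</command> and
<command>striped_capacity 1000000 1000000</command> both print
2000000, while
<command>mirror_capacity 1000000 1000000</command> prints 1000000.
With a third spindle of the same size,
<command>raid5_capacity</command> prints 2000000.</para>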
<para>The organizations described above may be combined to provide
benefits that no single organization can match.
For example, mirroring and striping can be combined to provide
failure-resilience with very fast read performance.</para>
</section>
<section id="History">
<title>Vinum History</title>
<para><application>Vinum</application>
is a standard part of even a "minimum" FreeBSD distribution and
it has been standard since 3.0-RELEASE.
The official pronunciation of the name is
<emphasis>VEE-noom</emphasis>.</para>
<para>&vinum.ap; was inspired by the Veritas Volume Manager, but
was not derived from it.
The name is a play on that history and the Latin adage
<foreignphrase>In Vino Veritas</foreignphrase>
(<foreignphrase>Vino</foreignphrase> is the ablative form of
<foreignphrase>Vinum</foreignphrase>).
Literally translated, that is "Truth lies in wine," hinting that
drunkards have a hard time lying.
</para>
<para>I have been using it in production on six different servers for
over two years with no data loss.
Like the rest of FreeBSD, <application>Vinum</application>
provides "rock-stable performance."
(On a personal note, I have seen <application>Vinum</application>
panic when I misconfigured something, but I have
never had any trouble in normal operation.)
Greg Lehey wrote
<application>Vinum</application> for FreeBSD,
but he is seeking
help in porting it to NetBSD and OpenBSD.</para>
<warning>
<para>Just like the rest of FreeBSD, <application>Vinum</application>
is undergoing continuous
development.
Several subtle, but significant bugs have been fixed in recent
releases.
It is always best to use the most recent code base that meets your
stability requirements.</para></warning>
</section>
<section id="Strategy">
<title>Vinum Deployment Strategy</title>
<para><application>Vinum</application>,
coupled with prudent partition management, lets you
keep "warm-spare" spindles on-line so that failures
are transparent to users. Failed spindles can be replaced
during regular maintenance periods or whenever it is convenient.
When all spindles are working, the server benefits from increased
performance and capacity.</para>
<para>Having redundant copies of your home directory does not
help you if the spindle holding root,
<filename>/usr</filename>, or swap fails on your server.
Hence I focus here on building a simple
foundation for a failure-resilient server covering the root,
<filename>/usr</filename>,
<filename>/home</filename>, and swap partitions.</para>
<warning>
<para><application>Vinum</application>
mirroring does not remove the need for making backups!
Mirroring cannot help you recover from site disasters
or the dreaded
<command>rm -r -f /</command> command.</para></warning>
</section>
<section id="WhyBootstrap">
<title>Why Bootstrap Vinum?</title>
<para>It is possible to add <application>Vinum</application>
to a server configuration after
it is already in production use, but this is much harder than
designing for it from the start. Ironically,
<application>Vinum</application> is not supported by
<command>/stand/sysinstall</command>
and hence you cannot install
<filename>/usr</filename> right onto a
<application>Vinum</application> volume.</para>
<note><para><application>Vinum</application> currently does not
support the root file system (this feature
is in development).</para></note>
<para>Hence it is a bit
tricky to get started using
<application>Vinum</application>, but these instructions
take you through the process of planning for
<application>Vinum</application>, installing FreeBSD
without it, and then beginning to use it.</para>
<para>I have come to call this whole process "bootstrapping Vinum."
That is, the process of getting <application>Vinum</application>
initially installed
and operating to the point where you have met your resilience
or performance goals. My purpose here is to document a
<application>Vinum</application>
bootstrapping method that I have found works well for me.</para>
</section>
<section id="Benefits">
<title>Vinum Benefits</title>
<para>The server foundation scenario I have chosen here allows me
to show you examples of configuring for resilience on
<filename>/usr</filename> and
<filename>/home</filename>.
Yet <application>Vinum</application>
provides benefits other than resilience--namely
performance, capacity, and manageability.
It can significantly improve disk performance (especially
under multi-user loads).
<application>Vinum</application>
can easily concatenate many smaller disks to produce the
illusion of a single larger disk (but my server foundation
scenario does not allow me to illustrate these benefits here).</para>
<para>For servers with many spindles, <application>Vinum</application>
provides substantial
benefits in volume management, particularly when coupled with
hot-pluggable hardware. Data can be moved from spindle to
spindle while the system is running without loss of production
time. Again, details of this will not be given here, but once
you get your feet wet with <application>Vinum</application>,
other documentation will help you do things like this.
See
"<ulink url="http://www.vinumvm.org/vinum/vinum.ps">The Vinum
Volume Manager</ulink>" for a technical introduction to
<application>Vinum</application>,
&man.vinum.8; for a description of the <command>vinum</command>
command, and
&man.vinum.4;
for a description of the vinum device
driver and the way <application>Vinum</application>
objects are named.</para>
<note>
<para>Breaking up your disk space into smaller and smaller partitions
has the benefit of allowing you to "tune" for the most common
type of access and tends to keep disk hogs "within their pens."
However, it also causes some loss in total available disk space
due to fragmentation.</para></note>
</section>
<section id="DegradedOperation">
<title>Server Operation in Degraded Mode</title>
<para>Some disk failures in this two-spindle scenario will result in
<application>Vinum</application>
automatically routing
all disk I/O to the remaining good spindle.
Others will require brief manual intervention on the console
to configure the server for degraded mode operation and a quick reboot.
Other than actual hardware repairs, most recovery work
can be done while the server is running in multi-user degraded
mode so there is as little production impact
from failures as possible.</para>
<para>I give the instructions in <xref linkend=Failures> needed to
configure the server for degraded mode operation
in those cases where <application>Vinum</application>
cannot do it automatically.
I also give the instructions needed to
return to normal operation once the failed hardware is repaired.
You might call these instructions <application>Vinum</application>
failure recovery techniques.</para>
<para>I recommend practicing using these instructions
by recovering from simulated failures.
For each failure scenario, I also give tips below for simulating
a failure even when your hardware is working well.
Even a minimum <application>Vinum</application>
system as described in
<xref linkend="HW">
below can be a good place to experiment with
recovery techniques without impacting production equipment.</para>
</section>
<section id="HWvsSW">
<title>Hardware RAID vs. Vinum (Software RAID)</title>
<para>Manual intervention is sometimes required to configure a server for
degraded mode because
<application>Vinum</application>
is implemented in software that runs after the FreeBSD
kernel is loaded. One disadvantage of such
<firstterm>software RAID</firstterm>
solutions is that there is nothing that can be done to hide spindle
failures from the BIOS or the FreeBSD boot sequence. Hence
the manual reconfiguration of the server
for degraded operation mentioned
above just informs the BIOS and boot sequence of failed
spindles.
<firstterm>Hardware RAID</firstterm> solutions generally have an
advantage in that they require no such reconfiguration since
spindle failures are hidden from the BIOS and boot sequence.</para>
<para>Hardware RAID, however, may have some disadvantages that can
be significant in some cases:
<itemizedlist>
<listitem><para>
The hardware RAID controller itself may become a single
point of failure for the system.
</para></listitem>
<listitem><para>
The data is usually kept in a proprietary
format so that a disk drive cannot be simply plugged
into another main board and booted.
</para></listitem>
<listitem><para>
You often cannot mix and
match drives with different sizes and interfaces.
</para></listitem>
<listitem><para>
You are often limited to the number of drives supported by the
hardware RAID controller (often only four or eight).
</para></listitem>
</itemizedlist>
In other words, &vinum.ap; may offer advantages in that
there is no single point of failure,
the drives can boot on most any main board, and
you are free to mix and match as many drives as you like, using
whatever interfaces you choose.</para>
<tip>
<para>Keep your kernel fairly generic (or at least keep
<filename>/kernel.GENERIC</filename> around).
This will improve the chances that you can come back up on
"foreign" hardware more quickly.</para>
</tip>
<para>The pros and cons discussed above suggest
that the root file system and swap partition are good
candidates for hardware RAID if available.
This is especially true for servers where it is difficult for
administrators to get console access (recall that this is sometimes
required to configure a server for degraded mode operation).
A server with only software RAID is well suited to office and home
environments where an administrator can be close at hand.</para>
<note><para>A common myth is that hardware RAID is always faster
than software RAID.
Since it runs on the host CPU, <application>Vinum</application>
often has more CPU power and memory available than a
dedicated RAID controller would have.
If performance is a prime concern, it is best to benchmark
your application running on your CPU with your spindles using
both hardware and software RAID systems before making
a decision.</para></note>
</section>
<section id="HW">
<title>Hardware for Vinum</title>
<para>These instructions may be timely since commodity PC hardware
can now easily host several hundred gigabytes of reasonably
high-performance disk space at a low price. Many disk
drive manufacturers now sell 7,200 RPM disk drives with quite
low seek times and high transfer rates through ATA-100
interfaces, all at very attractive prices. Four such drives,
attached to a suitable main board and configured with
<application>Vinum</application>
and prudent partitioning, yield a failure-resilient,
high-performance disk server at a very reasonable cost.</para>
<para>However, you can indeed get started with
<application>Vinum</application> very simply.
A minimum system can be as simple as
an old CPU (even a 486 is fine) and a pair of drives
that are 500 MB or more. They need not be the same size or
even use the same interface (i.e., it is fine to mix ATAPI and
SCSI). So get busy and give this a try today! You will have
the foundation of a failure-resilient server running in an
hour or so!</para>
</section>
</section>
<section id="BootstrappingPhases">
<title>Bootstrapping Phases</title>
<para>Greg Lehey suggested this bootstrapping method.
It uses knowledge of how <application>Vinum</application>
internally allocates disk space to avoid copying data.
Instead, <application>Vinum</application>
objects are configured so that they occupy the
same disk space where <command>/stand/sysinstall</command> built
file systems.
The file systems are thus embedded within
<application>Vinum</application> objects without copying.</para>
<para>There are several distinct phases to the
<application>Vinum</application> bootstrapping
procedure. Each of these phases is presented in a separate section below.
Each section starts with a general overview of the phase and its goals.
It then gives example steps for the two-spindle scenario
presented here and advice on how to adapt them for your server.
(If you are reading for a general understanding
of <application>Vinum</application>
bootstrapping, the example sections for each phase
can safely be skipped.)
The remainder of this section gives
an overview of the entire bootstrapping process.</para>
<para>Phase 1 involves planning and preparation.
We will balance requirements
for the server against available resources and make design
tradeoffs.
We will plan the transition from no
<application>Vinum</application> to
<application>Vinum</application>
on just one spindle, to <application>Vinum</application>
on two spindles.</para>
<para>In phase 2, we will install a minimum FreeBSD system on a
single spindle using partitions of type
<literal>4.2BSD</literal> (regular UFS file systems).</para>
<para>Phase 3 will embed the non-root file systems from phase 2 in
<application>Vinum</application> objects.
Note that <application>Vinum</application> will be up and
running at this point,
but it cannot yet provide any resilience since it only has
one spindle on which to store data.</para>
<para>Finally in phase 4, we configure <application>Vinum</application>
on a second spindle and make a backup copy of the root file system.
This will give us resilience on all file systems.</para>
<section id="P1">
<title>Bootstrapping Phase 1: Planning and Preparation</title>
<para>Our goal in this phase is to define the different partitions
we will need and examine their requirements.
We will also look at available disk drives and controllers and allocate
partitions to them.
Finally, we will determine the size of
each partition and its use during the bootstrapping process.
After this planning is complete, we can optionally prepare to use some
tools that will make bootstrapping <application>Vinum</application>
easier.</para>
<para>Several key questions must be answered in this
planning phase:</para>
<itemizedlist>
<listitem><para>
What file systems and partitions will be needed?
</para></listitem>
<listitem><para>
How will they be used?
</para></listitem>
<listitem><para>
How will we name each spindle?
</para></listitem>
<listitem><para>
How will the partitions be ordered for each spindle?
</para></listitem>
<listitem><para>
How will partitions be assigned to the spindles?
</para></listitem>
<listitem><para>
How will partitions be configured? Resilience or performance?
</para></listitem>
<listitem><para>
What technique will be used to achieve resilience?
</para></listitem>
<listitem><para>
What spindles will be used?
</para></listitem>
<listitem><para>
How will they be configured on the available controllers?
</para></listitem>
<listitem><para>
How much space is required for each partition?
</para></listitem>
</itemizedlist>
<section id="P1E">
<title>Phase 1 Example</title>
<para>In this example, I will assume a scenario
where we are building
a minimal foundation for a failure-resilient server.
Hence we will need at least root,
<filename>/usr</filename>,
<filename>/home</filename>,
and swap partitions.
The root,
<filename>/usr</filename>, and
<filename>/home</filename> file systems all need resilience since the
server will not be much good without them.
The swap partition needs performance first and
generally does
not need resilience since nothing it holds needs to be retained
across a reboot.</para>
<section>
<title>Spindle Naming</title>
<para>The kernel would refer to the master spindle on
the primary and secondary ATA controllers as
<devicename>/dev/ad0</devicename> and
<devicename>/dev/ad2</devicename> respectively.
<footnote>
<para>
This assumes that you have not removed the line
<programlisting>options ATA_STATIC_ID</programlisting>
from your kernel configuration.
</para>
</footnote>
But <application>Vinum</application>
also needs to have a name for each spindle
that will stay the same regardless
of how it is attached to the CPU (i.e., if the drive moves, the
<application>Vinum</application> name moves with the drive).</para>
<para>Some recovery techniques documented below suggest
moving a spindle from
the secondary ATA controller to the primary ATA controller.
(Indeed, the flexibility of making such moves is a key benefit
of <application>Vinum</application>
especially if you are managing a large number of spindles.)
After such a drive/controller swap,
the kernel will see what used to be
<devicename>/dev/ad2</devicename> as
<devicename>/dev/ad0</devicename>
but <application>Vinum</application>
will still call
it by whatever name it had when it was attached to
<devicename>/dev/ad2</devicename>
(i.e., when it was "created" or first made known to
<application>Vinum</application>).</para>
<para>Since connections can change, it is best to give
each spindle a unique, abstract
name that gives no hint of how it is attached.
Avoid names that suggest a manufacturer, model number,
physical location, or membership in a sequence
(e.g. avoid names like
<literal>upper</literal>, <literal>lower</literal>, etc.,
<literal>alpha</literal>, <literal>beta</literal>, etc.,
<literal>SCSI1</literal>, <literal>SCSI2</literal>, etc., or
<literal>Seagate1</literal>, <literal>Seagate2</literal> etc.).
Such names are likely to lose their uniqueness or
get out of sequence
someday even if they seem like great names today.</para>
<tip>
<para>Once you have picked names for your spindles,
label them with a permanent marker.
If you have hot-swappable hardware, write the names on the sleds
in which the spindles are mounted.
This will significantly reduce the likelihood of
error when you are moving spindles around later as
part of failure recovery or routine system management
procedures.</para></tip>
<para>In the instructions that follow,
<application>Vinum</application>
will name the root spindle <literal>YouCrazy</literal>
and the rootback spindle <literal>UpWindow</literal>.
I will only use <devicename>/dev/ad0</devicename>
when I want to refer to whichever
of the two spindles is currently attached as
<devicename>/dev/ad0</devicename>.</para>
</section>
<section>
<title>Partition Ordering</title>
<para>Modern disk drives operate with fairly uniform areal
density across the surface of the disk.
That implies that more data is available under the heads without
seeking on the outer cylinders than on the inner cylinders.
We will allocate partitions most critical to system performance
from these outer cylinders as
<command>/stand/sysinstall</command> generally does.</para>
<para>The root file system is traditionally the outermost, even though
it generally is not as critical to system performance as others.
(However root can have a larger impact on performance if it contains
<filename>/tmp</filename> and <filename>/var</filename> as it
does in this example.)
The FreeBSD boot loaders assume that the
root file system lives in the <literal>a</literal> partition.
There is no requirement that the <literal>a</literal>
partition start on the outermost cylinders, but this
convention makes it easier to manage disk labels.</para>
<para>Swap performance is critical so it comes next on our way toward
the center.
I/O operations here tend to be large and contiguous.
Having as much data under the heads as possible avoids seeking
while swapping.</para>
<para>With all the smaller partitions out of the way, we finish
up the disk with
<filename>/home</filename> and
<filename>/usr</filename>.
Access patterns here tend not to be as intense as for other
file systems (especially if there is an abundant supply of RAM
and read cache hit rates are high).</para>
<para>If the pair of spindles you have are large enough to allow
for more than
<filename>/home</filename> and
<filename>/usr</filename>,
it is fine to plan for additional file systems here.</para>
</section>
<section>
<title>Assigning Partitions to Spindles</title>
<para>We will want to assign
partitions to these spindles so that either can fail
without loss of data on file systems configured for
resilience.</para>
<para>Reliability on
<filename>/usr</filename> and
<filename>/home</filename>
is best achieved using <application>Vinum</application>
mirroring.
Resilience will have to come differently, however, for the root
file system since <application>Vinum</application>
is not a part of the FreeBSD boot sequence.
Here we will have to settle for two identical
partitions with a periodic copy from the primary to the
backup.</para>
<para>The kernel already has support for interleaved swap across
all available partitions so there is no need for help from
<application>Vinum</application> here.
<command>/stand/sysinstall</command>
will automatically configure <filename>/etc/fstab</filename>
for all swap partitions given.</para>
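<para>As an illustration, the swap entries in
<filename>/etc/fstab</filename> for a two-spindle layout might look
like the lines printed by the following sketch.
The device names assume that the swap partitions are
<devicename>/dev/ad0s1b</devicename> and
<devicename>/dev/ad2s1b</devicename>, and the function name is
hypothetical.
The script only prints to standard output; it does not modify
<filename>/etc/fstab</filename>.</para>
<programlisting>
#!/bin/sh
# Illustrative only: print one fstab swap entry per spindle.
# Fields: device, mount point, type, options, dump, pass.
swap_fstab_lines() {
    for dev in /dev/ad0s1b /dev/ad2s1b; do
        printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$dev" none swap sw 0 0
    done
}

swap_fstab_lines
# On a running system, swapinfo lists the swap devices in use.
</programlisting>
<para>With both entries present, the kernel spreads swap activity
across the two partitions on its own.</para>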
<para>The &vinum.ap; bootstrapping method given below
requires a pair of spindles that I will call the
<firstterm>root spindle</firstterm> and the
<firstterm>rootback spindle</firstterm>.</para>
<important><para>The rootback spindle must be the same size as or
larger than the root spindle.</para></important>
<para>These instructions first allocate all space on the root
spindle and then allocate exactly that amount of space on
a rootback spindle.
(After &vinum.ap; is bootstrapped, there is nothing special
about either of these spindles--they are interchangeable.)
You can later use the remaining space on the rootback spindle for
other file systems.</para>
<para>If you have more than two spindles, the
<literal>bootvinum</literal> Perl script and the procedure
below will help you initialize them for use with &vinum.ap;.
However, you will have to figure out how to assign partitions
to them on your own.</para>
</section>
<section>
<title>Assigning Space to Partitions</title>
<para>For this example, I will use two spindles: one with
4,124,673 blocks (about 2 GB) on <devicename>/dev/ad0</devicename>
and one with 8,420,769 blocks (about 4 GB) on
<devicename>/dev/ad2</devicename>.</para>
<para>It is best to configure your two spindles on separate
controllers so that both can operate in parallel and
so that you will have failure resilience in case a
controller dies.
Note that mirrored volume write performance will be halved
in cases where both spindles share a controller that requires
them to operate serially (as is often the case with ATA controllers).
One spindle will be the master on the primary ATA
controller and the other will be the master on the
secondary ATA controller.</para>
<para>Recall that we will be allocating space on the smaller
spindle first and the larger spindle second.</para>
</section>
<section id=AssignSmall>
<title>Assigning Partitions on the Root Spindle</title>
<para>We will allocate 200,000 blocks (about 98 MB)
for a root file system on each spindle
(<devicename>/dev/ad0s1a</devicename> and
<devicename>/dev/ad2s1a</devicename>).
We will initially allocate 200,265 blocks for a swap partition
on each spindle,
giving a total of about 195 MB of
swap space (<devicename>/dev/ad0s1b</devicename> and
<devicename>/dev/ad2s1b</devicename>).</para>
<note><para>We will lose 265 blocks from each swap partition
as part of the bootstrapping process.
This is the size of the space used by
<application>Vinum</application> to store configuration
information.
The space will be taken from swap and given to a vinum
partition but will be unavailable for
<application>Vinum</application> subdisks.</para></note>
<note><para>I have done the partition allocation in nice round
numbers of blocks just to emphasize where the 265 blocks go.
There is nothing wrong with allocating space in MB if that is
more convenient for you.</para></note>
<para>This leaves 4,124,673 - 200,000 - 200,265 = 3,724,408 blocks
(about 1,818 MB) on the root spindle for the partitions that
<application>Vinum</application> will take over
(<devicename>/dev/ad0s1e</devicename> and
<devicename>/dev/ad0s1f</devicename>).
From this, allocate 1,000,000 blocks (about 488 MB)
for <filename>/home</filename> and the remaining
2,724,408 blocks (about 1,330 MB) for
<filename>/usr</filename>.
The 265 blocks for <application>Vinum</application> configuration
information come out of the swap partition, as noted above.
See <xref linkend=ad0b4aft> below to see this graphically.</para>
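<para>The arithmetic above can be checked with a few lines of
<command>sh</command>.
This is only a sketch: it assumes 512-byte blocks (which is
consistent with the sizes quoted here), and the variable and
function names are made up for illustration.</para>
<programlisting>#!/bin/sh
# Sketch only: assumes 512-byte blocks; all names are illustrative.
disk=4124673      # total blocks on the root spindle
root=200000       # root file system
swap=200265       # swap, including the 265 blocks Vinum takes later
home=1000000      # /home

# Convert blocks to whole megabytes (2048 blocks of 512 bytes per MB).
blocks_to_mb() {
    echo $(( $1 / 2048 ))
}

vinum=$(( $disk - $root - $swap ))   # space left for /home and /usr
usr=$(( $vinum - $home ))            # /usr gets whatever /home does not

echo "vinum: $vinum blocks, $(blocks_to_mb "$vinum") MB"
echo "home:  $home blocks, $(blocks_to_mb "$home") MB"
echo "usr:   $usr blocks, $(blocks_to_mb "$usr") MB"</programlisting>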
<para>The left-hand side of
<xref linkend="ad0b4aft"> below shows what spindle ad0 will
look like at the end of phase 2.
The right-hand side shows what it will look like at the
end of phase 3.</para>
<figure id="ad0b4aft">
<title>Spindle ad0 Before and After Vinum</title>
<mediaobject>
<imageobject>
<imagedata fileref="ad0b4aft" format="EPS">
</imageobject>
<textobject>
<literallayout class="monospaced"> ad0 Before Vinum Offset (blocks) ad0 After Vinum
+----------------------+ <-- 0--> +----------------------+
| root | | root |
| /dev/ad0s1a | | /dev/ad0s1a |
+----------------------+ <-- 200000--> +----------------------+
| swap | | swap |
| /dev/ad0s1b | | /dev/ad0s1b |
| | 400000--> +----------------------+
| | | Vinum drive YouCrazy |
| | | /dev/ad0s1h |
+----------------------+ <-- 400265--> +-----------------+ |
| /home | | Vinum sd | |
| /dev/ad0s1e | | home.p0.s0 | |
+----------------------+ <--1400265--> +-----------------+ |
| /usr | | Vinum sd | |
| /dev/ad0s1f | | usr.p0.s0 | |
+----------------------+ <--4124673--> +-----------------+----+
Not to scale</literallayout>
</textobject>
<textobject>
<phrase>Spindle /dev/ad0 Before and After Vinum</phrase>
</textobject>
</mediaobject>
</figure>
</section>
<section id=AssignLarge>
<title>Assigning Partitions on the Rootback Spindle</title>
<para>The <filename>/rootback</filename> and swap partition sizes
on the rootback spindle must
match the root and swap partition sizes on the root spindle.
That leaves 8,420,769 - 200,000 - 200,265 = 8,020,504
blocks for the <application>Vinum</application> partition.
Mirrors of <filename>/home</filename> and
<filename>/usr</filename> receive the same allocation as on
the root spindle.
That will leave an extra 2 GB or so that we can deal
with later.
See <xref linkend=ad2b4aft> below to see this graphically.</para>
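<para>The size of the leftover space on the rootback spindle can be
worked out the same way.
A sketch in <command>sh</command>, again assuming 512-byte blocks;
the variable names are illustrative only.</para>
<programlisting>#!/bin/sh
# Sketch only: assumes 512-byte blocks; all names are illustrative.
rootback_disk=8420769   # total blocks on the rootback spindle
root_disk=4124673       # total blocks on the root spindle

# Everything mirrored from the root spindle occupies the same number
# of blocks on the rootback spindle, so the spare space is simply the
# difference between the two spindle sizes.
spare=$(( $rootback_disk - $root_disk ))

echo "spare: $spare blocks, $(( $spare / 2048 )) MB"</programlisting>
<para>&vinum.ap; may report slightly less than this as available,
since it keeps some space for its own use.</para>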
<para>The left-hand side of
<xref linkend="ad2b4aft"> below shows what spindle ad2 will
look like at the beginning of phase 4.
The right-hand side shows what it will look like at the end.</para>
<figure id="ad2b4aft">
<title>Spindle ad2 Before and After Vinum</title>
<mediaobject>
<imageobject>
<imagedata fileref="ad2b4aft" format="EPS">
</imageobject>
<textobject>
<literallayout class="monospaced"> ad2 Before Vinum Offset (blocks) ad2 After Vinum
+----------------------+ <-- 0--> +----------------------+
| /rootback | | /rootback |
| /dev/ad2s1e | | /dev/ad2s1a |
+----------------------+ <-- 200000--> +----------------------+
| swap | | swap |
| /dev/ad2s1b | | /dev/ad2s1b |
| | 400000--> +----------------------+
| | | Vinum drive UpWindow |
| | | /dev/ad2s1h |
+----------------------+ <-- 400265--> +-----------------+ |
| /NOFUTURE | | Vinum sd | |
| /dev/ad2s1f | | home.p1.s0 | |
| | 1400265--> +-----------------+ |
| | | Vinum sd | |
| | | usr.p1.s0 | |
| | 4124673--> +-----------------+ |
| | | Vinum sd | |
| | | hope.p0.s0 | |
+----------------------+ <--8420769--> +-----------------+----+
Not to scale</literallayout>
</textobject>
<textobject>
<phrase>Spindle ad2 Before and After Vinum</phrase>
</textobject>
</mediaobject>
</figure>
</section>
<section id=floppy>
<title>Preparation of Tools</title>
<para>The <literal>bootvinum</literal> Perl script given below in
<xref linkend=Perl> will make the
<application>Vinum</application> bootstrapping process much
easier if you can run it on the machine being bootstrapped.
It is over 200 lines, so you will not want to type it in.
At this point, I recommend that you
copy it to a floppy or arrange some
alternative way of keeping it at hand
so that it is available later when needed.
For example:</para>
<screen>&prompt.root; <userinput>fdformat -f 1440 /dev/fd0</userinput>
&prompt.root; <userinput>newfs_msdos -f 1440 /dev/fd0</userinput>
&prompt.root; <userinput>mount -t msdos /dev/fd0 /mnt</userinput>
&prompt.root; <userinput>cp /usr/share/examples/vinum/bootvinum /mnt</userinput></screen>
<para>Someday, I would like this script to live in
<filename>/usr/share/examples/vinum</filename>,
as the example above assumes.
Until then, please use this
<ulink url="http://www.BGPBook.Com/vinum/bootvinum">link</ulink>
to get a copy.</para>
</section>
</section>
</section>
<section id="P2">
<title>Bootstrapping Phase 2: Minimal OS Installation</title>
<para>Our goal in this phase is to complete the smallest possible
FreeBSD installation in such a way that we can later install
<application>Vinum</application>.
We will use only
partitions of type <literal>4.2BSD</literal> (i.e., regular UFS file
systems) since that is the only type supported by
<command>/stand/sysinstall</command>.</para>
<section id="P2E">
<title>Phase 2 Example</title>
<procedure>
<step>
<para>Start up the FreeBSD installation process by running
<command>/stand/sysinstall</command> from
installation media as you normally would.</para></step>
<step>
<para>Fdisk partition all spindles as needed.</para>
<important>
<para>Make sure to select BootMgr for all spindles.</para></important>
</step>
<step>
<para>Partition the root spindle with appropriate block
allocations as described above in <xref linkend=AssignSmall>.
For this example on a 2 GB spindle, I will use
200,000 blocks for root, 200,265 blocks for swap,
1,000,000 blocks for <filename>/home</filename>, and
the rest of the spindle (2,724,408 blocks) for
<filename>/usr</filename>.
(<command>/stand/sysinstall</command>
should automatically assign these to
<devicename>/dev/ad0s1a</devicename>,
<devicename>/dev/ad0s1b</devicename>,
<devicename>/dev/ad0s1e</devicename>, and
<devicename>/dev/ad0s1f</devicename>
by default.)</para>
<note><para>If you prefer soft updates, as I do, and you are
using 4.4-RELEASE or later, this is a good time to enable
them.</para></note>
</step>
<step>
<para>Partition the rootback spindle with the appropriate block
allocations as described above in <xref linkend=AssignLarge>.
For this example on a 4 GB spindle, I will use
200,000 blocks for <filename>/rootback</filename>,
200,265 blocks for swap, and
the rest of the spindle (8,020,504 blocks) for
<filename>/NOFUTURE</filename>.
(<command>/stand/sysinstall</command>
should automatically assign these to
<devicename>/dev/ad2s1e</devicename>,
<devicename>/dev/ad2s1b</devicename>, and
<devicename>/dev/ad2s1f</devicename> by default.)</para>
<note>
<para>We do not really want to have a
<filename>/NOFUTURE</filename> UFS file system (we
want a vinum partition instead), but that is the
best choice we have for the space given the limitations of
<command>/stand/sysinstall</command>.
Mount point names beginning with <literal>NOFUTURE</literal>
and <literal>rootback</literal>
serve as sentinels to the bootstrapping
script presented in <xref linkend=Perl> below.</para></note>
</step>
<step>
<para>Partition any other spindles with swap if desired and a
single <filename>/NOFUTURExx</filename> file system.</para>
</step>
<step>
<para>Select a minimum system install for now even if you
want to end up with more distributions loaded later.</para>
<tip>
<para>Do not worry about system configuration options at this
point--get <application>Vinum</application>
set up and get the partitions in
the right places first.</para></tip>
</step>
<step>
<para>Exit <command>/stand/sysinstall</command> and reboot.
Do a quick test to verify that the minimum
installation was successful.</para>
</step>
</procedure>
<para>The left-hand side of <xref linkend=ad0b4aft> above
and the left-hand side of <xref linkend=ad2b4aft> above
show how the disks will look at this point.</para>
</section>
</section>
<section id="P3">
<title>Bootstrapping Phase 3: Root Spindle Setup</title>
<para>Our goal in this phase is to get <application>Vinum</application>
set up and running on the
root spindle.
We will embed the existing
<filename>/usr</filename> and
<filename>/home</filename> file systems in a
<application>Vinum</application> partition.
Note that the <application>Vinum</application>
volumes created will not yet be
failure-resilient since we have
only one underlying <application>Vinum</application>
drive to hold them.
The resulting system will automatically start
<application>Vinum</application> as it boots to multi-user mode.</para>
<section id="P3E">
<title>Phase 3 Example</title>
<procedure>
<step>
<para>Log in as root.</para>
</step>
<step>
<para>We will need a directory in the root file system in
which to keep a few files that will be used in the
<application>Vinum</application>
bootstrapping process.</para>
<screen>&prompt.root; <userinput>mkdir /bootvinum</userinput>
&prompt.root; <userinput>cd /bootvinum</userinput></screen>
</step>
<step>
<para>Several files need to be prepared for use in bootstrapping.
I have written a Perl script that makes all the required
files for you.
Copy this script to <filename>/bootvinum</filename> by
floppy disk, tape, network, or any convenient means and
then run it.
(If you cannot get this script copied onto the machine being
bootstrapped, then see <xref linkend=ManualBoot>
below for a manual alternative.)</para>
<screen>&prompt.root; <userinput>cp /mnt/bootvinum .</userinput>
&prompt.root; <userinput>./bootvinum</userinput></screen>
<note><para><literal>bootvinum</literal> produces no output
when run successfully.
If you get any errors,
something may have gone wrong when you were creating
partitions with
<command>/stand/sysinstall</command> above.</para></note>
<para>Running <literal>bootvinum</literal> will:</para>
<itemizedlist>
<listitem><para>
Create <filename>/etc/fstab.vinum</filename>
based on what it finds
in your existing <filename>/etc/fstab</filename>
</para></listitem>
<listitem><para>
Create new disk labels for each spindle mentioned
in <filename>/etc/fstab</filename> and keep copies of the
current disk labels
</para></listitem>
<listitem><para>
Create files needed as input to <command>vinum</command>
<option>create</option> for building
<application>Vinum</application> objects on each spindle
</para></listitem>
<listitem><para>
Create many alternates to <filename>/etc/fstab.vinum</filename>
that might come in handy should a spindle fail
</para></listitem>
</itemizedlist>
<para>You may want to take a look at these files to learn more
about the disk partitioning required for
<application>Vinum</application> or to learn more about the
commands needed to create
<application>Vinum</application> objects.</para>
</step>
<step>
<para>We now need to install new spindle partitioning for
<devicename>/dev/ad0</devicename>.
This requires that
<devicename>/dev/ad0s1b</devicename> not be in use for
swapping so we have to reboot in single-user mode.</para>
<substeps>
<step>
<para>First, reboot the system.</para>
<screen>&prompt.root; <userinput>reboot</userinput></screen>
</step>
<step>
<para>Next, enter single-user mode.</para>
<screen>Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [kernel] in 8 seconds...
Type '?' for a list of commands, 'help' for more detailed help.
ok <userinput>boot -s</userinput></screen>
</step>
</substeps>
</step>
<step>
<para>In single-user mode, install the new partitioning
created above.</para>
<screen>&prompt.root; <userinput>cd /bootvinum</userinput>
&prompt.root; <userinput>disklabel -R ad0s1 disklabel.ad0s1</userinput>
&prompt.root; <userinput>disklabel -R ad2s1 disklabel.ad2s1</userinput></screen>
<note><para>If you have additional spindles, repeat the
above commands as appropriate for them.</para></note>
</step>
<step>
<para>We are about to start <application>Vinum</application>
for the first time.
It is going to want to create several device nodes under
<filename>/dev/vinum</filename> so we will need to mount the
root file system for read/write access.</para>
<screen>&prompt.root; <userinput>fsck -p /</userinput>
&prompt.root; <userinput>mount /</userinput></screen>
</step>
<step>
<para>Now it is time to create the <application>Vinum</application>
objects that
will embed the existing non-root file systems on
the root spindle in a
<application>Vinum</application> partition.
This will load the <application>Vinum</application>
kernel module and start <application>Vinum</application>
as a side effect.</para>
<screen>&prompt.root; <userinput>vinum create create.YouCrazy</userinput></screen>
<para>
You should see a list of <application>Vinum</application>
objects created that looks like the following:</para>
<screen>1 drives:
D YouCrazy State: up Device /dev/ad0s1h Avail: 0/1818 MB (0%)
2 volumes:
V home State: up Plexes: 1 Size: 488 MB
V usr State: up Plexes: 1 Size: 1330 MB
2 plexes:
P home.p0 C State: up Subdisks: 1 Size: 488 MB
P usr.p0 C State: up Subdisks: 1 Size: 1330 MB
2 subdisks:
S home.p0.s0 State: up PO: 0 B Size: 488 MB
S usr.p0.s0 State: up PO: 0 B Size: 1330 MB</screen>
<para>
You should also see several kernel messages
which state that the <application>Vinum</application>
objects you have created are now <literal>up</literal>.</para>
</step>
<step>
<para>Our non-root file systems should now be embedded in a
<application>Vinum</application> partition and
hence available through <application>Vinum</application>
volumes.
It is important to test that this embedding worked.</para>
<screen>&prompt.root; <userinput>fsck -n /dev/vinum/home</userinput>
&prompt.root; <userinput>fsck -n /dev/vinum/usr</userinput></screen>
<para>This should produce no errors.
If it does produce errors <emphasis>do not fix them</emphasis>.
Instead, go back and examine the root spindle partition tables
before and after <application>Vinum</application>
to see if you can spot the error.
You can back out the partition table changes by using
<command>disklabel -R</command> with the
<filename>disklabel.*.b4vinum</filename> files.</para>
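<para>For example, assuming <literal>bootvinum</literal> saved the
original label of the root spindle as
<filename>disklabel.ad0s1.b4vinum</filename> (this name is inferred
from the pattern above; check the actual file names in
<filename>/bootvinum</filename>), backing out might look like
this:</para>
<screen>&prompt.root; <userinput>cd /bootvinum</userinput>
&prompt.root; <userinput>disklabel -R ad0s1 disklabel.ad0s1.b4vinum</userinput></screen>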
</step>
<step>
<para>While we have the root file system mounted read/write, this is
a good time to install the new <filename>/etc/fstab</filename>.</para>
<screen>&prompt.root; <userinput>mv /etc/fstab /etc/fstab.b4vinum</userinput>
&prompt.root; <userinput>cp /etc/fstab.vinum /etc/fstab</userinput></screen>
</step>
<step>
<para>We are now done with tasks requiring single-user
mode, so it is safe to go multi-user from here on.</para>
<screen>&prompt.root; <userinput>^D</userinput></screen>
</step>
<step>
<para>Log in as root.</para>
</step>
<step>
<para>Edit <filename>/etc/rc.conf</filename> and add this line:
<programlisting>start_vinum="YES"</programlisting></para>
</step>
</procedure>
</section>
</section>
<section id="P4">
<title>Bootstrapping Phase 4: Rootback Spindle Setup</title>
<para>Our goal in this phase is to get redundant copies of all data
from the root spindle to the rootback spindle.
We will first create the necessary <application>Vinum</application>
objects on the rootback spindle.
Then we will ask <application>Vinum</application>
to copy the data from the root spindle to the
rootback spindle.
Finally, we will use <command>dump</command> and <command>restore</command>
to copy the root file system.</para>
<section id="P4E">
<title>Phase 4 Example</title>
<procedure>
<step>
<para>Now that <application>Vinum</application>
is running on the root spindle, we can bring
it up on the rootback spindle so that our
<application>Vinum</application> volumes can become
failure-resilient.</para>
<screen>&prompt.root; <userinput>cd /bootvinum</userinput>
&prompt.root; <userinput>vinum create create.UpWindow</userinput></screen>
<para>You should see a list of <application>Vinum</application>
objects created that
looks like the following:</para>
<screen>2 drives:
D YouCrazy State: up Device /dev/ad0s1h Avail: 0/1818 MB (0%)
D UpWindow State: up Device /dev/ad2s1h Avail: 2096/3915 MB (53%)
2 volumes:
V home State: up Plexes: 2 Size: 488 MB
V usr State: up Plexes: 2 Size: 1330 MB
4 plexes:
P home.p0 C State: up Subdisks: 1 Size: 488 MB
P usr.p0 C State: up Subdisks: 1 Size: 1330 MB
P home.p1 C State: faulty Subdisks: 1 Size: 488 MB
P usr.p1 C State: faulty Subdisks: 1 Size: 1330 MB
4 subdisks:
S home.p0.s0 State: up PO: 0 B Size: 488 MB
S usr.p0.s0 State: up PO: 0 B Size: 1330 MB
S home.p1.s0 State: stale PO: 0 B Size: 488 MB
S usr.p1.s0 State: stale PO: 0 B Size: 1330 MB</screen>
<para>You should also see several kernel messages
which state that some of the <application>Vinum</application>
objects you have created are now <literal>up</literal>
while others are <literal>faulty</literal> or
<literal>stale</literal>.</para>
</step>
<step>
<para>Now we ask <application>Vinum</application>
to copy each of the subdisks on drive
<literal>YouCrazy</literal> to drive <literal>UpWindow</literal>.
This will change the state of the newly created
<application>Vinum</application> subdisks
from <literal>stale</literal> to <literal>up</literal>.
It will also change the state of the newly created
<application>Vinum</application> plexes
from <literal>faulty</literal> to <literal>up</literal>.</para>
<para>First, we do the new subdisk we
added to <filename>/home</filename>.</para>
<screen>&prompt.root; <userinput>vinum start -w home.p1.s0</userinput>
reviving home.p1.s0
<emphasis>(time passes . . . )</emphasis>
home.p1.s0 is up by force
home.p1 is up
home.p1.s0 is up</screen>
<note>
<para>
My 5,400 RPM EIDE spindles copied at about 3.5 MBytes/sec.
Your mileage may vary.
</para>
</note>
</step>
<step>
<para>Next we do the new subdisk we
added to <filename>/usr</filename>.</para>
<screen>&prompt.root; <userinput>vinum start -w usr.p1.s0</userinput>
reviving usr.p1.s0
<emphasis>(time passes . . . )</emphasis>
usr.p1.s0 is up by force
usr.p1 is up
usr.p1.s0 is up</screen>
<para>All <application>Vinum</application>
objects should be in state <literal>up</literal> at this point.
The output of
<command>vinum list</command> should look
like the following:</para>
<screen>2 drives:
D YouCrazy State: up Device /dev/ad0s1h Avail: 0/1818 MB (0%)
D UpWindow State: up Device /dev/ad2s1h Avail: 2096/3915 MB (53%)
2 volumes:
V home State: up Plexes: 2 Size: 488 MB
V usr State: up Plexes: 2 Size: 1330 MB
4 plexes:
P home.p0 C State: up Subdisks: 1 Size: 488 MB
P usr.p0 C State: up Subdisks: 1 Size: 1330 MB
P home.p1 C State: up Subdisks: 1 Size: 488 MB
P usr.p1 C State: up Subdisks: 1 Size: 1330 MB
4 subdisks:
S home.p0.s0 State: up PO: 0 B Size: 488 MB
S usr.p0.s0 State: up PO: 0 B Size: 1330 MB
S home.p1.s0 State: up PO: 0 B Size: 488 MB
S usr.p1.s0 State: up PO: 0 B Size: 1330 MB</screen>
</step>
<step>
<para>Copy the root file system so that you will have a backup.</para>
<screen>&prompt.root; <userinput>cd /rootback</userinput>
&prompt.root; <userinput>dump 0f - / | restore rf -</userinput>
&prompt.root; <userinput>rm restoresymtable</userinput>
&prompt.root; <userinput>cd /</userinput></screen>
<note>
<para>You may see errors like this:</para>
<screen>./tmp/rstdir1001216411: (inode 558) not found on tape
cannot find directory inode 265
abort? [yn] <userinput>n</userinput>
expected next file 492, got 491</screen>
<para>They seem to cause no harm.
I suspect they are a consequence of dumping the file system
containing <filename>/tmp</filename> and/or the pipe
connecting <command>dump</command> and
<command>restore</command>.</para>
</note>
</step>
<step>
<para>Make a directory on which we can mount a damaged root
file system during the recovery process.</para>
<screen>&prompt.root; <userinput>mkdir /rootbad</userinput></screen>
</step>
<step>
<para>Remove sentinel mount points that are now unused.</para>
<screen>&prompt.root; <userinput>rmdir /NOFUTURE*</userinput></screen>
</step>
<step>
<para>Create empty &vinum.ap; drives on remaining spindles.</para>
<screen>&prompt.root; <userinput>vinum create create.ThruBank</userinput>
&prompt.root; <userinput>...</userinput></screen>
</step>
</procedure>
<para>At this point, the reliable server foundation is complete.
The right-hand side of <xref linkend=ad0b4aft> above
and the right-hand side of <xref linkend=ad2b4aft> above
show how the disks will look.</para>
<para>You may want to reboot to multi-user mode and give the system
a quick test drive.
This is also a good point to complete installation
of other distributions beyond the minimal install.
Add packages, ports, and users as required.
Configure <filename>/etc/rc.conf</filename> as required.</para>
<tip>
<para>After you have completed your server configuration,
remember to do one more copy of root to
<filename>/rootback</filename> as shown above before placing
the server into production.</para></tip>
<tip>
<para>Make a schedule to refresh
<filename>/rootback</filename> periodically.</para></tip>
<tip>
<para>It may be a good idea to mount
<filename>/rootback</filename> read-only for normal operation
of the server.
This does, however, complicate the periodic refresh a bit.</para></tip>
<tip>
<para>Do not forget to watch
<filename>/var/log/messages</filename> carefully for errors.
<application>Vinum</application>
may automatically avoid failed hardware in a way that users
do not notice.
You must watch for such failures and get them repaired before a
second failure results in data loss.
You may see
<application>Vinum</application> noting damaged objects
at server boot time.</para></tip>
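<para>One way to spot damaged objects without reading a whole listing
is to filter the output of <command>vinum list</command> for
objects whose state is not <literal>up</literal>.
The following is a minimal sketch that assumes the listing format
shown earlier in this article; the function name
<literal>not_up</literal> is made up for illustration.</para>
<programlisting>#!/bin/sh
# Sketch only: relies on the "State: ..." column format shown above.
# not_up reads a vinum listing on standard input and prints every
# object line whose state is anything other than "up".
not_up() {
    grep 'State:' | grep -v 'State: up'
}

# Typical use on a running system (requires Vinum):
#   vinum list | not_up</programlisting>
<para>No output means that every object in the listing was
<literal>up</literal>.</para>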
</section>
</section>
</section>
<section id="FromHere">
<title>Where to Go from Here?</title>
<para>Now that you have established the foundation of a reliable server,
there are several things you might want to try next.</para>
<section>
<title>Make a Vinum Volume with Remaining Space</title>
<para>Following are the steps to create another
<application>Vinum</application> volume with space remaining
on the rootback spindle.</para>
<note><para>This volume will not be resilient to spindle failure
since it has only one plex on a single spindle.</para></note>
<procedure>
<step>
<para>Create a file with the following contents:</para>
<programlisting>volume hope
plex name hope.p0 org concat volume hope
sd name hope.p0.s0 drive UpWindow plex hope.p0 len 0</programlisting>
<note>
<para>Specifying a length of <literal>0</literal> for
the <filename>hope.p0.s0</filename> subdisk
asks <application>Vinum</application>
to use whatever space is left available on the underlying
drive.</para></note>
</step>
<step>
<para>Feed these commands into <command>vinum</command> <option>create</option>.</para>
<screen>&prompt.root; <userinput>vinum create <replaceable>filename</replaceable></userinput></screen>
</step>
<step>
<para>Now we <command>newfs</command> the volume and
<command>mount</command> it.</para>
<screen>&prompt.root; <userinput>newfs -v /dev/vinum/hope</userinput>
&prompt.root; <userinput>mkdir /hope</userinput>
&prompt.root; <userinput>mount /dev/vinum/hope /hope</userinput></screen>
</step>
<step>
<para>Edit <filename>/etc/fstab</filename> if you want
<filename>/hope</filename> mounted at boot time.</para>
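<para>For instance, a line like the following would do it.
The mount options and the dump and pass numbers shown here are
assumptions; adjust them to match the other entries in your
<filename>/etc/fstab</filename>.</para>
<programlisting>/dev/vinum/hope		/hope		ufs	rw	2	2</programlisting>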
</step>
</procedure>
</section>
<section>
<title>Try Out More Vinum Commands</title>
<para>You might already be familiar with
<command>vinum</command> <option>list</option> to get a list of
all <application>Vinum</application> objects.
Try <option>-v</option> following it to see more detail.</para>
<para>If you have more spindles and you want to bring them up as
concatenated, mirrored, or striped volumes, then give
<command>vinum</command> <option>concat</option> <replaceable>drivelist</replaceable>,
<command>vinum</command> <option>mirror</option> <replaceable>drivelist</replaceable>, or
<command>vinum</command> <option>stripe</option> <replaceable>drivelist</replaceable> a try.</para>
<para>See &man.vinum.8; for sample configurations and important
performance considerations before settling on a final organization
for your additional spindles.</para>
<para>The failure recovery instructions below will also give you
some experience using more <application>Vinum</application>
commands.</para>
</section>
</section>
<section id="Failures">
<title>Failure Scenarios</title>
<para>This section contains descriptions of various failure scenarios.
For each scenario, there is a subsection on how to configure your
server for degraded mode operation, how to recover from the failure,
how to exit degraded mode, and how to simulate the failure.</para>
<tip>
<para>Make a hard copy of these instructions and leave them inside the CPU
case, being careful not to interfere with ventilation.</para></tip>
<section id="ad0RootBad">
<title>Root File System on ad0 Unusable, Rest of Drive OK</title>
<note>
<para>We assume here that the boot blocks and disk label on
<devicename>/dev/ad0</devicename> are ok.
If your BIOS can boot from a drive other than
<devicename>C:</devicename>, you may be able to get around this
limitation.</para></note>
<section id="enter1">
<title>Configure Server for Degraded Mode</title>
<procedure>
<step>
<para>Use BootMgr to load the kernel from
<devicename>/dev/ad2s1a</devicename>.</para>
<substeps>
<step>
<para>Hit <keycap>F5</keycap> in BootMgr to select
<literal>Drive 1</literal>.</para>
</step>
<step>
<para>Hit <keycap>F1</keycap> to select
<literal>FreeBSD</literal>.</para>
</step>
</substeps>
</step>
<step>
<para>After the kernel is loaded, hit any key but
<keycap>Enter</keycap> to interrupt
the boot sequence.
Boot into single-user mode and allow explicit entry of
a root file system.</para>
<screen>Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [kernel] in 8 seconds...
Type '?' for a list of commands, 'help' for more detailed help.
ok <userinput>boot -as</userinput></screen>
</step>
<step>
<para>Select <filename>/rootback</filename>
as your root file system.</para>
<screen>Manual root file system specification:
&lt;fstype>:&lt;device> Mount &lt;device> using filesystem &lt;fstype>
e.g. ufs:/dev/da0s1a
? List valid disk boot devices
&lt;empty line> Abort manual input
mountroot> <userinput>ufs:/dev/ad2s1a</userinput></screen>
</step>
<step>
<para>Now that you are in single-user mode, change
<filename>/etc/fstab</filename> to avoid the
bad root file system.</para>
<tip>
<para>If you used the <literal>bootvinum</literal> Perl script from <xref linkend=Perl>
below, then these commands should configure your server for
degraded mode.</para>
<screen>&prompt.root; <userinput>fsck -p /</userinput>
&prompt.root; <userinput>mount /</userinput>
&prompt.root; <userinput>cd /etc</userinput>
&prompt.root; <userinput>mv fstab fstab.bak</userinput>
&prompt.root; <userinput>cp fstab_ad0s1_root_bad fstab</userinput>
&prompt.root; <userinput>cd /</userinput>
&prompt.root; <userinput>mount -o ro /</userinput>
&prompt.root; <userinput>vinum start</userinput>
&prompt.root; <userinput>fsck -p</userinput>
&prompt.root; <userinput>^D</userinput></screen>
</tip>
</step>
</procedure>
</section>
<section>
<title>Recovery</title>
<procedure>
<step>
<para>Restore <devicename>/dev/ad0s1a</devicename> from
backups or copy
<filename>/rootback</filename> to it with these commands:</para>
<screen>&prompt.root; <userinput>umount /rootbad</userinput>
&prompt.root; <userinput>newfs /dev/ad0s1a</userinput>
&prompt.root; <userinput>tunefs -n enable /dev/ad0s1a</userinput>
&prompt.root; <userinput>mount /rootbad</userinput>
&prompt.root; <userinput>cd /rootbad</userinput>
&prompt.root; <userinput>dump 0f - / | restore rf -</userinput>
&prompt.root; <userinput>rm restoresymtable</userinput></screen>
</step>
</procedure>
</section>
<section>
<title>Exiting Degraded Mode</title>
<procedure>
<step>
<para>Enter single-user mode.</para>
<screen>&prompt.root; <userinput>shutdown now</userinput></screen>
</step>
<step>
<para>Put <filename>/etc/fstab</filename> back to
normal and reboot.</para>
<screen>&prompt.root; <userinput>cd /rootbad/etc</userinput>
&prompt.root; <userinput>rm fstab</userinput>
&prompt.root; <userinput>mv fstab.bak fstab</userinput>
&prompt.root; <userinput>reboot</userinput></screen>
</step>
<step>
<para>As the system restarts, hit <keycap>F1</keycap> at the
BootMgr prompt to boot from
<devicename>/dev/ad0</devicename>.</para>
</step>
</procedure>
</section>
<section>
<title>Simulation</title>
<para>This kind of failure can be simulated by shutting down to
single-user mode and then booting as shown above in
<xref linkend=enter1>.</para>
</section>
</section>
<section id="ad2Bad">
<title>Drive ad2 Fails</title>
<para>This section deals with the total failure of
<devicename>/dev/ad2</devicename>.</para>
<section>
<title>Configure Server for Degraded Mode</title>
<procedure>
<step>
<para>After the kernel is loaded, hit any key but
<keycap>Enter</keycap> to interrupt the boot sequence.
Boot into single-user mode.</para>
<screen>Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [kernel] in 8 seconds...
Type '?' for a list of commands, 'help' for more detailed help.
ok <userinput>boot -s</userinput></screen>
</step>
<step>
<para>Change
<filename>/etc/fstab</filename> to avoid the bad drive.
If you used the <literal>bootvinum</literal> Perl script from <xref linkend=Perl>
below, then
these commands should configure your server for
degraded mode.</para>
<screen>&prompt.root; <userinput>fsck -p /</userinput>
&prompt.root; <userinput>mount /</userinput>
&prompt.root; <userinput>cd /etc</userinput>
&prompt.root; <userinput>mv fstab fstab.bak</userinput>
&prompt.root; <userinput>cp fstab_only_have_ad0s1 fstab</userinput>
&prompt.root; <userinput>cd /</userinput>
&prompt.root; <userinput>mount -o ro /</userinput>
&prompt.root; <userinput>vinum start</userinput>
&prompt.root; <userinput>fsck -p</userinput>
&prompt.root; <userinput>^D</userinput></screen>
<para>If you do not have modified versions of
<filename>/etc/fstab</filename> that are ready for use,
then you can use <command>ed</command> to make one.
Alternatively, you can <command>fsck</command> and
<command>mount</command>
<filename>/usr</filename> and then use your
favorite editor.</para>
</step>
</procedure>
</section>
<section id=ad2Recov>
<title>Recovery</title>
<procedure>
<para>We assume here that your server is up and running multi-user in
degraded mode on just
<devicename>/dev/ad0</devicename> and that you have
a new spindle now on
<devicename>/dev/ad2</devicename> ready to go.</para>
<para>You will need a new spindle with enough room to hold root and swap
partitions plus a <application>Vinum</application>
partition large enough to hold
<filename>/home</filename> and <filename>/usr</filename>.</para>
<step>
<para>Create a BIOS partition (slice) on the new spindle.</para>
<screen>&prompt.root; <userinput>/stand/sysinstall</userinput></screen>
<substeps>
<step><para>Select <literal>Custom</literal>.</para></step>
<step><para>Select <literal>Partition</literal>.</para></step>
<step><para>Select <devicename>ad2</devicename>.</para></step>
<step><para>Create a FreeBSD (type 165) slice
large enough to hold everything mentioned above.</para></step>
<step><para>Write changes.</para></step>
<step><para>Yes, you are absolutely sure.</para></step>
<step><para>Select BootMgr.</para></step>
<step><para>Quit Partitioning.</para></step>
<step><para>Exit <command>/stand/sysinstall</command>.</para></step>
</substeps>
</step>
<step>
<para>Create disk label partitioning based on current
<devicename>/dev/ad0</devicename> partitioning.</para>
<screen>&prompt.root; <userinput>disklabel ad0 > /tmp/ad0</userinput>
&prompt.root; <userinput>disklabel -e ad2</userinput></screen>
<para>This will drop you into your favorite editor.</para>
<substeps>
<step>
<para>Copy the lines for the <literal>a</literal> and
<literal>b</literal> partitions from
<filename>/tmp/ad0</filename> to the
<devicename>ad2</devicename> disklabel.</para>
</step>
<step>
<para>Add the <literal>size</literal> of the
<literal>a</literal> and
<literal>b</literal> partitions to find the proper
<literal>offset</literal> for the
<literal>h</literal> partition.</para>
</step>
<step>
<para>Subtract this <literal>offset</literal> from the
<literal>size</literal> of the <literal>c</literal>
partition to find the proper <literal>size</literal> for the <literal>h</literal>
partition.</para>
</step>
<step>
<para>Define an <literal>h</literal> partition with the
<literal>size</literal> and
<literal>offset</literal> calculated above.</para>
</step>
<step>
<para>Set the <literal>fstype</literal> column to
<literal>vinum</literal>.</para>
</step>
<step>
<para>Save the file and quit your editor.</para>
</step>
</substeps>
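<para>The arithmetic in the substeps above can be sketched in
<command>sh</command> as follows.
The sector counts are made-up values for illustration and the function
name <literal>h_partition</literal> is hypothetical; substitute the
numbers from your own <filename>/tmp/ad0</filename> and
<devicename>ad2</devicename> disklabels.
The sketch assumes, as the substeps do, that the <literal>a</literal>
partition starts at offset 0 and is followed directly by
<literal>b</literal>.</para>
<programlisting># h_partition A_SIZE B_SIZE C_SIZE  (all values in sectors)
# Print the size and offset to enter for the h partition.
h_partition() {
    h_offset=$(( $1 + $2 ))          # h starts where a and b end
    h_size=$(( $3 - $h_offset ))     # h runs to the end of the slice
    echo "h: size $h_size offset $h_offset fstype vinum"
}

# Hypothetical example: a = 524288, b = 1048576, c = 39102273 sectors
h_partition 524288 1048576 39102273</programlisting>
<para>With these example numbers the function prints
<literal>h: size 37529409 offset 1572864 fstype vinum</literal>.</para>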
</step>
<step>
<para>Tell <application>Vinum</application>
about the new drive.</para>
<substeps>
<step>
<para>Ask <application>Vinum</application> to start an
editor with a copy of the current configuration.</para>
<screen>&prompt.root; <userinput>vinum create</userinput></screen>
</step>
<step>
<para>Uncomment the drive line referring to drive
<literal>UpWindow</literal> and set
<literal>device</literal> to
<devicename>/dev/ad2s1h</devicename>.</para></step>
<step>
<para>Save the file and quit your editor.</para></step>
</substeps>
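<para>After editing, the drive line would be expected to read as shown
below.
The format matches the first line of the
<filename>create.UpWindow</filename> file described in
<xref linkend=ManualBoot>; the other lines that the editor displays
are assumed to stay as they are.</para>
<programlisting>drive UpWindow device /dev/ad2s1h</programlisting>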
</step>
<step>
<para>Now that <application>Vinum</application>
has two spindles again, revive the mirrors.</para>
<screen>&prompt.root; <userinput>vinum start -w usr.p1.s0</userinput>
&prompt.root; <userinput>vinum start -w home.p1.s0</userinput></screen>
</step>
<step>
<para>Now we need to restore
<filename>/rootback</filename> to a current copy of the
root file system.
These commands will accomplish this.</para>
<screen>&prompt.root; <userinput>newfs /dev/ad2s1a</userinput>
&prompt.root; <userinput>tunefs -n enable /dev/ad2s1a</userinput>
&prompt.root; <userinput>mount /dev/ad2s1a /mnt</userinput>
&prompt.root; <userinput>cd /mnt</userinput>
&prompt.root; <userinput>dump 0f - / | restore rf -</userinput>
&prompt.root; <userinput>rm restoresymtable</userinput>
&prompt.root; <userinput>cd /</userinput>
&prompt.root; <userinput>umount /mnt</userinput></screen>
</step>
</procedure>
</section>
<section>
<title>Exiting Degraded Mode</title>
<procedure>
<step>
<para>Enter single-user mode.</para>
<screen>&prompt.root; <userinput>shutdown now</userinput></screen>
</step>
<step>
<para>Return <filename>/etc/fstab</filename> to
its normal state and reboot.</para>
<screen>&prompt.root; <userinput>cd /etc</userinput>
&prompt.root; <userinput>rm fstab</userinput>
&prompt.root; <userinput>mv fstab.bak fstab</userinput>
&prompt.root; <userinput>reboot</userinput></screen>
</step>
</procedure>
</section>
<section>
<title>Simulation</title>
<para>You can simulate this kind of failure by unplugging
<devicename>/dev/ad2</devicename>, by write-protecting it,
or by following this procedure:</para>
<procedure>
<step>
<para>Shut down to single-user mode.</para>
</step>
<step>
<para>Unmount all non-root file systems.</para>
</step>
<step>
<para>Clobber any existing <application>Vinum</application>
configuration and partitioning on
<devicename>/dev/ad2</devicename>.</para>
<screen>&prompt.root; <userinput>vinum stop</userinput>
&prompt.root; <userinput>dd if=/dev/zero of=/dev/ad2s1h count=512</userinput>
&prompt.root; <userinput>dd if=/dev/zero of=/dev/ad2 count=512</userinput></screen>
</step>
</procedure>
</section>
</section>
<section id="ad0Bad">
<title>Drive ad0 Fails</title>
<para>Some BIOSes can boot from drive 1 or drive 2 (often called
<devicename>C:</devicename> or <devicename>D:</devicename>),
while others can boot only from drive 1.
If your BIOS can boot from either, the fastest road to recovery
might be to boot directly from <devicename>/dev/ad2</devicename>
in single-user mode and
install <filename>/etc/fstab_only_have_ad2s1</filename> as
<filename>/etc/fstab</filename>.
You would then have to adapt the <devicename>/dev/ad2</devicename>
failure recovery instructions from <xref linkend=ad2Recov> above.</para>
<para>If your BIOS can boot only from drive 1, then you will have to
unplug drive <literal>YouCrazy</literal> from the controller for
<devicename>/dev/ad2</devicename> and plug it
into the controller for <devicename>/dev/ad0</devicename>.
Then continue with the instructions for
<devicename>/dev/ad2</devicename> failure recovery
in <xref linkend=ad2Recov> above.</para>
</section>
</section>
<appendix id="Perl">
<title>bootvinum Perl Script</title>
<para>The <literal>bootvinum</literal> Perl script below reads <filename>/etc/fstab</filename>
and current drive partitioning.
It then writes several files in the current directory and several
variants of <filename>/etc/fstab</filename> in <filename>/etc</filename>.
These files significantly simplify the installation of
<application>Vinum</application> and recovery from
spindle failures.</para>
<programlisting>#!/usr/bin/perl -w
use strict;
use FileHandle;
my $config_tag1 = '$Id: article.sgml,v 1.4 2001-10-31 23:12:55 chern Exp $';
# Copyright (C) 2001 Robert A. Van Valzah
#
# Bootstrap Vinum
#
# Read /etc/fstab and current partitioning for all spindles mentioned there.
# Generate files needed to mirror all file systems on root spindle.
# A new partition table for each spindle
# Input for the vinum create command to create Vinum objects on each spindle
# A copy of fstab mounting Vinum volumes instead of BSD partitions
# Copies of fstab altered for server's degraded modes of operation
# See handbook for instructions on how to use the files generated.
# N.B. This bootstrapping method shrinks size of swap partition by the size
# of Vinum's on-disk configuration (265 sectors). It embeds existing file
# systems on the root spindle in Vinum objects without having to copy them.
# Thanks to Greg Lehey for suggesting this bootstrapping method.
# Expectations:
# The root spindle must contain at least root, swap, and /usr partitions
# The rootback spindle must have matching /rootback and swap partitions
# Other spindles should only have a /NOFUTURE* file system and maybe swap
# File systems named /NOFUTURE* will be replaced with Vinum drives
# Change configuration variables below to suit your taste
my $vip = 'h'; # VInum Partition
my @drv = ('YouCrazy', 'UpWindow', 'ThruBank', # Vinum DRiVe names
'OutSnakes', 'MeWild', 'InMovie', 'HomeJames', 'DownPrices', 'WhileBlind');
# No configuration variables beyond this point
my %vols; # One entry per Vinum volume to be created
my @spndl; # One entry per SPiNDLe
my $rsp; # Root SPindle (as in /dev/$rsp)
my $rbsp; # RootBack SPindle (as in /dev/$rbsp)
my $cfgsiz = 265; # Size of Vinum on-disk configuration info in sectors
my $nxtpas = 2; # Next fsck pass number for non-root file systems
# Parse fstab, generating the version we'll need for Vinum and noting
# spindles in use.
my $fsin = "/etc/fstab";
#my $fsin = "simu/fstab";
open(FSIN, "$fsin") || die("Couldn't open $fsin: $!\n");
my $fsout = "/etc/fstab.vinum";
open(FSOUT, ">$fsout") || die("Couldn't open $fsout for writing: $!\n");
while (&lt;FSIN>) {
my ($dev, $mnt, $fstyp, $opt, $dump, $pass) = split;
next if $dev =~ /^#/;
if ($mnt eq '/' || $mnt eq '/rootback' || $mnt =~ /^\/NOFUTURE/) {
my $dn = substr($dev, 5, length($dev)-6); # Device Name without /dev/
push(@spndl, $dn) unless grep($_ eq $dn, @spndl);
$rsp = $dn if $mnt eq '/';
next if $mnt =~ /^\/NOFUTURE/;
}
# Move /rootback from partition e to a
if ($mnt =~ /^\/rootback/) {
$dev =~ s/e$/a/;
$pass = 1;
$rbsp = substr($dev, 5, length($dev)-6);
print FSOUT "$dev\t\t$mnt\t$fstyp\t$opt\t\t$dump\t$pass\n";
next;
}
# Move non-root file systems on smallest spindle into Vinum
if (defined($rsp) && $dev =~ /^\/dev\/$rsp/ && $dev =~ /[d-h]$/) {
$pass = $nxtpas++;
print FSOUT "/dev/vinum$mnt\t\t$mnt\t\t$fstyp\t$opt\t\t$dump\t$pass\n";
$vols{$dev}->{mnt} = substr($mnt, 1);
next;
}
print FSOUT $_;
}
close(FSOUT);
die("Found more spindles than we have abstract names\n") if $#spndl > $#drv;
die("Didn't find a root partition!\n") if !defined($rsp);
die("Didn't find a /rootback partition!\n") if !defined($rbsp);
# Table of server's Degraded Modes
# One row per mode with hash keys
# fn FileName
# xpr eXPRession needed to convert fstab lines for this mode
# cm1 CoMment 1 describing this mode
# cm2 CoMment 2 describing this mode
# FH FileHandle (dynamically initialized below)
my @DM = (
{ cm1 => "When we only have $rsp, comment out lines using $rbsp",
fn => "/etc/fstab_only_have_$rsp",
xpr => "s:^/dev/$rbsp:#\$&:",
},
{ cm1 => "When we only have $rbsp, comment out lines using $rsp and",
cm2 => "rootback becomes root",
fn => "/etc/fstab_only_have_$rbsp",
xpr => "s:^/dev/$rsp:#\$&: || s:/rootback:/\t:",
},
{ cm1 => "When only $rsp root is bad, /rootback becomes root and",
cm2 => "root becomes /rootbad",
fn => "/etc/fstab_${rsp}_root_bad",
xpr => "s:\t/\t:\t/rootbad: || s:/rootback:/\t:",
},
);
# Initialize output FileHandles and write comments
foreach my $dm (@DM) {
my $fh = new FileHandle;
$fh->open(">$dm->{fn}") || die("Can't write $dm->{fn}: $!\n");
print $fh "# $dm->{cm1}\n" if $dm->{cm1};
print $fh "# $dm->{cm2}\n" if $dm->{cm2};
$dm->{FH} = $fh;
}
# Parse the Vinum version of fstab written above and write versions needed
# for server's degraded modes.
open(FSOUT, "$fsout") || die("Couldn't open $fsout: $!\n");
while (&lt;FSOUT>) {
my $line = $_;
foreach my $dm (@DM) {
$_ = $line;
eval $dm->{xpr};
print {$dm->{FH}} $_;
}
}
# Parse partition table for each spindle and write versions needed for Vinum
my $rootsiz; # ROOT partition SIZe
my $swapsiz; # SWAP partition SIZe
my $rspminoff; # Root SPindle MINimum OFFset of non-root, non-swap, non-c parts
my $rspsiz; # Root SPindle SIZe
my $rbspsiz; # RootBack SPindle SIZe
foreach my $i (0..$#spndl) {
my $dlin = "disklabel $spndl[$i] |";
# my $dlin = "simu/disklabel.$spndl[$i]";
open(DLIN, "$dlin") || die("Couldn't open $dlin: $!\n");
my $dlout = "disklabel.$spndl[$i]";
open(DLOUT, ">$dlout") || die("Couldn't open $dlout for writing: $!\n");
my $dlb4 = "$dlout.b4vinum";
open(DLB4, ">$dlb4") || die("Couldn't open $dlb4 for writing: $!\n");
my $minoff; # MINimum OFFset of non-root, non-swap, non-c partitions
my $totsiz = 0; # TOTal SIZe of all non-root, non-swap, non-c partitions
my $swapspndl = 0; # True if SWAP partition on this SPiNDLe
while (&lt;DLIN>) {
print DLB4 $_;
my ($part, $siz, $off, $fstyp, $fsiz, $bsiz, $bps) = split;
if ($part && $part eq 'a:' && $spndl[$i] eq $rsp) {
$rootsiz = $siz;
}
if ($part && $part eq 'e:' && $spndl[$i] eq $rbsp) {
if ($rootsiz != $siz) {
die("Rootback size ($siz) != root size ($rootsiz)\n");
}
}
if ($part && $part eq 'c:') {
$rspsiz = $siz if $spndl[$i] eq $rsp;
$rbspsiz = $siz if $spndl[$i] eq $rbsp;
}
# Make swap partition $cfgsiz sectors smaller
if ($part && $part eq 'b:') {
if ($spndl[$i] eq $rsp) {
$swapsiz = $siz;
} else {
if ($swapsiz != $siz) {
die("Swap partition sizes unequal across spindles\n");
}
}
printf DLOUT "%4s%9d%9d%10s\n", $part, $siz-$cfgsiz, $off, $fstyp;
$swapspndl = 1;
next;
}
# Move rootback spindle e partitions to a
if ($part && $part eq 'e:' && $spndl[$i] eq $rbsp) {
printf DLOUT "%4s%9d%9d%10s%9d%6d%6d\n", 'a:', $siz, $off, $fstyp,
$fsiz, $bsiz, $bps;
next;
}
# Delete non-root, non-swap, non-c partitions but note their minimum
# offset and total size that're needed below.
if ($part && $part =~ /^[d-h]:$/) {
$minoff = $off unless $minoff;
$minoff = $off if $off &lt; $minoff;
$totsiz += $siz;
if ($spndl[$i] eq $rsp) { # If doing spindle containing root
my $dev = "/dev/$spndl[$i]" . substr($part, 0, 1);
$vols{$dev}->{siz} = $siz;
$vols{$dev}->{off} = $off;
$rspminoff = $minoff;
}
next;
}
print DLOUT $_;
}
if ($swapspndl) { # If there was a swap partition on this spindle
# Make a Vinum partition the size of all non-root, non-swap,
# non-c partitions + the size of Vinum's on-disk configuration.
# Set its offset so that the start of the first subdisk it contains
# coincides with the first file system we're embedding in Vinum.
printf DLOUT "%4s%9d%9d%10s\n", "$vip:", $totsiz+$cfgsiz, $minoff-$cfgsiz,
'vinum';
} else {
# No need to mess with size and offset if there was no swap
printf DLOUT "%4s%9d%9d%10s\n", "$vip:", $totsiz, $minoff,
'vinum';
}
}
die("Swap partition not found\n") unless $swapsiz;
die("Swap partition not larger than $cfgsiz blocks\n") unless $swapsiz>$cfgsiz;
die("Rootback spindle size not >= root spindle size\n") unless $rbspsiz>=$rspsiz;
# Generate input to vinum create command needed for each spindle.
foreach my $i (0..$#spndl) {
my $cfn = "create.$drv[$i]"; # Create File Name
open(CF, ">$cfn") || die("Can't open $cfn for writing: $!\n");
print CF "drive $drv[$i] device /dev/$spndl[$i]$vip\n";
next unless $spndl[$i] eq $rsp || $spndl[$i] eq $rbsp;
foreach my $dev (keys(%vols)) {
my $mnt = $vols{$dev}->{mnt};
my $siz = $vols{$dev}->{siz};
my $off = $vols{$dev}->{off}-$rspminoff+$cfgsiz;
print CF "volume $mnt\n" if $spndl[$i] eq $rsp;
print CF &lt;&lt;EOF;
plex name $mnt.p$i org concat volume $mnt
sd name $mnt.p$i.s0 drive $drv[$i] plex $mnt.p$i len ${siz}s driveoffset ${off}s
EOF
}
}</programlisting>
</appendix>
<appendix id=ManualBoot>
<title>Manual Vinum Bootstrapping</title>
<para>The <literal>bootvinum</literal> Perl script in <xref linkend=Perl> makes life easier, but
it may be necessary to manually perform some or all of the steps that
it automates.
This appendix describes how you would manually mimic the script.</para>
<procedure>
<step>
<para>Make a copy of <filename>/etc/fstab</filename>
to be customized.</para>
<screen>&prompt.root; <userinput>cp /etc/fstab /etc/fstab.vinum</userinput></screen>
</step>
<step>
<para>Edit <filename>/etc/fstab.vinum</filename>.</para>
<substeps>
<step>
<para>Change the <literal>device</literal> column of
non-root partitions on the root spindle to
<filename>/dev/vinum/mnt</filename>.</para></step>
<step>
<para>Change the <literal>pass</literal> column of
non-root partitions on the root spindle to <userinput>2</userinput>,
<userinput>3</userinput>, etc.</para></step>
<step>
<para>Delete any lines with mountpoint
matching <filename>/NOFUTURE*</filename>.</para></step>
<step>
<para>Change the <literal>device</literal> column of
<filename>/rootback</filename>
from <literal>e</literal> to
<literal>a</literal>.</para></step>
<step>
<para>Change the <literal>pass</literal> column of
<filename>/rootback</filename> to
<userinput>1</userinput>.</para></step>
</substeps>
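<para>As an illustration only, an edited
<filename>/etc/fstab.vinum</filename> might end up looking like the
listing below.
It assumes a root spindle <devicename>ad0s1</devicename> that held
<filename>/home</filename> and <filename>/usr</filename>, and a
rootback spindle <devicename>ad2s1</devicename>; the options and dump
values are hypothetical, and your own file will differ.</para>
<programlisting># Device         Mountpoint  FStype  Options  Dump  Pass#
/dev/ad0s1b      none        swap    sw       0     0
/dev/ad0s1a      /           ufs     rw       1     1
# /home and /usr now refer to Vinum volumes, with passes 2 and 3
/dev/vinum/home  /home       ufs     rw       2     2
/dev/vinum/usr   /usr        ufs     rw       2     3
/dev/ad2s1b      none        swap    sw       0     0
# /rootback moved from partition e to a, with pass 1
/dev/ad2s1a      /rootback   ufs     rw       2     1</programlisting>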
</step>
<step>
<para>Prepare disklabels for editing:</para>
<screen>&prompt.root; <userinput>cd /bootvinum</userinput>
&prompt.root; <userinput>disklabel ad0s1 > disklabel.ad0s1</userinput>
&prompt.root; <userinput>cp disklabel.ad0s1 disklabel.ad0s1.b4vinum</userinput>
&prompt.root; <userinput>disklabel ad2s1 > disklabel.ad2s1</userinput>
&prompt.root; <userinput>cp disklabel.ad2s1 disklabel.ad2s1.b4vinum</userinput></screen>
</step>
<step>
<para>Edit the <filename>disklabel.ad?s1</filename> files in <filename>/bootvinum</filename>.</para>
<substeps>
<step>
<para>On the root spindle:</para>
<substeps>
<step>
<para>Decrease the <literal>size</literal> of the
<literal>b</literal> partition by 265 blocks.</para></step>
<step>
<para>Note the <literal>size</literal> and
<literal>offset</literal> of the <literal>a</literal> and
<literal>b</literal> partitions.</para></step>
<step>
<para>Note the smallest <literal>offset</literal> for partitions
<literal>d</literal>-<literal>h</literal>.</para></step>
<step>
<para>Note the <literal>size</literal> and
<literal>offset</literal> for all non-root, non-swap
partitions (<filename>/home</filename> was probably on
<literal>e</literal> and <filename>/usr</filename> was
probably on <literal>f</literal>).</para></step>
<step>
<para>Delete partitions
<literal>d</literal>-<literal>h</literal>.</para></step>
<step>
<para>Create a new <literal>h</literal> partition with an
<literal>offset</literal> 265 blocks less than the
smallest <literal>offset</literal>
for partitions <literal>d</literal>-<literal>h</literal>
noted above.
Set its <literal>size</literal> to the <literal>size</literal>
of the <literal>c</literal> partition, minus the
smallest <literal>offset</literal>
for partitions <literal>d</literal>-<literal>h</literal>
noted above, plus 265 blocks.</para>
<note>
<para><application>Vinum</application>
can use any partition other than <literal>c</literal>.
It is not strictly necessary to use <literal>h</literal>
for all your <application>Vinum</application>
partitions, but it is good practice to
be consistent across all spindles.</para></note>
</step>
<step>
<para>Set the <literal>fstype</literal> of this new
partition to <userinput>vinum</userinput>.</para></step>
</substeps>
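<para>The offset and size calculation for the new
<literal>h</literal> partition can be sketched in
<command>sh</command> as shown below; the same arithmetic applies to
the rootback spindle.
The function name <literal>vinum_partition</literal> and the sector
counts are hypothetical, chosen only to make the example concrete.</para>
<programlisting>cfgsiz=265    # sectors reserved for Vinum's on-disk configuration

# vinum_partition C_SIZE MIN_OFFSET  (both in sectors)
# MIN_OFFSET is the smallest offset among partitions d-h.
vinum_partition() {
    v_offset=$(( $2 - $cfgsiz ))        # start 265 sectors earlier
    v_size=$(( $1 - $2 + $cfgsiz ))     # run to the end of the slice
    echo "h: size $v_size offset $v_offset fstype vinum"
}

# Hypothetical spindle: c = 39102273 sectors, smallest d-h offset = 1572864
vinum_partition 39102273 1572864</programlisting>
<para>With these example numbers the result is
<literal>h: size 37529674 offset 1572599 fstype vinum</literal>.
The 265 sectors removed from the end of the <literal>b</literal>
partition are exactly the ones the new <literal>h</literal> partition
gains at its start.</para>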
</step>
<step>
<para>On the rootback spindle:</para>
<substeps>
<step>
<para>Move the <literal>e</literal> partition to
<literal>a</literal>.</para></step>
<step>
<para>Verify that the <literal>size</literal> of the
<literal>a</literal> and
<literal>b</literal> partitions matches the
root spindle.</para></step>
<step>
<para>Note the smallest <literal>offset</literal> for partitions
<literal>d</literal>-<literal>h</literal>.</para></step>
<step>
<para>Delete partitions
<literal>d</literal>-<literal>h</literal>.</para></step>
<step>
<para>Create a new <literal>h</literal> partition with an
<literal>offset</literal> 265 blocks less than the
smallest <literal>offset</literal>
noted above for partitions
<literal>d</literal>-<literal>h</literal>.
Set its <literal>size</literal> to the <literal>size</literal>
of the <literal>c</literal> partition, minus the
smallest <literal>offset</literal>
for partitions <literal>d</literal>-<literal>h</literal>
noted above, plus 265 blocks.</para></step>
<step>
<para>Set the <literal>fstype</literal> of this new
partition to <userinput>vinum</userinput>.</para></step>
</substeps>
</step>
</substeps>
</step>
<step>
<para>Create a file named
<filename>create.YouCrazy</filename> that contains:</para>
<programlisting>drive YouCrazy device /dev/ad0s1h
volume home
plex name home.p0 org concat volume home
sd name home.p0.s0 drive YouCrazy plex home.p0 len $hl driveoffset $ho
volume usr
plex name usr.p0 org concat volume usr
sd name usr.p0.s0 drive YouCrazy plex usr.p0 len $ul driveoffset $uo</programlisting>
<para>Where:</para>
<itemizedlist>
<listitem><para>
<literal>$hl</literal> is the length noted above for
<filename>/home</filename>.</para></listitem>
<listitem><para>
<literal>$ho</literal> is the offset noted above for
<filename>/home</filename>, minus the smallest offset
noted above, plus 265 blocks.</para></listitem>
<listitem><para>
<literal>$ul</literal> is the length noted above for
<filename>/usr</filename>.</para></listitem>
<listitem><para>
<literal>$uo</literal> is the offset noted above for
<filename>/usr</filename>, minus the smallest offset
noted above, plus 265 blocks.</para></listitem>
</itemizedlist>
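<para>A minimal <command>sh</command> sketch of these substitutions
follows.
The partition sizes and offsets are invented for illustration, and
<literal>subdisk_offset</literal> is a hypothetical helper.
The trailing <literal>s</literal> on each number marks it as a sector
count, as the <literal>bootvinum</literal> script in
<xref linkend=Perl> does.</para>
<programlisting>cfgsiz=265    # sectors reserved for Vinum's on-disk configuration

# subdisk_offset PART_OFFSET MIN_OFFSET  (both in sectors)
subdisk_offset() {
    echo $(( $1 - $2 + $cfgsiz ))
}

# Hypothetical old layout: /home on e (offset 1572864, size 8388608),
# /usr on f (offset 9961472, size 29140801); smallest offset is 1572864.
hl=8388608
ho=$(subdisk_offset 1572864 1572864)
ul=29140801
uo=$(subdisk_offset 9961472 1572864)

echo "sd name home.p0.s0 drive YouCrazy plex home.p0 len ${hl}s driveoffset ${ho}s"
echo "sd name usr.p0.s0 drive YouCrazy plex usr.p0 len ${ul}s driveoffset ${uo}s"</programlisting>
<para>Here <literal>$ho</literal> works out to 265 and
<literal>$uo</literal> to 8388873, so the first subdisk begins
immediately after <application>Vinum</application>'s on-disk
configuration.</para>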
</step>
<step>
<para>Create a file named
<filename>create.UpWindow</filename> containing:</para>
<programlisting>drive UpWindow device /dev/ad2s1h
plex name home.p1 org concat volume home
sd name home.p1.s0 drive UpWindow plex home.p1 len $hl driveoffset $ho
plex name usr.p1 org concat volume usr
sd name usr.p1.s0 drive UpWindow plex usr.p1 len $ul driveoffset $uo</programlisting>
<para>Where <literal>$hl</literal>, <literal>$ho</literal>, <literal>$ul</literal>, and <literal>$uo</literal> are set as above.</para>
</step>
</procedure>
</appendix>
<appendix id="Acknowledgements">
<title>Acknowledgements</title>
<para>I would like to thank Greg Lehey for writing &vinum.ap; and for
providing very helpful comments on early drafts.
Several others made helpful suggestions after reviewing later drafts
including
Dag-Erling Sm&oslash;rgrav,
Michael Splendoria,
Chern Lee,
Stefan Aeschbacher,
Fleming Froekjaer,
Bernd Walter,
Aleksey Baranov, and
Doug Swarin.</para>
</appendix>
</article>