Finally, write down a section or two, and document how a Vinum volume
can be used for the root filesystem.
parent
651c38eacc
commit
6efff8762a
Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=16892
1 changed files with 463 additions and 1 deletions
|
@ -897,6 +897,9 @@
|
|||
<screen>&prompt.root; <userinput>newfs /dev/vinum/concat</userinput>
|
||||
newfs: /dev/vinum/concat: can't figure out file system partition</screen>
|
||||
|
||||
<note><para>The following is only valid for FreeBSD versions
|
||||
prior to 5.0:</para></note>
|
||||
|
||||
<para>In order to create a file system on this volume, use the
|
||||
<option>-v</option> option to &man.newfs.8;:</para>
|
||||
|
||||
|
@ -958,7 +961,7 @@ sd name bigraid.p0.s4 drive e plex bigraid.p0 state initializing len 4194304b dr
|
|||
if they have been assigned different UNIX™ drive
|
||||
IDs.</para>
|
||||
|
||||
<sect3>
|
||||
<sect3 id="vinum-rc-startup">
|
||||
<title>Automatic Startup</title>
|
||||
|
||||
<para>In order to start Vinum automatically when you boot the
|
||||
|
@ -988,4 +991,463 @@ sd name bigraid.p0.s4 drive e plex bigraid.p0 state initializing len 4194304b dr
|
|||
</sect3>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="vinum-root">
|
||||
<title>Using Vinum for the root filesystem</title>
|
||||
|
||||
<para>For a machine that has fully-mirrored filesystems using
|
||||
Vinum, it is desirable to also mirror the root filesystem.
|
||||
Setting up such a configuration is less trivial than mirroring
|
||||
an arbitrary filesystem because:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>The root filesystem must be available very early during
|
||||
the boot process, so the Vinum infrastructure must already be
|
||||
available at this time.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>The volume containing the root filesystem also contains
|
||||
the system bootstrap and the kernel, which must be read
|
||||
using the host system's native utilities (e.g. the BIOS on
|
||||
PC-class machines) which often cannot be taught about the
|
||||
details of Vinum.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>In the following sections, the term <quote>root
|
||||
volume</quote> is generally used to describe the Vinum volume
|
||||
that contains the root filesystem. It is probably a good idea
|
||||
to use the name <literal>"root"</literal> for this volume, but
|
||||
this is not technically required in any way. All command
|
||||
examples in the following sections assume this name though.</para>
|
||||
|
||||
<sect2>
|
||||
<title>Starting up Vinum early enough for the root
|
||||
filesystem</title>
|
||||
|
||||
<para>There are several measures to take for this to
|
||||
happen:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Vinum must be available in the kernel at boot-time.
|
||||
Thus, the method to start Vinum automatically described in
|
||||
<xref linkend="vinum-rc-startup"> is not applicable to
|
||||
accomplish this task, and the
|
||||
<literal>start_vinum</literal> parameter must actually
|
||||
<emphasis>not</emphasis> be set when the following setup
|
||||
is being arranged. The first option would be to compile
|
||||
Vinum statically into the kernel, so it is available all
|
||||
the time, but this is usually not desirable. There is
|
||||
another option as well, to have
|
||||
<filename>/boot/loader</filename> (<xref
|
||||
linkend="boot-loader">) load the vinum kernel module
|
||||
early, before starting the kernel. This can be
|
||||
accomplished by putting the line</para>
|
||||
|
||||
<para><literal>vinum_load="YES"</literal></para>
|
||||
|
||||
<para>into the file
|
||||
<filename>/boot/loader.conf</filename>.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>Vinum must be initialized early since it needs to
|
||||
supply the volume for the root filesystem. By default,
|
||||
the Vinum kernel part does not look for drives that might
|
||||
contain Vinum volume information until the administrator
|
||||
(or one of the startup scripts) issues a <command>vinum
|
||||
start</command> command.</para>
|
||||
|
||||
<note><para>The following paragraphs outline the steps
|
||||
needed for FreeBSD 5.x and above. The setup required for
|
||||
FreeBSD 4.x differs, and is described below in <xref
|
||||
linkend="vinum-root-4x">.</para></note>
|
||||
|
||||
<para>By placing the line:</para>
|
||||
|
||||
<para><literal>vinum.autostart="YES"</literal></para>
|
||||
|
||||
<para>into <filename>/boot/loader.conf</filename>, Vinum is
|
||||
instructed to automatically scan all drives for Vinum
|
||||
information as part of the kernel startup.</para>
|
||||
|
||||
<para>Note that it is not necessary to instruct the kernel
|
||||
where to look for the root filesystem.
|
||||
<filename>/boot/loader</filename> looks up the name of the
|
||||
root device in <filename>/etc/fstab</filename>, and passes
|
||||
this information on to the kernel. When the time comes to mount
|
||||
the root filesystem, the kernel figures out from the
|
||||
devicename provided which driver to ask to translate this
|
||||
into the internal device ID (major/minor number).</para>
|
||||
</listitem>
|
||||
</itemizedlist>
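<para>The two loader settings described above can be checked mechanically before rebooting. The following is only a minimal sketch in Python: the function names are hypothetical, the parser handles just the simple <literal>name="value"</literal> form shown in this section (not the full <filename>/boot/loader.conf</filename> syntax), and <literal>vinum.autostart</literal> is only expected on FreeBSD 5.x and above.</para>

```python
# Sketch: report which Vinum-related loader settings are not set to "YES".
# Assumption: only plain name="value" lines and '#' comments are handled.

REQUIRED_SETTINGS = ("vinum_load", "vinum.autostart")


def parse_loader_conf(text):
    """Return a dict of name -> value for simple name="value" lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def missing_vinum_settings(text):
    """Return the required settings that are absent or not "YES"."""
    settings = parse_loader_conf(text)
    return [name for name in REQUIRED_SETTINGS
            if settings.get(name, "").upper() != "YES"]


if __name__ == "__main__":
    sample = 'vinum_load="YES"\nvinum.autostart="YES"\n'
    print(missing_vinum_settings(sample))
```

<para>An empty list means both lines are present. Remember that the file to check is the one on the Vinum root volume, which might not be the currently active root filesystem.</para>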
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Making a Vinum-based root volume accessible to the
|
||||
bootstrap</title>
|
||||
|
||||
<para>Since the current FreeBSD bootstrap is only 7.5 KB of
|
||||
code, and already has the burden of reading files (like
|
||||
<filename>/boot/loader</filename>) from the UFS filesystem, it
|
||||
is simply impossible to also teach it about internal Vinum
|
||||
structures so it could parse the Vinum configuration data, and
|
||||
figure out the elements of a boot volume itself. Thus,
|
||||
some tricks are necessary to provide the bootstrap code with
|
||||
the illusion of a standard <literal>"a"</literal> partition
|
||||
that contains the root filesystem.</para>
|
||||
|
||||
<para>For this to be possible at all, the following requirements
|
||||
must be met for the root volume:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>The root volume must not be striped or RAID-5.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>The root volume must not contain more than one
|
||||
concatenated subdisk per plex.</para>
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>Note that it is desirable and possible that there are
|
||||
multiple plexes, each containing one replica of the root
|
||||
filesystem. The bootstrap process will, however, only use one
|
||||
of these replicas for finding the bootstrap and all the files,
|
||||
until the kernel eventually mounts the root filesystem
|
||||
itself. Each single subdisk within these plexes will then
|
||||
need its own <literal>"a"</literal> partition illusion, for
|
||||
the respective device to become bootable. It is not strictly
|
||||
needed that each of these faked <literal>"a"</literal>
|
||||
partitions is located at the same offset within its device,
|
||||
compared with other devices containing plexes of the root
|
||||
volume. However, it is probably a good idea to create the
|
||||
Vinum volumes that way so the resulting mirrored devices are
|
||||
symmetric, to avoid confusion.</para>
|
||||
|
||||
<para>In order to set up these <literal>"a"</literal> partitions,
|
||||
for each device containing part of the root volume, the
|
||||
following needs to be done:</para>
|
||||
|
||||
<procedure>
|
||||
<step>
|
||||
<para>The location (offset from the beginning of the device)
|
||||
and size of this device's subdisk that is part of the root
|
||||
volume need to be examined, using the command</para>
|
||||
|
||||
<para><command>vinum l -rv root</command></para>
|
||||
|
||||
<para>Note that Vinum offsets and sizes are measured in
|
||||
bytes. They must be divided by 512 in order to obtain the
|
||||
block numbers that are to be used in the
|
||||
<command>disklabel</command> command.</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>Run the command</para>
|
||||
|
||||
<para><command>disklabel -e
|
||||
</command><replaceable>devname</replaceable></para>
|
||||
|
||||
<para>for each device that participates in the root volume.
|
||||
<replaceable>devname</replaceable> must be either the name
|
||||
of the disk (like <devicename>da0</devicename>) for disks
|
||||
without a slice (a.k.a. fdisk) table, or the name of the
|
||||
slice (like <devicename>ad0s1</devicename>).</para>
|
||||
|
||||
<para>If there is already an <literal>"a"</literal>
|
||||
partition on the device (presumably, containing a
|
||||
pre-Vinum root filesystem), it should be renamed to
|
||||
something else, so it remains accessible (just in case),
|
||||
but will no longer be used by default to bootstrap the
|
||||
system. Note that active partitions (like a root
|
||||
filesystem currently mounted) cannot be renamed, so this
|
||||
must be executed either when being booted from a
|
||||
<quote>Fixit</quote> medium, or in a two-step process,
|
||||
where (in a mirrored situation) the disk that has not been
|
||||
currently booted is being manipulated first.</para>
|
||||
|
||||
<para>Then, the offset of the Vinum partition on this
|
||||
device (if any) must be added to the offset of the
|
||||
respective root volume subdisk on this device. The
|
||||
resulting value will become the
|
||||
<literal>"offset"</literal> value for the new
|
||||
<literal>"a"</literal> partition. The
|
||||
<literal>"size"</literal> value for this partition can be
|
||||
taken verbatim from the calculation above. The
|
||||
<literal>"fstype"</literal> should be
|
||||
<literal>4.2BSD</literal>. The
|
||||
<literal>"fsize"</literal>, <literal>"bsize"</literal>,
|
||||
and <literal>"cpg"</literal> values should best be chosen
|
||||
to match the actual filesystem, though they are fairly
|
||||
unimportant within this context.</para>
|
||||
|
||||
<para>That way, a new <literal>"a"</literal> partition will
|
||||
be established that overlaps the Vinum partition on this
|
||||
device. Note that the <command>disklabel</command> will
|
||||
only allow for this overlap if the Vinum partition has
|
||||
properly been marked using the <literal>"vinum"</literal>
|
||||
fstype.</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>That's all! A faked <literal>"a"</literal> partition
|
||||
does exist now on each device that has one replica of the
|
||||
root volume. It is highly recommended to verify the
|
||||
result again, using a command like</para>
|
||||
|
||||
<para><command>fsck -n
|
||||
</command><devicename>/dev/<replaceable>devname</replaceable>a</devicename></para>
|
||||
</step>
|
||||
</procedure>
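<para>The arithmetic in the procedure above can be sketched as follows. This is an illustration only, written in Python with a hypothetical function name. It takes the subdisk offset and size in bytes as reported by Vinum, plus the offset of the Vinum partition in 512-byte blocks as listed by <command>disklabel</command>, and returns the <literal>"size"</literal> and <literal>"offset"</literal> values for the faked <literal>"a"</literal> partition. The sample numbers are those of the example setup shown later in this chapter.</para>

```python
# Sketch: derive the disklabel entry for the faked "a" partition.
SECTOR_SIZE = 512  # bytes per disk block, as used by disklabel


def a_partition_entry(subdisk_offset_bytes, subdisk_size_bytes,
                      vinum_partition_offset_blocks):
    """Return (size, offset) in 512-byte blocks for the "a" partition."""
    if subdisk_offset_bytes % SECTOR_SIZE or subdisk_size_bytes % SECTOR_SIZE:
        raise ValueError("expected values that are multiples of 512 bytes")
    # Offset of the subdisk inside the Vinum partition, in blocks, plus
    # the offset of the Vinum partition inside the device or slice.
    offset = (vinum_partition_offset_blocks
              + subdisk_offset_bytes // SECTOR_SIZE)
    size = subdisk_size_bytes // SECTOR_SIZE
    return size, offset


if __name__ == "__main__":
    # Subdisk at byte offset 135680, 125829120 bytes long, inside a
    # Vinum partition that starts at block 16.
    print(a_partition_entry(135680, 125829120, 16))
```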
|
||||
|
||||
<para>It should be remembered that all files containing control
|
||||
information must be relative to the root filesystem in the
|
||||
Vinum volume which, when setting up a new Vinum root volume,
|
||||
might not match the root filesystem that is currently active.
|
||||
So in particular, the files <filename>/etc/fstab</filename>
|
||||
and <filename>/boot/loader.conf</filename> need to be taken
|
||||
care of.</para>
|
||||
|
||||
<para>At next reboot, the bootstrap should figure out the
|
||||
appropriate control information from the new Vinum-based root
|
||||
filesystem, and act accordingly. At the end of the kernel
|
||||
initialization process, after all devices have been announced,
|
||||
the prominent notice that shows the success of this setup is a
|
||||
message like:</para>
|
||||
|
||||
<para><screen>Mounting root from ufs:/dev/vinum/root</screen></para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Example of a Vinum-based root setup</title>
|
||||
|
||||
<para>After the Vinum root volume has been set up, the output of
|
||||
<command>vinum l -rv root</command> could look like:</para>
|
||||
|
||||
<para>
|
||||
<screen>
|
||||
...
|
||||
Subdisk root.p0.s0:
|
||||
Size: 125829120 bytes (120 MB)
|
||||
State: up
|
||||
Plex root.p0 at offset 0 (0 B)
|
||||
Drive disk0 (/dev/da0h) at offset 135680 (132 kB)
|
||||
|
||||
Subdisk root.p1.s0:
|
||||
Size: 125829120 bytes (120 MB)
|
||||
State: up
|
||||
Plex root.p1 at offset 0 (0 B)
|
||||
Drive disk1 (/dev/da1h) at offset 135680 (132 kB)
|
||||
</screen>
|
||||
</para>
|
||||
|
||||
<para>The value to note is <literal>135680</literal> for the
|
||||
offset (relative to partition
|
||||
<devicename>/dev/da0h</devicename>). This translates to 265
|
||||
512-byte disk blocks in <command>disklabel</command>'s terms.
|
||||
Likewise, the size of this root volume is 245760 512-byte
|
||||
blocks. <devicename>/dev/da1h</devicename>, containing the
|
||||
second replica of this root volume, has a symmetric
|
||||
setup.</para>
|
||||
|
||||
<para>The disklabel for these devices might look like:</para>
|
||||
|
||||
<para>
|
||||
<screen>
|
||||
...
|
||||
8 partitions:
|
||||
# size offset fstype [fsize bsize bps/cpg]
|
||||
a: 245760 281 4.2BSD 2048 16384 0 # (Cyl. 0*- 15*)
|
||||
c: 71771688 0 unused 0 0 # (Cyl. 0 - 4467*)
|
||||
h: 71771672 16 vinum # (Cyl. 0*- 4467*)
|
||||
</screen>
|
||||
</para>
|
||||
|
||||
<para>It can be observed that the <literal>"size"</literal>
|
||||
parameter for the faked <literal>"a"</literal> partition
|
||||
matches the value outlined above, while the
|
||||
<literal>"offset"</literal> parameter is the sum of the offset
|
||||
within the Vinum partition <literal>"h"</literal>, and the
|
||||
offset of this partition within the device (or slice). This
|
||||
is a typical setup that is necessary to avoid the problem
|
||||
described in <xref linkend="vinum-root-panic">. It can also
|
||||
be seen that the entire <literal>"a"</literal> partition is
|
||||
completely within the <literal>"h"</literal> partition
|
||||
containing all the Vinum data for this device.</para>
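<para>The containment property described above is easy to verify by hand, but it can also be expressed as a small check. The following Python sketch uses a hypothetical helper name and takes each partition as a (size, offset) pair in 512-byte blocks, in the column order of the disklabel output.</para>

```python
# Sketch: check that one disklabel partition lies entirely within another.


def lies_within(inner, outer):
    """inner and outer are (size, offset) pairs in 512-byte blocks."""
    inner_size, inner_offset = inner
    outer_size, outer_offset = outer
    starts_inside = inner_offset >= outer_offset
    ends_inside = inner_offset + inner_size <= outer_offset + outer_size
    return starts_inside and ends_inside


if __name__ == "__main__":
    a_partition = (245760, 281)     # faked "a" partition from the example
    h_partition = (71771672, 16)    # Vinum partition "h" from the example
    print(lies_within(a_partition, h_partition))
```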
|
||||
|
||||
<para>Note that in the above example, the entire device is
|
||||
dedicated to Vinum, and there is no leftover pre-Vinum root
|
||||
partition, since this has been a newly set-up disk that was
|
||||
only meant to be part of a Vinum configuration, ever.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Troubleshooting</title>
|
||||
|
||||
<para>If something goes wrong, a way is needed to recover from
|
||||
the situation. The following list contains a few known pitfalls
|
||||
and solutions.</para>
|
||||
|
||||
<sect3>
|
||||
<title>System bootstrap loads, but system does not boot</title>
|
||||
|
||||
<para>If for any reason the system does not continue to boot,
|
||||
the bootstrap can be interrupted by pressing the
|
||||
<keycap>space</keycap> key at the 10-seconds warning. The
|
||||
loader variables (like <literal>vinum.autostart</literal>)
|
||||
can be examined using the <command>show</command> command, and
|
||||
manipulated using <command>set</command> or
|
||||
<command>unset</command> commands.</para>
|
||||
|
||||
<para>If the only problem was that the Vinum kernel module was
|
||||
not yet in the list of modules to load automatically, a
|
||||
simple <command>load vinum</command> will help.</para>
|
||||
|
||||
<para>When ready, the boot process can be continued with a
|
||||
<command>boot -as</command>. The options
|
||||
<option>-as</option> will request the kernel to ask for the
|
||||
root filesystem to mount (<option>-a</option>), and make the
|
||||
boot process stop in single-user mode (<option>-s</option>),
|
||||
where the root filesystem is mounted read-only. That way,
|
||||
even if only one plex of a multi-plex volume has been
|
||||
mounted, no data inconsistency between plexes is being
|
||||
risked.</para>
|
||||
|
||||
<para>At the prompt asking for a root filesystem to mount, any
|
||||
device that contains a valid root filesystem can be entered.
|
||||
If <filename>/etc/fstab</filename> had been set up
|
||||
correctly, the default should be something like
|
||||
<literal>ufs:/dev/vinum/root</literal>. A typical alternate
|
||||
choice would be something like
|
||||
<userinput>ufs:da0d</userinput> which could be a
|
||||
hypothetical partition that contains the pre-Vinum root
|
||||
filesystem. Care should be taken if one of the alias
<literal>"a"</literal> partitions is entered here that
actually refers to a subdisk of the Vinum root device,
|
||||
because in a mirrored setup, this would only mount one piece
|
||||
of a mirrored root device. If this filesystem is to be
|
||||
mounted read-write later on, it is necessary to remove the
|
||||
other plex(es) of the Vinum root volume since these plexes
|
||||
would otherwise carry inconsistent data.</para>
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>Only primary bootstrap loads</title>
|
||||
|
||||
<para>If <filename>/boot/loader</filename> fails to load, but
|
||||
the primary bootstrap still loads (visible by a single dash
|
||||
in the left column of the screen right after the boot
|
||||
process starts), an attempt can be made to interrupt the
|
||||
primary bootstrap at this point, using the
|
||||
<keycap>space</keycap> key. This will make the bootstrap
|
||||
stop in stage two, see <xref linkend="boot-boot1">. An
|
||||
attempt can be made here to boot off an alternate partition,
|
||||
like the partition containing the previous root filesystem
|
||||
that has been moved away from <literal>"a"</literal>
|
||||
above.</para>
|
||||
</sect3>
|
||||
|
||||
<sect3 id="vinum-root-panic">
|
||||
<title>Nothing boots, the bootstrap
|
||||
panics</title>
|
||||
|
||||
<para>This situation will happen if the bootstrap had been
|
||||
destroyed by the Vinum installation. Unfortunately, Vinum
|
||||
accidentally currently leaves only 4 KB at the beginning of
|
||||
its partition free before starting to write its Vinum header
|
||||
information. However, the stage one and two bootstraps plus
|
||||
the disklabel embedded between them currently require 8 KB.
|
||||
So if a Vinum partition was started at offset 0 within a
|
||||
slice or disk that was meant to be bootable, the Vinum setup
|
||||
will trash the bootstrap.</para>
|
||||
|
||||
<para>Similarly, if the above situation has been recovered,
|
||||
for example by booting from a <quote>Fixit</quote> medium,
|
||||
and the bootstrap has been re-installed using
|
||||
<command>disklabel -B</command> as described in <xref
|
||||
linkend="boot-boot1">, the bootstrap will trash the Vinum
|
||||
header, and Vinum will no longer find its disk(s). Though
|
||||
no actual Vinum configuration data or data in Vinum volumes
|
||||
will be trashed by this, and it would be possible to recover
|
||||
all the data by entering exactly the same Vinum configuration
|
||||
data again, the situation is hard to fix. It would
|
||||
be necessary to move the entire Vinum partition by at least
|
||||
4 KB off, in order to have the Vinum header and the system
|
||||
bootstrap no longer collide.</para>
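<para>The collision described in this section follows directly from the two sizes quoted above (4 KB left free by Vinum, 8 KB needed by the bootstraps and the disklabel). The Python sketch below restates that reasoning, assuming exactly these figures and a bootable slice or disk whose bootstrap area starts at block 0. The function names are hypothetical.</para>

```python
# Sketch: does the Vinum header overlap the bootstrap area?
SECTOR_SIZE = 512            # bytes per disk block
VINUM_FREE_BYTES = 4 * 1024  # space Vinum leaves free before its header
BOOTSTRAP_BYTES = 8 * 1024   # stage one and two bootstraps plus disklabel


def bootstrap_collides(vinum_partition_offset_blocks):
    """True if the Vinum header would start inside the bootstrap area."""
    header_start = (vinum_partition_offset_blocks * SECTOR_SIZE
                    + VINUM_FREE_BYTES)
    return header_start < BOOTSTRAP_BYTES


def minimum_safe_offset_blocks():
    """Smallest Vinum partition offset (in blocks) that avoids a collision."""
    needed = BOOTSTRAP_BYTES - VINUM_FREE_BYTES
    return -(-needed // SECTOR_SIZE)  # ceiling division


if __name__ == "__main__":
    print(bootstrap_collides(0))    # Vinum partition at offset 0
    print(bootstrap_collides(16))   # offset used in the example setup
    print(minimum_safe_offset_blocks())
```

<para>Under these assumptions, a Vinum partition starting at offset 0 collides with the bootstrap, while the offset of 16 blocks used in the example setup does not.</para>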
|
||||
</sect3>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="vinum-root-4x">
|
||||
<title>Differences for FreeBSD 4.x</title>
|
||||
|
||||
<para>Under FreeBSD 4.x, some internal functions required to
|
||||
make Vinum automatically scan all disks are missing, and the
|
||||
code that figures out the internal ID of the root device is
|
||||
not smart enough to handle a name like
|
||||
<devicename>/dev/vinum/root</devicename> automatically.
|
||||
Therefore, things are a little different here.</para>
|
||||
|
||||
<para>Vinum must explicitly be told which disks to scan, using a
|
||||
line like the following one in
|
||||
<filename>/boot/loader.conf</filename>:</para>
|
||||
|
||||
<para><literal>vinum.drives="/dev/<replaceable>da0</replaceable>
|
||||
/dev/<replaceable>da1</replaceable>"</literal></para>
|
||||
|
||||
<para>It is important that all drives are mentioned that could
|
||||
possibly contain Vinum data. It does not harm if
|
||||
<emphasis>more</emphasis> drives are listed, nor is it
|
||||
necessary to add each slice and/or partition explicitly, since
|
||||
Vinum will scan all slices and partitions of the named drives
|
||||
for valid Vinum headers.</para>
|
||||
|
||||
<para>Since the routines used to parse the name of the root
|
||||
filesystem, and derive the device ID (major/minor number) are
|
||||
only prepared to handle <quote>classical</quote> device names
|
||||
like <devicename>/dev/ad0s1a</devicename>, they cannot make
|
||||
any sense out of a root volume name like
|
||||
<devicename>/dev/vinum/root</devicename>. For that reason,
|
||||
Vinum itself needs to pre-setup the internal kernel parameter
|
||||
that holds the ID of the root device during its own
|
||||
initialization. This is requested by passing the name of the
|
||||
root volume in the loader variable
|
||||
<literal>vinum.root</literal>. The entry in
|
||||
<filename>/boot/loader.conf</filename> to accomplish this
|
||||
looks like:</para>
|
||||
|
||||
<para><literal>vinum.root="root"</literal></para>
|
||||
|
||||
<para>Now, when the kernel initialization tries to find out the
|
||||
root device to mount, it sees whether some kernel module has
|
||||
already pre-initialized the kernel parameter for it. If that
|
||||
is the case, <emphasis>and</emphasis> the device claiming the
|
||||
root device matches the major number of the driver as figured
|
||||
out from the name of the root device string being passed (that
|
||||
is, <literal>"vinum"</literal> in our case), it will use the
|
||||
pre-allocated device ID, instead of trying to figure out one
|
||||
itself. That way, during the usual automatic startup, it can
|
||||
continue to mount the Vinum root volume for the root
|
||||
filesystem.</para>
|
||||
|
||||
<para>However, when <command>boot -a</command> has been
used to request manual entry of the name of the root device,
it must be noted that this routine still cannot
|
||||
actually parse a name entered there that refers to a Vinum
|
||||
volume. If any device name is entered that does not refer to
|
||||
a Vinum device, the mismatch between the major numbers of the
|
||||
pre-allocated root parameter and the driver as figured out
|
||||
from the given name will make this routine enter its normal
|
||||
parser, so entering a string like
|
||||
<userinput>ufs:da0d</userinput> will work as expected. Note
|
||||
that if this fails, it is however no longer possible to
|
||||
re-enter a string like <userinput>ufs:vinum/root</userinput>
|
||||
again, since it cannot be parsed. The only way out is to
|
||||
reboot again, and start over then. (At the
|
||||
<quote>askroot</quote> prompt, the initial
|
||||
<devicename>/dev/</devicename> can always be omitted.)</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
|