From 6efff8762ae5d52e93a3a7076e235d3648746b67 Mon Sep 17 00:00:00 2001
From: Joerg Wunsch
Date: Tue, 13 May 2003 11:18:47 +0000
Subject: [PATCH] Finally, write down a section or two, and document how a Vinum volume can be used for the root filesystem.

---
 .../books/handbook/vinum/chapter.sgml | 464 +++++++++++++++++-
 1 file changed, 463 insertions(+), 1 deletion(-)

diff --git a/en_US.ISO8859-1/books/handbook/vinum/chapter.sgml b/en_US.ISO8859-1/books/handbook/vinum/chapter.sgml
index 0035d932a5..eea2528198 100644
--- a/en_US.ISO8859-1/books/handbook/vinum/chapter.sgml
+++ b/en_US.ISO8859-1/books/handbook/vinum/chapter.sgml
@@ -897,6 +897,9 @@ &prompt.root; newfs /dev/vinum/concat newfs: /dev/vinum/concat: can't figure out file system partition + The following is only valid for FreeBSD versions + prior to 5.0: + In order to create a file system on this volume, use the option to &man.newfs.8;:
@@ -958,7 +961,7 @@ sd name bigraid.p0.s4 drive e plex bigraid.p0 state initializing len 4194304b dr if they have been assigned different UNIX™ drive IDs. - + Automatic Startup In order to start Vinum automatically when you boot the
@@ -988,4 +991,463 @@ sd name bigraid.p0.s4 drive e plex bigraid.p0 state initializing len 4194304b dr + + + Using Vinum for the root filesystem + + For a machine that has fully-mirrored filesystems using + Vinum, it is desirable to also mirror the root filesystem. + Setting up such a configuration is less trivial than mirroring + an arbitrary filesystem because: + + + + The root filesystem must be available very early during + the boot process, so the Vinum infrastructure must already be + available at this time. + + + The volume containing the root filesystem also contains + the system bootstrap and the kernel, which must be read + using the host system's native utilities (e.g. the BIOS on + PC-class machines) which often cannot be taught about the + details of Vinum. 
+ + + + In the following sections, the term root + volume is generally used to describe the Vinum volume + that contains the root filesystem. It is probably a good idea + to use the name "root" for this volume, but + this is not technically required in any way. All command + examples in the following sections assume this name though. + + + Starting up Vinum early enough for the root + filesystem + + There are several measures to take for this to + happen: + + + + Vinum must be available in the kernel at boot-time. + Thus, the method to start Vinum automatically described in + is not applicable to + accomplish this task, and the + start_vinum parameter must actually + not be set when the following setup + is being arranged. The first option would be to compile + Vinum statically into the kernel, so it is available all + the time, but this is usually not desirable. The + other option is to have + /boot/loader () load the vinum kernel module + early, before starting the kernel. This can be + accomplished by putting the line + + vinum_load="YES" + + into the file + /boot/loader.conf. + + + + Vinum must be initialized early since it needs to + supply the volume for the root filesystem. By default, + the Vinum kernel part does not look for drives that might + contain Vinum volume information until the administrator + (or one of the startup scripts) issues a vinum + start command. + + The following paragraphs outline the steps + needed for FreeBSD 5.x and above. The setup required for + FreeBSD 4.x differs, and is described below in . + + By placing the line: + + vinum.autostart="YES" + + into /boot/loader.conf, Vinum is + instructed to automatically scan all drives for Vinum + information as part of the kernel startup. + + Note that it is not necessary to instruct the kernel + where to look for the root filesystem. + /boot/loader looks up the name of the + root device in /etc/fstab, and passes + this information on to the kernel. 
When it is time to mount + the root filesystem, the kernel figures out from the + device name provided which driver to ask to translate this + into the internal device ID (major/minor number). + + + + + + Making a Vinum-based root volume accessible to the + bootstrap + + Since the current FreeBSD bootstrap is only 7.5 KB of + code, and already has the burden of reading files (like + /boot/loader) from the UFS filesystem, it + is simply impossible to also teach it about internal Vinum + structures so it could parse the Vinum configuration data, and + work out the elements of a boot volume itself. Thus, + some tricks are necessary to provide the bootstrap code with + the illusion of a standard "a" partition + that contains the root filesystem. + + For this to be possible at all, the following requirements + must be met for the root volume: + + + + The root volume must not be striped or RAID-5. + + + + The root volume must not contain more than one + concatenated subdisk per plex. + + + Note that it is desirable and possible that there are + multiple plexes, each containing one replica of the root + filesystem. The bootstrap process will, however, only use one + of these replicas for finding the bootstrap and all the files, + until the kernel eventually mounts the root filesystem + itself. Each single subdisk within these plexes will then + need its own "a" partition illusion, for + the respective device to become bootable. It is not strictly + needed that each of these faked "a" + partitions is located at the same offset within its device, + compared with other devices containing plexes of the root + volume. However, it is probably a good idea to create the + Vinum volumes that way so the resulting mirrored devices are + symmetric, to avoid confusion. 
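The arithmetic behind such a faked "a" partition is small: Vinum reports the subdisk location and size in bytes, while disklabel expects 512-byte blocks, and the offset of the Vinum partition itself has to be added. The following is only a minimal sketch of that calculation, not part of any Vinum tooling. The numbers are taken from the example setup shown in this chapter; the function and variable names are hypothetical.

```shell
#!/bin/sh
# Illustrative only: names are hypothetical, values come from the
# example output in this chapter.
SECTOR=512

# Convert a byte count (as printed by "vinum l -rv root") to 512-byte blocks.
bytes_to_blocks() {
    echo $(( $1 / SECTOR ))
}

subdisk_offset_bytes=135680    # "Drive disk0 (/dev/da0h) at offset 135680"
subdisk_size_bytes=125829120   # "Size: 125829120 bytes"
vinum_part_offset=16           # offset of the Vinum partition "h", in blocks

# Size of the new "a" partition, in blocks.
a_size=$(bytes_to_blocks "$subdisk_size_bytes")
# Offset of the new "a" partition: Vinum partition offset plus subdisk offset.
a_offset=$(( vinum_part_offset + $(bytes_to_blocks "$subdisk_offset_bytes") ))

echo "a: size=$a_size offset=$a_offset"
```

The resulting values would still have to be entered by hand in disklabel -e; the sketch only does the unit conversion and the addition.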
+ + In order to set up these "a" partitions, + for each device containing part of the root volume, the + following needs to be done: + + + + The location (offset from the beginning of the device) + and size of this device's subdisk that is part of the root + volume need to be examined, using the command + + vinum l -rv root + + Note that Vinum offsets and sizes are measured in + bytes. They must be divided by 512 in order to obtain the + block numbers that are to be used in the + disklabel command. + + + + Run the command + + disklabel -e + devname + + for each device that participates in the root volume. + devname must be either the name + of the disk (like da0) for disks + without a slice (also known as fdisk) table, or the name of the + slice (like ad0s1). + + If there is already an "a" + partition on the device (presumably, containing a + pre-Vinum root filesystem), it should be renamed to + something else, so it remains accessible (just in case), + but will no longer be used by default to bootstrap the + system. Note that active partitions (like a root + filesystem currently mounted) cannot be renamed, so this + must be executed either when being booted from a + Fixit medium, or in a two-step process, + where (in a mirrored situation) the disk that has not been + currently booted is being manipulated first. + + Then, the offset of the Vinum partition on this + device (if any) must be added to the offset of the + respective root volume subdisk on this device. The + resulting value will become the + "offset" value for the new + "a" partition. The + "size" value for this partition can be + taken verbatim from the calculation above. The + "fstype" should be + 4.2BSD. The + "fsize", "bsize", + and "cpg" values should best be chosen + to match the actual filesystem, though they are fairly + unimportant within this context. + + That way, a new "a" partition will + be established that overlaps the Vinum partition on this + device. 
Note that the disklabel will + only allow for this overlap if the Vinum partition has + properly been marked using the "vinum" + fstype. + + + + That's all! A faked "a" partition + now exists on each device that has one replica of the + root volume. It is highly recommended to verify the + result again, using a command like + + fsck -n + /dev/devnamea + + + + It should be remembered that all files containing control + information must be relative to the root filesystem in the + Vinum volume which, when setting up a new Vinum root volume, + might not match the root filesystem that is currently active. + So in particular, the files /etc/fstab + and /boot/loader.conf need to be taken + care of. + + At next reboot, the bootstrap should figure out the + appropriate control information from the new Vinum-based root + filesystem, and act accordingly. At the end of the kernel + initialization process, after all devices have been announced, + the prominent notice that shows the success of this setup is a + message like: + + Mounting root from ufs:/dev/vinum/root + + + + Example of a Vinum-based root setup + + After the Vinum root volume has been set up, the output of + vinum l -rv root could look like: + + +
+...
+Subdisk root.p0.s0:
+	Size:	125829120 bytes (120 MB)
+	State: up
+	Plex root.p0 at offset 0 (0 B)
+	Drive disk0 (/dev/da0h) at offset 135680 (132 kB)
+
+Subdisk root.p1.s0:
+	Size:	125829120 bytes (120 MB)
+	State: up
+	Plex root.p1 at offset 0 (0 B)
+	Drive disk1 (/dev/da1h) at offset 135680 (132 kB)
+
+ + The value to note is 135680 for the + offset (relative to partition + /dev/da0h). This translates to 265 + 512-byte disk blocks in disklabel's terms. + Likewise, the size of this root volume is 245760 512-byte + blocks. /dev/da1h, containing the + second replica of this root volume, has a symmetric + setup. + + The disklabel for these devices might look like: + + +
+...
+8 partitions:
+#        size   offset    fstype   [fsize bsize bps/cpg]
+  a:   245760      281    4.2BSD     2048 16384     0   # (Cyl.    0*- 15*)
+  c: 71771688        0    unused        0     0         # (Cyl.    0 - 4467*)
+  h: 71771672       16     vinum                        # (Cyl.    0*- 4467*)
+
+ + It can be observed that the "size" + parameter for the faked "a" partition + matches the value outlined above, while the + "offset" parameter is the sum of the offset + within the Vinum partition "h", and the + offset of this partition within the device (or slice). This + is a typical setup that is necessary to avoid the problem + described in . It can also + be seen that the entire "a" partition is + completely within the "h" partition + containing all the Vinum data for this device. + + Note that in the above example, the entire device is + dedicated to Vinum, and there is no leftover pre-Vinum root + partition, since this has been a newly set-up disk that was + only meant to be part of a Vinum configuration, ever. + + + + Troubleshooting + + If something goes wrong, a way is needed to recover from + the situation. The following list contains a few known pitfalls + and solutions. + + + System bootstrap loads, but system does not boot + + If for any reason the system does not continue to boot, + the bootstrap can be interrupted by pressing the + space key at the 10-second warning. The + loader variables (like vinum.autostart) + can be examined using the show command, and + manipulated using the set or + unset commands. + + If the only problem was that the Vinum kernel module was + not yet in the list of modules to load automatically, a + simple load vinum will help. + + When ready, the boot process can be continued with a + boot -as. The options -as + will request the kernel to ask for the + root filesystem to mount (-a), and make the + boot process stop in single-user mode (-s), + where the root filesystem is mounted read-only. That way, + even if only one plex of a multi-plex volume has been + mounted, no data inconsistency between plexes is being + risked. 
+ + At the prompt asking for a root filesystem to mount, any + device that contains a valid root filesystem can be entered. + If /etc/fstab had been set up + correctly, the default should be something like + ufs:/dev/vinum/root. A typical alternate + choice would be something like + ufs:da0d which could be a + hypothetical partition that contains the pre-Vinum root + filesystem. Care should be taken if one of the alias + "a" partitions, which are + actually references to the subdisks of the Vinum root device, is entered here, + because in a mirrored setup, this would only mount one piece + of a mirrored root device. If this filesystem is to be + mounted read-write later on, it is necessary to remove the + other plex(es) of the Vinum root volume since these plexes + would otherwise carry inconsistent data. + + + + Only primary bootstrap loads + + If /boot/loader fails to load, but + the primary bootstrap still loads (visible by a single dash + in the left column of the screen right after the boot + process starts), an attempt can be made to interrupt the + primary bootstrap at this point, using the + space key. This will make the bootstrap + stop in stage two, see . An + attempt can be made here to boot off an alternate partition, + like the partition containing the previous root filesystem + that has been moved away from "a" + above. + + + + Nothing boots, the bootstrap + panics + + This situation will happen if the bootstrap had been + destroyed by the Vinum installation. Unfortunately, Vinum + currently leaves only 4 KB at the beginning of + its partition free before starting to write its Vinum header + information. However, the stage one and two bootstraps plus + the disklabel embedded between them currently require 8 KB. + So if a Vinum partition was started at offset 0 within a + slice or disk that was meant to be bootable, the Vinum setup + will trash the bootstrap. 
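The collision described above can be checked with simple arithmetic before a Vinum partition is created. The sketch below is only an illustration under stated assumptions: it takes the 4 KB and 8 KB figures from the text at face value, assumes 512-byte blocks and a bootstrap area starting at the beginning of the slice or disk, and uses a hypothetical function name.

```shell
#!/bin/sh
# Illustrative only: sizes are the ones quoted in the text, the
# function name is hypothetical.
BOOT_AREA=8192     # bytes needed by the bootstraps plus the disklabel
VINUM_FREE=4096    # bytes Vinum leaves free at the start of its partition
SECTOR=512

# Succeeds if a Vinum partition that starts at the given block offset
# keeps its header clear of the bootstrap area.
vinum_offset_is_safe() {
    header_start=$(( $1 * SECTOR + VINUM_FREE ))
    [ "$header_start" -ge "$BOOT_AREA" ]
}

for off in 0 8 16; do
    if vinum_offset_is_safe "$off"; then
        echo "offset $off: ok"
    else
        echo "offset $off: Vinum header would overlap the bootstrap"
    fi
done
```

Under these assumptions, an offset of 0 overlaps, while the offset of 16 blocks used for partition "h" in the example disklabel leaves enough room.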
+ + Similarly, if the above situation has been recovered, + for example by booting from a Fixit medium, + and the bootstrap has been re-installed using + disklabel -B as described in , the bootstrap will trash the Vinum + header, and Vinum will no longer find its disk(s). Though + no actual Vinum configuration data or data in Vinum volumes + will be trashed by this, and it would be possible to recover + all the data by entering exactly the same Vinum configuration + data again, the situation is hard to fix. It would + be necessary to move the entire Vinum partition by at least + 4 KB, in order to have the Vinum header and the system + bootstrap no longer collide. + + + + + Differences for FreeBSD 4.x + + Under FreeBSD 4.x, some internal functions required to + make Vinum automatically scan all disks are missing, and the + code that figures out the internal ID of the root device is + not smart enough to handle a name like + /dev/vinum/root automatically. + Therefore, things are a little different here. + + Vinum must explicitly be told which disks to scan, using a + line like the following one in + /boot/loader.conf: + + vinum.drives="/dev/da0 + /dev/da1" + + It is important that all drives are mentioned that could + possibly contain Vinum data. It does no harm if + more drives are listed, nor is it + necessary to add each slice and/or partition explicitly, since + Vinum will scan all slices and partitions of the named drives + for valid Vinum headers. + + Since the routines used to parse the name of the root + filesystem, and derive the device ID (major/minor number) are + only prepared to handle classical device names + like /dev/ad0s1a, they cannot make + any sense out of a root volume name like + /dev/vinum/root. For that reason, + Vinum itself needs to set up in advance the internal kernel parameter + that holds the ID of the root device during its own + initialization. 
This is requested by passing the name of the + root volume in the loader variable + vinum.root. The entry in + /boot/loader.conf to accomplish this + looks like: + + vinum.root="root" + + Now, when the kernel initialization tries to find out the + root device to mount, it checks whether some kernel module has + already pre-initialized the kernel parameter for it. If that + is the case, and the device claiming the + root device matches the major number of the driver as figured + out from the name of the root device string being passed (that + is, "vinum" in our case), it will use the + pre-allocated device ID, instead of trying to figure out one + itself. That way, during the usual automatic startup, it can + continue to mount the Vinum root volume for the root + filesystem. + + However, when boot -a has been + used to request manual entry of the name of the root device, + it must be noted that this routine still cannot + actually parse a name entered there that refers to a Vinum + volume. If any device name is entered that does not refer to + a Vinum device, the mismatch between the major numbers of the + pre-allocated root parameter and the driver as figured out + from the given name will make this routine enter its normal + parser, so entering a string like + ufs:da0d will work as expected. Note + that if this fails, it is no longer possible to + enter a string like ufs:vinum/root + afterwards, since it cannot be parsed. The only way out is to + reboot and start over. (At the + askroot prompt, the initial + /dev/ can always be omitted.) + +
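Put together, the loader settings for FreeBSD 4.x discussed in this chapter might be collected as shown below. This is only a sketch under stated assumptions: the drive names are the ones used in the example above, the volume is assumed to be named "root", and the vinum_load line is assumed to apply to 4.x in the same way as described for loading the module early.

```
# /boot/loader.conf (FreeBSD 4.x); drive names and volume name are examples
vinum_load="YES"                    # load the vinum kernel module early
vinum.drives="/dev/da0 /dev/da1"    # drives to scan for Vinum headers
vinum.root="root"                   # name of the Vinum root volume
```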