<?xml version="1.0" encoding="iso-8859-1"?> <!-- The FreeBSD Documentation Project $FreeBSD$ --> <chapter id="geom"> <chapterinfo> <authorgroup> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> <contrib>Written by </contrib> </author> </authorgroup> </chapterinfo> <title>GEOM: Modular Disk Transformation Framework</title> <sect1 id="geom-synopsis"> <title>Synopsis</title> <indexterm> <primary>GEOM</primary> </indexterm> <indexterm> <primary>GEOM Disk Framework</primary> <see>GEOM</see> </indexterm> <para>This chapter covers the use of disks under the GEOM framework in &os;. This includes the major <acronym role="Redundant Array of Inexpensive Disks">RAID</acronym> control utilities which use the framework for configuration. This chapter will not go into an in-depth discussion of how GEOM handles or controls I/O, the underlying subsystem, or code. This information is provided in &man.geom.4; and its various <literal>SEE ALSO</literal> references. This chapter is also not a definitive guide to <acronym>RAID</acronym> configurations; only GEOM-supported <acronym>RAID</acronym> classifications will be discussed.</para> <para>After reading this chapter, you will know:</para> <itemizedlist> <listitem> <para>What type of <acronym>RAID</acronym> support is available through GEOM.</para> </listitem> <listitem> <para>How to use the base utilities to configure, maintain, and manipulate the various <acronym>RAID</acronym> levels.</para> </listitem> <listitem> <para>How to mirror, stripe, encrypt, and remotely connect disk devices through GEOM.</para> </listitem> <listitem> <para>How to troubleshoot disks attached to the GEOM framework.</para> </listitem> </itemizedlist> <para>Before reading this chapter, you should:</para> <itemizedlist> <listitem> <para>Understand how &os; treats <link linkend="disks">disk devices</link>.</para> </listitem> <listitem> <para>Know how to configure and install a new <link linkend="kernelconfig">&os; kernel</link>.</para> </listitem> </itemizedlist> </sect1> <sect1 id="geom-intro"> <title>GEOM Introduction</title> <para>GEOM permits access and control to classes, such as Master Boot Records and <acronym>BSD</acronym> labels, through the use of providers, or the special files in <filename class="directory">/dev</filename>. By supporting various software <acronym>RAID</acronym> configurations, GEOM transparently provides access to the operating system and operating system utilities.</para> </sect1> <sect1 id="geom-striping"> <sect1info> <authorgroup> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> <contrib>Written by </contrib> </author> <author> <firstname>Murray</firstname> <surname>Stokely</surname> </author> </authorgroup> </sect1info> <title>RAID0 - Striping</title> <indexterm> <primary>GEOM</primary> </indexterm> <indexterm> <primary>Striping</primary> </indexterm> <para>Striping combines several disk drives into a single volume. In many cases, this is done through the use of hardware controllers. The GEOM disk subsystem provides software support for <acronym>RAID</acronym>0, also known as disk striping.</para> <para>In a <acronym>RAID</acronym>0 system, data is split into blocks that get written across all the drives in the array. Instead of having to wait on the system to write 256k to one disk, a <acronym>RAID</acronym>0 system can simultaneously write 64k to each of four different disks, offering superior I/O performance. 
This performance can be enhanced further by using multiple disk controllers.</para> <para>Each disk in a <acronym>RAID</acronym>0 stripe must be of the same size, since I/O requests are interleaved to read or write to multiple disks in parallel.</para> <mediaobject> <imageobject> <imagedata fileref="geom/striping" align="center"/> </imageobject> <textobject> <phrase>Disk Striping Illustration</phrase> </textobject> </mediaobject> <procedure> <title>Creating a Stripe of Unformatted ATA Disks</title> <step> <para>Load the <filename>geom_stripe.ko</filename> module:</para> <screen>&prompt.root; <userinput>kldload geom_stripe</userinput></screen> </step> <step> <para>Ensure that a suitable mount point exists. If this volume will become a root partition, then temporarily use another mount point such as <filename class="directory">/mnt</filename>:</para> <screen>&prompt.root; <userinput>mkdir /mnt</userinput></screen> </step> <step> <para>Determine the device names for the disks which will be striped, and create the new stripe device. For example, to stripe two unused and unpartitioned <acronym>ATA</acronym> disks with device names of <filename>/dev/ad2</filename> and <filename>/dev/ad3</filename>:</para> <screen>&prompt.root; <userinput>gstripe label -v st0 /dev/ad2 /dev/ad3</userinput> Metadata value stored on /dev/ad2. Metadata value stored on /dev/ad3. Done.</screen> </step> <step> <para>Write a standard label, also known as a partition table, on the new volume and install the default bootstrap code:</para> <screen>&prompt.root; <userinput>bsdlabel -wB /dev/stripe/st0</userinput></screen> </step> <step> <para>This process should create two other devices in <filename class="directory">/dev/stripe</filename> in addition to <devicename>st0</devicename>. Those include <devicename>st0a</devicename> and <devicename>st0c</devicename>. At this point, a file system may be created on <devicename>st0a</devicename> using <command>newfs</command>:</para> <screen>&prompt.root; <userinput>newfs -U /dev/stripe/st0a</userinput></screen> <para>Many numbers will glide across the screen, and after a few seconds, the process will be complete. The volume has been created and is ready to be mounted.</para> </step> </procedure> <para>To manually mount the created disk stripe:</para> <screen>&prompt.root; <userinput>mount /dev/stripe/st0a /mnt</userinput></screen> <para>To mount this striped file system automatically during the boot process, place the volume information in <filename>/etc/fstab</filename>. In this example, a permanent mount point, named <filename class="directory">stripe</filename>, is created:</para> <screen>&prompt.root; <userinput>mkdir /stripe</userinput> &prompt.root; <userinput>echo "/dev/stripe/st0a /stripe ufs rw 2 2" \</userinput> <userinput>>> /etc/fstab</userinput></screen> <para>The <filename>geom_stripe.ko</filename> module must also be automatically loaded during system initialization, by adding a line to <filename>/boot/loader.conf</filename>:</para> <screen>&prompt.root; <userinput>echo 'geom_stripe_load="YES"' >> /boot/loader.conf</userinput></screen> </sect1> <sect1 id="geom-mirror"> <title>RAID1 - Mirroring</title> <indexterm> <primary>GEOM</primary> </indexterm> <indexterm> <primary>Disk Mirroring</primary> </indexterm> <indexterm> <primary>RAID1</primary> </indexterm> <para><acronym>RAID1</acronym>, or <firstterm>mirroring</firstterm>, is the technique of writing the same data to more than one disk drive. Mirrors are usually used to guard against data loss due to drive failure. 
Each drive in a mirror contains an identical copy of the data. When an individual drive fails, the mirror continues to work, providing data from the drives that are still functioning. The computer keeps running, and the administrator has time to replace the failed drive without user interruption.</para> <para>Two common situations are illustrated in these examples. The first creates a mirror out of two new drives and uses it as a replacement for an existing single drive. The second example creates a mirror on a single new drive, copies the old drive's data to it, then inserts the old drive into the mirror. While this procedure is slightly more complicated, it only requires one new drive.</para> <para>Traditionally, the two drives in a mirror are identical in model and capacity, but &man.gmirror.8; does not require that. Mirrors created with dissimilar drives will have a capacity equal to that of the smallest drive in the mirror. Extra space on larger drives will be unused. Drives inserted into the mirror later must have at least as much capacity as the smallest drive already in the mirror.</para> <warning> <para>The mirroring procedures shown here are non-destructive, but as with any major disk operation, make a full backup first.</para> </warning> <sect2 id="geom-mirror-metadata"> <title>Metadata Issues</title> <para>Many disk systems store metadata at the end of each disk. Old metadata should be erased before reusing the disk for a mirror. Most problems are caused by two particular types of leftover metadata: GPT partition tables, and old &man.gmirror.8; metadata from a previous mirror.</para> <para>GPT metadata can be erased with &man.gpart.8;. This example erases both primary and backup GPT partition tables from disk <devicename>ada8</devicename>:</para> <screen>&prompt.root; <userinput>gpart destroy -F ada8</userinput></screen> <para>&man.gmirror.8; can remove a disk from an active mirror and erase the metadata in one step. Here, the example disk <devicename>ada8</devicename> is removed from the active mirror <devicename>gm4</devicename>:</para> <screen>&prompt.root; <userinput>gmirror remove gm4 ada8</userinput></screen> <para>If the mirror is not running but old mirror metadata is still on the disk, use <command>gmirror clear</command> to remove it:</para> <screen>&prompt.root; <userinput>gmirror clear ada8</userinput></screen> <para>&man.gmirror.8; stores one block of metadata at the end of the disk. Because GPT partition schemes also store metadata at the end of the disk, mirroring full GPT disks with &man.gmirror.8; is not recommended. MBR partitioning is used here because it only stores a partition table at the start of the disk and does not conflict with &man.gmirror.8;.</para> </sect2> <sect2 id="geom-mirror-two-new-disks"> <title>Creating a Mirror with Two New Disks</title> <para>In this example, &os; has already been installed on a single disk, <devicename>ada0</devicename>. Two new disks, <devicename>ada1</devicename> and <devicename>ada2</devicename>, have been connected to the system. A new mirror will be created on these two disks and used to replace the old single disk.</para> <para>&man.gmirror.8; requires a kernel module, <filename>geom_mirror.ko</filename>, either built into the kernel or loaded at boot- or run-time. 
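Whether the module is already loaded can be checked with &man.kldstat.8;; note that a driver compiled directly into the kernel will not appear in this output:</para> <screen>&prompt.root; <userinput>kldstat | grep geom_mirror</userinput></screen> <para>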
Manually load the kernel module now:</para> <screen>&prompt.root; <userinput>gmirror load</userinput></screen> <para>Create the mirror with the two new drives:</para> <screen>&prompt.root; <userinput>gmirror label -v gm0 /dev/ada1 /dev/ada2</userinput></screen> <para><devicename>gm0</devicename> is a user-chosen device name assigned to the new mirror. After the mirror has been started, this device name will appear in <filename>/dev/mirror/</filename>.</para> <para>MBR and bsdlabel partition tables can now be created on the mirror with &man.gpart.8;. Here we show a traditional split-filesystem layout, with partitions for <filename>/</filename>, swap, <filename>/var</filename>, <filename>/tmp</filename>, and <filename>/usr</filename>. A single <filename>/</filename> filesystem and a swap partition will also work.</para> <para>Partitions on the mirror do not have to be the same size as those on the existing disk, but they must be large enough to hold all the data already present on <devicename>ada0</devicename>.</para> <screen>&prompt.root; <userinput>gpart create -s MBR mirror/gm0</userinput> &prompt.root; <userinput>gpart add -t freebsd -a 4k mirror/gm0</userinput> &prompt.root; <userinput>gpart show mirror/gm0</userinput> => 63 156301423 mirror/gm0 MBR (74G) 63 63 - free - (31k) 126 156301299 1 freebsd (74G) 156301425 61 - free - (30k)</screen> <screen>&prompt.root; <userinput>gpart create -s BSD mirror/gm0s1</userinput> &prompt.root; <userinput>gpart add -t freebsd-ufs -a 4k -s 2g mirror/gm0s1</userinput> &prompt.root; <userinput>gpart add -t freebsd-swap -a 4k -s 4g mirror/gm0s1</userinput> &prompt.root; <userinput>gpart add -t freebsd-ufs -a 4k -s 2g mirror/gm0s1</userinput> &prompt.root; <userinput>gpart add -t freebsd-ufs -a 4k -s 1g mirror/gm0s1</userinput> &prompt.root; <userinput>gpart add -t freebsd-ufs -a 4k mirror/gm0s1</userinput> &prompt.root; <userinput>gpart show mirror/gm0s1</userinput> => 0 156301299 mirror/gm0s1 BSD (74G) 0 2 - free - (1.0k) 2 4194304 1 freebsd-ufs (2.0G) 4194306 8388608 2 freebsd-swap (4.0G) 12582914 4194304 4 freebsd-ufs (2.0G) 16777218 2097152 5 freebsd-ufs (1.0G) 18874370 137426928 6 freebsd-ufs (65G) 156301298 1 - free - (512B)</screen> <para>Make the mirror bootable by installing bootcode in the MBR and bsdlabel and setting the active slice:</para> <screen>&prompt.root; <userinput>gpart bootcode -b /boot/mbr mirror/gm0</userinput> &prompt.root; <userinput>gpart set -a active -i 1 mirror/gm0</userinput> &prompt.root; <userinput>gpart bootcode -b /boot/boot mirror/gm0s1</userinput></screen> <para>Format the filesystems on the new mirror, enabling soft-updates.</para> <screen>&prompt.root; <userinput>newfs -U /dev/mirror/gm0s1a</userinput> &prompt.root; <userinput>newfs -U /dev/mirror/gm0s1d</userinput> &prompt.root; <userinput>newfs -U /dev/mirror/gm0s1e</userinput> &prompt.root; <userinput>newfs -U /dev/mirror/gm0s1f</userinput></screen> <para>Filesystems from the original <devicename>ada0</devicename> disk can now be copied onto the mirror with &man.dump.8; and &man.restore.8;.</para> <screen>&prompt.root; <userinput>mount /dev/mirror/gm0s1a /mnt</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - / | (cd /mnt && restore -rf -)</userinput> &prompt.root; <userinput>mount /dev/mirror/gm0s1d /mnt/var</userinput> &prompt.root; <userinput>mount /dev/mirror/gm0s1e /mnt/tmp</userinput> &prompt.root; <userinput>mount /dev/mirror/gm0s1f /mnt/usr</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - /var | (cd /mnt/var && restore -rf 
-)</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - /tmp | (cd /mnt/tmp && restore -rf -)</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - /usr | (cd /mnt/usr && restore -rf -)</userinput></screen> <para><filename>/mnt/etc/fstab</filename> must be edited to point to the new mirror filesystems:</para> <programlisting># Device Mountpoint FStype Options Dump Pass# /dev/mirror/gm0s1a / ufs rw 1 1 /dev/mirror/gm0s1b none swap sw 0 0 /dev/mirror/gm0s1d /var ufs rw 2 2 /dev/mirror/gm0s1e /tmp ufs rw 2 2 /dev/mirror/gm0s1f /usr ufs rw 2 2</programlisting> <para>If the &man.gmirror.8; kernel module has not been built into the kernel, <filename>/mnt/boot/loader.conf</filename> is edited to load the module at boot:</para> <programlisting>geom_mirror_load="YES"</programlisting> <para>Reboot the system to test the new mirror and verify that all data has been copied. The BIOS will see the mirror as two individual drives rather than a mirror. Because the drives are identical, it does not matter which is selected to boot.</para> <para>See the <link linkend="gmirror-troubleshooting">Troubleshooting</link> section if there are problems booting. Powering down and disconnecting the original <devicename>ada0</devicename> disk will allow it to be kept as an offline backup.</para> <para>In use, the mirror will behave just like the original single drive.</para> </sect2> <sect2 id="geom-mirror-existing-drive"> <title>Creating a Mirror with an Existing Drive</title> <para>In this example, &os; has already been installed on a single disk, <devicename>ada0</devicename>. A new disk, <devicename>ada1</devicename>, has been connected to the system. A one-disk mirror will be created on the new disk, the existing system copied onto it, and then the old disk will be inserted into the mirror. This slightly complex procedure is required because &man.gmirror.8; needs to put a 512-byte block of metadata at the end of each disk, and the existing <devicename>ada0</devicename> has usually had all of its space already allocated.</para> <para>Load the &man.gmirror.8; kernel module:</para> <screen>&prompt.root; <userinput>gmirror load</userinput></screen> <para>Check the media size of the original disk with &man.diskinfo.8;:</para> <screen>&prompt.root; <userinput>diskinfo -v ada0 | head -n3</userinput> /dev/ada0 512 # sectorsize 1000204821504 # mediasize in bytes (931G)</screen> <para>Create a mirror on the new disk. To make certain that the mirror capacity is not any larger than the original drive, &man.gnop.8; is used to create a fake drive of the exact same size. This drive does not store any data, but is used only to limit the size of the mirror. When &man.gmirror.8; creates the mirror, it will restrict the capacity to the size of <devicename>gzero.nop</devicename>, even if the new drive (<devicename>ada1</devicename>) has more space. Note that the <replaceable>1000204821504</replaceable> in the second line should be equal to <devicename>ada0</devicename>'s media size as shown by &man.diskinfo.8; above.</para> <screen>&prompt.root; <userinput>geom zero load</userinput> &prompt.root; <userinput>gnop create -s 1000204821504 gzero</userinput> &prompt.root; <userinput>gmirror label -v gm0 gzero.nop ada1</userinput> &prompt.root; <userinput>gmirror forget gm0</userinput></screen> <para><devicename>gzero.nop</devicename> does not store any data, so the mirror does not see it as connected. 
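As an optional check, the capacity of the new mirror can be compared with that of <devicename>ada0</devicename> using &man.diskinfo.8;; the mirror provider will be one sector smaller, because &man.gmirror.8; reserves the last sector for its metadata:</para> <screen>&prompt.root; <userinput>diskinfo -v /dev/mirror/gm0 | head -n3</userinput></screen> <para>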
The mirror is told to <quote>forget</quote> unconnected components, removing references to <devicename>gzero.nop</devicename>. The result is a mirror device containing only a single disk, <devicename>ada1</devicename>.</para> <para>After creating <devicename>gm0</devicename>, view the partition table on <devicename>ada0</devicename>.</para> <para>This output is from a 1 TB drive. If there is some unallocated space at the end of the drive, the contents may be copied directly from <devicename>ada0</devicename> to the new mirror.</para> <para>However, if the output shows that all of the space on the disk is allocated like the following listing, there is no space available for the 512-byte &man.gmirror.8; metadata at the end of the disk.</para> <screen>&prompt.root; <userinput>gpart show ada0</userinput> => 63 1953525105 ada0 MBR (931G) 63 1953525105 1 freebsd [active] (931G)</screen> <para>In this case, the partition table must be edited to reduce the capacity by one sector on <devicename>mirror/gm0</devicename>. The procedure will be explained later.</para> <para>In either case, partition tables on the primary disk should be copied first with the &man.gpart.8; <command>backup</command> and <command>restore</command> subcommands.</para> <screen>&prompt.root; <userinput>gpart backup ada0 > table.ada0</userinput> &prompt.root; <userinput>gpart backup ada0s1 > table.ada0s1</userinput></screen> <para>These commands create two files, <filename>table.ada0</filename> and <filename>table.ada0s1</filename>. This example is from a 1 TB drive:</para> <screen>&prompt.root; <userinput>cat table.ada0</userinput> MBR 4 1 freebsd 63 1953525105 [active]</screen> <screen>&prompt.root; <userinput>cat table.ada0s1</userinput> BSD 8 1 freebsd-ufs 0 4194304 2 freebsd-swap 4194304 33554432 4 freebsd-ufs 37748736 50331648 5 freebsd-ufs 88080384 41943040 6 freebsd-ufs 130023424 838860800 7 freebsd-ufs 968884224 984640881</screen> <para>If the output of <command>gpart show</command> shows no free space at the end of the disk, the size of both the slice and the last partition must be reduced by one sector. Edit the two files, reducing the size of both the slice and last partition by one. These are the last numbers in each listing.</para> <screen>&prompt.root; <userinput>cat table.ada0</userinput> MBR 4 1 freebsd 63 <emphasis>1953525104</emphasis> [active]</screen> <screen>&prompt.root; <userinput>cat table.ada0s1</userinput> BSD 8 1 freebsd-ufs 0 4194304 2 freebsd-swap 4194304 33554432 4 freebsd-ufs 37748736 50331648 5 freebsd-ufs 88080384 41943040 6 freebsd-ufs 130023424 838860800 7 freebsd-ufs 968884224 <emphasis>984640880</emphasis></screen> <para>If at least one sector was unallocated at the end of the disk, these two files can be used without modification.</para> <para>Now restore the partition table into <devicename>mirror/gm0</devicename>:</para> <screen>&prompt.root; <userinput>gpart restore mirror/gm0 < table.ada0</userinput> &prompt.root; <userinput>gpart restore mirror/gm0s1 < table.ada0s1</userinput></screen> <para>Check the partition table with <command>gpart show</command>. 
This example has <devicename>gm0s1a</devicename> for <filename>/</filename>, <devicename>gm0s1d</devicename> for <filename>/var</filename>, <devicename>gm0s1e</devicename> for <filename>/usr</filename>, <devicename>gm0s1f</devicename> for <filename>/data1</filename>, and <devicename>gm0s1g</devicename> for <filename>/data2</filename>.</para> <screen>&prompt.root; <userinput>gpart show mirror/gm0</userinput> => 63 1953525104 mirror/gm0 MBR (931G) 63 1953525042 1 freebsd [active] (931G) 1953525105 62 - free - (31k) &prompt.root; <userinput>gpart show mirror/gm0s1</userinput> => 0 1953525042 mirror/gm0s1 BSD (931G) 0 2097152 1 freebsd-ufs (1.0G) 2097152 16777216 2 freebsd-swap (8.0G) 18874368 41943040 4 freebsd-ufs (20G) 60817408 20971520 5 freebsd-ufs (10G) 81788928 629145600 6 freebsd-ufs (300G) 710934528 1242590514 7 freebsd-ufs (592G) 1953525042 63 - free - (31k)</screen> <para>Both the slice and the last partition should have some free space at the end of each disk.</para> <para>Create filesystems on these new partitions. The number of partitions will vary, matching the partitions on the original disk, <devicename>ada0</devicename>.</para> <screen>&prompt.root; <userinput>newfs -U /dev/mirror/gm0s1a</userinput> &prompt.root; <userinput>newfs -U /dev/mirror/gm0s1d</userinput> &prompt.root; <userinput>newfs -U /dev/mirror/gm0s1e</userinput> &prompt.root; <userinput>newfs -U /dev/mirror/gm0s1f</userinput> &prompt.root; <userinput>newfs -U /dev/mirror/gm0s1g</userinput></screen> <para>Make the mirror bootable by installing bootcode in the MBR and bsdlabel and setting the active slice:</para> <screen>&prompt.root; <userinput>gpart bootcode -b /boot/mbr mirror/gm0</userinput> &prompt.root; <userinput>gpart set -a active -i 1 mirror/gm0</userinput> &prompt.root; <userinput>gpart bootcode -b /boot/boot mirror/gm0s1</userinput></screen> <para>Adjust <filename>/etc/fstab</filename> to use the new partitions on the mirror. Back up this file first by copying it to <filename>/etc/fstab.orig</filename>.</para> <screen>&prompt.root; <userinput>cp /etc/fstab /etc/fstab.orig</userinput></screen> <para>Edit <filename>/etc/fstab</filename>, replacing <devicename>/dev/ada0</devicename> with <devicename>mirror/gm0</devicename>.</para> <programlisting># Device Mountpoint FStype Options Dump Pass# /dev/mirror/gm0s1a / ufs rw 1 1 /dev/mirror/gm0s1b none swap sw 0 0 /dev/mirror/gm0s1d /var ufs rw 2 2 /dev/mirror/gm0s1e /usr ufs rw 2 2 /dev/mirror/gm0s1f /data1 ufs rw 2 2 /dev/mirror/gm0s1g /data2 ufs rw 2 2</programlisting> <para>If the &man.gmirror.8; kernel module has not been built into the kernel, edit <filename>/boot/loader.conf</filename> to load it:</para> <programlisting>geom_mirror_load="YES"</programlisting> <para>Filesystems from the original disk can now be copied onto the mirror with &man.dump.8; and &man.restore.8;. 
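In the commands below, <option>-0</option> selects a full backup, <option>-a</option> bypasses tape length calculations, <option>-L</option> tells &man.dump.8; to take a snapshot so a live file system can be dumped safely, and <option>-f -</option> writes the dump to standard output so that &man.restore.8; can read it from the pipe; <option>-C16</option> and <option>-b64</option> are cache size and block size performance tunings.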
Note that it may take some time to create a snapshot for each filesystem dumped with <command>dump -L</command>.</para> <screen>&prompt.root; <userinput>mount /dev/mirror/gm0s1a /mnt</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - / | (cd /mnt && restore -rf -)</userinput> &prompt.root; <userinput>mount /dev/mirror/gm0s1d /mnt/var</userinput> &prompt.root; <userinput>mount /dev/mirror/gm0s1e /mnt/usr</userinput> &prompt.root; <userinput>mount /dev/mirror/gm0s1f /mnt/data1</userinput> &prompt.root; <userinput>mount /dev/mirror/gm0s1g /mnt/data2</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - /usr | (cd /mnt/usr && restore -rf -)</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - /var | (cd /mnt/var && restore -rf -)</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - /data1 | (cd /mnt/data1 && restore -rf -)</userinput> &prompt.root; <userinput>dump -C16 -b64 -0aL -f - /data2 | (cd /mnt/data2 && restore -rf -)</userinput></screen> <para>Restart the system, booting from <devicename>ada1</devicename>. If everything is working, the system will boot from <devicename>mirror/gm0</devicename>, which now contains the same data as <devicename>ada0</devicename> had previously. See the <link linkend="gmirror-troubleshooting">Troubleshooting</link> section if there are problems booting.</para> <para>At this point, the mirror still consists of only the single <devicename>ada1</devicename> disk.</para> <para>After booting from <devicename>mirror/gm0</devicename> successfully, the final step is inserting <devicename>ada0</devicename> into the mirror.</para> <important> <para>When <devicename>ada0</devicename> is inserted into the mirror, its former contents will be overwritten by data on the mirror. Make certain that <devicename>mirror/gm0</devicename> has the same contents as <devicename>ada0</devicename> before adding <devicename>ada0</devicename> to the mirror. If there is something wrong with the contents copied by &man.dump.8; and &man.restore.8;, revert <filename>/etc/fstab</filename> to mount the filesystems on <devicename>ada0</devicename>, reboot, and try the whole procedure again.</para> </important> <screen>&prompt.root; <userinput>gmirror insert gm0 ada0</userinput> GEOM_MIRROR: Device gm0: rebuilding provider ada0</screen> <para>Synchronization between the two disks will start immediately. &man.gmirror.8; <command>status</command> shows the progress.</para> <screen>&prompt.root; <userinput>gmirror status</userinput> Name Status Components mirror/gm0 DEGRADED ada1 (ACTIVE) ada0 (SYNCHRONIZING, 64%)</screen> <para>After a while, synchronization will finish.</para> <screen>GEOM_MIRROR: Device gm0: rebuilding provider ada0 finished. &prompt.root; <userinput>gmirror status</userinput> Name Status Components mirror/gm0 COMPLETE ada1 (ACTIVE) ada0 (ACTIVE)</screen> <para><devicename>mirror/gm0</devicename> now consists of the two disks <devicename>ada0</devicename> and <devicename>ada1</devicename>, and the contents are automatically synchronized with each other. In use, <devicename>mirror/gm0</devicename> will behave just like the original single drive.</para> </sect2> <sect2 id="gmirror-troubleshooting"> <title>Troubleshooting</title> <sect3> <title>Problems with Booting</title> <sect4> <title>BIOS Settings</title> <para>BIOS settings may have to be changed to boot from one of the new mirrored drives. 
Either mirror drive can be used for booting, as they contain identical data.</para> </sect4> <sect4> <title>Boot Problems</title> <para>If the boot stops with this message, something is wrong with the mirror device:</para> <screen>Mounting from ufs:/dev/mirror/gm0s1a failed with error 19. Loader variables: vfs.root.mountfrom=ufs:/dev/mirror/gm0s1a vfs.root.mountfrom.options=rw Manual root filesystem specification: &lt;fstype&gt;:&lt;device&gt; [options] Mount &lt;device&gt; using filesystem &lt;fstype&gt; and with the specified (optional) option list. eg. ufs:/dev/da0s1a zfs:tank cd9660:/dev/acd0 ro (which is equivalent to: mount -t cd9660 -o ro /dev/acd0 /) ? List valid disk boot devices . Yield 1 second (for background tasks) &lt;empty line&gt; Abort manual input mountroot></screen> <para>Forgetting to load the <filename>geom_mirror</filename> module in <filename>/boot/loader.conf</filename> can cause this problem. To fix it, boot from &os; 9.0 or later installation media and choose <literal>Shell</literal> at the first prompt. Then load the mirror module and mount the mirror device:</para> <screen>&prompt.root; <userinput>gmirror load</userinput> &prompt.root; <userinput>mount /dev/mirror/gm0s1a /mnt</userinput></screen> <para>Edit <filename>/mnt/boot/loader.conf</filename>, adding a line to load the mirror module:</para> <programlisting>geom_mirror_load="YES"</programlisting> <para>Save the file and reboot.</para> <para>Other problems that cause <literal>error 19</literal> require more effort to fix. Enter <literal>ufs:/dev/ada0s1a</literal> at the boot loader prompt. Although the system should boot from <devicename>ada0</devicename>, another prompt to select a shell appears because <filename>/etc/fstab</filename> is incorrect. Press the Enter key at the prompt. Undo the modifications so far by reverting <filename>/etc/fstab</filename>, mounting filesystems from the original disk (<devicename>ada0</devicename>) instead of the mirror. Reboot the system and try the procedure again.</para> <screen>Enter full pathname of shell or RETURN for /bin/sh: &prompt.root; <userinput>cp /etc/fstab.orig /etc/fstab</userinput> &prompt.root; <userinput>reboot</userinput></screen> </sect4> </sect3> </sect2> <sect2> <title>Recovering from Disk Failure</title> <para>The benefit of disk mirroring is that an individual disk can fail without causing the mirror to lose any data. In the above example, if <devicename>ada0</devicename> fails, the mirror will continue to work, providing data from the remaining working drive, <devicename>ada1</devicename>.</para> <para>To replace the failed drive, shut down the system and physically replace the failed drive with a new drive of equal or greater capacity. Manufacturers use somewhat arbitrary values when rating drives in gigabytes, and the only way to really be sure is to compare the total count of sectors shown by <command>diskinfo -v</command>. A drive with larger capacity than the mirror will work, although the extra space on the new drive will not be used.</para> <para>After the computer is powered back up, the mirror will be running in a <quote>degraded</quote> mode with only one drive. The mirror is told to forget drives that are not currently connected:</para> <screen>&prompt.root; <userinput>gmirror forget gm0</userinput></screen> <para>Any old metadata should be <link linkend="geom-mirror-metadata">cleared from the replacement disk</link>. 
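For a disk that was previously part of a mirror, this means running <command>gmirror clear</command> on it:</para> <screen>&prompt.root; <userinput>gmirror clear <replaceable>ada4</replaceable></userinput></screen> <para>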
Then the disk, <devicename>ada4</devicename> for this example, is inserted into the mirror:</para> <screen>&prompt.root; <userinput>gmirror insert gm0 /dev/ada4</userinput></screen> <para>Resynchronization begins when the new drive is inserted into the mirror. This process of copying mirror data to a new drive can take a while. Performance of the mirror will be greatly reduced during the copy, so inserting new drives is best done when there is low demand on the computer.</para> <para>Progress can be monitored with <command>gmirror status</command>, which shows drives that are being synchronized and the percentage of completion. During resynchronization, the status will be <computeroutput>DEGRADED</computeroutput>, changing to <computeroutput>COMPLETE</computeroutput> when the process is finished.</para> </sect2> </sect1> <sect1 id="geom-raid3"> <sect1info> <authorgroup> <author> <firstname>Mark</firstname> <surname>Gladman</surname> <contrib>Written by </contrib> </author> <author> <firstname>Daniel</firstname> <surname>Gerzo</surname> </author> </authorgroup> <authorgroup> <author> <firstname>Tom</firstname> <surname>Rhodes</surname> <contrib>Based on documentation by </contrib> </author> <author> <firstname>Murray</firstname> <surname>Stokely</surname> </author> </authorgroup> </sect1info> <title><acronym>RAID</acronym>3 - Byte-level Striping with Dedicated Parity</title> <indexterm> <primary>GEOM</primary> </indexterm> <indexterm> <primary>RAID3</primary> </indexterm> <para><acronym>RAID</acronym>3 is a method used to combine several disk drives into a single volume with a dedicated parity disk. In a <acronym>RAID</acronym>3 system, data is split up into a number of bytes that are written across all the drives in the array except for one disk which acts as a dedicated parity disk. This means that reading 1024KB from a <acronym>RAID</acronym>3 implementation will access all disks in the array. Performance can be enhanced by using multiple disk controllers. The <acronym>RAID</acronym>3 array provides a fault tolerance of 1 drive, while providing a capacity of 1 - 1/n times the total capacity of all drives in the array, where n is the number of hard drives in the array. Such a configuration is mostly suitable for storing data of larger sizes such as multimedia files.</para> <para>At least 3 physical hard drives are required to build a <acronym>RAID</acronym>3 array. Each disk must be of the same size, since I/O requests are interleaved to read or write to multiple disks in parallel. Also, due to the nature of <acronym>RAID</acronym>3, the number of drives must be equal to 3, 5, 9, 17, and so on, or 2^n + 1.</para> <sect2> <title>Creating a Dedicated <acronym>RAID</acronym>3 Array</title> <para>In &os;, support for <acronym>RAID</acronym>3 is implemented by the &man.graid3.8; <acronym>GEOM</acronym> class. 
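As with other <acronym>GEOM</acronym> classes, the state of any existing arrays can be inspected at any time with the <command>status</command> subcommand:</para> <screen>&prompt.root; <userinput>graid3 status</userinput></screen> <para>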
Creating a dedicated <acronym>RAID</acronym>3 array on &os; requires the following steps.</para> <note> <para>While it is theoretically possible to boot from a <acronym>RAID</acronym>3 array on &os;, that configuration is uncommon and is not advised.</para> </note> <procedure> <step> <para>First, load the <filename>geom_raid3.ko</filename> kernel module by issuing the following command:</para> <screen>&prompt.root; <userinput>graid3 load</userinput></screen> <para>Alternatively, it is possible to manually load the <filename>geom_raid3.ko</filename> module:</para> <screen>&prompt.root; <userinput>kldload geom_raid3.ko</userinput></screen> </step> <step> <para>Create or ensure that a suitable mount point exists:</para> <screen>&prompt.root; <userinput>mkdir <replaceable>/multimedia/</replaceable></userinput></screen> </step> <step> <para>Determine the device names for the disks which will be added to the array, and create the new <acronym>RAID</acronym>3 device. The final device listed will act as the dedicated parity disk. This example uses three unpartitioned <acronym>ATA</acronym> drives: <devicename><replaceable>ada1</replaceable></devicename> and <devicename><replaceable>ada2</replaceable></devicename> for data, and <devicename><replaceable>ada3</replaceable></devicename> for parity.</para> <screen>&prompt.root; <userinput>graid3 label -v gr0 /dev/ada1 /dev/ada2 /dev/ada3</userinput> Metadata value stored on /dev/ada1. Metadata value stored on /dev/ada2. Metadata value stored on /dev/ada3. Done.</screen> </step> <step> <para>Partition the newly created <devicename>gr0</devicename> device and put a UFS file system on it:</para> <screen>&prompt.root; <userinput>gpart create -s GPT /dev/raid3/gr0</userinput> &prompt.root; <userinput>gpart add -t freebsd-ufs /dev/raid3/gr0</userinput> &prompt.root; <userinput>newfs -j /dev/raid3/gr0p1</userinput></screen> <para>Many numbers will glide across the screen, and after a bit of time, the process will be complete. The volume has been created and is ready to be mounted:</para> <screen>&prompt.root; <userinput>mount /dev/raid3/gr0p1 /multimedia/</userinput></screen> <para>The <acronym>RAID</acronym>3 array is now ready to use.</para> </step> </procedure> <para>Additional configuration is needed to retain the above setup across system reboots.</para> <procedure> <step> <para>The <filename>geom_raid3.ko</filename> module must be loaded before the array can be mounted. To automatically load the kernel module during system initialization, add the following line to <filename>/boot/loader.conf</filename>:</para> <programlisting>geom_raid3_load="YES"</programlisting> </step> <step> <para>The following volume information must be added to <filename>/etc/fstab</filename> in order to automatically mount the array's file system during the system boot process:</para> <programlisting>/dev/raid3/gr0p1 /multimedia ufs rw 2 2</programlisting> </step> </procedure> </sect2> </sect1> <sect1 id="geom-ggate"> <title>GEOM Gate Network Devices</title> <para>GEOM supports the remote use of devices, such as disks, CD-ROMs, and files through the use of the gate utilities. This is similar to <acronym>NFS</acronym>.</para> <para>To begin, an exports file must be created. This file specifies who is permitted to access the exported resources and what level of access they are offered. 
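Each line of the file lists a host or network, one of the access levels <literal>RO</literal>, <literal>WO</literal>, or <literal>RW</literal>, and the path of the device or file to export; see &man.ggated.8; for the full syntax.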
For example, to export the fourth slice on the first <acronym>SCSI</acronym> disk, the following <filename>/etc/gg.exports</filename> is more than adequate:</para> <programlisting>192.168.1.0/24 RW /dev/da0s4d</programlisting> <para>This allows all hosts inside the specified private network access to the file system on the <devicename>da0s4d</devicename> partition.</para> <para>To export this device, ensure it is not currently mounted, and start the &man.ggated.8; server daemon:</para> <screen>&prompt.root; <userinput>ggated</userinput></screen> <para>To <command>mount</command> the device on the client machine, issue the following commands:</para> <screen>&prompt.root; <userinput>ggatec create -o rw 192.168.1.1 /dev/da0s4d</userinput> ggate0 &prompt.root; <userinput>mount /dev/ggate0 /mnt</userinput></screen> <para>The device may now be accessed through the <filename class="directory">/mnt</filename> mount point.</para> <note> <para>However, this will fail if the device is currently mounted on either the server machine or any other machine on the network.</para> </note> <para>When the device is no longer needed, unmount it with &man.umount.8;, similar to any other disk device.</para> </sect1> <sect1 id="geom-glabel"> <title>Labeling Disk Devices</title> <indexterm> <primary>GEOM</primary> </indexterm> <indexterm> <primary>Disk Labels</primary> </indexterm> <para>During system initialization, the &os; kernel creates device nodes as devices are found. This method of probing for devices raises some issues. For instance, what if a new disk device is added via <acronym>USB</acronym>? It is likely that a flash device may be handed the device name of <devicename>da0</devicename> and the original <devicename>da0</devicename> shifted to <devicename>da1</devicename>. This will cause issues mounting file systems if they are listed in <filename>/etc/fstab</filename> which may also prevent the system from booting.</para> <para>One solution is to chain <acronym>SCSI</acronym> devices in order so a new device added to the <acronym>SCSI</acronym> card will be issued unused device numbers. But what about <acronym>USB</acronym> devices which may replace the primary <acronym>SCSI</acronym> disk? This happens because <acronym>USB</acronym> devices are usually probed before the <acronym>SCSI</acronym> card. One solution is to only insert these devices after the system has been booted. Another method is to use only a single <acronym>ATA</acronym> drive and never list the <acronym>SCSI</acronym> devices in <filename>/etc/fstab</filename>.</para> <para>A better solution is to use <command>glabel</command> to label the disk devices and use the labels in <filename>/etc/fstab</filename>. Because <command>glabel</command> stores the label in the last sector of a given provider, the label will remain persistent across reboots. By using this label as a device, the file system may always be mounted regardless of what device node it is accessed through.</para> <note> <para><command>glabel</command> can create both transient and permanent labels. Only permanent labels are consistent across reboots. Refer to &man.glabel.8; for more information on the differences between labels.</para> </note> <sect2> <title>Label Types and Examples</title> <para>Permanent labels can be a generic or a file system label. Permanent file system labels can be created with &man.tunefs.8; or &man.newfs.8;. 
These types of labels are created in a sub-directory of <filename class="directory">/dev</filename>, and will be named according to the file system type. For example, <acronym>UFS</acronym>2 file system labels will be created in <filename class="directory">/dev/ufs</filename>. Generic permanent labels can be created with <command>glabel label</command>. These are not file system specific and will be created in <filename class="directory">/dev/label</filename>.</para> <para>Temporary labels are destroyed at the next reboot. These labels are created in <filename class="directory">/dev/label</filename> and are suited to experimentation. A temporary label can be created using <command>glabel create</command>.</para> <!-- XXXTR: How do you create a file system label without running newfs or when there is no newfs (e.g.: cd9660)? --> <para>To create a permanent label for a <acronym>UFS</acronym>2 file system without destroying any data, issue the following command:</para> <screen>&prompt.root; <userinput>tunefs -L <replaceable>home</replaceable> <replaceable>/dev/da3</replaceable></userinput></screen> <warning> <para>If the file system is full, this may cause data corruption.</para> </warning> <para>A label should now exist in <filename class="directory">/dev/ufs</filename> which may be added to <filename>/etc/fstab</filename>:</para> <programlisting>/dev/ufs/home /home ufs rw 2 2</programlisting> <note> <para>The file system must not be mounted while attempting to run <command>tunefs</command>.</para> </note> <para>Now the file system may be mounted:</para> <screen>&prompt.root; <userinput>mount /home</userinput></screen> <para>From this point on, so long as the <filename>geom_label.ko</filename> kernel module is loaded at boot with <filename>/boot/loader.conf</filename> or the <literal>GEOM_LABEL</literal> kernel option is present, the device node may change without any ill effect on the system.</para> <para>File systems may also be created with a default label by using the <option>-L</option> flag with <command>newfs</command>. Refer to &man.newfs.8; for more information.</para> <para>The following command can be used to destroy the label:</para> <screen>&prompt.root; <userinput>glabel destroy home</userinput></screen> <para>The following example shows how to label the partitions of a boot disk.</para> <example> <title>Labeling Partitions on the Boot Disk</title> <para>By permanently labeling the partitions on the boot disk, the system should be able to continue to boot normally, even if the disk is moved to another controller or transferred to a different system. For this example, it is assumed that a single <acronym>ATA</acronym> disk is used, which is currently recognized by the system as <devicename>ad0</devicename>. It is also assumed that the standard &os; partition scheme is used, with <filename class="directory">/</filename>, <filename class="directory">/var</filename>, <filename class="directory">/usr</filename> and <filename class="directory">/tmp</filename>, as well as a swap partition.</para> <para>Reboot the system, and at the &man.loader.8; prompt, press <keycap>4</keycap> to boot into single user mode. 
Then enter the following commands:</para> <screen>&prompt.root; <userinput>glabel label rootfs /dev/ad0s1a</userinput> GEOM_LABEL: Label for provider /dev/ad0s1a is label/rootfs &prompt.root; <userinput>glabel label var /dev/ad0s1d</userinput> GEOM_LABEL: Label for provider /dev/ad0s1d is label/var &prompt.root; <userinput>glabel label usr /dev/ad0s1f</userinput> GEOM_LABEL: Label for provider /dev/ad0s1f is label/usr &prompt.root; <userinput>glabel label tmp /dev/ad0s1e</userinput> GEOM_LABEL: Label for provider /dev/ad0s1e is label/tmp &prompt.root; <userinput>glabel label swap /dev/ad0s1b</userinput> GEOM_LABEL: Label for provider /dev/ad0s1b is label/swap &prompt.root; <userinput>exit</userinput></screen> <para>The system will continue with multi-user boot. After the boot completes, edit <filename>/etc/fstab</filename> and replace the conventional device names with their respective labels. The final <filename>/etc/fstab</filename> will look like this:</para> <programlisting># Device Mountpoint FStype Options Dump Pass# /dev/label/swap none swap sw 0 0 /dev/label/rootfs / ufs rw 1 1 /dev/label/tmp /tmp ufs rw 2 2 /dev/label/usr /usr ufs rw 2 2 /dev/label/var /var ufs rw 2 2</programlisting> <para>The system can now be rebooted. If everything went well, it will come up normally and <command>mount</command> will show:</para> <screen>&prompt.root; <userinput>mount</userinput> /dev/label/rootfs on / (ufs, local) devfs on /dev (devfs, local) /dev/label/tmp on /tmp (ufs, local, soft-updates) /dev/label/usr on /usr (ufs, local, soft-updates) /dev/label/var on /var (ufs, local, soft-updates)</screen> </example> <para>Starting with &os; 7.2, the &man.glabel.8; class supports a new label type for <acronym>UFS</acronym> file systems, based on the unique file system id, <literal>ufsid</literal>. These labels may be found in <filename class="directory">/dev/ufsid</filename> and are created automatically during system startup. It is possible to use <literal>ufsid</literal> labels to mount partitions using <filename>/etc/fstab</filename>. Use <command>glabel status</command> to receive a list of file systems and their corresponding <literal>ufsid</literal> labels:</para> <screen>&prompt.user; <userinput>glabel status</userinput> Name Status Components ufsid/486b6fc38d330916 N/A ad4s1d ufsid/486b6fc16926168e N/A ad4s1f</screen> <para>In the above example, <devicename>ad4s1d</devicename> represents <filename class="directory">/var</filename>, while <devicename>ad4s1f</devicename> represents <filename class="directory">/usr</filename>. Using the <literal>ufsid</literal> values shown, these partitions may now be mounted with the following entries in <filename>/etc/fstab</filename>:</para> <programlisting>/dev/ufsid/486b6fc38d330916 /var ufs rw 2 2 /dev/ufsid/486b6fc16926168e /usr ufs rw 2 2</programlisting> <para>Any partitions with <literal>ufsid</literal> labels can be mounted in this way, eliminating the need to manually create permanent labels, while still enjoying the benefits of device name independent mounting.</para> </sect2> </sect1> <sect1 id="geom-gjournal"> <title>UFS Journaling Through GEOM</title> <indexterm> <primary>GEOM</primary> </indexterm> <indexterm> <primary>Journaling</primary> </indexterm> <para>Beginning with &os; 7.0, support for UFS journals is available. 
The implementation is provided through the <acronym>GEOM</acronym> subsystem and is configured using &man.gjournal.8;.</para> <para>Journaling stores a log of file system transactions, such as changes that make up a complete disk write operation, before meta-data and file writes are committed to the disk. This transaction log can later be replayed to redo file system transactions, preventing file system inconsistencies.</para> <para>This method provides another mechanism to protect against data loss and inconsistencies of the file system. Unlike Soft Updates, which tracks and enforces meta-data updates, and snapshots, which create an image of the file system, a log is stored in disk space specifically for this task, and in some cases, may be stored on another disk entirely.</para> <para>Unlike other file system journaling implementations, the <command>gjournal</command> method is block based and not implemented as part of the file system. It is a <acronym>GEOM</acronym> extension.</para> <para>To enable support for <command>gjournal</command>, the &os; kernel must have the following option which is the default on &os; 7.0 and later:</para> <programlisting>options UFS_GJOURNAL</programlisting> <para>If journaled volumes need to be mounted during startup, the <filename>geom_journal.ko</filename> kernel module needs to be loaded, by adding the following line to <filename>/boot/loader.conf</filename>:</para> <programlisting>geom_journal_load="YES"</programlisting> <para>Alternatively, this function can be built into a custom kernel, by adding the following line in the kernel configuration file:</para> <programlisting>options GEOM_JOURNAL</programlisting> <para>Creating a journal on a free file system may now be done using the following steps. In this example, <devicename>da4</devicename> is a new <acronym>SCSI</acronym> disk:</para> <screen>&prompt.root; <userinput>gjournal load</userinput> &prompt.root; <userinput>gjournal label /dev/da4</userinput></screen> <para>At this point, there should be a <devicename>/dev/da4</devicename> device node and a <devicename>/dev/da4.journal</devicename> device node. A file system may now be created on this device:</para> <screen>&prompt.root; <userinput>newfs -O 2 -J /dev/da4.journal</userinput></screen> <para>This command will create a <acronym>UFS</acronym>2 file system on the journaled device.</para> <para><command>mount</command> the device at the desired point with:</para> <screen>&prompt.root; <userinput>mount /dev/da4.journal <replaceable>/mnt</replaceable></userinput></screen> <note> <para>In the case of several slices, a journal will be created for each individual slice. For instance, if <devicename>ad4s1</devicename> and <devicename>ad4s2</devicename> are both slices, then <command>gjournal</command> will create <devicename>ad4s1.journal</devicename> and <devicename>ad4s2.journal</devicename>.</para> </note> <para>For better performance, the journal may be kept on another disk. In this configuration, the journal provider or storage device should be listed after the device to enable journaling on. Journaling may also be enabled on current file systems by using <command>tunefs</command>. However, <emphasis>always</emphasis> make a backup before attempting to alter a file system. In most cases, <command>gjournal</command> will fail if it is unable to create the journal, but this does not protect against data loss incurred as a result of misusing <command>tunefs</command>.</para> <para>It is also possible to journal the boot disk of a &os; system. 
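</para> <para>As a minimal sketch of enabling journaling on an existing file system (shown for a hypothetical, unmounted partition <replaceable>ada1s1</replaceable>; adapt the device name and make a backup first), the provider is labeled and the file system flags are then switched with <command>tunefs</command>. Soft updates are disabled, as they are not used together with <command>gjournal</command>, and the journaled provider is commonly mounted with the <option>async</option> option:</para> <screen>&prompt.root; <userinput>gjournal label <replaceable>ada1s1</replaceable></userinput>
&prompt.root; <userinput>tunefs -J enable -n disable <replaceable>ada1s1</replaceable>.journal</userinput>
&prompt.root; <userinput>mount -o async /dev/<replaceable>ada1s1</replaceable>.journal <replaceable>/mnt</replaceable></userinput></screen> <para>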
Refer to the article <ulink url="&url.articles.gjournal-desktop;">Implementing UFS Journaling on a Desktop PC</ulink> for detailed instructions.</para> </sect1> </chapter>