File Systems Support
Written by Tom Rhodes.

Synopsis

File systems are an integral part of any operating system. They allow users to upload and store files, provide access to data, and make hard drives useful. Different operating systems have different native file systems. Traditionally, the native &os; file system has been the Unix File System UFS, which has been modernized as UFS2. Since &os; 7.0, the Z File System ZFS is also available as a native file system.

In addition to its native file systems, &os; supports a multitude of other file systems so that data from other operating systems can be accessed locally, such as data stored on locally attached USB storage devices, flash drives, and hard disks. This includes support for the &linux; Extended File System (EXT) and the Reiser file system.

There are different levels of &os; support for the various file systems. Some require a kernel module to be loaded and others may require a toolset to be installed. Some non-native file system support is full read-write while others are read-only.

After reading this chapter, you will know:

The difference between native and supported file systems.
Which file systems are supported by &os;.
How to enable, configure, access, and make use of non-native file systems.

Before reading this chapter, you should:

Understand &unix; and &os; basics.
Be familiar with the basics of kernel configuration and compilation.
Feel comfortable installing software in &os;.
Have some familiarity with disks, storage, and device names in &os;.

The Z File System (ZFS)

The Z file system, originally developed by &sun;, is designed to use a pooled storage method in which space is only used as it is needed for data storage. It is also designed for maximum data integrity, supporting data snapshots, multiple copies, and data checksums. It uses a software data replication model known as RAID-Z. RAID-Z provides redundancy similar to hardware RAID, but is designed to prevent data write corruption and to overcome some of the limitations of hardware RAID.

ZFS Tuning

Some of the features provided by ZFS are RAM-intensive, so some tuning may be required to provide maximum efficiency on systems with limited RAM.

Memory

At a bare minimum, the total system memory should be at least one gigabyte. The amount of recommended RAM depends upon the size of the pool and the ZFS features which are used. A general rule of thumb is 1 GB of RAM for every 1 TB of storage. If the deduplication feature is used, a general rule of thumb is 5 GB of RAM per TB of storage to be deduplicated. While some users successfully use ZFS with less RAM, it is possible that when the system is under heavy load, it may panic due to memory exhaustion. Further tuning may be required for systems with less than the recommended RAM requirements.

Kernel Configuration

Due to the RAM limitations of the &i386; platform, users running ZFS on the &i386; architecture should add the following option to a custom kernel configuration file, rebuild the kernel, and reboot:

options KVA_PAGES=512

This option expands the kernel address space, allowing the vm.kvm_size tunable to be pushed beyond the currently imposed limit of 1 GB, or the limit of 2 GB for PAE. To find the most suitable value for this option, divide the desired address space in megabytes by four. In this example, it is 512 for 2 GB.

Loader Tunables

The kmem address space can be increased on all &os; architectures.
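Before changing anything, it can be useful to see what the running system is currently using. As a quick check that is not part of the procedure itself, the relevant read-only sysctls can be queried with &man.sysctl.8;; the vfs.zfs.arc_max variable is only present once the ZFS module is loaded:

&prompt.root; sysctl vm.kmem_size vm.kmem_size_max
&prompt.root; sysctl vfs.zfs.arc_max

These values are loader tunables, so changes only take effect after they are set in /boot/loader.conf and the system is rebooted.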
On a test system with one gigabyte of physical memory, success was achieved with the following options added to /boot/loader.conf, and the system restarted:

vm.kmem_size="330M"
vm.kmem_size_max="330M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"

For a more detailed list of recommendations for ZFS-related tuning, see http://wiki.freebsd.org/ZFSTuningGuide.

Using ZFS

There is a startup mechanism that allows &os; to mount ZFS pools during system initialization. To set it, issue the following commands:

&prompt.root; echo 'zfs_enable="YES"' >> /etc/rc.conf
&prompt.root; service zfs start

The examples in this section assume three SCSI disks with the device names da0, da1, and da2. Users of IDE hardware should instead use ad device names.

Single Disk Pool

To create a simple, non-redundant ZFS pool using a single disk device, use zpool:

&prompt.root; zpool create example /dev/da0

To view the new pool, review the output of df:

&prompt.root; df
Filesystem  1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a   2026030  235230  1628718    13%    /
devfs               1       1        0   100%    /dev
/dev/ad0s1d  54098308 1032846 48737598     2%    /usr
example      17547136       0 17547136     0%    /example

This output shows that the example pool has been created and mounted. It is now accessible as a file system. Files may be created on it and users can browse it, as seen in the following example:

&prompt.root; cd /example
&prompt.root; ls
&prompt.root; touch testfile
&prompt.root; ls -al
total 4
drwxr-xr-x   2 root  wheel    3 Aug 29 23:15 .
drwxr-xr-x  21 root  wheel  512 Aug 29 23:12 ..
-rw-r--r--   1 root  wheel    0 Aug 29 23:15 testfile

However, this pool is not taking advantage of any ZFS features. To create a dataset on this pool with compression enabled:

&prompt.root; zfs create example/compressed
&prompt.root; zfs set compression=gzip example/compressed

The example/compressed dataset is now a ZFS compressed file system. Try copying some large files to /example/compressed. Compression can be disabled with:

&prompt.root; zfs set compression=off example/compressed

To unmount a file system, issue the following command and then verify by using df:

&prompt.root; zfs umount example/compressed
&prompt.root; df
Filesystem  1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a   2026030  235232  1628716    13%    /
devfs               1       1        0   100%    /dev
/dev/ad0s1d  54098308 1032864 48737580     2%    /usr
example      17547008       0 17547008     0%    /example

To re-mount the file system to make it accessible again, and verify with df:

&prompt.root; zfs mount example/compressed
&prompt.root; df
Filesystem         1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a          2026030  235234  1628714    13%    /
devfs                      1       1        0   100%    /dev
/dev/ad0s1d         54098308 1032864 48737580     2%    /usr
example             17547008       0 17547008     0%    /example
example/compressed  17547008       0 17547008     0%    /example/compressed

The pool and file system may also be observed by viewing the output from mount:

&prompt.root; mount
/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad0s1d on /usr (ufs, local, soft-updates)
example on /example (zfs, local)
example/data on /example/data (zfs, local)
example/compressed on /example/compressed (zfs, local)

ZFS datasets, after creation, may be used like any other file system. However, many other features are available which can be set on a per-dataset basis.
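For instance, the effect of the compression property set above can be checked with a quick, read-only property query. The compressratio property is standard &man.zfs.8; behavior, and a value above 1.00x indicates that the copied files are being stored compressed:

&prompt.root; zfs get compressratio example/compressed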
In the following example, a new file system, data, is created. Important files will be stored here, so the file system is set to keep two copies of each data block:

&prompt.root; zfs create example/data
&prompt.root; zfs set copies=2 example/data

It is now possible to see the data and space utilization by issuing df:

&prompt.root; df
Filesystem          1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a           2026030  235234  1628714    13%    /
devfs                       1       1        0   100%    /dev
/dev/ad0s1d          54098308 1032864 48737580     2%    /usr
example              17547008       0 17547008     0%    /example
example/compressed   17547008       0 17547008     0%    /example/compressed
example/data         17547008       0 17547008     0%    /example/data

Notice that each file system on the pool has the same amount of available space. This is the reason for using df in these examples, to show that the file systems use only the amount of space they need and all draw from the same pool. The ZFS file system does away with concepts such as volumes and partitions, and allows for several file systems to occupy the same pool.

To destroy the file systems and then destroy the pool as they are no longer needed:

&prompt.root; zfs destroy example/compressed
&prompt.root; zfs destroy example/data
&prompt.root; zpool destroy example

ZFS RAID-Z

There is no way to prevent a disk from failing. One method of avoiding data loss due to a failed hard disk is to implement RAID. ZFS supports this feature in its pool design. To create a RAID-Z pool, issue the following command and specify the disks to add to the pool:

&prompt.root; zpool create storage raidz da0 da1 da2

&sun; recommends that the number of devices used in a RAID-Z configuration be between three and nine. For environments requiring a single pool consisting of 10 disks or more, consider breaking it up into smaller RAID-Z groups. If only two disks are available and redundancy is a requirement, consider using a ZFS mirror. Refer to &man.zpool.8; for more details.

This command creates the storage zpool. This may be verified using &man.mount.8; and &man.df.1;. This command makes a new file system in the pool called home:

&prompt.root; zfs create storage/home

It is now possible to enable compression and keep extra copies of directories and files using the following commands:

&prompt.root; zfs set copies=2 storage/home
&prompt.root; zfs set compression=gzip storage/home

To make this the new home directory for users, copy the user data to this directory and create the appropriate symbolic links:

&prompt.root; cp -rp /home/* /storage/home
&prompt.root; rm -rf /home /usr/home
&prompt.root; ln -s /storage/home /home
&prompt.root; ln -s /storage/home /usr/home

Users should now have their data stored on the freshly created /storage/home. Test by adding a new user and logging in as that user.

Try creating a snapshot which may be rolled back later:

&prompt.root; zfs snapshot storage/home@08-30-08

Note that a snapshot captures a full file system, not an individual home directory or file. The @ character is the delimiter between the file system or volume name and the snapshot name. When a user's home directory gets trashed, restore it with:

&prompt.root; zfs rollback storage/home@08-30-08

To get a list of all available snapshots, run ls in the file system's .zfs/snapshot directory. For example, to see the previously taken snapshot:

&prompt.root; ls /storage/home/.zfs/snapshot

It is possible to write a script to perform regular snapshots on user data. However, over time, snapshots may consume a great deal of disk space.
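To review which snapshots exist and how much space they are using, they can also be listed explicitly. This is standard &man.zfs.8; usage, shown here as an aside:

&prompt.root; zfs list -t snapshot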
The previous snapshot may be removed using the following command:

&prompt.root; zfs destroy storage/home@08-30-08

After testing, /storage/home can be made the real /home using this command:

&prompt.root; zfs set mountpoint=/home storage/home

Run df and mount to confirm that the system now treats the file system as the real /home:

&prompt.root; mount
/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad0s1d on /usr (ufs, local, soft-updates)
storage on /storage (zfs, local)
storage/home on /home (zfs, local)

&prompt.root; df
Filesystem    1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a     2026030  235240  1628708    13%    /
devfs                 1       1        0   100%    /dev
/dev/ad0s1d    54098308 1032826 48737618     2%    /usr
storage        26320512       0 26320512     0%    /storage
storage/home   26320512       0 26320512     0%    /home

This completes the RAID-Z configuration. To get status updates about the file systems created during the nightly &man.periodic.8; runs, issue the following command:

&prompt.root; echo 'daily_status_zfs_enable="YES"' >> /etc/periodic.conf

Recovering RAID-Z

Every software RAID has a method of monitoring its state. The status of RAID-Z devices may be viewed with the following command:

&prompt.root; zpool status -x

If all pools are healthy and everything is normal, the following message will be returned:

all pools are healthy

If there is an issue, perhaps a disk has gone offline, the pool state will look similar to:

  pool: storage
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     OFFLINE      0     0     0
            da2     ONLINE       0     0     0

errors: No known data errors

This indicates that the device was previously taken offline by the administrator using the following command:

&prompt.root; zpool offline storage da1

It is now possible to replace da1 after the system has been powered down. When the system is back online, the following command may be issued to replace the disk:

&prompt.root; zpool replace storage da1

From here, the status may be checked again, this time without the -x flag, to get state information:

&prompt.root; zpool status storage
  pool: storage
 state: ONLINE
 scrub: resilver completed with 0 errors on Sat Aug 30 19:44:11 2008
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0

errors: No known data errors

As shown from this example, everything appears to be normal.

Data Verification

ZFS uses checksums to verify the integrity of stored data. These are enabled automatically upon creation of file systems and may be disabled using the following command:

&prompt.root; zfs set checksum=off storage/home

Doing so is not recommended, as checksums take very little storage space and are used to verify data integrity in a process known as scrubbing. To verify the data integrity of the storage pool, issue this command:

&prompt.root; zpool scrub storage

This process may take considerable time depending on the amount of data stored. It is also very I/O intensive, so much so that only one scrub may be run at any given time.
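Because of this I/O load, it is sometimes preferable to interrupt a scrub and run it again during off-peak hours. Assuming standard &man.zpool.8; behavior, a scrub that is already underway can be stopped with the -s flag:

&prompt.root; zpool scrub -s storage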
After the scrub has completed, the status is updated and may be viewed by issuing a status request:

&prompt.root; zpool status storage
  pool: storage
 state: ONLINE
 scrub: scrub completed with 0 errors on Sat Jan 26 19:57:37 2013
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0

errors: No known data errors

The completion time is displayed and helps to ensure data integrity over a long period of time.

Refer to &man.zfs.8; and &man.zpool.8; for other ZFS options.

ZFS Quotas

ZFS supports different types of quotas: the refquota, the general quota, the user quota, and the group quota. This section explains the basics of each type and includes some usage instructions.

Quotas limit the amount of space that a dataset and its descendants can consume, and enforce a limit on the amount of space used by file systems and snapshots for the descendants. Quotas are useful to limit the amount of space a particular user can use. Quotas cannot be set on volumes, as the volsize property acts as an implicit quota.

The refquota=size property limits the amount of space a dataset can consume by enforcing a hard limit on the space used. However, this hard limit does not include space used by descendants, such as file systems or snapshots. To enforce a general quota of 10 GB for storage/home/bob, use the following:

&prompt.root; zfs set quota=10G storage/home/bob

User quotas limit the amount of space that can be used by the specified user. The general format is userquota@user=size, and the user's name must be in one of the following formats:

POSIX compatible name such as joe.
POSIX numeric ID such as 789.
SID name such as joe.bloggs@example.com.
SID numeric ID such as S-1-123-456-789.

For example, to enforce a quota of 50 GB for a user named joe on storage/home/bob, use the following:

&prompt.root; zfs set userquota@joe=50G storage/home/bob

To remove the quota or make sure that one is not set, instead use:

&prompt.root; zfs set userquota@joe=none storage/home/bob

User quota properties are not displayed by zfs get all. Non-root users can only see their own quotas unless they have been granted the userquota privilege. Users with this privilege are able to view and set everyone's quota.

The group quota limits the amount of space that a specified group can consume. The general format is groupquota@group=size. To set the quota for the group firstgroup on storage/home/bob to 50 GB, use:

&prompt.root; zfs set groupquota@firstgroup=50G storage/home/bob

To remove the quota for the group firstgroup, or to make sure that one is not set, instead use:

&prompt.root; zfs set groupquota@firstgroup=none storage/home/bob

As with the user quota property, non-root users can only see the quotas associated with the groups that they belong to. However, root or a user with the groupquota privilege can view and set all quotas for all groups.

To display the amount of space consumed by each user on the specified file system or snapshot, along with any specified quotas, use zfs userspace. For group information, use zfs groupspace. For more information about supported options or how to display only specific options, refer to &man.zfs.8;.

Users with sufficient privileges and root can list the quota for storage/home/bob using:

&prompt.root; zfs get quota storage/home/bob
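The same mechanism can be used to read back an individual user or group quota by property name. As a quick check, assuming the joe and firstgroup quotas set in the examples above:

&prompt.root; zfs get userquota@joe storage/home/bob
&prompt.root; zfs get groupquota@firstgroup storage/home/bob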
ZFS Reservations

ZFS supports two types of space reservations. This section explains the basics of each and includes some usage instructions.

The reservation property makes it possible to reserve a minimum amount of space guaranteed for a dataset and its descendants. This means that if a 10 GB reservation is set on storage/home/bob and disk space gets low, at least 10 GB of space remains reserved for this dataset. The refreservation property sets or indicates the minimum amount of space guaranteed to a dataset excluding descendants, such as snapshots. As an example, if a snapshot was taken of storage/home/bob, enough disk space would have to exist outside of the refreservation amount for the operation to succeed, because descendants of the main dataset are not counted in the refreservation amount and so do not encroach on the reserved space.

Reservations of any sort are useful in many situations, such as planning and testing the suitability of disk space allocation in a new system, or ensuring that enough space is available on file systems for system recovery procedures and files.

The general format of the reservation property is reservation=size, so to set a reservation of 10 GB on storage/home/bob, use:

&prompt.root; zfs set reservation=10G storage/home/bob

To make sure that no reservation is set, or to remove a reservation, use:

&prompt.root; zfs set reservation=none storage/home/bob

The same principle can be applied to the refreservation property for setting a refreservation, with the general format refreservation=size. To check if any reservations or refreservations exist on storage/home/bob, execute one of the following commands:

&prompt.root; zfs get reservation storage/home/bob
&prompt.root; zfs get refreservation storage/home/bob

&linux; File Systems

&os; provides built-in support for several &linux; file systems. This section demonstrates how to load support for and how to mount the supported &linux; file systems.

ext2

Kernel support for ext2 file systems has been available since &os; 2.2. In &os; 8.x and earlier, the code is licensed under the GPL. Since &os; 9.0, the code has been rewritten and is now BSD licensed.

The &man.ext2fs.5; driver allows the &os; kernel to both read and write to ext2 file systems. This driver can also be used to access ext3 and ext4 file systems. However, ext3 journaling, extended attributes, and inodes greater than 128 bytes are not supported. Support for ext4 is read-only.

To access an ext file system, first load the kernel loadable module:

&prompt.root; kldload ext2fs

Then, mount the ext volume by specifying its &os; partition name and an existing mount point. This example mounts /dev/ad1s1 on /mnt:

&prompt.root; mount -t ext2fs /dev/ad1s1 /mnt

XFS

A &os; kernel can be configured to provide read-only support for XFS file systems. To compile in XFS support, add the following option to a custom kernel configuration file and recompile the kernel using the instructions in the chapter on configuring the &os; kernel:

options XFS

Then, to mount an XFS volume located on /dev/ad1s1:

&prompt.root; mount -t xfs /dev/ad1s1 /mnt

The sysutils/xfsprogs package or port provides additional utilities, with man pages, for using, analyzing, and repairing XFS file systems.

ReiserFS

&os; provides read-only support for the Reiser file system, ReiserFS. To load the &man.reiserfs.5; driver:

&prompt.root; kldload reiserfs

Then, to mount a ReiserFS volume located on /dev/ad1s1:

&prompt.root; mount -t reiserfs /dev/ad1s1 /mnt
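When access to one of these foreign file systems is no longer required, the volume can be detached again with the standard &man.umount.8; command, using the mount point from the examples above:

&prompt.root; umount /mnt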