What is ZFS?
ZFS is an advanced file system that originated at Sun Microsystems for use with their Solaris operating system. Following Oracle’s acquisition of Sun, announced in 2009 and completed in 2010, ZFS is now owned by Oracle Corporation.
However, in a typical act of altruism, from 2005 onwards, Sun released an open source version of ZFS. Inevitably, this was ported to Linux where it gained wider exposure. The open source version of ZFS, OpenZFS, is managed and maintained by the OpenZFS project.
ZFS is a high-capacity, fault-tolerant file system. ZFS originally stood for Zettabyte File System. The ZFS architecture is based on 128 bits instead of the more common 64 bits of other file systems. Being able to work with larger numeric values is one of the factors that made ZFS capable of handling zettabytes of storage. To give you an idea of what that means, a zettabyte is a billion terabytes.
Nowadays, ZFS supports file storage of up to 256 zebibytes. A zebibyte (2^70 bytes) is larger than a zettabyte (10^21 bytes), but only by about 18 percent, not by an order of magnitude. There’s much more to ZFS than sheer capacity, mind-boggling as that is. ZFS functions as its own volume manager and RAID controller. It has built-in functions such as true copy on write that protect your data from corruption. It natively combines file system pooling, cloning and copying, and RAID-like functionality.
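To put numbers on the zebibyte and zettabyte comparison, here is a quick back-of-the-envelope check. It uses awk because these values overflow the shell’s 64-bit integer arithmetic. The variable names are our own.

```shell
# Ratio of a zebibyte (2^70 bytes) to a zettabyte (10^21 bytes).
RATIO=$(awk 'BEGIN { printf "%.2f", 2^70 / 1e21 }')
echo "1 ZiB is about $RATIO ZB"

# A zettabyte expressed in terabytes (10^12 bytes each).
TB_PER_ZB=$(awk 'BEGIN { printf "%d", 1e21 / 1e12 }')
echo "1 ZB is $TB_PER_ZB TB"
```

The first line of output reads “1 ZiB is about 1.18 ZB”, and the second confirms the billion-terabyte figure.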
Ubuntu has offered ZFS for some years now, but always with warnings and caveats. In Ubuntu 20.10 the warnings were removed. Canonical officially supports ZFS but only in full disk setups. To get that support you’ll need to install ZFS as you install Ubuntu. The ZFS options are still tucked away, but they’re there and no longer just for the intrepid or foolhardy.
With Ubuntu 21.10 due in October 2021, it’s a good time to see how the ZFS offering in Ubuntu is maturing.
Finding the ZFS Options During Installation
During the Ubuntu install the “Installation Type” screen lets you choose to erase the disk you’re installing Ubuntu on or to do something else. Click the “Advanced Features” button.
The “Advanced Features” dialog appears.
Select the “Erase Disk and Use ZFS” radio button, and click the “OK” button.
The “Installation Type” screen will display “ZFS Selected” to show that you’ve chosen to use ZFS.
Click the “Continue” button and complete the installation as usual.
If you have several hard drives installed in your computer you’ll be able to choose how you want them to be used by ZFS. Ubuntu will offer a suggested configuration, but you can adjust things to suit yourself.
But what if you add some hard drives once you’ve installed Ubuntu? How do you configure ZFS to use the new storage? That’s what we’ll look at next.
Adding Extra Hard Drives
We installed Ubuntu with ZFS on the single hard drive of the test machine we used to research this article. We added two more hard drives, giving the computer three hard drives in total. One hard drive had Ubuntu installed on it, and the two new drives were blank, unformatted, and unmounted.
The first thing we need to do is identify how Ubuntu is referring to the new hard drives. The lsblk command lists all block devices installed in your computer. We can be specific about which columns of output we want to see in the results.
The -o (output) option is followed by the columns we want to see. We chose:
name: The name Ubuntu uses to refer to the hard drive.
size: The size of the hard drive. If the hard drive has more than one partition, they are all listed and the size of each partition is shown.
fstype: The file system that is on the hard drive or partition.
type: Whether the line refers to a disk, partition, CD-ROM drive, or loopback pseudo-device.
mountpoint: The mount point of the file system on the hard drive or partition.
There are a bunch of squashfs loopback devices, numbered loop0 through loop6. Each time you install a snap application, one of these pseudo-devices is created. It is part of the encapsulation and sandboxing that snap wraps around each snap application.
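Putting those columns together, the lsblk call would look something like this. The LSBLK_COLS variable is our own naming, and the guard is only there so the snippet doesn’t fail on a system without lsblk.

```shell
# Columns described above, in the same order.
LSBLK_COLS="name,size,fstype,type,mountpoint"

# lsblk only reads device information; it changes nothing.
if command -v lsblk >/dev/null 2>&1; then
    lsblk -o "$LSBLK_COLS" || echo "lsblk could not read the device list" >&2
fi
```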
The first hard drive is listed as /dev/sda. It’s a 32 GB drive with five partitions on it, listed as /dev/sda1 through /dev/sda5. They’re formatted in different ways. This is the drive that was in the computer when we installed Ubuntu.
Our two new hard drives are listed as /dev/sdb and /dev/sdc. They’re 32 GB drives too, but they’re not formatted and they’re not mounted.
Pools, RAID 0, RAID 1
To utilize the new hard drives we add them to a pool. You can add as many drives to a pool as you like. There are two ways to do this. You can configure the pool so that you can use all of the storage space of each hard drive in a RAID 0 configuration, or you can configure them so that the pool only offers the amount of storage space of the smallest hard drive in the pool, in a RAID 1 configuration.
The advantage of RAID 0 is space. But the preferred—and the very highly recommended—configuration is RAID 1. RAID 1 mirrors the data across all the drives in the pool. That means you can have a hard drive failure and the file system and your data are still safe and your computer is still functional. You can replace the stricken drive and add the new drive to your pool.
By contrast, with RAID 0 a single hard drive failure renders your system inoperable until you replace the stricken drive and perform a restore from your backups.
The more drives you have in a RAID 1 pool the more robust it is. The minimum you need for RAID 1 is two drives. A failure in either drive would be an inconvenience, but not a disaster. But a failure of both hard drives at the same time would be a bigger problem, of course. So the answer would appear to be pooling as many hard drives as you can spare.
But of course, in practice, there is a limit to how many drives you’ll want, or can afford, to allocate to a single pool. If you have eight spare hard drives, setting up two four-drive RAID 1 pools is probably a better use of the hardware than a single eight-drive pool. And remember, a RAID 1 pool can only offer the storage of the smallest hard drive in the pool, so always try to use drives of the same size in a single pool.
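The capacity trade-off is easy to sketch with a little shell arithmetic. The two helper functions below are hypothetical, written for illustration only, and they ignore ZFS metadata overhead, so treat the results as upper bounds.

```shell
# RAID 0 (striped): the capacities of all drives add up.
stripe_capacity() {
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

# RAID 1 (mirrored): the pool is limited by its smallest drive.
mirror_capacity() {
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    echo "$smallest"
}

# Drive sizes in GB.
stripe_capacity 32 32     # prints 64
mirror_capacity 32 32     # prints 32
mirror_capacity 32 500    # prints 32
```

The last line shows why mismatched drives are wasteful in a mirror: the larger drive contributes no extra usable space.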
Creating a RAID 1 Pool
We’ve identified our new hard drives as /dev/sdb and /dev/sdc. To create a ZFS RAID 1 pool, we use this command:
The components of the command are:
sudo: We’re changing the system configuration so we need to use sudo to get root privileges.
zpool: This is the ZFS pool management command.
create: This is the action we want zpool to carry out for us.
cloudsavvyit: This is the name of the pool we wish to create.
mirror: We want to have our data mirrored across all drives, giving us a RAID 1 pool. Omitting the “mirror” option creates a RAID 0 pool.
/dev/sdb: The first of our new hard drives.
/dev/sdc: The second of our new hard drives.
Replace “cloudsavvyit” with the name you want to call your pool, and replace /dev/sdb and /dev/sdc with the identifiers of your new hard drives.
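Assembled from the components listed above, the command reads as follows. As a precaution, this sketch prints the command for review rather than running it. The variable names are our own.

```shell
# Names taken from the article; substitute your own pool name and devices.
POOL="cloudsavvyit"
DISK_1="/dev/sdb"
DISK_2="/dev/sdc"

# Build the command and print it instead of running it straight away.
CREATE_CMD="sudo zpool create $POOL mirror $DISK_1 $DISK_2"
echo "$CREATE_CMD"
```

Once the printed line names the right drives, paste it into the terminal to create the pool. Double-check the device names first, because creating a pool takes over the whole of each drive.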
Creating a pool is a little anti-climactic. If all goes well you’re unceremoniously returned to the command prompt. We can use the status action with the zpool command to see the status of our new pool.
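A status check along these lines should do. The guard around it is our own addition so that the snippet degrades gracefully on a machine without the ZFS utilities.

```shell
POOL="cloudsavvyit"

if command -v zpool >/dev/null 2>&1; then
    # Reports the pool's state, its member drives, and any error counts.
    sudo zpool status "$POOL" || echo "could not read the status of $POOL" >&2
else
    echo "zpool not found: the ZFS utilities do not appear to be installed"
fi
```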
Our new pool has been created, it is online, our two new drives are in the pool, and there are no errors. That all looks great. But where is the pool? Let’s see if lsblk will show us where it has been mounted.
We can see that our new hard drives /dev/sdb and /dev/sdc have been partitioned with two partitions each, but no mount point is listed for them. Pools aren’t mounted like regular hard drives. For example, there’s no entry in the /etc/fstab file for ZFS pools. By default, a mount point is created in the root directory. It has the same name as the pool.
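One way to confirm where the pool has been mounted is to ask ZFS directly. This is a sketch under the assumption that the pool kept its default mount point. DEFAULT_MOUNT is a name we’ve made up for the example.

```shell
POOL="cloudsavvyit"
DEFAULT_MOUNT="/$POOL"    # default: a directory in / named after the pool

if command -v zfs >/dev/null 2>&1; then
    # Show the pool's root dataset and where ZFS has mounted it.
    zfs list -o name,mountpoint "$POOL" || echo "no dataset named $POOL" >&2
else
    echo "zfs command not found"
fi

# If the pool is mounted at the default location, df reports its size and usage.
if [ -d "$DEFAULT_MOUNT" ]; then
    df -h "$DEFAULT_MOUNT"
fi
```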
If you want to have the mount point created somewhere else, use the -m (mount point) option when you’re creating the pool, and provide the path to where you’d like the mount point to be created. You can also give the mount point a different name.
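For instance, a pool mounted somewhere other than the root directory might be created like this. The path /mnt/pooldata is purely hypothetical, and as before the sketch prints the command for review instead of executing it.

```shell
POOL="cloudsavvyit"
MOUNT_AT="/mnt/pooldata"    # hypothetical mount point; use any absolute path

# -m sets the mount point for the pool at creation time.
CREATE_AT_CMD="sudo zpool create -m $MOUNT_AT $POOL mirror /dev/sdb /dev/sdc"
echo "$CREATE_AT_CMD"
```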
Giving Users Access to the Pool
The pool exists, but only the root user can store data in it. That’s not what we need, of course. We want other users to be able to access the pool.
To achieve this we will:
Create a directory in the pool.
Create a new group.
Set the new group to be the group owner of the directory.
Add users that need to access the data storage to the new group.
This scheme provides great flexibility. We can create as many data storage directories as we need, with different groups owning them. Giving users access to the different storage areas is as simple as adding them to the appropriate groups.
We’ll use groupadd to create a user group. Our group is called “csavvy1”. We’ll then use the usermod command to add a user called “dave” to the new group. The -a (append) option adds the new group to the list of existing groups that the user is in. Without this option, the user is removed from all existing groups and added to the new one. That’ll cause problems, so make sure you use the -a option.
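The two commands described above would look roughly like this, using the group and user names from the article. The sketch only prints them, so nothing changes on the system until you run the lines yourself.

```shell
GROUP="csavvy1"
USER_NAME="dave"

# Create the group, then append it to the user's existing groups (-a with -G).
ADD_GROUP_CMD="sudo groupadd $GROUP"
ADD_USER_CMD="sudo usermod -a -G $GROUP $USER_NAME"

printf '%s\n' "$ADD_GROUP_CMD" "$ADD_USER_CMD"

# Afterwards, "id -nG dave" lists the groups the user belongs to.
```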
So that their new group membership becomes effective, the user must log out and back in again.
Now we’ll create a directory in the pool, called “data1.”
The chgrp command lets us set the group owner of the directory.
Finally, we’ll set the group permissions using chmod. The “s” is the SGID special bit. It means that files and directories created within the “data1” directory will inherit the group owner of this directory.
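The three steps above (mkdir, chgrp, chmod) are sketched below. The exact chmod mode isn’t shown here, so g+rwxs is our assumption. The runnable part works in a throwaway scratch directory rather than the pool, so you can see the SGID effect without needing root.

```shell
# On the real pool the steps would be along these lines
# (paths and group from the article; the chmod mode is an assumption):
#   sudo mkdir /cloudsavvyit/data1
#   sudo chgrp csavvy1 /cloudsavvyit/data1
#   sudo chmod g+rwxs /cloudsavvyit/data1

# Harmless demonstration of the SGID bit in a scratch directory.
DEMO_DIR="$(mktemp -d)"
mkdir "$DEMO_DIR/data1"
chmod g+rwxs "$DEMO_DIR/data1"
touch "$DEMO_DIR/data1/example.txt"

ls -ld "$DEMO_DIR/data1"               # group execute slot shows "s"
ls -l "$DEMO_DIR/data1/example.txt"    # group matches the directory's group
```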
Our user has logged out and back in. Let’s try to create a file in the new data storage directory in our new RAID 1 ZFS pool.
And let’s check that it was created.
Success. What if we try to create another file outside of our data1 storage area?
This fails as expected. Our permissions are working. Our user is only able to manipulate files in the data storage directory that he has been given permission to access.
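The same check can be wrapped in a small helper. The try_create function and the file name example.txt are our own inventions for illustration. Run it as the user whose access you want to verify.

```shell
# Try to create an empty file and report what happened.
try_create() {
    if touch "$1" 2>/dev/null; then
        echo "created: $1"
    else
        echo "denied: $1"
    fi
}

# Inside data1: expected to succeed for a member of the csavvy1 group.
try_create /cloudsavvyit/data1/example.txt

# Directly in the pool's root: expected to be refused for a regular user.
try_create /cloudsavvyit/example.txt
```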
Destroy a Pool
Be careful with this command. Make sure you have backups before you proceed. If you’re sure you really want to and you’ve verified you have other copies of the data in the pool, you can destroy a pool with this command:
Replace “cloudsavvyit” with the name of the pool you’re going to destroy.
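Given how destructive this is, the sketch below deliberately prints the destroy command instead of running it. The variable names are ours.

```shell
POOL="cloudsavvyit"

# Print the command for review; nothing is destroyed by this snippet.
DESTROY_CMD="sudo zpool destroy $POOL"
echo "$DESTROY_CMD"
```

Only paste the printed line into a terminal once you’ve confirmed the pool name and your backups.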
You Only Have One Hard Drive?
If you only have one hard drive, or if your computer has multiple hard drives but their size varies too much to form a useful pool, you can still use ZFS. You won’t get RAID mirroring, but the built-in anti-corruption and data protection mechanisms are still worthwhile and persuasive features.
But remember, no file system—with or without RAID mirroring—means you can ignore backups.