Understanding ZFS Pool – Part 1 of 2 – Tutorial

A zpool, or ZFS pool, is the base layer on which we create ZFS file systems. The pool can be constructed in different ways to achieve the desired redundancy.
A ZFS pool can be created in three different RAID configurations:

    • RAID 0: no protection is provided by this configuration; as you write data it gets striped across the disks
    • RAID 1: also known as mirroring, it requires at least two disks. Data written to the first disk is copied to the second one, so we don't lose data if a single disk fails.
    • RAID Z: a RAID 5 variation with three protection schemes (example create commands follow this list)
  1. raidz: a single-parity disk group that can sustain one disk failure
  2. raidz2: a double-parity disk group that can sustain two disk failures
  3. raidz3: a triple-parity disk group that can sustain three disk failures without losing data
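
For illustration, here is how each raidz flavour could be created; the pool name and disk names below are just placeholders, and each level needs at least one more disk than its parity count:

root@sol01:~# zpool create tank raidz  c1t2d0 c1t3d0 c1t4d0
root@sol01:~# zpool create tank raidz2 c1t2d0 c1t3d0 c1t4d0 c1t5d0
root@sol01:~# zpool create tank raidz3 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0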

After we create a ZFS pool, space allocation is managed by ZFS: when we create a ZFS file system we don't need to specify its size, blocks are allocated by the kernel as applications need them.
If we need to control this behaviour, we can set a reservation and a quota to prevent some applications from consuming all the space available in the ZFS file system.
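
For example, a reservation and a quota can be set on a file system with zfs set; the dataset name and sizes below are only illustrative:

root@sol01:~# zfs set reservation=100m mypool/app1
root@sol01:~# zfs set quota=200m mypool/app1
root@sol01:~# zfs get reservation,quota mypool/app1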

Creating a ZFS Pool

Here we have 4 disks that we can use to create our ZFS pool. The first one, c1t0d0, is used in the rpool, which contains the OS.

root@sol01:~# echo | format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c1t2d0 
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c1t3d0 
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c1t4d0 
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c1t5d0 
          /pci@0,0/pci8086,2829@d/disk@5,0

Let's create our first ZFS pool with a RAID 0 configuration:

root@sol01:~# zpool create mypool  c1t2d0 c1t3d0
'mypool' successfully created, but with no redundancy; failure of one
device will cause loss of the pool
root@sol01:~# zpool list
NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
mypool   374M   172K   374M   0%  1.00x  ONLINE  -
rpool   15.6G  6.62G  9.00G  42%  1.00x  ONLINE  -

As stated above, this creates a RAID 0 configuration, which provides no redundancy.

Disk Partition Structure in a ZFS Pool

When we create a pool, ZFS uses the EFI partition scheme to label the disks. The output from prtvtoc shows the number of partitions and the first starting sector, which is usually 34 for an EFI label.

Here is the output from a disk with an EFI label:

root@sol01:~# prtvtoc /dev/dsk/c1t4d0
* /dev/dsk/c1t4d0 partition map
*
* Dimensions:
*     512 bytes/sector
* 409600 sectors
* 409533 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00         34    393149    393182
       8     11    00     393183     16384    409566

And the same disk with an SMI label:

root@sol01:~# prtvtoc /dev/dsk/c1t4d0s0
* /dev/dsk/c1t4d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*      32 sectors/track
*      64 tracks/cylinder
*    2048 sectors/cylinder
*     199 cylinders
*     197 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*        2048    401408    403455
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       2      5    01          0    403456    403455
       8      1    01          0      2048      2047

Listing ZFS Pool Health Status

Using zpool list we can view the pools currently configured on our host. If you need more details about the health status, use zpool status to check the current RAID configuration and look for any errors.

root@sol01:~# zpool list
NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
mypool   374M  95.5K   374M   0%  1.00x  ONLINE  -
rpool   15.6G  6.62G  9.01G  42%  1.00x  ONLINE  -
root@sol01:~# zpool status mypool
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        mypool    ONLINE       0     0     0
          c1t2d0  ONLINE       0     0     0
          c1t3d0  ONLINE       0     0     0

errors: No known data errors

Another useful command, which allows you to quickly check the health status of all pools at once, is zpool status -xv:

root@sol01:~# zpool status -xv
all pools are healthy

Adding Storage Space to a ZFS Pool

Let's increase the pool's storage space using one of our free disks:

root@sol01:~# zpool add mypool c1t4d0
root@sol01:~# zpool status mypool
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        mypool    ONLINE       0     0     0
          c1t2d0  ONLINE       0     0     0
          c1t3d0  ONLINE       0     0     0
          c1t4d0  ONLINE       0     0     0

errors: No known data errors

root@sol01:~# zpool list
NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
mypool   561M   142K   561M   0%  1.00x  ONLINE  -
rpool   15.6G  6.62G  9.01G  42%  1.00x  ONLINE  -

As you can see, the pool size increased from 374M to 561M.
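
Note that zpool add accepts any vdev layout: on a mirrored pool you would normally grow the pool by adding a whole mirrored vdev to keep the same protection level. A sketch, with illustrative disk names:

root@sol01:~# zpool add mypool mirror c1t4d0 c1t5d0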

Creating a Protected ZFS Pool

First, I will destroy the pool and re-create it using a single disk:

root@sol01:~# zpool destroy mypool
root@sol01:~# zpool create mypool c1t2d0
root@sol01:~# zpool status mypool
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        mypool    ONLINE       0     0     0
          c1t2d0  ONLINE       0     0     0

This provides a RAID 0 protection level; to convert it to a mirrored configuration, we can use the zpool attach command:

root@sol01:~# zpool attach mypool c1t2d0 c1t3d0
root@sol01:~# zpool status mypool
  pool: mypool
 state: ONLINE
  scan: resilvered 65.5K in 0h0m with 0 errors on Tue Feb  9 21:50:58 2016
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

We can enhance this configuration by adding a spare disk that can be used in case of a disk failure:

root@sol01:~# zpool add mypool spare  c1t4d0
root@sol01:~# zpool status mypool
  pool: mypool
 state: ONLINE
  scan: resilvered 68.5K in 0h0m with 0 errors on Tue Feb  9 21:58:49 2016
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
        spares
          c1t4d0    AVAIL
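
If the spare is no longer needed, it can be taken out of the pool with zpool remove, which works on hot spares (as well as cache and log devices):

root@sol01:~# zpool remove mypool c1t4d0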

Replacing a Failed Device in a ZFS Pool

Let's break this mirror and see how ZFS handles it:

root@sol01:~# dd if=/dev/zero of=/dev/rdsk/c1t2d0p0 bs=1024k count=16
16+0 records in
16+0 records out
root@sol01:~# zpool scrub mypool;zpool status mypool
  pool: mypool
 state: ONLINE
  scan: scrub in progress since Tue Feb  9 22:15:13 2016
    8.34M scanned out of 50.5M at 2.93M/s, 0h0m to go
    19.5K repaired, 16.50% done
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0    29  (repairing)
            c1t3d0  ONLINE       0     0     0
        spares
          c1t4d0    AVAIL

root@sol01:~# zpool status mypool
  pool: mypool
 state: DEGRADED
status: One or more devices has been diagnosed as degraded. An attempt
        was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or 'fmadm repaired', or replace the device
        with 'zpool replace'.
        Run 'zpool status -v' to see device specific details.
  scan: resilvered 50.2M in 0h0m with 0 errors on Tue Feb  9 22:15:29 2016
config:

        NAME          STATE     READ WRITE CKSUM
        mypool        DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            spare-0   DEGRADED     0     0     0
              c1t2d0  DEGRADED     0     0    29
              c1t4d0  ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
        spares
          c1t4d0      INUSE

As we can see, ZFS detected the data inconsistency in the mirror and started repairing it.
The failed disk was then replaced by the spare.
To restore the ZFS pool to a healthy state, we need to retire the old failed disk and replace it with a new one:

root@sol01:~# zpool replace mypool c1t2d0 c1t5d0
root@sol01:~# zpool status mypool
  pool: mypool
 state: ONLINE
  scan: resilvered 50.2M in 0h0m with 0 errors on Tue Feb  9 22:20:19 2016
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
        spares
          c1t4d0    AVAIL

The disk was replaced and the spare returned to its initial state.

Setting ZFS Pool Properties

Some properties can be set at the pool level, others at the ZFS dataset level.

The three most commonly changed pool properties are:

  1. autoreplace: allows automatic device replacement
  2. autoexpand: allows automatic expansion of the pool when larger disks are used
  3. listsnaps: as seen in Understanding zfs snapshot, it allows snapshots to be listed by a plain zfs list, without using zfs list -t snapshot.
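
As a quick sketch, pool properties are read and changed with zpool get and zpool set; note that the full property name is listsnapshots, listsnaps being the short form:

root@sol01:~# zpool set autoreplace=on mypool
root@sol01:~# zpool set listsnapshots=on mypool
root@sol01:~# zpool get autoreplace,autoexpand,listsnapshots mypool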

[Screenshot: zfs pool properties]
At the dataset level, the most interesting properties are reservation, quota, deduplication and compression. Use "zfs get all pool_name" to list all the properties.
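
For example, compression and deduplication are enabled per dataset with zfs set; the dataset name below is only an illustration:

root@sol01:~# zfs set compression=on mypool/data
root@sol01:~# zfs set dedup=on mypool/data
root@sol01:~# zfs get compression,dedup,quota,reservation mypool/data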
