Replication

ZFS supports replication, allowing datasets to be copied to another pool or to a remote system for backup, disaster recovery, or load balancing. Replication is built on snapshots, so every transfer is a consistent point-in-time copy of the data. A transfer is either full, sending everything the snapshot contains, or incremental, sending only the changes made since an earlier snapshot that the target already holds.

Setting Up and Managing Replication

Replication starts from a snapshot, which captures the state of the dataset at a single point in time. The zfs send command turns the snapshot into a stream, and zfs receive applies that stream on the target. The stream can travel over a network to a remote system or be received into another pool on the same machine.

First, create a snapshot of the dataset:

$ sudo zfs snapshot mypool/mydataset@snapshotname

Once the snapshot is created, it can be replicated to another ZFS system. If replicating over the network, SSH is typically used to securely transfer the data. The following command sends the snapshot to the remote system remotesystem and receives it into the remotepool pool on that system:

$ sudo zfs send mypool/mydataset@snapshotname | ssh remotesystem sudo zfs receive remotepool/mydataset

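This command transfers the entire snapshot of mypool/mydataset to remotepool/mydataset on the remote system. This is a full replication: everything the snapshot contains is copied. For a full receive, remotepool/mydataset must not already exist on the remote system unless zfs receive is run with -F, which overwrites the existing dataset.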

Managing replication involves regularly creating snapshots and sending incremental changes to the remote system. For recurring replication, a cron job or external scheduling system can be used to automate snapshot creation and sending.
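A recurring job of this kind could be sketched as follows. This is a minimal illustration, not a finished tool: the "auto-" snapshot prefix, the timestamp format, the auto-previous snapshot name, and the function names are all assumptions made for the example, and the script only prints the commands for one cycle so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch of one replication cycle. It only PRINTS the commands.
# Dataset, host, target, and the "auto-" prefix are placeholder values.
DATASET="${DATASET:-mypool/mydataset}"
REMOTE_HOST="${REMOTE_HOST:-remotesystem}"
REMOTE_DATASET="${REMOTE_DATASET:-remotepool/mydataset}"

# Build a sortable snapshot name, e.g. mypool/mydataset@auto-20240101T020000Z.
# $1 = dataset, $2 = timestamp
snapshot_name() {
    printf '%s@auto-%s\n' "$1" "$2"
}

# Print the commands for one cycle.
# $1 = previous snapshot (empty string on the first, full run)
# $2 = new snapshot to create and send
replication_commands() {
    prev="$1"
    new="$2"
    printf 'sudo zfs snapshot %s\n' "$new"
    if [ -n "$prev" ]; then
        # Incremental: send only the changes since the previous snapshot.
        printf 'sudo zfs send -i %s %s | ssh %s sudo zfs receive %s\n' \
            "$prev" "$new" "$REMOTE_HOST" "$REMOTE_DATASET"
    else
        # Full: no earlier snapshot to build on.
        printf 'sudo zfs send %s | ssh %s sudo zfs receive %s\n' \
            "$new" "$REMOTE_HOST" "$REMOTE_DATASET"
    fi
}

# Example: print the commands for an incremental cycle.
# "auto-previous" stands in for the snapshot sent by the last run.
stamp=$(date -u +%Y%m%dT%H%M%SZ)
replication_commands "$DATASET@auto-previous" "$(snapshot_name "$DATASET" "$stamp")"
```

A scheduler such as cron could run a script like this at a fixed interval. Printing first keeps the sketch safe to try; once the output looks right, the same two commands can be executed directly.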

To view a list of snapshots for managing replication, use:

$ sudo zfs list -t snapshot

This command lists every snapshot on the system. Any of them can serve as the source of a zfs send.
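For scripting, the same listing can be reduced to bare names sorted by creation time, for example with zfs list -H -t snapshot -o name -s creation, and then filtered. The helper below is a small sketch of that filtering step; the function name latest_snapshot is an assumption, and the sample names stand in for real zfs output.

```shell
# Pick the newest snapshot of one dataset from a list of snapshot names
# sorted oldest first.
# $1 = dataset name; snapshot names are read from standard input.
latest_snapshot() {
    # Keep only lines that start with "<dataset>@" and remember the last one.
    awk -v prefix="$1@" '
        index($0, prefix) == 1 { last = $0 }
        END { if (last != "") print last }
    '
}

# Sample input standing in for real zfs output (hypothetical names).
printf '%s\n' \
    'mypool/mydataset@snapshot1' \
    'mypool/other@snapshot1' \
    'mypool/mydataset@snapshot2' |
    latest_snapshot mypool/mydataset
# prints mypool/mydataset@snapshot2
```

Matching on the literal prefix rather than a regular expression avoids surprises with dataset names that contain characters such as a dot.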

Incremental vs. Full Replication

Full Replication transfers the entire dataset, as captured by one snapshot, to the target system. This is done when there is no prior data on the remote system or when starting a new replication relationship. It is the necessary first step for any dataset, but its cost in time and bandwidth grows with the total amount of data in the snapshot.

$ sudo zfs send mypool/mydataset@snapshotname | ssh remotesystem sudo zfs receive remotepool/mydataset

This command transfers the full dataset to the target system.

Incremental Replication sends only the changes (or "diffs") between two snapshots, reducing the amount of data that needs to be transferred. This is more efficient when keeping the remote system synchronized with the local dataset after the initial full replication has been completed. Incremental replication requires two snapshots: one to mark the starting point (the previous replication) and one for the current state. The starting snapshot must still exist on both the source and the target, because the target applies the changes on top of it.

For example, if snapshot1 is the previous snapshot and snapshot2 is the most recent snapshot, the following command sends only the changes between these two snapshots:

$ sudo zfs send -i mypool/mydataset@snapshot1 mypool/mydataset@snapshot2 | ssh remotesystem sudo zfs receive remotepool/mydataset

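This transfers only the differences between snapshot1 and snapshot2, so the cost depends on how much changed rather than on the size of the dataset. The receive succeeds only if the target dataset has not been modified since snapshot1; if it has, zfs receive -F can roll the target back to its most recent snapshot before applying the stream.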
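The choice between the two modes can be made by a script: look for the newest snapshot that exists on both sides and send incrementally from it, or fall back to a full send if there is none. The sketch below illustrates that decision under simplifying assumptions. The function names common_base and plan_send are hypothetical, snapshots are compared by the name after the @ (which assumes names are never reused), and the result is printed rather than executed.

```shell
# Find the newest local snapshot that is also present on the target.
# $1 = newline-separated snapshot names on the target (part after the @)
# stdin = local snapshot names (part after the @), oldest first
common_base() {
    remote_list="$1"
    base=""
    while IFS= read -r snap; do
        # -F: literal match, -x: whole line, -q: no output
        if printf '%s\n' "$remote_list" | grep -Fqx -- "$snap"; then
            base="$snap"
        fi
    done
    printf '%s\n' "$base"
}

# Print the zfs send command that fits the situation.
# $1 = dataset, $2 = common base (may be empty), $3 = newest local snapshot
plan_send() {
    if [ -z "$2" ]; then
        # Nothing in common: a full send is required.
        printf 'zfs send %s@%s\n' "$1" "$3"
    elif [ "$2" = "$3" ]; then
        printf '# target already has %s@%s, nothing to send\n' "$1" "$3"
    else
        # Send only the changes made after the common base.
        printf 'zfs send -i %s@%s %s@%s\n' "$1" "$2" "$1" "$3"
    fi
}

# Example: the target holds snapshot1, the source holds snapshot1 and snapshot2.
base=$(printf '%s\n' snapshot1 snapshot2 | common_base 'snapshot1')
plan_send mypool/mydataset "$base" snapshot2
# prints zfs send -i mypool/mydataset@snapshot1 mypool/mydataset@snapshot2
```

In practice the two name lists would come from zfs list on each system, with everything up to the @ removed.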

Key Differences:

  • Full Replication: Transfers the entire dataset. Useful for initial replication or when a complete copy is needed.

  • Incremental Replication: Transfers only the changes between two snapshots. More efficient for regular backups or ongoing synchronization.

Both full and incremental replication can be automated using scripts or scheduling tools, allowing ZFS replication to serve as an efficient backup solution or disaster recovery strategy.