HOWTO: Use GlusterFS for IMAP Spools

GlusterFS is a distributed filesystem with built-in redundancy and self-healing features. It allows individual storage volumes (bricks) to be aggregated into larger storage volumes.

This HOWTO sets up a single Kolab server using an IMAP spool mounted over GlusterFS, as illustrated in GlusterFS Replicated Volume.

To illustrate the GlusterFS volume scaling, we expand this original GlusterFS volume in GlusterFS Distributed Replicated Volume.

The initial setup consists of the following systems:

  • System gfs1.example.org with a second disk volume vdb of 10GB and IP address 192.168.122.11.

  • System gfs2.example.org with a second disk volume vdb of 10GB and IP address 192.168.122.12.

  • System kolab.example.org.

The DNS A records (IN A) for gfs.example.org are made to resolve, round-robin, to both the 192.168.122.11 and 192.168.122.12 addresses.
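
As a sketch, assuming a BIND-style zone file for example.org (adjust names and addresses to your environment), the matching round-robin entries could look like this:

gfs    IN A    192.168.122.11
gfs    IN A    192.168.122.12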

GlusterFS Replicated Volume

The initial setup looks as follows:

digraph {
        nodesep=1

        "Kolab Server" -> "GlusterFS"

        subgraph cluster_gluster {
                "GlusterFS" -> "Brick #1", "Brick #2";

                subgraph {
                        rank=same;
                        "Brick #1" -> "Brick #2" [dir=both];
                    }
            }
    }

In this scenario, the Kolab server uses a GlusterFS volume mount for its IMAP spool. The volume is redundant, as both bricks contain the same data.

  1. Partition /dev/vdb on gfs1 and gfs2 as follows:

    # parted /dev/vdb
    GNU Parted 3.1
    Using /dev/vdb
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) mklabel gpt
    Warning: The existing disk label on /dev/vdb will be destroyed and all data on this disk will be lost.
    Do you want to continue? Yes/No? yes
    (parted) unit GB
    (parted) mkpart primary 0GB 10GB
    (parted) set 1 lvm on
    (parted) quit
  2. Create a physical volume, then a volume group, then a logical volume on both gfs1 and gfs2:

    # pvcreate /dev/vdb1
    # vgcreate vg_gfs /dev/vdb1
    # lvcreate -L 9G -n lv_brick vg_gfs

    Note

    The logical volume lv_brick leaves 10% of the volume group unused for two purposes:

    1. Filesystem checks can be performed on a logical volume snapshot, without interrupting the storage availability, and

    2. Backups can be made using logical volume snapshots without interrupting storage availability.

  3. On both gfs1 and gfs2, create a filesystem on the new logical volume:

    # mkfs.ext4 /dev/vg_gfs/lv_brick
  4. Create a mount point for the filesystem:

    # mkdir -p /srv/gfs
  5. Configure the mount to be made on system startup and mount:

    # echo "/dev/vg_gfs/lv_brick /srv/gfs ext4 defaults 1 2" >>     /etc/fstab
    # mount -a
  6. Create the directory to be exported as a brick:

    # mkdir -p /srv/gfs/brick

    Warning

    Do not use the root of the filesystem, /srv/gfs/, as the brick to export; doing so would render its lost+found/ directory corrupt and useless.

  7. Install the glusterfs, glusterfs-fuse and glusterfs-server packages on gfs1 and gfs2:

    # yum -y install glusterfs{,-fuse,-server}
  8. Start the glusterd service and configure it to start when the system boots:

    # service glusterd start
    # chkconfig glusterd on
  9. From gfs1, probe the other GlusterFS node:

    # gluster peer probe gfs2.example.org
  10. Create the GlusterFS volume to provide to kolab.example.org:

    # gluster volume create imap0 replica 2 gfs1.example.org:/srv/gfs/brick gfs2.example.org:/srv/gfs/brick
  11. Start the new volume (a verification sketch follows this procedure):

    # gluster volume start imap0
  12. Continue with Configuring the GlusterFS Client.
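
Before moving on to the client configuration, the volume layout can be verified from any of the GlusterFS nodes. As a rough sketch (identifiers and exact fields vary by GlusterFS version), gluster volume info should report a replicated volume across the two bricks:

# gluster volume info imap0

Volume Name: imap0
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs1.example.org:/srv/gfs/brick
Brick2: gfs2.example.org:/srv/gfs/brick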

GlusterFS Distributed Replicated Volume

This part of the HOWTO assumes you are expanding the GlusterFS Replicated Volume set up earlier, and that you have already followed Configuring the GlusterFS Client.

We’ll be expanding the GlusterFS storage volume from 10GB to 20GB by configuring the GlusterFS volume to become a distributed volume (on top of being replicated).

The number of nodes required for this is four: files are distributed over two replica sets, each consisting of two bricks that replicate one another. We will therefore add the following nodes:

  • System gfs3.example.org with a second disk volume vdb of 10GB and IP address 192.168.122.13.

  • System gfs4.example.org with a second disk volume vdb of 10GB and IP address 192.168.122.14.

  1. Partition /dev/vdb on gfs3 and gfs4 as follows:

    # parted /dev/vdb
    GNU Parted 3.1
    Using /dev/vdb
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) mklabel gpt
    Warning: The existing disk label on /dev/vdb will be destroyed and all data on this disk will be lost.
    Do you want to continue? Yes/No? yes
    (parted) unit GB
    (parted) mkpart primary 0GB 10GB
    (parted) set 1 lvm on
    (parted) quit
  2. Create a physical volume, then a volume group, then a logical volume on both gfs3 and gfs4:

    # pvcreate /dev/vdb1
    # vgcreate vg_gfs /dev/vdb1
    # lvcreate -L 9G -n lv_brick vg_gfs

    Note

    The logical volume lv_brick leaves 10% of the volume group unused for two purposes:

    1. Filesystem checks can be performed on a logical volume snapshot, without interrupting the storage availability, and

    2. Backups can be made using logical volume snapshots without interrupting storage availability.

  3. On both gfs3 and gfs4, create a filesystem on the new logical volume:

    # mkfs.ext4 /dev/vg_gfs/lv_brick
  4. Create a mount point for the filesystem:

    # mkdir -p /srv/gfs
  5. Configure the mount to be made on system startup and mount:

    # echo "/dev/vg_gfs/lv_brick /srv/gfs ext4 defaults 1 2" >>     /etc/fstab
    # mount -a
  6. Create the directory to be exported as a brick:

    # mkdir -p /srv/gfs/brick

    Warning

    Do not use the root of the filesystem, /srv/gfs/, as the brick to export; doing so would render its lost+found/ directory corrupt and useless.

  7. Install the glusterfs, glusterfs-fuse and glusterfs-server packages on gfs3 and gfs4:

    # yum -y install glusterfs{,-fuse,-server}
  8. Start the glusterd service and configure it to start when the system boots:

    # service glusterd start
    # chkconfig glusterd on
  9. From gfs1, probe the new GlusterFS nodes:

    # gluster peer probe gfs3.example.org
    # gluster peer probe gfs4.example.org
  10. Add the new bricks to the existing volume:

    # gluster volume add-brick imap0 gfs3.example.org:/srv/gfs/brick gfs4.example.org:/srv/gfs/brick
  11. Rebalance the volume across its bricks (run from gfs1 or gfs2):

    # gluster volume rebalance imap0 start
    # watch -n 1 gluster volume rebalance imap0 status
  12. When the rebalancing of the volume has completed, remount the volume on the GlusterFS client(s) so that the increased storage capacity becomes visible (a verification sketch follows the diagram below):

    # mount -o remount /var/spool/imap/

After the expansion, the setup looks as follows:

digraph {
        nodesep=1

        "Kolab Server" -> "GlusterFS"

        subgraph cluster_gluster {
                "GlusterFS" -> { "Brick #1" "Brick #2" "Brick #3" "Brick #4" };

                subgraph {
                        rank=same;
                        "Brick #1" -> "Brick #2" [dir=both];
                        "Brick #3" -> "Brick #4" [dir=both];
                    }
            }
    }
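
As a rough sketch (details vary by GlusterFS version), gluster volume info should now report the volume as distributed-replicate across all four bricks:

# gluster volume info imap0

Volume Name: imap0
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gfs1.example.org:/srv/gfs/brick
Brick2: gfs2.example.org:/srv/gfs/brick
Brick3: gfs3.example.org:/srv/gfs/brick
Brick4: gfs4.example.org:/srv/gfs/brick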

Configuring the GlusterFS Client

On kolab.example.org, this procedure configures the GlusterFS client to mount the imap0 volume.

  1. Install the glusterfs and glusterfs-fuse packages:

    # yum -y install glusterfs{,-fuse}
  2. Configure the mount to be made on system startup and mount:

    # echo "gfs.example.org:/imap0 /var/spool/imap/ glusterfs defaults,_netdev 0 0" >> /etc/fstab
    # mount -a -t glusterfs
  3. Change the directory ownership back to its original owner and group:

    # chown cyrus:mail /var/spool/imap/
    # chmod 750 /var/spool/imap/
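
To confirm the GlusterFS mount is in place and sized as expected, standard tools can be used; the output below is only illustrative and will differ per environment:

# df -h /var/spool/imap/
Filesystem              Size  Used Avail Use% Mounted on
gfs.example.org:/imap0  9.0G   33M  9.0G   1% /var/spool/imap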

FAQ

What happens when a GlusterFS node fails?

In a replica n volume, up to n-1 nodes can fail. For each replica set, at least one of its bricks must stay alive.

In situations where you expect, or are required to take into account, the simultaneous failure of multiple nodes that are replicas of one another (as might be the case when using old desktop PCs for your storage), you should increase the number of replicas.
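
For example, a volume that keeps three copies of every file, and can therefore survive the loss of two bricks per replica set, could be created as follows (the host names are reused from this HOWTO purely for illustration):

# gluster volume create imap0 replica 3 \
    gfs1.example.org:/srv/gfs/brick \
    gfs2.example.org:/srv/gfs/brick \
    gfs3.example.org:/srv/gfs/brick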

There is a significant initial performance hit on the GlusterFS client while it detects that one of the volume’s bricks is no longer available.

The write performance should not be impacted significantly, but the read performance is, not unlike with a RAID 1 replicated disk volume.

Peers that are unavailable show up as disconnected:

# gluster peer status
Number of Peers: 3

Hostname: gfs2.example.org
Uuid: 5e68482a-4164-4cfb-af2c-61a64cf894a7
State: Peer in Cluster (Connected)

Hostname: gfs3.example.org
Uuid: 89073c71-1cf7-4d6e-af93-dab8f13cee14
State: Peer in Cluster (Disconnected)

Hostname: gfs4.example.org
Uuid: fb7db59d-aaee-4dcc-98e3-c852243c8024
State: Peer in Cluster (Connected)

When the node comes back online, it will automatically repair itself before it is deemed connected again. During the downtime, and during the repair, it is crucially important that the remaining replica(s) do not fail as well.
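
To follow the self-heal progress after a node returns, the heal status can be queried from any connected node (a sketch; the exact output and command availability depend on the installed GlusterFS version):

# gluster volume heal imap0 info
Brick gfs1.example.org:/srv/gfs/brick
Number of entries: 0

Brick gfs2.example.org:/srv/gfs/brick
Number of entries: 0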

Replica x, Distribute y - how much storage, how many nodes?

The total storage volume available is impacted most significantly by the number of replicas; the distribution is a JBOD-style aggregation of bricks. As a rule of thumb, the usable capacity is the combined brick capacity divided by the replica count: for example, four 10GB bricks in a replica 2 volume yield roughly 20GB of usable storage.