HOWTO: Use GlusterFS for IMAP Spools
GlusterFS is a distributed filesystem with built-in redundancy and self-healing features, which allows individual storage volumes to be aggregated into larger storage volumes.
This HOWTO sets up a single Kolab server using an IMAP spool mounted over GlusterFS, as illustrated in GlusterFS Replicated Volume.
To illustrate how a GlusterFS volume scales, we then expand this original volume in GlusterFS Distributed Replicated Volume.
The initial setup consists of the following systems:

- System gfs1.example.org, with a second disk volume vdb of 10GB and IP address 192.168.122.11.
- System gfs2.example.org, with a second disk volume vdb of 10GB and IP address 192.168.122.12.
- System kolab.example.org.

The IN A record for gfs.example.org is made to resolve to both the .11 and .12 IP addresses.
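For illustration only, such a round-robin record in a BIND-style zone file for example.org could look like the following sketch (the zone layout is an assumption for this example, not part of the HOWTO):

gfs     IN  A   192.168.122.11
gfs     IN  A   192.168.122.12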
GlusterFS Replicated Volume
The initial setup looks as follows: the Kolab server uses a GlusterFS volume mount for its IMAP spool, which is redundant because both bricks contain the same data.
Partition /dev/vdb on gfs1 and gfs2 as follows:

# parted /dev/vdb
GNU Parted 3.1
Using /dev/vdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
Warning: The existing disk label on /dev/vdb will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
(parted) unit GB
(parted) mkpart primary 0GB 10GB
(parted) set 1 lvm on
(parted) quit
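If you want to double-check the result, the new partition table can also be printed non-interactively, for example:

# parted /dev/vdb unit GB print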
Create a physical volume, then a volume group, then a logical volume on both gfs1 and gfs2:

# pvcreate /dev/vdb1
# vgcreate vg_gfs /dev/vdb1
# lvcreate -L 9GB -n lv_brick vg_gfs
Note: The logical volume lv_brick leaves 10% of the volume group unused, for two purposes:

- Filesystem checks can be performed on a logical volume snapshot, without interrupting storage availability, and
- Backups can be made using logical volume snapshots, without interrupting storage availability.
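As an illustration of why that headroom is useful, a snapshot-based filesystem check could look like the following sketch (the snapshot name snap_brick and its 1GB size are assumptions for this example):

# lvcreate -s -L 1GB -n snap_brick /dev/vg_gfs/lv_brick
# e2fsck -f /dev/vg_gfs/snap_brick
# lvremove /dev/vg_gfs/snap_brick

The check runs against the snapshot, so the brick filesystem itself stays mounted and available; the same approach applies to taking backups from a snapshot.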
On both gfs1 and gfs2, create a filesystem on the new logical volume:

# mkfs.ext4 /dev/vg_gfs/lv_brick
Create a mount point for the filesystem:
# mkdir -p /srv/gfs
Configure the mount to be made on system startup and mount:
# echo "/dev/vg_gfs/lv_brick /srv/gfs ext4 defaults 1 2" >> /etc/fstab
# mount -a
Create the directory to be exported as a brick:
# mkdir -p /srv/gfs/brick
Warning: Do not use the filesystem root directory /srv/gfs/ itself as the brick to export, as its lost+found/ directory would be rendered corrupt and useless.

Install the glusterfs, glusterfs-fuse and glusterfs-server packages on gfs1 and gfs2:

# yum -y install glusterfs{,-fuse,-server}
Start the glusterd service and configure it to start when the system boots:
# service glusterd start
# chkconfig glusterd on
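The commands above target SysV-style init scripts; on a systemd-based distribution (an assumption about your platform, where the service unit is typically named glusterd), the equivalent would be:

# systemctl start glusterd
# systemctl enable glusterd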
Use gfs1 and probe the other GlusterFS node:

# gluster peer probe gfs2.example.org
Create the GlusterFS volume to provide to kolab.example.org. The replica 2 option ensures that both bricks hold identical copies of the data:

# gluster volume create imap0 replica 2 gfs1.example.org:/srv/gfs/brick/ gfs2.example.org:/srv/gfs/brick/
Start the new volume:
# gluster volume start imap0
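To confirm the volume came up as intended, you can inspect it; for a replica 2 volume with these two bricks, the type should be reported as Replicate:

# gluster volume info imap0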
Continue with Configuring the GlusterFS Client.
GlusterFS Distributed Replicated Volume
This part of the HOWTO assumes you are expanding a GlusterFS Replicated Volume and have already followed Configuring the GlusterFS Client.

We will expand the GlusterFS storage volume from 10GB to 20GB by configuring the GlusterFS volume to become a distributed volume (on top of being replicated).

Four nodes are required for this: files are distributed over two bricks, each of which is replicated to a second brick. We therefore add the following nodes:
- System gfs3.example.org, with a second disk volume vdb of 10GB and IP address 192.168.122.13.
- System gfs4.example.org, with a second disk volume vdb of 10GB and IP address 192.168.122.14.
Partition /dev/vdb on gfs3 and gfs4 as follows:

# parted /dev/vdb
GNU Parted 3.1
Using /dev/vdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
Warning: The existing disk label on /dev/vdb will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
(parted) unit GB
(parted) mkpart primary 0GB 10GB
(parted) set 1 lvm on
(parted) quit
Create a physical volume, then a volume group, then a logical volume on both gfs3 and gfs4:

# pvcreate /dev/vdb1
# vgcreate vg_gfs /dev/vdb1
# lvcreate -L 9GB -n lv_brick vg_gfs
Note: The logical volume lv_brick leaves 10% of the volume group unused, for two purposes:

- Filesystem checks can be performed on a logical volume snapshot, without interrupting storage availability, and
- Backups can be made using logical volume snapshots, without interrupting storage availability.
On both gfs3 and gfs4, create a filesystem on the new logical volume:

# mkfs.ext4 /dev/vg_gfs/lv_brick
Create a mount point for the filesystem:
# mkdir -p /srv/gfs
Configure the mount to be made on system startup and mount:
# echo "/dev/vg_gfs/lv_brick /srv/gfs ext4 defaults 1 2" >> /etc/fstab
# mount -a
Create the directory to be exported as a brick:
# mkdir -p /srv/gfs/brick
Warning: Do not use the filesystem root directory /srv/gfs/ itself as the brick to export, as its lost+found/ directory would be rendered corrupt and useless.

Install the glusterfs, glusterfs-fuse and glusterfs-server packages on gfs3 and gfs4:

# yum -y install glusterfs{,-fuse,-server}
Start the glusterd service and configure it to start when the system boots:
# service glusterd start
# chkconfig glusterd on
Use gfs1 and probe the new GlusterFS nodes:

# gluster peer probe gfs3.example.org
# gluster peer probe gfs4.example.org
Add the new bricks to the existing volume:
# gluster volume add-brick imap0 gfs3.example.org:/srv/gfs/brick gfs4.example.org:/srv/gfs/brick
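Note that on a replica 2 volume, bricks have to be added in multiples of the replica count; gfs3 and gfs4 therefore go in together as a new replica pair.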
Rebalance the bricks (use gfs1 or gfs2):

# gluster volume rebalance imap0 start
# watch -n 1 gluster volume rebalance imap0 status
When the rebalancing of the volume has completed, remount the volume on the GlusterFS client(s) so that they pick up the increased storage capacity:
# mount -o remount /var/spool/imap/
Configuring the GlusterFS Client
Using kolab.example.org, this procedure configures the GlusterFS client to mount the imap0 volume.
Install the glusterfs and glusterfs-fuse packages:

# yum -y install glusterfs{,-fuse}
Configure the mount to be made on system startup and mount:
# echo "gfs.example.org:/imap0 /var/spool/imap/ glusterfs defaults,_netdev 0 0" >> /etc/fstab
# mount -a -t glusterfs
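Because gfs.example.org resolves round-robin to both storage nodes, the client may contact a node that happens to be down at mount time. As a sketch of one way to harden this, the native client supports a backup volfile server mount option; the exact option name varies between GlusterFS releases (backupvolfile-server in older versions, backup-volfile-servers in newer ones), so verify it for your version before using an fstab entry along these lines instead of the one above:

gfs1.example.org:/imap0 /var/spool/imap/ glusterfs defaults,_netdev,backupvolfile-server=gfs2.example.org 0 0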
Change the directory ownership and permissions back to their original values:

# chown cyrus:mail /var/spool/imap/
# chmod 750 /var/spool/imap/
FAQ

What happens when a GlusterFS node fails?
In a replica n volume, up to n-1 nodes can fail; for each individual brick, at least one replica must remain available.

In situations where you expect, or are required to take into account, the simultaneous failure of multiple nodes that are replicas of one another, such as when using old desktop PCs for your storage, you should increase the number of replicas.

There is a significant initial performance hit for the GlusterFS client while it realizes that one of the volume's bricks is no longer available.

Write performance should not be impacted significantly, but read performance is, not unlike with a RAID 1 replicated disk volume.
You can find peers that are unavailable as being disconnected:
# gluster peer status
Number of Peers: 3

Hostname: gfs2.example.org
Uuid: 5e68482a-4164-4cfb-af2c-61a64cf894a7
State: Peer in Cluster (Connected)

Hostname: gfs3.example.org
Uuid: 89073c71-1cf7-4d6e-af93-dab8f13cee14
State: Peer in Cluster (Disconnected)

Hostname: gfs4.example.org
Uuid: fb7db59d-aaee-4dcc-98e3-c852243c8024
State: Peer in Cluster (Connected)
When the node comes back online, it will automatically repair itself before it is deemed connected. During the downtime, and during the repair, it is crucially important that the other replica(s) do not fail as well.
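If you want to follow the self-heal progress while the node catches up, one way is the heal status command, for example:

# gluster volume heal imap0 info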
Replica x, Distribute y - how much storage, how many nodes?

The total storage volume available is impacted most significantly by the number of replicas; the distribution itself is a JBOD aggregation of volumes.
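As a worked example using the numbers from this HOWTO: with b bricks of size s in a volume with replica count n, the usable capacity is roughly (b / n) x s. Four 10GB bricks with replica 2 therefore yield (4 / 2) x 10GB = 20GB of usable storage, which is why adding gfs3 and gfs4 grows the volume from 10GB to 20GB.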