May 11

Linux: Gluster storage and replication options explained

One of the key issues I see with any storage clustering is the loss of available storage. If I have four servers and each server has a 1 TB drive, how I configure the cluster determines how much of that space is actually usable. It comes down to how much redundancy you want versus how much storage space.

Replication Examples:
4 servers
1 TB drive each server

Distributed (default) glusterfs volume:
gluster volume create test-volume transport tcp fscluster1:/exp1 fscluster2:/exp2 fscluster3:/exp1 fscluster4:/exp2

Total storage available = 4TB
Complete files are stored on one of the four servers. If you lose a server, the files that were stored on that server are gone. All other files remain available on the servers that are still active.
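
To make the "whole file on one server" idea concrete, here is a minimal sketch of hash-based placement. Gluster's distribute layer chooses a brick by hashing the file name; the pick_brick helper below is hypothetical and uses cksum purely for illustration, so it will not reproduce the placement Gluster itself would choose.

```shell
# pick_brick FILENAME BRICK...
# Hypothetical helper: hash the file name and use the result to choose
# exactly one brick. Illustration only, not Gluster's real hash function.
pick_brick() {
    name=$1; shift
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    # Skip (sum mod number-of-bricks) bricks; the next one is the target.
    shift $(( sum % $# ))
    echo "$1"
}

for f in report.txt photo.jpg notes.md; do
    echo "$f -> $(pick_brick "$f" fscluster1:/exp1 fscluster2:/exp2 fscluster3:/exp1 fscluster4:/exp2)"
done
```

The same file name always maps to the same brick, which is why losing that one server loses exactly the files that hashed to it and nothing else.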

Replicated gluster volume:
gluster volume create test-volume replica 4 transport tcp fscluster1:/exp1 fscluster2:/exp2 fscluster3:/exp1 fscluster4:/exp2

Total storage available = 1TB

When replicating across all of the servers, one loses a lot of available storage. In this case three quarters of my raw space is taken up by copies, but I have incredible redundancy. I can lose up to three servers and still have all of my data.
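
The arithmetic behind these totals can be sketched with two small shell functions. This is a rough sketch that assumes one equally sized brick per server; usable_tb and tolerated_failures are names made up for this example, and a replica count of 1 stands in for a plain distributed volume.

```shell
# usable_tb SERVERS DRIVE_TB REPLICA
# Usable capacity in TB, assuming one equally sized brick per server.
# REPLICA=1 stands for a plain distributed volume (no extra copies).
usable_tb() {
    echo $(( $1 * $2 / $3 ))
}

# tolerated_failures REPLICA
# Servers that can be lost within one replica group without losing data.
tolerated_failures() {
    echo $(( $1 - 1 ))
}

echo "distributed: $(usable_tb 4 1 1) TB usable"
echo "replica 4:   $(usable_tb 4 1 4) TB usable, survives $(tolerated_failures 4) failures"
echo "replica 2:   $(usable_tb 4 1 2) TB usable, survives $(tolerated_failures 2) failure per group"
```

Raising the replica count trades usable space for redundancy in direct proportion: each extra copy divides the usable capacity again.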

Distributed/Replicated gluster volume:
gluster volume create test-volume replica 2 transport tcp fscluster1:/exp1 fscluster2:/exp2 fscluster3:/exp1 fscluster4:/exp2

Total storage available = 2TB

Group1
fscluster1 and fscluster2 = 1TB
Group2
fscluster3 and fscluster4 = 1TB

In this example there are two groups of servers. The servers in a group replicate files with each other, so you can lose either server in a group and your files are still completely available. Files can end up stored in either group: Gluster picks the group for each file by hashing the file name, while round-robin DNS (RRDNS) only spreads out which server a client contacts when it mounts the volume. From a Gluster client's perspective the files appear to be on one hard drive. In reality any single file is on two hard drives, on two servers in the same group. If a server in a group goes down, the remaining server in that group keeps serving and accepting writes for that group's files, and the returning server is brought back in sync by Gluster's self-heal.
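
Which servers end up in which group is decided by the order of the bricks on the command line: with replica 2, each run of two consecutive bricks forms one group. The replica_groups helper below is a hypothetical sketch for previewing that grouping before creating a volume.

```shell
# replica_groups REPLICA BRICK...
# Print one line per group. Consecutive bricks, in the order they are
# given on the command line, form a group of REPLICA members.
replica_groups() {
    replica=$1; shift
    group=""; n=0
    for brick in "$@"; do
        group="${group:+$group }$brick"
        n=$(( n + 1 ))
        if [ "$n" -eq "$replica" ]; then
            echo "$group"
            group=""; n=0
        fi
    done
}

replica_groups 2 fscluster1:/exp1 fscluster2:/exp2 fscluster3:/exp1 fscluster4:/exp2
```

For the command above this prints fscluster1 and fscluster2 on the first line and fscluster3 and fscluster4 on the second. Listing two bricks from the same server next to each other would put both copies on one machine, so the ordering is worth checking.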

Stripes
The next two types are striped volumes. Neither offers any file redundancy, as the files are striped across the servers' drives. Striped volumes would theoretically be faster due to having more “arms” doing the work. I have not tested this yet.

Striped gluster volume:
gluster volume create test-volume stripe 4 transport tcp fscluster1:/exp1 fscluster2:/exp2 fscluster3:/exp1 fscluster4:/exp2
Total storage available = 4TB (striping keeps no extra copies, so no capacity is given up)

A single file is spread across all four servers.  All four servers need to be running to access all files.
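
As a rough picture of how one file is spread out, the sketch below maps a byte offset in a file to the server holding that chunk, handing out chunks round-robin. The stripe_server helper is hypothetical, and the 128 KB chunk size is an assumption chosen for illustration.

```shell
# stripe_server OFFSET_BYTES BLOCK_BYTES SERVER...
# Hypothetical helper: report which server holds the chunk containing the
# given byte offset when chunks are handed out round-robin.
stripe_server() {
    offset=$1; block=$2; shift 2
    # Chunk number is offset / block; wrap it around the server list.
    shift $(( offset / block % $# ))
    echo "$1"
}

# Assumed chunk size of 128 KB (131072 bytes), for illustration only.
for offset in 0 131072 262144 393216 524288; do
    echo "offset $offset -> $(stripe_server "$offset" 131072 fscluster1 fscluster2 fscluster3 fscluster4)"
done
```

Any file larger than one chunk touches more than one server, which is why a single missing server makes striped files unreadable.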

Distributed striped glusterfs volume:
gluster volume create test-volume stripe 2 transport tcp fscluster1:/exp1 fscluster2:/exp2 fscluster3:/exp1 fscluster4:/exp2
Total storage available = 4TB (striping keeps no extra copies, so no capacity is given up)

Group1
fscluster1 and fscluster2 = 2TB
Group2
fscluster3 and fscluster4 = 2TB

There are two groups. A single file is stored in one of the two groups, striped across both of the servers in that group. Both servers in the group must be running to access the files stored on it. Files can end up stored in either group: Gluster chooses the group for each file, while round-robin DNS (RRDNS) only affects which server a client contacts when it mounts the volume. From a Gluster client's perspective the files appear to be on one hard drive. In reality any single file is spread across two hard drives, on two servers in the same group. If a server in a group goes down, the files striped across that group are unavailable until it comes back; files in the other group are unaffected.



Posted May 11, 2016 by Timothy Conrad in category "Linux"
