First Ceph Cluster

May 29, 2014

ceph

Storage clustering systems have been an interest of mine for many years. The idea that you can put a bunch of similarly shaped systems together and make them behave as a unit is fascinating. One could think of these systems in biological terms: the kernel is analogous to the nucleus of a cell, and the individual cells come together to serve the higher purpose of an organ or nerve ending.

I’ve never really needed a high-performance cluster, and when I studied the technologies in the area of HPC, I didn’t have data that needed processing. I did thirst for storage, though, and with it, data redundancy.

An affinity for Kerberos drove some of the direction of the technologies I could choose to invest time in. There were others, but OpenAFS won out for its Kerberos reliance, its strong ACL support and the fact that a few universities use the project. It always seems like a good sign if a project is used by a few major universities. I don’t know why. The project still looks to be alive and kicking, but this post is about Ceph, which I’ve been keeping my eye on for many years.

Now here we are in the modern age, and the Ceph project looks to be getting the attention it deserves. It’s been a long time coming. Inktank, the company behind Ceph, was recently purchased by Red Hat. Ceph is a very approachable filesystem: the moving parts are relatively few and simple enough to understand in terms of the daemons that need running, the documentation is up to date, and there are packages for my distro of choice, which is always a bonus.

Proprietorship aside, I still wanted to pull some levers and push some buttons.

One of the cool things they did is build a quick-start tool called ceph-deploy that helps bring the cluster online using only SSH. I’ve not seen this kind of attention to deploying a new software project in a while. It represents a level of sophistication that I don’t find in lots of other Open Source projects. I’ve found projects to be almost too much about the software itself, and not enough about what it means to deploy and maintain this stuff. It’s not easy.

So I wrote some Fabric to deploy a small cluster for testing. I just converted the documentation into code, and using Fabric is cool.
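To give a rough idea of what turning the documentation into code can look like, here is a minimal sketch in plain Python. It is not the actual Fabric code; it only builds the ceph-deploy command lines in the order the quick start of that era walks through them. The function name, the host names (node1, node2, node3) and the OSD directory are made up for illustration, and the exact subcommands may differ between ceph-deploy versions.

```python
import shlex


def ceph_deploy_commands(monitor, osd_hosts, osd_path="/var/local/osd"):
    """Return ceph-deploy command lines for a small test cluster, in order.

    Hypothetical helper: the host names and osd_path are placeholders,
    and the step order is an assumption based on the quick-start docs.
    """
    # Every node gets the packages and the admin keyring.
    all_hosts = [monitor] + [h for h in osd_hosts if h != monitor]
    # ceph-deploy addresses an OSD as host:path.
    osd_targets = ["{0}:{1}".format(host, osd_path) for host in osd_hosts]

    steps = [
        ["ceph-deploy", "new", monitor],                   # write initial cluster config
        ["ceph-deploy", "install"] + all_hosts,            # install Ceph packages over SSH
        ["ceph-deploy", "mon", "create-initial"],          # bring up the monitor, gather keys
        ["ceph-deploy", "osd", "prepare"] + osd_targets,   # prepare the OSD directories
        ["ceph-deploy", "osd", "activate"] + osd_targets,  # start the OSD daemons
        ["ceph-deploy", "admin"] + all_hosts,              # push config and admin keyring
    ]
    # Quote each argument so the result is safe to hand to a shell.
    return [" ".join(shlex.quote(arg) for arg in step) for step in steps]


if __name__ == "__main__":
    for command in ceph_deploy_commands("node1", ["node2", "node3"]):
        print(command)
```

With Fabric, each of these strings could be handed to local() from the admin machine, which is about all a deployment task needs to do, since ceph-deploy handles the SSH side itself.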

I figure if I do a thing I should try to make it reproducible. So while this was an effort to satisfy my own curiosity, perhaps someone else will find it useful as well.