When K&C’s DevOps engineers build a Docker cluster on physical (bare-metal) servers, the CephFS vs NFS question often arises: which of the two distributed file storage systems should we use to store persistent data that must be available to all of the cluster’s servers? Without such storage, much of the point of Docker containers is lost, because only with shared persistent storage can the cluster run in high-availability mode. Moreover, an application running on one of the cluster’s worker nodes should keep access to the data storage and continue working if one of the servers in the storage cluster drops out, is lost, or becomes unavailable.
It is clear that a persistent storage solution is necessary. The question then becomes whether CephFS or NFS is the better fit. NFS (Network File System) is one of the most commonly used data storage systems that meets our minimum requirements. It provides transparent access to the files and file systems of a server, and it lets any client application that can work with a local file work with an NFS file as well, without any program modification.
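To illustrate what "without any program modification" means in practice, here is a minimal sketch. The function names, the `app.log` file name, and the `STORAGE_DIR` variable are hypothetical and chosen only for this example. The code uses nothing but ordinary file calls, so it behaves the same whether the directory is on a local disk or on an NFS mount point.

```python
import os
import tempfile


def append_log_line(storage_dir, message):
    """Append one line to app.log inside storage_dir and return the file path."""
    path = os.path.join(storage_dir, "app.log")
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(message + "\n")
    return path


def read_log_lines(storage_dir):
    """Return all lines of app.log inside storage_dir, without line endings."""
    path = os.path.join(storage_dir, "app.log")
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


if __name__ == "__main__":
    # Assumption: in a real cluster STORAGE_DIR would point at the NFS mount
    # on the worker node. Here we fall back to a local temporary directory.
    storage_dir = os.environ.get("STORAGE_DIR") or tempfile.mkdtemp()
    append_log_line(storage_dir, "worker started")
    print(read_log_lines(storage_dir))
```

Only the path changes between the local and the NFS case; the application code itself does not.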
From the scheme above, you can see that the NFS server contains data which is available to every server in the cluster. The given scheme works well for projects involving modest data volumes and without the requirement for high-speed input/output.
This scheme has two drawbacks. First, the whole load falls on the hard drive of the NFS server, which every other server in the cluster accesses to perform read/write operations.
Second, there is a single endpoint to the server holding the data. If that data server drops out, the likelihood that our application also fails increases accordingly.
CEPH is one of the most advanced and popular distributed file and object storage systems. It is an open-source, software-defined storage system backed by Red Hat. CEPH provides access to data in three ways:
1) RADOS – as an object.
2) RBD – as a block device.
3) CephFS – as a file, through a POSIX-compliant filesystem.
Access to the distributed storage of RADOS objects is provided through the following interfaces:
1) RADOS Gateway – a RESTful interface compatible with Swift and Amazon S3.
2) librados and the related C/C++ bindings.
3) rbd and QEMU-RBD – block devices for the Linux kernel and QEMU.
Here you can see how data placement is implemented in a CEPH cluster with x2 replication:
And here you can see how data is restored inside the cluster in case of the loss of a CEPH cluster node:
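The placement and recovery behaviour described above can be imitated with a toy model. The sketch below is not CEPH's actual placement algorithm (CEPH uses CRUSH). It substitutes plain rendezvous hashing, and all node and object names are hypothetical. It only illustrates the idea: every object is kept on two different nodes, and when a node is lost, only the objects that had a copy on that node need a new replica.

```python
import hashlib


def place_replicas(object_name, nodes, replicas=2):
    """Choose `replicas` distinct nodes for an object using rendezvous hashing."""
    def score(node):
        # Each (node, object) pair gets a stable pseudo-random score.
        return hashlib.sha256(f"{node}:{object_name}".encode()).hexdigest()

    ranked = sorted(nodes, key=score, reverse=True)
    return ranked[:replicas]


def build_placement(objects, nodes, replicas=2):
    """Map every object name to the list of nodes holding its copies."""
    return {name: place_replicas(name, nodes, replicas) for name in objects}


def recover_after_loss(placement, failed_node, surviving_nodes, replicas=2):
    """Recompute placement without the failed node.

    Returns the new placement and the names of objects that lost a copy
    and therefore had to be re-replicated.
    """
    new_placement = {}
    re_replicated = []
    for name, holders in placement.items():
        new_placement[name] = place_replicas(name, surviving_nodes, replicas)
        if failed_node in holders:
            re_replicated.append(name)
    return new_placement, re_replicated


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    objects = [f"obj-{i}" for i in range(6)]
    placement = build_placement(objects, nodes)
    survivors = [n for n in nodes if n != "node-b"]
    new_placement, moved = recover_after_loss(placement, "node-b", survivors)
    for name in objects:
        print(name, placement[name], "->", new_placement[name])
    print("re-replicated:", moved)
```

Objects that never had a copy on the failed node keep their placement unchanged, which is the property that keeps rebalancing traffic limited to the data that was actually affected.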
CEPH has gained a wide audience, and some well-known companies make use of CephFS.
CEPH’s primary infrastructure requirement is a stable network connection between the cluster’s servers. The minimum network requirement is a 1 Gb/s link between servers, although network interfaces with a bandwidth of 10 Gb/s are recommended.
From our experience of building CEPH clusters, it’s worth mentioning that the network infrastructure requirements can cause bottlenecks in Docker clusters. Any problem in the network infrastructure can delay the delivery of data to clients, slow down the cluster, and trigger a rebalancing of data within the cluster. We recommend placing the CEPH cluster servers in one server rack and connecting the servers to each other through additional internal network interfaces.
Our experience at K&C also includes clusters built on 1 Gb/s network channels. Their servers are not connected through internal interfaces and are placed in different server racks, which in turn are situated in distinct data centers. Even in such a scheme, a cluster’s work can be regarded as satisfactory, as it meets a 99.9% SLA for data availability.
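To put the 99.9% figure into perspective, the short calculation below converts an availability percentage into the downtime it permits. The function name is hypothetical, and the period lengths (a 365-day year, a 30-day month) are assumptions made for the example.

```python
def allowed_downtime_hours(availability_percent, period_hours):
    """Return the hours of downtime an availability target permits in a period."""
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    per_year = allowed_downtime_hours(99.9, 24 * 365)
    per_month = allowed_downtime_hours(99.9, 24 * 30)
    print(f"per year:  {per_year:.2f} hours")        # 8.76 hours
    print(f"per month: {per_month * 60:.1f} minutes")  # 43.2 minutes
```

In other words, a 99.9% target leaves under nine hours a year in which the data may be unreachable.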
Let’s consider building a minimal cluster. In this example, we’ll use a 1 Gb/s network interface between the servers of the CEPH cluster. Clients are connected through the same network interface. The primary requirement, in this case, is to resolve the problems mentioned above, which occur when the data storage scheme is implemented with an NFS server. For clients, the data will be provided as a file system.
In a scheme like the one above, we have three physical servers with three hard drives allotted to the CEPH cluster’s data. The drives are HDDs (not SSDs) of 6 TB each, and the replication factor is x3. The total raw volume therefore amounts to 18 TB; because every object is stored three times, roughly 6 TB of that is usable. Each of the CEPH cluster’s servers is, in turn, an entry point to the cluster for end clients. This allows us to “lose” one of the CEPH cluster’s servers at a time (server down, server maintenance, and so on) without harming the clients’ data or its availability.
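The capacity figures can be checked with a few lines of arithmetic. This is a rough sketch that assumes the 18 TB total is raw capacity (three 6 TB drives) and that usable space is simply raw space divided by the replication factor. It ignores filesystem overhead and the free-space headroom a real cluster needs, and the function name is hypothetical.

```python
def cluster_capacity(drive_count, drive_tb, replication_factor):
    """Return (raw_tb, usable_tb) for a replicated pool.

    Simplification: usable capacity is raw capacity divided by the number
    of copies kept of every object.
    """
    raw_tb = drive_count * drive_tb
    usable_tb = raw_tb / replication_factor
    return raw_tb, usable_tb


if __name__ == "__main__":
    raw, usable = cluster_capacity(drive_count=3, drive_tb=6, replication_factor=3)
    print(f"raw: {raw} TB, usable: {usable:.0f} TB")  # raw: 18 TB, usable: 6 TB
```

With three copies spread over three servers, each server ends up holding a full copy of the data, which is why the cluster can keep serving clients while one server is down.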
With this scheme, we solve the problem of NFS being a single entry point to our data storage, and we also speed up data operations.
Let’s test the throughput of our cluster by writing a 500 GB file to the CEPH cluster from a client server.
The graph shows that writing the file to the CEPH cluster takes a little over five hours. Pay attention to the load on the network interface as well: it is loaded at about 30% (300 Mb/s), not 100% as you might have assumed. The reason is the limited write speed of the HDDs. You can achieve higher read/write speeds by building the CEPH cluster on SSD drives, but the total cost of the cluster then increases significantly.
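The relationship between file size, link utilisation, and transfer time can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 8000 megabits) and hypothetical function names. It is an inference from the numbers quoted above, not an additional measurement.

```python
def transfer_hours(size_gb, rate_mbps):
    """Hours needed to move size_gb gigabytes at a sustained rate in Mb/s."""
    size_megabits = size_gb * 1000 * 8
    return size_megabits / rate_mbps / 3600


def average_rate_mbps(size_gb, hours):
    """Average rate in Mb/s implied by moving size_gb gigabytes in `hours`."""
    return size_gb * 1000 * 8 / (hours * 3600)


if __name__ == "__main__":
    print(f"500 GB at 1000 Mb/s: {transfer_hours(500, 1000):.1f} h")  # 1.1 h
    print(f"500 GB at  300 Mb/s: {transfer_hours(500, 300):.1f} h")   # 3.7 h
    print(f"500 GB in 5 h: {average_rate_mbps(500, 5):.0f} Mb/s")     # 222 Mb/s
```

A fully used 1 Gb/s link would finish in a little over an hour, and a sustained 300 Mb/s in under four hours. A run of just over five hours therefore suggests the average write rate sat somewhat below 300 Mb/s, which is consistent with the disks, rather than the network, being the limit.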
The choice between NFS and CEPH depends on a project’s requirements and scale, and should also take into account how the project is expected to evolve, including future scalability requirements. We’ve worked on projects for which CEPH was the optimal choice, and on others where it was NFS. Broadly speaking, for small clusters with modest data loads, NFS can be a cheap, easy, and perfectly suitable choice. For larger projects where heavier data loads will be processed and stored, the more sophisticated CEPH solution will most likely be the one to recommend.