In this post in our DevOps consulting series, we follow on from our earlier comparative analysis of Ceph and NFS as alternative Kubernetes data storage solutions with a guide to Ceph-Ansible. We’ll examine how Ceph-Ansible can be used for quick, error-free deployment of a Ceph Storage Cluster within a DevOps architecture. We’ll also look at particular use cases and the strengths and drawbacks of these tools compared to alternatives, and conclude with a detailed step-by-step guide to installing a Ceph Cluster with Ceph-Ansible for a Kubernetes environment.
What Is Ceph Data Storage And How And When Is It Used?
The exponential growth in the data storage needs of modern organisations has created demand for effective Big Data storage solutions. The Ceph storage tool has stepped up to meet that challenge.
Ceph is an open-source software project backed by Red Hat. It is used to provide scalable object, block and file-based storage under a single system. Ceph Storage Clusters run on commodity hardware and are paired with the CRUSH (Controlled Replication Under Scalable Hashing) algorithm, which manages the optimal distribution of data across a Ceph cluster and allows that data to be retrieved efficiently.
Ceph Storage Clusters hold very large amounts of data. Subclusters beneath a cluster break that data up into more manageable and relevant chunks; for the data to be organised along the appropriate lineage, those subclusters must be properly configured as part of a ‘parent’ cluster. CRUSH, as a scalable hashing algorithm, divides a large data set across the appropriate clusters and subclusters, allowing for optimised retrieval. Ceph’s role in big data storage therefore combines storage optimisation with simple data access and retrieval.
A Ceph-Ansible use case – hybrid cloud infrastructure using Kubernetes
Modern hybrid software architectures that combine bare metal with cloud solutions (e.g. AWS, Google Cloud) and use a containerisation tool like Docker together with orchestrators like Docker Swarm, Kubernetes and Rancher often encounter a problem:
How and where should application data be stored so that it is accessible from anywhere in the infrastructure, regardless of the location of the Docker container running the application?
Ceph helps resolve this problem by providing a distributed storage system with high availability and scalability. In Kubernetes-based architectures, for example, Ceph offers provisioners for K8s PersistentVolumes: CephFS for shared file storage and RBD (Ceph Block Device) for block storage consumed through PersistentVolumeClaims. Ceph is also often used in big data processing and storage solutions because of its particularly strong horizontal scalability.
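To make the RBD option concrete, here is a minimal sketch of a Kubernetes StorageClass and PersistentVolumeClaim using the in-tree kubernetes.io/rbd provisioner. The monitor addresses, pool name and secret names below are illustrative assumptions and have to match your own cluster:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: kubernetes.io/rbd
parameters:
  monitors: 192.168.2.1:6789,192.168.2.2:6789,192.168.2.3:6789
  pool: kube                              # assumed pool created for Kubernetes
  adminId: admin
  adminSecretName: ceph-admin-secret      # assumed secret holding the admin key
  adminSecretNamespace: kube-system
  userId: kube
  userSecretName: ceph-user-secret        # assumed secret holding the client key
  fsType: ext4
  imageFormat: "2"
  imageFeatures: layering
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 10Gi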
Ceph-Ansible, a collection of playbooks built on Ansible (Red Hat’s automation engine for provisioning and configuration management), also comes from the Red Hat ecosystem. It is widely considered the most flexible way to install and manage a sizeable Ceph Storage Cluster. Some engineers shy away from Ceph-Ansible because it isn’t the easiest way to install and manage Ceph storage, but it also isn’t overly difficult with the right know-how, and the production-grade clusters that result usually make the extra effort more than worthwhile.
Weaknesses of Ceph Storage
The downside of Ceph-based architecture solutions is the relatively high level of redundancy they require in servers and/or virtual machines. So while Ceph is an effective big data and Kubernetes storage solution, it is not a cheap one.
Ceph Storage Clusters should also not be used for critical data as they do not offer high levels of security.
Deploying Ceph in Kubernetes using Ceph-Ansible
Let’s see how to deploy Ceph using Ceph-Ansible for future use in Kubernetes as block devices (PersistentVolumeClaims – RBD).
For our test bench we will use:
Role | External traffic | Internal traffic
1x virtual server with Ansible | 192.168.1.2 | 10.0.1.4
3x servers for Ceph, each with 3 free HDDs for OSDs | 192.168.2.1, 192.168.2.2, 192.168.2.3 | 10.0.1.1, 10.0.1.2, 10.0.1.3
Grafana and Ceph Dashboard for visualization of the Ceph Storage Cluster will also be installed on one of the servers.
On all 4 servers the internal network 10.0.1.0/24 is configured; we will use it for internal Ceph traffic.
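Before going any further it is worth confirming that the nodes can actually reach each other over that internal network. A quick check from the Ansible server, using the internal addresses listed above:

# run from the Ansible server; adjust the list of internal IPs to your own setup
for ip in 10.0.1.1 10.0.1.2 10.0.1.3; do
  ping -c 1 -W 2 "$ip" > /dev/null && echo "$ip reachable" || echo "$ip NOT reachable"
done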
Step-by-step instructions for preparing the server with Ansible
First, generate SSH keys on the Ansible server and deploy them to all Ceph servers, as sketched below.
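A minimal sketch of this step, assuming a root (or sudo-capable) user on the Ceph nodes and the external addresses from our test bench:

# generate a key pair on the Ansible server (skip if one already exists)
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# copy the public key to every Ceph server
for host in 192.168.2.1 192.168.2.2 192.168.2.3; do
  ssh-copy-id root@"$host"
done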
Download the repository:
git clone https://github.com/ceph/ceph-ansible
Now switch to the branch we need, in accordance with the table below. Note that different branches also require different Ansible versions.
The stable-* branches have been checked by QE and rarely receive fixes during their life cycle:
Branch | Ceph versions supported | Ansible version required
stable-3.0 | jewel, luminous | 2.4
stable-3.1 | luminous, mimic | 2.4
stable-3.2 | luminous, mimic | 2.6
stable-4.0 | nautilus | 2.8
We will use the Nautilus version in this example:
git checkout stable-4.0
Install all the necessary dependencies:
pip install -r requirements.txt
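After installing the dependencies, it is worth confirming that the installed Ansible version matches what stable-4.0 expects (2.8, per the table above). One way to check and, if needed, pin it with pip:

ansible --version               # should report 2.8.x for stable-4.0
pip install "ansible==2.8.*"    # only if the installed version does not match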
Rename example config files:
cp site.yml.sample site.yml
cp group_vars/all.yml.sample group_vars/all.yml
cp group_vars/mons.yml.sample group_vars/mons.yml
cp group_vars/osds.yml.sample group_vars/osds.yml
Create an inventory file with a description of all our servers:
[mons]
192.168.2.1
192.168.2.2
192.168.2.3

[osds]
192.168.2.1
192.168.2.2
192.168.2.3

[mgrs]
192.168.2.1

[grafana-server]
192.168.2.1
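Before touching site.yml, it is useful to verify that Ansible can reach every host in the inventory (the file name inventory_hosts is the same one used when running the playbook later):

ansible -i inventory_hosts all -m ping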
We bring the main site.yml file to the following form:
---
# Defines deployment design and assigns role to server groups
- hosts:
    - mons
    - osds
  gather_facts: false
  any_errors_fatal: true
  become: true
  tags: always
  vars:
    delegate_facts_host: True
  pre_tasks:
    # If we can't get python2 installed before any module is used we will fail
    # so just try what we can to get it installed
    - import_tasks: raw_install_python.yml

    - name: gather facts
      setup:
      when: not delegate_facts_host | bool

    - name: gather and delegate facts
      setup:
      delegate_to: "{{ item }}"
      delegate_facts: True
      with_items: "{{ groups['all'] }}"
      run_once: true
      when: delegate_facts_host | bool

    - name: install required packages for fedora > 23
      raw: sudo dnf -y install python2-dnf libselinux-python ntp
      register: result
      when:
        - ansible_distribution == 'Fedora'
        - ansible_distribution_major_version|int >= 23
      until: result is succeeded

    - name: check if it is atomic host
      stat:
        path: /run/ostree-booted
      register: stat_ostree
      tags: always

    - name: set_fact is_atomic
      set_fact:
        is_atomic: '{{ stat_ostree.stat.exists }}'
      tags: always
  tasks:
    - import_role:
        name: ceph-defaults
    - import_role:
        name: ceph-facts
    - import_role:
        name: ceph-validate
    - import_role:
        name: ceph-infra

- hosts: mons
  gather_facts: false
  become: True
  any_errors_fatal: true
  pre_tasks:
    - name: set ceph monitor install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_mon:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  tasks:
    - import_role:
        name: ceph-defaults
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-facts
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-handler
    - import_role:
        name: ceph-common
    - import_role:
        name: ceph-config
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-mon
    - import_role:
        name: ceph-mgr
      when: groups.get(mgr_group_name, []) | length == 0
  post_tasks:
    - name: set ceph monitor install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_mon:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- hosts: mgrs
  gather_facts: false
  become: True
  any_errors_fatal: true
  pre_tasks:
    - name: set ceph manager install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_mgr:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  tasks:
    - import_role:
        name: ceph-defaults
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-facts
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-handler
    - import_role:
        name: ceph-common
    - import_role:
        name: ceph-config
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-mgr
  post_tasks:
    - name: set ceph manager install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_mgr:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- hosts: osds
  gather_facts: false
  become: True
  any_errors_fatal: true
  pre_tasks:
    - name: set ceph osd install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_osd:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  tasks:
    - import_role:
        name: ceph-defaults
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-facts
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-handler
    - import_role:
        name: ceph-common
    - import_role:
        name: ceph-config
      tags: ['ceph_update_config']
    - import_role:
        name: ceph-osd
  post_tasks:
    - name: set ceph osd install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_osd:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- hosts: mons
  gather_facts: false
  become: True
  any_errors_fatal: true
  tasks:
    - import_role:
        name: ceph-defaults

    - name: get ceph status from the first monitor
      command: ceph --cluster {{ cluster }} -s
      register: ceph_status
      changed_when: false
      delegate_to: "{{ groups[mon_group_name][0] }}"
      run_once: true

    - name: "show ceph status for cluster {{ cluster }}"
      debug:
        msg: "{{ ceph_status.stdout_lines }}"
      delegate_to: "{{ groups[mon_group_name][0] }}"
      run_once: true
      when: not ceph_status.failed

- import_playbook: infrastructure-playbooks/dashboard.yml
  when:
    - dashboard_enabled | bool
    - groups.get(grafana_server_group_name, []) | length > 0
    - ansible_os_family in ['RedHat', 'Suse']
Next we edit the group_vars/all.yml file; in it we set such important parameters as the Ceph release of the future cluster, the internal subnet, interfaces, journal size and much more. For our example, the variables are configured as follows:
ceph_origin: repository
ceph_repository: community
ceph_stable_release: nautilus
monitor_interface: eth0
journal_size: 5120
#public_network: 0.0.0.0/0    # leave commented
cluster_network: 10.0.1.0/24  # specify the network for internal traffic
Next, edit the variables file responsible for configuring the OSDs, group_vars/osds.yml.
You can have the OSDs discovered and installed fully automatically on each server by specifying a single variable:
osd_auto_discovery: true
However, we explicitly specify the drives of our servers:
devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd
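Make sure these really are the spare, unused disks on every Ceph node before running the playbook, since ceph-volume will turn them into OSDs. A quick check on each server:

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sdb /dev/sdc /dev/sdd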
Preparation is complete and you are now ready to run ansible-playbook:
ansible-playbook site.yml -i inventory_hosts
The approximate deployment time is 10 minutes. The result of successful execution in the console will be the following output:
INSTALLER STATUS ****************************************************************
Install Ceph Monitor        : Complete (0:02:48)
Install Ceph Manager        : Complete (0:01:14)
Install Ceph OSD            : Complete (0:01:29)
Install Ceph Dashboard      : Complete (0:00:38)
Install Ceph Grafana        : Complete (0:01:08)
Install Ceph Node Exporter  : Complete (0:01:31)

Thursday 22 August 2019  10:20:20 +0200 (0:00:00.064)       0:09:50.539 *********
=================================================================================
ceph-common : install redhat ceph packages ------------------------------- 79.72s
ceph-container-engine : install container package ------------------------ 37.51s
ceph-mgr : install ceph-mgr packages on RedHat or SUSE ------------------- 35.18s
ceph-osd : use ceph-volume lvm batch to create bluestore osds ------------ 29.40s
ceph-grafana : wait for grafana to start --------------------------------- 19.38s
ceph-config : generate ceph configuration file: ceph.conf ---------------- 12.03s
ceph-grafana : install ceph-grafana-dashboards package on RedHat or SUSE -- 9.11s
ceph-common : install centos dependencies --------------------------------- 8.23s
ceph-validate : validate provided configuration --------------------------- 7.23s
ceph-mgr : wait for all mgr to be up -------------------------------------- 6.72s
ceph-mon : fetch ceph initial keys ---------------------------------------- 5.33s
ceph-dashboard : set or update dashboard admin username and password ------ 5.25s
ceph-facts : set_fact fsid from ceph_current_status ----------------------- 4.30s
check for python ---------------------------------------------------------- 4.02s
ceph-mon : waiting for the monitor(s) to form the quorum... --------------- 3.94s
ceph-facts : create a local fetch directory if it does not exist ---------- 3.75s
ceph-facts : set_fact devices generate device list when osd_auto_discovery- 3.41s
gather and delegate facts ------------------------------------------------- 3.37s
ceph-osd : apply operating system tuning ---------------------------------- 2.91s
ceph-container-engine : start container service --------------------------- 2.89s
We have successfully installed the following components:
Ceph Monitor
Ceph Manager
Ceph OSD
Ceph Dashboard
Ceph Grafana
Ceph Node Exporter
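At this point it is worth checking the health of the cluster directly on one of the monitor nodes (for example 192.168.2.1):

ceph -s          # overall cluster status, should report HEALTH_OK
ceph osd tree    # all 9 OSDs (3 per server) should be up and in
ceph df          # raw and per-pool capacity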
To access the dashboard, use the link:
Dashboard web UI
http://192.168.2.1:8443/
Default login and password – admin / admin
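It is wise to change the default credentials straight away. On Nautilus this can be done from a monitor node roughly as follows (the new password below is a placeholder; depending on the exact point release the command may instead require the password to be read from a file with -i):

ceph dashboard ac-user-set-password admin 'MyNewStr0ngPass'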
We have successfully completed the basic setup of a Ceph distributed storage system. If the Storage Cluster needs to be expanded later, just add a new server to the inventory, re-run the playbook and you’re good to go, as sketched below.
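For example, to add a hypothetical fourth OSD server 192.168.2.4, it should be enough to extend the [osds] group in the inventory and re-run the same playbook; the ceph-ansible playbooks are designed to be idempotent, so the existing nodes are left as they are:

[osds]
192.168.2.1
192.168.2.2
192.168.2.3
# new server (example address); its spare drives are described in group_vars/osds.yml
192.168.2.4

ansible-playbook site.yml -i inventory_hosts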