Kubernetes operators – what are they, why are they important and do you know how to build one?

Why and how to build your own Kubernetes operator

In this post of our Kubernetes consulting series, our focus is Kubernetes operators.

A Kubernetes operator is not an Ops specialist or team, but an automation framework that can be used on Kubernetes (or OpenShift). The operator pattern was introduced in 2016 by CoreOS, now part of Red Hat (itself acquired by IBM in 2019). A great way to manage complex Kubernetes operations, the operators framework is quickly becoming a popular, and core, element of cloud-native DevOps architectures.


In this article, we’ll explain the role operators perform in scalable Kubernetes-based application architecture, and why they have become such a key part of DevOps automation. Much of this section is inspired by IBM’s excellent explainer video on Kubernetes operators, which is worth watching as a general introduction to the concepts.

We’ll then guide you through a step-by-step tutorial on building an operator using Operator SDK and Ansible.

What is the Kubernetes control loop?

Since the control loop is a core part of the operators framework, it makes sense to explain it first. The Kubernetes control loop observes the state of what’s in a cluster. Kubernetes then compares the actual state of the cluster with its desired state. That comparative process is called a diff.

The final phase of the Kubernetes control loop is resolving any diff by acting on it. This phase is, predictably, termed the act phase.

The Kubernetes Control Loop

The control loop is core to how Kubernetes works. And there’s a controller that acts on that loop for every default resource that Kubernetes comes with.

Deploying a Kubernetes application without operators

As an end user, the first step is to write some YAML: the spec for the application. Say we’re creating a deployment, which involves defining some configuration, for example (a minimal sketch follows the list):

  1. The container image
  2. The number of replicas
  3. Assorted other configuration
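
A minimal Deployment spec covering those fields might look like the following sketch, where the names and values are purely illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend                # illustrative application name
spec:
  replicas: 3                   # desired number of pods
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: nginx:1.17.6     # the container image to run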

We now have a Kubernetes resource to be deployed into the cluster. The control loop now kicks in and checks the difference between what is wanted in the cluster and its actual state. Kubernetes will notice, for example, there are no pods. It will then act on that diff and create pods.

A more complex application may have more than one YAML. There could be a second for the backend. That will also be deployed into a cluster, with a pod subsequently deployed using the controllers and control loop.

Now, if we want to scale up the application, make some changes, add secrets and environment variables etc., we’ll have to either set up new Kubernetes resources every time, or go back and edit the existing ones. That can quickly become complex and time-consuming.

Deploying a Kubernetes application with operators

How would we deploy the same application using an operator? The first step is installing the operator itself. That, of course, means you’ll need an operator that will do the job you want it to. You can either have someone custom build a Kubernetes operator for you, do it yourself if you have the relevant technical background, or find a suitable option among the growing library available on OperatorHub.

The first thing now needed in the Kubernetes cluster is the Operator Lifecycle Manager (OLM). The OLM manages the operators that have been installed. Next, the operator is deployed into the cluster.
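
With OLM in place, installing an operator from a catalog is typically done declaratively with a Subscription resource. Here is a minimal sketch, assuming OLM’s standard Subscription API; the package name my-operator is hypothetical, and operatorhubio-catalog is the catalog source commonly used for OperatorHub installs:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator                  # hypothetical operator package
  namespace: operators
spec:
  channel: stable                    # update channel to track
  name: my-operator                  # package name in the catalog
  source: operatorhubio-catalog      # assumed catalog source
  sourceNamespace: olm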

The Operator

A Kubernetes operator is made up of two major components:

  1. CRD
  2. Controller

The CRD (custom resource definition) is, unlike default Kubernetes resources such as deployments and pods, a resource type defined in Kubernetes by either the user or the operator, so that YAML can be written against the custom configuration.

The controller is a custom Kubernetes control loop, which runs as a pod in the cluster and executes the control loop against the CRD.

If the operator has been created for the same custom application deployment as in the first example (deploying a K8s application without an operator), what differs? Instead of having to write multiple deployments, config maps, secrets etc., just one YAML needs to be deployed.

Custom configurations can be set in that YAML, or the operator’s defaults used. The operator is then deployed directly into the cluster, takes over, and is responsible for running the Kubernetes control loop and figuring out exactly what needs to be running.
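
To make that concrete, the single YAML could be a custom resource like the following sketch, where the MyApp kind, the example.com group and the spec fields are all hypothetical, defined by the operator’s CRD:

apiVersion: example.com/v1alpha1
kind: MyApp
metadata:
  name: my-app
spec:
  frontendReplicas: 3       # hypothetical knobs exposed by the operator
  backendReplicas: 2
  enableMetrics: true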

The operator would, for example, realise a couple of deployments and pods are needed if our application is the same as that deployed without Kubernetes operators in the first example.

Why use Kubernetes operators?

Kubernetes operators are an approach to managing complex applications that is inherently more scalable (and simply easier) than deploying Kubernetes applications without an operator. The end user only has to worry about the config that’s been exposed to them. The operator manages the control loop and the state of the application, i.e. how it needs to look.

Operators can be used to automate a large variety of processes. For example, after creating and deploying a config file, within a few seconds we can get a configured environment or service, such as a cache (Redis, Memcached), a proxy (NGINX, HAProxy) or a database (MySQL).


Building your own operators

There are now many fantastic operators out there, like the Prometheus Operator for Kubernetes native deployment and management of Prometheus and related monitoring components. You can check out our blog post dedicated to the step-by-step set-up of the Prometheus Operator for Kubernetes monitoring.

Before considering building a custom operator, it always makes sense to look carefully at what is already out there. As already mentioned, OperatorHub is a great resource where the Kubernetes community shares operators.

But what if we want to develop a custom operator for something native to a specific application architecture? There are a number of ways to do that.

Operator SDK

Operator SDK allows us to start building operators ourselves. The easiest way to get started is with the Helm operator type.

The Helm approach

The Helm approach allows us to take a Helm chart and turn it into an operator. This gets us close to a fairly mature operator for a pre-existing chart. If you aren’t particularly familiar with Helm, you might be interested in our blog post covering the fundamentals of how Helm contributes to a Kubernetes architecture.
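
With the Helm type, the operator’s watches.yaml file (covered in more detail below) maps a custom resource to a chart instead of an Ansible role. A minimal sketch, assuming Operator SDK’s Helm conventions; the example.com group, MyChartApp kind and chart path are hypothetical:

- version: v1alpha1
  group: example.com
  kind: MyChartApp               # hypothetical kind backed by the chart
  chart: helm-charts/mychart     # path to the bundled Helm chart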

Levels of Operator maturity

Operator maturity is broken down into five levels:

  1. Basic install – allows for the provisioning of the resources required.
  2. Upgrades – supports minor and patch upgrades to whatever is defined in the operator.
  3. Full lifecycle support – storage lifecycle, app lifecycle, backup and failure recovery.
  4. Insights – deep metrics, analysis, logging etc.
  5. Autopilot – automated horizontal and vertical scaling, configuration tuning, acting on diffs.

Operator capability levels

Helm itself hits the first two levels of maturity.

For operators that meet maturity levels 3-5, Go and Ansible are the most popular technologies. Operator SDK allows us to build operators using Helm, Go, Ansible and other technologies. You might also be interested in our Ansible tutorial, which will take you through the step-by-step process of installing and setting up this incredibly useful open-source Red Hat tool.

A step-by-step guide to building a Kubernetes operator with Operator SDK and Ansible

The example operator we are going to create here using Operator SDK and Ansible will perform two core functions:

  • Create a namespace and apply a LimitRange and a ResourceQuota to it
  • Create an Nginx deployment in a namespace

We are not going to focus on the installation process here. You can find a step-by-step guide on how to install the Kubernetes Operator SDK on GitHub. Here, we’ll focus on the process of writing an operator.

Create a new project

We first need to create a new project, which we’ll do via the CLI (command-line interface):

operator-sdk new my-first-operator --api-version=krusche.io/v1alpha1 --kind=ResourcesAndLimits --type=ansible
cd my-first-operator

This command will create a project containing an operator that subscribes to the Custom Resource of Kind ResourcesAndLimits with APIVersion krusche.io/v1alpha1.
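
Among other files, the scaffold generates a sample custom resource at deploy/crds/krusche.io_v1alpha1_resourcesandlimits_cr.yaml (the file we edit later in this tutorial), which looks roughly like this:

apiVersion: krusche.io/v1alpha1
kind: ResourcesAndLimits
metadata:
  name: example-resourcesandlimits
spec:
  size: 3     # scaffold placeholder, replaced later in this tutorial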

The directory should be structured in the following way:

Directory/File  –  Goal

build/ – Contains the scripts Operator SDK uses to build and initialize the operator

deploy/ – Contains a set of Kubernetes manifests for deploying the operator into the cluster

roles/ – Contains the Ansible roles

watches.yaml – Maps Group, Version and Kind to the Ansible role or playbook that will be launched

The watches.yaml file contains:

  • group: The group of the Custom Resource our operator subscribes to.
  • version: The version of the Custom Resource our operator subscribes to.
  • kind: The kind of the Custom Resource our operator subscribes to.
  • role (default): The path to the Ansible role the operator runs.
  • playbook: The path to an Ansible playbook, used when a playbook is run instead of a role.
  • vars: Key-value pairs that will be passed to Ansible as extra_vars.
  • reconcilePeriod (optional): The reconciliation interval, which defines how often the role will run for a given CR.
  • manageStatus (optional): If set to true (the default), the operator manages the status of the CR. If set to false, the status of the CR is managed elsewhere, by the specified role/playbook or by a separate controller.

Here is an example watches.yaml file:

---
- version: v1alpha1
  group: foo.example.com
  kind: Foo
  role: /opt/ansible/roles/Foo
 
- version: v1alpha1
  group: bar.example.com
  kind: Bar
  playbook: /opt/ansible/playbook.yml
 
- version: v1alpha1
  group: baz.example.com
  kind: Baz
  playbook: /opt/ansible/baz.yml
  reconcilePeriod: 0
  manageStatus: false
  vars:
    foo: bar

Preparing and Installing an Operator in a Kubernetes Cluster

Since our operator will create namespaces, it needs rights at the cluster level, not just within a single namespace.

In the file deploy/role.yaml:

    • Change kind: Role to kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
#kind: Role
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: my-first-operator

Also, you need to add namespaces, resourcequotas and limitranges to resources under the apiGroups: "" rule:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: my-first-operator
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - services/finalizers
  - endpoints
  - persistentvolumeclaims
  - events
  - configmaps
  - secrets
  - namespaces
  - resourcequotas
  - limitranges
  verbs:        # the scaffolded role already lists verbs; shown here so the rule is complete
  - '*'
In the file deploy/role_binding.yaml, you need to apply the following changes:

  • Change kind: RoleBinding to kind: ClusterRoleBinding
  • Change kind: Role to kind: ClusterRole in the roleRef section
  • Specify the namespace in which the operator will be deployed
#kind: RoleBinding
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: my-first-operator
subjects:
- kind: ServiceAccount
  name: my-first-operator
  namespace: default
roleRef:
  #kind: Role
  kind: ClusterRole
  name: my-first-operator
  apiGroup: rbac.authorization.k8s.io

In the file deploy/operator.yaml, you need to apply the following change:

  • Set WATCH_NAMESPACE="" so the operator watches all namespaces
env:
  - name: WATCH_NAMESPACE
    value: ""
    #valueFrom:
    #  fieldRef:
    #    fieldPath: metadata.namespace

In roles/resourcesandlimits/tasks/main.yml, we replace the entire contents with:

---
# tasks file for resourcesandlimits
- name: Create namespace
  k8s:
    definition:
      kind: Namespace
      apiVersion: v1
      metadata:
        name: '{{ meta.name }}'
  ignore_errors: true
 
 
- name: Create Resource Quota
  k8s:
    definition: 
      kind: ResourceQuota
      apiVersion: v1
      metadata: 
        name: '{{ meta.name }}-resourcequota'
        namespace: '{{ meta.name }}'
      spec:
        hard:
          limits.cpu: "{{ limits_cpu }}"
          limits.memory: "{{ limits_memory }}"
          requests.cpu: "{{ requests_cpu }}"
          requests.memory: "{{ requests_memory }}"
          requests.storage: "{{ requests_storage }}"
          pods: "{{ limit_pods }}"
          services: "{{ limit_services }}"
          services.loadbalancers: 0
          services.nodeports: 0
          replicationcontrollers: 0
 
- name: Create Limit Ranges
  k8s:
    definition: 
      kind: LimitRange
      apiVersion: v1
      metadata: 
        name: '{{ meta.name }}-limitrange'
        namespace: '{{ meta.name }}'
      spec:
        limits:
        - type: Pod
          maxLimitRequestRatio:
            cpu: "{{ max_limit_request_ratio_cpu }}"
            memory: "{{ max_limit_request_ratio_memmory }}"
        - type: PersistentVolumeClaim
          max:
            storage: "{{ max_storage }}"
          min:
            storage: "{{ min_storage }}"

In the file deploy/crds/krusche.io_v1alpha1_resourcesandlimits_cr.yaml, we likewise replace the entire contents with:

apiVersion: krusche.io/v1alpha1
kind: ResourcesAndLimits
metadata:
  name: developers-team-a
spec:
  limitsCpu: 5
  limitsMemory: 5Gi
  requestsCpu: 5
  requestsMemory: 5Gi
  requestsStorage: 204Gi
  limitPods: 10
  limitServices: 10
  maxLimitRequestRatioCpu: 2
  maxLimitRequestRatioMemory: 2
  maxStorage: 100Gi
  minStorage: 20Gi

Keep in mind how variables from the CR are passed to Ansible:

  • From the metadata section: name and namespace are passed to Ansible as "{{ meta.name }}" and "{{ meta.namespace }}".
  • From the spec section: a lower-case field such as somevar is passed as-is, as "{{ somevar }}", while a camelCase field such as someVar is converted to snake_case and passed as "{{ some_var }}" (see the example below).
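
For example, the fields of the CR we deploy below map to Ansible variables like this:

# CR spec field (camelCase)      ->  Ansible variable (snake_case)
spec:
  limitsCpu: 5                   #    {{ limits_cpu }}
  maxStorage: 100Gi              #    {{ max_storage }}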

Now, let’s deploy our CRD:

kubectl create -f deploy/crds/krusche.io_resourcesandlimits_crd.yaml

Build the operator image and push it to the registry:

operator-sdk build example/my-first-operator:0.0.1
docker push example/my-first-operator:0.0.1

The next step is to replace the generated docker image and imagePullPolicy fields in deploy/operator.yaml:

sed -i 's|{{ REPLACE_IMAGE }}|example/my-first-operator:0.0.1|g' deploy/operator.yaml
sed -i 's|{{ pull_policy\|default('\''Always'\'') }}|Always|g' deploy/operator.yaml

For macOS:

sed -i "" 's|{{ REPLACE_IMAGE }}|example/my-first-operator:0.0.1|g' deploy/operator.yaml
sed -i "" 's|{{ pull_policy\|default('\''Always'\'') }}|Always|g' deploy/operator.yaml

Then, we need to deploy it as follows:

kubectl create -f deploy/service_account.yaml
kubectl create -f deploy/role.yaml
kubectl create -f deploy/role_binding.yaml
kubectl create -f deploy/operator.yaml

Deploying controllers

After the operator launches successfully, we deploy our CR to create the namespace:

kubectl apply -f deploy/crds/krusche.io_v1alpha1_resourcesandlimits_cr.yaml

Let’s take a look at our CR:

kubectl describe resourcesandlimits.krusche.io developers-team-a
Name:         developers-team-a
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"krusche.io/v1alpha1","kind":"ResourcesAndLimits","metadata":{"annotations":{},"name":"developers-team-a","namespace":"de...
API Version:  krusche.io/v1alpha1
Kind:         ResourcesAndLimits
Metadata:
  Creation Timestamp:  2019-12-17T11:53:34Z
  Generation:          1
  Resource Version:    20309
  Self Link:           /apis/krusche.io/v1alpha1/namespaces/default/resourcesandlimits/developers-team-a
  UID:                 0db952fb-2bed-4511-9309-9f9fbe11af66
Spec:
  Limit Pods:                       10
  Limit Services:                   10
  Limits Cpu:                       5
  Limits Memory:                    5Gi
  Max Limit Request Ratio Cpu:      2
  Max Limit Request Ratio Memory:   2
  Max Storage:                      100Gi
  Min Storage:                      20Gi
  Requests Cpu:                     5
  Requests Memory:                  5Gi
  Requests Storage:                 204Gi
Status:
  Conditions:
    Ansible Result:
      Changed:             2
      Completion:          2019-12-17T11:55:50.36218
      Failures:            0
      Ok:                  4
      Skipped:             0
    Last Transition Time:  2019-12-17T11:53:34Z
    Message:               Awaiting next reconciliation
    Reason:                Successful
    Status:                True
    Type:                  Running
Events:                    <none>

Our operator should have created the namespace developers-team-a:

kubectl get namespaces

We see that our namespace has appeared:

NAME                STATUS   AGE
default             Active   3h16m
developers-team-a   Active   5m46s
kube-node-lease     Active   3h16m
kube-public         Active   3h16m
kube-system         Active   3h16m

Next, we need to check the settings of the namespace:

kubectl describe namespaces developers-team-a

We can see that our settings were applied:

Name:         developers-team-a
Labels:       <none>
Annotations:  cattle.io/status:
                {"Conditions":[{"Type":"ResourceQuotaInit","Status":"True","Message":"","LastUpdateTime":"2019-12-17T11:53:43Z"},{"Type":"InitialRolesPopu...
              lifecycle.cattle.io/create.namespace-auth: true
              operator-sdk/primary-resource: default/developers-team-a
              operator-sdk/primary-resource-type: ResourcesAndLimits.krusche.io
Status:       Active
 
Resource Quotas
 Name:                   developers-team-a-resourcequota
 Resource                Used  Hard
 --------                ---   ---
 limits.cpu              0     5
 limits.memory           0     5Gi
 pods                    0     10
 replicationcontrollers  0     0
 requests.cpu            0     5
 requests.memory         0     5Gi
 requests.storage        0     204Gi
 services                0     10
 services.loadbalancers  0     0
 services.nodeports      0     0
 
Resource Limits
 Type                   Resource  Min   Max    Default Request  Default Limit  Max Limit/Request Ratio
 ----                   --------  ---   ---    ---------------  -------------  -----------------------
 Pod                    cpu       -     -      -                -              2
 Pod                    memory    -     -      -                -              2
 PersistentVolumeClaim  storage   20Gi  100Gi  -                - 

Now, let’s prepare one more controller, this time adding an Nginx deployment to any namespace.

Create the CRD file deploy/crds/nginx_cr.yaml:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: nginx.krusche.io
spec:
  group: krusche.io
  names:
    kind: Nginx
    listKind: NginxList
    plural: nginx
    singular: nginx
  scope: Namespaced
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      type: object
      x-kubernetes-preserve-unknown-fields: true
  versions:
  - name: v1alpha1
    served: true
    storage: true

And the custom resource itself, deploy/crds/nginx-1.17.6.yaml:

apiVersion: krusche.io/v1alpha1
kind: Nginx
metadata:
  name: nginx-1-17-6
  namespace: nginx-namespace
spec:
  size: 5
  version: 1.17.6

Add the following lines to the watches.yaml file:

- version: v1alpha1
  group: krusche.io
  kind: Nginx
  role: /opt/ansible/roles/nginx

Copy the resourcesandlimits directory and rename the copy to nginx:

cp -R roles/resourcesandlimits roles/nginx

Replace the entire contents of the roles/nginx/tasks/main.yml file with:

---
- name: start nginx
  k8s:
    definition:
      kind: Deployment
      apiVersion: apps/v1
      metadata:
        name: '{{ meta.name }}-nginx'
        namespace: '{{ meta.namespace }}'
      spec:
        replicas: "{{size}}"
        selector:
          matchLabels:
            app: nginx
            version: "nginx-{{version}}"
        template:
          metadata:
            labels:
              app: nginx
              version: "nginx-{{version}}"
          spec:
            containers:
            - name: nginx
              image: "nginx:{{version}}"
              ports:
                - containerPort: 80

Next, we need to rebuild the operator with the new version of the docker image:

operator-sdk build example/my-first-operator:0.0.2
docker push example/my-first-operator:0.0.2

Change docker image in deploy/operator.yaml:

sed -i 's|example/my-first-operator:0.0.1|example/my-first-operator:0.0.2|g' deploy/operator.yaml

For macOS:

sed -i "" 's|example/my-first-operator:0.0.1|example/my-first-operator:0.0.2|g' deploy/operator.yaml

Apply the new CRD:

kubectl apply -f deploy/crds/nginx_cr.yaml

Update operator:

kubectl apply -f deploy/operator.yaml

Create namespace and deploy our CR:

kubectl create namespace nginx-namespace
kubectl apply -f deploy/crds/nginx-1.17.6.yaml

Now, we can check everything:

kubectl -n nginx-namespace get pod
NAME                                     READY   STATUS    RESTARTS   AGE
nginx-1-17-6-nginx-859d7dcf99-ddbzm      1/1     Running   0          24s
nginx-1-17-6-nginx-859d7dcf99-j884q      1/1     Running   0          24s
nginx-1-17-6-nginx-859d7dcf99-mvj5s      1/1     Running   0          24s
nginx-1-17-6-nginx-859d7dcf99-wpcfj      1/1     Running   0          24s
nginx-1-17-6-nginx-859d7dcf99-x7szg      1/1     Running   0          53s

That’s it! You’ve built your own Kubernetes operator!

Kubernetes Operators – The Bottom Line

The operator we created in this guide subscribes to K8s resources through the Kubernetes API and automates their management, with no need for manual intervention. Ultimately, Kubernetes operators help you automate routine manual work. That’s almost always beneficial when possible.

What tech stack did we use? All we needed to build our operator was the Operator SDK framework and Ansible.
