DevOps architecture for a cloud native app and infrastructure

A complete reference DevOps architecture example broken down tool by tool

DevOps is a software development culture, approach and set of tools designed to promote integration and collaboration between traditionally distinct development and operations work and teams. Improving efficiency through the automation of repetitive manual tasks in the integration and deployment (CI/CD pipelines) of software iterations is central to a DevOps approach.

A DevOps architecture incorporates the tools that enable that automation into the design of a software application and the infrastructure that sits between its non-production (development and testing) and production (operations) environments.

In this blog, we’ll seek to provide a better understanding of the specifics of a DevOps architecture by introducing the DevOps approach and detailing a particular example of an application’s architecture designed by K&C’s senior DevOps engineer. You will see how it answers a defined set of requirements for a cloud-native web application, taking into consideration:

Application requirements

  • Cloud-native
  • Minimise vendor lock-in risk exposure as far as feasible

Cloud-native infrastructure

  • Usage of AWS-native components where available to reduce maintenance costs and complexity
  • Cloud agnostic infrastructure not required

Web Application Firewall (WAF)

  • Protect the web-based applications from common exploits, such as SQL injections (which can be mitigated with SQL database documentation or a security tool like CrowdStrike, which detects malicious code and is installed on every device used by K&Cers), cross-site scripting and other security flaws
  • Audit logs
  • Distributing application traffic across multiple targets

Key management

  • Manage cryptographic keys and control audit logs to meet compliance needs

Monitoring

  • Monitoring should run in separate environments independent of the application for high stability
  • Automated monitoring of all stages + deployment pipelines
  • Hierarchical organization of the following metrics:
    • Performance
    • Business logic
    • Security
    • Development instrumentation

Storage

  • Each development project is free to decide on the optimal DBMS
  • From a DevOps perspective, using as few different AWS-native DBMSs as possible is desirable

Development tools

  • Integrated and native development chain for CI/CD
  • GitOps (Git as a single source of truth for declarative infrastructure) is an interesting trend. Analyse its potential benefit within the context of the application

Security

  • Code Quality Analysis
  • Security checking of inhouse and open source code to conform to OWASP standards

Management Console

If AWS EKS is used instead of OpenShift, we should consider a simpler, more intuitive UI for Kubernetes management to optimize maintenance efforts.

Cloud Identity and Access Management (IAM)

  • Role concept
  • Separation of duties
  • Provisioning / de-provisioning
  • Auditing of grants and successful / failed accesses
  • Rotating master keys

But before presenting and breaking down an example of a DevOps architecture designed to meet these project requirements, let’s briefly address why DevOps consultants, architects and teams are currently our most in-demand service as an IT services provider.

What is DevOps?

DevOps itself is a set of best practices established to unify traditionally separated development and operations teams responsible for a single application into one holistic team. As a modus operandi of organisational culture, DevOps breaks down the barriers between the development and testing environments and the production environment.

DevOps by Sprint Infographic

This avoids the traditional potential for friction between development and operations when a production ‘ready’ iteration of a new feature is handed over from the development team to the operations team but encounters issues in production. It negates the potential for a ‘blame game’, and the wasted time and resources that are an inevitable consequence of a new feature not working smoothly in production despite no issues appearing in the development and testing environment.

In a DevOps culture, there is no dev and ops, just DevOps. Key to this holistic approach is the automation of testing, deployment and then review, or monitoring in production. This automated process is the CI/CD pipeline, which means continuous integration and either continuous delivery or continuous deployment.

Well executed, a DevOps approach results in better quality software delivered in less time. That feeds through to reducing overall software development costs, improving business cases.

A DevOps CI/CD pipeline is achieved through the use of various tools, technologies and processes.

A DevOps architecture designed to meet the requirements of our cloud-native web application

Building a cloud-native web application on DevOps principles involves integrating the DevOps tools and technologies that give us a CI/CD pipeline. Since the requirement was also to build a DevOps architecture to run on AWS, this also influenced the choice of which tools and technologies to use.

Both of our DevOps architecture proposals blended AWS-native and open source DevOps tools and technologies: one was based on AWS EKS alone, while our recommended alternative used both AWS EKS and OpenShift.

The two DevOps architecture proposals were:

Reference DevOps Architecture 1 – AWS EKS

Reference DevOps Architecture for AWS EKS

Reference DevOps Architecture 2 – AWS EKS + OpenShift

Reference DevOps Architecture for AWS EKS and OpenShift

The difference between the two boiled down to the need to use Helm to manage the Kubernetes cluster in the DevOps architecture proposal that used only Amazon EKS without OpenShift.

Let’s run through the role each component or tool in the DevOps architecture prototypes performs.

Amazon EKS

Amazon Elastic Kubernetes Service (Amazon EKS) is AWS’s native managed Kubernetes containers-as-a-service (CaaS). It simplifies the running of Kubernetes clusters on AWS by negating the necessity to install, operate and maintain a stand-alone Kubernetes control plane, which is instead managed by EKS.

Amazon EKS is able to detect and replace any unhealthy control plane instances (nodes) before they become the catalyst of an issue, automatically restarting them as required across Availability Zones within the Region. EKS maintains high availability of Kubernetes clusters by taking advantage of the AWS Regions architecture to eliminate any single point of failure.

No longer vulnerable to the loss of one or more availability zones, the resulting AWS-managed Kubernetes cluster becomes far more resilient.

Unlike the OpenShift/EKS alternative, our EKS-based reference DevOps architecture uses the AWS IAM Authenticator for Kubernetes, developed in collaboration with Heptio, to integrate Kubernetes RBAC with IAM authentication.
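As a minimal sketch of how that integration looks in practice (the account ID, role name and username below are hypothetical placeholders), EKS maps IAM roles to Kubernetes RBAC groups through the aws-auth ConfigMap in the kube-system namespace:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Map a hypothetical IAM role for the DevOps team to a Kubernetes RBAC group
    - rolearn: arn:aws:iam::123456789012:role/DevOpsTeamRole
      username: devops-team
      groups:
        - system:masters
```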

Amazon EKS infographic

OpenShift Container Platform

OpenShift is a ‘family of containerization software’ built by Red Hat, and the OpenShift Container Platform is best described as a private platform-as-a-service (PaaS). OpenShift can be deployed and managed on an on-premises private cloud or bare metal server, or overlaid onto the infrastructure of a cloud provider – in this case AWS.

The fully managed OpenShift service is deployed and operated on AWS, with the two companies collaborating to offer a combined service that is mutually supported, billed directly from AWS along with other AWS-native resources and tools a client makes use of.

OpenShift vs. AWS EKS for Kubernetes management and orchestration

Our recommended DevOps architecture uses OpenShift rather than a pure AWS-native EKS solution for a couple of core reasons:

AWS Elastic Kubernetes Services (EKS)

Pros

●      Native support by Amazon

●      Native KMS support

Cons

●      Additional management UI needed (like Rancher)

OpenShift

Pros

●      Simple out-of-the-box settings for security and networking allow for a quick start

Cons

●      No native implementation by AWS

●      Additional vendor (Red Hat) increases complexity

●      OpenShift Templates are less flexible than Helm charts

Flexibility

OpenShift allows for more flexibility in deployments, with K8s clusters deployable either on-premises or in AWS. That can be a major plus for organisations that still have data and/or integration applications hosted on-premises in a multi-cloud or hybrid-cloud environment.

Learning curve

Both AWS EKS and OpenShift require Kubernetes expertise, but the OpenShift web console lessens the learning curve and makes it easier for DevOps teams to get up to speed.

Paying for OpenShift is more expensive than using EKS alone, but in the context of the app and DevOps architecture in question, we felt the efficiency and simplicity gains compensated for that. If the client organisation’s in-house team, which lacks experience in Kubernetes set-up, orchestration and maintenance, were to encounter more problems with the pure EKS approach, that would almost certainly more than wipe out any surface-level savings.

Choice of environment – Hard (Kubernetes cluster) vs. Soft (namespace)

Hard (cluster)

●      Adaptation of IaC (Infrastructure as Code) for Prod and Pre-Prod

●      Testing of infrastructural changes in the development environment without affecting production

Soft (namespace)

●      Less maintenance effort

●      Any IaC change affects all Kubernetes tenants (i.e. namespaces)

We recommended a hard environment based on separate Kubernetes clusters, because it would allow for highly automated maintenance and because separated clusters offer greater flexibility and a more robust infrastructure.

AWS KMS for key management

Our DevOps architecture uses AWS KMS for key management. KMS allows for the easy creation and management of cryptographic keys, controlling their use across AWS services and within the application. The underlying hardware security modules are validated, or in the process of being validated, under FIPS 140-2, offering a high level of security and resilience, and key usage logs are a plus, especially if an application must meet regulatory or compliance requirements.

Benefits include:

  • Controlled access through defined key usage permissions
  • Centralised key management
  • Encryption management allows sharing of encrypted resources between accounts and services, tracked by key usage logs that include AWS services using them on your behalf.
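As a hedged illustration of how KMS ties into the EKS-based architecture, the sketch below (the cluster name, region and key ARN are placeholders) shows an eksctl cluster config that uses a customer-managed KMS key for envelope encryption of Kubernetes secrets:

```yaml
# Hypothetical eksctl ClusterConfig encrypting Kubernetes secrets with KMS
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: eu-central-1
secretsEncryption:
  # ARN of a customer-managed KMS key (placeholder value)
  keyARN: arn:aws:kms:eu-central-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab
```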

Monitoring and logging within a DevOps architecture: Prometheus, Grafana, Elasticsearch, Kibana and Jaeger

To meet the requirements for monitoring and logging, we use a combination of Prometheus and Grafana for Kubernetes monitoring, Jaeger for distributed tracing, Elasticsearch and Kibana for logging, and Grafana for broader application monitoring. All of the tools are transferred from OpenShift into the management cluster.

Prometheus for Kubernetes monitoring

Prometheus is a free and open source event monitoring tool for containers and microservices that collects numerical data as time series. The Prometheus server works on the principle of scraping: it invokes the metric endpoints exposed by the nodes it has been configured to monitor, collects the metrics at regular intervals, timestamps them and stores them locally.
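A minimal sketch of a Prometheus scrape configuration illustrates the principle (the job name and target address are hypothetical):

```yaml
# prometheus.yml - minimal scrape configuration
global:
  scrape_interval: 15s   # how often Prometheus scrapes its targets
scrape_configs:
  - job_name: node       # hypothetical job scraping a node-exporter endpoint
    static_configs:
      - targets: ['node-exporter:9100']
```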

Prometheus with Grafana

Grafana is a multi-platform data visualisation platform that charts and graphs data from its configured sources, adding value above and beyond Prometheus’s built-in expression browser, with which it offers out-of-the-box integration. It can be used to visualise metrics and logs in many different ways and is a highly efficient way to search or live-stream logs.
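A minimal sketch of Grafana’s datasource provisioning file shows how the Prometheus integration is wired up (the URL assumes a Prometheus service reachable inside the cluster):

```yaml
# Grafana datasource provisioning, e.g. provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-server:9090   # hypothetical in-cluster service address
    isDefault: true
```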

Elasticsearch and Kibana for logging in Kubernetes

Elasticsearch stores data in indices and acts as a distributed, scalable search engine for both full-text and structured search, with applications in analytics as well. In the context of a Kubernetes cluster, Elasticsearch is used to ingest logs.

Kibana’s role is in viewing the logs ingested into Elasticsearch. Kibana is part of the ELK Elastic Stack and best described as a “user interface that lets you visualise your Elasticsearch data and navigate the Elastic Stack. Do anything from tracking query load to understanding the way requests flow through your apps”.
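As a hedged sketch, assuming the Elastic Cloud on Kubernetes (ECK) operator is installed in the cluster, a minimal Elasticsearch-plus-Kibana logging stack can be declared like this (the names, version and node count are placeholders):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logging
spec:
  version: 7.17.0       # hypothetical stack version
  nodeSets:
    - name: default
      count: 1          # single node; scale up for production
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: logging
spec:
  version: 7.17.0
  count: 1
  elasticsearchRef:
    name: logging       # connects Kibana to the Elasticsearch defined above
```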

Jaeger for distributed tracing

Jaeger is an Open Source distributed tracing system that includes components to store, visualise and filter traces. It implements the OpenTracing specification. Distributed tracing captures requests to build a picture of the full chain of calls from user requests to interactions between microservices. Jaeger also tracks how long requests took, the lifecycle of network calls such as HTTP and RPC and helps locate bottlenecks that affect performance.

In a Kubernetes environment, Jaeger enables distributed tracing for gRPC services.
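Assuming the Jaeger Operator is installed in the cluster, a minimal sketch of a Jaeger custom resource (the name and namespace are placeholders) is enough to deploy a tracing instance:

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger            # hypothetical instance name
  namespace: observability
spec:
  strategy: allInOne      # collector, query and agent in one pod; fine for non-prod
```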

Development tools in our DevOps architecture – AWS Container Registry (ECR), IaC Terraform, Helm, GitOps ArgoCD, AWS CodeCommit, AWS CodePipeline

A breakdown of the Dev tools used in our DevOps architecture:

AWS Elastic Container Registry – ECR

AWS-native ECR is a fully managed Docker container registry that hosts Docker container images in a scalable, highly available architecture. Its role in a DevOps architecture is the reliable deployment of containers, with resource-level control of individual repositories achieved through integration with AWS Identity and Access Management (IAM).
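As a hedged sketch of how images reach ECR in the pipeline, a CodeBuild buildspec can build and push an image (the $ECR_REGISTRY and $AWS_REGION environment variables and the image name are assumptions, set on the build project):

```yaml
# buildspec.yml - build a Docker image and push it to ECR
version: 0.2
phases:
  pre_build:
    commands:
      # Authenticate Docker against the ECR registry (placeholder variables)
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
  build:
    commands:
      - docker build -t $ECR_REGISTRY/web-app:latest .
  post_build:
    commands:
      - docker push $ECR_REGISTRY/web-app:latest
```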

IaC Terraform and Helm

Infrastructure-as-code (IaC) tool Terraform is used to manage the AWS infrastructure. Using Terraform, we can write the code for the infrastructure and maintain it in Git. We can also maintain infrastructure states and roll back to the previous state if required. Terraform figures out how to achieve the desired infrastructure end-state specified by the code. It also supports immutable infrastructure.

Helm is the application package manager that runs on top of Kubernetes to describe and manage the application’s structure. It helps simplify microservices management through the provision of Helm charts and simple management commands.
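As a minimal sketch, a Helm chart starts from a Chart.yaml like the one below (the chart name and versions are hypothetical); the application’s Kubernetes manifests then live as templates alongside it and are installed with a single `helm install` command:

```yaml
# Chart.yaml - minimal metadata for a hypothetical application chart
apiVersion: v2
name: web-app
description: Helm chart packaging the web application's Kubernetes manifests
type: application
version: 0.1.0        # chart version
appVersion: "1.0.0"   # version of the application being deployed
```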

Argo CD

Argo CD follows the GitOps practice of using Git repositories as the single source of truth for the application’s state. Argo CD ensures application definitions, configurations and environments are declarative and version controlled, and that lifecycle management is automated, auditable and easy to understand.

Within a Kubernetes cluster, Argo CD implements a controller that continuously monitors running applications, comparing the production state against the desired target state defined in the Git repo. If the production state is out of sync with the Git repo’s single source of truth, Argo CD reports and visualises the discrepancies and facilitates a manual or automatic rollback to the target state.

Changes to the target state held in Git can also be automatically pushed through to production.
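A minimal sketch of an Argo CD Application manifest (the repository URL, path and namespaces are hypothetical) shows how the desired state is declared and automated sync enabled:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/org/web-app-config.git  # single source of truth
    targetRevision: main
    path: helm
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: web-app
  syncPolicy:
    automated:
      prune: true     # remove resources that were deleted from Git
      selfHeal: true  # revert manual drift back to the Git-defined state
```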

AWS CodeCommit

CodeCommit is a key element of a CI/CD pipeline in an AWS environment. A fully managed source control service, CodeCommit securely hosts Git repos, eliminating the need for the DevOps team to operate its own source control system, which can be a bottleneck when it comes to scaling infrastructure. It securely stores anything from source code to binaries and is compatible with existing Git tools.

Security and code quality analysis with SonarQube

Code quality and security in our DevOps architecture rely on SonarQube, a tool with the motto “Continuous Inspection must become mainstream as Continuous Integration”.

SonarQube is an Open Source tool that provides automated code review, detecting errors, bugs, vulnerabilities and untidily implemented segments in source code across 27 different programming languages through Static Application Security Testing (SAST).

The tool can also be used to check compliance with organisational coding guidelines, as well as general quality issues. Its contribution to security is to flag potential issues such as insecure coding approaches, outdated cryptographic libraries, inconsistent debug output and so on.
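As a hedged sketch of wiring SonarQube into the pipeline, a CodeBuild buildspec can invoke the scanner on every commit (the project key, source directory and the $SONAR_HOST_URL/$SONAR_TOKEN variables are placeholders):

```yaml
# buildspec.yml - run SonarQube analysis as a pipeline stage
version: 0.2
phases:
  build:
    commands:
      # Analyse the source tree and report results to the SonarQube server
      - >-
        sonar-scanner
        -Dsonar.projectKey=web-app
        -Dsonar.sources=src
        -Dsonar.host.url=$SONAR_HOST_URL
        -Dsonar.login=$SONAR_TOKEN
```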

All of this is key to integrating security into DevOps, or DevSecOps, as the approach is sometimes termed. At K&C, our position is that all DevOps should be by default DevSecOps, negating the need for a unique term.

In a traditional Waterfall approach to software development, security is integrated into software as a final stage of development. The problem with retrospective security integration is that by that stage, there is a danger of fundamental software architectural issues not meeting rigorous security standards. At that point, it can often be too late to make the changes that would offer a robust level of security without redoing much of the development work.

If security is an integral part of the CI/CD DevOps pipeline, that weakness of post-facto integration is eliminated. SonarQube makes sure it is: new code is continuously checked both for general quality and in the context of security considerations.

Rancher as our management console

In our DevOps architecture Rancher’s role is:

  • Kubernetes cluster operations and management
  • Workload management
  • Enterprise support

Rancher, which can deploy and manage K8s clusters on any infrastructure from datacentres to edge, is one of our favourite and most integral DevOps tools. Rancher addresses operational and security challenges through cluster management and provisioning, and also offers a number of integrated tools that help DevOps teams run containerised workloads.

Rancher also integrates with a GitHub repository to automate CI pipeline execution. As part of the CI/CD pipeline (see the sketch after this list), Rancher:

  • Builds the application from code to image.
  • Validates builds.
  • Deploys build images to the cluster.
  • Runs unit tests.
  • Runs regression tests.
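As a hedged sketch only (Rancher’s integrated pipelines have evolved across versions, and the stage names, build image and tag here are hypothetical), a .rancher-pipeline.yml expressing a build-and-publish flow might look like this:

```yaml
# .rancher-pipeline.yml - hypothetical two-stage pipeline
stages:
  - name: Build and test
    steps:
      - runScriptConfig:
          image: golang:1.14            # hypothetical build image
          shellScript: go build ./... && go test ./...
  - name: Publish image
    steps:
      - publishImageConfig:
          dockerfilePath: ./Dockerfile
          buildContext: .
          tag: web-app:${CICD_EXECUTION_SEQUENCE}
```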

Cloud identity & access management – IAM with AWS IAM

Identity and access management (IAM) is another core security pillar. Since we’re using an AWS environment, the native AWS IAM is the obvious solution here. AWS IAM controls who is able to sign in to the cloud environment and sets the permissions each signed-in user has to use AWS resources.

In a DevOps approach, IAM roles should make the team responsible for resource usage, with permissions set to allow the team to react flexibly to incidents, reducing response times. A strong audit of IAM roles by an external third party is also recommended.
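A minimal, hypothetical sketch of such a role in CloudFormation YAML (the account ID, role name and read-only EKS policy are illustrative assumptions, not the client’s actual role concept):

```yaml
# CloudFormation snippet defining a hypothetical DevOps team role
Resources:
  DevOpsTeamRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: devops-team
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: arn:aws:iam::123456789012:root  # placeholder account
            Action: sts:AssumeRole
      Policies:
        - PolicyName: eks-read-only
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - eks:Describe*
                  - eks:List*
                Resource: '*'
```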

DevOps teams & consulting

If you could benefit from experienced and knowledgeable DevOps consultancy for your in-house projects, or need a flexible, scalable DevOps team to build or maintain your apps or Kubernetes clusters, K&C would love to help. Just drop us a line!
