Digital businesses with multiple anonymous and registered user entry points need an efficient way for different systems and apps to access a unified database of users.
Large digital business ecosystems most often have multiple conversion points where users enter the ‘funnel’, including landing pages, websites, newsletters, apps, ecommerce sites and more. Some of these users are registered, while others are anonymous. Adding additional complexity, some users register directly via their email address, while others register through social media or Google accounts. Anonymous users have simply agreed to cookies.
User identification across various channels like mobile apps, content sites, shops and more is one of the central challenges for companies trying to obtain a complete picture of their customers.
A unified database of all users that different systems and applications can access via a Single Sign On (SSO) solution is key to efficiently extracting the most potential value from each and every user.
This case study details how the team extension provided to Valiton by K&C achieved that, working alongside their in-house peers, using a combination of AWS-native and compatible open source tools and technologies.
Valiton – IT Services Provider To Major Digital Media & eCommerce Holding
Single Sign On (SSO) Unified User Database Built For The AWS Cloud
“Using Harbourmaster enables you to target your customers with an experience as unique as they are”.
Valiton was creating a new SSO service, Harbourmaster, to allow different systems and subsidiaries to access a single, unified database of users. As well as storing the user data, each user needed to have an identifier. Subsidiaries and their systems all needed access to the same user data, but for different reasons.
Harbourmaster provides touchpoints with clients/providers. The system itself is administrative. “Events”, such as subscribing to a newsletter or special offers, encourage users to sign up. Harbourmaster’s role is then to store those user databases and make them conveniently available to a wider ecosystem, allowing for GDPR-compliant filtering.
A media subsidiary might want to push news notifications via mobile messaging or email. Another may need to send physical marketing materials by post and so need to pull name and address data. A third subsidiary or system may send surveys or promotional emails to users.
The same system needed to offer management insight into when and why user data requests were made, and by which systems and subsidiaries. And for GDPR compliance, users themselves have to be able to edit their profiles and adjust permission settings, or delete their profile, all in one place. Processed data needed to be fetchable by identifiers and through specific filters.
The majority of data requests are machine-to-machine, requiring 100% availability of the central database service. This meant we needed to create a variety of services in different domains, aggregating them through an API documentation merge, with access for all client systems provided through a single API Gateway.
The result of this merging is a single page of API documentation covering the services a specific user might need, plus a cost-efficient SSO system.
Having provided Valiton with specialist team extensions on previous projects, K&C was the partner the company turned to here, due to our significant experience working with the combination of AWS, Kubernetes and Terraform – all technologies that would be part of realising this new SSO database solution.
We confirmed our agreement that the new service’s infrastructure should be deployed on AWS. The decision was taken on the basis of a cost analysis comparing the major public cloud providers, the specific AWS functionalities and technologies that would be leveraged, and the fact that Valiton already deployed a range of services on AWS infrastructure.
The K&C team extension, together with Valiton’s in-house team, achieved fully automated deployment of the new service. Once automated deployment was set up, the developers were able to apply their full focus to building the service’s business logic. The infrastructure is mirrored across the developers’ local machines and the production environment.
That means every change made to the service is tested in exactly the same environment as it will run on in production.
Whenever a feature is added or a bug fixed, the developer simply pushes changes to the remote repository, where the automated CI/CD pipeline creates the updated build of the service, tests it, and deploys it to a testing environment. There the updated features or bug fixes are tested again. If everything checks out, the changes are then deployed to the production environment, where they can be tested further if required.
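The source does not name the CI tool used, but as an illustration, a pipeline of the kind described might look like this in GitLab CI syntax (stage names, the chart path and the test entrypoint are all hypothetical):

```yaml
# .gitlab-ci.yml — illustrative sketch of build → test → deploy flow
stages:
  - build
  - test
  - deploy-testing
  - deploy-production

build:
  stage: build
  script:
    # Build and push an image tagged with the commit SHA
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

test:
  stage: test
  script:
    - docker run --rm $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA ./run-tests.sh  # hypothetical test entrypoint

deploy-testing:
  stage: deploy-testing
  script:
    - helm upgrade --install service ./chart --set image.tag=$CI_COMMIT_SHA --namespace testing

deploy-production:
  stage: deploy-production
  when: manual   # promote to production once the testing environment checks out
  script:
    - helm upgrade --install service ./chart --set image.tag=$CI_COMMIT_SHA --namespace production
```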
Fully automated scaling of services was achieved through Kubernetes, allowing service instances to be scaled up during peak demand and scaled back down to normal usage levels afterwards.
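Scaling of this kind is typically configured in Kubernetes with a HorizontalPodAutoscaler; a minimal sketch, with the service name, replica bounds and CPU threshold all illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: link-manager          # illustrative service name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: link-manager
  minReplicas: 2              # normal usage level
  maxReplicas: 10             # ceiling during peak demand
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when average CPU exceeds 70%
```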
Infrastructure-as-Code (IaC) was decided upon as the approach to building Valiton’s Single Sign On (SSO) service in a way that would meet the key requirements and incorporate needed functionalities.
Hashicorp’s Terraform was selected as our IaC tool. Terraform’s strength is that it allows for the creation of fully manageable, module-based solutions. It is also cloud-agnostic, so perfectly compatible with our chosen AWS environment.
Terraform can also be complemented by the additional tooling of Terragrunt, a thin wrapper which would allow us to avoid:
Terragrunt also allowed us to maintain:
Terragrunt meant we could create a few environment folders with different resources, available on demand.
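A sketch of that kind of per-environment folder layout (folder names, the module path and the inputs are all illustrative):

```hcl
# live/staging/terragrunt.hcl — one folder like this per environment
include "root" {
  path = find_in_parent_folders()   # pull in shared settings from the parent folder
}

terraform {
  # The same Terraform module is reused across all environments
  source = "../../modules/sso-service"
}

inputs = {
  environment   = "staging"
  instance_type = "t3.medium"       # smaller resources for non-production
}
```

Running `terragrunt apply` in the `staging` folder provisions that environment's resources on demand, with only the inputs differing between environments.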
We used Terraform/Terragrunt to control the following AWS resources:
We ran into some issues creating and manipulating Kubernetes infrastructure with the Terraform/Terragrunt combo, due to a buggy Terraform Helm provider. It also didn’t allow us to use the latest version of Helm, which comes with new features we wanted to exploit.
Our Kubernetes cluster, which contained all of our applications, services, cron jobs etc., required an alternative solution, and we opted for Helm Charts.
We created Helm Charts – templates that look like K8s resources, but can be changed at any time by Helm based on the variables we pass to it. The Helm tool allowed us to achieve the same result we had originally planned to achieve using Terraform and Terragrunt in combination with Kubernetes.
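A minimal example of such a template – a Deployment whose name, image and replica count are filled in by Helm from the supplied values (all names here are illustrative):

```yaml
# templates/deployment.yaml — looks like a K8s resource, but the
# {{ ... }} placeholders are rendered by Helm at install/upgrade time
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-link-manager
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: link-manager
  template:
    metadata:
      labels:
        app: link-manager
    spec:
      containers:
        - name: link-manager
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```

A command like `helm upgrade --install link-manager ./chart --set image.tag=v1.2.3` then re-renders the template with the new values and applies the change to the cluster.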
And the Helm solution ultimately resulted in better, faster performance.
As one of the most mature public cloud solutions (if not the most mature), AWS is always under consideration for any architecture that involves a public cloud resource. But three qualities of AWS closed the case for this particular project:
The most influential factor in the decision was the AWS infrastructure and SDK. Why?
Each service required a web link in different environments, which meant a mess of links. CloudFormation allowed us to manage all the resources (including cache and path rules) using one tool.
Terraform modules and Terragrunt-style configuration meant we could manage all the resources easily. For example, we had a service called “link-manager”. When we wanted to have it in a feature environment, we had “link-manager.feature.domain.com”. In the case of staging, we had “link-manager.staging.domain.com”. Clean and simple.
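Per-environment links like these can be expressed as a single parameterised DNS record; a sketch assuming Route 53, with the variable names, zone and record target all illustrative:

```hcl
variable "environment" {
  type        = string
  description = "e.g. feature, staging, production"
}

variable "zone_id" {
  type = string          # hosted zone for domain.com
}

variable "service_endpoint" {
  type = string          # where the service is actually reachable
}

# Resolves to link-manager.staging.domain.com when environment = "staging"
resource "aws_route53_record" "link_manager" {
  zone_id = var.zone_id
  name    = "link-manager.${var.environment}.domain.com"
  type    = "CNAME"
  ttl     = 300
  records = [var.service_endpoint]
}
```

Each environment folder then just sets `environment` differently, and the module produces the matching subdomain.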
Because we were going to manage all the resources on a single cloud platform, rather than sharing data in a multi-cloud architecture, EKS was an ideal Kubernetes management service.
Also, in our case it wasn’t possible to divide requests to other services or databases for fetching data, because the data was highly dynamic and might change from one minute to the next. But we wanted to be able to analyse data snapshots, and to do so we needed to send data from one place to another or store it elsewhere.
To solve this issue, we built a small application and pushed its Docker image to our ECR repository on the same cloud. We then used ECS with CloudWatch Events (like a cron job) to run our container at the instructed time of day.
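A scheduled ECS run of that kind can be sketched in Terraform with a CloudWatch Events rule targeting the cluster; the names and schedule are illustrative, and the cluster, task definition, IAM role and subnets are assumed to be defined elsewhere:

```hcl
# Trigger once a day at 03:00 UTC (schedule is illustrative)
resource "aws_cloudwatch_event_rule" "snapshot" {
  name                = "data-snapshot-daily"
  schedule_expression = "cron(0 3 * * ? *)"
}

resource "aws_cloudwatch_event_target" "snapshot_task" {
  rule     = aws_cloudwatch_event_rule.snapshot.name
  arn      = aws_ecs_cluster.main.arn        # ECS cluster defined elsewhere
  role_arn = aws_iam_role.events.arn         # IAM role allowing events to run the task

  ecs_target {
    task_definition_arn = aws_ecs_task_definition.snapshot.arn
    launch_type         = "FARGATE"
    network_configuration {
      subnets = var.private_subnet_ids
    }
  }
}
```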
That left us with a major decision: either use Grafana or create a Lambda-like function for pulling data from the external resources into CloudWatch. We decided to use Grafana, as it was deployed automatically as part of our Istio installation.
In the past, we made standard use of On-Demand EC2 instances for our services. Around six months ago we moved to EC2 Spot Instances, a purchasing option that provides the same compute more cost-effectively in scenarios where interruptions can be tolerated or instances are used less frequently than anticipated.
Amazon MSK – a Kafka service recently added to the AWS tool kit that offers the same services as alternatives from other vendors, but with the additional advantages of:
Another strength of AWS as our cloud platform here is the SQS DLQ (dead letter queue) – a technique overlaid on a standard SQS queue by adding a few checkboxes in the user interface. The technique makes it much easier to debug certain applications or isolate problems, without messages vanishing.
For instance, say a production application gets stuck: messages arrive in SQS and are passed on to the jammed application. Eventually, the application crashes and the process has to start again, with the messages lost forever. SQS DLQ stores these messages for further analysis, which can often help in resolving the problem.
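In Terraform terms, those console checkboxes map to a redrive policy on the main queue; a sketch with illustrative queue names, retention and retry threshold:

```hcl
resource "aws_sqs_queue" "dead_letter" {
  name                      = "user-events-dlq"
  message_retention_seconds = 1209600   # keep failed messages for 14 days of analysis
}

resource "aws_sqs_queue" "user_events" {
  name = "user-events"

  # After 5 failed receives, a message moves to the DLQ instead of being lost
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dead_letter.arn
    maxReceiveCount     = 5
  })
}
```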
We wanted to have an event-driven architecture and made the decision to use Kafka to resolve the following complexities:
Kafka was that solution. And it was a nice surprise to discover that AWS provides its own solution for managing Kafka clusters.
We reached the conclusion that AWS’s Amazon MSK managed Kafka service was the best option for us. We would need to tweak some configurations to keep a longer retention period, as the default retention is 7 days. More information about the custom configuration can be found here:
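A custom MSK configuration of that kind can be sketched in Terraform; the configuration name, Kafka version and the 30-day retention chosen here are illustrative (the MSK default is 7 days, i.e. 168 hours):

```hcl
resource "aws_msk_configuration" "retention" {
  name           = "sso-events-retention"
  kafka_versions = ["2.8.1"]

  # Override the 7-day default (log.retention.hours = 168) with 30 days
  server_properties = <<-PROPERTIES
    log.retention.hours = 720
    auto.create.topics.enable = false
  PROPERTIES
}
```

The configuration is then attached to the MSK cluster, and every topic inherits the longer retention unless it overrides it.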
Amazon MSK makes it easy to scale up disk size and to add new nodes to the cluster. The cluster described would be able to handle 4MB/s. Here is a small example to hint at our needs, based on a client user with 20m create/update changes per month. With a payload size of 1KB, that works out at 20,000,000 × 1KB over roughly 2.6 million seconds in a month – a required throughput of around 8KB/s. So we have roughly 512× more capacity than required.
In order to reach the CloudWatch/Grafana decision described above, we really had to dig deeply into researching both tools.
Another important takeaway from this project was the pricing structures of different public cloud providers for managed Kafka. All the main providers offered similar features, but AWS’s pricing was far more attractive.