Our Open Source MongoDB Sharding Solution For A Scalable, High Availability IoT Database Design
In this case study we’ll look at MongoDB sharding, “a database architecture that partitions data by key ranges and distributes the data among two or more database instances. Sharding enables horizontal scaling”.
More specifically, we’ll show step-by-step how our nearshored DevOps team configured an open source MongoDB sharding solution for a scalable, high availability distributed database as an alternative to an expensive enterprise license. Spoiler – the net result for our client was a minimum MongoDB enterprise license saving of €100k a year. Every year.
As well as walking you through the context of our decision to use open source MongoDB code (our client SolarLog is a leading PV energy management system), we’ll detail the configuration process step-by-step.
K&C - Creating Beautiful Technology Solutions For 20+ Years. Can We Be Your Competitive Edge?
MongoDB – In Pursuit of The Holy Grail of an Automated, Self-Healing & Scalable Database Solution
How effectively companies and organisations leverage data is, as much as anything else, the foundation of their sustainable success and competitiveness in the contemporary world. And that comes down to three things:
Data storage and availability
Database technologies such as MongoDB focus on the second of these, data storage and availability. Software engineers have grappled with the challenge of creating autonomous, self-recovering and scalable database systems since the 1970s and the advent of SQL relational databases (first developed by IBM as its ‘System R’ prototype and commercialised by Oracle in 1979).
By the 1990s, the object-oriented databases that led to NoSQL implementations had appeared, and it is this format that now dominates bulk data storage on the web. But the original problem remains. The advent of cloud computing and the proliferation of IoT hardware mean database technologies and systems are more important than ever. Data is the currency of the digital economy.
The main categories of NoSQL databases include Document Stores and Search Engines.
Our focus here is Document Store databases, which hold data in the form of JSON documents. Document stores do not require a set schema and data structure can vary between documents. Document databases use the same document-model format used in their application code, which makes it easier for software developers to store and query the data in a database.
“The flexible, semi-structured, and hierarchical nature of documents and document databases allows them to evolve with applications’ needs. The document model works well with use cases such as catalogs, user profiles, and content management systems where each document is unique and evolves over time. Document databases enable flexible indexing, powerful ad hoc queries, and analytics over collections of documents”.
MongoDB, an open source project also available as an enterprise solution, has become a leading Document Store database technology and is particularly common in IoT environments.
Alternatives to MongoDB include, but are not limited to, open source options such as CouchDB, Apache Cassandra and PostgreSQL, as well as cloud vendor-native choices like AWS’s DynamoDB and Amazon DocumentDB (which offers MongoDB compatibility in an AWS environment).
MongoDB Shards And Sharded Clusters
In MongoDB, sharding splits large data sets into smaller subsets held across multiple instances. These smaller data sets are termed shards, and together they work as one large cluster-based data set.
From MongoDB 3.6 onward (the most recent release at the time of writing is MongoDB 4.4), shards must be deployed as replica sets, which offer the redundancy and high availability crucial to contemporary database technologies. Users and applications should only connect directly to a shard for local maintenance and administration operations.
A query to a single shard will only return the subset of data held on that particular shard. For cluster-level operations, such as reads or writes, the user or application connects to the mongos instead. MongoDB does not guarantee that any two contiguous chunks of data will reside on the same shard.
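To make the routing idea concrete, here is a minimal Python sketch of range-based sharding, in which a router (standing in for the mongos) maps a shard key to the chunk that covers it and then to the shard holding that chunk. The chunk boundaries, the shard names and the route helper are illustrative assumptions, not MongoDB internals.

```python
from bisect import bisect_right

# Hypothetical chunk table: chunk i covers the key range from
# CHUNK_LOWER_BOUNDS[i] up to (but excluding) the next lower bound.
# Note that contiguous chunks are deliberately placed on different shards.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000, 3000]
CHUNK_OWNERS = ["shardA", "shardB", "shardA", "shardC"]


def route(shard_key: int) -> str:
    """Return the name of the shard owning the chunk that contains shard_key."""
    if shard_key < CHUNK_LOWER_BOUNDS[0]:
        raise ValueError("shard key below the lowest chunk boundary")
    # bisect_right finds the first boundary greater than the key,
    # so the chunk containing the key is the one just before it.
    index = bisect_right(CHUNK_LOWER_BOUNDS, shard_key) - 1
    return CHUNK_OWNERS[index]
```

Calling route(1500) in this sketch returns "shardB", while route(2500) returns "shardA", which shows why a query sent to one shard alone would miss data held elsewhere.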
Sharding Broken Down
Let’s take a closer look at the components of MongoDB sharding architecture:
Each database in a sharded cluster has a primary shard. The primary shard holds all of that database’s un-sharded collections. A database’s primary shard is not related to the primary in a replica set.
When a new database is created, the mongos selects as its primary shard the shard holding the least amount of data in the cluster. The mongos makes this selection using the totalSize field returned by the listDatabases command.
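The selection rule described above can be sketched in a few lines of Python. This is only an illustration of "pick the shard with the least data": the pick_primary_shard helper, the sample sizes and the tie-break by shard name are assumptions of the sketch rather than documented mongos behaviour.

```python
from typing import Mapping


def pick_primary_shard(total_size_by_shard: Mapping[str, int]) -> str:
    """Return the shard currently holding the least data (sizes in bytes)."""
    if not total_size_by_shard:
        raise ValueError("cluster has no shards")
    # Ties are broken by shard name so the result is deterministic;
    # this tie-break is an assumption made for the sketch.
    return min(
        total_size_by_shard,
        key=lambda name: (total_size_by_shard[name], name),
    )


# Hypothetical totalSize values per shard.
sizes = {"shardA": 500_000, "shardB": 120_000, "shardC": 300_000}
primary = pick_primary_shard(sizes)
```

With these sample values, primary is "shardB", the shard with the smallest total size.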
The interface between the client applications and the sharded cluster is provided by the mongos instances, which route queries and write operations to the shards. As far as an application is concerned, a mongos instance behaves in the exact same way as any other MongoDB instance.
Changing The Primary Shard In A Database
It is possible to change the shard assigned as a database’s primary shard, though the migration process can take some time, during which collections associated with the database should not be accessed. To do so, run the movePrimary command.
Before migrating to a new primary shard, keep in mind that transferring large collections can affect the cluster’s operations. Take this and the network load into account.
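As a rough illustration, the movePrimary command is an admin command document naming the database and the target shard. The sketch below only builds that document in Python; the database name solar_metrics and the shard name shard0001 are hypothetical, and the move_primary_command helper is our own naming.

```python
def move_primary_command(database: str, new_primary_shard: str) -> dict:
    """Build the admin command document for movePrimary."""
    if not database or not new_primary_shard:
        raise ValueError("database and target shard are both required")
    # The command name must be the first key of the document.
    return {"movePrimary": database, "to": new_primary_shard}


# Hypothetical names, for illustration only.
command = move_primary_command("solar_metrics", "shard0001")

# With a driver such as PyMongo, the document would be sent to the admin
# database through a mongos, for example: client.admin.command(command)
```

Keeping the command construction separate from the driver call makes it easy to log or review exactly what will be run before a potentially long migration starts.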
When deploying a new sharded cluster containing shards previously employed as replica sets, existing databases continue to reside on their original replica sets. Databases created at a later date can sit on any of the cluster’s shards.
For an overview of a cluster, use the sh.status() method in the mongo shell. The report shows the database’s primary shard as well as how data chunks are distributed across the shards comprising a cluster.
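To give a feel for the chunk distribution part of that report, the following Python sketch counts chunks per shard from a list of chunk documents. It is not how sh.status() is implemented; the sample documents are hypothetical and only assume that each chunk record carries a shard field.

```python
from collections import Counter
from typing import Iterable, Mapping


def chunks_per_shard(chunks: Iterable[Mapping[str, object]]) -> Counter:
    """Count how many chunks each shard currently holds."""
    return Counter(chunk["shard"] for chunk in chunks)


# Hypothetical chunk metadata; only the "shard" field is used.
sample_chunks = [
    {"min": 0, "max": 1000, "shard": "shardA"},
    {"min": 1000, "max": 2000, "shard": "shardB"},
    {"min": 2000, "max": 3000, "shard": "shardA"},
]
distribution = chunks_per_shard(sample_chunks)
```

A heavily skewed count in a summary like this is usually the first hint that the shard key or the chunk balancing deserves a closer look.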
Sharded Cluster Security
Intra-cluster security is maintained and unauthorised access from components blocked through the use of Internal/Membership Authentication. Internal authentication can only be enforced if each mongod in a cluster is initiated with the appropriate security settings.
Shard Local Users
Role-Based Access Control (RBAC) is supported on individual shards and restricts unauthorised access to the data residing on a shard as well as to its operations. To enforce RBAC, each mongod in the replica set should be started with the --auth option.
Each shard has its own shard-local users, which cannot be used on other shards or to connect to the cluster via a mongos.
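The security settings above translate into a handful of mongod start-up options. The Python sketch below assembles such a command line for one shard member. It assumes keyfile-based internal authentication, and the replica set name, file paths and port are hypothetical placeholders rather than values from our deployment.

```python
from typing import List


def shard_mongod_args(replica_set: str, key_file: str,
                      db_path: str, port: int) -> List[str]:
    """Assemble a mongod command line for one member of a shard replica set."""
    return [
        "mongod",
        "--shardsvr",                # run this mongod as a shard member
        "--replSet", replica_set,    # shards are deployed as replica sets
        "--keyFile", key_file,       # shared key for internal authentication
        "--auth",                    # enforce role-based access control
        "--dbpath", db_path,
        "--port", str(port),
    ]


# Hypothetical values, for illustration only.
args = shard_mongod_args("shard0", "/etc/mongo/keyfile", "/data/shard0", 27018)
```

The resulting list can be passed to a process launcher or rendered into a service unit; every member of the cluster needs the same key file for internal authentication to work.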
Why Open Source MongoDB Using Sharding Was Our Choice Of Database Solution
Our client SolarLog is a global leader in the innovation of software and IoT devices for monitoring and managing photovoltaic plants. The company’s solutions are used by 327,313 PV plants worldwide across 138 countries. It operates a huge number of IoT devices, which gather the data from the solar energy plants whose output its products and solutions are designed to optimise.
After partial processing, those large and growing data flows needed to be stored efficiently and remain easily and reliably accessible. Scalability was a requirement, as were self-healing qualities. The load had to be distributed between cluster nodes to ensure high availability and recoverability.
Our team elected to develop the database using open source MongoDB sharding. The easy alternative was a MongoDB enterprise license. However, that would have come at a cost of €100,000 annually before server costs.
Confident that a configuration that used MongoDB’s open source code and sharding for effective distribution across node clusters could be built to meet the project’s requirements in full, that is what we set about implementing.
Problem Solving – What We Learned In The Process Of Configuring Our MongoDB Sharding Solution
No project involving relatively complex custom architecture and development runs without issues, and this database solution was no exception. Initially, the wrong approach to creating backups was taken. In the final version of the solution, the scheme was adjusted to take snapshots of the file system, with database writes blocked at the moment snapshots were captured.
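One way to picture the final backup scheme is a lock, snapshot, unlock sequence. The Python sketch below assumes the blocking is done with MongoDB's fsync lock and fsyncUnlock admin commands, which is our illustration rather than a record of the exact mechanism used in the project. The run_admin_command and take_snapshot callables are hypothetical hooks for a driver call and a file system snapshot tool.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def writes_locked(run_admin_command: Callable[[dict], Any]) -> Iterator[None]:
    """Block writes on a mongod while the body runs, then always unlock."""
    run_admin_command({"fsync": 1, "lock": True})
    try:
        yield
    finally:
        # Runs even if the snapshot fails, so the node is never left locked.
        run_admin_command({"fsyncUnlock": 1})


def snapshot_with_lock(run_admin_command: Callable[[dict], Any],
                       take_snapshot: Callable[[], Any]) -> Any:
    """Take a file system snapshot while writes are blocked."""
    with writes_locked(run_admin_command):
        return take_snapshot()
```

In a sharded cluster a step like this would have to be repeated for each shard, and the chunk balancer would typically be stopped first so that no migrations are in flight during the backup.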
Our solution for SolarLog is not groundbreaking, in the sense that we didn’t do anything with MongoDB open source code that hasn’t been done previously. However, our MongoDB sharding approach to clusters fully met the requirements of the project and highlights how the often weighty expense of enterprise licenses can be avoided with a little thought and innovation.
Paying for enterprise solutions can be the convenient option if the cash is available and the business case supports the overhead. But it is not always necessary, especially in the case of mature, well-developed and well-resourced open source technologies. An open source alternative, especially one that can lead to cost savings of as much as €1 million over the lifespan of a database architecture, is almost always worth at least considering and exploring.