Serverless vs. Hadoop & Containers In The Evolution Of Big Data & AI

Cloud | Updated on October 20, 2020


Serverless For Big Data & AI Apps – The Strengths of Containers Without The Downsides?

If you’re building a Big Data application in 2019, you will inevitably have to choose between Cloud architecture based on containers and Serverless architecture. Here, K&C’s Big Data & AI Consulting and Development team weighs up the pros and cons of the two options.

Both Serverless and containers offer huge cost advantages compared to what went before them. Before Cloud computing, and Serverless after it, the insights afforded by real Big Data analysis were in practice limited to large enterprises because of the infrastructure overheads entailed.

K&C’s experts believe that if a Serverless architecture is not yet always the right solution for Big Data-powered AI applications, the technology is moving in that direction.

In this article we examine why, within the context of the evolution first to Hadoop and then to containers, Serverless is so well suited to Big Data processing, and why it represents the future of app development more generally.

The Evolution Of Big Data Infrastructure – Hadoop to Cloud to Serverless

In 2012, Forbes published a guest post on the rise of Big Data, written by John Bantleman, CEO of database software company Rainstor. The ‘age of Big Data’ was announced. Bantleman wrote:

“We’ve entered the age of Big Data where new business opportunities are discovered every day because innovative data management technologies now enable organizations to analyze all types of data”. 

However, Bantleman quickly moved on to warn that the business opportunities Big Data was opening up would come with costs not yet appreciated. Collecting, storing and processing the huge volumes of semi-structured and unstructured data being generated, and analysing them with AI/machine learning algorithms, requires huge computing resources.

Big Data Challenges

The infrastructure of any Big Data application must meet the following challenges:

  • Data flows from multiple sources and in inconsistent formats, both structured and unstructured, and includes anomalies and unnecessary ‘noise’ data that has to be stripped out.
  • Data flow is likely to be uneven, with traffic spiking or dropping off at various points throughout the day and/or on a seasonal basis.

Before the rise of Cloud and Serverless offered storage and computing resource as a utility service, processing Big Data meant building and maintaining the server infrastructure to do so. That had to be large enough to accommodate peaks of data flow even if they were only occasional.

It is precisely the anomalies, such as peaks and troughs in data flow, that tend to offer the most valuable scientific or commercial insights. In most instances there is therefore little value in infrastructure capacity that can only handle the 90% or 99% of data flow seen the rest of the time. However, sizing for the peaks meant paying for and maintaining expensive infrastructure that spent most of its time idle.

First Hadoop and then Cloud computing changed that. By distributing Big Data sets across many cheap ‘commodity server’ nodes, which combine into a computational resource capable of storing and handling huge data sets, Hadoop significantly lowered the cost of the required bare metal infrastructure.

That 2012 Bantleman Forbes article estimated a Hadoop cluster and distribution facility for Big Data cost around $1 million compared to the $10 million to $100s of millions for enterprise data warehouses. But of course, $1 million is still not pocket change and maintained a barrier to entry that kept most out of Big Data applications.


Next came Cloud computing and containers. Cloud providers such as AWS turned computing power into a service, removing the requirement for major upfront investment in hardware infrastructure, including even the lower-budget overhead that Hadoop represented. The pay-as-you-go and fluidly scalable model of Cloud opened the door for the experimentation and innovation that led to a rich open source development ecosystem.

It has also allowed many young companies using Big Data, which would otherwise have had to contend with much tougher barriers to entry, to grow and flourish.

Cloud Computing meant no upfront investment and only paying for the processing power needed for irregular data flow peaks while they happened.
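The arithmetic behind that difference can be sketched in a few lines. This is a rough illustration only: the hourly load figures and the unit price are hypothetical, and it assumes, for simplicity, that a unit of capacity costs the same per hour in both models, which is rarely true in practice.

```python
def provisioned_cost(hourly_load, cost_per_unit_hour):
    """Owned infrastructure is sized for the busiest hour and paid for every hour."""
    return max(hourly_load) * len(hourly_load) * cost_per_unit_hour


def pay_per_use_cost(hourly_load, cost_per_unit_hour):
    """Pay-as-you-go only bills the capacity actually consumed in each hour."""
    return sum(hourly_load) * cost_per_unit_hour


# Hypothetical day: 23 quiet hours at 10 units of load and one spike at 100 units.
load = [10] * 23 + [100]

fixed = provisioned_cost(load, cost_per_unit_hour=1.0)
usage = pay_per_use_cost(load, cost_per_unit_hour=1.0)

# Share of the peak-sized capacity that is actually used over the day.
utilisation = sum(load) / (max(load) * len(load))

print(f"provisioned for peak: {fixed:.0f}")
print(f"pay per use:          {usage:.0f}")
print(f"utilisation of peak-sized infrastructure: {utilisation:.1%}")
```

With these made-up numbers the peak-sized infrastructure is used at under 14% of its capacity, which is the "redundant most of the time" problem described earlier.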

Big Data-Powered Machine Learning AI Giving Humanity The Knowledge To Change The World

A game changer. Cloud computing’s democratisation of Big Data can be credited as the catalyst for a new technology revolution. One that spans digital technology and biotechnology. Revolutions gathering pace in medicine, pharmaceuticals, finance, commerce, agriculture, food technology and pretty much any other sector you may care to mention are happening because start-ups and SMEs can now afford to build and run Big Data applications.

Machine Learning AI that zeroes in on previously undetectable patterns is suddenly turbocharging new discoveries.

Within a few short decades the world we live in will be unrecognisable. Yes, technology has advanced quickly over the decades before. But what Cloud-powered Big Data and AI will achieve over the next several will be a paradigm shift.

We’ll quite possibly be able to cure a majority of previously incurable diseases and conditions. The human genome and those of other forms of life will be mapped. Autonomous vehicles will reshape the economy and our lifestyles more than most imagine today. Ecommerce will be a truly personalised experience. The list goes on.

Cloud Computing Democratised Big Data By Cost – Serverless Lowers The Skills Bar

But as much as Cloud Computing has knocked down barriers to entry for Big Data and the Machine Learning AI that feeds on it, there is still a bottleneck. Cloud has hugely cut costs and containers, orchestrated by platforms such as Kubernetes, have made building apps more efficient and flexible.

But setting up and maintaining the Cloud container architecture for Big Data AI applications is still very difficult. The main gains are in velocity and time required to maintain architecture. But containers require specialists with very specific as well as deep and wide ranging knowledge and experience. Those specialists are expensive, either to hire ‘off-the-shelf’ or as an investment in further training. If they can be found at all.

The explosion of IoT across pretty much every sector imaginable means huge demand for Big Data and Machine Learning specialists. Everyone is fishing in the same shallow pool of professionals. The result is that, for an employer, hiring developers with the skills needed to build Cloud architecture for Big Data apps is starting to resemble a hunt for hen’s teeth. It’s a big problem.

Luckily, humanity has an unerring knack for innovating and creating to solve big problems. Serverless architectures have been developed as an evolution of Cloud architecture and they are of particular benefit to Big Data and Machine and Deep Learning.

Serverless solutions such as AWS Lambda, Microsoft’s Azure Functions, Google Cloud Functions and IBM’s OpenWhisk take on much of the heavy lifting of Cloud architecture development. They provide ‘components’ for most of the functionality that is common to different apps. That means only truly unique functionality has to be custom-coded and deployed from scratch. The rest, from user authentication to structuring and sorting data, can be done with ‘plug-and-play’ components built and tested by Cloud vendors with huge resources and offered as a service: ‘Function-as-a-Service’.
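To give a feel for how little custom code a single function needs, here is a minimal sketch of a Python function written to the AWS Lambda handler convention of (event, context). The shape of the event (a "records" list of dictionaries with a "value" field) is an assumption made for illustration, not something any provider prescribes.

```python
import json


def handler(event, context):
    """Lambda-style entry point: strip 'noise' records and report what was kept.

    The event layout used here is hypothetical.
    """
    records = event.get("records", [])

    # Keep only well-formed records that actually carry a value.
    cleaned = [
        record
        for record in records
        if isinstance(record, dict) and record.get("value") is not None
    ]

    summary = {"kept": len(cleaned), "dropped": len(records) - len(cleaned)}
    return {"statusCode": 200, "body": json.dumps(summary)}
```

Provisioning, scaling and routing requests to the function are handled by the platform, so the function body is the only part the team has to write and maintain.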

That means that organisations building Big Data applications no longer need anywhere near the same number of DevOps experts that represent the very highest level of specialist knowledge and experience.

Serverless furthers the democratisation of Big Data and AI by lowering the development resource barrier to entry, taking up the baton from Cloud and containers, which lowered infrastructure costs.

Organisations can now rely on the expertise of the developers at AWS, Google or Microsoft for most of an app, instead of having to find and train that expertise themselves and then find a way to stop it flying the coop.

[Figure: Hadoop, Cloud and Serverless application costs compared]

Serverless As The Future Of Cloud Development

RightScale’s 2018 State of the Cloud report shows Serverless is the fastest-growing cloud service model, with annual growth of 75%. In early 2017, AWS put its own Serverless growth at over 300% year-on-year. As the Serverless ecosystem develops, adoption growth can be expected to multiply from even that runaway rate.

Because the components that Serverless architecture is built from are developed by some of the best engineers in the world, and thoroughly tested, they automate many of the biggest challenges of Big Data yet to be fully addressed by container-based Cloud architecture. Namely:

API Calls: specialist Serverless APIs validate requests from different entities, preventing unauthorised access to data without requiring developers with a specialist security background. Latency is addressed by developers optimising database tables so that the most frequently used data sits in ‘hotter’ areas from which it can be retrieved with minimal delay. Use of Serverless resources can be further optimised by assigning less frequently used tables a lower read capacity.
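The read capacity point can be illustrated with a small planning function. This is a sketch under stated assumptions: the table names, read counts and the proportional allocation rule are all hypothetical, and the article does not name a specific database. On a store with provisioned throughput, such as Amazon DynamoDB in provisioned mode, a plan like this could then be applied through the table's throughput settings.

```python
def plan_read_capacity(reads_per_table, total_units, minimum_units=1):
    """Split a read-capacity budget across tables in proportion to observed reads.

    Every table gets at least `minimum_units`, so rarely used tables stay
    readable. Because of rounding and the minimum, the result may not sum
    exactly to `total_units`.
    """
    total_reads = sum(reads_per_table.values())
    plan = {}
    for table, reads in reads_per_table.items():
        share = reads / total_reads if total_reads else 0
        plan[table] = max(minimum_units, round(total_units * share))
    return plan


# Hypothetical daily read counts for three tables.
observed_reads = {"sessions": 9000, "audit_log": 900, "archive": 100}
plan = plan_read_capacity(observed_reads, total_units=100)

for table, units in plan.items():
    print(f"{table}: {units} read capacity units")
```

The hot "sessions" table receives most of the budget, while the rarely read "archive" table is kept at the minimum.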

Compatibility: Serverless is helping lower the risk of Cloud-vendor lock-in as well-built Serverless architectures can be made compatible (vendor agnostic) with every major provider. All that is required is for vendor-native components to be switched, such as AWS Lambda for Azure Functions or Google Cloud Functions.
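One common way to keep that switch cheap is to hold the business logic in a vendor-neutral function and wrap it in thin, provider-specific adapters. The sketch below is illustrative only: `summarise` and the "readings" payload are hypothetical, the AWS adapter assumes an API Gateway proxy-style event whose "body" is a JSON string, and the Google adapter assumes the Flask-style request object passed to HTTP-triggered Cloud Functions.

```python
import json


def summarise(readings):
    """Vendor-neutral business logic: count and average a list of readings."""
    values = [float(value) for value in readings]
    if not values:
        return {"count": 0, "mean": None}
    return {"count": len(values), "mean": sum(values) / len(values)}


def aws_lambda_handler(event, context):
    """Adapter for AWS Lambda behind an API Gateway proxy integration."""
    payload = json.loads(event.get("body") or "{}")
    result = summarise(payload.get("readings", []))
    return {"statusCode": 200, "body": json.dumps(result)}


def gcf_http_handler(request):
    """Adapter for an HTTP-triggered Google Cloud Function (Flask-style request)."""
    payload = request.get_json(silent=True) or {}
    result = summarise(payload.get("readings", []))
    # Flask accepts a (body, status, headers) tuple as a response.
    return json.dumps(result), 200, {"Content-Type": "application/json"}
```

Moving between providers then means swapping the adapter and the deployment configuration, while `summarise` and its tests stay untouched. Vendor-specific services the function calls, such as storage or queues, would still need their own abstraction.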

The ecosystem of Serverless plug-ins for esoteric tools like SSH is rapidly expanding as is that for open-source components. Cloud vendor migration is still not without challenges but as Serverless adoption continues, barriers will continue to fall. This will, of course, have further knock-on effects around pricing models, which will continue to become more competitive, further opening up access to the use of Big Data and Big Data-powered AI. 

K&C – Your Munich-Based Serverless Application Development Experts

Krusche & Company has built up a reputation as one of Munich, Germany and Europe’s most trusted IT services providers over more than 20 years. We strongly believe in the Serverless future of Cloud-based applications and have been early adopters in preparing our DevOps experts for that.

Whether your needs are in the field of Big Data and Machine Learning or any other kind of application that could benefit long term from a Serverless approach, we’re on hand to help. We offer consultancy as well as the provision of dedicated teams of developers or team extensions to cover your skills or resources gaps.

Please do get in touch with any Serverless application development needs or questions you might have. We’d be delighted to help.
