ClusterAPI is becoming the standard approach for deploying Kubernetes clusters no matter which infrastructure provider you want to run on, whether that is a public cloud like AWS, Azure, GCP, Oracle, or Akamai, a virtualization platform like vSphere, Proxmox, or OpenStack, or bare metal servers using a management layer like Metal3, Tinkerbell, or Canonical MAAS.
ClusterAPI is not only a great option used directly by many consumers; it is also a key foundation for the majority of today's commercial multi-cloud Kubernetes distributions, such as Tanzu Kubernetes Grid (VMware by Broadcom), EKS Anywhere (AWS), Rancher (SUSE), Anthos (Google), NKP (Nutanix), Palette (SpectroCloud), and many more.
ClusterAPI offers a way to define and manage our Kubernetes clusters declaratively using CRDs defined by the project and by the different providers, making it an extensible yet standardized platform that simplifies the management of Kubernetes clusters across a multitude of different targets. Currently there are 32 official infrastructure providers for ClusterAPI, 9 bootstrap providers, and 10 control plane providers, allowing for a wide range of support for deploying clusters exactly as you require in your own environments.
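To give a flavor of this declarative model, the sketch below shows a minimal `Cluster` resource linking a control plane object to an infrastructure object. The names, network CIDR, and the choice of the Docker infrastructure provider are purely illustrative:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: demo-cluster          # hypothetical cluster name
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  # The control plane provider (here kubeadm-based) manages the control plane machines
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: demo-control-plane
  # The infrastructure provider resource encapsulates all provider-specific details
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerCluster
    name: demo-docker-cluster
```

Swapping infrastructure targets is largely a matter of swapping the `infrastructureRef` (and its backing resources) while the rest of the model stays the same.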
Workload autoscaling in the Kubernetes world is a very common and beneficial feature used in most Kubernetes environments, either via built-in mechanisms like the Horizontal Pod Autoscaler (HPA), or via additional tooling such as VPA or my personal favorite, KEDA, which enables event-driven autoscaling of our workloads and is becoming a de-facto standard in this space.
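For example, the built-in HPA is configured with a small resource like the following (the Deployment name and thresholds here are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:            # the workload to scale (hypothetical Deployment)
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70%
```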
Another type of autoscaling which is critical when it comes to Kubernetes is Cluster Autoscaling. While workload autoscaling is a very mature space with great tools and practices, the Cluster Autoscaling world is much more challenging.
Workload autoscaling is agnostic to the underlying infrastructure and the specific Kubernetes distribution being used, whereas cluster autoscaling is highly coupled to both, making it a much more fragmented space with challenges that differ based on your specific setup.
Let's take AWS vs Proxmox as an example. On AWS we have cloud concepts such as Auto Scaling groups (ASGs), which can be used by autoscaling solutions such as the Cluster Autoscaler project, an official sub-project of the Kubernetes SIG Autoscaling. While this works great on AWS, Proxmox does not work that way and no similar concept exists, making cluster autoscaling a much bigger challenge.
This is where ClusterAPI can offer a huge benefit. Cluster Autoscaler has a provider-based architecture, and while the majority of its providers are different cloud providers, one of them is actually ClusterAPI!
This means that if our clusters are deployed and managed using ClusterAPI, we can add autoscaling capabilities to our cluster without any infrastructure provider dependencies. This works because the ClusterAPI provider simply treats the ClusterAPI resources (specifically MachineDeployments and MachinePools) as the “cloud provider” and delegates the actual creation of the machines, and the interaction with the specific infrastructure provider, to ClusterAPI, providing a seamless, cloud-agnostic interface for Cluster Autoscaler.
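In practice, opting a node group into the ClusterAPI provider mostly comes down to annotating the MachineDeployment (or MachinePool) with the autoscaler's min/max size annotations. A sketch, with hypothetical names and the Docker provider used purely for illustration:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: demo-workers
  annotations:
    # Bounds within which Cluster Autoscaler may scale this MachineDeployment
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
spec:
  clusterName: demo-cluster
  # spec.replicas is intentionally left unset so the autoscaler owns the count
  template:
    spec:
      clusterName: demo-cluster
      version: v1.31.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: demo-workers
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: demo-workers
```

Cluster Autoscaler then adjusts the MachineDeployment's replica count, and ClusterAPI turns that into real machines on whatever infrastructure backs the cluster.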
This approach has proven very successful and has benefited many environments. However, as we all know, the Kubernetes landscape never rests, and newer, shinier, and more advanced technologies keep arising, often changing the focus and direction the industry takes in a specific area.
Cluster autoscaling is one of these areas, with emerging technologies gaining a lot of traction as they try to solve challenges seen with the traditional Cluster Autoscaler. The key project we see gaining traction here is Karpenter.
Karpenter, which was originally developed by AWS and is now an official sub-project of Kubernetes managed under SIG Autoscaling, provides new and exciting advancements in the cluster autoscaling space that can lower operational overhead, improve resource utilization, and reduce costs.
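To give a flavor of the model, Karpenter replaces pre-defined node groups with `NodePool` objects that describe constraints on the nodes it is allowed to provision, and it picks concrete instance types on the fly. A sketch of an AWS NodePool (all names, limits, and values are illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Constraints, not fixed instance types: Karpenter chooses what fits
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:          # provider-specific node configuration (AWS here)
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"               # cap total CPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m     # actively repack and remove underutilized nodes
```

The `disruption` block is a good example of the operational-overhead and cost benefits mentioned above: consolidation is built in rather than bolted on.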
While Karpenter is extremely interesting and is gaining huge traction on AWS, as a newer project it does not support nearly as many cloud and infrastructure providers. Currently only AWS and Azure have production-ready providers for Karpenter, making it a great solution in either of those clouds but irrelevant if you are running anywhere else.
This is an area the ClusterAPI community has been looking at for the past few months, and we now have an alpha implementation of a Karpenter provider for ClusterAPI which, just like the Cluster Autoscaler provider, provides a cloud-agnostic interface for autoscaling any Kubernetes cluster deployed and managed via ClusterAPI!
While this provider is under heavy development and in very early stages, it is a huge step in the right direction: the start of a very interesting and exciting path towards making Karpenter much more accessible to the masses, and a further case for the major benefits of ClusterAPI-based cluster management!
The Karpenter provider for ClusterAPI is currently in the process of being migrated to the Kubernetes SIGs GitHub organization under the umbrella of SIG Cluster Lifecycle, the SIG in charge of ClusterAPI.
The repo is currently located here, the final URL should be this, and the status of the migration can be tracked here.
I am extremely excited to see where we as a community take this provider in the near future, and as a maintainer of the repo, I welcome all comments, issues, feedback, and of course PRs to help us build the best provider and solution for the wider community!
Another interesting space I have been investing time in, investigating and evangelizing throughout the community (most recently at OSS Summit Europe last week in Vienna), is the potential of using KEDA for cluster autoscaling. This is already possible with ClusterAPI-based clusters, as KEDA can autoscale any Kubernetes resource that implements the /scale subresource, which ClusterAPI MachineDeployments do!
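To make this concrete, here is a sketch of a KEDA `ScaledObject` pointing its `scaleTargetRef` directly at a MachineDeployment, using a simple cron trigger as an illustrative example (the names and schedule are hypothetical):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: machinedeployment-scaler
spec:
  scaleTargetRef:
    # Works because MachineDeployment exposes the /scale subresource
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    name: demo-workers
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: cron             # pre-scale the cluster for business hours
      metadata:
        timezone: Europe/Vienna
        start: "0 8 * * *"
        end: "0 18 * * *"
        desiredReplicas: "5"
```

Because KEDA supports dozens of scalers (queues, metrics systems, cron, and more), the same pattern can drive node capacity from signals far richer than pending pods.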
Using KEDA as a cluster autoscaler is still very much uncharted territory with some clear rough edges, but the potential is huge, and the door it opens for predictive autoscaling, instead of just reactive autoscaling, is extremely interesting to me. I hope to continue working in this area too, in order to create more solutions and options for the community in this space!
If you are interested in the KEDA + ClusterAPI possibilities, you can check out the slides from my recent talk at OSS Summit, or reach out directly!
