Karpenter is a high-performance, flexible, open-source Kubernetes cluster autoscaler designed for AWS. It improves application availability and cluster efficiency by rapidly launching right-sized compute resources in response to changing application load.
Karpenter’s flexibility, ease of use, granular control and high level of automation are a significant upgrade over Cluster Autoscaler — helping you to more quickly adjust resources, efficiently scale, and continuously optimize.
With the recent release of Karpenter 1.0 General Availability (GA), it’s the perfect time to adopt Karpenter.
It only takes 20 minutes on average to migrate
Migrating to Karpenter via EKS managed node groups or Fargate is straightforward and involves minimal disruption, thanks to its compatibility with existing Kubernetes clusters and its ability to leverage standard Kubernetes resources.
For detailed instructions, check out our complete ebook guide to migrating to Karpenter.
The Ultimate Guide to Adopting Karpenter
Best Practices For Setting Up Karpenter
Here at nOps, we are huge fans of Karpenter and have helped many teams make the transition. Here are our best practices to keep in mind as you set up Karpenter.
Use Fargate or a Dedicated Node Group
Run the Karpenter controller itself on Fargate or on a small dedicated node group, so it never schedules onto capacity that it manages and might later disrupt.
Use Pod Disruption Budgets
Karpenter voluntarily drains and terminates nodes as part of its normal operation (consolidation, expiration, drift). The only way to keep services reliable through these disruptions is to tell Kubernetes how much disruption each Deployment or StatefulSet can tolerate. Refer to the Kubernetes documentation for more information.
Disruption budgets are critical for maintaining application availability during updates or scaling operations by limiting the number of pods that can be disrupted at any given time. They help prevent service downtime by avoiding the simultaneous termination of too many pods. Additionally, disruption budgets balance maintenance and stability, allowing necessary updates while keeping a minimum number of pods running, ensuring a reliable and stable Kubernetes environment.
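As a minimal sketch, a PodDisruptionBudget for a hypothetical Deployment labeled `app: checkout` (the name and label are placeholders) might look like this:

```yaml
# Keep at least 2 replicas of the hypothetical "checkout" app available
# while Karpenter drains or replaces nodes.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb
spec:
  minAvailable: 2          # or use maxUnavailable instead
  selector:
    matchLabels:
      app: checkout        # must match the Deployment's pod labels
```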
Avoid Custom Launch Templates
Karpenter guidelines recommend avoiding custom launch templates, since they don’t support automatic node upgrades, multi-architecture deployments, or securityGroup discovery.
Instead of launch templates, you can supply custom user data or reference custom AMIs directly in your EC2NodeClass (formerly AWSNodeTemplate) resources.
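For illustration, here is a rough EC2NodeClass sketch assuming the Karpenter v1 API; the role name, discovery tag, and cluster name are placeholders, and the commented-out line shows where a custom AMI ID could be pinned:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole-my-cluster          # placeholder IAM role
  amiSelectorTerms:
    - alias: al2023@latest                    # EKS-optimized AMI, kept current
    # - id: ami-0123456789abcdef0             # or pin a custom AMI directly
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster    # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster    # security groups found by tag, not launch template
```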
Configure Node Expiration On Your NodePool
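Node expiration caps how long any node lives; once a node reaches the limit, Karpenter drains and replaces it, which keeps AMIs and configuration from going stale. A minimal sketch, assuming the v1 API where `expireAfter` sits under the NodePool template spec (earlier beta APIs placed it under `disruption`):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      expireAfter: 720h        # recycle nodes after roughly 30 days
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```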
Set Up NodePools According To Your Workload Types
Create dedicated NodePools for GPU workloads separate from general-purpose compute, so specialized and standard workloads each land on appropriate instance types.
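As an illustrative sketch (the instance categories and taint are assumptions, not prescriptions), a GPU NodePool can taint its nodes so that only pods tolerating the taint land there, while a separate general-purpose NodePool serves everything else:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule                       # only GPU workloads tolerate this
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["g", "p"]                       # GPU instance families
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```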
Set Up A Large Variety Of Instance Types For Better Availability
One of Karpenter’s main benefits is just-in-time capacity: it chooses the instance type that best fits your pending workloads. To leverage this feature fully, you need to allow a wide range of instance types in your NodePool requirements.
If you limit the instance types too narrowly, you won’t be able to maximize the benefits of using Karpenter.
Enabling broader instance type usage enhances availability by allowing Karpenter to choose from a wider range of instances, ensuring that capacity can always be found, even during high demand periods. This flexibility reduces the chances of resource shortages and improves cluster resilience. Additionally, using diverse instance types optimizes spot instance utilization by leveraging varying prices and availability of different spot markets. This approach not only reduces costs but also increases the likelihood of securing Spot instances, providing both economic and operational benefits for Kubernetes clusters managed by Karpenter.
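In practice, this means keeping NodePool requirements broad and only excluding what you know you don’t want. A sketch of such requirements (the exact categories, sizes, and generations are illustrative):

```yaml
# Inside spec.template.spec of a NodePool: broad requirements let Karpenter
# choose from many instance types and Spot capacity pools.
requirements:
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]
  - key: karpenter.k8s.aws/instance-cpu
    operator: In
    values: ["4", "8", "16", "32"]
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["2"]                        # skip very old generations
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
```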
Use Spot Instances With Interruption Handling
Spot instances provide significant cost savings compared to On-Demand instances, but AWS can reclaim them with only a two-minute notice when it needs the capacity back.
Enabling interruption handling lets Karpenter respond to involuntary events that would otherwise disrupt workloads: Spot interruption notices, scheduled maintenance events, and instance stopping or terminating events. Karpenter cordons, drains, and replaces the affected nodes before the disruption hits.
To enable it, point Karpenter at the SQS queue that receives these events via the interruption-queue setting (settings.interruptionQueue in recent Helm charts; aws.interruptionQueueName in older releases). If you do so, make sure you are not also running AWS Node Termination Handler, as the two will contend for the same events.
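A minimal sketch of the relevant Helm values, assuming you have already created the SQS queue and EventBridge rules (the cluster and queue names are placeholders):

```yaml
# values.yaml for the Karpenter Helm chart
settings:
  clusterName: my-cluster                   # placeholder
  interruptionQueue: Karpenter-my-cluster   # SQS queue that receives Spot/health events
```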
Specify resources for your deployments/pods
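Karpenter sizes and selects nodes based on the resource requests of pending pods, so every container should declare them; without requests, Karpenter cannot make sensible provisioning or bin-packing decisions. A sketch with placeholder values:

```yaml
# Inside each container spec of your Deployment or pod (values are illustrative).
resources:
  requests:
    cpu: "500m"          # Karpenter provisions capacity based on these requests
    memory: 512Mi
  limits:
    memory: 1Gi          # memory limit guards against node-level memory pressure
```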
Distribute pods across multiple nodes and zones
Distributing pods across multiple nodes and zones enhances resilience and availability in Kubernetes applications. By spreading pods, the risk of a single point of failure affecting service is minimized, as workloads can continue running on other nodes or zones if one fails.
Karpenter automates this distribution by provisioning nodes across availability zones as workloads demand, balancing load and optimizing resource usage. When your pods declare topology spread constraints, Karpenter honors them when launching capacity, so pods are placed according to your rules, preventing resource contention and keeping services running even during node or zone disruptions.
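For example, a topology spread constraint in the pod template (the `app: checkout` label is a placeholder) asks the scheduler, and therefore Karpenter, to balance replicas across zones and nodes:

```yaml
# In the pod template spec of a Deployment
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone    # spread across availability zones
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: checkout
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname         # spread across individual nodes
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: checkout
```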
Best Practices For Migrating to Karpenter
Migrating to Karpenter is straightforward, taking only 20 minutes on average. You simply install Karpenter on your cluster, configure a few provisioning specifications based on your needs, and it seamlessly takes over the node provisioning process with minimal disruption to your existing operations.
For detailed instructions and a full list of best practices, check out our complete ebook guide to adopting Karpenter.
Best Practices For Configuring Karpenter
Prioritize Savings Plans and/or Reserved Instances
If you already hold Savings Plans or Reserved Instances, steer workloads onto the instance families they cover before reaching for Spot or On-Demand, so existing commitments don’t go unused.
Split Between On-Demand & Spot Instances
This configuration allows you to create a mixed instance setup where a specific percentage of your EKS nodes run on On-Demand instances, while the remaining portion runs on Spot instances. This setup is ideal for workloads that can tolerate interruptions and benefit from the cost savings of Spot instances.
To do this, create one NodePool for Spot and one for On-Demand with disjoint values for a unique new label such as capacity-spread, then use the label values to set the ratio. For a roughly 20/80 On-Demand/Spot split, you could assign the values ["2", "3", "4", "5"] to the Spot NodePool and ["1"] to the On-Demand NodePool, and have your workloads spread across the capacity-spread label with a topology spread constraint.
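A rough sketch of that 20/80 split, trimmed to the relevant fields and assuming the v1 API (the `capacity-spread` key is an arbitrary label name):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: capacity-spread                 # owns 4 of the 5 spread domains
          operator: In
          values: ["2", "3", "4", "5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: capacity-spread                 # owns 1 of the 5 spread domains
          operator: In
          values: ["1"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```

Workloads then add a topology spread constraint with `topologyKey: capacity-spread` and `maxSkew: 1`, so their replicas distribute evenly across the five label values and roughly 80% of the capacity ends up on Spot.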
Tip: Balancing between Savings Plans, Reserved Instances, and Spot is extremely difficult to do manually.
Your workloads are changing every minute — trying to keep up with all the changes is a full-time job. But with the power of AI, you can stay continually optimized for minimal effort.
nOps Compute Copilot ensures all of your compute is on the most cost-effective capacity at all times, whether that’s RIs, SPs, or Spot. As the market and your utilization shift, it adjusts workload placement to maximize savings.
Fully utilize all of your Reserved Instances and Savings Plans every month and never over-commit again. nOps backs you with a 100% utilization guarantee (or you get a refund).
Book a demo to find out how easy it is to do RI, SP and Spot with nOps.
Protecting Batch Jobs During the Disruption (Consolidation) Process
This feature addresses the need to safeguard long-running batch jobs from being disrupted during the node consolidation process managed by Karpenter. Consolidation is a process where Karpenter identifies underutilized nodes that can be removed or replaced to reduce cluster costs. However, this process can disrupt running pods, including critical batch jobs.
By adding the `karpenter.sh/do-not-disrupt: "true"` annotation to a pod, you can protect it from being moved or interrupted until its work is complete, ensuring batch jobs run to completion without interference.
Alternatively, you can configure the NodePool’s `disruption` block, combining `consolidationPolicy` with `consolidateAfter`.
The `disruption` block tells Karpenter which nodes are eligible for consolidation and when; you can also effectively disable consolidation by setting `consolidateAfter: Never`.
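Sketches of both options follow; the Job name, image, policy, and durations are illustrative, not prescriptive:

```yaml
# Option 1: annotate the pod template of a batch Job so Karpenter will not
# voluntarily disrupt its node while the pod is running.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report                     # hypothetical job
spec:
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-disrupt: "true"
    spec:
      restartPolicy: Never
      containers:
        - name: report
          image: my-registry/report:latest # placeholder image
---
# Option 2: tune consolidation behavior on the NodePool instead.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch
spec:
  disruption:
    consolidationPolicy: WhenEmpty   # only consolidate nodes with no running pods
    consolidateAfter: 10m            # wait 10 minutes before reclaiming empty nodes
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```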
Advanced Best Practices
Update Nodes Using Drift
When you change a NodePool or EC2NodeClass (for example, rolling out a new AMI), Karpenter detects that existing nodes have drifted from the desired specification and gradually drains and replaces them, keeping the fleet up to date without manual node cycling.
Customizing Nodes with Your Own User Data Automation
By using the userData field in the EC2NodeClass, you can apply additional configuration to worker nodes at launch without deviating from the standard AWS EKS optimized AMI. This can include tasks like modifying Kubernetes settings, mounting volumes, or running specific startup scripts.
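For instance, a rough sketch assuming the AL2 AMI family, whose user data is plain shell that Karpenter merges ahead of the standard EKS bootstrap (AL2023 expects the nodeadm NodeConfig format instead); the role, tag, and sysctl tweak are placeholders:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: custom-userdata
spec:
  role: KarpenterNodeRole-my-cluster        # placeholder IAM role
  amiSelectorTerms:
    - alias: al2@latest                     # AL2 family: plain-shell user data
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster  # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  userData: |
    #!/bin/bash
    # Hypothetical extra setup, run before the standard EKS bootstrap:
    sysctl -w vm.max_map_count=262144
```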
Overprovision Capacity in Advance to Increase Responsiveness
This strategy is designed to ensure that you have immediate availability of compute resources when needed by preemptively provisioning extra capacity. This is particularly useful for scenarios where you know in advance that a large number of pods will need to be launched simultaneously, such as during data pipeline processing. By overprovisioning capacity ahead of time, you can significantly reduce the time it takes for your actual workloads to start, improving overall responsiveness and performance.
A sensible percentage might be 10-20% for mission-critical production environments.
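One common way to implement this (the names, replica count, and sizes are placeholders) is a low-priority "pause pod" Deployment: it holds spare capacity on Karpenter-provisioned nodes, and the scheduler preempts the pause pods the moment real workloads need the room, while Karpenter replenishes the headroom in the background:

```yaml
# Negative-priority class so headroom pods are always preempted first.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: "Placeholder pods that real workloads may preempt"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2                        # tune to the headroom you want (e.g. 10-20%)
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"               # each replica reserves 1 vCPU and 1 GiB
              memory: 1Gi
```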
Karpenter + nOps is Even Better
You can get the most out of Karpenter with nOps. nOps continuously manages Karpenter for you, providing the most efficient, reliable and cost-effective operations for less manual effort.
- Complete EKS visibility: Allocate 100% of your unified AWS spend, see the efficiency of your clusters, drill down to the container level, and more.
- AI-Powered Continuous Cost Savings: nOps Copilot is aware of all your purchase commitments and the Spot market, so you automatically get the best performance at the lowest costs.
- nOps is Invested in Karpenter’s Success: nOps has been optimizing and working with the Karpenter community since early beta versions and will continue to support the latest updates as they are released.
Karpenter + nOps are better together
nOps Compute Copilot built on Karpenter makes it simple for users to maintain stability and optimize resources efficiently.
New to Karpenter? No problem! The Karpenter experts at nOps will help you navigate Karpenter migration. We also support other autoscaling technologies like Cluster Autoscaler and ASGs.
nOps was recently ranked #1 in G2’s cloud cost management category and we optimize over $1.5 billion in cloud spend for our customers. Book a demo to find out how to save in just 10 minutes!