Karpenter has emerged as the most advanced node scheduling technology for EKS in the current market. And the Karpenter team has been hard at work making further improvements to this groundbreaking open-source Kubernetes autoscaler.

With Karpenter recently graduating to beta, there are many breaking changes to consider as you upgrade. If you’re looking to find out what these changes mean for you and your team, we’ve got you covered.

We went through the full list of Karpenter updates over the past 6 months and picked out the most important features, changes, and fixes to discuss, including the release of v1beta1 of the API, added support for managing nodes’ InstanceProfiles, improved validation for some custom resources, and more.

Read on to find out what you’re getting yourself into with the newest Karpenter updates.

Key changes

Let’s start with the most important recent changes:

Release of v1beta1 API

For us, the biggest breaking change in Karpenter is the rollout of the v1beta1 API and the removal of the v1alpha5 API. Starting at version 0.33, Karpenter no longer recognizes the earlier API, so it is critical to update your IaC definitions before upgrading to that version.

This update changes the terminology used in Karpenter custom resource definitions (CRDs). This means you’ll need to convert your Provisioners into NodePools, your NodeTemplates into EC2NodeClasses, and your Machines into NodeClaims.

The team has released a command-line tool written in Go called karpenter-convert that automates the conversion of your existing manifests. Version 0.32.x still accepts the old APIs in deprecated form, while 0.33.x no longer accepts them, so step through 0.32.x and finish the migration there before moving on to 0.33.x. The official v1beta1 migration guide is available in the Karpenter documentation. We’re really excited about this step towards a stable API!
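Before running the conversion, it can help to take stock of which manifests still use the old kinds. The snippet below is a minimal sketch of such an audit, not a replacement for karpenter-convert. It assumes your manifests are already parsed into Python dictionaries (for example by a YAML loader), and the helper name find_legacy_resources is our own invention. Note that the NodeTemplate kind is spelled AWSNodeTemplate in manifests.

```python
# Legacy kinds and their v1beta1 replacements, as described in this section.
LEGACY_TO_V1BETA1 = {
    "Provisioner": "NodePool",
    "AWSNodeTemplate": "EC2NodeClass",
    "Machine": "NodeClaim",
}


def find_legacy_resources(manifests):
    """Return (name, old_kind, new_kind) for every manifest using a legacy kind.

    `manifests` is an iterable of already-parsed manifest dictionaries.
    This helper is hypothetical and only reports; it does not convert anything.
    """
    findings = []
    for doc in manifests:
        kind = doc.get("kind")
        if kind in LEGACY_TO_V1BETA1:
            name = doc.get("metadata", {}).get("name", "<unnamed>")
            findings.append((name, kind, LEGACY_TO_V1BETA1[kind]))
    return findings


if __name__ == "__main__":
    sample = [
        {"apiVersion": "karpenter.sh/v1alpha5", "kind": "Provisioner",
         "metadata": {"name": "default"}},
        {"apiVersion": "karpenter.sh/v1beta1", "kind": "NodePool",
         "metadata": {"name": "already-migrated"}},
    ]
    for name, old, new in find_legacy_resources(sample):
        print(f"{name}: {old} should become {new}")
```

Anything the audit reports is a candidate for karpenter-convert; resources already on v1beta1 kinds are left out of the report.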

New IAM Rules

There are changes to tagging and InstanceProfile management in 0.33. The controller’s IAM policy needs to change in two ways: the tags that Karpenter is allowed to operate on must match the new terminology, and Karpenter must be able to create and attach InstanceProfiles.

  • For EC2: policies are tag-based, so the ec2:RunInstances, ec2:CreateFleet, and ec2:CreateLaunchTemplate permissions must allow the new karpenter.sh/nodepool tag in addition to any previous tags such as karpenter.sh/provisioner-name.
  • For IAM: InstanceProfile generation is now handled by Karpenter, so it needs the iam:CreateInstanceProfile, iam:AddRoleToInstanceProfile, iam:RemoveRoleFromInstanceProfile, iam:DeleteInstanceProfile, and iam:GetInstanceProfile permissions. These should be limited to InstanceProfiles carrying the tags that the controller is responsible for managing.

We found the example policy statement in the Karpenter docs to be a great basis for our own work.
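To make the two bullets above more concrete, here is a rough sketch that builds the corresponding policy statements in Python. It is not the official Karpenter policy. The Sid values, the resource ARNs, and the use of a kubernetes.io/cluster/<name> ownership tag to scope the IAM actions are all our assumptions, and a working policy needs further statements (for example, access to subnets, security groups, and AMIs) that are omitted here.

```python
import json


def karpenter_iam_statements(cluster_name):
    """Build illustrative IAM statements; NOT the official Karpenter policy."""
    # Assumed ownership tag used to scope InstanceProfile actions.
    cluster_tag = f"kubernetes.io/cluster/{cluster_name}"
    return [
        {
            "Sid": "AllowTaggedEC2Creation",  # hypothetical Sid
            "Effect": "Allow",
            "Action": [
                "ec2:RunInstances",
                "ec2:CreateFleet",
                "ec2:CreateLaunchTemplate",
            ],
            # Only resources that are tagged at creation time; other resources
            # used by these calls need a separate statement (omitted).
            "Resource": [
                "arn:aws:ec2:*:*:instance/*",
                "arn:aws:ec2:*:*:volume/*",
                "arn:aws:ec2:*:*:fleet/*",
                "arn:aws:ec2:*:*:launch-template/*",
            ],
            # The request must carry the new nodepool tag.
            "Condition": {
                "StringLike": {"aws:RequestTag/karpenter.sh/nodepool": "*"}
            },
        },
        {
            "Sid": "AllowTaggedInstanceProfileCreation",  # hypothetical Sid
            "Effect": "Allow",
            "Action": ["iam:CreateInstanceProfile"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {f"aws:RequestTag/{cluster_tag}": "owned"}
            },
        },
        {
            "Sid": "AllowTaggedInstanceProfileChanges",  # hypothetical Sid
            "Effect": "Allow",
            "Action": [
                "iam:AddRoleToInstanceProfile",
                "iam:RemoveRoleFromInstanceProfile",
                "iam:DeleteInstanceProfile",
            ],
            "Resource": "*",
            # Only profiles that already carry the ownership tag.
            "Condition": {
                "StringEquals": {f"aws:ResourceTag/{cluster_tag}": "owned"}
            },
        },
        {
            "Sid": "AllowInstanceProfileRead",  # hypothetical Sid
            "Effect": "Allow",
            "Action": ["iam:GetInstanceProfile"],
            "Resource": "*",
        },
    ]


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": karpenter_iam_statements("my-cluster"),
    }
    print(json.dumps(policy, indent=2))
```

Treat the output as a starting point for comparison against the example policy in the Karpenter docs rather than something to apply as-is.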

Drift Detection

Version 0.30 adds drift detection for a number of fields most cluster owners would prefer to remain static. This allows Karpenter to replace nodes that have diverged from the NodeTemplate assigned to their node pool. The fields checked for drift are:

  • Instance Profile
  • AMI Family
  • UserData
  • Tags
  • Metadata Options
  • Block Device Mappings
  • Detailed Monitoring
  • Context

If nodes are found to have drifted, they will be replaced as described in the Karpenter docs. As always, this will be respectful of Pod Disruption Budgets and other constraints to voluntary evictions.
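One way to picture drift detection on static fields is to fingerprint the template when a node launches and compare fingerprints later. The sketch below illustrates that idea only; it is our assumption about the general technique, not a description of Karpenter's internals. The function names are hypothetical, and the field names are camelCase spellings of the list above.

```python
import hashlib
import json

# Static fields from the list above, written as camelCase keys (assumed).
STATIC_FIELDS = (
    "instanceProfile",
    "amiFamily",
    "userData",
    "tags",
    "metadataOptions",
    "blockDeviceMappings",
    "detailedMonitoring",
    "context",
)


def static_hash(template_spec):
    """Fingerprint only the static fields of a template spec (a dict)."""
    subset = {field: template_spec.get(field) for field in STATIC_FIELDS}
    # sort_keys makes the serialization, and so the hash, order-independent.
    payload = json.dumps(subset, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def has_drifted(current_template_spec, hash_recorded_at_launch):
    """A node has drifted if the template's static fields changed since launch."""
    return static_hash(current_template_spec) != hash_recorded_at_launch


if __name__ == "__main__":
    at_launch = {"amiFamily": "AL2", "tags": {"team": "platform"}}
    recorded = static_hash(at_launch)

    updated = dict(at_launch, amiFamily="Bottlerocket")
    print(has_drifted(at_launch, recorded))  # unchanged template
    print(has_drifted(updated, recorded))    # amiFamily was edited
```

Fields outside STATIC_FIELDS do not affect the fingerprint, so in this sketch editing them alone would never mark a node as drifted.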

CEL Validation

In 0.32, Karpenter changed the way it validates objects that belong to a few of its CRDs, moving to rules written in the Common Expression Language (CEL) that Kubernetes supports for CRD validation. This affects NodeClaim requirements, EC2NodeClass, and labels on NodePools.

We like this because third-party linters and other tools can rely on our manifests following a well-defined set of rules.
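To give a feel for what a CEL rule checks, the sketch below mirrors one plausible rule in plain Python so a linter or pre-commit hook could run the same check offline. Both the rule (a requirement using the In operator must list at least one value) and the helper name validate_requirements are illustrative assumptions; check the published CRDs for the rules Karpenter actually enforces.

```python
def validate_requirements(requirements):
    """Return a list of error strings for a list of requirement dicts.

    Mirrors a hypothetical CEL rule of the form:
        self.all(x, x.operator == 'In' ? x.values.size() != 0 : true)
    """
    errors = []
    for index, req in enumerate(requirements):
        if req.get("operator") == "In" and not req.get("values"):
            errors.append(
                f"requirements[{index}]: operator 'In' needs at least one value"
            )
    return errors


if __name__ == "__main__":
    requirements = [
        {"key": "karpenter.sh/capacity-type", "operator": "In",
         "values": ["spot", "on-demand"]},
        {"key": "kubernetes.io/arch", "operator": "In", "values": []},
        {"key": "example.com/team", "operator": "Exists"},
    ]
    for error in validate_requirements(requirements):
        print(error)
```

With CEL, the equivalent check runs in the API server at admission time, so an invalid object is rejected before Karpenter ever sees it.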

InstanceProfile Changes

In version 0.31 and prior, the only way to set the InstanceProfile used by nodes in a NodePool was to specify the exact InstanceProfile to attach to each instance. This required the InstanceProfile to exist before the NodePool was created.

We covered the required IAM changes above. For our purposes, we love that we don’t have to worry about InstanceProfiles at all and can instead expect Karpenter to manage and destroy them as needed. This brings it in line with kOps and Cluster Autoscaler before it.

InstanceProfile Management Added

Version 0.32 introduces a way for Karpenter to create and manage InstanceProfile objects for your nodes based on an arbitrary Role. This requires the cluster to have access to an IAM endpoint, as well as an IAM policy that allows Karpenter to create, assign, and delete InstanceProfiles. We mentioned the IAM changes in the Key changes section above.

(Temporary) Removal of Explicit Profiles

With the release of the v1beta1 API for EC2NodeClass, the Spec.InstanceProfile field was removed. This broke the previous functionality of explicitly declaring an InstanceProfile.

The field has since been brought back: versions 0.32.2 and 0.33.0 restore the ability to declare an explicit InstanceProfile.
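With both options available again, an EC2NodeClass can either name a Role (and let Karpenter manage the InstanceProfile) or name an explicit InstanceProfile. The sketch below shows a small pre-flight check for that choice. The field names role and instanceProfile, and the assumption that exactly one of them should be set, are ours to illustrate the idea; confirm them against the EC2NodeClass reference for your version.

```python
def instance_profile_mode(node_class_spec):
    """Classify how an EC2NodeClass spec (a dict) selects its InstanceProfile.

    Returns "managed" when a role is given (Karpenter creates the profile)
    or "explicit" when an existing profile is named. Assumes exactly one of
    the two fields should be set.
    """
    has_role = bool(node_class_spec.get("role"))
    has_profile = bool(node_class_spec.get("instanceProfile"))
    if has_role == has_profile:
        raise ValueError("set exactly one of 'role' or 'instanceProfile'")
    return "managed" if has_role else "explicit"


if __name__ == "__main__":
    # Both names below are made-up examples.
    print(instance_profile_mode({"role": "KarpenterNodeRole-my-cluster"}))
    print(instance_profile_mode({"instanceProfile": "my-existing-profile"}))
```

Running a check like this in CI catches a spec that sets both fields, or neither, before it reaches the cluster.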

More Noteworthy Changes

Take a glance at this list for a summary of the other noteworthy changes in the last four Karpenter releases. This is a roll-up of the team’s fantastic work in the second half of 2023, setting a firm foundation for stable releases moving forward.

0.30

  • AssumeRoleARN and AssumeRoleDuration added to support using Karpenter in an assumed role, e.g. in a cross-account situation (#4370)
  • Custom userdata support for Windows (#4300)

0.31

  • Graceful shutdown for Bottlerocket instances (#4571)
  • Instances are tagged with their name (#4611)
  • Better logging for CloudFormation failures (#4556)

0.32

  • Add status field for imageID (#4637)
  • Release API v1beta1 (#4744)

0.33

  • Support for Elastic Fabric Adapter resource added (#5068)

About nOps Compute Copilot

At nOps, we’re big fans of Karpenter. That’s why we built Compute Copilot on Karpenter to make it ultra-easy to save. It can help you cut your EKS bill by 40-70% while freeing countless hours from resource management.

How does it work? nOps adds to Karpenter:

  • Awareness of your commitments (Reserved Instances & Savings Plans) across your organization
  • Advanced analysis of Spot market pricing data
  • Advanced Spot termination prediction
  • Continuous workload reconsideration

The solution intelligently automates workload management, analyzing your organizational utilization and commitments in real time to move the optimal amount of your workload onto stable and discounted Spot instances.

By constantly monitoring your dynamic usage and the real-time Spot market, Copilot empowers users to run mission-critical workloads with peace of mind that they are scheduled on the most cost-optimized, reliable, and stable option at every moment.

Our mission is to empower engineering teams to realize more Karpenter savings for less effort — freeing up time to focus on building and innovating.

nOps was recently ranked #1 in G2’s cloud cost management category. Join our customers using nOps to slash your cloud costs by booking a demo today!