As IT budgets are squeezed and companies face pressure to cut costs, Auto Scaling Group (ASG) costs often make up a significant portion of the cloud bill. That’s why we put together a complete, step-by-step guide to cost-optimizing your ASGs.

We’ll take you through the full process, from preventing overprovisioning to running stateless applications safely on Spot – with practical how-to information, screenshots, and best practices.

Let’s dive into the basics, or you can skip right to utilizing Spot in your ASGs.

Set Up CloudWatch Alarms

The first step is gathering information on the current state of your ASGs via monitoring with a solution like AWS CloudWatch, Datadog, or New Relic. For example, AWS CloudWatch provides real-time monitoring of important ASG metrics like CPU utilization, network I/O, and instance counts, giving you real-time insight into the state of your ASGs and how much of their capacity you are actually utilizing.

Datadog metrics covering one week of usage


CloudWatch metrics covering one week of usage


Based on these metrics, set up CloudWatch alarms to proactively manage costs and ensure quick responses to unexpected changes in demand. Here are the steps:

1. Open the AWS Management Console and navigate to the CloudWatch service.

2. Create CloudWatch alarms for the chosen metrics. Define the threshold values that should trigger scaling actions.

CloudWatch service in AWS Management Console

3. For example, create an alarm that triggers when the average CPU utilization across the instances in your Auto Scaling group exceeds 70%.

Creating a CloudWatch alarm that triggers when average CPU utilization across instances in your Auto Scaling group exceeds 70%

4. Create budgeting alarms. CloudWatch monitoring alarms alert you about over-usage based on metrics. To monitor based on cost instead, you can set up budget alarms. 

Navigate to the AWS Management Console → AWS Budgets dashboard → Create Budget → select the Budget Type → define the budget (name, budget amount, cost filters) → Configure Alerts.
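The alarm from step 3 can also be created programmatically. Below is a minimal sketch that builds parameters for the CloudWatch PutMetricAlarm API; the ASG name and period values are illustrative assumptions, not from the article.

```python
# Build PutMetricAlarm parameters for an average-CPU alarm across an ASG.
# The name "my-asg" and the 5-minute period are placeholder assumptions.

def cpu_alarm_params(asg_name, threshold=70.0, periods=2):
    """Parameters for an alarm on average CPU across an Auto Scaling group."""
    return {
        "AlarmName": f"{asg_name}-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Statistic": "Average",
        "Period": 300,                    # 5-minute datapoints
        "EvaluationPeriods": periods,     # breach must persist before alarming
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
    }

# With boto3 (not imported here), the call would be roughly:
#   boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params("my-asg"))
```

Keeping the parameters in a helper like this makes it easy to stamp out consistent alarms for every ASG you monitor.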

Define and Automate Dynamic Scaling Policies

Once you’ve set up CloudWatch alarms, the next step is to automate your scaling policies. Here are the basic steps:

1. Identify the metrics most critical for your application’s performance and stability. Common metrics include CPU utilization, network throughput, memory usage or custom application-specific metrics.

2. Establish appropriate minimum, maximum, and desired capacity settings for your ASGs. These should align with your anticipated workload demands, ensuring that the ASG can scale efficiently without over-provisioning.

For example, scale out (add instances) when CPU utilization exceeds a certain threshold, and scale in (remove instances) when the utilization decreases.
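The scale-out/scale-in rule just described can be sketched as a small decision function. The thresholds (70% out, 30% in) and one-instance step size are illustrative assumptions; a real ASG applies this logic through scaling policies and cooldowns.

```python
# Toy sketch of threshold-based scaling: scale out above one threshold,
# scale in below another, always clamped to the ASG's min/max capacity.

def scaling_decision(avg_cpu, current, minimum, maximum,
                     out_threshold=70.0, in_threshold=30.0):
    """Return the new desired capacity for the given average CPU reading."""
    if avg_cpu > out_threshold:
        return min(current + 1, maximum)   # scale out (add an instance)
    if avg_cpu < in_threshold:
        return max(current - 1, minimum)   # scale in (remove an instance)
    return current                         # within the comfortable band
```

Note how the min/max clamp mirrors step 2: even under sustained load, capacity never exceeds the bounds you set for the ASG.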

One strategy to consider is adjusting settings proactively in advance of predicted changes in demand, optimizing resource utilization and cost. You can also schedule scaling based on your monitoring metrics: where you observe recurring spikes, configure the ASG to scale out during those events or times of the week.

Cost-optimize this workload by scaling in during off-usage


On the other hand, if your application has sporadic or unpredictable loads, consider using a mix of instances, possibly including burstable types (e.g. AWS T-series) that offer baseline performance with the ability to burst.
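The scheduled scaling described above maps onto the Auto Scaling PutScheduledUpdateGroupAction API. A minimal sketch, where the ASG name, sizes, and cron expressions are all placeholder assumptions:

```python
# Two scheduled actions: shrink the ASG for off-hours, restore it for the day.
# Recurrence uses cron syntax evaluated in UTC by default.

def off_hours_schedule(asg_name, night_desired=1, day_desired=4):
    """Parameters for a nightly scale-in and a morning scale-out."""
    return [
        {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": f"{asg_name}-night",
            "Recurrence": "0 20 * * *",    # every day at 20:00 UTC
            "MinSize": night_desired,
            "MaxSize": day_desired,
            "DesiredCapacity": night_desired,
        },
        {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": f"{asg_name}-morning",
            "Recurrence": "0 6 * * *",     # every day at 06:00 UTC
            "MinSize": night_desired,
            "MaxSize": day_desired,
            "DesiredCapacity": day_desired,
        },
    ]
```

Each dict would be passed to the Auto Scaling API (e.g. via boto3's `put_scheduled_update_group_action`) to register the action.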

3. Create Dynamic Scaling Policies to properly utilize your resources and maintain cost-effective performance as your usage changes. In the AWS Management Console, go to the Auto Scaling service.

Define policies for scaling out and scaling in based on the CloudWatch alarms. Configure your ASG to actively adjust its capacity in anticipation of idle or loading windows, i.e. periods of low and high demand. You’ll need to regularly review historical data in CloudWatch to continually fine-tune your scaling policies.
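One common way to implement such a policy is target tracking, which keeps a metric near a target value instead of reacting to individual alarms. A sketch of the PutScalingPolicy parameters; the ASG name, policy name, and 50% target are placeholder assumptions:

```python
# Target-tracking policy: Auto Scaling adds or removes instances to keep
# the group's average CPU near the target value.

def target_tracking_policy(asg_name, target_cpu=50.0):
    """Parameters for a CPU target-tracking scaling policy."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,   # keep average CPU near this value
        },
    }
```

With target tracking, AWS manages the alarms for you, which reduces the amount of manual fine-tuning your scaling policies need.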

Tip: Vertical vs horizontal scaling

After monitoring your application’s requirements, you can decide to set up either horizontal or vertical scaling. Vertical Scaling involves changing the instance type to a more powerful one as needed. Horizontal Scaling is adding more instances of the same type. If your infrastructure supports it, horizontal scaling tends to be more cost-optimized.

4. Configure Scaling Actions: This may include adjusting the desired capacity, minimum and maximum instance counts, and cooldown periods that prevent rapid scaling events in response to fluctuations, ensuring stability.


Steps 3 & 4: Create dynamic scaling policies and configure scaling actions based on CloudWatch metrics


Ensure you’re using the right instance types

Here are the main factors to consider when selecting instance types. First, you need to understand your application.

| Type | Purpose | Example |
| --- | --- | --- |
| CPU Intensive | Compute-optimized: for applications that are CPU-bound, such as batch processing jobs or gaming servers | C-Series |
| Memory Intensive | For memory-bound applications like databases or big data processing tasks | R-Series |
| Network Intensive | For applications that require high network throughput, like video streaming services | A1, M5n |
| HPC Optimized | For applications requiring high-performance processors, such as complex simulations and deep learning tasks | H1, Z1d |
| Storage Optimized | Tailored for tasks that need high sequential read/write access to large local data sets; excels at delivering tens of thousands of low-latency, random I/O operations per second (IOPS) | I3, D2 |
| Accelerated Computing | Leverage hardware accelerators or co-processors for more efficient execution of tasks like floating-point calculations, graphics processing, and data pattern matching, compared to traditional software running on CPUs | P2, P3, F1 |

High Availability and Fault Tolerance: Choose instance types that are available in all Availability Zones, and distribute instances across multiple Availability Zones to enhance availability and fault tolerance. Pricing also varies across regions; choose cheaper regions without latency issues (for example, us-east-2 is slightly cheaper than us-east-1).

Tip: Consider the growth trajectory of your application. Select an instance type that not only meets current needs but can also accommodate future growth without frequent changes.
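The table above can be encoded as a simple lookup from workload profile to candidate instance families. The specific family names below are illustrative examples within the series the table lists; real selection should also weigh price, Availability Zone coverage, and growth headroom.

```python
# Map a workload profile to candidate instance families (per the table above).
# Family names like "c5" or "r5" are illustrative members of each series.

FAMILIES_BY_PROFILE = {
    "cpu": ["c5", "c6i"],               # compute-optimized (C-Series)
    "memory": ["r5", "r6i"],            # memory-optimized (R-Series)
    "network": ["a1", "m5n"],           # high network throughput
    "hpc": ["h1", "z1d"],               # high-performance processors
    "storage": ["i3", "d2"],            # high sequential I/O on local data
    "accelerated": ["p2", "p3", "f1"],  # hardware accelerators
}

def candidate_families(profile):
    """Return candidate instance families for a workload profile, or []."""
    return FAMILIES_BY_PROFILE.get(profile.lower(), [])
```

A lookup like this is also a convenient starting point for the instance-type overrides used in Mixed Instance Policies later in this guide.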

Utilizing Spot in your ASGs

Optimizing your ASGs only goes so far — for the biggest discounts, moving your usage onto discounted Spot instances can help you save significantly. However, there are challenges involved in this approach:

Spot is unstable. AWS gives you a discount on the instance, but no guarantee that you’ll be able to use it for the full duration of your workload. When a user willing to pay the full On-Demand price emerges, AWS can terminate these instances with a two-minute warning (known as a Spot Instance Interruption). These unexpected interruptions can cause workloads to fail, potentially posing a major problem for production or mission-critical applications.

The Spot market frequently shifts. Even if you find a suitable Spot option for a particular instance type, market pricing and availability might change the next day — meaning frequent interruptions and the constant work of finding new suitable Spot options for your workload.

Savings Plans and Reserved Instances go underutilized. Unless you constantly track and balance the amount of Spot you use with your existing AWS commitments, you may inadvertently use too much Spot and waste the SP and RIs that you’ve already paid for.

In order to mitigate some of these issues and benefit from substantial Spot savings, here are some best practices to follow.

#1: Thoroughly Analyze Workload Suitability: Before migrating instances to Spot, assess the suitability of your workload. Stateless applications with no dependency on individual instances’ persistence are ideal candidates to avoid compromising on reliability.

#2: Implement Spot Instance Diversification: Rather than relying on a single instance type, diversify your Spot Instances across various instance types and availability zones. This strategy reduces the impact of sudden market fluctuations on your workload. (We’ll get into the details below).

#3: Monitor and Respond to Spot Termination Notifications: Subscribe to SNS or SQS notifications and actively monitor Spot Terminations to gracefully terminate your Spot instances and/or launch a replacement without downtime.

#4: Automate Scaling Policies: With tools like CloudWatch, you can trigger scaling actions based on predefined metrics to ensure timely adjustments to changes in Spot market conditions.
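Best practice #3 can also be handled on the instance itself: once an interruption is scheduled, the EC2 instance metadata endpoint `http://169.254.169.254/latest/meta-data/spot/instance-action` returns a JSON document with the action and deadline (and returns 404 otherwise). The parsing helper below is our own sketch; the endpoint and payload shape are AWS's documented behavior.

```python
import json
from datetime import datetime, timezone

def parse_interruption(payload):
    """Parse a spot instance-action document into (action, deadline).

    Returns (None, None) when no interruption is scheduled (empty payload,
    which corresponds to an HTTP 404 from the metadata endpoint).
    """
    if not payload:
        return None, None
    doc = json.loads(payload)  # e.g. {"action": "terminate", "time": "..."}
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], deadline.replace(tzinfo=timezone.utc)
```

In practice, a daemon polls the endpoint every few seconds; on `"terminate"` it drains connections and deregisters the instance before the two-minute deadline.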

Diversifying Spot Instances

The most common way to achieve a diverse Spot instance list is to add Mixed Instance Policies to your ASG.  Analyze your workload to understand what instance families & instance types it can run on and put them in your “Instance Type Requirements”:

Instance Type Requirements

Depending on your workload type you may want to select the proper Allocation Strategy.


| Allocation Strategy | Description | Use Case |
| --- | --- | --- |
| priceCapacityOptimized | Optimizes for both capacity and price by selecting the lowest-priced pools with high availability. | Stateless applications, microservices, web applications, data analytics, batch processing. |
| capacityOptimized | Optimizes for capacity by launching instances into the most available pools based on real-time data; option to set priorities for instance types. | Workloads with a higher cost of interruption: Continuous Integration (CI), image rendering, Deep Learning, High Performance Computing (HPC). |
| lowestPrice | Selects the lowest-priced pool; may lead to higher interruption rates. | Default strategy; override with priceCapacityOptimized for better capacity and price optimization. |

For example, if you have a mission-critical workload and you want to prioritize stability, you can select capacityOptimized. If your workload is highly fault-tolerant and you want to maximize savings, select lowestPrice.
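Putting the pieces together, here is a sketch of a MixedInstancesPolicy that diversifies across several instance types with the price/capacity-optimized strategy. Note the Auto Scaling API spells strategies in kebab-case (e.g. `"price-capacity-optimized"`); the launch template ID, instance types, and On-Demand percentages below are placeholder assumptions.

```python
# Build a MixedInstancesPolicy (as used by CreateAutoScalingGroup /
# UpdateAutoScalingGroup) that keeps a small On-Demand base and fills the
# rest with diversified Spot capacity.

def mixed_instances_policy(launch_template_id, instance_types,
                           on_demand_base=1, on_demand_pct=20):
    """Parameters diversifying an ASG across instance types on Spot."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": launch_template_id,
                "Version": "$Latest",
            },
            # One override per instance type widens the Spot pools available.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,        # always On-Demand
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }
```

The On-Demand base and percentage give you the fallback behavior described earlier: even if Spot capacity evaporates, part of the group stays on On-Demand.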

To better understand the market conditions for your selected instance types you can use Spot Placement Scores provided by AWS.

Spot Placement Scores
However, it’s worth noting that accurately interpreting placement scores demands a deep understanding of how AWS calculates them and what they imply for Spot instance availability. Incorporating Spot placement scores into auto-scaling strategies also adds complexity, because you need to balance cost, performance, and availability.
As described above, shifting Spot market pricing and availability means you must continuously monitor score changes and react quickly to optimize your Spot instance use.
And building a forecasting and prediction model without enough historical data is difficult, while the reliability of those instances remains highly volatile.

Let nOps Compute Copilot automatically cost-optimize your ASGs

If these challenges sound familiar, nOps Compute Copilot can help. Simply integrate it with your ASGs and let nOps handle the rest.

Leveraging historical and current Spot data based on the $1 billion+ in cloud spend we manage, our solution employs a comprehensive scoring system for each instance. This scoring considers factors such as Spot lifetime in the region, Availability Zone, price, capacity, and more. In real-time, scores are assigned to every Spot market instance based on these factors.

Copilot moves your ASGs onto the best Spot instances for you, using ML to consider the four dimensions of Price (lowest cost), Capacity (sufficient availability), Reliability (preferring instances with the highest lifetime), and Priority (considering customer-chosen instance families). And by leveraging ASG built-in features like Mixed Instance Policies, it ensures you can always fall back to On-Demand if needed to avoid the possibility of downtime. 

Based on this proprietary AI-driven management of instances for the best price in real time, Copilot continually analyzes market pricing and your existing commitments to ensure you are always on the best blend of Spot, Reserved, and On-Demand.

Here are the key benefits of delegating the hassle of cost optimization to nOps.

Hands-free cost savings. Copilot automatically selects the optimal instance types for your ASG workloads, freeing up your time and attention to focus on building and innovating.

Enterprise-grade SLAs for the highest standards of reliability. Run production and mission-critical workloads on Spot with complete confidence.

Effortless onboarding. Just plug in your Batch, ASG, EKS, or other compute-based workload to start saving effortlessly.

No upfront cost. You pay only a percentage of your realized savings, making adoption risk-free.

nOps was recently ranked #1 in G2’s cloud cost management category. Join our customers using nOps to slash your cloud costs by booking a demo today!

More On How It Works

The below diagram provides additional information on how Copilot optimizes your ASGs for effortless savings.

1. ASG launches a new on-demand instance

2. Lambda intercepts the EC2 Instance State-change Notification event from EventBridge

3. If the created instance is not protected from termination and should be replaced, Compute Copilot performs the following steps.

a. Copy the ASG’s Launch Template or Launch Configuration

b. In the copied Launch Template, modify Network Interfaces, Tags, and Block Device Mappings from the instance Launch Template or Configuration if needed

c. Fetch recommended instance types from nOps API

d. Request Spot Fleet with the copied Launch Template and recommended instance types

e. Once the Spot request is fulfilled, get the Spot Instance and wait for its state to be Running

f. Attach the created Spot Instance to the ASG

g. Wait for the attached Spot Instance’s state to be InService

h. If the On-Demand instance has a termination lifecycle hook, it goes through that lifecycle action before termination

i. Terminate the On-Demand instance

j. If there is a Mixed-Instance Policy, modify the percentages of On-Demand and Spot accordingly.

For more information, you can also consult the documentation.