As macroeconomic conditions squeeze budgets, controlling cloud costs has overtaken security as the top cloud challenge organizations face. Many organizations are currently overspending, with EC2 (followed by RDS and S3) likely the most expensive item on your AWS bill.

That’s where rightsizing comes in. By analyzing historical usage and performance, you can identify and rightsize instances that are not consuming all the resources currently available to them.

Rightsizing these resources to their proper level goes a long way toward taming cloud costs. In this blog post, we’ll take you through a comprehensive, hands-on guide with step-by-step instructions and tips for rightsizing with CloudWatch metrics.

Data is the key to effective rightsizing

Plenty of rightsizing recommendations are available through tools such as AWS Trusted Advisor. The problem is that these recommendations are frequently wrong.

Reliable recommendations require granular historical data on memory, CPU utilization, network bandwidth, volume size, and more. This data is difficult to collect and analyze, and there isn’t a single source for all of the metrics that you need. As a result, rightsizing recommendations are often untrustworthy, and engineers don’t act on them.

This guide will teach you how to collect the data you need with the CloudWatch agent to rightsize confidently, so you can save without compromising reliability.

How to use CloudWatch metrics to make effective rightsizing decisions

Amazon CloudWatch provides basic metrics by default: CPU, network, and storage. However, reliable recommendations also need to consider memory. To get that fourth metric, you need to install an agent that AWS offers, called “CloudWatch Agent”; the metrics it publishes are billed as custom metrics. (Alternatively, you can collect this data through a custom engineering solution or a third-party tool such as Datadog.)
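As a rough illustration of what the agent needs to be told to collect, the sketch below builds a minimal agent configuration as a Python dictionary and prints it as JSON. It assumes a Linux instance and uses the agent's `mem_used_percent` and disk `used_percent` measurements. The helper name `build_agent_config` and the 60-second interval are illustrative choices, not something prescribed by this guide.

```python
import json


def build_agent_config(interval_seconds: int = 60) -> dict:
    """Return a minimal CloudWatch agent config for memory and disk usage (Linux)."""
    return {
        "metrics": {
            # Attach the instance ID to every metric so it can be queried per instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": ["*"],  # all mounted file systems
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON you would save as the agent's configuration file.
    print(json.dumps(build_agent_config(), indent=2))
```

A shorter collection interval gives finer-grained data at the cost of more custom-metric data points, so the interval is worth choosing deliberately.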

  1. Integrate CloudWatch Agent. AWS has an easy-to-follow guide including commands to run on your EC2 instance. 
    Command line for installing CloudWatch
  2. Wait 15 days. Once you’ve successfully configured the CloudWatch agent, it will start reporting memory utilization data. However, you need to collect the memory metrics for a minimum of 15 full days (or for 360 hours* in a 30-day period) before you have enough data to make reliable recommendations. If your workload is predictable, it may not be necessary to wait a full 15 days. More variable workloads, on the other hand, may need longer.

    *If your workload is turned off for part of the day, observe the equivalent of 15 24-hour periods (15 * 24 = 360 hours) over a 30-day period.
  3. Review metrics in CloudWatch
    CloudWatch metrics used for rightsizing


    Once the metrics are collected, you can review them in the CloudWatch dashboard. Evaluate four key metrics: vCPU, memory, storage, and network. Metrics collected by the agent are published as custom metrics. Note that the CloudWatch agent supports multiple operating systems, including Windows and Linux, and works at the OS level to provide detailed metrics.

    For the most reliable recommendations, we suggest that you (1) increase the granularity of metrics, and (2) look at maximum utilization rather than average or median utilization. Using maximum utilization is a slightly more conservative, but safer, choice.

  4. Perform rightsizing calculations
    To rightsize, compare your usage of each metric against two baselines: the capacity of your current instance type and the capacity of the next smaller instance type. If your maximum usage is less than 80% of the smaller instance’s baseline (i.e., you would still have at least 20% headroom on the smaller instance), it’s safe to consider downsizing to that smaller instance type.
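As a rough sketch of steps 2 and 3, the function below asks CloudWatch for the hourly maximum of the agent's memory metric over the last 15 days and reports the peak, along with how many hours of data were found. It assumes the agent publishes `mem_used_percent` in the `CWAgent` namespace (its defaults on Linux) with only an `InstanceId` dimension. If your agent configuration adds other dimensions, the query must include them too. The function names are illustrative, and the `cloudwatch` argument is expected to be a boto3 CloudWatch client.

```python
from datetime import datetime, timedelta, timezone


def max_memory_percent(cloudwatch, instance_id, days=15, period_seconds=3600):
    """Return (peak_percent, hours_observed) for memory use over the last `days` days.

    `cloudwatch` should behave like a boto3 CloudWatch client.
    Returns (None, 0.0) when no data points exist.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    # With hourly periods, 15 days is 360 data points, well within the
    # per-call limit of GetMetricStatistics.
    response = cloudwatch.get_metric_statistics(
        Namespace="CWAgent",
        MetricName="mem_used_percent",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=period_seconds,
        Statistics=["Maximum"],  # maximum rather than average, as suggested in step 3
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None, 0.0
    peak = max(point["Maximum"] for point in datapoints)
    hours_observed = len(datapoints) * period_seconds / 3600
    return peak, hours_observed


def has_enough_data(hours_observed, required_hours=15 * 24):
    """True once at least 360 hours of metrics have been observed."""
    return hours_observed >= required_hours


# Example (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   peak, hours = max_memory_percent(boto3.client("cloudwatch"), "i-0123456789abcdef0")
```

For workloads that are switched off part of the day, calling the function with `days=30` and then checking `has_enough_data` mirrors the 360-hours-in-30-days rule from step 2.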

Here are the metrics and formulas used to rightsize.

Note that these recommendations hold as long as you are planning to rightsize within the same instance family. When rightsizing outside the instance family, you also need to consider CPU architecture, storage type, storage speed, and other factors.
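The comparison from step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: peak usage is first converted from a percentage of the current instance into absolute units (vCPUs, GiB), then checked against 80% of the smaller instance's capacity. The function names and the example figures (a 4 vCPU / 16 GiB instance and a 2 vCPU / 8 GiB instance in the same family) are hypothetical.

```python
def peak_in_units(peak_percent, current_capacity):
    """Convert a peak utilization percentage into absolute units (vCPUs, GiB, ...)."""
    return peak_percent / 100.0 * current_capacity


def can_downsize(peak_usage, smaller_capacity, threshold=0.8):
    """True if every peak stays below `threshold` times the smaller instance's capacity.

    Both arguments are dicts keyed by metric name, in the same units.
    """
    return all(
        peak_usage[metric] < smaller_capacity[metric] * threshold
        for metric in smaller_capacity
    )


if __name__ == "__main__":
    current = {"vcpu": 4, "memory_gib": 16}   # hypothetical current instance
    smaller = {"vcpu": 2, "memory_gib": 8}    # next size down, same family
    peaks = {
        "vcpu": peak_in_units(35, current["vcpu"]),              # 35% of 4 vCPUs
        "memory_gib": peak_in_units(30, current["memory_gib"]),  # 30% of 16 GiB
    }
    print(can_downsize(peaks, smaller))
```

Requiring every metric to pass, rather than an average across metrics, keeps the check conservative: a single constrained resource is enough to rule out the smaller size.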

More rightsizing tips and best practices

Here are more tips to keep in mind as you rightsize your EC2 instances.

Follow AWS recommendations for CPU and memory

AWS’s general rule for EC2 instances is that if your maximum CPU and memory usage is less than 40% over a four-week period, you can safely reduce capacity.

For compute-optimized instances, some best practices are to:

  • Focus on very recent instance data (such as the previous month), as old data may not be actionable
  • Focus on instances that have run for at least half the time you’re observing
  • Ignore burstable (T2) instance families, as they are designed to run at a low CPU % for extended periods
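One possible way to combine the 40% rule with the filters above is a simple screening function. This is a sketch rather than AWS's own logic: the function name, the parameters, and the choice to apply all the checks together are assumptions, and the excluded set contains only T2 because that is the family named here.

```python
EXCLUDED_FAMILIES = {"t2"}  # burstable family named above; extend if you exclude others


def is_downsize_candidate(instance_type, peak_cpu_percent, peak_memory_percent,
                          hours_running, hours_observed, threshold_percent=40):
    """Screen an instance using the 40% rule and the filters listed above."""
    family = instance_type.split(".")[0].lower()
    if family in EXCLUDED_FAMILIES:
        return False  # designed to run at low CPU for long periods
    if hours_running < hours_observed / 2:
        return False  # ran for less than half of the observation window
    return (peak_cpu_percent < threshold_percent
            and peak_memory_percent < threshold_percent)


if __name__ == "__main__":
    four_weeks = 4 * 7 * 24  # observation window in hours
    print(is_downsize_candidate("c5.2xlarge", 25, 30, 600, four_weeks))
```

Passing only the most recent four weeks of data keeps the screen in line with the advice to focus on recent usage.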

Turn off idle instances

Another easy way to reduce operational costs is to turn off instances that are no longer in use. The AWS guideline states that it’s safe to stop or terminate instances that have been idle for more than two weeks.

Key considerations before terminating an instance include (1) who owns it, (2) what the potential impact of terminating it is, and (3) how hard it is to recreate.

Another simple way to reduce costs is to stop instances used in development and production during hours when these instances are not in use. Assuming a 50-hour work week, you can save 70% by automatically stopping dev/test/production instances during non-business hours. Many tools are available for scheduling, including Amazon EC2 Scheduler, AWS Lambda, and AWS Data Pipeline. Or, third-party tools such as nOps use AI to learn your usage patterns and automate the process for you.
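The 70% figure follows from simple arithmetic: a week has 168 hours, so an instance that runs only 50 of them is stopped about 70% of the time. The sketch below reproduces that calculation. It assumes cost is proportional to running hours, which ignores charges that continue while an instance is stopped, such as attached EBS volumes. The function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def schedule_savings(running_hours_per_week):
    """Fraction of instance-hour cost avoided by running only part of the week."""
    if not 0 <= running_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("running hours must be between 0 and 168")
    return 1 - running_hours_per_week / HOURS_PER_WEEK


if __name__ == "__main__":
    # 50-hour work week: 1 - 50/168, roughly 0.70
    print(f"{schedule_savings(50):.0%}")
```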

Monitor your resource usage over time

Cost optimization requires rightsizing to be an ongoing process. Even if you rightsize workloads initially, changing performance and capacity requirements can result in underused or idle resources that drive unnecessary AWS costs.

As you monitor current performance, identify the following usage needs and patterns so that you can take advantage of potential rightsizing options:

Example workload, with hourly granularity
  • Steady state – When workloads maintain consistent levels over time, and you can forecast compute needs accurately, Reserved Instances offer significant savings.
  • Variable, but predictable – If your load changes predictably, consider EC2 Auto Scaling to handle predictable fluctuations in traffic.
  • Dev/test/production – These can generally be turned off during non-business hours. (You’ll need to rely on tagging to identify dev/test/production instances.)
  • Temporary – For temporary workloads that have flexible start times and can be interrupted, consider using an Amazon EC2 Spot Instance.

Delegate your rightsizing to nOps for safe and effortless savings

nOps now integrates with your AWS-native CloudWatch, for effortless rightsizing savings. We now automatically analyze every EC2 instance in your environment for CloudWatch recommendations, which you can apply with one click on the platform.

For high-resolution recommendations, we also automatically ingest enhanced CloudWatch data from every instance running the CloudWatch agent. Real-time, resource-level insights such as memory, CPU, network bandwidth, volume size, and more are fed into nOps’s state-of-the-art ML engine for the most reliable rightsizing recommendations.

Rightsize with nOps for:

  • The best rightsizing recommendations available. nOps automatically collects and analyzes highly granular data, for 100% accurate and reliable rightsizing recommendations.
  • Significant time savings. nOps integrates with the two most popular AWS monitoring solutions (CloudWatch Agent and Datadog) and with EventBridge. It reduces the complex and time-consuming rightsizing process to a single click, freeing up engineers to build and innovate.
    One-click rightsizing, right from nOps using EventBridge
  • Up to 50% in immediate cost savings. When engineers don’t act on rightsizing recommendations, underutilized and idle resources continue to drive unnecessary AWS costs. nOps makes it painless for engineers to act on recommendations and start saving.

How it works

  1. nOps integrates with your CloudWatch or CloudWatch Agent to collect all of the metrics needed for rightsizing recommendations, based on your last 15 days of usage. Our API queries your data every 24 hours.
  2. We quickly and efficiently process huge amounts of your CloudWatch data, cross-referenced with AWS EC2 metadata and the latest AWS On-Demand pricing data. These three sources are combined and fed through a Rightsizing Engine, which accepts or rejects each individual EC2 instance based on its utilization (resulting in recommendations for only underutilized resources).
    Each of your instances is analyzed taking all relevant info into account, such as the metrics necessary for your particular operating system. For each instance in your environment, we make the following calculations:
    • Max Disk usage
    • Max Network usage
    • Max RAM utilization (depending on which version you have)
    • Max CPU utilization

      Our rightsizing algorithm compares maximum recorded usage against the capacity of a lower instance type, multiplied by a threshold value that accounts for potential future usage spikes.

  3. These rightsizing recommendations are then pushed to nOps microservices, which are responsible for showing recommendations from the nOps platform on the UI.

    The nOps dashboard shows your rightsizing savings
  4. Every 24 hours the process runs from top to bottom.

About nOps

nOps was recently ranked #1 in G2’s cloud cost management category.

Join our customers using nOps to cut cloud costs and leverage automation with complete confidence by booking a demo today!