One of the significant advantages of running workloads on the cloud, and AWS in particular, is the ability to scale workloads up and down elastically so that you pay only for what you use. When it comes time to pay for the EC2 compute you use in AWS, there are numerous ways to structure your bill.

On-demand instances provide stable, uninterrupted capacity, making them a reliable choice for users who prioritize consistency over cost savings. They can be paid for at the on-demand price, or at a discount through long-term commitments in the form of Savings Plans or Reserved Instances.

On the other hand, spot instances offer significant discounts and require no commitments, but they come with the risk of abrupt termination. Their prices are driven by supply and demand, allowing users to take advantage of the savings when spare capacity is available. However, spot instances are subject to interruption, and AWS gives only a two-minute warning before it reclaims one.

As an aside, it’s important to note that in 2017 AWS made significant changes to the pricing model of its spot market, which in effect means that it is no longer a true spot market. In most cases, when you request a spot instance in AWS, your maximum price (your "bid") defaults to the equivalent on-demand price, and the spot price is far less elastic than it used to be. For a deeper dive, you can explore this AWS Blog post published at the time.

That said, spot instances can still offer tremendous cost advantages, albeit with the potential for workload interruption. If you want to use them, you have to weigh those savings against the risks.

Fortunately, there is a solution that lets users mitigate these challenges and enjoy both cost savings and workload stability. This blog covers AWS’s mechanism for spot termination, the challenges it creates, and how to get ahead of it. Read on!

Understanding Spot Instances and Their Termination Risks!

Spot Instances are designed to provide users with significantly reduced costs compared to traditional on-demand instances. However, the trade-off for the lower cost is the possibility of termination at any moment by AWS. 

Since Spot Instances are allocated from unused resources, the cloud provider can reclaim them whenever they are needed for higher-priority tasks or on-demand instances. This sudden termination can result in the loss of data, interruption of ongoing computations or processes, and potential negative impacts on system availability and performance.
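On an individual instance, the two-minute warning surfaces through the EC2 instance metadata service (IMDS): the spot/instance-action path returns HTTP 404 until an interruption is scheduled, and then returns a small JSON document with the action (stop, terminate, or hibernate) and the time it will happen. The following is a minimal sketch in Python, assuming IMDSv2 and only the standard library; the function names are hypothetical, and what you do once a notice appears (draining, checkpointing, handing work elsewhere) is up to your workload.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional, Tuple

IMDS_BASE = "http://169.254.169.254"  # link-local IMDS endpoint, reachable only from inside EC2


def parse_instance_action(body: str, now: datetime) -> Tuple[str, float]:
    """Return (action, seconds until the action) from a spot/instance-action document."""
    doc = json.loads(body)
    # "time" is a UTC timestamp such as "2017-09-18T08:22:00Z".
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return doc["action"], (deadline - now).total_seconds()


def fetch_instance_action(timeout: float = 2.0) -> Optional[str]:
    """Return the raw instance-action JSON, or None if no interruption is scheduled."""
    # IMDSv2: request a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    # Then send the token with the metadata request.
    action_req = urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # 404 means no stop/terminate/hibernate action is pending
        raise
```

In practice you would call fetch_instance_action() every few seconds from a small background loop and, when it returns a document, pass it to parse_instance_action() with the current UTC time to see how many seconds remain. Note that this only reacts to an interruption that is already scheduled; it gives no advance prediction.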

How to get ahead of spot termination risks?

To mitigate the risks associated with Spot Instances, it helps to have advance knowledge of potential interruptions. AWS provides data points, such as the Spot placement score and the interruption frequency, that can help you make informed decisions and optimize workload allocation.

  • The AWS Spot placement score is a data point provided by Amazon Web Services (AWS) that indicates how likely a Spot request is to succeed in a given Region or Availability Zone (AZ), on a scale of 1 to 10. It reflects the Spot capacity available for the requested instance types. A higher score means a request is more likely to be fulfilled, which generally points to more spare capacity and a lower likelihood of interruption, making it favorable for workload allocation.
  • The interruption frequency is another valuable data point provided by AWS, published through the Spot Instance Advisor. It represents the rate at which Spot Instances of a specific instance type were interrupted in a Region over the trailing month, reported in bands such as <5%, 5-10%, or >20%. This helps users understand the stability and reliability of Spot Instances in a given configuration. A lower interruption frequency indicates a more reliable instance type for workload allocation.

For instance, if a user sees a Spot placement score of 9 (out of 10) and an interruption frequency in the lowest band (<5%), that suggests a highly reliable instance type in a stable AZ. This knowledge lets users allocate their workloads to these instances with more confidence, knowing that the risk of sudden termination is significantly lower.
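Combining the two signals across many candidates is easy to sketch in isolation. The Python below is a hypothetical ranking, not an AWS or nOps algorithm: it assumes you have already collected a placement score per (instance type, AZ ID) pair, for example by calling the EC2 GetSpotPlacementScores API one instance type at a time with single-AZ scoring, plus the interruption frequency band for each instance type from the Spot Instance Advisor. The class and function names and the minimum score of 7 are illustrative choices.

```python
from dataclasses import dataclass
from typing import List

# Interruption frequency bands as shown in the Spot Instance Advisor, lowest first.
BANDS = ["<5%", "5-10%", "10-15%", "15-20%", ">20%"]


@dataclass(frozen=True)
class Candidate:
    instance_type: str
    az_id: str               # e.g. "use1-az1"
    placement_score: int     # 1-10; higher means a request is more likely to be fulfilled
    interruption_band: str   # one of BANDS, for this instance type in its Region


def rank_candidates(candidates: List[Candidate], min_score: int = 7) -> List[Candidate]:
    """Drop low placement scores, then prefer lower interruption bands and higher scores."""
    eligible = [c for c in candidates if c.placement_score >= min_score]
    return sorted(
        eligible,
        # Ties are broken by name so the ordering is deterministic.
        key=lambda c: (BANDS.index(c.interruption_band), -c.placement_score, c.instance_type, c.az_id),
    )


if __name__ == "__main__":
    pool = [
        Candidate("m5.large", "use1-az1", 9, "<5%"),
        Candidate("c5.large", "use1-az2", 10, "10-15%"),
        Candidate("r5.large", "use1-az4", 5, "<5%"),
    ]
    for c in rank_candidates(pool):
        print(c.instance_type, c.az_id, c.placement_score, c.interruption_band)
```

With this sample pool, m5.large in use1-az1 ranks first, c5.large follows despite its perfect placement score because of its higher interruption band, and r5.large is dropped for falling below the score threshold. Sorting on the interruption band first is a judgment call; weighting the two signals differently is equally reasonable.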

However, one of the major challenges is the sheer number of possible combinations to analyze. With over 500 instance types and multiple availability zones, the task of evaluating each combination can be daunting and time-consuming. This level of analysis becomes impractical, especially for users managing large-scale deployments or complex architectures.

Additionally, the latest data needs to be analyzed alongside historical data, and incorporating history adds further complexity. It requires aggregating and processing a considerable volume of data, which can be challenging and resource-intensive.
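As a rough illustration of that aggregation step, the Python sketch below counts interruptions per (instance type, AZ ID) over trailing one-hour, one-day, and one-week windows. It assumes you already have a list of timestamped interruption events, however you collect them; the event tuple format, window names, and function name are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Trailing windows to aggregate over.
WINDOWS = {"1h": timedelta(hours=1), "1d": timedelta(days=1), "7d": timedelta(days=7)}

Event = Tuple[datetime, str, str]  # (time of interruption, instance_type, az_id)


def windowed_counts(events: Iterable[Event], now: datetime) -> Dict[str, Counter]:
    """Count interruptions per (instance_type, az_id) within each trailing window."""
    counts = {name: Counter() for name in WINDOWS}
    for ts, instance_type, az_id in events:
        age = now - ts
        if age < timedelta(0):
            continue  # ignore events stamped after "now"
        for name, span in WINDOWS.items():
            if age <= span:
                counts[name][(instance_type, az_id)] += 1
    return counts
```

Each event is checked against every window in a single pass, so one sweep over the history fills all three counters. Across hundreds of instance types and many AZs, this same loop would typically move into a time-series store or stream processor rather than an in-memory list, which is exactly the volume problem described above.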

How Can nOps Karpenter Solution (nKS) Help In Spot Termination Predictions?

nOps Karpenter Solution (nKS) is a powerful solution that takes a more efficient approach, automating this analysis and building Spot termination predictions into its decisions.

  • nKS leverages AWS Spot placement scores, interruption frequency data, and additional information, such as how often Spot Instances were terminated in each Availability Zone ID over the last hour, day, and week.
  • Using its ML-powered termination detection model, nKS predicts the likelihood of a Spot Instance being terminated within the next hour. This prediction gives users valuable lead time to shift their workloads to alternative instance types that are expected to remain available for longer.
  • nKS continuously gathers the latest available data and reruns its termination predictions, so users can make timely adjustments based on the most up-to-date insights.
  • nKS incorporates an early detection algorithm that identifies the safest instance types on which to launch and run Spot Instances.

Explore more about nOps Karpenter Solution (nKS) here!