Multidimensional Pod Autoscaling: How It Works
If you run Kubernetes at scale, you’ve probably seen this: you turn on Vertical Pod Autoscaler (VPA) to rightsize requests and your utilization shoots up (great!) – then your Horizontal Pod Autoscaler (HPA) reads that higher utilization as “we’re under pressure” and scales out replicas (not great). Net effect: costs go up right when you were trying to save.
Today we’re launching Phase 1 of Multidimensional Pod Autoscaling (MPA) in nOps Container Rightsizing: automatic HPA threshold alignment that keeps horizontal scaling in sync with your dynamically rightsized containers. It works with your existing HPAs, requires no per-workload tinkering, and rolls back cleanly.
This feature is intended for platform engineers, SREs, and FinOps leaders who want VPA savings and predictable HPA behavior – without babysitting dozens of YAMLs.
What you’ll learn in this post
- Why VPA + HPA can fight each other (and how that burns money)
- How nOps now auto-aligns HPA thresholds with rightsized requests
- Exactly how it works (flow, YAML, and commands)
- DIY option for teams that prefer manual control
- Limitations, GitOps notes (ArgoCD), and what’s next for MPA
The Problem: VPA makes pods efficient; HPA misreads that as “scale out”
When you introduce a Vertical Pod Autoscaler (VPA), the goal is simple: lower requests to better match real demand, pushing utilization closer to 100% and improving efficiency.
However, HPA uses utilization as a signal. If the HPA target is 50% and utilization suddenly jumps from 40% to 85% because VPA reduced requests, HPA interprets this as “over target → add replicas,” even if absolute CPU or memory usage hasn’t actually increased.
The Result: surprise replica growth, noisy oscillations, and higher costs right after a VPA rollout.
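To make that arithmetic concrete (illustrative numbers): a pod using 400m of CPU against a 1000m request reports 40% utilization; if VPA trims the request to 470m, the same 400m of usage now reports ~85%, crossing a 50% target even though demand never changed.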
A few practical gotchas tend to surface when combining VPA and HPA in production:
- Multi-container pods (sidecars) inflate or dilute pod-level signals.
- Per-container HPAs (ContainerResource) behave differently than pod-level (Resource).
- GitOps controllers (e.g., ArgoCD autosync) revert any tuning you try to apply.
The nOps Approach: Multidimensional Pod Autoscaling, Phase 1
We extend our Container Rightsizing (VPA-based) engine to observe the same metric HPA uses and recompute HPA targets to match the new, lower requests.
Here’s the core idea:
New HPA Threshold (%) = ceil( Maximum Usage / Rightsized Request * 100 )
- Pod-level metrics (Resource): we sum the max usage and the rightsized requests across the containers HPA considers.
- Per-container metrics (ContainerResource): we compute the target for the named container, using only that container’s usage and request.
When rightsizing updates requests, nOps automatically writes the corresponding HPA target (e.g., from 50% → 93%) so HPA only scales when true demand grows—not when you’ve simply gotten more efficient.
If you disable rightsizing, we restore the original HPA targets from recorded annotations.
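Expressed as code, the recalculation is just this (a minimal Python sketch; the function name, container names, and numbers are illustrative, not the actual nOps implementation):

import math

def new_hpa_target(max_usage, rightsized_request, container=None):
    # max_usage / rightsized_request: dicts of container name -> CPU millicores (or memory bytes)
    # container=None   -> pod-level Resource metric: sum across the containers HPA considers
    # container="app"  -> per-container ContainerResource metric: use only that container
    if container is None:
        usage = sum(max_usage.values())
        request = sum(rightsized_request.values())
    else:
        usage = max_usage[container]
        request = rightsized_request[container]
    return math.ceil(usage / request * 100)

print(new_hpa_target({"app": 400, "sidecar": 50}, {"app": 430, "sidecar": 60}))         # pod-level -> 92
print(new_hpa_target({"app": 400, "sidecar": 50}, {"app": 430, "sidecar": 60}, "app"))  # per-container -> 94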
What's New: HPA Threshold Alignment (CPU & Memory, Utilization target)
We’ve introduced HPA Threshold Alignment for CPU and memory utilization targets — a change designed to make VPAs and HPAs work together more intelligently.
The benefits include:
- Prevents cost inflation: avoids accidental scale-outs caused by efficient requests.
- Saves engineering time: no more patching HPAs across dozens of services.
- Improves stability: reduces ping-pong between VPA savings and HPA scale-outs.
Scope: CPU/Memory Utilization only. AverageValue, custom/external metrics (e.g., queue length), and non-resource signals are not adjusted.
How It Works
- Detect eligible HPAs. nOps finds HPAs that scale the workload on CPU or memory utilization (pod-level or per-container).
- Analyze usage vs. requests. For each workload (and container, if applicable), nOps compares historical max usage to the rightsized request.
- Recalculate the HPA threshold. New Target (%) = ceil(Max Usage / Rightsized Request * 100)
- Apply the target automatically. We update the HPA’s averageUtilization and store the original in annotations for rollback.
- Revert on disable. Turn off Container Rightsizing and nOps restores the original thresholds.
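You can check what HPA currently sees at any time; for example, assuming a pod-level CPU metric at index 0 on an HPA named web-hpa:

kubectl get hpa web-hpa -n default \
  -o jsonpath='{.spec.metrics[0].resource.target.averageUtilization}'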
How To Turn It On (Helm)
helm upgrade --install nops-agent nops/nops-agent \
--namespace nops-system --create-namespace \
... \
--set vpa.recommender.extraArgs.enable-hpa-adjustment=true
Already using Container Rightsizing? This simply adds HPA alignment on top – no app changes required.
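To double-check that the flag reached the recommender after the upgrade, a quick (illustrative) grep of the rendered args works; exact deployment names depend on the chart:

kubectl -n nops-system get deployments -o yaml | grep 'enable-hpa-adjustment'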
A quick look at HPA YAMLs
Pod-level Resource metric (CPU):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # nOps may update this (e.g., -> 93)
Per-container ContainerResource metric (CPU):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: app
      target:
        type: Utilization
        averageUtilization: 60 # nOps may update this for container "app"
DIY (Manual) Option
Prefer to manage thresholds yourself (or only for selected services)? Apply the same formula, New Target (%) = ceil(Max Usage / Rightsized Request * 100), and patch the HPA directly.
Patch a pod-level Resource metric:
kubectl patch hpa web-hpa -n default --type='json' \
-p='[
{"op":"replace","path":"/spec/metrics/0/resource/target/averageUtilization","value":180}
]'
Patch a per-container ContainerResource metric:
kubectl patch hpa web-hpa -n default --type='json' \
-p='[
{"op":"replace","path":"/spec/metrics/0/containerResource/target/averageUtilization","value":180}
]'
Adjust the /spec/metrics/<index> to match the metric you’re targeting. For memory, change name: cpu → memory.
Where to get inputs:
- Maximum Usage: Container Rightsizing details modal (red line, last 30 days).
- Rightsized Request: Container Rightsizing dashboard (recommended CPU/memory).
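For example (illustrative numbers): a 30-day maximum CPU usage of 450m against a rightsized request of 250m gives ceil(450 / 250 * 100) = 180% – the value used in the patches above.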
Real-world examples
API service (Deployment) with CPU HPA @ 50%
VPA roughly halves requests, so reported utilization jumps to ~95%. Without alignment, HPA doubles replicas.
With nOps: HPA target recalculated to ~93%. Replicas remain stable. Cost drops; SLOs unchanged.
Pipeline worker with per-container HPA
App container is the signal; sidecar shouldn’t influence scaling.
With nOps: We adjust only the app container’s target (ContainerResource), preserving intended behavior.
Mixed sidecars
Pod-level HPA sums signals across containers with non-zero requests.
nOps uses the same accounting so targets match what HPA truly sees.
GitOps with ArgoCD (autosync)
If HPAs are managed by ArgoCD autosync, Argo will revert nOps’ edits.
- Option 1: Disable autosync for HPAs you want nOps to manage.
- Option 2: Add ignoreDifferences for the target fields:
spec:
  ignoreDifferences:
  - group: autoscaling
    kind: HorizontalPodAutoscaler
    jsonPointers:
    - /spec/metrics/0/resource/target/averageUtilization
    - /spec/metrics/0/containerResource/target/averageUtilization
    # add more indices if you have multiple metrics
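Note that ignoreDifferences only affects diffing; with automated sync enabled, ArgoCD can still overwrite those fields on sync unless the Application also sets the RespectIgnoreDifferences sync option, for example:

spec:
  syncPolicy:
    automated: {}
    syncOptions:
    - RespectIgnoreDifferences=true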
Why this improves on “VPA-only” or “HPA-only”
Improvements include:
- Fewer surprise scale-outs: HPA responds to true demand changes, not just better efficiency.
- Predictable stability: Less oscillation = steadier replicas and latency.
- Cluster-wide savings: Lower requests → tighter bin-packing → fewer nodes → lower bill – without replica inflation.
Operational notes & best practices
Keep in mind:
- Supported signals: CPU/Memory Utilization targets only. AverageValue and custom/external metrics are out of scope for Phase 1.
- Restore anytime: We annotate original targets and can revert when you disable rightsizing.
- Multi-container math: For pod-level metrics, we compute against the same set HPA uses (containers with non-zero requests). For per-container metrics, we compute per the named container.
- GitOps guardrails: If a controller owns the HPA, configure it to tolerate the fields nOps touches.
Current limitations (Phase 1)
Here are the current limitations of Phase 1:
- No automatic alignment for AverageValue, custom, or external metrics.
- We don’t create or remove metrics – only adjust existing CPU/Memory Utilization targets.
- For extreme outliers, you may still want SLO-aware buffers (e.g., cap targets ≤ 200–300%).
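If you implement such a buffer in your own tooling, it is a one-line guard on top of the same formula (illustrative, continuing the Python sketch above):

capped_target = min(math.ceil(usage / request * 100), 300)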
What’s Next (towards full MPA)
In Phase 2, nOps will suggest and (optionally) create and manage HPAs for workloads that don’t have one.
- Why: Many services are rightsized but never gain elastic savings.
- How: Detect eligible Deployments/StatefulSets → propose CPU/Memory Utilization targets using ceil(max_usage / rightsized_request * 100) → set sane min/max replicas and stabilization → keep targets aligned as requests evolve.
- Guardrails: Dry-run previews, GitOps-friendly YAML export, namespace allow-lists, rollback.
- Scope: CPU/Memory Utilization metrics (no custom metrics yet).
Together, Phases 1 and 2 evolve Compute Copilot into a Multidimensional Pod Autoscaler: rightsized containers, the right HPA targets, and the right signals – continuously aligned on autopilot.
The Bottom Line
You shouldn’t have to choose between VPA savings and HPA stability. With Multidimensional Pod Autoscaling (Phase 1), nOps keeps them in lockstep: rightsized requests and right-sized HPA targets—on autopilot.
How to Get Started
If you're already on nOps...
Enable HPA alignment in your nOps K8S Helm deployment:
helm upgrade --install nops-agent nops/nops-agent \
--namespace nops-system --create-namespace \
... \
--set vpa.recommender.extraArgs.enable-hpa-adjustment=true
Need help? Ping your CSM or visit the Help Center.
If you’re new to nOps…
nOps was recently ranked #1 with five stars in G2’s cloud cost management category, and we optimize $2+ billion in cloud spend for our customers.
Join our customers using nOps to understand your cloud costs and leverage automation with complete confidence by booking a demo with one of our AWS experts.
