Introducing: AI Recommendations – Amazon Bedrock Claude Long Context Savings
Long Context (LCtx) usage in Amazon Bedrock Claude models can silently inflate costs when sessions grow beyond 200k tokens, or when applications unintentionally keep sessions open longer than necessary. Many teams lack visibility into when LCtx is activated, how much it costs, and, most importantly, how to prevent unnecessary spend.
As LLM adoption scales, understanding and optimizing contextual token handling has become a critical challenge for enterprises. Without proactive detection, organizations overpay for LCtx input/output tokens that could often be avoided with simple workflow adjustments.
We built this feature to help teams stay in control of their generative AI costs—by automatically detecting LCtx usage, quantifying the financial impact, and surfacing clear operational recommendations to eliminate waste.
What's New
Long-Context Recommendations is the latest addition to Inform Cost Savings Recommendations. This feature helps your organization identify when Claude Sonnet 4/4.5 sessions enter the expensive LCtx mode—and provides actionable guidance to reduce those costs immediately.
Starting today, you can access this feature directly inside Inform Cost Savings Recommendations.
Amazon Bedrock Long Context Usage Detection
Long Context Usage Detection automatically scans Bedrock usage and identifies any requests that invoke CacheWriteInputTokenCount_LCtx, CacheReadInputTokenCount_LCtx, InputTokenCount_LCtx, or OutputTokenCount_LCtx.
This gives teams clear visibility into hidden LCtx expenses by processing billing line items and quantifying token activity at scale.
Now you can detect LCtx activation:
- Identify which workloads triggered LCtx
- See affected resource IDs and token counts
- Quantify how many tokens were billed in LCtx mode
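The detection step above can be sketched as a simple scan over billing line items. This is a minimal illustration of the idea, not the actual nOps pipeline; the record fields (`usage_type`, `resource_id`, `tokens`) are hypothetical placeholders for whatever schema your billing export uses.

```python
# The four LCtx usage types named above; any line item matching one of
# these indicates that Long Context mode was billed.
LCTX_USAGE_TYPES = {
    "CacheWriteInputTokenCount_LCtx",
    "CacheReadInputTokenCount_LCtx",
    "InputTokenCount_LCtx",
    "OutputTokenCount_LCtx",
}

def detect_lctx(line_items):
    """Return LCtx token totals keyed by (resource_id, usage_type)."""
    totals = {}
    for item in line_items:
        if item["usage_type"] in LCTX_USAGE_TYPES:
            key = (item["resource_id"], item["usage_type"])
            totals[key] = totals.get(key, 0) + item["tokens"]
    return totals

# Hypothetical line items: only the *_LCtx entries are flagged.
items = [
    {"resource_id": "app-1", "usage_type": "InputTokenCount_LCtx", "tokens": 250_000},
    {"resource_id": "app-1", "usage_type": "OutputTokenCount_LCtx", "tokens": 8_000},
    {"resource_id": "app-2", "usage_type": "InputTokenCount", "tokens": 40_000},
]
lctx_totals = detect_lctx(items)
```

Here `lctx_totals` identifies which workloads triggered LCtx and how many tokens were billed in LCtx mode, which is exactly the visibility the bullets above describe.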
Who Benefits Most
- Platform Engineering Teams
- FinOps & Cloud Cost Managers
- Application & ML Engineers
How It Works
1. We ingest Bedrock billing and usage.
2. We detect any usage types containing the Long Context (LCtx) suffix and total the tokens they billed.
3. We calculate both actual and potential cost using optimized multipliers (e.g., 50% reduction for Input/Cache, 1.5x normalization for Output).
4. We generate recommendations whenever cost improvements exceed the threshold—typically suggesting:
- “Summarize the session”
- “Create a new session to eliminate LCtx accumulation”
5. Your results and insights are available directly on your nOps Cost Savings Recommendations page.
This gives teams full transparency into LCtx behavior—without needing to manually inspect logs, tokens, or session patterns.
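The savings calculation in steps 3–4 can be illustrated as follows. The multipliers mirror the ones mentioned above (50% reduction for input/cache, 1.5x normalization for output), but the costs and threshold here are placeholder assumptions, not actual Bedrock pricing or nOps internals.

```python
def estimate_savings(actual_costs, threshold=0.0):
    """Estimate potential savings from avoiding LCtx mode.

    actual_costs: dict mapping an LCtx usage type to its billed cost (USD).
    Returns (savings, recommendations).
    """
    potential = {}
    for usage_type, cost in actual_costs.items():
        if "Input" in usage_type or "Cache" in usage_type:
            # Input/cache: assume non-LCtx cost is ~50% of the LCtx cost.
            potential[usage_type] = cost * 0.5
        else:
            # Output: remove the assumed 1.5x LCtx normalization.
            potential[usage_type] = cost / 1.5
    savings = sum(actual_costs.values()) - sum(potential.values())
    if savings > threshold:
        # Recommendations mirror the guidance listed in step 4.
        return savings, [
            "Summarize the session",
            "Create a new session to eliminate LCtx accumulation",
        ]
    return savings, []

# Hypothetical billed LCtx costs for one workload.
savings, recs = estimate_savings({
    "InputTokenCount_LCtx": 12.0,
    "OutputTokenCount_LCtx": 3.0,
})
```

With these placeholder numbers, the potential cost drops from $15.00 to $8.00, so the workflow would surface both recommendations.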
How to Get Started
To start using Amazon Bedrock Long Context Usage Detection, navigate to Inform → Cost Savings Recommendations → AI Category.
If you're already on nOps...
Have questions about the new feature? Need help getting started? Our dedicated support team is here for you. Simply reach out to your Customer Success Manager or visit our Help Center. If you’re not sure who your CSM is, send our Support Team a message.
If you’re new to nOps…
nOps was recently ranked #1 with five stars in G2’s cloud cost management category, and we optimize $2+ billion in cloud spend for our customers.
Join our customers using nOps to understand your cloud costs and leverage automation with complete confidence by booking a demo with one of our AWS experts.
