Introducing: AI Recommendations – Amazon Bedrock Claude Long Context Savings
Long Context (LCtx) usage in Amazon Bedrock Claude models can silently inflate costs when sessions grow beyond 200k tokens, or when applications unintentionally keep sessions open longer than necessary. Many teams lack visibility into when LCtx is activated, how much it costs, and, most importantly, how to prevent unnecessary spend.
As LLM adoption scales, understanding and optimizing contextual token handling has become a critical challenge for enterprises. Without proactive detection, organizations overpay for LCtx input/output tokens that could often be avoided with simple workflow adjustments.
We built this feature to help teams stay in control of their generative AI costs—by automatically detecting LCtx usage, quantifying the financial impact, and surfacing clear operational recommendations to eliminate waste.
What's New
Long-Context Recommendations is the latest addition to Inform Cost Savings Recommendations. This feature helps your organization identify when Claude Sonnet 4/4.5 sessions enter the expensive LCtx mode—and provides actionable guidance to reduce those costs immediately.
Starting today, you can access this feature directly inside Inform Cost Savings Recommendations.
Amazon Bedrock Long Context Usage Detection
Long Context Usage Detection automatically scans Bedrock usage and identifies any requests that invoke CacheWriteInputTokenCount_LCtx, CacheReadInputTokenCount_LCtx, InputTokenCount_LCtx, or OutputTokenCount_LCtx.
This gives teams clear visibility into hidden LCtx expenses by processing billing line items and quantifying token activity at scale.
Now you can detect LCtx activation:
- Identify which workloads triggered LCtx
- See affected resource IDs and token counts
- Quantify how many tokens were billed in LCtx mode
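The detection step above can be sketched as a simple scan over billing line items. This is a minimal illustration of the idea, not the actual nOps pipeline; the record fields (`usage_type`, `resource_id`, `tokens`) are hypothetical placeholders for whatever schema your billing export uses.

```python
# The four LCtx usage types named above; any line item matching one of
# these indicates that Long Context mode was billed.
LCTX_USAGE_TYPES = {
    "CacheWriteInputTokenCount_LCtx",
    "CacheReadInputTokenCount_LCtx",
    "InputTokenCount_LCtx",
    "OutputTokenCount_LCtx",
}

def detect_lctx(line_items):
    """Return LCtx token totals keyed by (resource_id, usage_type)."""
    totals = {}
    for item in line_items:
        if item["usage_type"] in LCTX_USAGE_TYPES:
            key = (item["resource_id"], item["usage_type"])
            totals[key] = totals.get(key, 0) + item["tokens"]
    return totals

# Hypothetical line items: only the *_LCtx entries are flagged.
items = [
    {"resource_id": "app-1", "usage_type": "InputTokenCount_LCtx", "tokens": 250_000},
    {"resource_id": "app-1", "usage_type": "OutputTokenCount_LCtx", "tokens": 8_000},
    {"resource_id": "app-2", "usage_type": "InputTokenCount", "tokens": 40_000},
]
lctx_totals = detect_lctx(items)
```

Here `lctx_totals` identifies which workloads triggered LCtx and how many tokens were billed in LCtx mode, which is exactly the visibility the bullets above describe.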
Who Benefits Most
- Platform Engineering Teams
- FinOps & Cloud Cost Managers
- Application & ML Engineers
How It Works
1. We ingest Bedrock billing and usage.
2. We detect any usage types containing the Long Context (LCtx) suffix and total the tokens they billed.
3. We calculate both actual and potential cost using optimized multipliers (e.g., 50% reduction for Input/Cache, 1.5x normalization for Output).
4. We generate recommendations whenever cost improvements exceed the threshold—typically suggesting:
- “Summarize the session”
- “Create a new session to eliminate LCtx accumulation”
5. Your results and insights are available directly on your nOps Cost Savings Recommendations page.
This gives teams full transparency into LCtx behavior—without needing to manually inspect logs, tokens, or session patterns.
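The savings calculation in steps 3–4 can be illustrated as follows. The multipliers mirror the ones mentioned above (50% reduction for input/cache, 1.5x normalization for output), but the costs and threshold here are placeholder assumptions, not actual Bedrock pricing or nOps internals.

```python
def estimate_savings(actual_costs, threshold=0.0):
    """Estimate potential savings from avoiding LCtx mode.

    actual_costs: dict mapping an LCtx usage type to its billed cost (USD).
    Returns (savings, recommendations).
    """
    potential = {}
    for usage_type, cost in actual_costs.items():
        if "Input" in usage_type or "Cache" in usage_type:
            # Input/cache: assume non-LCtx cost is ~50% of the LCtx cost.
            potential[usage_type] = cost * 0.5
        else:
            # Output: remove the assumed 1.5x LCtx normalization.
            potential[usage_type] = cost / 1.5
    savings = sum(actual_costs.values()) - sum(potential.values())
    if savings > threshold:
        # Recommendations mirror the guidance listed in step 4.
        return savings, [
            "Summarize the session",
            "Create a new session to eliminate LCtx accumulation",
        ]
    return savings, []

# Hypothetical billed LCtx costs for one workload.
savings, recs = estimate_savings({
    "InputTokenCount_LCtx": 12.0,
    "OutputTokenCount_LCtx": 3.0,
})
```

With these placeholder numbers, the potential cost drops from $15.00 to $8.00, so the workflow would surface both recommendations.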
How to Get Started
To start using Amazon Bedrock Long Context Usage Detection, navigate to Inform → Cost Savings Recommendations → AI Category.
If you're already on nOps...
Have questions about the new feature? Need help getting started? Our dedicated support team is here for you. Simply reach out to your Customer Success Manager or visit our Help Center. If you’re not sure who your CSM is, send our Support Team a message.
If you’re new to nOps…
nOps was recently ranked #1 with five stars in G2’s cloud cost management category, and we optimize $2+ billion in cloud spend for our customers.
Join our customers using nOps to understand your cloud costs and leverage automation with complete confidence by booking a demo with one of our AWS experts.
