All news
AI May 4, 2026 · 2 min read

Anthropic Claude in production: prompt-cache patterns for app analytics

Anthropic's prompt caching is now a year old in production. The patterns that work for chat features and the patterns that work for analytics features are different. Here's what indie devs running both have learned.

By the AppsOps news desk ·

Prompt caching — sending the same context once and getting a 90% discount on every subsequent read of it — is the single biggest cost lever Anthropic shipped in the past 18 months. Indie iOS apps integrating Claude have been running it in production for a year now. Two distinct patterns have emerged for two distinct workloads.

Pattern A — Chat features

For chat-style features (assistants, support bots, conversational search), the cache target is the system prompt + tool definitions. These don't change between user turns. Wrap them in a cache breakpoint at the top of the prompt; everything before the breakpoint reads at 10% of the input-token cost. Cache hit rates of 85–95% are routine.

The mistake here is caching only the system prompt and not the tool schemas. Tool definitions are often longer than the prompt itself — caching them is where the savings actually compound.

Pattern B — Analytics features

For analytics features (sentiment scoring on reviews, summary generation on sales reports, classifying user behavior), the workload is different: each call has unique user-data, but the analyzing-instructions are constant. Cache target shifts to the instructions block + any few-shot examples. User data goes after the cache breakpoint.

Cache hit rates here are lower (~70%) because the cache invalidates more often as you tune the analyzing prompt. The fix is to lock the analyzing block early in the project and only iterate on output parsing afterward.

The cross-cutting lesson

Both patterns share one rule: put what's STABLE first in the prompt, what's VARIABLE last. The cache breakpoint should sit between the two. Most indie devs start with their prompts structured the other way around — variable user data at the top — and have to refactor to get caching working.

What this means for app builders

If you're using Claude in production and your cache hit rate is below 70%, prompt structure is the first thing to audit. Per Anthropic's docs, hit rates above 85% are routine when the prompt is structured stable-first. The math: 90% discount on 85% of input tokens is a 76% input-cost reduction without changing model or vendor.

Share this

Related news

Read & learn. Then ship.

Tech news is interesting. AppsOps actually ships the App Store work — PPP-fair pricing for 175 App Store territories, AI metadata in 39 languages, AI screenshot localization, price A/B experiments. $19/mo, 14-day free trial.

Try AppsOps free — no card