Anthropic Claude in production: prompt-cache patterns for app analytics
Anthropic's prompt caching is now a year old in production. The patterns that work for chat features and the patterns that work for analytics features are different. Here's what indie devs running both have learned.
Prompt caching — sending the same context once and getting a 90% discount on every subsequent read of it — is the single biggest cost lever Anthropic shipped in the past 18 months. Indie iOS apps integrating Claude have been running it in production for a year now. Two distinct patterns have emerged for two distinct workloads.
Pattern A — Chat features
For chat-style features (assistants, support bots, conversational search), the cache target is the system prompt + tool definitions. These don't change between user turns. Wrap them in a cache breakpoint at the top of the prompt; everything before the breakpoint reads at 10% of the input-token cost. Cache hit rates of 85–95% are routine.
The mistake here is caching only the system prompt and not the tool schemas. Tool definitions are often longer than the prompt itself — caching them is where the savings actually compound.
Pattern B — Analytics features
For analytics features (sentiment scoring on reviews, summary generation on sales reports, classifying user behavior), the workload is different: each call has unique user-data, but the analyzing-instructions are constant. Cache target shifts to the instructions block + any few-shot examples. User data goes after the cache breakpoint.
Cache hit rates here are lower (~70%) because the cache invalidates more often as you tune the analyzing prompt. The fix is to lock the analyzing block early in the project and only iterate on output parsing afterward.
The cross-cutting lesson
Both patterns share one rule: put what's STABLE first in the prompt, what's VARIABLE last. The cache breakpoint should sit between the two. Most indie devs start with their prompts structured the other way around — variable user data at the top — and have to refactor to get caching working.
What this means for app builders
If you're using Claude in production and your cache hit rate is below 70%, prompt structure is the first thing to audit. Per Anthropic's docs, hit rates above 85% are routine when the prompt is structured stable-first. The math: 90% discount on 85% of input tokens is a 76% input-cost reduction without changing model or vendor.
Share this