Prompt caching 101

Reduce latency and cost with cache-safe prompt blocks.

Intermediate · 10 min read · Oct 10, 2024
Tags: Latency, Caching, Optimization
Key takeaways
  • Cache stable prompt prefixes and templates.
  • Include tool schemas in cache keys.
  • Invalidate caches on prompt or policy changes.

Cacheable blocks

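Split each prompt into stable blocks, whose text is identical across requests (for example, instructions and templates), and variable blocks, which change per request (for example, user input). Put the stable blocks first so they form a reusable prefix, and cache only that prefix.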
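As a rough illustration of the split, the sketch below tags each block as stable or variable and cuts the prompt at the first variable block. The `Block` type, the `split_prompt` helper, and the sample prompt text are all hypothetical names made up for this example, not part of any particular SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One piece of a prompt (hypothetical type for this sketch)."""
    text: str
    stable: bool  # True if the text is identical across requests


def split_prompt(blocks: list[Block]) -> tuple[list[Block], list[Block]]:
    """Return (cacheable_prefix, remainder).

    The prefix ends at the first variable block, so a stable block that
    appears after variable content is NOT treated as cacheable.
    """
    for i, block in enumerate(blocks):
        if not block.stable:
            return blocks[:i], blocks[i:]
    return blocks, []


blocks = [
    Block("You are a support assistant.", stable=True),
    Block("Answer using the policy text below.", stable=True),
    Block("User question: where is my order?", stable=False),
]
prefix, remainder = split_prompt(blocks)
```

Cutting at the first variable block, rather than collecting every stable block, reflects the assumption that the cache works on prefixes: anything after changing content cannot be reused as-is.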

Cache key design

Include the model, the prompt version, and a hash of each tool schema in every cache key, so that a change to any of them produces a new key instead of a stale hit.
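One possible way to build such a key is sketched below, using only the Python standard library. The function names (`schema_hash`, `cache_key`) and the example model and version strings are assumptions for illustration; the choice of SHA-256 and canonical JSON is likewise one option among several.

```python
import hashlib
import json


def schema_hash(tool_schema: dict) -> str:
    """Hash a tool schema via canonical JSON so dict key order does not matter."""
    canonical = json.dumps(tool_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_key(model: str, prompt_version: str, tool_schemas: list[dict]) -> str:
    """Combine model, prompt version, and tool schema hashes into one key."""
    # Assumption: tool order is irrelevant, so the hashes are sorted.
    # Drop the sort if the order of tools changes the rendered prompt.
    tool_hashes = sorted(schema_hash(s) for s in tool_schemas)
    payload = json.dumps(
        {"model": model, "prompt_version": prompt_version, "tools": tool_hashes},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical values, for illustration only.
search_tool = {"name": "search", "parameters": {"query": "string"}}
key = cache_key("example-model", "v3", [search_tool])
```

Hashing the schemas instead of embedding them keeps keys short and fixed-length while still changing whenever a schema changes.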

Invalidate safely

Invalidate caches when prompts, tools, or policy rules change.