Learn how to use prompt caching with Anthropic models and Agno.
Prompt caching can help reduce processing time and costs. Consider it if you are using the same prompt multiple times in a flow. You can read more about prompt caching with Anthropic models [here](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching).
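In its simplest form, caching only requires the `cache_system_prompt` flag on the model. Below is a minimal sketch, assuming a hypothetical placeholder system message; the model id and the `cache_system_prompt` flag come from the full example further down:

```python
from agno.agent import Agent
from agno.models.anthropic import Claude

# Hypothetical large system message. Anthropic only caches prompts above a
# minimum token count (1024 tokens on most models), so short prompts won't benefit.
system_message = "You are a meticulous coding assistant. " * 200

agent = Agent(
    model=Claude(
        id="claude-sonnet-4-20250514",
        cache_system_prompt=True,  # Cache the system prompt (default TTL: 5 minutes)
    ),
    system_message=system_message,
)

agent.run("First call writes the system prompt to the cache")
agent.run("A second call within the TTL reads it back instead of reprocessing it")
```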
You can also use Anthropic’s extended cache beta feature, which extends the cache duration from the default 5 minutes to 1 hour. To activate it, pass the `extended_cache_time` argument and set the following beta header, as in the example below:
```python
from pathlib import Path

from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.utils.media import download_file

# Load an example large system message from S3. A large prompt like this would benefit from caching.
txt_path = Path(__file__).parent.joinpath("system_promt.txt")
download_file(
    "https://agno-public.s3.amazonaws.com/prompts/system_promt.txt",
    str(txt_path),
)
system_message = txt_path.read_text()

agent = Agent(
    model=Claude(
        id="claude-sonnet-4-20250514",
        default_headers={"anthropic-beta": "extended-cache-ttl-2025-04-11"},  # Set the beta header to use the extended cache time
        system_prompt=system_message,
        cache_system_prompt=True,  # Activate prompt caching for Anthropic to cache the system prompt
        extended_cache_time=True,  # Extend the cache time from the default to 1 hour
    ),
    system_message=system_message,
    markdown=True,
)

# First run - this will create the cache
response = agent.run(
    "Explain the difference between REST and GraphQL APIs with examples"
)
print(f"First run cache write tokens = {response.metrics['cache_write_tokens']}")  # type: ignore

# Second run - this will use the cached system prompt
response = agent.run(
    "What are the key principles of clean code and how do I apply them in Python?"
)
print(f"Second run cache read tokens = {response.metrics['cached_tokens']}")  # type: ignore
```
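If caching is working, the first run should report a non-zero `cache_write_tokens` value (the system prompt was written to the cache), and the second run should report a non-zero `cached_tokens` value (the system prompt was read back from the cache instead of being processed again).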