Prompt caching
Anthropic's prompt caching cuts the cost of a repeated system prompt or long fixed context by 50–90%. SeaLink passes the cache_control hint through to the upstream provider and bills based on actual cache hits.
When to enable caching
- System prompt > ~5KB called repeatedly (chatbots, agent loops)
- RAG with a fixed retrieval prefix + varying questions
- Code assistant pinning a whole file as context
- Long-doc Q&A: many questions over same doc
Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sealink.asia/v1",
    api_key="<your-sealink-key>",
)

# Long, reusable system prompt — cached on the first call.
SYSTEM = open("knowledge_base.md").read()  # imagine 50KB

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {
            "role": "system",
            "content": SYSTEM,
            # SeaLink passes this hint through to Anthropic / OpenAI.
            "cache_control": {"type": "ephemeral"},
        },
        {"role": "user", "content": "Question 1"},
    ],
)

# Subsequent calls within ~5 minutes pay ~10% of the cached prefix cost.
print(resp.usage.prompt_tokens_details)
# {"cached_tokens": 12500}
```
Models that support caching
- claude-sonnet-4-6 · claude-haiku-4-5 · claude-opus-4-7 (5-min TTL; hit price ~10%)
- gpt-4o · gpt-4o-mini · o3-mini (automatic, no hint needed)
Real savings example
Customer-support bot: 12.5K-token system prompt × 10K calls/month at $3/1M input tokens. No cache: 12.5K × 10K × $3/1M = $375. With caching (steady traffic, so nearly every call lands within the 5-min TTL): one cache write at ~1.25× input price (≈$0.05) plus 9,999 reads at ~10% of input price (≈$37.50), roughly $38 in total. Saves ~$337/month.
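As a sanity check, the savings math can be scripted. The prices below are this example's illustrative assumptions (including the ~1.25× cache-write premium), not a pricing quote:

```python
INPUT_PRICE = 3.00 / 1_000_000   # $ per input token (example rate)
WRITE_MULT = 1.25                # cache-write premium (assumption)
READ_MULT = 0.10                 # cache read at ~10% of input price
PREFIX = 12_500                  # cached system-prompt tokens
CALLS = 10_000                   # calls per month

# Every call pays full price for the prefix.
no_cache = PREFIX * CALLS * INPUT_PRICE

# One cache write, then every later call reads the cached prefix.
with_cache = (PREFIX * WRITE_MULT * INPUT_PRICE
              + PREFIX * (CALLS - 1) * READ_MULT * INPUT_PRICE)

print(f"${no_cache:.2f} vs ${with_cache:.2f}")  # -> $375.00 vs $37.54
```

This ignores the (small, uncached) user-message tokens, and assumes every call after the first hits the cache.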
Seeing cache hits in your console
SeaLink's usage page records cached_tokens for every call, and the dashboard includes a daily cache-hit-rate chart (live in v1).
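You can also compute a hit rate client-side by aggregating the per-call usage objects yourself. A minimal sketch, assuming the OpenAI-compatible field names shown in the example above (`prompt_tokens`, `prompt_tokens_details.cached_tokens`):

```python
def cache_hit_rate(usages):
    """Fraction of prompt tokens served from cache across a batch of calls.

    Each item is a dict shaped like the OpenAI-compatible `usage`
    object returned per call (same shape as in the example above).
    """
    prompt = sum(u.get("prompt_tokens", 0) for u in usages)
    cached = sum(
        u.get("prompt_tokens_details", {}).get("cached_tokens", 0)
        for u in usages
    )
    return cached / prompt if prompt else 0.0


# Hypothetical usage records: one warm call, one cold call.
calls = [
    {"prompt_tokens": 12_600, "prompt_tokens_details": {"cached_tokens": 12_500}},
    {"prompt_tokens": 12_600, "prompt_tokens_details": {"cached_tokens": 0}},
]
print(f"{cache_hit_rate(calls):.0%}")  # -> 50%
```

A low number here usually means calls arrive more than ~5 minutes apart (the Claude TTL) or the prefix changes between calls, which invalidates the cache.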