Prompt Caching

Glama supports prompt caching to reduce input costs on supported models. Here's how it works with each provider:

Monitoring

Cache effectiveness can be monitored via the analytics and logs dashboards or the /gateway/v1/completion-requests/:uuid API.
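
For example, a single request record can be fetched and inspected programmatically. A minimal sketch, assuming a https://glama.ai/api base URL, bearer-token authentication with a GLAMA_API_KEY environment variable, and a JSON record whose exact shape is not documented here:

// Fetch one completion-request record to check cache effectiveness.
// The base URL, auth scheme, and response shape are assumptions; verify
// them against the API documentation.
async function getCompletionRequest(uuid: string): Promise<unknown> {
  const res = await fetch(
    `https://glama.ai/api/gateway/v1/completion-requests/${uuid}`,
    { headers: { Authorization: `Bearer ${process.env.GLAMA_API_KEY}` } },
  );
  if (!res.ok) {
    throw new Error(`Failed to fetch completion request: ${res.status}`);
  }
  return res.json();
}

// Inspect the record for cache-related token counts (field names vary by provider).
console.log(await getCompletionRequest("REQUEST_UUID"));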

Provider-Specific Details

OpenAI

  • Automatic caching with no configuration needed (see the sketch after this list)
  • No cost for cache writes
  • Cache reads: 50% of original input price
  • Minimum prompt size: 1024 tokens

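Because caching is automatic, no request changes are needed: resending a prompt that shares a long enough prefix with a recent request is served partly from cache. A minimal sketch, assuming the gateway exposes an OpenAI-compatible endpoint at https://glama.ai/api/gateway/openai/v1 and passes through OpenAI's usage.prompt_tokens_details.cached_tokens field:

import OpenAI from "openai";

// The base URL and env var name are assumptions; see the gateway documentation.
const client = new OpenAI({
  apiKey: process.env.GLAMA_API_KEY,
  baseURL: "https://glama.ai/api/gateway/openai/v1",
});

// A long, stable prefix; it must exceed the 1024-token minimum to be cached.
const systemPrompt = "You are a support agent. " + "POLICY_DOCUMENT_TEXT ".repeat(500);

const response = await client.chat.completions.create({
  model: "gpt-4o", // any OpenAI model that supports caching
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: "What is the refund policy?" },
  ],
});

// On a repeat request with the same prefix, cached_tokens reports how many
// prompt tokens were read from cache (billed at 50% of the input price).
console.log(response.usage?.prompt_tokens_details?.cached_tokens);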

Anthropic Claude

  • Requires cache_control breakpoints
  • Cache writes: 125% of original input price
  • Cache reads: 10% of original input price
  • Maximum of 4 breakpoints per request; cached content expires after 5 minutes (the TTL is refreshed each time the cache is read)
  • Best suited for large, stable text bodies (character cards, RAG data, etc.)

Example Cache Control Usage:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "LARGE_TEXT_CONTENT",
          "cache_control": { "type": "ephemeral" }
        },
        {
          "type": "text",
          "text": "Your question here"
        }
      ]
    }
  ]
}

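Anthropic reports cache activity in the response's usage object via cache_creation_input_tokens (writes) and cache_read_input_tokens (reads). Below is a minimal sketch of sending the body above and checking those counters; the base URL, model ID, and pass-through of Anthropic's usage fields are assumptions:

// Send the example body above through the gateway and inspect cache usage.
// The base URL, auth scheme, and model ID are assumptions.
const res = await fetch("https://glama.ai/api/gateway/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.GLAMA_API_KEY}`,
  },
  body: JSON.stringify({
    model: "claude-3-5-sonnet", // placeholder; use a Claude model ID from the model list
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "LARGE_TEXT_CONTENT",
            cache_control: { type: "ephemeral" },
          },
          { type: "text", text: "Your question here" },
        ],
      },
    ],
  }),
});

const data = await res.json();
// First call: expect cache_creation_input_tokens > 0 (cache write, billed at 125%).
// Repeat call within 5 minutes: expect cache_read_input_tokens > 0 (read, billed at 10%).
console.log(data.usage);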

DeepSeek

  • Automatic caching with no configuration needed (see the sketch after this list)
  • Cache writes: same as original input price
  • Cache reads: 10% of original input price

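As with OpenAI, no request changes are needed; DeepSeek's native API reports cache activity in usage as prompt_cache_hit_tokens and prompt_cache_miss_tokens. A minimal sketch, assuming the same OpenAI-compatible base URL, a placeholder model ID, and that the gateway forwards DeepSeek's usage fields:

import OpenAI from "openai";

// The base URL and env var name are assumptions; see the gateway documentation.
const client = new OpenAI({
  apiKey: process.env.GLAMA_API_KEY,
  baseURL: "https://glama.ai/api/gateway/openai/v1",
});

const completion = await client.chat.completions.create({
  model: "deepseek-chat", // placeholder model ID
  messages: [{ role: "user", content: "A prompt with a previously seen prefix." }],
});

// On a cache hit, the reused prefix is counted in prompt_cache_hit_tokens and
// billed at 10% of the input price; misses are billed at the full input price.
console.log(completion.usage);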

Note: Cache pricing and features are subject to change. Check our API documentation for the most current information.