Glama supports prompt caching to optimize costs across supported providers and models. Here's how it works with different providers:
Cache effectiveness can be monitored via the analytics and logs dashboards, or programmatically via the `/gateway/v1/completion-requests/:id` API.
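As a sketch of programmatic monitoring, the helper below computes a cache-hit ratio from a completion-request record's token usage. The field names (`cache_read_tokens`, `prompt_tokens`) are illustrative assumptions, not the documented schema; check the `/gateway/v1/completion-requests/:id` response for the actual fields.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens served from cache.

    The field names used here (cache_read_tokens, prompt_tokens) are
    assumptions for illustration; consult the completion-requests API
    response schema for the real names.
    """
    cached = usage.get("cache_read_tokens", 0)
    total = usage.get("prompt_tokens", 0)
    return cached / total if total else 0.0

# Example with illustrative numbers: 900 of 1200 prompt tokens cached.
print(cache_hit_ratio({"cache_read_tokens": 900, "prompt_tokens": 1200}))  # 0.75
```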
cache_control breakpoints

For providers that support explicit cache breakpoints (for example, Anthropic), mark the content block you want cached with a `cache_control` field:

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "LARGE_TEXT_CONTENT",
          "cache_control": { "type": "ephemeral" }
        },
        { "type": "text", "text": "Your question here" }
      ]
    }
  ]
}
```

Note: Cache pricing and features are subject to change. Check our API documentation for the most current information.