SlimContext MCP Server

A Model Context Protocol (MCP) server that wraps the SlimContext library, providing AI chat history compression tools for MCP-compatible clients.

Overview

SlimContext MCP Server exposes two powerful compression strategies as MCP tools:

  1. trim_messages - Token-based compression that removes the oldest messages when the history exceeds a token threshold

  2. summarize_messages - AI-powered compression using OpenAI to create concise summaries

Installation

npm install -g slimcontext-mcp-server
# or
pnpm add -g slimcontext-mcp-server

Development

# Clone and setup
git clone <repository>
cd slimcontext-mcp-server
pnpm install

# Build
pnpm build

# Run in development
pnpm dev

# Type checking
pnpm typecheck

Configuration

MCP Client Setup

Add to your MCP client configuration:

{ "mcpServers": { "slimcontext": { "command": "npx", "args": ["-y", "slimcontext-mcp-server"] } } }

Environment Variables

  • OPENAI_API_KEY: OpenAI API key for summarization (optional, can be passed as tool parameter)
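
Many MCP clients can also inject environment variables into the server process. A sketch for clients that support an env field in their server configuration (Claude Desktop does, for example); the key shown is a placeholder:

{
  "mcpServers": {
    "slimcontext": {
      "command": "npx",
      "args": ["-y", "slimcontext-mcp-server"],
      "env": {
        "OPENAI_API_KEY": "<your-openai-api-key>"
      }
    }
  }
}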

Tools

trim_messages

Compresses chat history using a token-based trimming strategy.

Parameters:

  • messages (required): Array of chat messages

  • maxModelTokens (optional): Maximum context window of the model, in tokens (default: 8192)

  • thresholdPercent (optional): Fraction of maxModelTokens (0-1) at which compression is triggered (default: 0.7)

  • minRecentMessages (optional): Minimum number of recent messages to preserve (default: 2)

Example:

{ "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello!" }, { "role": "assistant", "content": "Hi there! How can I help you today?" }, { "role": "user", "content": "Tell me about AI." } ], "maxModelTokens": 4000, "thresholdPercent": 0.8, "minRecentMessages": 2 }

Response:

{ "success": true, "original_message_count": 4, "compressed_message_count": 3, "messages_removed": 1, "compression_ratio": 0.75, "compressed_messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "assistant", "content": "Hi there! How can I help you today?" }, { "role": "user", "content": "Tell me about AI." } ] }

summarize_messages

Compresses chat history using an AI-powered summarization strategy.

Parameters:

  • messages (required): Array of chat messages

  • maxModelTokens (optional): Maximum context window of the model, in tokens (default: 8192)

  • thresholdPercent (optional): Fraction of maxModelTokens (0-1) at which compression is triggered (default: 0.7)

  • minRecentMessages (optional): Minimum number of recent messages to preserve (default: 4)

  • openaiApiKey (optional): OpenAI API key (can also use OPENAI_API_KEY env var)

  • openaiModel (optional): OpenAI model for summarization (default: 'gpt-4o-mini')

  • customPrompt (optional): Custom summarization prompt

Example:

{ "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "I want to build a web scraper." }, { "role": "assistant", "content": "I can help you build a web scraper! What programming language would you prefer?" }, { "role": "user", "content": "Python please." }, { "role": "assistant", "content": "Great choice! For Python web scraping, I recommend using requests and BeautifulSoup..." }, { "role": "user", "content": "Can you show me a simple example?" } ], "maxModelTokens": 4000, "thresholdPercent": 0.6, "minRecentMessages": 2, "openaiModel": "gpt-4o-mini" }

Response:

{ "success": true, "original_message_count": 6, "compressed_message_count": 4, "messages_removed": 2, "summary_generated": true, "compression_ratio": 0.67, "compressed_messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "system", "content": "The user expressed interest in building a web scraper and requested help with Python. The assistant recommended using requests and BeautifulSoup libraries for Python web scraping." }, { "role": "assistant", "content": "Great choice! For Python web scraping, I recommend using requests and BeautifulSoup..." }, { "role": "user", "content": "Can you show me a simple example?" } ] }

Message Format

Both tools expect messages in SlimContext format:

interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
}
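
As a minimal sketch of calling a tool with messages in this format, assuming the official @modelcontextprotocol/sdk TypeScript client (the client name and version below are placeholders):

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the server over stdio, mirroring the MCP client configuration above.
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', 'slimcontext-mcp-server'],
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Invoke trim_messages with an array of SlimContextMessage objects.
const result = await client.callTool({
  name: 'trim_messages',
  arguments: {
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Hello!' },
    ],
    maxModelTokens: 4000,
  },
});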

Error Handling

Both tools return structured error responses:

{ "success": false, "error": "Error message description", "error_type": "SlimContextError" | "OpenAIError" | "UnknownError" }

Common error scenarios:

  • Missing OpenAI API key for summarization

  • Invalid message format

  • OpenAI API rate limits or errors

  • Invalid parameter values
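
Continuing the client sketch from the Message Format section, handling these responses might look like this (assuming, as is typical for MCP tools, that the JSON payload arrives as text content):

// Tool results arrive as MCP text content; the JSON payload is in the text.
const payload = JSON.parse(result.content[0].text);
if (!payload.success) {
  if (payload.error_type === 'OpenAIError') {
    // e.g. back off and retry on rate limits
  }
  throw new Error(payload.error);
}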

Token Estimation

SlimContext uses a simple heuristic for token estimation: Math.ceil(content.length / 4) + 2. This provides a reasonable approximation for most use cases. For more accurate token counting, you would need to implement a custom token estimator in your client application.
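
Expressed as a standalone TypeScript function:

function estimateTokens(content: string): number {
  // Roughly 4 characters per token, plus a small per-message overhead.
  return Math.ceil(content.length / 4) + 2;
}

// e.g. 'Tell me about AI.' is 17 characters: Math.ceil(17 / 4) + 2 = 7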

Compression Strategies

Trimming Strategy

  • Preserves all system messages

  • Preserves the most recent N messages

  • Removes the oldest non-system messages until the history is under the token threshold (sketched below)

  • Fast and deterministic

  • No external API dependencies
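
A minimal sketch of this trimming logic, reusing the estimateTokens heuristic from the Token Estimation section (an illustration of the strategy described above, not SlimContext's actual implementation):

function trimMessages(
  messages: SlimContextMessage[],
  maxModelTokens = 8192,
  thresholdPercent = 0.7,
  minRecentMessages = 2,
): SlimContextMessage[] {
  const budget = maxModelTokens * thresholdPercent;
  const total = (msgs: SlimContextMessage[]) =>
    msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  const result = [...messages];
  while (total(result) > budget) {
    // Oldest message that is neither a system message nor one of the
    // protected most-recent messages.
    const cutoff = result.length - minRecentMessages;
    const idx = result.findIndex((m, i) => m.role !== 'system' && i < cutoff);
    if (idx === -1) break; // nothing left that is safe to remove
    result.splice(idx, 1);
  }
  return result;
}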

Summarization Strategy

  • Preserves all system messages

  • Preserves the most recent N messages

  • Summarizes the middle portion of the conversation using AI (sketched below)

  • Creates contextually rich summaries

  • Requires OpenAI API access
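
A similar sketch of the summarization flow, using the official openai package (the summarization prompt here is a stand-in for whatever SlimContext uses by default, and the threshold check from the trimming sketch is omitted for brevity):

import OpenAI from 'openai';

async function summarizeMiddle(
  messages: SlimContextMessage[],
  minRecentMessages = 4,
  model = 'gpt-4o-mini',
): Promise<SlimContextMessage[]> {
  const systemMsgs = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  const middle = rest.slice(0, -minRecentMessages);
  const recent = rest.slice(-minRecentMessages);
  if (middle.length === 0) return messages;

  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
  const completion = await openai.chat.completions.create({
    model,
    messages: [
      { role: 'system', content: 'Summarize this conversation concisely.' },
      {
        role: 'user',
        content: middle.map((m) => `${m.role}: ${m.content}`).join('\n'),
      },
    ],
  });
  const summary = completion.choices[0].message.content ?? '';

  // As in the example response above, the summary is injected as a
  // system message ahead of the preserved recent messages.
  return [...systemMsgs, { role: 'system', content: summary }, ...recent];
}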

License

MIT

Contributing

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Add tests for new functionality

  5. Submit a pull request
