# MCP Prompt Tester
A simple MCP server that allows agents to test LLM prompts with different providers.
## Features
- Test prompts with OpenAI and Anthropic models
- Configure system prompts, user prompts, and other parameters
- Get formatted responses or error messages
- Easy environment setup with .env file support
## Installation
```bash
# Install with pip
pip install -e .
# Or with uv
uv pip install -e .
```
## API Key Setup
The server requires API keys for the providers you want to use. You can set these up in two ways:
### Option 1: Environment Variables
Set the following environment variables:
- `OPENAI_API_KEY` - Your OpenAI API key
- `ANTHROPIC_API_KEY` - Your Anthropic API key
### Option 2: .env File (Recommended)
1. Create a file named `.env` in your project directory or home directory
2. Add your API keys in the following format:
```
OPENAI_API_KEY=your-openai-api-key-here
ANTHROPIC_API_KEY=your-anthropic-api-key-here
```
3. The server will automatically detect and load these keys
For convenience, a sample template is included as `.env.example`.
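The server loads these keys automatically at startup. If you want to sanity-check that your keys resolve before launching it, a minimal standalone check (assuming the usual `python-dotenv` semantics; this script is not part of the server) looks like this:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Standalone sanity check (not part of the server):
# load .env from the current directory, if present, and report key status.
load_dotenv()
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'missing'}")
```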
## Usage
Start the server using stdio (default) or SSE transport:
```bash
# Using stdio transport (default)
prompt-tester
# Using SSE transport on custom port
prompt-tester --transport sse --port 8000
```
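With SSE transport, a client connects over HTTP instead of stdio. Here is a minimal connection sketch using the MCP Python SDK; the `/sse` endpoint path is the SDK's conventional default and is an assumption here, so adjust it if your deployment differs:

```python
import asyncio

from mcp.client.session import ClientSession
from mcp.client.sse import sse_client


async def main():
    # Assumes a server started with: prompt-tester --transport sse --port 8000
    # The /sse path is the SDK's conventional default endpoint.
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("list_providers", {})
            print(result)


asyncio.run(main())
```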
### Available Tools
The server exposes the following tools for MCP-empowered agents:
#### 1. list_providers
Retrieves available LLM providers and their default models.
**Parameters:**
- None required
**Example Response:**
```json
{
  "providers": {
    "openai": [
      {
        "type": "gpt-4",
        "name": "gpt-4",
        "input_cost": 0.03,
        "output_cost": 0.06,
        "description": "Most capable GPT-4 model"
      },
      // ... other models ...
    ],
    "anthropic": [
      // ... models ...
    ]
  }
}
```
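Assuming the result arrives as a single JSON text content block shaped like the example above, a small helper (hypothetical, not part of the server) can flatten it into model names per provider:

```python
import json


def model_names(providers_result):
    """Hypothetical helper: map each provider to its list of model names.

    Assumes the tool returned one JSON text content block shaped like
    the example response above.
    """
    data = json.loads(providers_result.content[0].text)
    return {
        provider: [model["name"] for model in models]
        for provider, models in data["providers"].items()
    }
```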
#### 2. test_comparison
Compares multiple prompts side-by-side, allowing you to test different providers, models, and parameters simultaneously.
**Parameters:**
- `comparisons` (array): A list of 1-4 comparison configurations, each containing:
- `provider` (string): The LLM provider to use ("openai" or "anthropic")
- `model` (string): The model name
- `system_prompt` (string): The system prompt (instructions for the model)
- `user_prompt` (string): The user's message/prompt
- `temperature` (number, optional): Controls randomness
- `max_tokens` (integer, optional): Maximum number of tokens to generate
- `top_p` (number, optional): Controls diversity via nucleus sampling
**Example Usage:**
```json
{
  "comparisons": [
    {
      "provider": "openai",
      "model": "gpt-4",
      "system_prompt": "You are a helpful assistant.",
      "user_prompt": "Explain quantum computing in simple terms.",
      "temperature": 0.7
    },
    {
      "provider": "anthropic",
      "model": "claude-3-opus-20240229",
      "system_prompt": "You are a helpful assistant.",
      "user_prompt": "Explain quantum computing in simple terms.",
      "temperature": 0.7
    }
  ]
}
```
#### 3. test_multiturn_conversation
Manages multi-turn conversations with LLM providers, allowing you to create and maintain stateful conversations.
**Modes:**
- `start`: Begins a new conversation
- `continue`: Continues an existing conversation
- `get`: Retrieves conversation history
- `list`: Lists all active conversations
- `close`: Closes a conversation
**Parameters:**
- `mode` (string): Operation mode ("start", "continue", "get", "list", or "close")
- `conversation_id` (string): Unique ID for the conversation (required for the `continue`, `get`, and `close` modes)
- `provider` (string): The LLM provider (required for start mode)
- `model` (string): The model name (required for start mode)
- `system_prompt` (string): The system prompt (required for start mode)
- `user_prompt` (string): The user message (used in start and continue modes)
- `temperature` (number, optional): Temperature parameter for the model
- `max_tokens` (integer, optional): Maximum tokens to generate
- `top_p` (number, optional): Top-p sampling parameter
**Example Usage (Starting a Conversation):**
```json
{
  "mode": "start",
  "provider": "openai",
  "model": "gpt-4",
  "system_prompt": "You are a helpful assistant specializing in physics.",
  "user_prompt": "Can you explain what dark matter is?"
}
```
**Example Usage (Continuing a Conversation):**
```json
{
  "mode": "continue",
  "conversation_id": "conv_12345",
  "user_prompt": "How does that relate to dark energy?"
}
```
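The remaining modes follow the same documented pattern: `get` and `close` take only the conversation ID, and `list` takes no parameters beyond the mode itself.

**Example Usage (Retrieving History):**
```json
{
  "mode": "get",
  "conversation_id": "conv_12345"
}
```

**Example Usage (Closing a Conversation):**
```json
{
  "mode": "close",
  "conversation_id": "conv_12345"
}
```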
## Example Usage for Agents
Using the MCP Python client, an agent can call the tools like this:
```python
import asyncio
import json

from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client


async def main():
    async with stdio_client(
        StdioServerParameters(command="prompt-tester")
    ) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. List available providers and models
            providers_result = await session.call_tool("list_providers", {})
            print("Available providers and models:", providers_result)

            # 2. Run a basic test with a single model and prompt
            comparison_result = await session.call_tool("test_comparison", {
                "comparisons": [
                    {
                        "provider": "openai",
                        "model": "gpt-4",
                        "system_prompt": "You are a helpful assistant.",
                        "user_prompt": "Explain quantum computing in simple terms.",
                        "temperature": 0.7,
                        "max_tokens": 500
                    }
                ]
            })
            print("Single model test result:", comparison_result)

            # 3. Compare multiple prompts/models side by side
            comparison_result = await session.call_tool("test_comparison", {
                "comparisons": [
                    {
                        "provider": "openai",
                        "model": "gpt-4",
                        "system_prompt": "You are a helpful assistant.",
                        "user_prompt": "Explain quantum computing in simple terms.",
                        "temperature": 0.7
                    },
                    {
                        "provider": "anthropic",
                        "model": "claude-3-opus-20240229",
                        "system_prompt": "You are a helpful assistant.",
                        "user_prompt": "Explain quantum computing in simple terms.",
                        "temperature": 0.7
                    }
                ]
            })
            print("Comparison result:", comparison_result)

            # 4. Start a multi-turn conversation
            conversation_start = await session.call_tool("test_multiturn_conversation", {
                "mode": "start",
                "provider": "openai",
                "model": "gpt-4",
                "system_prompt": "You are a helpful assistant specializing in physics.",
                "user_prompt": "Can you explain what dark matter is?"
            })
            print("Conversation started:", conversation_start)

            # Get the conversation ID from the JSON text block in the response
            response_data = json.loads(conversation_start.content[0].text)
            conversation_id = response_data.get("conversation_id")

            # Continue the conversation
            if conversation_id:
                conversation_continue = await session.call_tool("test_multiturn_conversation", {
                    "mode": "continue",
                    "conversation_id": conversation_id,
                    "user_prompt": "How does that relate to dark energy?"
                })
                print("Conversation continued:", conversation_continue)

                # Retrieve the conversation history
                conversation_history = await session.call_tool("test_multiturn_conversation", {
                    "mode": "get",
                    "conversation_id": conversation_id
                })
                print("Conversation history:", conversation_history)


asyncio.run(main())
```
## MCP Agent Integration
For MCP-empowered agents, integration is straightforward. When your agent needs to test LLM prompts:
1. **Discovery**: The agent can use `list_providers` to discover available models and their capabilities
2. **Simple Testing**: For quick tests, use the `test_comparison` tool with a single configuration
3. **Comparison**: When the agent needs to evaluate different prompts or models, it can use `test_comparison` with multiple configurations
4. **Stateful Interactions**: For multi-turn conversations, the agent can manage a conversation using the `test_multiturn_conversation` tool
This allows agents to:
- Test prompt variants to find the most effective phrasing
- Compare different models for specific tasks
- Maintain context in multi-turn conversations
- Optimize parameters like temperature and max_tokens
- Track token usage and costs during development
## Configuration
You can set API keys and optional tracing configurations using environment variables:
### Required API Keys
- `OPENAI_API_KEY` - Your OpenAI API key
- `ANTHROPIC_API_KEY` - Your Anthropic API key
### Optional Langfuse Tracing
The server supports Langfuse for tracing and observability of LLM calls. These settings are optional:
- `LANGFUSE_SECRET_KEY` - Your Langfuse secret key
- `LANGFUSE_PUBLIC_KEY` - Your Langfuse public key
- `LANGFUSE_HOST` - URL of your Langfuse instance
If you don't want to use Langfuse tracing, simply leave these variables unset.
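If you do enable tracing, these can live in the same `.env` file as your provider keys. The host value below is Langfuse's hosted default and is an assumption here; self-hosted instances use their own URL:

```
# Optional Langfuse tracing (host shown is the hosted default)
LANGFUSE_SECRET_KEY=your-langfuse-secret-key
LANGFUSE_PUBLIC_KEY=your-langfuse-public-key
LANGFUSE_HOST=https://cloud.langfuse.com
```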