π Shepherd MCP
MCP (Model Context Protocol) server for Shepherd - Debug your AI agents like you debug your code.
This MCP server allows AI assistants (Claude, Cursor, etc.) to query and analyze your AI agent sessions from multiple observability providers.
Supported Providers
AIOBS (Shepherd backend) - Native Shepherd observability
Langfuse - Open-source LLM observability platform
Installation
Or run directly with uvx:
Configuration
Environment Variables
AIOBS (Shepherd)
AIOBS_API_KEY(required) - Your Shepherd API keyAIOBS_ENDPOINT(optional) - Custom API endpoint URL
Langfuse
LANGFUSE_PUBLIC_KEY(required) - Your Langfuse public API keyLANGFUSE_SECRET_KEY(required) - Your Langfuse secret API keyLANGFUSE_HOST(optional) - Custom Langfuse host URL (defaults to cloud.langfuse.com)
.env File Support
shepherd-mcp automatically loads .env files from the current directory or any parent directory. This means if you have a .env file in your project root:
It will be automatically loaded when the MCP server starts.
Claude Desktop
Add to your claude_desktop_config.json:
Cursor
Add to your .cursor/mcp.json:
Or if installed via pip:
Available Tools
AIOBS (Shepherd) Tools
aiobs_list_sessions
List all AI agent sessions from Shepherd.
Parameters:
limit(optional): Maximum number of sessions to return
Example prompt:
"List my recent AI agent sessions from AIOBS"
aiobs_get_session
Get detailed information about a specific session including the full trace tree, LLM calls, function events, and evaluations.
Parameters:
session_id(required): The UUID of the session to retrieve
Example prompt:
"Get AIOBS session details for abc123-def456"
aiobs_search_sessions
Search and filter sessions with multiple criteria.
Parameters:
query(optional): Text search (matches name, ID, labels, metadata)labels(optional): Filter by labels as key-value pairsprovider(optional): Filter by LLM provider (e.g., 'openai', 'anthropic')model(optional): Filter by model name (e.g., 'gpt-4o-mini', 'claude-3')function(optional): Filter by function nameafter(optional): Sessions started after date (YYYY-MM-DD)before(optional): Sessions started before date (YYYY-MM-DD)has_errors(optional): Only return sessions with errorsevals_failed(optional): Only return sessions with failed evaluationslimit(optional): Maximum number of sessions to return
Example prompts:
"Find all AIOBS sessions that used OpenAI with errors" "Search for sessions from yesterday that failed evaluations"
aiobs_diff_sessions
Compare two sessions and show their differences including:
Metadata: Duration, labels, timestamps
LLM calls: Count, tokens (input/output/total), average latency, errors
Provider/Model distribution: Which providers and models were used
Function events: Total calls, unique functions, function-specific counts
Trace structure: Trace depth, root nodes
Evaluations: Pass/fail counts and rates
System prompts: Compare system prompts across sessions
Request parameters: Temperature, max_tokens, tools used
Response content: Content length, tool calls, stop reasons
Parameters:
session_id_1(required): First session UUID to comparesession_id_2(required): Second session UUID to compare
Example prompt:
"Compare AIOBS sessions abc123 and def456"
Langfuse Tools
langfuse_list_traces
List traces with pagination and filters. Traces represent complete workflows or conversations.
Parameters:
limit(optional): Maximum results per page (default: 50)page(optional): Page number (1-indexed)user_id(optional): Filter by user IDname(optional): Filter by trace namesession_id(optional): Filter by session IDtags(optional): Filter by tagsfrom_timestamp(optional): Filter after timestampto_timestamp(optional): Filter before timestamp
Example prompt:
"List the last 20 Langfuse traces"
langfuse_get_trace
Get a specific trace with its observations (generations, spans, events).
Parameters:
trace_id(required): The trace ID to fetch
Example prompt:
"Get Langfuse trace details for trace-id-123"
langfuse_list_sessions
List sessions with pagination. Sessions group related traces together.
Parameters:
limit(optional): Maximum results per pagepage(optional): Page numberfrom_timestamp(optional): Filter after timestampto_timestamp(optional): Filter before timestamp
Example prompt:
"Show me Langfuse sessions from the last week"
langfuse_get_session
Get a specific session with its metrics and traces.
Parameters:
session_id(required): The session ID to fetch
Example prompt:
"Get Langfuse session details for session-123"
langfuse_list_observations
List observations (generations, spans, events) with filters.
Parameters:
limit(optional): Maximum results per pagepage(optional): Page numbername(optional): Filter by observation nameuser_id(optional): Filter by user IDtrace_id(optional): Filter by trace IDtype(optional): Filter by type (GENERATION, SPAN, EVENT)from_timestamp(optional): Filter after timestampto_timestamp(optional): Filter before timestamp
Example prompt:
"List all GENERATION type observations from Langfuse"
langfuse_get_observation
Get a specific observation with full details including input, output, usage, and costs.
Parameters:
observation_id(required): The observation ID to fetch
Example prompt:
"Get details for Langfuse observation obs-123"
langfuse_list_scores
List scores/evaluations with filters.
Parameters:
limit(optional): Maximum results per pagepage(optional): Page numbername(optional): Filter by score nameuser_id(optional): Filter by user IDtrace_id(optional): Filter by trace IDfrom_timestamp(optional): Filter after timestampto_timestamp(optional): Filter before timestamp
Example prompt:
"Show me Langfuse scores for trace trace-123"
langfuse_get_score
Get a specific score/evaluation with full details.
Parameters:
score_id(required): The score ID to fetch
Example prompt:
"Get Langfuse score details for score-123"
Legacy Tools (Deprecated)
For backwards compatibility, the following tools are still available but will be removed in a future version:
list_sessionsβ Useaiobs_list_sessionsget_sessionβ Useaiobs_get_sessionsearch_sessionsβ Useaiobs_search_sessionsdiff_sessionsβ Useaiobs_diff_sessions
Use Cases
1. Debugging Failed Runs
"Show me all AIOBS sessions that had errors in the last 24 hours"
2. Performance Analysis
"Compare AIOBS session abc123 with session def456 and tell me which one was more efficient"
3. Prompt Regression Detection
"Find Langfuse traces with failed evaluations"
4. Cost Tracking
"List Langfuse observations and summarize the total cost"
5. Session Inspection
"Get the full trace tree for the most recent Langfuse trace and explain what happened"
6. Cross-Provider Analysis
"Show me both AIOBS sessions and Langfuse traces from today"
Development
Setup
Running Tests
Running Locally
Publishing to PyPI
Releases are automatically published to PyPI via GitHub Actions when a release is created.
To publish manually:
Architecture
License
MIT