# MCP Server API Documentation
**Version:** 0.1.0
**Base URL:** `http://localhost:8000` (or your server URL)
**API Type:** RESTful with Server-Sent Events (SSE) for streaming
---
## Table of Contents
1. [Authentication](#authentication)
2. [Endpoints](#endpoints)
- [Health Checks](#health-checks)
- [Manifest](#get-server-manifest)
- [Execute (Simplified)](#execute-simplified)
- [Stream Results](#stream-results)
- [Cancel Call](#cancel-call)
- [Cancel All Calls](#cancel-all-calls)
3. [Request/Response Examples](#requestresponse-examples)
4. [Error Handling](#error-handling)
5. [Best Practices](#best-practices)
6. [Rate Limits & Quotas](#rate-limits--quotas)
---
## Authentication
All MCP endpoints (except health checks) require **Bearer Token** authentication.
### Header Format
```
Authorization: Bearer <your-token>
```
### Getting Your Token
The server token is configured via the `MCP_SERVER_TOKEN` environment variable.
**Default (POC):** `super-secret-token`
**Production:** Use a secure, randomly generated token.
### Example
```bash
curl -H "Authorization: Bearer super-secret-token" \
http://localhost:8000/mcp/manifest
```
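On the client side, the token can be read from the same `MCP_SERVER_TOKEN` environment variable used to configure the server. A minimal sketch (the POC default is used only as a fallback):

```python
import os
from typing import Optional

def auth_headers(token: Optional[str] = None) -> dict:
    """Build the Authorization header expected by the MCP endpoints."""
    # Fall back to the environment variable the server itself is configured
    # with (the POC default shown above).
    token = token or os.environ.get("MCP_SERVER_TOKEN", "super-secret-token")
    return {"Authorization": f"Bearer {token}"}
```

Pass the resulting dict as `headers=` to `requests` or any other HTTP client.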
---
## Endpoints
### Health Checks
These endpoints do **not** require authentication.
#### GET `/healthz`
Liveness probe - checks if server is running.
**Request:**
```bash
curl http://localhost:8000/healthz
```
**Response:**
```json
{
"status": "ok"
}
```
**Status Codes:**
- `200 OK`: Server is alive
---
#### GET `/readyz`
Readiness probe - checks if server is ready to accept requests.
**Request:**
```bash
curl http://localhost:8000/readyz
```
**Response (Ready):**
```json
{
"ready": true
}
```
**Response (Not Ready):**
```json
{
"ready": false,
"reasons": ["missing OPENAI_API_KEY"]
}
```
**Status Codes:**
- `200 OK`: Server is ready
- `503 Service Unavailable`: Server not ready (missing configuration)
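A client can gate its startup on this probe. A small sketch; the `get_json` callable is injected (in practice it would wrap `requests.get(f"{base_url}/readyz").json()`), which also keeps the loop easy to test:

```python
import time

def wait_until_ready(get_json, attempts=5, delay=1.0):
    """Poll the /readyz body until it reports ready, or give up.

    get_json() should return the parsed JSON body of GET /readyz.
    Returns True if the server became ready within the allotted attempts.
    """
    for attempt in range(attempts):
        body = get_json()
        if body.get("ready"):
            return True
        # Surface the reasons the server reported, then back off and retry.
        print("not ready:", body.get("reasons", []))
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```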
---
### Get Server Manifest
#### GET `/mcp/manifest`
Get the list of available tools and their capabilities.
**Authentication:** Required
**Request:**
```bash
curl -H "Authorization: Bearer super-secret-token" \
http://localhost:8000/mcp/manifest
```
**Response:**
```json
{
"server_name": "medx-mcp-server",
"version": "0.1",
"role": "AI-powered clinical agentic platform featuring our MedX-powered AI Agents and HealthOS, delivering advanced diagnostic support and personalized healthcare.",
"description": "AI-powered clinical agentic platform featuring our MedX-powered AI Agents and HealthOS, delivering advanced diagnostic support and personalized healthcare.",
"capabilities": [
"Advanced diagnostic support",
"Personalized healthcare recommendations",
"Clinical decision support",
"AI-powered medical consultations"
],
"tools": [
{
"id": "openai_chat",
"name": "openai_chat",
"description": "Call OpenAI chat models (gpt-4o-mini default).",
"inputs": {
"messages": "array of {role, content}",
"model": "string",
"max_tokens": "int"
}
}
]
}
```
**Status Codes:**
- `200 OK`: Manifest returned successfully
- `401 Unauthorized`: Missing or malformed `Authorization` header
- `403 Forbidden`: Token does not match the configured server token
---
### Execute (Simplified)
#### POST `/mcp/execute`
Execute the default tool asynchronously. Returns immediately with a `call_id` that can be used to stream results.
**Authentication:** Required
**Request Body (Simplified):**
```json
{
"messages": [
{"role": "user", "content": "What are symptoms of anemia?"}
],
"session_id": "patient-session-123",
"request_id": "unique-request-id-456"
}
```
**Request Fields (Simplified):**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `messages` | array | ✅ Yes | Array of message objects with `role` and `content` |
| `session_id` | string | ❌ No | Session identifier for conversation tracking |
| `request_id` | string | ❌ No | Unique request ID for idempotency (recommended) |
| `metadata` | object | ❌ No | Additional metadata (stored but not processed) |
Notes:
- The server always uses tool `openai_chat` and model `gpt-4o-mini`.
- If no `system` message is provided, the server injects a default Jivi AI system prompt.
**Message Object Format** (`role` must be one of `system`, `user`, or `assistant`):
```json
{
  "role": "system | user | assistant",
  "content": "Message text"
}
```
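The system-prompt note above can be mirrored client-side if you want the prompt to be explicit in your own logs. A sketch; the placeholder prompt text is illustrative, not the server's actual default:

```python
# Placeholder text, not the server's actual default prompt.
DEFAULT_SYSTEM_PROMPT = "You are a careful medical assistant."

def build_messages(user_text, history=None, system=None):
    """Assemble a messages array in the {role, content} format shown above."""
    messages = list(history or [])
    # Prepend a system message only if the history does not already carry one.
    if system is not None and not any(m["role"] == "system" for m in messages):
        messages.insert(0, {"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})
    return messages
```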
**Request Example (Simplified):**
```bash
curl -X POST \
-H "Authorization: Bearer super-secret-token" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Hello, what is anemia?"}
],
"session_id": "patient-123",
"request_id": "req-456"
}' \
http://localhost:8000/mcp/execute
```
**Response:**
```json
{
"call_id": "03431c1a-1522-451c-9a28-1926439ae1b4",
"status": "started"
}
```
**Response Fields:**
| Field | Type | Description |
|-------|------|-------------|
| `call_id` | string | Unique identifier for this call (use for streaming) |
| `status` | string | Current status, e.g. `"started"` (an idempotent replay returns the existing call's current status) |
**Status Codes:**
- `200 OK`: Call started successfully
- `401 Unauthorized`: Missing or malformed `Authorization` header
- `403 Forbidden`: Token does not match the configured server token
- `404 Not Found`: Tool not found
- `422 Unprocessable Entity`: Invalid request format
**Idempotency:**
If you send the same `request_id` again, the server will return the existing `call_id` and status without creating a duplicate call. This is useful for:
- Handling network timeouts
- Preventing duplicate API charges
- Ensuring consistent responses
**Example with Idempotency:**
```bash
# First request
POST /mcp/execute {"request_id": "req-123", ...}
→ Returns: {"call_id": "call-456", "status": "started"}
# Retry with same request_id
POST /mcp/execute {"request_id": "req-123", ...}
→ Returns: {"call_id": "call-456", "status": "finished"} # Same call!
```
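The retry pattern above can be sketched with the HTTP call injected (in practice `post` would wrap `requests.post` against `/mcp/execute`); the key point is that `request_id` is fixed once, before the retry loop starts:

```python
import uuid

def execute_idempotent(post, payload, retries=3):
    """POST the same payload (same request_id) until one attempt succeeds.

    post(payload) should return the parsed JSON response or raise on a
    transient failure. Because request_id is set once, a retry that lands
    after a "lost" success returns the original call_id, not a duplicate.
    """
    payload = dict(payload)
    payload.setdefault("request_id", uuid.uuid4().hex)
    last_error = None
    for _ in range(retries):
        try:
            return post(payload)
        except Exception as exc:  # real code would catch requests.RequestException
            last_error = exc
    raise last_error
```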
---
### Stream Results
#### GET `/mcp/stream/{call_id}`
Stream events for a call using Server-Sent Events (SSE). This endpoint provides real-time updates as the tool executes.
**Authentication:** Required
**Path Parameters:**
- `call_id` (string, required): The call ID returned from `/mcp/execute`
**Request:**
```bash
curl -N -H "Authorization: Bearer super-secret-token" \
http://localhost:8000/mcp/stream/03431c1a-1522-451c-9a28-1926439ae1b4
```
**Response Format (SSE):**
```
event: partial
data: {"type": "partial", "text": "Anemia"}
event: partial
data: {"type": "partial", "text": " is"}
event: partial
data: {"type": "partial", "text": " a"}
event: final
data: {"type": "final", "text": "Anemia is a condition..."}
```
**Event Types:**
| Event Type | Description |
|------------|-------------|
| `partial` | Incremental token/chunk of the response |
| `final` | Complete response text (all partial chunks combined) |
| `error` | Error occurred during execution |
| `cancelled` | Call was cancelled |
**Event Data Structure** (schema; `type` is one of `partial`, `final`, `error`, or `cancelled`):
```json
{
  "type": "partial | final | error | cancelled",
  "text": "Token or full text (partial/final events)",
  "message": "Error or cancellation message (error/cancelled events)"
}
```
**Streaming Behavior:**
- Stream remains open until a `final`, `error`, or `cancelled` event is received
- Timeout: 5 minutes (300 seconds) of inactivity
- Multiple clients can stream the same `call_id` simultaneously
- Tokens arrive in real-time as they are generated
**JavaScript Example:**
```javascript
// NOTE: the browser's native EventSource does not accept custom headers.
// This example assumes a polyfill such as the `eventsource` npm package,
// which supports a { headers } option; in the browser you would need a
// fetch-based SSE reader instead.
const eventSource = new EventSource(
  'http://localhost:8000/mcp/stream/03431c1a-1522-451c-9a28-1926439ae1b4',
  {
    headers: {
      'Authorization': 'Bearer super-secret-token'
    }
  }
);
let fullText = '';
eventSource.addEventListener('partial', (event) => {
const data = JSON.parse(event.data);
fullText += data.text;
console.log('Partial:', data.text);
});
eventSource.addEventListener('final', (event) => {
const data = JSON.parse(event.data);
fullText = data.text; // Complete text
console.log('Complete:', fullText);
eventSource.close();
});
eventSource.addEventListener('error', (event) => {
console.error('Error:', event.data);
eventSource.close();
});
```
**Python Example:**
```python
import requests
import json
url = "http://localhost:8000/mcp/stream/03431c1a-1522-451c-9a28-1926439ae1b4"
headers = {"Authorization": "Bearer super-secret-token"}
with requests.get(url, headers=headers, stream=True) as response:
for line in response.iter_lines():
if line:
# SSE format: "data: {...}"
if line.startswith(b"data: "):
data = json.loads(line[6:]) # Remove "data: " prefix
if data["type"] == "partial":
print(data["text"], end="", flush=True)
elif data["type"] == "final":
print(f"\n\nComplete: {data['text']}")
break
```
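The example above keys off the `type` field inside the data payload and ignores `event:` lines entirely. If you also want the SSE event name, a small line parser can pair the two up; a sketch:

```python
import json

def parse_sse(lines):
    """Yield (event_name, data) pairs from raw SSE lines (bytes or str).

    SSE frames are separated by blank lines; an "event:" line names the
    frame and the following "data:" line carries the JSON payload.
    """
    event_name = "message"  # SSE default when no event: line is present
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        if line.startswith("event: "):
            event_name = line[len("event: "):].strip()
        elif line.startswith("data: "):
            yield event_name, json.loads(line[len("data: "):])
            event_name = "message"  # reset for the next frame
```

Usage: `for name, data in parse_sse(response.iter_lines()): ...`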
**Status Codes:**
- `200 OK`: Stream established
- `401 Unauthorized`: Missing or malformed `Authorization` header
- `403 Forbidden`: Token does not match the configured server token
- `404 Not Found`: Call ID not found
---
### Cancel Call
#### POST `/mcp/cancel/{call_id}`
Cancel a running or pending call.
**Authentication:** Required
**Path Parameters:**
- `call_id` (string, required): The call ID to cancel
**Request:**
```bash
curl -X POST \
-H "Authorization: Bearer super-secret-token" \
http://localhost:8000/mcp/cancel/03431c1a-1522-451c-9a28-1926439ae1b4
```
**Response:**
```json
{
"status": "cancelled"
}
```
**Behavior:**
- Marks call as cancelled in registry
- Sends cancellation signal to background task
- Pushes `cancelled` event to stream queue
- Records cancellation in session buffer (if session_id provided)
**Status Codes:**
- `200 OK`: Cancellation requested
- `401 Unauthorized`: Missing or malformed `Authorization` header
- `403 Forbidden`: Token does not match the configured server token
- `404 Not Found`: Call ID not found
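A common use is a client-side deadline: start a timer when the call begins and cancel it server-side if no final event arrives in time. A sketch with the cancel call injected (in practice it would POST to `/mcp/cancel/{call_id}`):

```python
import threading

def cancel_after(cancel, call_id, seconds):
    """Schedule cancel(call_id) after a deadline; returns the timer.

    Call timer.cancel() as soon as the stream delivers its final event,
    so a completed call is not needlessly cancelled.
    """
    timer = threading.Timer(seconds, cancel, args=[call_id])
    timer.daemon = True
    timer.start()
    return timer
```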
---
### Cancel All Calls
#### POST `/mcp/cancel_all`
Cancel all active calls. Useful for emergency shutdowns or cleanup.
**Authentication:** Required
**Request:**
```bash
curl -X POST \
-H "Authorization: Bearer super-secret-token" \
http://localhost:8000/mcp/cancel_all
```
**Response:**
```json
{
"status": "cancelled",
"count": 3,
"call_ids": [
"call-123",
"call-456",
"call-789"
]
}
```
**Response Fields:**
| Field | Type | Description |
|-------|------|-------------|
| `status` | string | Always "cancelled" |
| `count` | integer | Number of calls cancelled |
| `call_ids` | array | List of cancelled call IDs |
**Status Codes:**
- `200 OK`: Cancellation completed
- `401 Unauthorized`: Missing or malformed `Authorization` header
- `403 Forbidden`: Token does not match the configured server token
---
## Request/Response Examples
### Complete Flow: Medical Query
#### Step 1: Execute Tool
```bash
curl -X POST \
-H "Authorization: Bearer super-secret-token" \
-H "Content-Type: application/json" \
-d '{
"tool": "openai_chat",
"input": {
"messages": [
{
"role": "system",
"content": "You are a careful medical assistant. Provide general information only."
},
{
"role": "user",
"content": "What are common symptoms of iron deficiency anemia?"
}
],
"model": "gpt-4o-mini",
"max_tokens": 512
},
"session_id": "patient-conversation-1",
"request_id": "medical-query-001"
}' \
http://localhost:8000/mcp/execute
```
**Response:**
```json
{
"call_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "started"
}
```
#### Step 2: Stream Results
```bash
curl -N -H "Authorization: Bearer super-secret-token" \
http://localhost:8000/mcp/stream/a1b2c3d4-e5f6-7890-abcd-ef1234567890
```
**Stream Output:**
```
event: partial
data: {"type": "partial", "text": "Common"}
event: partial
data: {"type": "partial", "text": " symptoms"}
event: partial
data: {"type": "partial", "text": " of"}
... (many more partial events) ...
event: final
data: {"type": "final", "text": "Common symptoms of iron deficiency anemia include:\n\n1. Fatigue\n2. Weakness\n3. Pale skin\n..."}
```
#### Step 3: Conversational Follow-up (Same Session)
```bash
curl -X POST \
-H "Authorization: Bearer super-secret-token" \
-H "Content-Type: application/json" \
-d '{
"tool": "openai_chat",
"input": {
"messages": [
{"role": "system", "content": "You are a careful medical assistant."},
{"role": "user", "content": "What are common symptoms of iron deficiency anemia?"},
{"role": "assistant", "content": "Common symptoms of iron deficiency anemia include:\n\n1. Fatigue\n2. Weakness\n3. Pale skin\n..."},
{"role": "user", "content": "What tests should I ask my doctor about?"}
],
"max_tokens": 400
},
"session_id": "patient-conversation-1",
"request_id": "medical-query-002"
}' \
http://localhost:8000/mcp/execute
```
---
## Error Handling
### Error Response Format
All errors follow a consistent format:
```json
{
"error": {
"message": "Error description",
"error_code": "ERROR_CODE",
"type": "ExceptionClassName"
}
}
```
### Error Codes
| Error Code | Status Code | Description |
|------------|-------------|-------------|
| `AUTH_ERROR` | 401 | Missing or invalid authentication |
| `AUTHZ_ERROR` | 403 | Invalid token |
| `TOOL_NOT_FOUND` | 404 | Requested tool does not exist |
| `CALL_NOT_FOUND` | 404 | Call ID not found |
| `VALIDATION_ERROR` | 422 | Invalid request format |
| `OPENAI_ERROR` | 502 | OpenAI API error |
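The table above maps naturally onto a small client-side dispatch. A sketch that turns the error envelope into a typed exception (the `RETRYABLE` set is a judgment call, not server-defined):

```python
class MCPError(Exception):
    """Typed wrapper for the server's error envelope."""

    def __init__(self, message, error_code, status=None):
        super().__init__(message)
        self.error_code = error_code
        self.status = status

# Only upstream failures are worth retrying; auth and validation errors are not.
RETRYABLE = {"OPENAI_ERROR"}

def raise_for_error(body, status=None):
    """Raise MCPError if the response body carries the error envelope."""
    err = body.get("error")
    if err:
        raise MCPError(err.get("message", ""), err.get("error_code", ""), status)
```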
### Common Errors
#### 401 Unauthorized
```json
{
"error": {
"message": "Missing or malformed Authorization header",
"error_code": "AUTH_ERROR",
"type": "AuthenticationError"
}
}
```
**Solution:** Include `Authorization: Bearer <token>` header.
#### 404 Tool Not Found
```json
{
"error": {
"message": "Tool 'invalid_tool' not found",
"error_code": "TOOL_NOT_FOUND",
"type": "ToolNotFoundError"
}
}
```
**Solution:** Check `/mcp/manifest` for available tools.
#### 404 Call Not Found
```json
{
"error": {
"message": "Call 'invalid-call-id' not found",
"error_code": "CALL_NOT_FOUND",
"type": "CallNotFoundError"
}
}
```
**Solution:** Use a valid `call_id` from `/mcp/execute` response.
#### Stream Errors
During streaming, errors are sent as SSE events:
```
event: error
data: {"type": "error", "message": "OpenAI API error: Rate limit exceeded"}
```
---
## Best Practices
### 1. Always Use `request_id` for Idempotency
```json
{
"request_id": "unique-client-request-id-12345"
}
```
**Benefits:**
- Prevents duplicate charges on retries
- Ensures consistent responses
- Handles network failures gracefully
**Generation Tips:**
- Use UUID: `uuid.uuid4().hex`
- Include timestamp: `f"{timestamp}-{unique_id}"`
- Include session: `f"{session_id}-{request_number}"`
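The tips above combined into one helper; a sketch:

```python
import time
import uuid

def make_request_id(session_id=None, request_number=None):
    """Build a request_id: deterministic per session turn when possible,
    otherwise a timestamped UUID."""
    if session_id is not None and request_number is not None:
        # Same session + turn always maps to the same id, so retries dedupe.
        return f"{session_id}-{request_number}"
    return f"{int(time.time())}-{uuid.uuid4().hex}"
```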
### 2. Use `session_id` for Conversation Tracking
```json
{
"session_id": "user-123-conversation-1"
}
```
**Benefits:**
- Track conversation history
- Enable context-aware responses
- Support multi-turn conversations
### 3. Handle Streaming Properly
**Do:**
- Keep connection open until `final` or `error` event
- Accumulate `partial` events to build complete text
- Handle timeouts gracefully
- Close connection after `final` event
**Don't:**
- Close connection on first `partial` event
- Ignore `final` event (it contains complete text)
- Assume streaming will never timeout
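These rules can be captured in a small accumulator that consumes the event dicts from the stream; a sketch:

```python
class StreamAccumulator:
    """Apply the do/don't rules above to a sequence of stream events."""

    def __init__(self):
        self.text = ""
        self.done = False

    def feed(self, event):
        """Consume one event dict; returns True once the stream is finished."""
        etype = event.get("type")
        if etype == "partial":
            self.text += event.get("text", "")
        elif etype == "final":
            # Trust the final event: it carries the complete text.
            self.text = event.get("text", self.text)
            self.done = True
        elif etype in ("error", "cancelled"):
            self.done = True
            raise RuntimeError(event.get("message", etype))
        return self.done
```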
### 4. Error Handling Strategy
```python
import json
import requests

def run_query(execute_url, stream_url, payload, auth_headers):
    try:
        # Execute call
        response = requests.post(execute_url, json=payload, headers=auth_headers)
        response.raise_for_status()
        call_id = response.json()["call_id"]
        # Stream results
        stream_response = requests.get(stream_url.format(call_id=call_id),
                                       headers=auth_headers, stream=True)
        for line in stream_response.iter_lines():
            if line.startswith(b"data: "):
                data = json.loads(line[6:])
                if data["type"] == "error":
                    raise Exception(f"Stream error: {data['message']}")
                elif data["type"] == "final":
                    return data["text"]
    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 401:
            raise Exception("Authentication failed: check your token") from e
        elif e.response.status_code == 404:
            raise Exception("Not found: check the tool name or call_id") from e
        raise
```
### 5. Implement Retry Logic
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def create_session_with_retries():
session = requests.Session()
retry_strategy = Retry(
total=3,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504]
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("http://", adapter)
session.mount("https://", adapter)
return session
```
**Important:** Use same `request_id` on retries for idempotency!
### 6. Monitor Health Endpoints
Before making API calls, check server health:
```python
import requests

health_response = requests.get(f"{base_url}/healthz")
ready_response = requests.get(f"{base_url}/readyz")

if ready_response.json().get("ready"):
    print("Server is ready, proceed with requests")
else:
    print("Server not ready:", ready_response.json().get("reasons", []))
```
### 7. Set Appropriate Timeouts
```python
# For execute endpoint (should return quickly)
execute_response = requests.post(
execute_url,
json=payload,
headers=auth_headers,
timeout=10 # 10 seconds
)
# For stream endpoint (long-running)
stream_response = requests.get(
stream_url,
headers=auth_headers,
stream=True,
timeout=300 # 5 minutes (matches server timeout)
)
```
---
## Rate Limits & Quotas
**Current Implementation:**
- No rate limiting enforced (POC)
- Server processes requests concurrently
- Limited by OpenAI API rate limits
**Production Considerations:**
- Implement per-client rate limiting
- Set quotas for API usage
- Monitor and log all requests
- Consider request queuing for high load
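As one illustration of per-client rate limiting (not part of the current server), a token bucket caps the sustained request rate while still allowing short bursts; the clock is injected so the sketch is deterministic to test:

```python
class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # e.g. time.monotonic in production
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```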
---
## Complete Client Implementation Example
### Python Client
```python
import requests
import json
import uuid
from typing import Optional, Iterator
class MCPClient:
def __init__(self, base_url: str, token: str):
self.base_url = base_url.rstrip('/')
self.token = token
self.headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json"
}
def execute(
self,
tool: str,
messages: list,
session_id: Optional[str] = None,
request_id: Optional[str] = None,
model: Optional[str] = None,
max_tokens: int = 512
) -> dict:
"""Execute a tool and return call_id."""
payload = {
"tool": tool,
"input": {
"messages": messages,
"max_tokens": max_tokens
}
}
if model:
payload["input"]["model"] = model
if session_id:
payload["session_id"] = session_id
if request_id:
payload["request_id"] = request_id
else:
payload["request_id"] = str(uuid.uuid4())
response = requests.post(
f"{self.base_url}/mcp/execute",
json=payload,
headers=self.headers,
timeout=10
)
response.raise_for_status()
return response.json()
def stream(self, call_id: str) -> Iterator[dict]:
"""Stream events for a call."""
url = f"{self.base_url}/mcp/stream/{call_id}"
response = requests.get(url, headers=self.headers, stream=True, timeout=300)
response.raise_for_status()
        for line in response.iter_lines():
            # Each data payload carries a "type" field that mirrors the SSE
            # event name, so we can key off "data:" lines alone.
            if line.startswith(b"data: "):
                yield json.loads(line[6:])
def execute_and_stream(
self,
messages: list,
session_id: Optional[str] = None,
**kwargs
) -> Iterator[str]:
"""Execute and stream in one call."""
result = self.execute("openai_chat", messages, session_id, **kwargs)
call_id = result["call_id"]
full_text = ""
for event in self.stream(call_id):
if event["type"] == "partial":
full_text += event["text"]
yield event["text"]
elif event["type"] == "final":
yield event["text"]
break
elif event["type"] == "error":
raise Exception(f"Error: {event.get('message', 'Unknown error')}")
def cancel(self, call_id: str) -> dict:
"""Cancel a call."""
response = requests.post(
f"{self.base_url}/mcp/cancel/{call_id}",
headers=self.headers,
timeout=5
)
response.raise_for_status()
return response.json()
def health_check(self) -> dict:
"""Check server health."""
response = requests.get(f"{self.base_url}/healthz", timeout=5)
response.raise_for_status()
return response.json()
def ready_check(self) -> dict:
"""Check server readiness."""
response = requests.get(f"{self.base_url}/readyz", timeout=5)
return response.json()
# Usage Example
if __name__ == "__main__":
client = MCPClient("http://localhost:8000", "super-secret-token")
# Check server
print("Health:", client.health_check())
print("Ready:", client.ready_check())
# Execute and stream
messages = [
{"role": "user", "content": "What is anemia?"}
]
print("\nStreaming response:")
for chunk in client.execute_and_stream(messages, session_id="demo-1"):
print(chunk, end="", flush=True)
print("\n")
```
### JavaScript/TypeScript Client
```typescript
class MCPClient {
constructor(private baseUrl: string, private token: string) {}
async execute(
tool: string,
input: any,
sessionId?: string,
requestId?: string
): Promise<{ call_id: string; status: string }> {
const response = await fetch(`${this.baseUrl}/mcp/execute`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.token}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
tool,
input,
session_id: sessionId,
request_id: requestId || crypto.randomUUID()
})
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${await response.text()}`);
}
return response.json();
}
stream(callId: string): ReadableStream<string> {
const url = `${this.baseUrl}/mcp/stream/${callId}`;
return new ReadableStream({
async start(controller) {
const eventSource = new EventSource(url, {
withCredentials: false
});
// Note: Browser EventSource doesn't support custom headers
// You may need to pass token as query param or use fetch API
eventSource.addEventListener('partial', (event: any) => {
const data = JSON.parse(event.data);
controller.enqueue(data.text);
});
eventSource.addEventListener('final', (event: any) => {
const data = JSON.parse(event.data);
controller.enqueue(data.text);
controller.close();
eventSource.close();
});
      eventSource.addEventListener('error', (event: any) => {
        // Native error events carry no payload; fall back to a generic message.
        controller.error(new Error(event.data ?? 'stream error'));
        eventSource.close();
      });
}
});
}
async cancel(callId: string): Promise<void> {
await fetch(`${this.baseUrl}/mcp/cancel/${callId}`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.token}`
}
});
}
}
// Usage (Node-style; top-level await requires an ES module context)
const client = new MCPClient('http://localhost:8000', 'super-secret-token');
const result = await client.execute('openai_chat', {
messages: [{ role: 'user', content: 'Hello!' }]
});
const stream = client.stream(result.call_id);
const reader = stream.getReader();
while (true) {
const { done, value } = await reader.read();
if (done) break;
process.stdout.write(value);
}
```
---
## Support & Troubleshooting
### Common Issues
**1. "Missing auth" error**
- Ensure `Authorization` header is included
- Check token is correct
- Verify header format: `Bearer <token>` (space required)
**2. Stream times out**
- Check network connection
- Verify server is running
- Ensure call_id is valid and call hasn't completed
**3. Call not found**
- Verify call_id is from recent execute request
- Check if call completed (calls may be cleaned up)
- Ensure you're using correct call_id format
**4. Slow responses**
- Normal for long AI generations
- Check OpenAI API status
- Consider using smaller `max_tokens` for faster responses
### Debugging Tips
1. **Enable verbose logging** (client-side):
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
2. **Check server logs**:
- Logs are written to `logs/server.log`
- Monitor for errors or warnings
3. **Test with curl** first:
```bash
# Simple test
curl -H "Authorization: Bearer token" \
http://localhost:8000/mcp/manifest
```
4. **Validate request format**:
- Use JSON validator
- Check message format matches specification
- Ensure all required fields are present
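A lightweight pre-flight check of the simplified payload catches most 422s before they hit the wire; a sketch:

```python
VALID_ROLES = {"system", "user", "assistant"}

def validate_payload(payload):
    """Return a list of problems with a simplified /mcp/execute payload."""
    problems = []
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")
        return problems
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or not {"role", "content"} <= set(msg):
            problems.append(f"message {i} must have role and content")
        elif msg["role"] not in VALID_ROLES:
            problems.append(f"message {i} has invalid role {msg['role']!r}")
    return problems
```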
---
## Changelog
### Version 0.1.0
- Initial release
- Support for `openai_chat` tool
- Streaming via SSE
- Idempotency support
- Session tracking
- Cancellation support
---
## License & Terms
This API documentation is provided for the MCP Server POC.
For production use, consult your organization's API terms and conditions.