# Agentic Workbench Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Einen MCP-Server bauen, der als Tool-Orchestrator fungiert und Aufgaben mit hierarchischer Navigation an das richtige Tool delegiert.
**Architecture:** Drei Kernkomponenten (Catalog, Navigator, Executor) die zusammenarbeiten. Der Catalog lädt die Tool-Hierarchie aus dem Dateisystem. Der Navigator nutzt ein schnelles LLM (Cerebras/Gemini) für schrittweise Navigation. Der Executor verbindet sich zu MCP-Servern und führt Tools aus.
**Tech Stack:** Python 3.11+, MCP SDK, httpx, pydantic, typer, pytest
---
## Task 1: Catalog Models (Pydantic)
**Files:**
- Create: `src/agentic_workbench/models.py`
- Test: `tests/test_models.py`
**Step 1: Write the failing test**
```python
# tests/test_models.py
import pytest
from agentic_workbench.models import ToolParameter, ToolDefinition, ServiceDefinition, CategoryDefinition
def test_tool_parameter_required():
param = ToolParameter(
name="channel",
type="string",
required=True,
description="Channel name"
)
assert param.name == "channel"
assert param.required is True
def test_tool_definition():
tool = ToolDefinition(
name="send_message",
description="Send a message",
parameters=[
ToolParameter(name="text", type="string", required=True, description="Message text")
]
)
assert tool.name == "send_message"
assert len(tool.parameters) == 1
def test_service_definition():
service = ServiceDefinition(
name="slack",
description="Slack communication",
mcp_server="slack-mcp",
tools=[]
)
assert service.mcp_server == "slack-mcp"
def test_category_definition():
category = CategoryDefinition(
name="communication",
description="Messaging services",
services=[]
)
assert category.name == "communication"
```
**Step 2: Run test to verify it fails**
Run: `pytest tests/test_models.py -v`
Expected: FAIL with "ModuleNotFoundError: No module named 'agentic_workbench.models'"
**Step 3: Write minimal implementation**
```python
# src/agentic_workbench/models.py
"""Data models for Agentic Workbench."""
from pydantic import BaseModel
class ToolParameter(BaseModel):
"""A parameter for a tool."""
name: str
type: str
required: bool
description: str
class ToolDefinition(BaseModel):
"""A tool that can be executed."""
name: str
description: str
parameters: list[ToolParameter] = []
class ServiceDefinition(BaseModel):
"""A service containing multiple tools."""
name: str
description: str
mcp_server: str
tools: list[ToolDefinition] = []
class CategoryDefinition(BaseModel):
"""A category containing multiple services."""
name: str
description: str
services: list[ServiceDefinition] = []
class SelectedTool(BaseModel):
"""Result of navigation - the selected tool with parameters."""
category: str
service: str
tool: str
mcp_server: str
parameters: dict = {}
class ExecutionResult(BaseModel):
"""Result of tool execution."""
success: bool
tool_used: str
result: dict | None = None
error: dict | None = None
```
**Step 4: Run test to verify it passes**
Run: `pytest tests/test_models.py -v`
Expected: PASS (4 passed)
**Step 5: Commit**
```bash
git add src/agentic_workbench/models.py tests/test_models.py
git commit -m "feat: add pydantic models for catalog structure"
```
---
## Task 2: Catalog Loader
**Files:**
- Create: `src/agentic_workbench/catalog.py`
- Test: `tests/test_catalog.py`
**Step 1: Write the failing test**
```python
# tests/test_catalog.py
import pytest
from pathlib import Path
from agentic_workbench.catalog import CatalogLoader
@pytest.fixture
def catalog_path():
return Path(__file__).parent.parent / "catalog"
def test_load_categories(catalog_path):
loader = CatalogLoader(catalog_path)
categories = loader.list_categories()
assert "communication" in categories
assert "database" in categories
assert "files" in categories
def test_load_category_description(catalog_path):
loader = CatalogLoader(catalog_path)
category = loader.get_category("communication")
assert category.name == "communication"
assert "Messaging" in category.description or "Chat" in category.description
def test_load_services(catalog_path):
loader = CatalogLoader(catalog_path)
services = loader.list_services("communication")
assert "slack" in services
def test_load_service_definition(catalog_path):
loader = CatalogLoader(catalog_path)
service = loader.get_service("communication", "slack")
assert service.name == "slack"
assert service.mcp_server == "slack-mcp"
def test_load_tools(catalog_path):
loader = CatalogLoader(catalog_path)
tools = loader.list_tools("communication", "slack")
assert "send_message" in tools
def test_load_tool_definition(catalog_path):
loader = CatalogLoader(catalog_path)
tool = loader.get_tool("communication", "slack", "send_message")
assert tool.name == "send_message"
assert any(p.name == "channel" for p in tool.parameters)
assert any(p.name == "text" for p in tool.parameters)
```
**Step 2: Run test to verify it fails**
Run: `pytest tests/test_catalog.py -v`
Expected: FAIL with "ModuleNotFoundError: No module named 'agentic_workbench.catalog'"
**Step 3: Write minimal implementation**
```python
# src/agentic_workbench/catalog.py
"""Catalog loader for hierarchical tool structure."""
from pathlib import Path
import yaml
from .models import CategoryDefinition, ServiceDefinition, ToolDefinition, ToolParameter
class CatalogLoader:
"""Loads tool catalog from directory structure."""
def __init__(self, catalog_path: Path):
self.catalog_path = Path(catalog_path)
def list_categories(self) -> list[str]:
"""List all category names."""
return [
d.name
for d in self.catalog_path.iterdir()
if d.is_dir() and not d.name.startswith("_")
]
def get_category(self, name: str) -> CategoryDefinition:
"""Load a category definition."""
category_path = self.catalog_path / name
config_file = category_path / "_category.yaml"
if config_file.exists():
with open(config_file) as f:
data = yaml.safe_load(f)
else:
data = {"description": name}
return CategoryDefinition(
name=name,
description=data.get("description", name),
services=[],
)
def list_services(self, category: str) -> list[str]:
"""List all service names in a category."""
category_path = self.catalog_path / category
return [
d.name
for d in category_path.iterdir()
if d.is_dir() and not d.name.startswith("_")
]
def get_service(self, category: str, name: str) -> ServiceDefinition:
"""Load a service definition."""
service_path = self.catalog_path / category / name
config_file = service_path / "_service.yaml"
with open(config_file) as f:
data = yaml.safe_load(f)
return ServiceDefinition(
name=name,
description=data.get("description", name),
mcp_server=data["mcp_server"],
tools=[],
)
def list_tools(self, category: str, service: str) -> list[str]:
"""List all tool names in a service."""
service_path = self.catalog_path / category / service
return [
f.stem
for f in service_path.glob("*.yaml")
if not f.name.startswith("_")
]
def get_tool(self, category: str, service: str, name: str) -> ToolDefinition:
"""Load a tool definition."""
tool_file = self.catalog_path / category / service / f"{name}.yaml"
with open(tool_file) as f:
data = yaml.safe_load(f)
parameters = []
for param_name, param_data in data.get("parameters", {}).items():
parameters.append(
ToolParameter(
name=param_name,
type=param_data.get("type", "string"),
required=param_data.get("required", False),
description=param_data.get("description", ""),
)
)
return ToolDefinition(
name=name,
description=data.get("description", name),
parameters=parameters,
)
```
**Step 4: Run test to verify it passes**
Run: `pytest tests/test_catalog.py -v`
Expected: PASS (6 passed)
**Step 5: Commit**
```bash
git add src/agentic_workbench/catalog.py tests/test_catalog.py
git commit -m "feat: add catalog loader for hierarchical tool structure"
```
---
## Task 3: LLM Client Abstraction
**Files:**
- Create: `src/agentic_workbench/llm.py`
- Test: `tests/test_llm.py`
**Step 1: Write the failing test**
```python
# tests/test_llm.py
import pytest
from agentic_workbench.llm import LLMClient, LLMResponse
def test_llm_response_model():
response = LLMResponse(content="communication", raw={})
assert response.content == "communication"
def test_llm_client_abstract():
"""LLMClient should be abstract."""
with pytest.raises(TypeError):
LLMClient()
def test_mock_llm_client():
from agentic_workbench.llm import MockLLMClient
client = MockLLMClient(responses=["communication", "slack", "send_message"])
assert client.ask("prompt1") == "communication"
assert client.ask("prompt2") == "slack"
assert client.ask("prompt3") == "send_message"
```
**Step 2: Run test to verify it fails**
Run: `pytest tests/test_llm.py -v`
Expected: FAIL with "ModuleNotFoundError"
**Step 3: Write minimal implementation**
```python
# src/agentic_workbench/llm.py
"""LLM client abstraction for tool selection."""
from abc import ABC, abstractmethod
import httpx
from pydantic import BaseModel
class LLMResponse(BaseModel):
"""Response from LLM."""
content: str
raw: dict = {}
class LLMClient(ABC):
"""Abstract LLM client."""
@abstractmethod
def ask(self, prompt: str) -> str:
"""Ask the LLM a question and get a response."""
pass
class MockLLMClient(LLMClient):
"""Mock LLM client for testing."""
def __init__(self, responses: list[str]):
self.responses = responses
self.index = 0
def ask(self, prompt: str) -> str:
response = self.responses[self.index]
self.index += 1
return response
class CerebrasClient(LLMClient):
"""Cerebras LLM client."""
def __init__(self, api_key: str, model: str = "llama-3.3-70b"):
self.api_key = api_key
self.model = model
self.base_url = "https://api.cerebras.ai/v1"
def ask(self, prompt: str) -> str:
with httpx.Client() as client:
response = client.post(
f"{self.base_url}/chat/completions",
headers={"Authorization": f"Bearer {self.api_key}"},
json={
"model": self.model,
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 100,
},
timeout=30.0,
)
response.raise_for_status()
data = response.json()
return data["choices"][0]["message"]["content"].strip()
class GeminiClient(LLMClient):
"""Google Gemini LLM client."""
def __init__(self, api_key: str, model: str = "gemini-2.0-flash"):
self.api_key = api_key
self.model = model
self.base_url = "https://generativelanguage.googleapis.com/v1beta"
def ask(self, prompt: str) -> str:
with httpx.Client() as client:
response = client.post(
f"{self.base_url}/models/{self.model}:generateContent",
params={"key": self.api_key},
json={"contents": [{"parts": [{"text": prompt}]}]},
timeout=30.0,
)
response.raise_for_status()
data = response.json()
return data["candidates"][0]["content"]["parts"][0]["text"].strip()
def create_llm_client(provider: str, api_key: str, model: str | None = None) -> LLMClient:
"""Factory function to create LLM client."""
if provider == "cerebras":
return CerebrasClient(api_key, model or "llama-3.3-70b")
elif provider == "gemini":
return GeminiClient(api_key, model or "gemini-2.0-flash")
else:
raise ValueError(f"Unknown provider: {provider}")
```
**Step 4: Run test to verify it passes**
Run: `pytest tests/test_llm.py -v`
Expected: PASS (3 passed)
**Step 5: Commit**
```bash
git add src/agentic_workbench/llm.py tests/test_llm.py
git commit -m "feat: add LLM client abstraction with Cerebras and Gemini support"
```
---
## Task 4: Navigator Component
**Files:**
- Create: `src/agentic_workbench/navigator.py`
- Test: `tests/test_navigator.py`
**Step 1: Write the failing test**
```python
# tests/test_navigator.py
import pytest
from pathlib import Path
from agentic_workbench.navigator import Navigator
from agentic_workbench.catalog import CatalogLoader
from agentic_workbench.llm import MockLLMClient
@pytest.fixture
def catalog_path():
return Path(__file__).parent.parent / "catalog"
@pytest.fixture
def navigator(catalog_path):
catalog = CatalogLoader(catalog_path)
# Mock responses: category, service, tool, parameters JSON
llm = MockLLMClient([
"communication",
"slack",
"send_message",
'{"channel": "#general", "text": "Hello!"}'
])
return Navigator(catalog, llm)
def test_navigate_to_tool(navigator):
result = navigator.navigate("Send a Slack message to #general saying Hello!")
assert result.category == "communication"
assert result.service == "slack"
assert result.tool == "send_message"
assert result.mcp_server == "slack-mcp"
def test_navigate_extracts_parameters(navigator):
result = navigator.navigate("Send a Slack message to #general saying Hello!")
assert result.parameters["channel"] == "#general"
assert result.parameters["text"] == "Hello!"
def test_navigate_skips_single_option(catalog_path):
"""When only one option exists, skip LLM call."""
catalog = CatalogLoader(catalog_path)
# Only need: category, parameters (service and tool auto-selected if single)
llm = MockLLMClient([
"database",
'{"sql": "SELECT * FROM users"}'
])
navigator = Navigator(catalog, llm)
result = navigator.navigate("Run a PostgreSQL query")
# postgres is only service, query is only tool
assert result.category == "database"
assert result.service == "postgres"
assert result.tool == "query"
```
**Step 2: Run test to verify it fails**
Run: `pytest tests/test_navigator.py -v`
Expected: FAIL with "ModuleNotFoundError"
**Step 3: Write minimal implementation**
```python
# src/agentic_workbench/navigator.py
"""Navigator for hierarchical tool selection."""
import json
from .catalog import CatalogLoader
from .llm import LLMClient
from .models import SelectedTool
class Navigator:
"""Navigates through tool hierarchy using LLM."""
def __init__(self, catalog: CatalogLoader, llm: LLMClient):
self.catalog = catalog
self.llm = llm
def navigate(self, task: str) -> SelectedTool:
"""Navigate to the right tool for a task."""
# Step 1: Select category
category = self._select_category(task)
# Step 2: Select service (or auto-select if only one)
service = self._select_service(task, category)
# Step 3: Select tool (or auto-select if only one)
tool = self._select_tool(task, category, service)
# Step 4: Get MCP server
service_def = self.catalog.get_service(category, service)
mcp_server = service_def.mcp_server
# Step 5: Extract parameters
tool_def = self.catalog.get_tool(category, service, tool)
parameters = self._extract_parameters(task, tool_def)
return SelectedTool(
category=category,
service=service,
tool=tool,
mcp_server=mcp_server,
parameters=parameters,
)
def _select_category(self, task: str) -> str:
"""Select category using LLM."""
categories = self.catalog.list_categories()
if len(categories) == 1:
return categories[0]
options = []
for cat_name in categories:
cat = self.catalog.get_category(cat_name)
options.append(f"• {cat_name}: {cat.description}")
prompt = f"""Task: {task}
Which category fits best?
{chr(10).join(options)}
Reply with ONLY the category name, nothing else."""
response = self.llm.ask(prompt)
return response.strip().lower()
def _select_service(self, task: str, category: str) -> str:
"""Select service using LLM."""
services = self.catalog.list_services(category)
if len(services) == 1:
return services[0]
options = []
for svc_name in services:
svc = self.catalog.get_service(category, svc_name)
tools = self.catalog.list_tools(category, svc_name)
options.append(f"• {svc_name}: {svc.description} ({len(tools)} tools)")
prompt = f"""Task: {task}
Category: {category}
Which service fits best?
{chr(10).join(options)}
Reply with ONLY the service name, nothing else."""
response = self.llm.ask(prompt)
return response.strip().lower()
def _select_tool(self, task: str, category: str, service: str) -> str:
"""Select tool using LLM."""
tools = self.catalog.list_tools(category, service)
if len(tools) == 1:
return tools[0]
options = []
for tool_name in tools:
tool = self.catalog.get_tool(category, service, tool_name)
options.append(f"• {tool_name}: {tool.description}")
prompt = f"""Task: {task}
Service: {service}
Which tool fits best?
{chr(10).join(options)}
Reply with ONLY the tool name, nothing else."""
response = self.llm.ask(prompt)
return response.strip().lower()
def _extract_parameters(self, task: str, tool) -> dict:
"""Extract parameters from task using LLM."""
if not tool.parameters:
return {}
params_desc = []
for p in tool.parameters:
req = "(required)" if p.required else "(optional)"
params_desc.append(f"• {p.name}: {p.description} {req}")
prompt = f"""Task: {task}
Tool: {tool.name} - {tool.description}
Extract these parameters from the task:
{chr(10).join(params_desc)}
Reply with ONLY valid JSON, nothing else. Example: {{"param": "value"}}"""
response = self.llm.ask(prompt)
return json.loads(response.strip())
```
**Step 4: Run test to verify it passes**
Run: `pytest tests/test_navigator.py -v`
Expected: PASS (3 passed)
**Step 5: Commit**
```bash
git add src/agentic_workbench/navigator.py tests/test_navigator.py
git commit -m "feat: add navigator for hierarchical tool selection"
```
---
## Task 5: Executor Component (Stub)
**Files:**
- Create: `src/agentic_workbench/executor.py`
- Test: `tests/test_executor.py`
**Step 1: Write the failing test**
```python
# tests/test_executor.py
import pytest
from agentic_workbench.executor import Executor
from agentic_workbench.models import SelectedTool, ExecutionResult
@pytest.fixture
def executor():
mcp_configs = {
"mock-mcp": {
"command": "echo",
"args": ["mock"],
}
}
return Executor(mcp_configs)
def test_executor_init(executor):
assert executor.connections == {}
def test_execution_result_success():
result = ExecutionResult(
success=True,
tool_used="slack.send_message",
result={"ok": True, "ts": "123"}
)
assert result.success is True
assert result.error is None
def test_execution_result_failure():
result = ExecutionResult(
success=False,
tool_used="slack.send_message",
error={"type": "api_error", "message": "Channel not found"}
)
assert result.success is False
assert result.error["type"] == "api_error"
```
**Step 2: Run test to verify it fails**
Run: `pytest tests/test_executor.py -v`
Expected: FAIL with "ModuleNotFoundError"
**Step 3: Write minimal implementation**
```python
# src/agentic_workbench/executor.py
"""Executor for MCP tool calls."""
from .models import ExecutionResult, SelectedTool
class MCPConnection:
"""Connection to an MCP server (stub)."""
def __init__(self, command: str, args: list[str], env: dict | None = None):
self.command = command
self.args = args
self.env = env or {}
self.connected = False
async def connect(self):
"""Connect to the MCP server."""
# TODO: Implement actual MCP connection
self.connected = True
async def call_tool(self, name: str, arguments: dict) -> dict:
"""Call a tool on the MCP server."""
# TODO: Implement actual MCP tool call
raise NotImplementedError("MCP connection not yet implemented")
async def close(self):
"""Close the connection."""
self.connected = False
class Executor:
"""Executes tools on MCP servers."""
def __init__(self, mcp_configs: dict):
self.mcp_configs = mcp_configs
self.connections: dict[str, MCPConnection] = {}
async def execute(self, selected: SelectedTool) -> ExecutionResult:
"""Execute the selected tool."""
try:
connection = await self._get_connection(selected.mcp_server)
result = await connection.call_tool(
name=selected.tool,
arguments=selected.parameters,
)
return ExecutionResult(
success=True,
tool_used=f"{selected.service}.{selected.tool}",
result=result,
)
except Exception as e:
return ExecutionResult(
success=False,
tool_used=f"{selected.service}.{selected.tool}",
error={"type": "execution_error", "message": str(e)},
)
async def _get_connection(self, server_name: str) -> MCPConnection:
"""Get or create connection to MCP server."""
if server_name not in self.connections:
config = self.mcp_configs.get(server_name)
if not config:
raise ValueError(f"Unknown MCP server: {server_name}")
connection = MCPConnection(
command=config["command"],
args=config.get("args", []),
env=config.get("env"),
)
await connection.connect()
self.connections[server_name] = connection
return self.connections[server_name]
async def close_all(self):
"""Close all connections."""
for connection in self.connections.values():
await connection.close()
self.connections.clear()
```
**Step 4: Run test to verify it passes**
Run: `pytest tests/test_executor.py -v`
Expected: PASS (3 passed)
**Step 5: Commit**
```bash
git add src/agentic_workbench/executor.py tests/test_executor.py
git commit -m "feat: add executor stub for MCP tool calls"
```
---
## Task 6: Main Workbench Class
**Files:**
- Create: `src/agentic_workbench/workbench.py`
- Test: `tests/test_workbench.py`
**Step 1: Write the failing test**
```python
# tests/test_workbench.py
import pytest
from pathlib import Path
from agentic_workbench.workbench import Workbench
from agentic_workbench.llm import MockLLMClient
@pytest.fixture
def catalog_path():
return Path(__file__).parent.parent / "catalog"
@pytest.fixture
def workbench(catalog_path):
llm = MockLLMClient([
"communication",
"slack",
"send_message",
'{"channel": "#general", "text": "Hello!"}'
])
return Workbench(catalog_path, llm)
def test_workbench_init(workbench):
assert workbench.catalog is not None
assert workbench.navigator is not None
assert workbench.executor is not None
def test_workbench_navigate(workbench):
result = workbench.navigate("Send a Slack message to #general")
assert result.tool == "send_message"
assert result.mcp_server == "slack-mcp"
```
**Step 2: Run test to verify it fails**
Run: `pytest tests/test_workbench.py -v`
Expected: FAIL with "ModuleNotFoundError"
**Step 3: Write minimal implementation**
```python
# src/agentic_workbench/workbench.py
"""Main Workbench class orchestrating all components."""
from pathlib import Path
import yaml
from .catalog import CatalogLoader
from .executor import Executor
from .llm import LLMClient, create_llm_client
from .models import ExecutionResult, SelectedTool
from .navigator import Navigator
class Workbench:
"""Main orchestrator for Agentic Workbench."""
def __init__(self, catalog_path: Path, llm: LLMClient | None = None):
self.catalog_path = Path(catalog_path)
self.catalog = CatalogLoader(self.catalog_path)
# Load config
config_file = self.catalog_path / "_config.yaml"
if config_file.exists():
with open(config_file) as f:
self.config = yaml.safe_load(f)
else:
self.config = {}
# Initialize LLM
if llm:
self.llm = llm
else:
llm_config = self.config.get("llm", {})
self.llm = create_llm_client(
provider=llm_config.get("provider", "cerebras"),
api_key=llm_config.get("api_key", ""),
model=llm_config.get("model"),
)
# Initialize components
self.navigator = Navigator(self.catalog, self.llm)
self.executor = Executor(self.config.get("mcp_servers", {}))
def navigate(self, task: str) -> SelectedTool:
"""Navigate to the right tool for a task."""
return self.navigator.navigate(task)
async def execute(self, task: str) -> ExecutionResult:
"""Execute a task end-to-end."""
selected = self.navigate(task)
return await self.executor.execute(selected)
async def close(self):
"""Clean up resources."""
await self.executor.close_all()
```
**Step 4: Run test to verify it passes**
Run: `pytest tests/test_workbench.py -v`
Expected: PASS (2 passed)
**Step 5: Commit**
```bash
git add src/agentic_workbench/workbench.py tests/test_workbench.py
git commit -m "feat: add main Workbench class orchestrating components"
```
---
## Task 7: CLI Implementation
**Files:**
- Modify: `src/agentic_workbench/cli.py`
- Test: `tests/test_cli.py`
**Step 1: Write the failing test**
```python
# tests/test_cli.py
import pytest
from typer.testing import CliRunner
from agentic_workbench.cli import app
runner = CliRunner()
def test_cli_help():
result = runner.invoke(app, ["--help"])
assert result.exit_code == 0
assert "Agentic Workbench" in result.stdout
def test_cli_catalog_command():
result = runner.invoke(app, ["catalog"])
assert result.exit_code == 0
assert "communication" in result.stdout.lower() or "Tool Catalog" in result.stdout
```
**Step 2: Run test to verify it fails**
Run: `pytest tests/test_cli.py -v`
Expected: FAIL (catalog command not implemented)
**Step 3: Write implementation**
```python
# src/agentic_workbench/cli.py
"""CLI interface for Agentic Workbench."""
from pathlib import Path
import typer
from .catalog import CatalogLoader
app = typer.Typer(
name="awb",
help="Agentic Workbench - MCP-based tool orchestrator",
)
DEFAULT_CATALOG = Path.cwd() / "catalog"
@app.command()
def serve(
catalog: Path = typer.Option(DEFAULT_CATALOG, help="Path to catalog directory"),
):
"""Start the MCP server."""
typer.echo(f"Starting Agentic Workbench MCP server...")
typer.echo(f"Catalog: {catalog}")
# TODO: Implement MCP server
raise typer.Exit(1)
@app.command()
def run(
task: str = typer.Argument(..., help="Task to execute"),
catalog: Path = typer.Option(DEFAULT_CATALOG, help="Path to catalog directory"),
):
"""Execute a task directly."""
typer.echo(f"Executing task: {task}")
typer.echo(f"Catalog: {catalog}")
# TODO: Implement task execution
raise typer.Exit(1)
@app.command()
def catalog(
path: Path = typer.Option(DEFAULT_CATALOG, help="Path to catalog directory"),
):
"""Show the tool catalog."""
typer.echo("Tool Catalog:")
typer.echo("=" * 40)
try:
loader = CatalogLoader(path)
categories = loader.list_categories()
for cat_name in sorted(categories):
cat = loader.get_category(cat_name)
typer.echo(f"\n{cat_name.upper()}")
typer.echo(f" {cat.description}")
services = loader.list_services(cat_name)
for svc_name in sorted(services):
svc = loader.get_service(cat_name, svc_name)
tools = loader.list_tools(cat_name, svc_name)
typer.echo(f" └─ {svc_name}: {len(tools)} tools")
for tool_name in sorted(tools):
tool = loader.get_tool(cat_name, svc_name, tool_name)
typer.echo(f" └─ {tool_name}: {tool.description}")
except Exception as e:
typer.echo(f"Error loading catalog: {e}", err=True)
raise typer.Exit(1)
if __name__ == "__main__":
app()
```
**Step 4: Run test to verify it passes**
Run: `pytest tests/test_cli.py -v`
Expected: PASS (2 passed)
**Step 5: Commit**
```bash
git add src/agentic_workbench/cli.py tests/test_cli.py
git commit -m "feat: implement catalog CLI command"
```
---
## Task 8: MCP Server Interface (FastMCP)
**Files:**
- Create: `src/agentic_workbench/server.py`
- Test: `tests/test_server.py`
**Step 1: Write the failing test**
```python
# tests/test_server.py
import pytest
from agentic_workbench.server import mcp, execute_task
def test_mcp_server_exists():
"""MCP server should be initialized."""
assert mcp is not None
assert mcp.name == "agentic-workbench"
def test_execute_task_function_exists():
"""execute_task should be a callable."""
assert callable(execute_task)
@pytest.mark.asyncio
async def test_execute_task_requires_task():
"""execute_task should require a task parameter."""
import inspect
sig = inspect.signature(execute_task)
assert "task" in sig.parameters
```
**Step 2: Run test to verify it fails**
Run: `pytest tests/test_server.py -v`
Expected: FAIL with "ModuleNotFoundError"
**Step 3: Write implementation**
```python
# src/agentic_workbench/server.py
"""MCP Server for Agentic Workbench using FastMCP.
This is the modern approach (2025) for building MCP servers.
FastMCP uses decorators and type hints to auto-generate tool schemas.
"""
import logging
from pathlib import Path
from mcp.server.fastmcp import FastMCP
# Configure logging to stderr (critical for stdio transport)
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
# Initialize FastMCP server
mcp = FastMCP("agentic-workbench")
# Default catalog path
DEFAULT_CATALOG = Path.cwd() / "catalog"
# Store catalog path (can be configured before running)
_catalog_path: Path = DEFAULT_CATALOG
def set_catalog_path(path: Path) -> None:
"""Set the catalog path before starting the server."""
global _catalog_path
_catalog_path = path
@mcp.tool()
async def execute_task(task: str) -> str:
"""Execute a task by finding and calling the appropriate tool.
This tool navigates through the hierarchical tool catalog to find
the right tool for your task, extracts parameters, and executes it.
Args:
task: Natural language description of what you want to do.
Examples:
- "Send a Slack message to #general saying Hello!"
- "Run a PostgreSQL query: SELECT * FROM users"
- "Upload report.pdf to Google Drive"
"""
from .workbench import Workbench
logger.info(f"Executing task: {task}")
workbench = Workbench(_catalog_path)
try:
# Navigate to find the right tool
selected = workbench.navigate(task)
logger.info(f"Selected tool: {selected.service}.{selected.tool}")
# Execute the tool
result = await workbench.execute(task)
if result.success:
return f"Success: {result.tool_used}\nResult: {result.result}"
else:
return f"Error: {result.tool_used}\nError: {result.error}"
except Exception as e:
logger.error(f"Task execution failed: {e}")
return f"Error executing task: {e}"
finally:
await workbench.close()
def run_server(catalog_path: Path | None = None) -> None:
"""Run the MCP server with stdio transport.
Args:
catalog_path: Path to the catalog directory. Defaults to ./catalog
"""
if catalog_path:
set_catalog_path(catalog_path)
logger.info(f"Starting Agentic Workbench MCP server...")
logger.info(f"Catalog path: {_catalog_path}")
# Run with stdio transport (standard for Claude Desktop integration)
mcp.run(transport="stdio")
```
**Step 4: Run test to verify it passes**
Run: `pytest tests/test_server.py -v`
Expected: PASS (3 passed)
**Step 5: Commit**
```bash
git add src/agentic_workbench/server.py tests/test_server.py
git commit -m "feat: add FastMCP server with execute_task tool"
```
---
## Task 8b: Update CLI Serve Command
**Files:**
- Modify: `src/agentic_workbench/cli.py`
**Step 1: Update the serve command**
```python
# Update the serve command in cli.py
@app.command()
def serve(
catalog: Path = typer.Option(DEFAULT_CATALOG, help="Path to catalog directory"),
):
"""Start the MCP server (stdio transport)."""
from .server import run_server
typer.echo(f"Starting Agentic Workbench MCP server...")
typer.echo(f"Catalog: {catalog}")
typer.echo("Listening on stdio...")
run_server(catalog)
```
**Step 2: Test manually**
Run: `awb serve --help`
Expected: Shows help for serve command
**Step 3: Commit**
```bash
git add src/agentic_workbench/cli.py
git commit -m "feat: implement serve command with FastMCP"
```
---
## Task 9: Package Exports
**Files:**
- Modify: `src/agentic_workbench/__init__.py`
**Step 1: Update exports**
```python
# src/agentic_workbench/__init__.py
"""Agentic Workbench - MCP-based tool orchestrator."""
from .catalog import CatalogLoader
from .executor import Executor
from .llm import LLMClient, CerebrasClient, GeminiClient, create_llm_client
from .models import (
CategoryDefinition,
ExecutionResult,
SelectedTool,
ServiceDefinition,
ToolDefinition,
ToolParameter,
)
from .navigator import Navigator
from .server import mcp, execute_task, run_server
from .workbench import Workbench
__version__ = "0.1.0"
__all__ = [
# Core components
"CatalogLoader",
"Executor",
"Navigator",
"Workbench",
# Models
"CategoryDefinition",
"ExecutionResult",
"SelectedTool",
"ServiceDefinition",
"ToolDefinition",
"ToolParameter",
# LLM
"CerebrasClient",
"GeminiClient",
"LLMClient",
"create_llm_client",
# MCP Server (FastMCP)
"mcp",
"execute_task",
"run_server",
]
```
**Step 2: Run all tests**
Run: `pytest -v`
Expected: All tests pass
**Step 3: Commit**
```bash
git add src/agentic_workbench/__init__.py
git commit -m "feat: export all public APIs from package"
```
---
## Task 10: Claude Code Integration
**Files:**
- Create: `.mcp.json` (Projekt-Level MCP Config für Claude Code)
**Step 1: Create project MCP config**
```json
{
"mcpServers": {
"workbench": {
"command": "uv",
"args": ["run", "awb", "serve"],
"cwd": "/Users/philippbriese/Documents/Setup"
}
}
}
```
**Step 2: Alternative - Globale Claude Code Settings**
Falls du den Server global verfügbar machen willst (nicht nur in diesem Projekt):
```bash
# Claude Code Settings öffnen
claude config
# Oder direkt in ~/.claude/settings.json
```
```json
{
"mcpServers": {
"workbench": {
"command": "/Users/philippbriese/Documents/Setup/.venv/bin/awb",
"args": ["serve"]
}
}
}
```
**Step 3: Test in Claude Code**
Nach dem Neustart von Claude Code sollte das Tool verfügbar sein:
```
Du: "Nutze die Workbench um eine Slack-Nachricht zu senden"
Claude: [ruft execute_task auf]
```
**Step 4: Commit**
```bash
git add .mcp.json README.md
git commit -m "docs: add Claude Code MCP integration"
```
---
## Summary
| Task | Component | Status |
|------|-----------|--------|
| 1 | Pydantic Models | ⬜ |
| 2 | Catalog Loader | ⬜ |
| 3 | LLM Client | ⬜ |
| 4 | Navigator | ⬜ |
| 5 | Executor (Stub) | ⬜ |
| 6 | Workbench | ⬜ |
| 7 | CLI | ⬜ |
| 8 | MCP Server (FastMCP) | ⬜ |
| 8b | CLI Serve Command | ⬜ |
| 9 | Package Exports | ⬜ |
| 10 | Claude Code Integration | ⬜ |
**Danach TODO:**
- Task 11: Echte MCP-Client-Verbindung implementieren (für Tool-Ausführung)
- Task 12: Integration Tests
- Task 13: Environment Variables für API Keys