Agentic Workbench

Overview Schema Related Servers Score Discussions

2025-12-03-agentic-workbench-implementation.md•36.8 KiB

# Agentic Workbench Implementation Plan > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. **Goal:** Einen MCP-Server bauen, der als Tool-Orchestrator fungiert und Aufgaben mit hierarchischer Navigation an das richtige Tool delegiert. **Architecture:** Drei Kernkomponenten (Catalog, Navigator, Executor) die zusammenarbeiten. Der Catalog lädt die Tool-Hierarchie aus dem Dateisystem. Der Navigator nutzt ein schnelles LLM (Cerebras/Gemini) für schrittweise Navigation. Der Executor verbindet sich zu MCP-Servern und führt Tools aus. **Tech Stack:** Python 3.11+, MCP SDK, httpx, pydantic, typer, pytest --- ## Task 1: Catalog Models (Pydantic) **Files:** - Create: `src/agentic_workbench/models.py` - Test: `tests/test_models.py` **Step 1: Write the failing test** ```python # tests/test_models.py import pytest from agentic_workbench.models import ToolParameter, ToolDefinition, ServiceDefinition, CategoryDefinition def test_tool_parameter_required(): param = ToolParameter( name="channel", type="string", required=True, description="Channel name" ) assert param.name == "channel" assert param.required is True def test_tool_definition(): tool = ToolDefinition( name="send_message", description="Send a message", parameters=[ ToolParameter(name="text", type="string", required=True, description="Message text") ] ) assert tool.name == "send_message" assert len(tool.parameters) == 1 def test_service_definition(): service = ServiceDefinition( name="slack", description="Slack communication", mcp_server="slack-mcp", tools=[] ) assert service.mcp_server == "slack-mcp" def test_category_definition(): category = CategoryDefinition( name="communication", description="Messaging services", services=[] ) assert category.name == "communication" ``` **Step 2: Run test to verify it fails** Run: `pytest tests/test_models.py -v` Expected: FAIL with "ModuleNotFoundError: No module named 'agentic_workbench.models'" **Step 3: Write minimal implementation** ```python # src/agentic_workbench/models.py """Data models for Agentic Workbench.""" from pydantic import BaseModel class ToolParameter(BaseModel): """A parameter for a tool.""" name: str type: str required: bool description: str class ToolDefinition(BaseModel): """A tool that can be executed.""" name: str description: str parameters: list[ToolParameter] = [] class ServiceDefinition(BaseModel): """A service containing multiple tools.""" name: str description: str mcp_server: str tools: list[ToolDefinition] = [] class CategoryDefinition(BaseModel): """A category containing multiple services.""" name: str description: str services: list[ServiceDefinition] = [] class SelectedTool(BaseModel): """Result of navigation - the selected tool with parameters.""" category: str service: str tool: str mcp_server: str parameters: dict = {} class ExecutionResult(BaseModel): """Result of tool execution.""" success: bool tool_used: str result: dict | None = None error: dict | None = None ``` **Step 4: Run test to verify it passes** Run: `pytest tests/test_models.py -v` Expected: PASS (4 passed) **Step 5: Commit** ```bash git add src/agentic_workbench/models.py tests/test_models.py git commit -m "feat: add pydantic models for catalog structure" ``` --- ## Task 2: Catalog Loader **Files:** - Create: `src/agentic_workbench/catalog.py` - Test: `tests/test_catalog.py` **Step 1: Write the failing test** ```python # tests/test_catalog.py import pytest from pathlib import Path from agentic_workbench.catalog import CatalogLoader @pytest.fixture def catalog_path(): return Path(__file__).parent.parent / "catalog" def test_load_categories(catalog_path): loader = CatalogLoader(catalog_path) categories = loader.list_categories() assert "communication" in categories assert "database" in categories assert "files" in categories def test_load_category_description(catalog_path): loader = CatalogLoader(catalog_path) category = loader.get_category("communication") assert category.name == "communication" assert "Messaging" in category.description or "Chat" in category.description def test_load_services(catalog_path): loader = CatalogLoader(catalog_path) services = loader.list_services("communication") assert "slack" in services def test_load_service_definition(catalog_path): loader = CatalogLoader(catalog_path) service = loader.get_service("communication", "slack") assert service.name == "slack" assert service.mcp_server == "slack-mcp" def test_load_tools(catalog_path): loader = CatalogLoader(catalog_path) tools = loader.list_tools("communication", "slack") assert "send_message" in tools def test_load_tool_definition(catalog_path): loader = CatalogLoader(catalog_path) tool = loader.get_tool("communication", "slack", "send_message") assert tool.name == "send_message" assert any(p.name == "channel" for p in tool.parameters) assert any(p.name == "text" for p in tool.parameters) ``` **Step 2: Run test to verify it fails** Run: `pytest tests/test_catalog.py -v` Expected: FAIL with "ModuleNotFoundError: No module named 'agentic_workbench.catalog'" **Step 3: Write minimal implementation** ```python # src/agentic_workbench/catalog.py """Catalog loader for hierarchical tool structure.""" from pathlib import Path import yaml from .models import CategoryDefinition, ServiceDefinition, ToolDefinition, ToolParameter class CatalogLoader: """Loads tool catalog from directory structure.""" def __init__(self, catalog_path: Path): self.catalog_path = Path(catalog_path) def list_categories(self) -> list[str]: """List all category names.""" return [ d.name for d in self.catalog_path.iterdir() if d.is_dir() and not d.name.startswith("_") ] def get_category(self, name: str) -> CategoryDefinition: """Load a category definition.""" category_path = self.catalog_path / name config_file = category_path / "_category.yaml" if config_file.exists(): with open(config_file) as f: data = yaml.safe_load(f) else: data = {"description": name} return CategoryDefinition( name=name, description=data.get("description", name), services=[], ) def list_services(self, category: str) -> list[str]: """List all service names in a category.""" category_path = self.catalog_path / category return [ d.name for d in category_path.iterdir() if d.is_dir() and not d.name.startswith("_") ] def get_service(self, category: str, name: str) -> ServiceDefinition: """Load a service definition.""" service_path = self.catalog_path / category / name config_file = service_path / "_service.yaml" with open(config_file) as f: data = yaml.safe_load(f) return ServiceDefinition( name=name, description=data.get("description", name), mcp_server=data["mcp_server"], tools=[], ) def list_tools(self, category: str, service: str) -> list[str]: """List all tool names in a service.""" service_path = self.catalog_path / category / service return [ f.stem for f in service_path.glob("*.yaml") if not f.name.startswith("_") ] def get_tool(self, category: str, service: str, name: str) -> ToolDefinition: """Load a tool definition.""" tool_file = self.catalog_path / category / service / f"{name}.yaml" with open(tool_file) as f: data = yaml.safe_load(f) parameters = [] for param_name, param_data in data.get("parameters", {}).items(): parameters.append( ToolParameter( name=param_name, type=param_data.get("type", "string"), required=param_data.get("required", False), description=param_data.get("description", ""), ) ) return ToolDefinition( name=name, description=data.get("description", name), parameters=parameters, ) ``` **Step 4: Run test to verify it passes** Run: `pytest tests/test_catalog.py -v` Expected: PASS (6 passed) **Step 5: Commit** ```bash git add src/agentic_workbench/catalog.py tests/test_catalog.py git commit -m "feat: add catalog loader for hierarchical tool structure" ``` --- ## Task 3: LLM Client Abstraction **Files:** - Create: `src/agentic_workbench/llm.py` - Test: `tests/test_llm.py` **Step 1: Write the failing test** ```python # tests/test_llm.py import pytest from agentic_workbench.llm import LLMClient, LLMResponse def test_llm_response_model(): response = LLMResponse(content="communication", raw={}) assert response.content == "communication" def test_llm_client_abstract(): """LLMClient should be abstract.""" with pytest.raises(TypeError): LLMClient() def test_mock_llm_client(): from agentic_workbench.llm import MockLLMClient client = MockLLMClient(responses=["communication", "slack", "send_message"]) assert client.ask("prompt1") == "communication" assert client.ask("prompt2") == "slack" assert client.ask("prompt3") == "send_message" ``` **Step 2: Run test to verify it fails** Run: `pytest tests/test_llm.py -v` Expected: FAIL with "ModuleNotFoundError" **Step 3: Write minimal implementation** ```python # src/agentic_workbench/llm.py """LLM client abstraction for tool selection.""" from abc import ABC, abstractmethod import httpx from pydantic import BaseModel class LLMResponse(BaseModel): """Response from LLM.""" content: str raw: dict = {} class LLMClient(ABC): """Abstract LLM client.""" @abstractmethod def ask(self, prompt: str) -> str: """Ask the LLM a question and get a response.""" pass class MockLLMClient(LLMClient): """Mock LLM client for testing.""" def __init__(self, responses: list[str]): self.responses = responses self.index = 0 def ask(self, prompt: str) -> str: response = self.responses[self.index] self.index += 1 return response class CerebrasClient(LLMClient): """Cerebras LLM client.""" def __init__(self, api_key: str, model: str = "llama-3.3-70b"): self.api_key = api_key self.model = model self.base_url = "https://api.cerebras.ai/v1" def ask(self, prompt: str) -> str: with httpx.Client() as client: response = client.post( f"{self.base_url}/chat/completions", headers={"Authorization": f"Bearer {self.api_key}"}, json={ "model": self.model, "messages": [{"role": "user", "content": prompt}], "max_tokens": 100, }, timeout=30.0, ) response.raise_for_status() data = response.json() return data["choices"][0]["message"]["content"].strip() class GeminiClient(LLMClient): """Google Gemini LLM client.""" def __init__(self, api_key: str, model: str = "gemini-2.0-flash"): self.api_key = api_key self.model = model self.base_url = "https://generativelanguage.googleapis.com/v1beta" def ask(self, prompt: str) -> str: with httpx.Client() as client: response = client.post( f"{self.base_url}/models/{self.model}:generateContent", params={"key": self.api_key}, json={"contents": [{"parts": [{"text": prompt}]}]}, timeout=30.0, ) response.raise_for_status() data = response.json() return data["candidates"][0]["content"]["parts"][0]["text"].strip() def create_llm_client(provider: str, api_key: str, model: str | None = None) -> LLMClient: """Factory function to create LLM client.""" if provider == "cerebras": return CerebrasClient(api_key, model or "llama-3.3-70b") elif provider == "gemini": return GeminiClient(api_key, model or "gemini-2.0-flash") else: raise ValueError(f"Unknown provider: {provider}") ``` **Step 4: Run test to verify it passes** Run: `pytest tests/test_llm.py -v` Expected: PASS (3 passed) **Step 5: Commit** ```bash git add src/agentic_workbench/llm.py tests/test_llm.py git commit -m "feat: add LLM client abstraction with Cerebras and Gemini support" ``` --- ## Task 4: Navigator Component **Files:** - Create: `src/agentic_workbench/navigator.py` - Test: `tests/test_navigator.py` **Step 1: Write the failing test** ```python # tests/test_navigator.py import pytest from pathlib import Path from agentic_workbench.navigator import Navigator from agentic_workbench.catalog import CatalogLoader from agentic_workbench.llm import MockLLMClient @pytest.fixture def catalog_path(): return Path(__file__).parent.parent / "catalog" @pytest.fixture def navigator(catalog_path): catalog = CatalogLoader(catalog_path) # Mock responses: category, service, tool, parameters JSON llm = MockLLMClient([ "communication", "slack", "send_message", '{"channel": "#general", "text": "Hello!"}' ]) return Navigator(catalog, llm) def test_navigate_to_tool(navigator): result = navigator.navigate("Send a Slack message to #general saying Hello!") assert result.category == "communication" assert result.service == "slack" assert result.tool == "send_message" assert result.mcp_server == "slack-mcp" def test_navigate_extracts_parameters(navigator): result = navigator.navigate("Send a Slack message to #general saying Hello!") assert result.parameters["channel"] == "#general" assert result.parameters["text"] == "Hello!" def test_navigate_skips_single_option(catalog_path): """When only one option exists, skip LLM call.""" catalog = CatalogLoader(catalog_path) # Only need: category, parameters (service and tool auto-selected if single) llm = MockLLMClient([ "database", '{"sql": "SELECT * FROM users"}' ]) navigator = Navigator(catalog, llm) result = navigator.navigate("Run a PostgreSQL query") # postgres is only service, query is only tool assert result.category == "database" assert result.service == "postgres" assert result.tool == "query" ``` **Step 2: Run test to verify it fails** Run: `pytest tests/test_navigator.py -v` Expected: FAIL with "ModuleNotFoundError" **Step 3: Write minimal implementation** ```python # src/agentic_workbench/navigator.py """Navigator for hierarchical tool selection.""" import json from .catalog import CatalogLoader from .llm import LLMClient from .models import SelectedTool class Navigator: """Navigates through tool hierarchy using LLM.""" def __init__(self, catalog: CatalogLoader, llm: LLMClient): self.catalog = catalog self.llm = llm def navigate(self, task: str) -> SelectedTool: """Navigate to the right tool for a task.""" # Step 1: Select category category = self._select_category(task) # Step 2: Select service (or auto-select if only one) service = self._select_service(task, category) # Step 3: Select tool (or auto-select if only one) tool = self._select_tool(task, category, service) # Step 4: Get MCP server service_def = self.catalog.get_service(category, service) mcp_server = service_def.mcp_server # Step 5: Extract parameters tool_def = self.catalog.get_tool(category, service, tool) parameters = self._extract_parameters(task, tool_def) return SelectedTool( category=category, service=service, tool=tool, mcp_server=mcp_server, parameters=parameters, ) def _select_category(self, task: str) -> str: """Select category using LLM.""" categories = self.catalog.list_categories() if len(categories) == 1: return categories[0] options = [] for cat_name in categories: cat = self.catalog.get_category(cat_name) options.append(f"• {cat_name}: {cat.description}") prompt = f"""Task: {task} Which category fits best? {chr(10).join(options)} Reply with ONLY the category name, nothing else.""" response = self.llm.ask(prompt) return response.strip().lower() def _select_service(self, task: str, category: str) -> str: """Select service using LLM.""" services = self.catalog.list_services(category) if len(services) == 1: return services[0] options = [] for svc_name in services: svc = self.catalog.get_service(category, svc_name) tools = self.catalog.list_tools(category, svc_name) options.append(f"• {svc_name}: {svc.description} ({len(tools)} tools)") prompt = f"""Task: {task} Category: {category} Which service fits best? {chr(10).join(options)} Reply with ONLY the service name, nothing else.""" response = self.llm.ask(prompt) return response.strip().lower() def _select_tool(self, task: str, category: str, service: str) -> str: """Select tool using LLM.""" tools = self.catalog.list_tools(category, service) if len(tools) == 1: return tools[0] options = [] for tool_name in tools: tool = self.catalog.get_tool(category, service, tool_name) options.append(f"• {tool_name}: {tool.description}") prompt = f"""Task: {task} Service: {service} Which tool fits best? {chr(10).join(options)} Reply with ONLY the tool name, nothing else.""" response = self.llm.ask(prompt) return response.strip().lower() def _extract_parameters(self, task: str, tool) -> dict: """Extract parameters from task using LLM.""" if not tool.parameters: return {} params_desc = [] for p in tool.parameters: req = "(required)" if p.required else "(optional)" params_desc.append(f"• {p.name}: {p.description} {req}") prompt = f"""Task: {task} Tool: {tool.name} - {tool.description} Extract these parameters from the task: {chr(10).join(params_desc)} Reply with ONLY valid JSON, nothing else. Example: {{"param": "value"}}""" response = self.llm.ask(prompt) return json.loads(response.strip()) ``` **Step 4: Run test to verify it passes** Run: `pytest tests/test_navigator.py -v` Expected: PASS (3 passed) **Step 5: Commit** ```bash git add src/agentic_workbench/navigator.py tests/test_navigator.py git commit -m "feat: add navigator for hierarchical tool selection" ``` --- ## Task 5: Executor Component (Stub) **Files:** - Create: `src/agentic_workbench/executor.py` - Test: `tests/test_executor.py` **Step 1: Write the failing test** ```python # tests/test_executor.py import pytest from agentic_workbench.executor import Executor from agentic_workbench.models import SelectedTool, ExecutionResult @pytest.fixture def executor(): mcp_configs = { "mock-mcp": { "command": "echo", "args": ["mock"], } } return Executor(mcp_configs) def test_executor_init(executor): assert executor.connections == {} def test_execution_result_success(): result = ExecutionResult( success=True, tool_used="slack.send_message", result={"ok": True, "ts": "123"} ) assert result.success is True assert result.error is None def test_execution_result_failure(): result = ExecutionResult( success=False, tool_used="slack.send_message", error={"type": "api_error", "message": "Channel not found"} ) assert result.success is False assert result.error["type"] == "api_error" ``` **Step 2: Run test to verify it fails** Run: `pytest tests/test_executor.py -v` Expected: FAIL with "ModuleNotFoundError" **Step 3: Write minimal implementation** ```python # src/agentic_workbench/executor.py """Executor for MCP tool calls.""" from .models import ExecutionResult, SelectedTool class MCPConnection: """Connection to an MCP server (stub).""" def __init__(self, command: str, args: list[str], env: dict | None = None): self.command = command self.args = args self.env = env or {} self.connected = False async def connect(self): """Connect to the MCP server.""" # TODO: Implement actual MCP connection self.connected = True async def call_tool(self, name: str, arguments: dict) -> dict: """Call a tool on the MCP server.""" # TODO: Implement actual MCP tool call raise NotImplementedError("MCP connection not yet implemented") async def close(self): """Close the connection.""" self.connected = False class Executor: """Executes tools on MCP servers.""" def __init__(self, mcp_configs: dict): self.mcp_configs = mcp_configs self.connections: dict[str, MCPConnection] = {} async def execute(self, selected: SelectedTool) -> ExecutionResult: """Execute the selected tool.""" try: connection = await self._get_connection(selected.mcp_server) result = await connection.call_tool( name=selected.tool, arguments=selected.parameters, ) return ExecutionResult( success=True, tool_used=f"{selected.service}.{selected.tool}", result=result, ) except Exception as e: return ExecutionResult( success=False, tool_used=f"{selected.service}.{selected.tool}", error={"type": "execution_error", "message": str(e)}, ) async def _get_connection(self, server_name: str) -> MCPConnection: """Get or create connection to MCP server.""" if server_name not in self.connections: config = self.mcp_configs.get(server_name) if not config: raise ValueError(f"Unknown MCP server: {server_name}") connection = MCPConnection( command=config["command"], args=config.get("args", []), env=config.get("env"), ) await connection.connect() self.connections[server_name] = connection return self.connections[server_name] async def close_all(self): """Close all connections.""" for connection in self.connections.values(): await connection.close() self.connections.clear() ``` **Step 4: Run test to verify it passes** Run: `pytest tests/test_executor.py -v` Expected: PASS (3 passed) **Step 5: Commit** ```bash git add src/agentic_workbench/executor.py tests/test_executor.py git commit -m "feat: add executor stub for MCP tool calls" ``` --- ## Task 6: Main Workbench Class **Files:** - Create: `src/agentic_workbench/workbench.py` - Test: `tests/test_workbench.py` **Step 1: Write the failing test** ```python # tests/test_workbench.py import pytest from pathlib import Path from agentic_workbench.workbench import Workbench from agentic_workbench.llm import MockLLMClient @pytest.fixture def catalog_path(): return Path(__file__).parent.parent / "catalog" @pytest.fixture def workbench(catalog_path): llm = MockLLMClient([ "communication", "slack", "send_message", '{"channel": "#general", "text": "Hello!"}' ]) return Workbench(catalog_path, llm) def test_workbench_init(workbench): assert workbench.catalog is not None assert workbench.navigator is not None assert workbench.executor is not None def test_workbench_navigate(workbench): result = workbench.navigate("Send a Slack message to #general") assert result.tool == "send_message" assert result.mcp_server == "slack-mcp" ``` **Step 2: Run test to verify it fails** Run: `pytest tests/test_workbench.py -v` Expected: FAIL with "ModuleNotFoundError" **Step 3: Write minimal implementation** ```python # src/agentic_workbench/workbench.py """Main Workbench class orchestrating all components.""" from pathlib import Path import yaml from .catalog import CatalogLoader from .executor import Executor from .llm import LLMClient, create_llm_client from .models import ExecutionResult, SelectedTool from .navigator import Navigator class Workbench: """Main orchestrator for Agentic Workbench.""" def __init__(self, catalog_path: Path, llm: LLMClient | None = None): self.catalog_path = Path(catalog_path) self.catalog = CatalogLoader(self.catalog_path) # Load config config_file = self.catalog_path / "_config.yaml" if config_file.exists(): with open(config_file) as f: self.config = yaml.safe_load(f) else: self.config = {} # Initialize LLM if llm: self.llm = llm else: llm_config = self.config.get("llm", {}) self.llm = create_llm_client( provider=llm_config.get("provider", "cerebras"), api_key=llm_config.get("api_key", ""), model=llm_config.get("model"), ) # Initialize components self.navigator = Navigator(self.catalog, self.llm) self.executor = Executor(self.config.get("mcp_servers", {})) def navigate(self, task: str) -> SelectedTool: """Navigate to the right tool for a task.""" return self.navigator.navigate(task) async def execute(self, task: str) -> ExecutionResult: """Execute a task end-to-end.""" selected = self.navigate(task) return await self.executor.execute(selected) async def close(self): """Clean up resources.""" await self.executor.close_all() ``` **Step 4: Run test to verify it passes** Run: `pytest tests/test_workbench.py -v` Expected: PASS (2 passed) **Step 5: Commit** ```bash git add src/agentic_workbench/workbench.py tests/test_workbench.py git commit -m "feat: add main Workbench class orchestrating components" ``` --- ## Task 7: CLI Implementation **Files:** - Modify: `src/agentic_workbench/cli.py` - Test: `tests/test_cli.py` **Step 1: Write the failing test** ```python # tests/test_cli.py import pytest from typer.testing import CliRunner from agentic_workbench.cli import app runner = CliRunner() def test_cli_help(): result = runner.invoke(app, ["--help"]) assert result.exit_code == 0 assert "Agentic Workbench" in result.stdout def test_cli_catalog_command(): result = runner.invoke(app, ["catalog"]) assert result.exit_code == 0 assert "communication" in result.stdout.lower() or "Tool Catalog" in result.stdout ``` **Step 2: Run test to verify it fails** Run: `pytest tests/test_cli.py -v` Expected: FAIL (catalog command not implemented) **Step 3: Write implementation** ```python # src/agentic_workbench/cli.py """CLI interface for Agentic Workbench.""" from pathlib import Path import typer from .catalog import CatalogLoader app = typer.Typer( name="awb", help="Agentic Workbench - MCP-based tool orchestrator", ) DEFAULT_CATALOG = Path.cwd() / "catalog" @app.command() def serve( catalog: Path = typer.Option(DEFAULT_CATALOG, help="Path to catalog directory"), ): """Start the MCP server.""" typer.echo(f"Starting Agentic Workbench MCP server...") typer.echo(f"Catalog: {catalog}") # TODO: Implement MCP server raise typer.Exit(1) @app.command() def run( task: str = typer.Argument(..., help="Task to execute"), catalog: Path = typer.Option(DEFAULT_CATALOG, help="Path to catalog directory"), ): """Execute a task directly.""" typer.echo(f"Executing task: {task}") typer.echo(f"Catalog: {catalog}") # TODO: Implement task execution raise typer.Exit(1) @app.command() def catalog( path: Path = typer.Option(DEFAULT_CATALOG, help="Path to catalog directory"), ): """Show the tool catalog.""" typer.echo("Tool Catalog:") typer.echo("=" * 40) try: loader = CatalogLoader(path) categories = loader.list_categories() for cat_name in sorted(categories): cat = loader.get_category(cat_name) typer.echo(f"\n{cat_name.upper()}") typer.echo(f" {cat.description}") services = loader.list_services(cat_name) for svc_name in sorted(services): svc = loader.get_service(cat_name, svc_name) tools = loader.list_tools(cat_name, svc_name) typer.echo(f" └─ {svc_name}: {len(tools)} tools") for tool_name in sorted(tools): tool = loader.get_tool(cat_name, svc_name, tool_name) typer.echo(f" └─ {tool_name}: {tool.description}") except Exception as e: typer.echo(f"Error loading catalog: {e}", err=True) raise typer.Exit(1) if __name__ == "__main__": app() ``` **Step 4: Run test to verify it passes** Run: `pytest tests/test_cli.py -v` Expected: PASS (2 passed) **Step 5: Commit** ```bash git add src/agentic_workbench/cli.py tests/test_cli.py git commit -m "feat: implement catalog CLI command" ``` --- ## Task 8: MCP Server Interface (FastMCP) **Files:** - Create: `src/agentic_workbench/server.py` - Test: `tests/test_server.py` **Step 1: Write the failing test** ```python # tests/test_server.py import pytest from agentic_workbench.server import mcp, execute_task def test_mcp_server_exists(): """MCP server should be initialized.""" assert mcp is not None assert mcp.name == "agentic-workbench" def test_execute_task_function_exists(): """execute_task should be a callable.""" assert callable(execute_task) @pytest.mark.asyncio async def test_execute_task_requires_task(): """execute_task should require a task parameter.""" import inspect sig = inspect.signature(execute_task) assert "task" in sig.parameters ``` **Step 2: Run test to verify it fails** Run: `pytest tests/test_server.py -v` Expected: FAIL with "ModuleNotFoundError" **Step 3: Write implementation** ```python # src/agentic_workbench/server.py """MCP Server for Agentic Workbench using FastMCP. This is the modern approach (2025) for building MCP servers. FastMCP uses decorators and type hints to auto-generate tool schemas. """ import logging from pathlib import Path from mcp.server.fastmcp import FastMCP # Configure logging to stderr (critical for stdio transport) logging.basicConfig( level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", ) logger = logging.getLogger(__name__) # Initialize FastMCP server mcp = FastMCP("agentic-workbench") # Default catalog path DEFAULT_CATALOG = Path.cwd() / "catalog" # Store catalog path (can be configured before running) _catalog_path: Path = DEFAULT_CATALOG def set_catalog_path(path: Path) -> None: """Set the catalog path before starting the server.""" global _catalog_path _catalog_path = path @mcp.tool() async def execute_task(task: str) -> str: """Execute a task by finding and calling the appropriate tool. This tool navigates through the hierarchical tool catalog to find the right tool for your task, extracts parameters, and executes it. Args: task: Natural language description of what you want to do. Examples: - "Send a Slack message to #general saying Hello!" - "Run a PostgreSQL query: SELECT * FROM users" - "Upload report.pdf to Google Drive" """ from .workbench import Workbench logger.info(f"Executing task: {task}") workbench = Workbench(_catalog_path) try: # Navigate to find the right tool selected = workbench.navigate(task) logger.info(f"Selected tool: {selected.service}.{selected.tool}") # Execute the tool result = await workbench.execute(task) if result.success: return f"Success: {result.tool_used}\nResult: {result.result}" else: return f"Error: {result.tool_used}\nError: {result.error}" except Exception as e: logger.error(f"Task execution failed: {e}") return f"Error executing task: {e}" finally: await workbench.close() def run_server(catalog_path: Path | None = None) -> None: """Run the MCP server with stdio transport. Args: catalog_path: Path to the catalog directory. Defaults to ./catalog """ if catalog_path: set_catalog_path(catalog_path) logger.info(f"Starting Agentic Workbench MCP server...") logger.info(f"Catalog path: {_catalog_path}") # Run with stdio transport (standard for Claude Desktop integration) mcp.run(transport="stdio") ``` **Step 4: Run test to verify it passes** Run: `pytest tests/test_server.py -v` Expected: PASS (3 passed) **Step 5: Commit** ```bash git add src/agentic_workbench/server.py tests/test_server.py git commit -m "feat: add FastMCP server with execute_task tool" ``` --- ## Task 8b: Update CLI Serve Command **Files:** - Modify: `src/agentic_workbench/cli.py` **Step 1: Update the serve command** ```python # Update the serve command in cli.py @app.command() def serve( catalog: Path = typer.Option(DEFAULT_CATALOG, help="Path to catalog directory"), ): """Start the MCP server (stdio transport).""" from .server import run_server typer.echo(f"Starting Agentic Workbench MCP server...") typer.echo(f"Catalog: {catalog}") typer.echo("Listening on stdio...") run_server(catalog) ``` **Step 2: Test manually** Run: `awb serve --help` Expected: Shows help for serve command **Step 3: Commit** ```bash git add src/agentic_workbench/cli.py git commit -m "feat: implement serve command with FastMCP" ``` --- ## Task 9: Package Exports **Files:** - Modify: `src/agentic_workbench/__init__.py` **Step 1: Update exports** ```python # src/agentic_workbench/__init__.py """Agentic Workbench - MCP-based tool orchestrator.""" from .catalog import CatalogLoader from .executor import Executor from .llm import LLMClient, CerebrasClient, GeminiClient, create_llm_client from .models import ( CategoryDefinition, ExecutionResult, SelectedTool, ServiceDefinition, ToolDefinition, ToolParameter, ) from .navigator import Navigator from .server import mcp, execute_task, run_server from .workbench import Workbench __version__ = "0.1.0" __all__ = [ # Core components "CatalogLoader", "Executor", "Navigator", "Workbench", # Models "CategoryDefinition", "ExecutionResult", "SelectedTool", "ServiceDefinition", "ToolDefinition", "ToolParameter", # LLM "CerebrasClient", "GeminiClient", "LLMClient", "create_llm_client", # MCP Server (FastMCP) "mcp", "execute_task", "run_server", ] ``` **Step 2: Run all tests** Run: `pytest -v` Expected: All tests pass **Step 3: Commit** ```bash git add src/agentic_workbench/__init__.py git commit -m "feat: export all public APIs from package" ``` --- ## Task 10: Claude Code Integration **Files:** - Create: `.mcp.json` (Projekt-Level MCP Config für Claude Code) **Step 1: Create project MCP config** ```json { "mcpServers": { "workbench": { "command": "uv", "args": ["run", "awb", "serve"], "cwd": "/Users/philippbriese/Documents/Setup" } } } ``` **Step 2: Alternative - Globale Claude Code Settings** Falls du den Server global verfügbar machen willst (nicht nur in diesem Projekt): ```bash # Claude Code Settings öffnen claude config # Oder direkt in ~/.claude/settings.json ``` ```json { "mcpServers": { "workbench": { "command": "/Users/philippbriese/Documents/Setup/.venv/bin/awb", "args": ["serve"] } } } ``` **Step 3: Test in Claude Code** Nach dem Neustart von Claude Code sollte das Tool verfügbar sein: ``` Du: "Nutze die Workbench um eine Slack-Nachricht zu senden" Claude: [ruft execute_task auf] ``` **Step 4: Commit** ```bash git add .mcp.json README.md git commit -m "docs: add Claude Code MCP integration" ``` --- ## Summary | Task | Component | Status | |------|-----------|--------| | 1 | Pydantic Models | ⬜ | | 2 | Catalog Loader | ⬜ | | 3 | LLM Client | ⬜ | | 4 | Navigator | ⬜ | | 5 | Executor (Stub) | ⬜ | | 6 | Workbench | ⬜ | | 7 | CLI | ⬜ | | 8 | MCP Server (FastMCP) | ⬜ | | 8b | CLI Serve Command | ⬜ | | 9 | Package Exports | ⬜ | | 10 | Claude Code Integration | ⬜ | **Danach TODO:** - Task 11: Echte MCP-Client-Verbindung implementieren (für Tool-Ausführung) - Task 12: Integration Tests - Task 13: Environment Variables für API Keys

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/0ui-labs/agentic-workbench'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

2025-12-03-agentic-workbench-implementation.md•36.8 KiB