Codebase MCP Server

tool-annotations.md•20.1 KiB

# Tool Annotations Implementation Plan

## Overview

**Purpose**: Add MCP tool hints (`openWorldHint`, `readOnlyHint`, `idempotentHint`) to all tools in the codebase-mcp server. The hints help LLMs understand how each tool behaves and make better decisions about when and how to call it.

**MCP Best Practice Alignment**: Tool annotations are part of the MCP specification (added in the 2025-03-26 revision), and annotating every tool follows MCP best practice. The hints used in this plan mean:

- **openWorldHint**: The tool may interact with an "open world" of external entities. If `false`, its domain of interaction is closed. Defaults to `true`.
- **readOnlyHint**: The tool does not modify its environment, so calling it has no side effects. Defaults to `false`.
- **idempotentHint**: Calling the tool repeatedly with the same arguments has no additional effect on its environment. The specification treats this hint as meaningful only when `readOnlyHint` is `false`. Defaults to `false`.

The specification also defines `title` and `destructiveHint` (default `true`), which this plan does not set. All annotations are advisory hints; the protocol does not enforce them.

**Expected Benefits**:

- LLMs make smarter tool selection decisions
- Fewer unnecessary tool calls and user confirmations
- Better parallel execution planning (LLMs know which tools can run concurrently)
- Improved error recovery (LLMs know which operations are safe to retry)
- Better user experience through optimized tool orchestration

## Current State Analysis

### What Exists Today

- Three MCP tools implemented with the `@mcp.tool()` decorator:
  - `index_repository`: Indexes a code repository
  - `search_code`: Performs semantic search
  - `set_working_directory`: Sets the session working directory
- Tools defined in:
  - `src/mcp/tools/indexing.py` (`index_repository`)
  - `src/mcp/tools/search.py` (`search_code`)
  - `src/mcp/tools/project.py` (`set_working_directory`)
- No MCP hints currently applied

### Gaps/Limitations

- LLMs must infer tool behavior from docstrings alone
- No explicit indication of:
  - Which tools interact with external state (filesystem, database)
  - Which tools are read-only vs. write operations
  - Which tools are idempotent (safe to retry)
- LLMs may make suboptimal decisions about:
  - When to retry failed operations
  - Which tools to run in parallel
  - Which tools are safe to call speculatively

## Proposed Solution

### High-Level Approach

Add MCP hints to the `@mcp.tool()` decorator for all three tools, based on their actual behavior:
1. **index_repository**:
   - `openWorldHint=True`: Interacts with the filesystem (reads files) and the database (writes chunks)
   - `readOnlyHint=False`: Modifies database state (creates repository and chunk records)
   - `idempotentHint=False`: Repeated calls can have additional effects (re-indexing logic)
2. **search_code**:
   - `openWorldHint=True`: Interacts with the database (reads chunks) and Ollama (embeddings)
   - `readOnlyHint=True`: Only reads from the database; no state modifications
   - `idempotentHint=True`: Repeated calls have no additional effect on state; results are repeatable while the index and embedding model are unchanged
3. **set_working_directory**:
   - `openWorldHint=True`: Modifies session state (stores working directory context)
   - `readOnlyHint=False`: Modifies session manager state
   - `idempotentHint=True`: Setting the same directory multiple times has the same effect as setting it once

### Key Design Decisions

**Decision 1: Conservative Hint Application**

- **Choice**: Only apply hints when behavior is clear and deterministic
- **Rationale**: Wrong hints are worse than no hints (they mislead LLMs)
- **Trade-off**: May miss optimization opportunities, but ensures correctness

**Decision 2: Document Hint Rationale**

- **Choice**: Add inline comments explaining each hint's reasoning
- **Rationale**: Helps future maintainers understand hint choices
- **Trade-off**: Slightly more verbose code, but much clearer intent

**Decision 3: Validate Against Actual Behavior**

- **Choice**: Audit the actual tool implementation before adding hints
- **Rationale**: Hints must match reality, not assumptions
- **Trade-off**: Requires thorough code review, but ensures accuracy

**Decision 4: Test Hint Effectiveness**

- **Choice**: Create integration tests validating LLM behavior with hints
- **Rationale**: Verify hints actually improve LLM decision-making
- **Trade-off**: Harder to test, but proves value

### Trade-offs Considered

| Approach | Pros | Cons | Decision |
|----------|------|------|----------|
| Add hints to all tools | Complete coverage | Risk of wrong hints | ✅ **Selected** (after audit) |
| Add hints incrementally | Low risk | Incomplete optimization | ❌ Rejected |
| No hints | No maintenance | LLMs guess behavior | ❌ Rejected |
| Document in docstrings | Human-readable | LLMs may miss | ❌ Rejected |
| Infer from code | Automatic | Complex, error-prone | ❌ Rejected |
| Manual annotation | Explicit, auditable | Requires review | ✅ **Selected** |
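
Decision 3 can be made checkable. This sketch records audit findings as booleans, derives hint values from them, and reports any planned hint that disagrees. `ToolAudit`, `derive_hints`, and the audit field names are hypothetical, not part of FastMCP or the MCP SDK; the `AUDIT` entries are placeholders that mirror this plan's rationale and should be replaced with the real Step 1 findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolAudit:
    """Hypothetical record of what the Step 1 audit found for one tool."""

    touches_external_state: bool  # filesystem, database, Ollama, or session store
    modifies_state: bool  # any INSERT/UPDATE/DELETE or session write
    repeat_adds_effects: bool  # an identical second call changes state further


def derive_hints(audit: ToolAudit) -> dict[str, bool]:
    """Translate audit findings into the three hint values used in this plan."""
    return {
        "openWorldHint": audit.touches_external_state,
        "readOnlyHint": not audit.modifies_state,
        "idempotentHint": not audit.repeat_adds_effects,
    }


# Placeholder findings mirroring this plan's rationale; replace with real audit results.
AUDIT = {
    "index_repository": ToolAudit(True, True, True),
    "search_code": ToolAudit(True, False, False),
    "set_working_directory": ToolAudit(True, True, False),
}

PLANNED = {
    "index_repository": {"openWorldHint": True, "readOnlyHint": False, "idempotentHint": False},
    "search_code": {"openWorldHint": True, "readOnlyHint": True, "idempotentHint": True},
    "set_working_directory": {"openWorldHint": True, "readOnlyHint": False, "idempotentHint": True},
}


def mismatches() -> list[str]:
    """One message per hint where the planned value disagrees with the audit."""
    problems: list[str] = []
    for tool, audit in AUDIT.items():
        for hint, derived in derive_hints(audit).items():
            planned = PLANNED[tool][hint]
            if planned != derived:
                problems.append(f"{tool}.{hint}: planned {planned}, audit implies {derived}")
    return problems


print(mismatches())  # [] when the plan and the audit agree
```

Encoding the audit as data keeps Decision 1's conservatism explicit: when a finding is uncertain, record it pessimistically (for example `repeat_adds_effects=True`) and the derived hint falls back to the cautious value.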
## Technical Design

### Hint Definitions for Each Tool

#### 1. index_repository

**Current Decorator**:

```python
@mcp.tool()
async def index_repository(
    repo_path: str,
    project_id: str | None = None,
    force_reindex: bool = False,
    ctx: Context | None = None,
) -> dict[str, Any]:
    ...  # existing implementation
```

**Updated Decorator with Hints**:

```python
from mcp.types import ToolAnnotations


@mcp.tool(
    annotations=ToolAnnotations(
        # Interacts with the filesystem (reads files) and the database (writes chunks/embeddings)
        openWorldHint=True,
        # Modifies database state by creating/updating repository and chunk records
        readOnlyHint=False,
        # NOT idempotent: repeated calls can have additional effects depending on existing
        # state and force_reindex (first call creates records; later calls may update or skip)
        idempotentHint=False,
    )
)
async def index_repository(
    repo_path: str,
    project_id: str | None = None,
    force_reindex: bool = False,
    ctx: Context | None = None,
) -> dict[str, Any]:
    ...  # existing implementation
```

**Rationale**:

- **openWorldHint=True**: Reads from the filesystem (via `Path(repo_path)`) and writes to PostgreSQL
- **readOnlyHint=False**: Creates Repository, CodeFile, and CodeChunk records in the database
- **idempotentHint=False**: The effect of a call depends on existing state:
  - First call: creates all records
  - Subsequent call without `force_reindex`: may skip already-indexed files
  - Subsequent call with `force_reindex=True`: re-indexes everything
  - Repeated calls with the same arguments are therefore not guaranteed to have no additional effect

**Expected LLM Behavior Change**:

- LLMs will recognize this as a "heavy" operation with side effects
- Won't retry on failure without explicit user confirmation
- Won't run in parallel with other database-modifying operations
- Will check if indexing already occurred before calling again
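
A toy model shows why `idempotentHint=False` is the safe annotation for this tool. `ToyIndex` is a hypothetical in-memory stand-in (the real tool writes Repository, CodeFile, and CodeChunk rows to PostgreSQL), and it assumes that a forced re-index replaces each chunk record with a new one.

```python
import itertools


class ToyIndex:
    """Hypothetical in-memory stand-in for the chunk records index_repository writes."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.chunks: dict[str, int] = {}  # file path -> chunk record ID

    def index_repository(self, files: list[str], force_reindex: bool = False) -> int:
        """Index files and return how many chunk records this call wrote."""
        written = 0
        for path in files:
            if path in self.chunks and not force_reindex:
                continue  # already indexed: skip
            # Assumption: a forced re-index replaces the old record with a new one.
            self.chunks[path] = next(self._next_id)
            written += 1
        return written


index = ToyIndex()
files = ["auth.py", "db.py"]
print(index.index_repository(files))  # 2: first call creates records
print(index.index_repository(files))  # 0: repeat without force_reindex writes nothing
before = dict(index.chunks)
print(index.index_repository(files, force_reindex=True))  # 2: records rewritten
print(index.chunks != before)  # True: new record IDs, an additional effect
```

A repeat call without `force_reindex` happens to be a no-op here, but repeated calls with `force_reindex=True` keep changing state. One annotation has to hold for every argument combination, so `False` is the only correct value.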
#### 2. search_code

**Current Decorator**:

```python
@mcp.tool()
async def search_code(
    query: str,
    project_id: str | None = None,
    repository_id: str | None = None,
    file_type: str | None = None,
    directory: str | None = None,
    limit: int = 10,
    ctx: Context | None = None,
) -> dict[str, Any]:
    ...  # existing implementation
```

**Updated Decorator with Hints**:

```python
from mcp.types import ToolAnnotations


@mcp.tool(
    annotations=ToolAnnotations(
        # Interacts with the database (reads chunks via pgvector) and Ollama (generates embeddings)
        openWorldHint=True,
        # Read-only: only queries the database, never modifies state
        readOnlyHint=True,
        # Idempotent: repeated calls have no additional effect on state
        # (results are repeatable given the same database state and embedding model)
        idempotentHint=True,
    )
)
async def search_code(
    query: str,
    project_id: str | None = None,
    repository_id: str | None = None,
    file_type: str | None = None,
    directory: str | None = None,
    limit: int = 10,
    ctx: Context | None = None,
) -> dict[str, Any]:
    ...  # existing implementation
```

**Rationale**:

- **openWorldHint=True**: Queries the PostgreSQL database and the Ollama embedding service
- **readOnlyHint=True**: Only performs SELECT queries, no INSERT/UPDATE/DELETE
- **idempotentHint=True**: Repeated calls with the same parameters leave state unchanged. The MCP specification treats `idempotentHint` as meaningful only when `readOnlyHint` is `false`, so for this read-only tool the hint is informational. Results are also repeatable as long as the index and embedding model do not change:
  - Embedding generation is expected to be deterministic for the same query text and model
  - pgvector similarity search is expected to be deterministic for the same embedding and indexed data
  - Filters (`repository_id`, `file_type`, `directory`) are deterministic

**Expected LLM Behavior Change**:

- LLMs will recognize this as safe to retry on transient failures
- Multiple searches can run in parallel (read-only, no conflicts)
- Safe to call speculatively to explore results
- Results can be cached for the same query parameters until the repository is re-indexed
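
The retry and parallel behaviors listed above can be sketched as client-side policy. This is hypothetical: MCP hosts implement their own logic, and the server cannot force it. The sketch auto-retries only tools whose hints mark them read-only or idempotent, treats `ConnectionError` as the (assumed) transient failure, and gathers calls concurrently only for read-only tools. The `HINTS` table copies this plan's values; a real client would read them from `tools/list`.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hint values from this plan, as a client would read them from tools/list.
HINTS: dict[str, dict[str, bool]] = {
    "search_code": {"openWorldHint": True, "readOnlyHint": True, "idempotentHint": True},
    "index_repository": {"openWorldHint": True, "readOnlyHint": False, "idempotentHint": False},
}

ToolCall = Callable[[], Awaitable[Any]]


def safe_to_auto_retry(tool: str) -> bool:
    """Retry without asking only if a call has no effect, or no additional effect."""
    hints = HINTS.get(tool, {})  # unknown tool: spec defaults apply, so not safe
    return hints.get("readOnlyHint", False) or hints.get("idempotentHint", False)


async def call_with_retry(tool: str, call: ToolCall, attempts: int = 3) -> Any:
    """Retry transient failures (assumed to raise ConnectionError) for safe tools only."""
    max_attempts = attempts if safe_to_auto_retry(tool) else 1
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts, or the tool was never safe to retry
            await asyncio.sleep(0.01 * attempt)  # short backoff before the next try


async def run_concurrently(tool: str, calls: list[ToolCall]) -> list[Any]:
    """Gather calls concurrently, but only for read-only tools (no write conflicts)."""
    if not HINTS.get(tool, {}).get("readOnlyHint", False):
        raise ValueError(f"{tool} is not read-only; run its calls one at a time")
    return list(await asyncio.gather(*(call_with_retry(tool, c) for c in calls)))


failures = {"left": 1}


async def flaky_python_search() -> list[str]:
    """Stand-in search that fails once with a transient error, then succeeds."""
    if failures["left"]:
        failures["left"] -= 1
        raise ConnectionError("embedding service unavailable")
    return ["auth.py:42"]


async def js_search() -> list[str]:
    """Stand-in search that always succeeds."""
    return ["login.js:7"]


print(asyncio.run(run_concurrently("search_code", [flaky_python_search, js_search])))
# [['auth.py:42'], ['login.js:7']]
```

The flaky search fails once, is retried automatically because `search_code` is read-only, and both results come back in call order. `index_repository` calls get exactly one attempt and are rejected by `run_concurrently`, matching the expected behavior described for that tool.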
#### 3. set_working_directory

**Current Decorator**:

```python
@mcp.tool()
async def set_working_directory(
    directory: str,
    ctx: Context | None = None,
) -> dict[str, Any]:
    ...  # existing implementation
```

**Updated Decorator with Hints**:

```python
from mcp.types import ToolAnnotations


@mcp.tool(
    annotations=ToolAnnotations(
        # Modifies session state in SessionContextManager (stores the working directory)
        openWorldHint=True,
        # Modifies session state, so not a read-only operation
        readOnlyHint=False,
        # Idempotent: setting the same directory multiple times has the same effect as once
        # (last-write-wins semantics, no accumulation of state)
        idempotentHint=True,
    )
)
async def set_working_directory(
    directory: str,
    ctx: Context | None = None,
) -> dict[str, Any]:
    ...  # existing implementation
```

**Rationale**:

- **openWorldHint=True**: Modifies session context state in SessionContextManager. Because the specification frames this hint around external entities and this state lives inside the server process, `openWorldHint=False` is a defensible alternative; the Step 1 audit should settle it.
- **readOnlyHint=False**: Updates session state (the `working_directory` field)
- **idempotentHint=True**: Calling with the same directory multiple times results in the same final state:
  - Last write wins
  - No accumulation or side effects
  - Deterministic outcome regardless of call count

**Expected LLM Behavior Change**:

- LLMs will recognize this as a state-modifying operation
- Safe to retry after a transient failure, and repeated calls need no special handling (idempotent)
- Will recognize this as a "setup" operation to call early in the workflow

### FastMCP Hint Syntax

FastMCP takes hints through the `annotations` parameter of the `@mcp.tool()` decorator, using the `ToolAnnotations` model from the MCP Python SDK (`mcp.types`). The hint names are fields of that model, not keyword arguments of `@mcp.tool()` itself:

```python
from typing import Any

from fastmcp import Context, FastMCP
from mcp.types import ToolAnnotations

mcp = FastMCP("codebase-mcp")


@mcp.tool(
    annotations=ToolAnnotations(
        openWorldHint=True,
        readOnlyHint=False,
        idempotentHint=False,
    )
)
async def my_tool(query: str, ctx: Context | None = None) -> dict[str, Any]:
    return {"query": query}
```

The minimum FastMCP version that supports `annotations` should be confirmed against the FastMCP changelog and pinned in the project's dependencies. Every decorator in this plan passes a `ToolAnnotations` instance.
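
Rather than hard-coding a version number, the server can feature-detect annotation support at startup by checking whether `tool()` declares an `annotations` parameter. The sketch below does this with `inspect.signature`; `OldServer` and `NewServer` are hypothetical stand-ins for older and newer FastMCP releases, and the approach assumes FastMCP declares `annotations` as an explicit parameter (worth confirming against the installed version).

```python
from __future__ import annotations

import inspect
from collections.abc import Callable
from typing import Any


class OldServer:
    """Hypothetical stand-in for a FastMCP release without tool annotations."""

    def tool(self, name: str | None = None) -> Callable[[Any], Any]:
        return lambda fn: fn


class NewServer:
    """Hypothetical stand-in for a FastMCP release that accepts annotations."""

    def tool(self, name: str | None = None, annotations: Any = None) -> Callable[[Any], Any]:
        return lambda fn: fn


def supports_tool_annotations(server: Any) -> bool:
    """Return True if server.tool() declares an `annotations` parameter."""
    tool = getattr(server, "tool", None)
    if tool is None:
        return False  # not a server object at all
    try:
        params = inspect.signature(tool).parameters
    except (TypeError, ValueError):  # signature cannot be introspected
        return False
    return "annotations" in params


for server in (OldServer(), NewServer()):
    print(type(server).__name__, supports_tool_annotations(server))
```

In the server this would be `supports_tool_annotations(mcp)`, with a warning or a hard failure when it returns `False`. A `tool()` that only takes `**kwargs` is reported as unsupported, which errs toward registering without hints.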
### Expected LLM Behavior Improvements

#### Before Hints

```
LLM: "User wants to search for authentication code. Let me call search_code."
[search_code fails with connection error]
LLM: "Hmm, not sure if I should retry. This might modify state. I'll ask the user."
User: "Just retry it"
LLM: [retries successfully]
```

#### After Hints

```
LLM: "User wants to search for authentication code. Let me call search_code."
[search_code fails with connection error]
LLM: "This is a read-only, idempotent operation. Safe to retry automatically."
LLM: [retries successfully without asking]
User: "Found it, thanks!"
```

#### Parallel Execution Before Hints

```
LLM: "User wants to search in Python and JavaScript files. Not sure if I can run these in parallel."
LLM: [runs searches sequentially]
[Takes 1000ms total]
```

#### Parallel Execution After Hints

```
LLM: "User wants to search in Python and JavaScript files. Both are read-only and idempotent."
LLM: [runs searches in parallel]
[Takes 500ms total]
```

### Testing Approach for Hint Effectiveness

Create `tests/integration/test_tool_hints.py`:

```python
"""Integration tests validating that MCP tool hints match actual tool behavior.

Each test checks one hint claim against what the tool actually does.
"""

import pytest

# Assumes pytest-asyncio is installed; marks every async test in this module.
pytestmark = pytest.mark.asyncio


class TestToolHintsBehavior:
    """Test that tool hints correctly describe actual behavior."""

    async def test_index_repository_openworld_hint(self) -> None:
        """Verify index_repository actually interacts with filesystem and database."""
        # Verify filesystem interaction (reads files)
        # Verify database interaction (creates records)
        pass

    async def test_index_repository_not_readonly(self) -> None:
        """Verify index_repository modifies database state."""
        # Call index_repository
        # Verify new records exist in database
        pass

    async def test_index_repository_not_idempotent(self) -> None:
        """Verify index_repository behavior changes on repeated calls."""
        # First call: Creates records
        # Second call without force_reindex: May skip files
        # Second call with force_reindex: Re-indexes
        pass

    async def test_search_code_openworld_hint(self) -> None:
        """Verify search_code interacts with database and Ollama."""
        # Verify database query executed
        # Verify Ollama embedding call made
        pass

    async def test_search_code_readonly(self) -> None:
        """Verify search_code doesn't modify database state."""
        # Take database snapshot
        # Call search_code
        # Verify database unchanged
        pass

    async def test_search_code_idempotent(self) -> None:
        """Verify search_code returns same results on repeated calls."""
        # Call search_code with same parameters
        # Verify results identical
        pass

    async def test_set_working_directory_openworld_hint(self) -> None:
        """Verify set_working_directory modifies session state."""
        # Verify session context updated
        pass

    async def test_set_working_directory_not_readonly(self) -> None:
        """Verify set_working_directory modifies state."""
        # Get initial session state
        # Call set_working_directory
        # Verify state changed
        pass

    async def test_set_working_directory_idempotent(self) -> None:
        """Verify set_working_directory can be called multiple times safely."""
        # Call set_working_directory 3 times with same directory
        # Verify final state same as after first call
        pass
```
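### Error Handling

**Scenario 1: FastMCP Doesn't Support Hints**

```python
# Fallback for FastMCP versions whose @mcp.tool() has no `annotations` parameter
import logging
from typing import Any

from fastmcp import FastMCP
from mcp.types import ToolAnnotations

logger = logging.getLogger(__name__)
mcp = FastMCP("codebase-mcp")


async def search_code(query: str) -> dict[str, Any]:
    ...  # existing implementation


try:
    mcp.tool(
        annotations=ToolAnnotations(openWorldHint=True, readOnlyHint=True, idempotentHint=True)
    )(search_code)
except TypeError:
    # This FastMCP version rejects `annotations`: register the tool without hints
    logger.warning("FastMCP version doesn't support tool annotations; upgrade FastMCP")
    mcp.tool()(search_code)
```

**Scenario 2: Incorrect Hints**

```python
# (imports and `mcp` as in Scenario 1)
# If behavior changes, update hints immediately.
# Example: if search_code starts persisting query embeddings to a database cache,
# it is no longer read-only, although repeating a query adds no further effect.
@mcp.tool(
    annotations=ToolAnnotations(
        openWorldHint=True,
        readOnlyHint=False,  # CHANGED: now writes cache entries
        idempotentHint=True,  # Repeating a query reuses the existing cache entry
    )
)
async def search_code(query: str) -> dict[str, Any]:
    ...  # existing implementation
```

## Implementation Steps

### Step 1: Audit Tool Implementations

- Review `index_repository` implementation in `src/services/indexer.py`
- Review `search_code` implementation in `src/services/searcher.py`
- Review `set_working_directory` implementation in `src/mcp/tools/project.py`
- Document actual behavior for each tool (filesystem, database, state changes)
- **Dependencies**: None
- **Testing**: Manual code review

### Step 2: Add Hints to index_repository

- Add `annotations=ToolAnnotations(...)` to the `@mcp.tool()` decorator in `src/mcp/tools/indexing.py`
- Add inline comments explaining each hint
- Update docstring to mention hint behavior
- **Dependencies**: Step 1
- **Testing**: `mypy --strict` validation, server starts successfully

### Step 3: Add Hints to search_code

- Add `annotations=ToolAnnotations(...)` to the `@mcp.tool()` decorator in `src/mcp/tools/search.py`
- Add inline comments explaining each hint
- Update docstring to mention hint behavior
- **Dependencies**: Step 1
- **Testing**: `mypy --strict` validation, server starts successfully

### Step 4: Add Hints to set_working_directory

- Add `annotations=ToolAnnotations(...)` to the `@mcp.tool()` decorator in `src/mcp/tools/project.py`
- Add inline comments explaining each hint
- Update docstring to mention hint behavior
- **Dependencies**: Step 1
- **Testing**: `mypy --strict` validation, server starts successfully

### Step 5: Create Hint Validation Tests

- Create `tests/integration/test_tool_hints.py`
- Implement tests validating that each hint matches actual behavior:
  - Test openWorldHint (filesystem/database interaction)
  - Test readOnlyHint (no state modification)
  - Test idempotentHint (repeated calls have no additional effect)
- **Dependencies**: Steps 2-4
- **Testing**: 100% pass rate on new tests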
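
Step 5 can start before the real fixtures exist. The sketch below prototypes two of the skeleton tests (`test_search_code_readonly` and `test_set_working_directory_idempotent`) with in-memory fakes, so the shape of each assertion is clear. `FakeChunkStore`, `FakeSessionManager`, and the simplified `search_code` and `set_working_directory` functions are hypothetical stand-ins, not the server's real code; the real tests would call the registered tools against PostgreSQL and `SessionContextManager`.

```python
import asyncio
import copy


class FakeChunkStore:
    """Hypothetical stand-in for the chunk table that search_code reads."""

    def __init__(self) -> None:
        self.rows = {1: "def login(user): ...", 2: "def connect(dsn): ..."}


class FakeSessionManager:
    """Hypothetical stand-in for SessionContextManager (last write wins)."""

    def __init__(self) -> None:
        self.working_directory: str | None = None


async def search_code(store: FakeChunkStore, query: str) -> list[int]:
    """Simplified search: IDs of chunks whose text contains the query."""
    return [cid for cid, text in store.rows.items() if query in text]


async def set_working_directory(sessions: FakeSessionManager, directory: str) -> None:
    """Simplified setter: overwrite the session's working directory."""
    sessions.working_directory = directory


def test_search_code_readonly() -> None:
    store = FakeChunkStore()
    snapshot = copy.deepcopy(store.rows)  # take a "database" snapshot
    asyncio.run(search_code(store, "login"))
    assert store.rows == snapshot  # state unchanged by the search


def test_set_working_directory_idempotent() -> None:
    once, thrice = FakeSessionManager(), FakeSessionManager()
    asyncio.run(set_working_directory(once, "/repo"))
    for _ in range(3):
        asyncio.run(set_working_directory(thrice, "/repo"))
    assert vars(thrice) == vars(once)  # three calls leave the same state as one


test_search_code_readonly()
test_set_working_directory_idempotent()
```

The two explicit calls at the end only make the sketch runnable as a script; under pytest both functions would be collected as ordinary tests.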
### Step 6: Update Documentation

- Update tool docstrings with hint explanations
- Add a "Tool Hints" section to README.md
- Document expected LLM behavior improvements
- Add a troubleshooting guide for hint-related issues
- **Dependencies**: Step 5
- **Testing**: Manual documentation review

## Success Criteria

### Measurable Outcomes

1. **Coverage**: All 3 tools have appropriate hints applied
2. **Accuracy**: All hints match actual tool behavior (validated by tests)
3. **Documentation**: Each hint has an inline comment explaining its rationale
4. **Testing**: Integration tests validate each hint claim
5. **Type Safety**: `mypy --strict` passes with no errors

### How to Validate Completion

1. Run `grep -rn "annotations=ToolAnnotations(" src/mcp/tools/` - verify one match per tool decorator
2. Run integration tests - verify all hint validation tests pass
3. Run `mypy --strict` - verify no type errors
4. Start the server - verify each tool's hints appear in the `annotations` field of the `tools/list` response
5. Review documentation - verify hints documented in README.md
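
Validation step 4 can be automated at the protocol level. The check below takes tool entries shaped like a `tools/list` result and reports, per tool, any of the three hints left unset, plus any expected tool that is not registered. Explicit values matter because a client fills omitted hints with the specification's defaults (`openWorldHint` true, `readOnlyHint` false, `idempotentHint` false). The entries are hand-written examples, not captured server output; with the MCP Python SDK, the tools returned by a client session's `list_tools()` could be converted to this shape with `model_dump()`, an assumption worth verifying against the installed SDK version.

```python
from typing import Any

REQUIRED_HINTS = ("openWorldHint", "readOnlyHint", "idempotentHint")
EXPECTED_TOOLS = {"index_repository", "search_code", "set_working_directory"}


def missing_hints(tools: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each tool name to the required hints it leaves unset (or mark it unregistered)."""
    report: dict[str, list[str]] = {}
    for tool in tools:
        declared = tool.get("annotations") or {}
        # `is None` covers both an absent key and an explicit null.
        missing = [hint for hint in REQUIRED_HINTS if declared.get(hint) is None]
        if missing:
            report[tool["name"]] = missing
    for name in sorted(EXPECTED_TOOLS - {tool["name"] for tool in tools}):
        report[name] = ["<tool not registered>"]
    return report


# Hand-written entries shaped like a tools/list result.
listed = [
    {
        "name": "index_repository",
        "annotations": {"openWorldHint": True, "readOnlyHint": False, "idempotentHint": False},
    },
    {"name": "search_code", "annotations": {"openWorldHint": True, "readOnlyHint": True}},
]
print(missing_hints(listed))
# {'search_code': ['idempotentHint'], 'set_working_directory': ['<tool not registered>']}
```

An empty report means every expected tool is registered and declares all three hints explicitly.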
### Quality Gates

- All 3 tools have `openWorldHint`, `readOnlyHint`, and `idempotentHint` defined
- Each hint has an inline comment explaining its rationale
- Integration tests validate hint accuracy
- `mypy --strict` compliance maintained
- Server starts successfully with hints applied

## Risks & Mitigations

### Risk 1: FastMCP Version Incompatibility

**Potential Issue**: Older FastMCP versions may not support the `annotations` parameter

**Mitigation**:

- Pin the minimum FastMCP version that supports `annotations` in requirements.txt
- Add a version check at server startup
- Provide a fallback registration without hints for older versions

### Risk 2: Incorrect Hints Misleading LLMs

**Potential Issue**: Wrong hints could cause LLMs to make bad decisions

**Mitigation**:

- Thorough code audit before adding hints
- Integration tests validating hint accuracy
- Monitor LLM behavior after deployment
- Quick rollback plan if hints cause issues

### Risk 3: Hints Becoming Stale

**Potential Issue**: Code changes may invalidate hints over time

**Mitigation**:

- Add hint validation to the CI pipeline (integration tests)
- Add hints to the code review checklist
- Quarterly audit of all tool hints

### Risk 4: LLMs Ignoring Hints

**Potential Issue**: Some LLMs may not use hints effectively

**Mitigation**:

- Hints are additive: no harm if ignored
- Document expected behavior improvements
- Monitor metrics to validate effectiveness

## Alternative Approaches Considered

### Approach 1: No Hints (Status Quo)

**Considered**: Leave tools without hints

**Why Rejected**: Misses optimization opportunities; LLMs must guess behavior

### Approach 2: Infer Hints from Code Analysis

**Considered**: Automatically detect hints via static analysis

**Why Rejected**: Too complex and error-prone; manual annotation is more reliable

### Approach 3: Document in Docstrings Only

**Considered**: Add hint information to docstrings instead of the decorator

**Why Rejected**: LLMs may miss docstring details; formal hints are more reliable
### Approach 4: Add Hints to Resources Too

**Considered**: Also add hints to `health://` and `metrics://` resources

**Why Rejected**: MCP hints are primarily designed for tools; defer for now

### Approach 5: Runtime Hint Validation

**Considered**: Validate hints match behavior at runtime

**Why Rejected**: Complex and adds overhead; integration tests are sufficient

## Constitutional Compliance Checklist

- ✅ **Principle I (Simplicity)**: Hints add minimal complexity, just decorator parameters
- ✅ **Principle II (Local-First)**: Hints don't affect offline operation
- ✅ **Principle III (Protocol Compliance)**: Tool annotations are a standard MCP feature
- ✅ **Principle IV (Performance)**: Hints add no per-call runtime overhead (static metadata in the tool listing)
- ✅ **Principle V (Production Quality)**: Comprehensive testing validates hint accuracy
- ✅ **Principle VI (Specification-First)**: This plan was created before implementation
- ✅ **Principle VII (TDD)**: Integration tests written to validate hints
- ✅ **Principle VIII (Type Safety)**: Hints fully typed, `mypy --strict` compliant
- ✅ **Principle IX (Orchestration)**: N/A - single-developer task
- ✅ **Principle X (Git Micro-Commits)**: Implementation follows the micro-commit strategy
- ✅ **Principle XI (FastMCP)**: Uses FastMCP's `annotations` parameter as designed

## Next Steps After Completion

1. **Metrics Collection**: Track LLM retry behavior before/after hints
2. **Prompt Optimization**: Use hints to generate better tool selection prompts
3. **Resource Hints**: Add hints to health/metrics resources if beneficial
4. **Documentation**: Create a "Best Practices for Tool Hints" guide
5. **Monitoring**: Track hint-related LLM behavior in production analytics
