Skip to main content
Glama
CLAUDE.md9.1 kB
# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Project Overview HexDocs MCP is a project that provides semantic search capabilities for Hex package documentation, designed specifically for AI applications. It downloads, processes, and generates embeddings from Hex package documentation and provides a Model Context Protocol (MCP) server for searching. ## Architecture ### Core Components 1. **Application Entry Points** - `HexdocsMcp.Application` - OTP application for development/test environments - `HexdocsMcp.CLI` - Main CLI entry point for both Mix task and standalone executable - `Mix.Tasks.Hex.Docs.Mcp` - Mix task wrapper 2. **Core Modules** - `HexdocsMcp.Embeddings` - Generates and searches vector embeddings - `HexdocsMcp.Docs` - Fetches documentation from Hex - `HexdocsMcp.Markdown` - Converts HTML to Markdown - `HexdocsMcp.Repo` - Ecto repository for SQLite database - `HexdocsMcp.Config` - Centralized configuration and dependency injection - `HexdocsMcp.Ollama` - Wrapper for Ollama AI integration - `HexdocsMcp.MixDeps` - Parses mix.exs dependencies - `HexdocsMcp.Version` - Semantic version comparison and filtering 3. **CLI Modules** - `HexdocsMcp.CLI.FetchDocs` - Implements fetch_docs command - `HexdocsMcp.CLI.SemanticSearch` - Implements semantic_search command - `HexdocsMcp.CLI.HexSearch` - Implements hex_search command - `HexdocsMcp.CLI.FulltextSearch` - Implements fulltext_search command - `HexdocsMcp.CLI.CheckEmbeddings` - Implements check_embeddings command - `HexdocsMcp.CLI.Progress` - Rich progress indicators - `HexdocsMcp.CLI.Utils` - Shared CLI utilities ### Data Flow 1. **Document Fetching**: `mix hex.docs fetch_docs` → HTML files 2. **Conversion**: HTML → Markdown using Floki 3. **Chunking**: Markdown → Semantic chunks (2000 chars, 200 overlap) 4. **Embedding**: Chunks → 384-dimensional vectors via Ollama 5. **Storage**: Embeddings → SQLite with vector extension 6. **Search**: Query → Embedding → Vector similarity search → Version filtering (latest by default) ### Database Schema **embeddings table**: - `id`: Primary key - `package`, `version`: Package identification - `source_file`: Original HTML file path - `text`: Full chunk text - `text_snippet`: First 100 chars preview - `content_hash`: SHA-256 for deduplication - `url`: Direct documentation link - `embedding`: 384-dimensional float vector - `timestamps`: inserted_at, updated_at Indexes on: `(package, version)`, `(package, version, content_hash)` ### External Dependencies **Core Dependencies**: - `text_chunker` - Semantic text chunking - `ollama` - AI embedding generation - `ecto_sqlite3` + `sqlite_vec` - Vector database - `floki` - HTML parsing - `burrito` - Standalone executable packaging **MCP Server** (Node.js): - `@modelcontextprotocol/sdk` - MCP protocol implementation - Manages Elixir binary lifecycle - Exposes fetch/search tools to AI assistants ## Build/Run/Test Commands ### Development - Mix compile: `mix compile` - Setup dependencies: `mix setup` - Format code: `mix format` - Run linter: `mix credo` - Watch tests: `mix test.watch` ### CLI Commands **Fetch Docs Command**: ```bash mix hex.docs.mcp fetch_docs PACKAGE [VERSION] [options] --model MODEL # Ollama model (default: nomic-embed-text) --force # Force re-fetch even if embeddings exist --project PATH # Fetch all deps from mix.exs --help, -h # Show help # Examples mix hex.docs.mcp fetch_docs phoenix mix hex.docs.mcp fetch_docs phoenix 1.7.0 --model all-minilm mix hex.docs.mcp fetch_docs --project mix.exs --force ``` **Semantic Search Command**: ```bash mix hex.docs.mcp semantic_search [PACKAGE] [options] --query QUERY # Search query (required) --model MODEL # Ollama model (default: nomic-embed-text) --limit LIMIT # Max results (default: 3) --version VERSION # Search only in specific version --all-versions # Include results from all indexed versions (default: latest only) --help, -h # Show help # Examples mix hex.docs.mcp semantic_search --query "how to create channels" # Search latest versions mix hex.docs.mcp semantic_search phoenix --query "configuration options" --limit 10 mix hex.docs.mcp semantic_search phoenix --query "channels" --version 1.7.0 mix hex.docs.mcp semantic_search phoenix --query "channels" --all-versions ``` **Hex Search Command**: ```bash mix hex.docs.mcp hex_search [PACKAGE] [VERSION] [options] --query QUERY # Search query (required) --sort SORT # Sort results by: downloads, recent, or name --limit LIMIT # Max results (default: 10) --help, -h # Show help # Examples mix hex.docs.mcp hex_search --query "json parser" --limit 5 mix hex.docs.mcp hex_search phoenix --query "1.7" # Search phoenix versions mix hex.docs.mcp hex_search phoenix 1.7.0 --query "info" # Get specific version info ``` **Full-text Search Command**: ```bash mix hex.docs.mcp fulltext_search [PACKAGE] [VERSION] [options] --query QUERY # Search query using Typesense syntax (required) --limit LIMIT # Max results (default: 10, max: 100) --help, -h # Show help # Examples mix hex.docs.mcp fulltext_search --query "GenServer.handle_call" mix hex.docs.mcp fulltext_search phoenix --query "router" --limit 5 mix hex.docs.mcp fulltext_search ecto 3.10.0 --query "changeset" ``` **Check Embeddings Command**: ```bash mix hex.docs.mcp check_embeddings PACKAGE [VERSION] --help, -h # Show help # Examples mix hex.docs.mcp check_embeddings phoenix # Check latest version mix hex.docs.mcp check_embeddings phoenix 1.7.0 # Check specific version ``` ### Testing - Run all tests: `mix test` - Run single test: `mix test test/hexdocs_mcp_test.exs:LINE_NUMBER` - Run specific module: `mix test test/hexdocs_mcp/embeddings_test.exs` ### Building - Build docs: `mix docs` - Build hex package: `mix hex.build` - Build standalone: `mix release` ## Code Style Guidelines - **Formatting**: Follow Elixir formatter rules defined in .formatter.exs - **Naming**: Use snake_case for functions and variables, PascalCase for modules - **Imports**: Group and alphabetize imports; prefer alias for external modules - **Documentation**: Use @moduledoc and @doc with examples in doctest format - **Error Handling**: Use with/else pattern for chained operations that may fail - **Function Organization**: Private functions follow public functions they support - **Logging**: Use Logger with appropriate levels (debug, info, error) - **Module Structure**: Follow conventional mix project structure - **Types**: Prefer not to document function types using @spec - **Comments**: Avoid code comments unless absolutely necessary - **Cross-Platform Compatibility**: Ensure all changes are compatible with mac, windows and linux ## Testing Patterns ### Behavior-based Architecture All external dependencies are wrapped in behaviors for testability: - `HexdocsMcp.Behaviours.Ollama` - AI service interface - `HexdocsMcp.Behaviours.Docs` - Documentation fetching - `HexdocsMcp.Behaviours.Embeddings` - Embedding operations ### Mocking Strategy - Uses Mox for behavior-based mocks - Mocks configured in test_helper.exs - Custom mock implementations for complex scenarios - Process message testing for internal verification ### Database Testing - Uses SQL Sandbox for test isolation - DataCase module for database test setup - Automatic cleanup between tests - SQLite vector extension loaded in tests ### CLI Testing - IO capture for output verification - Mock injection via application config - Tests cover success paths, errors, and help output ## Configuration ### Environment Variables - `HEXDOCS_MCP_DATA_PATH` - Data storage location (default: ~/.hexdocs_mcp) - `HEXDOCS_MCP_DEFAULT_EMBEDDING_MODEL` - Default Ollama model - `HEXDOCS_MCP_MIX_PROJECT_PATHS` - Comma-separated mix.exs paths ### Application Config Configuration is centralized in `HexdocsMcp.Config` for: - Module dependency injection - Path configuration - Default values - Test vs production behavior ## MCP Integration The project includes a Node.js MCP server that: 1. Downloads and manages the Elixir binary 2. Exposes MCP tools: - `fetch_docs` - Download and process documentation with embeddings - `semantic_search` - Search using semantic embeddings - `hex_search` - Search packages on Hex.pm - `fulltext_search` - Full-text search on HexDocs - `check_embeddings` - Verify if embeddings exist for a package 3. Handles binary updates automatically 4. Communicates via stdio transport Deploy with: `node dist/index.js` (after `npm run build`) ## Performance Considerations - **Batch Processing**: Embeddings generated in batches of 10 - **Concurrency**: Max 4 concurrent Ollama requests - **Deduplication**: Content-based hashing prevents duplicate embeddings - **Incremental Updates**: Only new/changed content is processed - **Vector Search**: Efficient similarity search using SQLite vec extension

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bradleygolden/hexdocs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server