MCP Vector Search
CLI-first semantic code search with MCP integration
⚠️ Production Release (v2.5.56): Stable and actively maintained. LanceDB is now the default backend for better performance and stability.
A modern, fast, and intelligent code search tool that understands your codebase through semantic analysis and AST parsing. Built with Python, powered by LanceDB, and designed for developer productivity.
✨ Features
Core Capabilities
Semantic Search: Find code by meaning, not just keywords
AST-Aware Parsing: Understands code structure (functions, classes, methods)
Multi-Language Support: 13 languages - Python, JavaScript, TypeScript, C#, Dart/Flutter, PHP, Ruby, Java, Go, Rust, HTML, and Markdown/Text (with extensible architecture)
Knowledge Graph: Temporal knowledge graph with KuzuDB for entity extraction and relationship mapping (kg build, kg status, kg query)
Interactive Visualization: D3.js-powered visualization with 5+ views (Treemap, Sunburst, Force Graph, Knowledge Graph, Heatmap)
Development Narratives: Generate git history narratives with the story command (markdown, JSON, HTML output)
Real-time Indexing: File watching with automatic index updates
Automatic Version Tracking: Smart reindexing on tool upgrades
Local-First: Complete privacy with on-device processing
Zero Configuration: Auto-detects project structure and languages
Developer Experience
CLI-First Design: Simple commands for immediate productivity
Rich Output: Syntax highlighting, similarity scores, context
Fast Performance: Sub-second search responses, efficient indexing with pipeline parallelism (37% faster); IVF-PQ vector index delivers 4.9x faster queries (3.4ms vs 16.7ms)
Modern Architecture: Async-first, type-safe, modular design
Semi-Automatic Reindexing: Multiple strategies without daemon processes
17 MCP Tools: Comprehensive MCP integration for AI assistants (search, analysis, documentation, KG, story generation)
Chat Mode: LLM-powered code Q&A with iterative refinement (up to 30 queries), deep search, and KG query tools
CodeT5+ Embeddings: Code-specific embeddings via the index-code command (Salesforce/codet5p-110m-embedding)
Technical Features
Vector Database: LanceDB (serverless, file-based)
Embedding Models: Configurable sentence transformers with GPU acceleration
Smart Reindexing: Search-triggered, Git hooks, scheduled tasks, and manual options
Extensible Parsers: Plugin architecture for new languages
Configuration Management: Project-specific settings
Production Ready: Write buffering, auto-indexing, comprehensive error handling
Performance: Apple Silicon M4 Max optimizations (2-4x speedup with MPS)
Quick Start
Installation
# Install from PyPI (recommended)
pip install mcp-vector-search
# Or with UV (faster)
uv pip install mcp-vector-search
# Or install from source
git clone https://github.com/bobmatnyc/mcp-vector-search.git
cd mcp-vector-search
uv sync && uv pip install -e .
Verify Installation:
# Check that all dependencies are installed correctly
mcp-vector-search doctor
# Should show all ✓ marks
# If you see missing dependencies, try:
pip install --upgrade mcp-vector-search
Zero-Config Setup (Recommended)
The fastest way to get started is a single, completely hands-off command:
# Smart zero-config setup (recommended)
mcp-vector-search setup
What setup does automatically:
✅ Detects your project's languages and file types
✅ Initializes semantic search with optimal settings
✅ Indexes your entire codebase
✅ Configures ALL installed MCP platforms (Claude Code, Cursor, etc.)
✅ Uses native Claude CLI integration (claude mcp add) when available
✅ Falls back to .mcp.json if the Claude CLI is not available
✅ Sets up file watching for auto-reindex
✅ Zero user input required!
Behind the scenes:
Server name: mcp (for consistency with other MCP projects)
Command: uv run python -m mcp_vector_search.mcp.server {PROJECT_ROOT}
File watching: Enabled via MCP_ENABLE_FILE_WATCHING=true
Integration method: Native claude mcp add (or .mcp.json fallback)
Example output:
Smart Setup for mcp-vector-search
Detecting project...
✓ Found 3 language(s): Python, JavaScript, TypeScript
✓ Detected 8 file type(s)
✓ Found 2 platform(s): claude-code, cursor
Configuring...
✓ Embedding model: sentence-transformers/all-MiniLM-L6-v2
Initializing...
✓ Vector database created
✓ Configuration saved
Indexing codebase...
✓ Indexing completed in 12.3s
Configuring MCP integrations...
✓ Using Claude CLI for automatic setup
✓ Registered with Claude CLI
✓ Configured 2 platform(s)
Setup Complete!
Options:
# Force re-setup
mcp-vector-search setup --force
# Verbose output for debugging (shows Claude CLI commands)
mcp-vector-search setup --verbose
Advanced Setup Options
For more control over the installation process:
# Manual setup with MCP integration
mcp-vector-search install --with-mcp
# Custom file extensions
mcp-vector-search install --extensions .py,.js,.ts,.dart
# Skip automatic indexing
mcp-vector-search install --no-auto-index
# Just initialize (no indexing or MCP)
mcp-vector-search init
Add MCP Integration for AI Tools
Automatic (Recommended):
# One command sets up all detected platforms
mcp-vector-search setup
Manual Platform Installation:
# Add Claude Code integration (project-scoped)
mcp-vector-search install claude-code
# Add Cursor IDE integration (global)
mcp-vector-search install cursor
# See all available platforms
mcp-vector-search install list
Note: The setup command uses native claude mcp add when the Claude CLI is available, providing better integration than manual .mcp.json creation.
Remove MCP Integrations
# Remove specific platform
mcp-vector-search uninstall claude-code
# Remove all integrations
mcp-vector-search uninstall --all
# List configured integrations
mcp-vector-search uninstall list
Basic Usage
# Search your code
mcp-vector-search search "authentication logic"
mcp-vector-search search "database connection setup"
mcp-vector-search search "error handling patterns"
# Index your codebase (if not done during setup)
mcp-vector-search index
# Index with code-specific embeddings (CodeT5+)
mcp-vector-search index-code
# Check project status
mcp-vector-search status
# Start file watching (auto-update index)
mcp-vector-search watch
# Interactive visualization (5+ views)
mcp-vector-search visualize
# Generate development narrative from git history
mcp-vector-search story
# Knowledge graph operations
mcp-vector-search kg build
mcp-vector-search kg status
mcp-vector-search kg query "find all Python functions"
# Chat mode with LLM
mcp-vector-search chat "explain the authentication flow"
# Code analysis
mcp-vector-search analyze complexity
mcp-vector-search analyze dead-code
Smart CLI with "Did You Mean" Suggestions
The CLI includes intelligent command suggestions for typos:
# Typos are detected and the closest matching command is suggested
$ mcp-vector-search serach "auth"
No such command 'serach'. Did you mean 'search'?
$ mcp-vector-search indx
No such command 'indx'. Did you mean 'index'?
See docs/guides/cli-usage.md for more details.
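The suggestion behaviour can be approximated with Python's standard-library difflib. This is an illustrative sketch, not the CLI's actual implementation: the command list is a hypothetical subset taken from this README, and the 0.6 cutoff is difflib's default rather than a documented mcp-vector-search setting.

```python
import difflib

# Hypothetical subset of top-level commands, taken from this README.
COMMANDS = ["setup", "install", "uninstall", "init", "index", "search", "status",
            "watch", "config", "visualize", "story", "kg", "chat", "analyze"]


def suggest(typo: str) -> str:
    """Build a 'Did you mean' message using difflib's similarity ratio."""
    matches = difflib.get_close_matches(typo, COMMANDS, n=1, cutoff=0.6)
    message = f"No such command '{typo}'."
    if matches:
        message += f" Did you mean '{matches[0]}'?"
    return message


print(suggest("serach"))  # No such command 'serach'. Did you mean 'search'?
print(suggest("indx"))    # No such command 'indx'. Did you mean 'index'?
```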
Versioning & Releasing
This project uses semantic versioning with an automated release workflow.
Quick Commands
make version-show - Display current version
make release-patch - Create patch release
make publish - Publish to PyPI
See docs/development/versioning.md for complete documentation.
AI Code Review
Context-aware code review that uses your entire codebase as context, not just diff analysis!
What Makes It Different
Traditional code review tools only see individual files or diffs. MCP Vector Search analyzes code with full codebase context by:
Semantic Search: Finding related patterns and similar implementations
Knowledge Graph: Understanding dependencies and callers
LLM Analysis: Deep analysis with language-specific standards
Smart Caching: 5x speedup with intelligent result caching
Quick Examples
# Security review of your codebase
mvs analyze review security
# Review a pull request with full context
mvs analyze review-pr --baseline main --head feature-branch
# Review only changed files (fast!)
mvs analyze review security --changed-only --baseline main
# Run multiple review types at once
mvs analyze review --types security,quality,architecture
Review Types
| Type | Focus | Key Checks |
|---|---|---|
| security | OWASP Top 10, CWE | SQL injection, XSS, auth flaws, hardcoded secrets |
| architecture | SOLID principles | Coupling, circular deps, god classes, SRP violations |
| performance | Efficiency | N+1 queries, O(n²) algorithms, blocking I/O |
| quality | Maintainability | Code smells, duplication, magic numbers, dead code |
| testing | Test coverage | Missing tests, edge cases, test quality |
| documentation | Code docs | Missing docstrings, TODOs, outdated comments |
PR Review with Context
The killer feature is reviewing PRs using the entire codebase as context:
# Review PR with context-aware analysis
mvs analyze review-pr --baseline main --format github-json
# For each changed file, finds:
# ✓ Similar patterns in codebase (consistency checking)
# ✓ Callers and dependencies (impact analysis)
# ✓ Existing tests (coverage gaps)
# ✓ Language-specific idioms (12 languages supported)
Context Strategy:
Changed File → Vector Search (similar patterns)
             → Knowledge Graph (callers, deps)
             → Test Discovery (coverage)
             → LLM Analysis (with full context)
             → Actionable Comments
Multi-Language Support
12 languages with language-specific idioms, anti-patterns, and security checks:
Python • TypeScript • JavaScript • Java • C# • Ruby • Go • Rust • PHP • Swift • Kotlin • Scala
Each language has tailored standards:
Python: PEP 8, type hints, context managers, SQL injection patterns
TypeScript: Strict mode, no any, XSS patterns
Java: SOLID principles, Optional over null, XXE patterns
Ruby: Guard clauses, blocks, RuboCop standards
Go: Error handling, goroutines, interfaces
Custom Instructions
Create .mcp-vector-search/review-instructions.yaml:
language_standards:
  python:
    - "Enforce type hints on all public functions"
    - "Use Pydantic for data validation"
scope_standards:
  src/auth:
    - "All auth functions must have audit logging"
custom_review_focus:
  security:
    - "Flag any hardcoded credentials"
Auto-Discovery
Automatically reads and applies standards from your existing config files:
Python: pyproject.toml, .flake8, mypy.ini, ruff.toml
TypeScript: tsconfig.json, .eslintrc.json
Ruby: .rubocop.yml
Java: checkstyle.xml, pom.xml
+8 more languages
CI/CD Integration
# .github/workflows/code-review.yml
- name: Review PR
  run: |
    mvs analyze review-pr \
      --baseline ${{ github.base_ref }} \
      --format sarif \
      --output review.sarif
- name: Upload to Security tab
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: review.sarif
Output Formats
console: Rich, colored output for humans
json: Machine-readable structured data
sarif: GitHub Security tab integration
markdown: Reports for documentation
github-json: PR comments (summary + inline)
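For orientation, the sketch below shows the general shape of a SARIF 2.1.0 log (runs, a tool driver, and results with physical locations), which is the format the GitHub Security tab ingests. The finding fields (rule_id, path, line, message) and the to_sarif helper are hypothetical; the real `--format sarif` output of mvs may carry more properties.

```python
from __future__ import annotations

import json


def to_sarif(findings: list[dict]) -> str:
    """Wrap review findings (hypothetical dict shape) in a minimal SARIF 2.1.0 log."""
    results = [
        {
            "ruleId": f["rule_id"],
            "level": f.get("level", "warning"),  # SARIF levels: none, note, warning, error
            "message": {"text": f["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["path"]},
                    "region": {"startLine": f["line"]},
                }
            }],
        }
        for f in findings
    ]
    log = {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "mcp-vector-search"}}, "results": results}],
    }
    return json.dumps(log, indent=2)


print(to_sarif([{"rule_id": "hardcoded-secret", "path": "src/auth.py",
                 "line": 12, "message": "Hardcoded API key"}]))
```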
Performance
Vector Search: <0.5s (find relevant code)
KG Queries: <0.2s (relationships)
LLM Analysis: 10-15s (deep analysis)
Cache Hit: 5x speedup on repeat reviews
Smart Caching: Unchanged code chunks return cached findings instantly.
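Caching of this kind is commonly keyed on a hash of the chunk content, so an edit invalidates exactly the chunks it touches. A minimal sketch, assuming findings are cached per (review type, SHA-256 of chunk text); ReviewCache and the analyze callback are hypothetical names, and the real cache may also key on the model, the active standards, or the retrieved context.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class ReviewCache:
    """Hypothetical cache of review findings keyed by review type and chunk hash."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], list[str]] = {}
        self.hits = 0
        self.misses = 0

    def review(self, review_type: str, chunk: str,
               analyze: Callable[[str], list[str]]) -> list[str]:
        key = (review_type, hashlib.sha256(chunk.encode("utf-8")).hexdigest())
        if key in self._store:
            self.hits += 1  # unchanged chunk: reuse earlier findings
            return self._store[key]
        self.misses += 1
        findings = analyze(chunk)  # stands in for the expensive LLM call
        self._store[key] = findings
        return findings
```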
Learn More
Complete Documentation: Architecture, examples, best practices
CI/CD Integration Guide: GitHub Actions, GitLab CI, pre-commit hooks
Multi-Language Support: 12 languages with standards
Privacy Policy Auditor
Audit codebases against their stated privacy policies using semantic code search and knowledge graph analysis.
Quick Start
# Install with auditor dependencies
pip install 'mcp-vector-search[auditor]'
# Run a privacy audit
mvs audit run --target /path/to/repo --policy /path/to/repo/PRIVACY.md
# Check for policy/code drift
mvs audit drift-check --target /path/to/repo --policy /path/to/repo/PRIVACY.md
# Verify a certification
mvs audit verify audits/<target>/latest/
# List audit history
mvs audit list
How It Works
Extract Claims: Parses privacy policies into testable assertions (hybrid text analysis + LLM)
Collect Evidence: Queries the codebase via vector search, hybrid search, and knowledge graph
Judge Verdicts: LLM evaluates each claim against evidence (PASS / FAIL / INSUFFICIENT / MANUAL_REVIEW)
Certify: Produces signed certification documents with per-claim verdicts and evidence
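The four verdicts map naturally onto a small data model. The sketch is hypothetical (Verdict, ClaimResult, and summarize are not names from the project) and assumes that every non-PASS verdict is something a human should look at, which is one plausible reading of the auto-issue feature.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    INSUFFICIENT = "INSUFFICIENT"
    MANUAL_REVIEW = "MANUAL_REVIEW"


@dataclass
class ClaimResult:
    claim: str             # testable assertion extracted from the policy
    verdict: Verdict
    evidence: list[str]    # code locations or snippets supporting the verdict


def summarize(results: list[ClaimResult]) -> tuple[Counter, list[str]]:
    """Count verdicts and list claims needing attention (assumed: anything not PASS)."""
    counts = Counter(r.verdict for r in results)
    needs_attention = [r.claim for r in results if r.verdict is not Verdict.PASS]
    return counts, needs_attention
```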
Features
Dual output: Certification saved in both the auditor repo and the target repo
Multiple LLM backends: OpenRouter and Anthropic (auto-detected)
GitHub Actions: On-demand audit workflow + daily drift detection
GPG signing: Optional cryptographic signatures for certification integrity
Auto-issue creation: GitHub issues for claims needing review
.audit-ignore.yml: Suppress specific claims with documented justifications
See docs/features/privacy-auditor.md for full documentation.
Documentation
Commands
setup - Zero-Config Smart Setup (Recommended)
# One command to do everything (recommended)
mcp-vector-search setup
# What it does automatically:
# - Detects project languages and file types
# - Initializes semantic search
# - Indexes entire codebase
# - Configures all detected MCP platforms
# - Sets up file watching
# - Zero configuration needed!
# Force re-setup
mcp-vector-search setup --force
# Verbose output for debugging
mcp-vector-search setup --verbose
Key Features:
Zero Configuration: No user input required
Smart Detection: Automatically discovers languages and platforms
Comprehensive: Handles init + index + MCP setup in one command
Idempotent: Safe to run multiple times
Fast: Timeout-protected scanning (won't hang on large projects)
Team-Friendly: Commit .mcp.json to share configuration
When to use:
✅ First-time project setup
✅ Team onboarding
✅ Quick testing in new codebases
✅ Setting up multiple MCP platforms at once
install - Install Project and MCP Integrations (Advanced)
# Manual setup with more control
mcp-vector-search install
# Install with all MCP integrations
mcp-vector-search install --with-mcp
# Custom file extensions
mcp-vector-search install --extensions .py,.js,.ts
# Skip automatic indexing
mcp-vector-search install --no-auto-index
# Platform-specific MCP integration
mcp-vector-search install claude-code # Project-scoped
mcp-vector-search install cursor # Global
mcp-vector-search install windsurf # Global
mcp-vector-search install vscode # Global
# List available platforms
mcp-vector-search install list
When to use:
Use install when you need fine-grained control over extensions, models, or MCP platforms
Use setup for quick, zero-config onboarding (recommended)
uninstall - Remove MCP Integrations
# Remove specific platform
mcp-vector-search uninstall claude-code
# Remove all integrations
mcp-vector-search uninstall --all
# List configured integrations
mcp-vector-search uninstall list
# Skip backup creation
mcp-vector-search uninstall claude-code --no-backup
# Alias (same as uninstall)
mcp-vector-search remove claude-code
init - Initialize Project (Simple)
# Basic initialization (no indexing or MCP)
mcp-vector-search init
# Custom configuration
mcp-vector-search init --extensions .py,.js,.ts --embedding-model sentence-transformers/all-MiniLM-L6-v2
# Force re-initialization
mcp-vector-search init --force
Note: For most users, use setup instead of init. The init command is for advanced users who want manual control.
index - Index Codebase
# Index all files
mcp-vector-search index
# Index specific directory
mcp-vector-search index /path/to/code
# Force re-indexing
mcp-vector-search index --force
# Reindex entire project
mcp-vector-search index reindex
# Reindex entire project (explicit)
mcp-vector-search index reindex --all
# Reindex entire project without confirmation
mcp-vector-search index reindex --force
# Reindex specific file
mcp-vector-search index reindex path/to/file.py
search - Semantic Search
# Basic search
mcp-vector-search search "function that handles user authentication"
# Adjust similarity threshold
mcp-vector-search search "database queries" --threshold 0.7
# Limit results
mcp-vector-search search "error handling" --limit 10
# Find code similar to a specific location (file:line)
mcp-vector-search search similar "path/to/function.py:25"
auto-index - Automatic Reindexing
# Setup all auto-indexing strategies
mcp-vector-search auto-index setup --method all
# Setup specific strategies
mcp-vector-search auto-index setup --method git-hooks
mcp-vector-search auto-index setup --method scheduled --interval 60
# Check for stale files and auto-reindex
mcp-vector-search auto-index check --auto-reindex --max-files 10
# View auto-indexing status
mcp-vector-search auto-index status
# Remove auto-indexing setup
mcp-vector-search auto-index teardown --method all
watch - File Watching
# Start watching for changes
mcp-vector-search watch
# Check watch status
mcp-vector-search watch status
# Enable/disable watching
mcp-vector-search watch enable
mcp-vector-search watch disable
status - Project Information
# Basic status
mcp-vector-search status
# Detailed information
mcp-vector-search status --verbose
config - Configuration Management
# View configuration
mcp-vector-search config show
# Update settings
mcp-vector-search config set similarity_threshold 0.8
mcp-vector-search config set embedding_model microsoft/codebert-base
# Configure indexing behavior
mcp-vector-search config set skip_dotfiles true # Skip dotfiles (default)
mcp-vector-search config set respect_gitignore true # Respect .gitignore (default)
# Get specific setting
mcp-vector-search config get skip_dotfiles
mcp-vector-search config get respect_gitignore
# List available models
mcp-vector-search config models
# List all configuration keys
mcp-vector-search config list-keys
index-code - Code-Specific Embeddings
# Index with CodeT5+ embeddings (code-optimized)
mcp-vector-search index-code
# Feature-flagged via environment variable
export MCP_CODE_ENRICHMENT=true
mcp-vector-search index-code
visualize - Interactive D3.js Visualization
# Launch visualization server
mcp-vector-search visualize
# Start on custom port
mcp-vector-search visualize --port 8080
# Available views:
# - Treemap: Hierarchical view with size/complexity encoding
# - Sunburst: Radial hierarchical view
# - Force Graph: Network visualization of code relationships
# - Knowledge Graph: Entity and relationship visualization
# - Heatmap: Complexity and quality heatmap
story - Development Narrative Generation
# Generate development narrative from git history
mcp-vector-search story
# Output formats
mcp-vector-search story --format markdown
mcp-vector-search story --format json
mcp-vector-search story --format html
# Serve as HTTP endpoint
mcp-vector-search story --serve
# Extract-only mode (no LLM)
mcp-vector-search story --no-llm
# Custom LLM model
mcp-vector-search story --model gpt-4o
kg - Knowledge Graph Operations
# Build knowledge graph
mcp-vector-search kg build
# Check knowledge graph status
mcp-vector-search kg status
# Query knowledge graph
mcp-vector-search kg query "find all Python functions"
mcp-vector-search kg query "show classes in module auth"
# Browse document ontology (file-level document classification)
mcp-vector-search kg ontology
mcp-vector-search kg ontology --category guide # filter by category
mcp-vector-search kg ontology --verbose # include file paths
# Knowledge graph entities:
# - CodeFile, Function, Class, Person
# - ProgrammingLanguage, ProgrammingFramework
# - Document (file-level, with doc_category classification)
# - Topic (hierarchical taxonomy)
chat - LLM-Powered Code Q&A
# Ask questions about your codebase
mcp-vector-search chat "explain the authentication flow"
mcp-vector-search chat "how does error handling work?"
# Iterative refinement (up to 30 queries)
# Automatically uses deep search and KG query tools
# Advanced reasoning mode
mcp-vector-search chat "architectural patterns" --think
# Filter by files
mcp-vector-search chat "validation logic" --files "src/*.py"
analyze - Code Analysis
# Complexity analysis
mcp-vector-search analyze complexity
# Dead code detection
mcp-vector-search analyze dead-code
# Output formats
mcp-vector-search analyze complexity --json
mcp-vector-search analyze complexity --sarif
mcp-vector-search analyze complexity --output-format markdown
# CI/CD integration
mcp-vector-search analyze complexity --fail-on-smell
Performance Features
Search Optimizations
MCP Vector Search includes several query-time optimizations that are automatically enabled as your index grows.
IVF-PQ Index is built automatically after indexing more than 256 rows. It uses Inverted File with Product Quantization to partition vectors into clusters, so queries scan only a relevant subset rather than the full index. The index parameters adapt to your data: num_partitions = clamp(sqrt(N), 16, 512) and num_sub_vectors = dim // 4.
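The adaptive parameters can be written out as a small function. This is a sketch assuming the square root is taken as an integer (floor) and that tables at or below 256 rows get no index; the function name is illustrative, not the project's.

```python
from __future__ import annotations

import math

IVF_PQ_MIN_ROWS = 256  # the index is built only once the table has more rows than this


def ivf_pq_params(num_rows: int, dim: int) -> dict[str, int] | None:
    """Adaptive IVF-PQ settings: clamp(sqrt(N), 16, 512) partitions, dim // 4 sub-vectors."""
    if num_rows <= IVF_PQ_MIN_ROWS:
        return None  # assumption: small tables are searched without an index
    num_partitions = min(max(math.isqrt(num_rows), 16), 512)  # floor of sqrt(N), clamped
    num_sub_vectors = dim // 4
    return {"num_partitions": num_partitions, "num_sub_vectors": num_sub_vectors}


print(ivf_pq_params(50_000, 384))  # {'num_partitions': 223, 'num_sub_vectors': 96}
```

Under these assumptions, 50,000 chunks of 384-dimensional embeddings give 223 partitions and 96 sub-vectors, while very large indexes stop growing at 512 partitions.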
Two-stage retrieval improves precision on top of the IVF-PQ scan: the engine probes 20 IVF partitions (nprobes=20) and fetches 5x the requested candidates, then reranks them with exact cosine similarity (refine_factor=5). This applies to both the LanceDB and the legacy vector backends.
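The two-stage idea (cheap approximate scoring, then exact reranking of a larger shortlist) is sketched below in pure Python. It is a toy, not LanceDB's implementation: coarse rounding of vector components stands in for product quantization, partition probing (nprobes) is omitted, and refine_factor controls how many extra candidates get the exact rerank.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Exact cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def quantize(v: list[float], step: float = 0.5) -> list[float]:
    """Toy stand-in for PQ compression: snap each component to a coarse grid."""
    return [round(x / step) * step for x in v]


def two_stage_search(query: list[float], vectors: list[list[float]],
                     limit: int, refine_factor: int = 5) -> list[int]:
    # Stage 1: rank by approximate (compressed) similarity, keep limit * refine_factor.
    approx = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, quantize(vectors[i])), reverse=True)
    shortlist = approx[: limit * refine_factor]
    # Stage 2: rerank only the shortlist with exact cosine on the full vectors.
    exact = sorted(shortlist, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return exact[:limit]


q = [1.0, 0.0]
docs = [[0.99, 0.26], [0.7, 0.24], [0.0, 1.0]]
print(two_stage_search(q, docs, limit=1, refine_factor=1))  # [1]: compression misranks
print(two_stage_search(q, docs, limit=1, refine_factor=2))  # [0]: exact rerank fixes it
```

With refine_factor=1 the toy returns the vector that only looks best after compression; widening the shortlist lets the exact rerank recover the true nearest neighbour, which is the precision gain described above.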
Contextual chunking prepends a compact metadata header to each chunk before embedding, so the vector captures file, language, class, and function context rather than the code text alone. Format: File: core/search.py | Lang: python | Class: Engine | Fn: search | Uses: lancedb. The technique is based on Anthropic research reporting 35-49% fewer retrieval failures.
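A sketch of how such a header could be assembled. The field order and the " | " separator follow the format shown above; omitting empty fields, joining several Uses entries with commas, and separating header and code with a newline are assumptions, and the function names are hypothetical.

```python
from __future__ import annotations


def contextual_header(path: str, lang: str, cls: str | None = None,
                      fn: str | None = None, uses: list[str] | None = None) -> str:
    """Build a 'File: ... | Lang: ... | Class: ... | Fn: ... | Uses: ...' header."""
    fields = [f"File: {path}", f"Lang: {lang}"]
    if cls:
        fields.append(f"Class: {cls}")
    if fn:
        fields.append(f"Fn: {fn}")
    if uses:
        fields.append(f"Uses: {', '.join(uses)}")  # assumption: comma-joined
    return " | ".join(fields)


def text_to_embed(header: str, code: str) -> str:
    """Text passed to the embedding model (assumed: header, newline, code)."""
    return f"{header}\n{code}"


print(contextual_header("core/search.py", "python", "Engine", "search", ["lancedb"]))
# File: core/search.py | Lang: python | Class: Engine | Fn: search | Uses: lancedb
```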
| Optimization | Impact |
|---|---|
| IVF-PQ index + two-stage retrieval | 4.9x faster queries (3.4ms vs 16.7ms median) |
| Contextual chunking | 35-49% fewer retrieval failures |
| Pipeline parallelism | 37% faster indexing |
| Apple Silicon MPS | 2-4x faster embedding generation |
See docs/performance/search-optimizations.md for technical details and benchmark methodology.
LanceDB Backend (Default in v2.1+)
LanceDB is now the default vector database for better performance and stability:
Serverless Architecture: No separate server process needed
Better Scaling: Superior performance for large codebases (>100k chunks)
File-Based Storage: Simple directory-based persistence
Fewer Corruption Issues: More stable than ChromaDB's HNSW indices
Write Buffering: 2-4x faster indexing with accumulated batch writes
To use ChromaDB (legacy), set the environment variable:
export MCP_VECTOR_SEARCH_BACKEND=chromadb
Migrate an existing ChromaDB database:
mcp-vector-search migrate db chromadb-to-lancedb
See docs/LANCEDB_BACKEND.md for detailed documentation.
Apple Silicon M4 Max Optimizations
2-4x speedup on Apple Silicon with automatic hardware detection:
MPS Backend: Metal Performance Shaders GPU acceleration for embeddings
Intelligent Batch Sizing: Auto-detects GPU memory and picks the embedding batch size accordingly (384-512 on an M4 Max with 128GB RAM)
Multi-Core Optimization: Utilizes all 12 performance cores efficiently
Zero Configuration: Automatically enabled on Apple Silicon Macs
Environment variables for tuning:
export MCP_VECTOR_SEARCH_MPS_BATCH_SIZE=512 # Override MPS batch size
export MCP_VECTOR_SEARCH_BATCH_SIZE=128 # Override all backends
Semi-Automatic Reindexing
Multiple strategies to keep your index up-to-date without daemon processes:
Search-Triggered: Automatically checks for stale files during searches
Git Hooks: Triggers reindexing after commits, merges, checkouts
Scheduled Tasks: System-level cron jobs or Windows tasks
Manual Checks: On-demand via CLI commands
Periodic Checker: In-process periodic checks for long-running apps
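Search-triggered and manual checks both come down to comparing the files on disk with what was indexed. A minimal sketch using modification times: scan_mtimes and find_stale are hypothetical helpers, the max_files cap mirrors the `--max-files` option of `auto-index check`, and deleted files are not handled. The real tool may track content hashes or other metadata instead.

```python
from __future__ import annotations

import os


def scan_mtimes(root: str, extensions: tuple[str, ...]) -> dict[str, float]:
    """Collect modification times of matching files under root, keyed by relative path."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                mtimes[os.path.relpath(path, root)] = os.path.getmtime(path)
    return mtimes


def find_stale(indexed: dict[str, float], current: dict[str, float],
               max_files: int = 10) -> list[str]:
    """Files that are new or modified since indexing, capped at max_files."""
    stale = [p for p, mtime in current.items() if indexed.get(p, -1.0) < mtime]
    return sorted(stale)[:max_files]
```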
# Setup all strategies
mcp-vector-search auto-index setup --method all
# Check status
mcp-vector-search auto-index status
Configuration
Projects are configured via .mcp-vector-search/config.json:
{
  "project_root": "/path/to/project",
  "file_extensions": [".py", ".js", ".ts"],
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
  "similarity_threshold": 0.75,
  "languages": ["python", "javascript", "typescript"],
  "watch_files": true,
  "cache_embeddings": true,
  "skip_dotfiles": true,
  "respect_gitignore": true
}
Indexing Configuration Options
skip_dotfiles (default: true)
Controls whether files and directories starting with "." are skipped during indexing
Whitelisted directories are always indexed regardless of this setting:
.github/ - GitHub workflows and actions
.gitlab-ci/ - GitLab CI configuration
.circleci/ - CircleCI configuration
When false: All dotfiles are indexed (subject to gitignore rules if respect_gitignore is true)
respect_gitignore (default: true)
Controls whether .gitignore patterns are respected during indexing
When false: Files in .gitignore are indexed (subject to skip_dotfiles if enabled)
force_include_patterns (default: [])
Glob patterns to force-include files/directories even if they are gitignored
Patterns support ** for recursive matching (e.g., repos/**/*.java matches all Java files in repos/ and its subdirectories)
Force-include patterns override .gitignore rules, allowing selective indexing of gitignored directories
Example use case: Index specific file types in a gitignored repos/ directory
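The interaction between these three options can be summarised as a single predicate. This is a sketch under stated assumptions, not the indexer's code: whether a path is gitignored is passed in rather than computed from .gitignore, force-include patterns are assumed to override only .gitignore (not skip_dotfiles), and glob_to_regex is a hypothetical helper that treats ** as any number of directories.

```python
from __future__ import annotations

import re

# Dot-directories that are always indexed, per the list above.
WHITELISTED_DOT_DIRS = {".github", ".gitlab-ci", ".circleci"}


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob where ** spans directories and * stays within one segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything except a path separator
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def should_index(path: str, gitignored: bool, *, skip_dotfiles: bool = True,
                 respect_gitignore: bool = True,
                 force_include_patterns: tuple[str, ...] | list[str] = ()) -> bool:
    """Decide whether a POSIX-style relative path would be indexed (hypothetical)."""
    parts = path.split("/")
    if skip_dotfiles and any(
        part.startswith(".") and part not in WHITELISTED_DOT_DIRS for part in parts
    ):
        return False
    if respect_gitignore and gitignored:
        # Force-include patterns rescue selected gitignored files.
        return any(glob_to_regex(p).fullmatch(path) for p in force_include_patterns)
    return True
```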
Example: Force-include Java files from gitignored directory
# Set force_include_patterns via JSON list
mcp-vector-search config set force_include_patterns '["repos/**/*.java", "repos/**/*.kt"]'
# Adding patterns one at a time is not built in (it would require a custom CLI command)
# This lets .gitignore exclude repos/ from git while mcp-vector-search still indexes the Java/Kotlin files
Example config.json with force_include_patterns:
{
  "respect_gitignore": true,
  "force_include_patterns": [
    "repos/**/*.java",
    "repos/**/*.kt",
    "vendor/internal/**/*.go"
  ]
}
Configuration Use Cases
Default Behavior (Recommended for most projects):
# Skip dotfiles AND respect .gitignore
mcp-vector-search config set skip_dotfiles true
mcp-vector-search config set respect_gitignore true
Index Everything (Useful for deep code analysis):
# Index all files including dotfiles and gitignored files
mcp-vector-search config set skip_dotfiles false
mcp-vector-search config set respect_gitignore false
Index Dotfiles but Respect .gitignore:
# Index configuration files but skip build artifacts
mcp-vector-search config set skip_dotfiles false
mcp-vector-search config set respect_gitignore true
Skip Dotfiles but Ignore .gitignore:
# Useful when you want to index files in .gitignore but skip hidden config files
mcp-vector-search config set skip_dotfiles true
mcp-vector-search config set respect_gitignore false
Selective Gitignore Override with Force-Include Patterns:
# Index specific file types from gitignored directories
# Example: .gitignore excludes repos/, but you want to index Java/Kotlin files
mcp-vector-search config set respect_gitignore true
mcp-vector-search config set force_include_patterns '["repos/**/*.java", "repos/**/*.kt"]'
# This allows:
# - .gitignore to exclude repos/ from git (keeps your repo clean)
# - mcp-vector-search to index Java/Kotlin files in repos/ (semantic search)
# - Other files in repos/ (e.g., .class, .jar) remain excluded
Architecture
Core Components
Parser Registry: Extensible system for language-specific parsing
Semantic Indexer: Efficient code chunking and embedding generation
Vector Database: LanceDB for similarity search
File Watcher: Real-time monitoring and incremental updates
CLI Interface: Rich, user-friendly command-line experience
Supported Languages
MCP Vector Search supports 13 programming languages with full semantic search capabilities:
| Language | Extensions | Status | Features |
|---|---|---|---|
| Python | | Full | Functions, classes, methods, docstrings |
| JavaScript | | Full | Functions, classes, JSDoc, ES6+ syntax |
| TypeScript | | Full | Interfaces, types, generics, decorators |
| C# | | Full | Classes, interfaces, structs, enums, methods, XML docs, attributes |
| Dart | | Full | Functions, classes, widgets, async, dartdoc |
| PHP | | Full | Classes, methods, traits, PHPDoc, Laravel patterns |
| Ruby | | Full | Modules, classes, methods, RDoc, Rails patterns |
| Java | | Full | Classes, methods, annotations, interfaces |
| Go | | Full | Functions, structs, interfaces, packages |
| Rust | | Full | Functions, structs, traits, implementations |
| HTML | | Full | Semantic content extraction, heading hierarchy, text chunking |
| Text/Markdown | | Basic | Semantic chunking for documentation |
New Language Support
HTML Support (Unreleased):
Semantic Extraction: Content from h1-h6, p, section, article, main, aside, nav, header, footer
Intelligent Chunking: Based on heading hierarchy (h1-h6)
Context Preservation: Maintains class and id attributes for searchability
Script/Style Filtering: Ignores non-content elements
Use Cases: Static sites, documentation, web templates, HTML fragments
Dart/Flutter Support (v0.4.15):
Widget Detection: StatelessWidget, StatefulWidget recognition
State Classes: Automatic parsing of _WidgetNameState patterns
Async Support: Future and async function handling
Dartdoc: Triple-slash comment extraction
Tree-sitter AST: Fast, accurate parsing with regex fallback
PHP Support (v0.5.0):
Class Detection: Classes, interfaces, traits
Method Extraction: Public, private, protected, static methods
Magic Methods: __construct, __get, __set, __call, etc.
PHPDoc: Full comment extraction
Laravel Patterns: Controllers, Models, Eloquent support
Tree-sitter AST: Fast parsing with regex fallback
Ruby Support (v0.5.0):
Module/Class Detection: Full namespace support (::)
Method Extraction: Instance and class methods
Special Syntax: Method names with ?, ! support
Attribute Macros: attr_accessor, attr_reader, attr_writer
RDoc: Comment extraction (# and =begin...=end)
Rails Patterns: ActiveRecord, Controllers support
Tree-sitter AST: Fast parsing with regex fallback
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
# Clone the repository
git clone https://github.com/bobmatnyc/mcp-vector-search.git
cd mcp-vector-search
# Install development environment (includes dependencies + editable install)
make dev
# Test CLI from source (recommended during development)
./scripts/dev-mcp version # Shows [DEV] indicator
./scripts/dev-mcp search "test" # No reinstall needed after code changes
# Run tests and quality checks
make test-unit # Run unit tests
make quality # Run linting and type checking
make fix # Auto-fix formatting issues
# View all available targets
make help
For detailed development workflow and dev-mcp usage, see the Development section below.
Adding Language Support
Create a new parser in src/mcp_vector_search/parsers/
Extend the BaseParser class
Register the parser in parsers/registry.py
Add tests and documentation
Performance
Indexing Speed: ~1000 files/minute (typical Python project)
Search Latency: 3.4ms median with IVF-PQ index (4.9x faster than without)
Memory Usage: ~50MB baseline + ~1MB per 1000 code chunks
Storage: ~1KB per code chunk (compressed embeddings)
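These rules of thumb combine into a quick back-of-the-envelope estimate. The helper is hypothetical and simply applies the figures above (taking 1MB as 1000KB); actual numbers depend on the embedding model, hardware, and project.

```python
from __future__ import annotations


def estimate(num_files: int, num_chunks: int) -> dict[str, float]:
    """Apply the README's rules of thumb to a project of a given size."""
    return {
        "index_minutes": num_files / 1000,    # ~1000 files/minute
        "memory_mb": 50 + num_chunks / 1000,  # ~50MB baseline + ~1MB per 1000 chunks
        "storage_mb": num_chunks * 1 / 1000,  # ~1KB per chunk, 1MB taken as 1000KB
    }
```

For example, a project with 5,000 files producing 40,000 chunks works out to roughly 5 minutes of indexing, about 90MB of memory, and about 40MB of storage.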
⚠️ Known Limitations (Alpha)
Tree-sitter Integration: Currently using regex fallback parsing (Tree-sitter setup needs improvement)
Search Relevance: Embedding model may need tuning for code-specific queries
Error Handling: Some edge cases may not be gracefully handled
Documentation: API documentation is minimal
Testing: Limited test coverage, needs real-world validation
Feedback Needed
We're actively seeking feedback on:
Search Quality: How relevant are the search results for your codebase?
Performance: How does indexing and search speed feel in practice?
Usability: Is the CLI interface intuitive and helpful?
Language Support: Which languages would you like to see added next?
Features: What functionality is missing for your workflow?
Please open an issue or start a discussion to share your experience!
Roadmap
v2.5: Production (Current) ✅
Core CLI interface
Multi-language parsing (13 languages: Python, JavaScript, TypeScript, C#, Dart, PHP, Ruby, Java, Go, Rust, HTML, Markdown, Text)
LanceDB default backend (ChromaDB legacy support)
Apple Silicon optimizations (2-4x speedup with MPS)
File watching and auto-reindexing
MCP server implementation with 17 tools
Advanced search modes (semantic, contextual, similar code)
Code analysis tools (complexity, dead code detection, code smells)
Interactive D3.js visualization (5+ views: Treemap, Sunburst, Force Graph, KG, Heatmap)
Knowledge Graph with KuzuDB (entity extraction, relationship mapping)
Development narrative generation (story command)
Chat mode with LLM integration (iterative refinement, up to 30 queries)
CodeT5+ code-specific embeddings
Pipeline parallelism (37% faster indexing)
Production-ready performance (write buffering, GPU acceleration, async pipeline)
IVF-PQ vector index with two-stage retrieval (4.9x faster queries)
Contextual chunking (metadata-enriched embeddings, 35-49% fewer retrieval failures)
CodeRankEmbed model support (nomic-ai/CodeRankEmbed, 768d, 8K context)
Document ontology with 23 categories (kg ontology command)
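Contextual chunking, listed above, enriches each chunk with metadata before it is embedded. A minimal sketch, assuming the metadata is a plain-text header of file path, language, enclosing symbol, and line range; the fields and format mcp-vector-search actually uses may differ. The point is that a query such as "refresh the session token" can match on the path and symbol name even when the code body alone is terse.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    file_path: str
    language: str
    symbol: str      # enclosing function/class name; "" if none (assumed field)
    start_line: int
    end_line: int
    code: str


def contextualize(chunk: Chunk) -> str:
    """Build the text that would be sent to the embedding model (assumed format)."""
    header = [
        f"File: {chunk.file_path}",
        f"Language: {chunk.language}",
        f"Lines: {chunk.start_line}-{chunk.end_line}",
    ]
    if chunk.symbol:
        header.insert(2, f"Symbol: {chunk.symbol}")  # place before the line range
    return "\n".join(header) + "\n\n" + chunk.code


c = Chunk("src/auth/session.py", "python", "SessionStore.refresh", 40, 52,
          "def refresh(self, token):\n    ...")
print(contextualize(c))
```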
v2.6+: Enhancements
Hybrid search (vector + keyword + BM25)
Additional language support (beyond the current 13 languages)
IDE extensions (VS Code, JetBrains)
Team collaboration features
Advanced code refactoring suggestions
Real-time collaboration on knowledge graph
Multi-project knowledge graph federation
Development
Three-Stage Development Workflow
Stage A: Local Development & Testing
# Setup development environment
make dev
# Run development tests
make test-unit
# Run CLI from source (recommended during development)
./dev-mcp version # Visual [DEV] indicator
./dev-mcp status # Any command works
./dev-mcp search "auth" # Immediate feedback on changes
# Run quality checks
make quality
# Alternative: use uv run directly
uv run mcp-vector-search version
Using the dev-mcp Development Helper
The ./dev-mcp script provides a streamlined way to run the CLI from source code during development, eliminating the need for repeated installations.
Key Features:
Visual [DEV] Indicator: Shows a [DEV] prefix to distinguish it from the installed version
No Reinstall Required: Reflects code changes immediately
Complete Argument Forwarding: Works with all CLI commands and options
Verbose Mode: Debug output with the --verbose flag
Built-in Help: Script usage with --help
Usage Examples:
# Basic commands (note the [DEV] prefix in output)
./dev-mcp version
./dev-mcp status
./dev-mcp index
./dev-mcp search "authentication logic"
# With CLI options
./dev-mcp search "error handling" --limit 10
./dev-mcp index --force
# Script verbose mode (shows Python interpreter, paths)
./dev-mcp --verbose search "database"
# Script help (shows dev-mcp usage, not CLI help)
./dev-mcp --help
# CLI command help (forwards --help to the CLI)
./dev-mcp search --help
./dev-mcp index --help
When to Use:
./dev-mcp: Development workflow (runs from source code)
mcp-vector-search: Production usage (runs the installed version via pipx/pip)
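The usage examples above follow one forwarding rule: --verbose and --help belong to the script only when they come before the subcommand, and everything from the subcommand onward is passed through unchanged. A hypothetical sketch of that rule in Python; the real ./dev-mcp may be a shell script implemented differently. The forwarded command uses uv run mcp-vector-search, as in the "use uv run directly" alternative shown earlier.

```python
from __future__ import annotations

# Options the wrapper itself understands (per the usage examples).
SCRIPT_OPTIONS = {"--verbose", "--help"}


def split_args(argv: list[str]) -> tuple[set[str], list[str]]:
    """Leading script options go to the wrapper; the rest is forwarded as-is."""
    own: set[str] = set()
    i = 0
    while i < len(argv) and argv[i] in SCRIPT_OPTIONS:
        own.add(argv[i])
        i += 1
    return own, argv[i:]


def build_command(argv: list[str]) -> list[str] | None:
    """Return the CLI command to run, or None when the wrapper shows its own help."""
    own, forwarded = split_args(argv)
    if "--help" in own:
        return None
    return ["uv", "run", "mcp-vector-search", *forwarded]
```

With this rule, ./dev-mcp --help prints the script's usage, while ./dev-mcp search --help reaches the CLI because --help follows the subcommand.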
Benefits:
Instant Feedback: Changes to source code are reflected immediately
No Build Step: Skip the reinstall cycle during active development
Clear Context: Visual [DEV] indicator prevents confusion about which version is running
Error Handling: Built-in checks for uv installation and project structure
Requirements:
Must have uv installed (pip install uv)
Must run from the project root directory
Requires pyproject.toml in the current directory
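The requirements above amount to two preflight checks. A minimal sketch, assuming they are simply "uv is on PATH" and "pyproject.toml exists in the current directory"; the helper name preflight is hypothetical and this is not the actual ./dev-mcp code.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def preflight(cwd: Path) -> list[str]:
    """Return a list of problems; an empty list means the wrapper can run."""
    problems = []
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH (install with: pip install uv)")
    if not (cwd / "pyproject.toml").is_file():
        problems.append(f"no pyproject.toml in {cwd}; run from the project root")
    return problems
```

A wrapper would call preflight(Path.cwd()), print any problems, and exit non-zero before invoking the CLI.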
Stage B: Local Deployment Testing
# Build and test clean deployment
./scripts/deploy-test.sh
# Test on other projects
cd ~/other-project
mcp-vector-search init && mcp-vector-search index
Stage C: PyPI Publication
# Publish to PyPI
./scripts/publish.sh
# Verify published version
pip install mcp-vector-search --upgrade
Quick Reference
./scripts/workflow.sh # Show workflow overview
See DEVELOPMENT.md for detailed development instructions.
Documentation
For comprehensive documentation, see docs/index.md - the complete documentation hub.
Getting Started
Installation Guide - Complete installation instructions
First Steps - Quick start tutorial
Configuration - Basic configuration
User Guides
Searching Guide - Master semantic code search
Indexing Guide - Indexing strategies and optimization
CLI Usage - Advanced CLI features
MCP Integration - AI tool integration
File Watching - Real-time index updates
Reference
CLI Commands - Complete command reference
Configuration Options - All configuration settings
Features - Feature overview
Architecture - System architecture
Development
Contributing - How to contribute
Testing - Testing guide
Code Quality - Linting and formatting
API Reference - Internal API docs
Deployment - Release and deployment guide
Advanced
Troubleshooting - Common issues and solutions
Performance - Performance optimization
Extending - Adding new features
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
Elastic License 2.0 - see LICENSE file for details.
Note: This software may not be provided to third parties as a hosted or managed service.
Acknowledgments
LanceDB for vector database
Tree-sitter for parsing infrastructure
Sentence Transformers for embeddings
Typer for CLI framework
Rich for beautiful terminal output
Built with ❤️ for developers who love efficient code search