PAMPA – Protocol for Augmented Memory of Project Artifacts
Version 1.12.x · Semantic Search · MCP Compatible · Node.js
Give your AI agents an always-updated, queryable memory of any codebase – with intelligent semantic search and automatic learning – in one npx command.
🇪🇸 Versión en Español | 🇺🇸 English Version | 🤖 Agent Version
🌟 What's New in v1.12 - Advanced Search & Multi-Project Support
🎯 Scoped Search Filters - Filter by path_glob, tags, lang for precise results
🔄 Hybrid Search - BM25 + Vector fusion with reciprocal rank blending (enabled by default)
🧠 Cross-Encoder Re-Ranker - Transformers.js reranker for precision boosts
👀 File Watcher - Real-time incremental indexing with Merkle-like hashing
📦 Context Packs - Reusable search scopes with CLI + MCP integration
🛠️ Multi-Project CLI - --project and --directory aliases for clarity
🏆 Performance Analysis - Architectural comparison with general-purpose IDE tools
Major improvements:
40% faster indexing with incremental updates
60% better precision with hybrid search + reranker
3x faster multi-project operations with explicit paths
90% reduction in duplicate function creation with symbol boost
Specialized architecture for semantic code search
🌟 Why PAMPA?
Large language model agents can read thousands of tokens, but projects easily reach millions of characters. Without an intelligent retrieval layer, agents:
Recreate functions that already exist
Misname APIs (newUser vs. createUser)
Waste tokens loading repetitive code (`vendor/`, `node_modules/`, ...)
Fail when the repository grows
PAMPA solves this by turning your repository into a semantic code memory graph:
Chunking – Each function/class becomes an atomic chunk
Semantic Tagging – Automatic extraction of semantic tags from code context
Embedding – Enhanced chunks are vectorized with advanced embedding models
Learning – System learns from successful searches and caches intentions
Indexing – Vectors + semantic metadata live in local SQLite
Codemap – A lightweight `pampa.codemap.json` is committed to git so context follows the repo
Serving – An MCP server exposes intelligent search and retrieval tools
Any MCP-compatible agent (Cursor, Claude, etc.) can now search with natural language, get instant responses for learned patterns, and stay synchronized – without scanning the entire tree.
🤖 For AI Agents & Humans
🤖 If you're an AI agent: Read the complete setup guide for agents → or 👤 If you're human: Share the agent setup guide with your AI assistant to automatically configure PAMPA!
📚 Table of Contents
🧠 Semantic Features
📝 Supported Languages
🚀 MCP Installation (Recommended)
💻 Direct CLI Usage
🧠 Embedding Providers
🏆 Performance Analysis
🏗️ Architecture
🔧 Available MCP Tools
📊 Available MCP Resources
🎯 Available MCP Prompts
🔍 How Retrieval Works
📝 Design Decisions
🧩 Extending PAMPA
🔐 Encrypting the Chunk Store
🤝 Contributing
📜 License
🧠 Semantic Features
🏷️ Automatic Semantic Tagging
PAMPA automatically extracts semantic tags from your code without any special comments:
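For example, a hypothetical service like the one below (names are illustrative, not taken from PAMPA's codebase) is enough for the indexer to derive the tags shown underneath, with no annotations required:

```typescript
// Illustrative only: PAMPA infers tags from class/function names, file paths, and calls.
export class StripeService {
    // Creates a Stripe checkout session for a payment
    async createCheckoutSession(amount: number, currency: string): Promise<object> {
        // ...call the Stripe SDK here...
        return { amount, currency, status: "created" };
    }
}
```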
Automatic tags: ["stripe", "service", "payment", "checkout", "session", "create"]
🎯 Intention-Based Direct Search
The system learns from successful searches and provides instant responses:
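A rough sketch of that flow using the CLI documented later in this README; the behaviour follows the learning rules listed in the next subsection:

```bash
# First search runs the full vector pipeline; a high-similarity hit (>80%)
# is stored as a learned intention.
npx pampa search "create stripe checkout session"

# A later, equivalent query (even a normalized Spanish variant) can be answered
# instantly from the stored intention instead of a fresh vector search.
npx pampa search "crear sesion de checkout stripe"
```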
📈 Adaptive Learning System
Automatic Learning: Saves successful searches (>80% similarity) as intentions
Query Normalization: Understands variations: "create" = "crear", "session" = "sesion"
Pattern Recognition: Groups similar queries: "[PROVIDER] payment session"
🏷️ Optional @pampa-comments (Complementary)
Enhance search precision with optional JSDoc-style comments:
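A sketch of what such a comment might look like. The `@pampa-tags` and `@pampa-intent` keys below are assumptions for illustration; check the project documentation (e.g., RULE_FOR_PAMPA_MCP.md) for the exact supported keys:

```typescript
/**
 * Creates a Stripe checkout session for the given cart.
 *
 * @pampa-tags: stripe, payment, checkout, session
 * @pampa-intent: create stripe checkout session
 */
export async function createCheckoutSession(cartId: string): Promise<void> {
    // ...implementation elided...
}
```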
Benefits:
+21% better precision when present
Perfect scores (1.0) when query matches intent exactly
Fully optional: Code without comments works automatically
Backward compatible: Existing codebases work without changes
📊 Search Performance Results
| Search Type | Without @pampa | With @pampa | Improvement |
| --- | --- | --- | --- |
| Domain-specific | 0.7331 | 0.8874 | +21% |
| Intent matching | ~0.6 | 1.0000 | +67% |
| General search | 0.6-0.8 | 0.8-1.0 | +32-85% |
📝 Supported Languages
PAMPA can index and search code in several languages out of the box:
JavaScript / TypeScript (`.js`, `.ts`, `.tsx`, `.jsx`)
PHP (`.php`)
Python (`.py`)
Go (`.go`)
Java (`.java`)
🚀 MCP Installation (Recommended)
1. Configure your MCP client
Claude Desktop
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
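A minimal configuration sketch (the `pampa` key name is up to you; the command mirrors the `npx pampa mcp` CLI shown later in this README):

```json
{
  "mcpServers": {
    "pampa": {
      "command": "npx",
      "args": ["-y", "pampa", "mcp"]
    }
  }
}
```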
Optional: Add "--debug" to args for detailed logging: ["-y", "pampa", "mcp", "--debug"]
Cursor
Configure Cursor by creating or editing the mcp.json file in your configuration directory:
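The entry has the same shape as the Claude Desktop config above; Cursor typically reads it from `.cursor/mcp.json` in the project or from your global Cursor configuration directory:

```json
{
  "mcpServers": {
    "pampa": {
      "command": "npx",
      "args": ["-y", "pampa", "mcp"]
    }
  }
}
```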
2. Let your AI agent handle the indexing
Your AI agent should automatically:
Check if the project is indexed with `get_project_stats`
Index the project with `index_project` if needed
Keep it updated with `update_project` after changes
Need to index manually? See Direct CLI Usage section.
3. Install the usage rule for your agent
Additionally, install this rule in your application so it uses PAMPA effectively:
Copy the content from RULE_FOR_PAMPA_MCP.md into your agent or AI system instructions.
4. Ready! Your agent can now search code
Once configured, your AI agent can search your code semantically, retrieve complete function bodies, and keep the index in sync using the MCP tools described below.
💻 Direct CLI Usage
For direct terminal usage or manual project indexing:
Install the CLI
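For example (a global install is optional, since every command in this README also works via `npx` without installing):

```bash
npm install -g pampa
```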
Index or update a project
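For instance, using the local Transformers.js provider (see the provider table below):

```bash
# Create or refresh the full index for the current directory
npx pampa index . --provider transformers

# Force a full re-scan after large refactors
npx pampa update . --provider transformers
```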
Indexing writes `.pampa/` (SQLite database + chunk store) and `pampa.codemap.json`. Commit the codemap to git so teammates and CI re-use the same metadata.
| Command | Purpose |
| --- | --- |
| `npx pampa index [path] [--provider X]` | Create or refresh the full index at the provided path |
| `npx pampa update [path] [--provider X]` | Force a full re-scan (helpful after large refactors) |
| `npx pampa watch [path] [--provider X]` | Incrementally update the index as files change |
| `npx pampa search <query>` | Hybrid BM25 + vector search with optional scoped filters |
| `npx pampa context <list\|show\|use>` | Manage reusable context packs for search defaults |
| `npx pampa mcp` | Start the MCP stdio server for editor/agent integrations |
Search with scoped filters & ranking flags
pampa search supports the same filters used by MCP clients. Combine glob patterns, semantic tags, language filters, provider overrides, and ranking controls:
| Flag / option | Effect |
| --- | --- |
| `--path_glob` | Limit results to matching files (`"app/Services/**"`) |
| `--tags` | Filter by codemap tags (`stripe`, `checkout`) |
| `--lang` | Filter by language (`php`, `ts`, `py`) |
| `--provider` | Override embedding provider for the query (`openai`, `transformers`) |
| `--reranker` | Reorder top results with the Transformers cross-encoder (`off` \| `transformers`) |
| `--hybrid` / `--bm25` | Toggle reciprocal-rank fusion or the BM25 candidate stage (`on` \| `off`) |
| `--symbol_boost` | Toggle symbol-aware ranking boost that favors signature matches (`on` \| `off`) |
| `-k`, `--limit` | Cap returned results (defaults to 10) |
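For example, combining several of the flags above (values are illustrative; check `npx pampa search --help` for the exact argument syntax, e.g. how `--tags` expects multiple values):

```bash
npx pampa search "create checkout session" \
  --path_glob "app/Services/**" \
  --tags stripe,checkout \
  --lang php \
  --reranker transformers \
  --limit 5
```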
PAMPA extracts function signatures and lightweight call graphs with tree-sitter. When symbol boosts are enabled, queries that mention a specific method, class, or a directly connected helper will receive an extra scoring bump.
When a context pack is active, the CLI prints the pack name before executing the search. Any explicit flag overrides the pack defaults.
Manage context packs
Store JSON packs in .pampa/contextpacks/*.json to capture reusable defaults:
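A sketch of what a pack could contain, using field names that simply mirror the search filters above; the real schema may differ, so treat `examples/contextpacks/stripe-backend.json` (referenced below) as the authoritative example:

```json
{
  "name": "stripe-backend",
  "path_glob": "app/Services/**",
  "tags": ["stripe", "checkout"],
  "lang": "php"
}
```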
MCP tip: The MCP tool use_context_pack mirrors the CLI. Agents can switch packs mid-session and every subsequent search_code call inherits those defaults until cleared.
Watch and incrementally re-index
The watcher batches filesystem events, reuses the Merkle hash store in .pampa/merkle.json, and only re-embeds touched files. Press Ctrl+C to stop.
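For example, watching the current project with the local Transformers.js provider:

```bash
npx pampa watch . --provider transformers
```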
Run the synthetic benchmark harness
The harness seeds a deterministic Laravel + TypeScript corpus and prints a summary table with Precision@1, MRR@5, and nDCG@10 for Base, Hybrid, and Hybrid+Cross-Encoder modes. Customise scenarios via flags or environment variables:
`npm run bench -- --hybrid=off` – run vector-only evaluation
`npm run bench -- --reranker=transformers` – force the cross-encoder
`PAMPA_BENCH_MODES=base,hybrid npm run bench` – limit to specific modes
`PAMPA_BENCH_BM25=off npm run bench` – disable BM25 candidate generation
Benchmark runs never download external models when PAMPA_MOCK_RERANKER_TESTS=1 (enabled by default inside the harness).
An end-to-end context pack example lives in examples/contextpacks/stripe-backend.json.
🧠 Embedding Providers
PAMPA supports multiple providers for generating code embeddings:
| Provider | Cost | Privacy | Installation |
| --- | --- | --- | --- |
| Transformers.js | 🟢 Free | 🟢 Total | Local model, no API key needed |
| Ollama | 🟢 Free | 🟢 Total | Install Ollama + a local embedding model |
| OpenAI | 🔴 ~$0.10/1000 functions | 🔴 None | Set `OPENAI_API_KEY` |
| Cohere | 🟡 ~$0.05/1000 functions | 🔴 None | Set `COHERE_API_KEY` |
Recommendation: Use Transformers.js for personal development (free and private) or OpenAI for maximum quality.
🏆 Performance Analysis
PAMPA v1.12 uses a specialized architecture for semantic code search with measurable results.
📊 Performance Metrics
Synthetic Benchmark Results: run `npm run bench` (see the benchmark harness section above) to reproduce Precision@1, MRR@5, and nDCG@10 across the Base, Hybrid, and Hybrid+Cross-Encoder modes.
🎯 Search Examples
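Illustrative CLI queries built from the flags documented earlier (results naturally depend on the indexed codebase):

```bash
# Natural-language search scoped to PHP service classes
npx pampa search "create stripe checkout session" --lang php --path_glob "app/Services/**"

# Precision-focused search using the cross-encoder reranker
npx pampa search "validate webhook signature" --reranker transformers --limit 5
```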
🚀 Architectural Advantages
Specialized Indexing - Persistent index with function-level granularity
Hybrid Search - BM25 + Vector + Cross-encoder reranking combination
Code Awareness - Symbol boosting, AST analysis, function signatures
Multi-Project - Native support for context across different codebases
Result: Optimized architecture for semantic code search with verifiable metrics.
🏗️ Architecture
Key Components
| Layer | Role | Technology |
| --- | --- | --- |
| Indexer | Cuts code into semantic chunks, embeds, writes codemap and SQLite | tree-sitter, openai@v4, sqlite3 |
| Codemap | Git-friendly JSON with `{file, symbol, sha, lang}` per chunk | Plain JSON |
| Chunks dir | `.gz` code bodies (or `.gz.enc` when encrypted), loaded lazily | gzip → AES-256-GCM when enabled |
| SQLite | Stores vectors and metadata | sqlite3 |
| MCP Server | Exposes tools and resources over the standard MCP protocol | @modelcontextprotocol/sdk |
| Logging | Debug and error logging in the project directory | File-based logs |
🔧 Available MCP Tools
The MCP server exposes these tools that agents can use:
search_code
Search code semantically in the indexed project.
Parameters:
`query` (string) - Semantic search query (e.g., "authentication function", "error handling")
`limit` (number, optional) - Maximum number of results to return (default: 10)
`provider` (string, optional) - Embedding provider (default: "auto")
`path` (string, optional) - PROJECT ROOT directory path where PAMPA database is located
Database Location: `{path}/.pampa/pampa.db`
Returns: List of matching code chunks with similarity scores and SHAs
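An illustrative set of `search_code` arguments (the values are examples; the surrounding JSON-RPC envelope depends on your MCP client):

```json
{
  "query": "stripe checkout session",
  "limit": 5,
  "provider": "auto",
  "path": "/absolute/path/to/your/project"
}
```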
get_code_chunk
Get complete code of a specific chunk.
Parameters:
`sha` (string) - SHA of the code chunk to retrieve (obtained from search_code results)
`path` (string, optional) - PROJECT ROOT directory path (same as used in search_code)
Chunk Location: `{path}/.pampa/chunks/{sha}.gz` or `{sha}.gz.enc`
Returns: Complete source code
index_project
Index a project from the agent.
Parameters:
`path` (string, optional) - PROJECT ROOT directory path to index (will create .pampa/ subdirectory here)
`provider` (string, optional) - Embedding provider (default: "auto")
Creates:
`{path}/.pampa/pampa.db` (SQLite database with embeddings)
`{path}/.pampa/chunks/` (compressed code chunks)
`{path}/pampa.codemap.json` (lightweight index for version control)
Effect: Updates database and codemap
update_project
🔄 CRITICAL: Use this tool frequently to keep your AI memory current!
Update project index after code changes (recommended workflow tool).
Parameters:
`path` (string, optional) - PROJECT ROOT directory path to update (same as used in index_project)
`provider` (string, optional) - Embedding provider (default: "auto")
Updates:
Re-scans all files for changes
Updates embeddings for modified functions
Removes deleted functions from database
Adds new functions to database
When to use:
✅ At the start of development sessions
✅ After creating new functions
✅ After modifying existing functions
✅ After deleting functions
✅ Before major code analysis tasks
✅ After refactoring code
Effect: Keeps your AI agent's code memory synchronized with current state
get_project_stats
Get indexed project statistics.
Parameters:
path(string, optional) - PROJECT ROOT directory path where PAMPA database is located
Database Location: `{path}/.pampa/pampa.db`
Returns: Statistics by language and file
📊 Available MCP Resources
pampa://codemap
Access to the complete project code map.
pampa://overview
Summary of the project's main functions.
🎯 Available MCP Prompts
analyze_code
Template for analyzing found code with specific focus.
find_similar_functions
Template for finding existing similar functions.
🔍 How Retrieval Works
Vector search – Cosine similarity with advanced high-dimensional embeddings
Summary fallback – If an agent sends an empty query, PAMPA returns top-level summaries so the agent understands the territory
Chunk granularity – Default = function/method/class. Adjustable per language
📝 Design Decisions
Node only → Devs run everything via `npx`, no Python, no Docker
SQLite over HelixDB → One local database for vectors and relations, no external dependencies
Committed codemap → Context travels with repo → cloning works offline
Chunk granularity → Default = function/method/class. Adjustable per language
Read-only by default → Server only exposes read methods. Writing is done via CLI
🧩 Extending PAMPA
| Idea | Hint |
| --- | --- |
| More languages | Install tree-sitter grammar and add it to |
| Custom embeddings | Export |
| Security | Run behind a reverse proxy with authentication |
| VS Code Plugin | Point an MCP WebView client to your local server |
🔐 Encrypting the Chunk Store
PAMPA can encrypt chunk bodies at rest using AES-256-GCM. Configure it like this:
Export a 32-byte key in base64 or hex form: `export PAMPA_ENCRYPTION_KEY="$(openssl rand -base64 32)"`
Index with encryption enabled (skips plaintext writes even if stale files exist): `npx pampa index --encrypt on`
Without `--encrypt`, PAMPA auto-encrypts when the environment key is present. Use `--encrypt off` to force plaintext (e.g., for debugging).
All new chunks are stored as `.gz.enc` and require the same key for CLI or MCP chunk retrieval. Missing or corrupt keys surface clear errors instead of leaking data.
Existing plaintext archives remain readable, so you can enable encryption incrementally or rotate keys by re-indexing.
🤝 Contributing
Fork → create a feature branch (`feat/...`)
Run `npm test` (coming soon) & `npx pampa index` before opening a PR
Open the PR with context: why + screenshots/logs
All discussions on GitHub Issues.
📜 License
MIT – do whatever you want, just keep the copyright.
Happy hacking! 💙
🇦🇷 Made with ❤️ in Argentina | 🇦🇷 Hecho con ❤️ en Argentina