Integrations

[Cheerio](/mcp/servers/integrations/cheerio) - Pinned extractor for HTML content extraction and text processing in the pipeline
[Docker](/mcp/servers/integrations/docker) - Dockerfile generation for containerized deployment of exported RAG projects
[GitHub](/mcp/servers/integrations/github) - Exported project repositories can be pushed to GitHub for version control and deployment workflows
[Markdown](/mcp/servers/integrations/markdown) - Markdown documents are handled by the generic document extraction phase
[Milvus](/mcp/servers/integrations/milvus) - Vector database target for upserting embeddings and storing indexed content
[OpenAI](/mcp/servers/integrations/openai) - Embedding models (e.g., text-embedding-3-small) for vector search and retrieval
[Railway](/mcp/servers/integrations/railway) - railway.toml generation for automated deployment of exported RAG projects to Railway
[TypeScript](/mcp/servers/integrations/typescript) - Exported projects include a generated TypeScript MCP server with search tools

IndexFoundry-MCP

Deterministic Vector Index Factory - An MCP server for automated, auditable vector database creation from any content source, with deployable project-based workflows.

Tools don't think, they act.

Every tool in this server is:

Deterministic: Same inputs → same outputs
Idempotent: Re-running produces identical artifacts (unless force: true)
Auditable: Every operation produces manifests, hashes, and logs
Composable: Tools can be run independently or chained
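The idempotency rule amounts to a write-if-absent guard. The following is a minimal sketch, assuming artifacts are plain files and that `force: true` simply bypasses the existence check; `writeArtifact` is a hypothetical helper, not part of the server's API.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helper: writes an artifact only if it does not exist yet,
// unless `force` is set. Returns true when the file was (re)written.
function writeArtifact(
  path: string,
  data: string | Uint8Array,
  options: { force?: boolean } = {},
): boolean {
  if (existsSync(path) && !options.force) {
    // Already produced by an earlier run: leave it untouched.
    return false;
  }
  mkdirSync(dirname(path), { recursive: true });
  writeFileSync(path, data);
  return true;
}

// Usage: the second write is skipped; the forced write replaces the file.
const dir = mkdtempSync(join(tmpdir(), "indexfoundry-"));
const target = join(dir, "raw", "example.txt");
const first = writeArtifact(target, "v1");
const second = writeArtifact(target, "v2");
const forced = writeArtifact(target, "v3", { force: true });
console.log(first, second, forced, readFileSync(target, "utf8")); // true false true v3
```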

Architecture

IndexFoundry provides two complementary workflows:

1. Run-Based Pipeline (Fine-Grained Control)

Individual pipeline runs with isolated artifacts, suitable for experimentation and detailed auditing.

2. Project-Based Workflow (Deployable RAG Applications)

Self-contained projects that generate deployment-ready repositories with MCP server, Dockerfile, and Railway configuration.

Pipeline Phases (Run-Based)

Connect → Extract → Normalize → Index → Serve

Each phase writes its artifacts to its own directory in the run: raw/, extracted/, normalized/, indexed/, and served/, respectively.

Phase 1: Connect

Fetch content from URLs, sitemaps, folders, or PDFs. Every artifact gets a content hash.
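Content hashing can be sketched with Node's built-in `crypto` module. This assumes artifacts are named `raw/<sha256>.<ext>`, matching the `raw/<sha256>.pdf` paths used in the workflow examples; `sha256Hex` and `artifactPath` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the raw bytes; identical bytes always give the same digest.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Assumed naming scheme: raw/<sha256>.<ext>, so re-fetching identical
// content maps to the same artifact path.
function artifactPath(data: Uint8Array | string, ext: string): string {
  return `raw/${sha256Hex(data)}.${ext}`;
}

const a = artifactPath("hello world", "txt");
const b = artifactPath("hello world", "txt");
console.log(a === b); // true
```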

Phase 2: Extract

Convert raw bytes to text using pinned extractors (pdfminer, cheerio, etc.).

Phase 3: Normalize

Chunk text deterministically, enrich metadata (no LLM), and deduplicate.
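Deterministic chunking is easiest to see with the simplest strategy, a fixed-size window with overlap. This sketch covers only the `fixed` strategy and reuses the `max_chars` / `overlap_chars` parameter names from the tool examples; the `recursive` strategy, which prefers natural boundaries, is more involved and is not reproduced here.

```typescript
interface Chunk {
  index: number; // position of the chunk within the document
  start: number; // offset (UTF-16 code units, as with String.prototype.slice)
  text: string;
}

// Splits text into windows of at most maxChars characters; consecutive windows
// share overlapChars characters. No randomness: same input, same chunks.
function chunkFixed(text: string, maxChars: number, overlapChars: number): Chunk[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  if (overlapChars < 0 || overlapChars >= maxChars) {
    throw new RangeError("overlapChars must be in [0, maxChars)");
  }
  const step = maxChars - overlapChars;
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, start, text: text.slice(start, start + maxChars) });
    // Stop once a window reaches the end, so no redundant tail chunk is emitted.
    if (start + maxChars >= text.length) break;
  }
  return chunks;
}

const chunks = chunkFixed("abcdefghij", 4, 1);
console.log(chunks.map((c) => c.text)); // [ 'abcd', 'defg', 'ghij' ]
```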

Phase 4: Index

Generate embeddings with a pinned model, upsert to vector DB.
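For the `local` provider, upsert and retrieval can be pictured as an in-memory map keyed by chunk ID plus cosine-similarity ranking. This is a hypothetical sketch (`LocalVectorStore` is not a name taken from the server); keying by ID is what makes a repeated upsert overwrite records instead of duplicating them.

```typescript
interface VectorRecord {
  id: string;
  vector: number[];
  metadata?: Record<string, unknown>;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class LocalVectorStore {
  private records = new Map<string, VectorRecord>();

  // Upsert: inserting an existing ID replaces the old record.
  upsert(records: VectorRecord[]): void {
    for (const r of records) this.records.set(r.id, r);
  }

  get size(): number {
    return this.records.size;
  }

  // Returns the topK most similar records; ties are broken by ID for a stable order.
  query(vector: number[], topK: number): { id: string; score: number }[] {
    return [...this.records.values()]
      .map((r) => ({ id: r.id, score: cosine(vector, r.vector) }))
      .sort((x, y) => y.score - x.score || (x.id < y.id ? -1 : x.id > y.id ? 1 : 0))
      .slice(0, topK);
  }
}

const store = new LocalVectorStore();
store.upsert([
  { id: "c1", vector: [1, 0] },
  { id: "c2", vector: [0, 1] },
]);
store.upsert([{ id: "c1", vector: [1, 0] }]); // re-upsert: still 2 records
console.log(store.size, store.query([0.9, 0.1], 1));
```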

Phase 5: Serve

Generate OpenAPI spec and optionally start a retrieval API.
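The generated spec's contents are not described here, so the following only illustrates the general shape of a minimal OpenAPI 3.1 document for a retrieval endpoint. The `/search` path and the request and response fields are hypothetical, loosely modeled on the `query` and `top_k` parameters of `indexfoundry_project_query`.

```typescript
// Minimal OpenAPI 3.1 document for a hypothetical POST /search endpoint.
// Path, request fields, and response shape are illustrative assumptions.
function buildOpenApiSpec(title: string, version: string) {
  return {
    openapi: "3.1.0",
    info: { title, version },
    paths: {
      "/search": {
        post: {
          summary: "Retrieve the chunks most relevant to a query",
          requestBody: {
            required: true,
            content: {
              "application/json": {
                schema: {
                  type: "object",
                  required: ["query"],
                  properties: {
                    query: { type: "string" },
                    top_k: { type: "integer", minimum: 1, default: 5 },
                  },
                },
              },
            },
          },
          responses: {
            "200": {
              description: "Ranked search results",
              content: {
                "application/json": {
                  schema: {
                    type: "array",
                    items: {
                      type: "object",
                      properties: {
                        chunk_id: { type: "string" },
                        score: { type: "number" },
                        text: { type: "string" },
                      },
                    },
                  },
                },
              },
            },
          },
        },
      },
    },
  };
}

// Built from literals, so JSON.stringify emits identical output on every run.
const spec = buildOpenApiSpec("my_docs search API", "1.0.0");
console.log(JSON.stringify(spec, null, 2));
```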

Quick Start

# Install dependencies
npm install
# Build
npm run build
# Run on stdio (for Claude Desktop, Cline, etc.)
npm start
# Run as HTTP server
npm run start:http

Workflow Options

Option 1: Run-Based Pipeline (Detailed Control)

Use individual pipeline tools for fine-grained control over each phase:

import { randomUUID } from "node:crypto";

// Create a new run (`client` is an already-connected MCP client)
const runId = randomUUID();
// Connect: load PDFs from a local folder
await client.callTool("indexfoundry_connect_folder", {
  run_id: runId,
  path: "/path/to/documents",
  glob: "**/*.pdf"
});
// Extract PDF content
await client.callTool("indexfoundry_extract_pdf", {
  run_id: runId,
  pdf_path: "raw/<sha256>.pdf",
  mode: "layout"
});
// Chunk the extracted text
await client.callTool("indexfoundry_normalize_chunk", {
  run_id: runId,
  input_paths: ["extracted/<sha256>.pages.jsonl"],
  strategy: "recursive",
  max_chars: 1500,
  overlap_chars: 150
});
// Generate embeddings
await client.callTool("indexfoundry_index_embed", {
  run_id: runId,
  model: {
    provider: "openai",
    model_name: "text-embedding-3-small",
    api_key_env: "OPENAI_API_KEY"
  }
});
// Upsert to the local vector DB
await client.callTool("indexfoundry_index_upsert", {
  run_id: runId,
  provider: "local",
  connection: { collection: "my_docs" }
});

Option 2: Project-Based Workflow (Deployable RAG)

Create a self-contained, deployable RAG application:

// Create a new project
await client.callTool("indexfoundry_project_create", {
  project_id: "my-rag-app",
  name: "My RAG Search",
  description: "Searchable knowledge base for documentation",
  embedding_model: {
    provider: "openai",
    model_name: "text-embedding-3-small",
    api_key_env: "OPENAI_API_KEY"
  },
  chunk_config: {
    strategy: "recursive",
    max_chars: 1500,
    overlap_chars: 150
  }
});
// Add data sources
await client.callTool("indexfoundry_project_add_source", {
  project_id: "my-rag-app",
  url: "https://docs.example.com",
  source_name: "Documentation Site",
  tags: ["docs", "api"]
});
// Build the vector database
await client.callTool("indexfoundry_project_build", {
  project_id: "my-rag-app"
});
// Query the built index
await client.callTool("indexfoundry_project_query", {
  project_id: "my-rag-app",
  query: "How do I configure authentication?",
  mode: "hybrid",
  top_k: 5
});
// Export for deployment
await client.callTool("indexfoundry_project_export", {
  project_id: "my-rag-app",
  server_name: "my-rag-server",
  include_http: true,
  railway_config: true
});

After export, a project directory contains a complete deployable repository:

Dockerfile - Container configuration
railway.toml - Railway deployment config
src/index.ts - Generated MCP server with search tools
README.md - Project-specific documentation

Push to GitHub and deploy:

cd projects/my-rag-app
git init
git add .
git commit -m "Initial RAG application"
git branch -M main
git remote add origin <your-github-repo-url>
git push -u origin main
# Then connect the repository to Railway and deploy

Tool Overview

Run-Based Pipeline Tools

Connect Phase

indexfoundry_connect_url - Fetch a single URL with domain allowlisting
indexfoundry_connect_sitemap - Crawl a sitemap with URL filtering
indexfoundry_connect_folder - Load local files with glob patterns
indexfoundry_connect_pdf - Fetch PDF with metadata extraction

Extract Phase

indexfoundry_extract_pdf - PDF to text (layout/plain/OCR modes)
indexfoundry_extract_html - HTML to clean text with structure preservation
indexfoundry_extract_document - Generic document extraction (markdown, txt, CSV, JSON)

Normalize Phase

indexfoundry_normalize_chunk - Split text into chunks (recursive/paragraph/heading/page/sentence/fixed)
indexfoundry_normalize_enrich - Add metadata (language detection, regex tags, section classification)
indexfoundry_normalize_dedupe - Remove duplicates (exact/simhash/minhash)
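Of the three dedupe modes, `exact` is the simplest to illustrate: hash each chunk's normalized text and keep the first occurrence. The normalization used here (collapse whitespace, trim, lowercase) is an assumption; `simhash` and `minhash` target near-duplicates and are not shown.

```typescript
import { createHash } from "node:crypto";

interface TextChunk {
  id: string;
  text: string;
}

// Assumed normalization before hashing: collapse whitespace, trim, lowercase.
function normalizeForDedupe(text: string): string {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// Keeps the first chunk for each distinct normalized text, preserving input
// order, so the result is deterministic for a deterministic input order.
function dedupeExact(chunks: TextChunk[]): TextChunk[] {
  const seen = new Set<string>();
  const kept: TextChunk[] = [];
  for (const chunk of chunks) {
    const key = createHash("sha256").update(normalizeForDedupe(chunk.text)).digest("hex");
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(chunk);
  }
  return kept;
}

const unique = dedupeExact([
  { id: "a", text: "Hello   world" },
  { id: "b", text: "hello world" },
  { id: "c", text: "Something else" },
]);
console.log(unique.map((c) => c.id)); // [ 'a', 'c' ]
```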

Index Phase

indexfoundry_index_embed - Generate embeddings (OpenAI/Cohere/sentence-transformers/local)
indexfoundry_index_upsert - Write to vector DB (Pinecone/Weaviate/Qdrant/Milvus/Chroma/local)
indexfoundry_index_build_profile - Configure retrieval (top_k, hybrid search, reranking)

Serve Phase

indexfoundry_serve_openapi - Generate OpenAPI 3.1 specification
indexfoundry_serve_start - Start HTTP search API server
indexfoundry_serve_stop - Stop running API server
indexfoundry_serve_status - Get server status
indexfoundry_serve_query - Query running server directly

Run Utilities

indexfoundry_run_status - Get detailed status of a run
indexfoundry_run_list - List all runs with filtering
indexfoundry_run_diff - Compare two runs (config, chunks, timing)
indexfoundry_run_cleanup - Delete old runs with retention policies
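The retention-policy parameters of `indexfoundry_run_cleanup` are not documented here. As a hypothetical illustration, a policy that always keeps the N newest runs and deletes older ones past a maximum age could be evaluated as follows; `keepLast` and `maxAgeDays` are invented names, not the tool's arguments.

```typescript
interface RunInfo {
  runId: string;
  createdAt: Date;
}

interface RetentionPolicy {
  keepLast: number; // always keep this many of the newest runs
  maxAgeDays: number; // runs beyond keepLast older than this are deleted
}

// Hypothetical policy evaluation: returns the IDs of runs to delete.
// `now` is passed in rather than read from the clock so the result is reproducible.
function selectRunsToDelete(runs: RunInfo[], policy: RetentionPolicy, now: Date): string[] {
  const cutoff = now.getTime() - policy.maxAgeDays * 24 * 60 * 60 * 1000;
  return [...runs]
    // Newest first; ties broken by run ID so the ordering is stable.
    .sort(
      (a, b) =>
        b.createdAt.getTime() - a.createdAt.getTime() ||
        (a.runId < b.runId ? -1 : a.runId > b.runId ? 1 : 0),
    )
    .slice(policy.keepLast)
    .filter((run) => run.createdAt.getTime() < cutoff)
    .map((run) => run.runId);
}

const toDelete = selectRunsToDelete(
  [
    { runId: "r1", createdAt: new Date("2024-06-29T00:00:00Z") },
    { runId: "r2", createdAt: new Date("2024-06-01T00:00:00Z") },
    { runId: "r3", createdAt: new Date("2024-01-01T00:00:00Z") },
  ],
  { keepLast: 1, maxAgeDays: 30 },
  new Date("2024-06-30T00:00:00Z"),
);
console.log(toDelete); // [ 'r3' ]
```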

Project-Based Workflow Tools

Project Management

indexfoundry_project_create - Create a new project with embedding and chunk config
indexfoundry_project_list - List all projects with optional statistics
indexfoundry_project_get - Get project details, manifest, and sources
indexfoundry_project_delete - Delete a project (requires confirm: true)

Source Management

indexfoundry_project_add_source - Add data source (url/sitemap/folder/pdf) with tags

Build & Query

indexfoundry_project_build - Process all pending sources (fetch, chunk, embed, upsert)
indexfoundry_project_query - Search project's vector database (semantic/keyword/hybrid)
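The `hybrid` mode combines semantic and keyword rankings, but how the two are merged is not stated here. One common technique is reciprocal rank fusion (RRF), sketched below under that assumption with the conventional constant k = 60; the server may use a different fusion method.

```typescript
// Reciprocal rank fusion: every ranked list contributes 1 / (k + rank) for each
// document it contains (rank is 1-based). Higher fused score is better.
function reciprocalRankFusion(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    // Sort by score, then by ID so ties resolve the same way every time.
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

const semantic = ["c3", "c1", "c2"]; // e.g. order from vector similarity
const keyword = ["c1", "c4", "c3"]; // e.g. order from keyword matching
console.log(reciprocalRankFusion([semantic, keyword]).map((r) => r.id)); // [ 'c1', 'c3', 'c4', 'c2' ]
```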

Deployment

indexfoundry_project_export - Generate deployment files (Dockerfile, MCP server, railway.toml)

Directory Structures

Run-Based Structure

runs/<run_id>/
├── manifest.json   # Master audit trail
├── config.json     # Frozen config
├── raw/            # Fetched artifacts
├── extracted/      # Text extraction
├── normalized/     # Chunks
├── indexed/        # Embeddings
├── served/         # API artifacts
└── logs/           # Event logs

Project-Based Structure

projects/<project_id>/
├── project.json       # Project manifest (embedding config, stats)
├── sources.jsonl      # Source records (url/sitemap/folder/pdf)
├── data/
│   ├── chunks.jsonl   # Indexed chunks
│   └── vectors.jsonl  # Generated embeddings
├── runs/              # Per-source build runs
├── src/
│   └── index.ts       # Generated MCP server
├── Dockerfile         # Container configuration
├── railway.toml       # Railway deployment config
├── package.json       # Server dependencies
├── tsconfig.json      # TypeScript config
└── README.md          # Project documentation

Configuration

Environment Variables

# Run-based pipeline
INDEXFOUNDRY_RUNS_DIR=./runs   # Where to store runs

# Embeddings
OPENAI_API_KEY=sk-...          # For OpenAI embeddings
EMBEDDING_API_KEY=sk-...       # Generic env variable (configurable per project)

# Server
PORT=3000                      # For HTTP transport
TRANSPORT=stdio                # stdio or http
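Reading these variables could look like the following sketch, which assumes the values listed above act as defaults when a variable is unset; `loadConfig` is a hypothetical helper, and the server's actual parsing may differ.

```typescript
type Transport = "stdio" | "http";

interface ServerConfig {
  runsDir: string;
  port: number;
  transport: Transport;
}

// Hypothetical loader: falls back to the example values when a variable is
// unset and rejects values that cannot be valid.
function loadConfig(env: Record<string, string | undefined> = process.env): ServerConfig {
  const transport = env.TRANSPORT ?? "stdio";
  if (transport !== "stdio" && transport !== "http") {
    throw new Error(`TRANSPORT must be "stdio" or "http", got "${transport}"`);
  }
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }
  return { runsDir: env.INDEXFOUNDRY_RUNS_DIR ?? "./runs", port, transport: transport as Transport };
}

console.log(loadConfig({ TRANSPORT: "http", PORT: "8080" }));
// { runsDir: './runs', port: 8080, transport: 'http' }
```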

Project Configuration

Projects store configuration in project.json:

{
  "project_id": "my-rag",
  "name": "My RAG Search",
  "embedding_model": {
    "provider": "openai",
    "model_name": "text-embedding-3-small",
    "api_key_env": "OPENAI_API_KEY"
  },
  "chunk_config": {
    "strategy": "recursive",
    "max_chars": 1500,
    "overlap_chars": 150
  }
}
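The shape of `project.json` can be written down as TypeScript types. These interfaces are inferred from the example above and are assumptions, not the server's published schema; the allowed `strategy` values come from the `indexfoundry_normalize_chunk` description, and `validateProject` is a hypothetical check.

```typescript
type ChunkStrategy = "recursive" | "paragraph" | "heading" | "page" | "sentence" | "fixed";

interface ProjectConfig {
  project_id: string;
  name: string;
  embedding_model: {
    provider: string; // e.g. "openai"
    model_name: string; // e.g. "text-embedding-3-small"
    api_key_env: string; // name of the env variable holding the key, not the key itself
  };
  chunk_config: {
    strategy: ChunkStrategy;
    max_chars: number;
    overlap_chars: number;
  };
}

const STRATEGIES: readonly string[] = ["recursive", "paragraph", "heading", "page", "sentence", "fixed"];

// Hypothetical sanity checks to run after JSON.parse; returns a list of problems.
function validateProject(config: ProjectConfig): string[] {
  const errors: string[] = [];
  if (!config.project_id) errors.push("project_id is required");
  if (!config.embedding_model.api_key_env) errors.push("api_key_env is required");
  if (!STRATEGIES.includes(config.chunk_config.strategy)) {
    errors.push(`unknown chunk strategy "${config.chunk_config.strategy}"`);
  }
  const { max_chars, overlap_chars } = config.chunk_config;
  if (!(max_chars > 0)) errors.push("max_chars must be positive");
  if (overlap_chars < 0 || overlap_chars >= max_chars) {
    errors.push("overlap_chars must be at least 0 and smaller than max_chars");
  }
  return errors;
}

const example: ProjectConfig = JSON.parse(`{
  "project_id": "my-rag",
  "name": "My RAG Search",
  "embedding_model": { "provider": "openai", "model_name": "text-embedding-3-small", "api_key_env": "OPENAI_API_KEY" },
  "chunk_config": { "strategy": "recursive", "max_chars": 1500, "overlap_chars": 150 }
}`);
console.log(validateProject(example)); // []
```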

Example Usage

Run-Based Pipeline Example

import { randomUUID } from "node:crypto";

// Create a new run (`client` is an already-connected MCP client)
const runId = randomUUID();
// Connect: fetch from folder
await client.callTool("indexfoundry_connect_folder", {
  run_id: runId,
  path: "/path/to/documents",
  glob: "**/*.pdf"
});
// Extract: PDF to text
await client.callTool("indexfoundry_extract_pdf", {
  run_id: runId,
  pdf_path: "raw/<sha256>.pdf",
  mode: "layout"
});
// Normalize: chunk text
await client.callTool("indexfoundry_normalize_chunk", {
  run_id: runId,
  input_paths: ["extracted/<sha256>.pages.jsonl"],
  strategy: "recursive",
  max_chars: 1500,
  overlap_chars: 150
});
// Index: generate embeddings
await client.callTool("indexfoundry_index_embed", {
  run_id: runId,
  model: {
    provider: "openai",
    model_name: "text-embedding-3-small",
    api_key_env: "OPENAI_API_KEY"
  }
});
// Upsert to local vector DB
await client.callTool("indexfoundry_index_upsert", {
  run_id: runId,
  provider: "local",
  connection: { collection: "my_docs" }
});
// Serve: start HTTP API
await client.callTool("indexfoundry_serve_start", {
  run_id: runId,
  port: 8080
});

Project-Based Workflow Example

// Create a deployable RAG project
await client.callTool("indexfoundry_project_create", {
  project_id: "my-docs-rag",
  name: "Company Documentation Search",
  description: "Searchable knowledge base for internal docs",
  embedding_model: {
    provider: "openai",
    model_name: "text-embedding-3-small",
    api_key_env: "OPENAI_API_KEY"
  },
  chunk_config: {
    strategy: "recursive",
    max_chars: 1500,
    overlap_chars: 150
  }
});
// Add multiple sources
await client.callTool("indexfoundry_project_add_source", {
  project_id: "my-docs-rag",
  url: "https://docs.company.com",
  source_name: "Main Docs",
  tags: ["docs", "internal"]
});
await client.callTool("indexfoundry_project_add_source", {
  project_id: "my-docs-rag",
  folder_path: "/path/to/pdfs",
  source_name: "Policy Documents",
  tags: ["policy", "pdf"]
});
// Build the vector database
await client.callTool("indexfoundry_project_build", {
  project_id: "my-docs-rag"
});
// Query the index
const results = await client.callTool("indexfoundry_project_query", {
  project_id: "my-docs-rag",
  query: "What is the vacation policy?",
  mode: "hybrid",
  top_k: 5,
  filter_tags: ["policy"]
});
// Export for deployment
await client.callTool("indexfoundry_project_export", {
  project_id: "my-docs-rag",
  server_name: "docs-search-server",
  server_description: "Internal documentation search API",
  include_http: true,
  railway_config: true
});

After export, push the project directory to GitHub and deploy it:

cd projects/my-docs-rag
git init
git add .
git commit -m "Initial RAG application"
git branch -M main
git remote add origin <your-github-repo-url>
git push -u origin main
# Then deploy on Railway

Development

# Development with watch mode
npm run dev
# Run tests
npm test
# Lint
npm run lint
# Test with MCP Inspector
npm run inspector

Testing

The MCP server has been validated with end-to-end testing:

✅ Project creation, listing, and retrieval
✅ Source addition (URL, folder, PDF, sitemap)
✅ Build pipeline (fetch → chunk → embed → upsert)
✅ Vector search with semantic, keyword, and hybrid modes
✅ Deployment file generation (Dockerfile, railway.toml, MCP server)

Deployment

Railway Deployment

1. Create and export a project:

await client.callTool("indexfoundry_project_export", { project_id: "my-rag", railway_config: true });

2. Push the project to GitHub and connect the repository to Railway.
3. Railway automatically detects railway.toml and deploys.

Docker Deployment

cd projects/my-rag
docker build -t my-rag-server .
docker run -p 8080:8080 -e OPENAI_API_KEY=sk-... my-rag-server

Determinism Guarantees

Sorted inputs: File lists sorted before processing
Stable IDs: Chunk IDs derived from content + position
Content hashes: SHA256 on every artifact
Pinned versions: Extractor versions locked in config
No randomness: No sampling, shuffling, or non-deterministic algorithms
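Sorted inputs and stable chunk IDs can be combined in a short sketch. The exact ID recipe is an assumption: here the ID is a SHA-256 over the source's content hash, the chunk position, and the chunk text, truncated to 16 hex characters; `sortedPaths` and `chunkId` are hypothetical helpers.

```typescript
import { createHash } from "node:crypto";

// Sort by plain UTF-16 code units rather than localeCompare, whose result can
// vary with the runtime's locale settings.
function sortedPaths(paths: string[]): string[] {
  return [...paths].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}

// Assumed ID scheme: the same source hash, position, and text always yield the
// same ID. NUL separators keep field boundaries unambiguous.
function chunkId(sourceHash: string, position: number, text: string): string {
  return createHash("sha256")
    .update(`${sourceHash}\u0000${position}\u0000${text}`)
    .digest("hex")
    .slice(0, 16);
}

console.log(sortedPaths(["b.pdf", "a.pdf", "C.pdf"])); // [ 'C.pdf', 'a.pdf', 'b.pdf' ]
console.log(chunkId("abc123", 0, "Hello") === chunkId("abc123", 0, "Hello")); // true
```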

License

MIT
