# Salesforce Metadata-Aware RAG MCP
A Model Context Protocol (MCP) server that provides advanced RAG capabilities for Salesforce metadata and code, enabling AI copilots to understand your Salesforce org configuration through intelligent chunking and vector search.
## Features
### Core Salesforce Integration
- **Metadata API Integration**: Access layouts, flows, custom objects, profiles, and permission sets
- **Tooling API Integration**: Retrieve Apex classes, triggers, and validation rules
- **REST API Integration**: Object schema descriptions and SOQL execution
- **Rate Limiting**: Built-in API quota management and retry logic
- **Incremental Sync**: Efficient updates for large orgs
### Advanced RAG Capabilities
- **Intelligent Chunking**: Metadata-aware chunking that splits Apex classes by method, custom objects by field and validation rule, and falls back to a generic chunker for other types
- **Vector Indexing**: PostgreSQL + pgvector for semantic similarity search
- **Keyword Search**: Full-text search with BM25 ranking
- **Symbol Search**: Exact matching for Salesforce objects, fields, and code symbols
- **Hybrid Search**: Combined vector + keyword search with intelligent reranking
### MCP Integration
- **Direct Claude Code Integration**: Real-time Salesforce org exploration
- **Structured Metadata Access**: Type-aware retrieval and processing
- **Symbol Extraction**: Automatic discovery of relationships and dependencies
## Available MCP Tools
- `sf_metadata_list` - List metadata components of specified types
- `sf_tooling_getApexClasses` - Retrieve all Apex classes from the org
- `sf_describe_object` - Describe a Salesforce object schema
- `rag_status` - Get system status and API usage stats
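As an illustration, a client built on the official TypeScript MCP SDK could invoke these tools as follows. This is a minimal sketch: the server path is a placeholder, and the tool arguments mirror the examples later in this README.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio; replace the path with your local build.
const transport = new StdioClientTransport({
  command: "node",
  args: ["/path/to/sfdxrag/dist/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Ask the server to describe the Account object's schema.
const result = await client.callTool({
  name: "sf_describe_object",
  arguments: { objectName: "Account" },
});
console.log(result.content);

await client.close();
```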
## Development Setup
### Prerequisites
- Node.js 18+ and npm
- Docker and Docker Compose (for PostgreSQL + pgvector)
- Salesforce org access (sandbox recommended for testing)
- Connected App or Username/Password authentication
- Python 3.8+ with sentence-transformers (optional, for production embeddings)
### Installation
1. **Install dependencies** (from the root of the cloned repository):
```bash
npm install
```
2. **Configure Salesforce credentials in `.env`:**
```bash
# Copy the example file
cp .env.example .env
# Edit .env with your Salesforce credentials
SF_USERNAME="your_username@company.com"
SF_PASSWORD="your_password"
SF_SECURITY_TOKEN="your_security_token"
SF_LOGIN_URL="https://test.salesforce.com" # Use for sandbox
```
3. **Start PostgreSQL with pgvector:**
```bash
docker compose up -d postgres
```
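Optionally, you can confirm the container is ready before indexing with a quick check using the `pg` driver. The connection defaults below mirror the Environment Variables section; the `CREATE EXTENSION` call is standard pgvector setup and may already be handled by the project's own initialization.

```typescript
import { Client } from "pg";

// Connection defaults mirror the Environment Variables section below.
const db = new Client({
  host: process.env.DB_HOST ?? "localhost",
  port: Number(process.env.DB_PORT ?? 5433),
  database: process.env.DB_NAME ?? "sfdxrag",
  user: process.env.DB_USER ?? "postgres",
  password: process.env.DB_PASSWORD ?? "postgres",
});

await db.connect();
// Standard pgvector setup; the project's own migrations may already run this.
await db.query("CREATE EXTENSION IF NOT EXISTS vector;");
const { rows } = await db.query(
  "SELECT extversion FROM pg_extension WHERE extname = 'vector';",
);
console.log(`pgvector version: ${rows[0]?.extversion ?? "not installed"}`);
await db.end();
```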
4. **Build the project:**
```bash
npm run build
```
### Running the Server
**MCP Server (for Claude Code):**
```bash
npm run dev
```
**Vector Integration Testing:**
```bash
# Test chunking system
node dist/test-chunking.js
# Test full vector integration
node dist/test-vector-integration.js
# Test with live Salesforce data
node dist/test-mcp-chunking.js
```
**Type checking:**
```bash
npm run typecheck
```
### MCP Integration
To integrate with Claude Code or Claude Desktop, add this configuration to your MCP settings:
**For Claude Desktop** (add to `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "salesforce-rag": {
      "command": "node",
      "args": ["/path/to/sfdxrag/dist/index.js"],
      "cwd": "/path/to/sfdxrag",
      "env": {
        "NODE_ENV": "production",
        "SF_LOGIN_URL": "https://your-org.my.salesforce.com/",
        "SF_USERNAME": "your_username@company.com",
        "SF_PASSWORD": "your_password",
        "SF_SECURITY_TOKEN": "your_security_token",
        "DOTENV_SILENT": "true",
        "LOG_LEVEL": "error"
      }
    }
  }
}
```
**For Claude Code** (add to `.mcp.json` in your workspace):
```json
{
  "mcpServers": {
    "salesforce-rag": {
      "command": "npm",
      "args": ["run", "dev"],
      "cwd": "/path/to/sfdxrag",
      "env": {
        "SF_LOGIN_URL": "https://your-org.my.salesforce.com/",
        "SF_USERNAME": "your_username@company.com",
        "SF_PASSWORD": "your_password",
        "SF_SECURITY_TOKEN": "your_security_token",
        "DOTENV_SILENT": "true",
        "LOG_LEVEL": "error"
      }
    }
  }
}
```
**Adding to Claude Code MCP:**
1. **Create or update `.mcp.json`** in your workspace root:
```bash
# Navigate to your project directory
cd /path/to/your-project
# Create .mcp.json with the salesforce-rag server configuration above
# Update the "cwd" path to point to your sfdxrag installation directory
```
2. **Alternative: Use Claude Code MCP command**:
```bash
claude mcp add salesforce-rag \
  --env SF_LOGIN_URL=https://test.salesforce.com \
  --env SF_USERNAME=your_salesforce_username \
  --env SF_PASSWORD=your_salesforce_password \
  --env SF_SECURITY_TOKEN=your_security_token \
  --env NODE_ENV=development \
  --env LOG_LEVEL=info \
  -- npm run dev --prefix /path/to/sfdxrag
```
After setting up:
1. **Restart Claude Code/Desktop** to reload MCP configuration
2. **Test MCP tools**:
- `sf_describe_object` with `{"objectName": "Account"}`
- `sf_metadata_list` with `{"types": ["ApexClass", "Layout"]}`
- `rag_status` to check system health
### Project Structure
```
src/
├── salesforce/                  # Salesforce API clients
│   ├── connection.ts            # Authentication layer
│   ├── metadataClient.ts        # Metadata API wrapper
│   ├── toolingClient.ts         # Tooling API wrapper
│   └── restClient.ts            # REST API wrapper
├── chunking/                    # Metadata chunking system
│   ├── types.ts                 # Core interfaces and types
│   ├── base.ts                  # Base chunker implementation
│   ├── apexChunker.ts           # Apex class method-level chunking
│   ├── customObjectChunker.ts   # Object field-level chunking
│   ├── factory.ts               # Chunker selection factory
│   └── processor.ts             # Main processing pipeline
├── vector/                      # Vector storage and search
│   ├── embedding.ts             # Embedding model interface and implementations
│   └── store.ts                 # PostgreSQL + pgvector client
├── utils/                       # Utilities
│   ├── logger.ts                # Winston logging setup
│   ├── errorHandler.ts          # Global error handling
│   ├── rateLimiter.ts           # API rate limiting
│   └── packageGenerator.ts      # package.xml generation
├── config/                      # Configuration management
│   └── index.ts                 # Environment config loader
├── mcp/                         # MCP server implementation
│   └── server.ts                # MCP tool handlers
└── index.ts                     # Main entry point
```
### Environment Variables
Required for Salesforce connectivity:
- `SF_USERNAME` - Salesforce username
- `SF_PASSWORD` - Salesforce password
- `SF_SECURITY_TOKEN` - Salesforce security token
- `SF_LOGIN_URL` - Login URL (https://login.salesforce.com or https://test.salesforce.com)
Optional configuration:
- `NODE_ENV` - Environment (development/production)
- `LOG_LEVEL` - Logging level (debug/info/warn/error)
- `PORT` - Server port (default: 3000)
- `DB_HOST` - PostgreSQL host (default: localhost)
- `DB_PORT` - PostgreSQL port (default: 5433)
- `DB_NAME` - Database name (default: sfdxrag)
- `DB_USER` - Database user (default: postgres)
- `DB_PASSWORD` - Database password (default: postgres)
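As an illustration, `src/config/index.ts` could assemble these values roughly as follows. This is a sketch only: the actual loader may be organized differently, but the variable names and defaults match the lists above.

```typescript
// Sketch only: the real src/config/index.ts may differ in shape, but the
// environment variable names and defaults below match the lists above.
export interface AppConfig {
  salesforce: {
    username: string;
    password: string;
    securityToken: string;
    loginUrl: string;
  };
  db: { host: string; port: number; name: string; user: string; password: string };
  port: number;
  logLevel: string;
}

function required(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

export function loadConfig(): AppConfig {
  return {
    salesforce: {
      username: required("SF_USERNAME"),
      password: required("SF_PASSWORD"),
      securityToken: required("SF_SECURITY_TOKEN"),
      loginUrl: required("SF_LOGIN_URL"),
    },
    db: {
      host: process.env.DB_HOST ?? "localhost",
      port: Number(process.env.DB_PORT ?? 5433),
      name: process.env.DB_NAME ?? "sfdxrag",
      user: process.env.DB_USER ?? "postgres",
      password: process.env.DB_PASSWORD ?? "postgres",
    },
    port: Number(process.env.PORT ?? 3000),
    logLevel: process.env.LOG_LEVEL ?? "info",
  };
}
```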
## Architecture
### Data Flow
1. **Metadata Extraction**: Retrieve Salesforce metadata via API clients
2. **Intelligent Chunking**: Process metadata using type-specific chunkers
3. **Vector Indexing**: Generate embeddings and store in PostgreSQL + pgvector
4. **Search & Retrieval**: Multi-modal search (vector + keyword + symbol)
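In code, the flow might look like the following sketch. The interfaces are stand-ins for the real modules under `src/`; the method names here are assumptions, not the project's actual API.

```typescript
// Stand-in interfaces for the real modules under src/; method names
// are assumptions, not the project's actual API.
interface Chunk { id: string; text: string; symbols: string[] }
interface ToolingClient { getApexClasses(): Promise<unknown[]> }
interface ChunkProcessor { process(metadata: unknown): Chunk[] }
interface EmbeddingModel { embedBatch(texts: string[]): Promise<number[][]> }
interface VectorStore { upsertChunks(chunks: Chunk[], vectors: number[][]): Promise<void> }

async function indexApexClasses(
  tooling: ToolingClient,
  processor: ChunkProcessor,
  embedder: EmbeddingModel,
  store: VectorStore,
): Promise<void> {
  // 1. Metadata extraction via the Tooling API client
  const classes = await tooling.getApexClasses();
  for (const cls of classes) {
    // 2. Intelligent chunking with a type-specific chunker
    const chunks = processor.process(cls);
    // 3. Vector indexing: embed each chunk and persist it in pgvector
    const vectors = await embedder.embedBatch(chunks.map((c) => c.text));
    await store.upsertChunks(chunks, vectors);
  }
  // 4. Search & retrieval then happens at query time through the MCP tools
}
```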
### Chunking System
The system includes specialized chunkers for different metadata types:
- **ApexChunker**: Splits classes by methods, preserving signatures and docblocks
- **CustomObjectChunker**: Splits objects by fields, validation rules, and metadata
- **GenericChunker**: Fallback for unsupported types
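A rough sketch of the dispatch implied by `factory.ts` follows, with deliberately naive chunking logic; the real chunkers keep signatures, docblocks, and extracted symbols with each chunk.

```typescript
interface Chunk { text: string; kind: string }
interface Chunker { chunk(source: string): Chunk[] }

class ApexChunker implements Chunker {
  // Naive illustration: split a class body at access-modifier boundaries.
  // The real chunker keeps each method's signature and docblock together.
  chunk(source: string): Chunk[] {
    return source
      .split(/\n(?=\s*(?:public|private|protected|global)\s)/)
      .map((text) => ({ text, kind: "apex-method" }));
  }
}

class GenericChunker implements Chunker {
  // Fallback for unsupported types: fixed-size windows.
  chunk(source: string): Chunk[] {
    const chunks: Chunk[] = [];
    for (let i = 0; i < source.length; i += 1500) {
      chunks.push({ text: source.slice(i, i + 1500), kind: "generic" });
    }
    return chunks;
  }
}

// The real factory presumably also routes CustomObject metadata to
// CustomObjectChunker; only two branches are shown here.
function createChunker(metadataType: string): Chunker {
  return metadataType === "ApexClass" ? new ApexChunker() : new GenericChunker();
}
```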
### Search Modes
- **Vector Search**: Semantic similarity using sentence transformers
- **Keyword Search**: Full-text search with BM25 ranking
- **Symbol Search**: Exact matching for Salesforce symbols (objects, fields, classes)
- **Hybrid Search**: Combined search with intelligent reranking (70% vector, 30% keyword)
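The hybrid mode can be pictured as a weighted merge of the two result lists. Only the 70/30 weighting comes from this project; the normalization and merge details below are assumptions.

```typescript
interface Scored { chunkId: string; score: number }

// Assumes both score lists are normalized to a comparable range (e.g. 0-1)
// before merging; only the 70/30 split is documented.
function hybridRerank(vectorHits: Scored[], keywordHits: Scored[]): Scored[] {
  const combined = new Map<string, number>();
  for (const { chunkId, score } of vectorHits) {
    combined.set(chunkId, (combined.get(chunkId) ?? 0) + 0.7 * score);
  }
  for (const { chunkId, score } of keywordHits) {
    combined.set(chunkId, (combined.get(chunkId) ?? 0) + 0.3 * score);
  }
  return [...combined.entries()]
    .map(([chunkId, score]) => ({ chunkId, score }))
    .sort((a, b) => b.score - a.score);
}
```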
## Testing
### Current Test Coverage
✅ **Chunking System**: Apex classes split into method-level chunks with symbol extraction
✅ **Vector Storage**: PostgreSQL + pgvector integration with batch operations
✅ **Search Functions**: Vector, keyword, symbol, and hybrid search working
✅ **MCP Integration**: Live Salesforce data retrieval and processing
✅ **Symbol Detection**: Automatic discovery of custom objects and dependencies
### Example Results
From Apex class analysis:
- **Method-level chunking** with separate chunks for class declaration and each method
- **Symbol extraction** working for custom objects, standard objects, and system calls
- **Search functionality** verified across all modes: semantic, keyword, symbol, hybrid
## Production Deployment
For production use:
- Configure real embedding models using `SentenceTransformerEmbedding`
- Set up persistent PostgreSQL instance with appropriate resource allocation
- Configure proper authentication and security for multi-tenant access
- Implement monitoring and performance optimization for large metadata volumes
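For the first point, the wiring might look like the sketch below. `SentenceTransformerEmbedding` is the class named above, but the constructor options and model choice shown here are illustrative assumptions, not documented options of this project.

```typescript
import { SentenceTransformerEmbedding } from "./vector/embedding.js";

// Hypothetical configuration: the model name and batch size are
// assumptions, not documented options of this project.
const embedder = new SentenceTransformerEmbedding({
  model: "sentence-transformers/all-MiniLM-L6-v2", // compact, widely used default
  batchSize: 64,
});
```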