# Salesforce Metadata-Aware RAG MCP
A Model Context Protocol (MCP) server that provides advanced RAG capabilities for Salesforce metadata and code, enabling AI copilots to understand your Salesforce org configuration through intelligent chunking and vector search.
## Features
### Core Salesforce Integration
- **Metadata API Integration**: Access layouts, flows, custom objects, profiles, and permission sets
- **Tooling API Integration**: Retrieve Apex classes, triggers, and validation rules
- **REST API Integration**: Object schema descriptions and SOQL execution
- **Rate Limiting**: Built-in API quota management and retry logic
- **Incremental Sync**: Efficient updates for large orgs
### Advanced RAG Capabilities
- **Intelligent Chunking**: Metadata-aware chunking that splits Apex classes by method, custom objects by field and validation rule, and falls back to a generic chunker for other types
- **Vector Indexing**: PostgreSQL + pgvector for semantic similarity search
- **Keyword Search**: Full-text search with BM25 ranking
- **Symbol Search**: Exact matching for Salesforce objects, fields, and code symbols
- **Hybrid Search**: Combined vector + keyword search with intelligent reranking
### MCP Integration
- **Direct Claude Code Integration**: Real-time Salesforce org exploration
- **Structured Metadata Access**: Type-aware retrieval and processing
- **Symbol Extraction**: Automatic discovery of relationships and dependencies
## Available MCP Tools
- `sf_metadata_list` - List metadata components of specified types
- `sf_tooling_getApexClasses` - Retrieve all Apex classes from the org
- `sf_describe_object` - Describe a Salesforce object schema
- `rag_status` - Get system status and API usage stats
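As an illustration, a client built on the official TypeScript MCP SDK could invoke these tools as follows. This is a minimal sketch: the server path is a placeholder, and the tool arguments mirror the examples later in this README.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio; replace the path with your local build.
const transport = new StdioClientTransport({
  command: "node",
  args: ["/path/to/sfdxrag/dist/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Ask the server to describe the Account object's schema.
const result = await client.callTool({
  name: "sf_describe_object",
  arguments: { objectName: "Account" },
});
console.log(result.content);

await client.close();
```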
## Development Setup
### Prerequisites
- Node.js 18+ and npm
- Docker and Docker Compose (for PostgreSQL + pgvector)
- Salesforce org access (sandbox recommended for testing)
- Connected App or Username/Password authentication
- Python 3.8+ with sentence-transformers (optional, for production embeddings)
### Installation
1. **Install dependencies** (from the root of the cloned repository):
```bash
npm install
```
2. **Configure Salesforce credentials in `.env`:**
```bash
# Copy the example file
cp .env.example .env
# Edit .env with your Salesforce credentials
SF_USERNAME="your_username@company.com"
SF_PASSWORD="your_password"
SF_SECURITY_TOKEN="your_security_token"
SF_LOGIN_URL="https://test.salesforce.com" # Use for sandbox
```
3. **Start PostgreSQL with pgvector:**
```bash
docker compose up -d postgres
```
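Optionally, you can confirm the container is ready before indexing with a quick check using the `pg` driver. The connection defaults below mirror the Environment Variables section; the `CREATE EXTENSION` call is standard pgvector setup and may already be handled by the project's own initialization.

```typescript
import { Client } from "pg";

// Connection defaults mirror the Environment Variables section below.
const db = new Client({
  host: process.env.DB_HOST ?? "localhost",
  port: Number(process.env.DB_PORT ?? 5433),
  database: process.env.DB_NAME ?? "sfdxrag",
  user: process.env.DB_USER ?? "postgres",
  password: process.env.DB_PASSWORD ?? "postgres",
});

await db.connect();
// Standard pgvector setup; the project's own migrations may already run this.
await db.query("CREATE EXTENSION IF NOT EXISTS vector;");
const { rows } = await db.query(
  "SELECT extversion FROM pg_extension WHERE extname = 'vector';",
);
console.log(`pgvector version: ${rows[0]?.extversion ?? "not installed"}`);
await db.end();
```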
4. **Build the project:**
```bash
npm run build
```
### Running the Server
**MCP Server (for Claude Code):**
```bash
npm run dev
```
**Vector Integration Testing:**
```bash
# Test chunking system
node dist/test-chunking.js
# Test full vector integration
node dist/test-vector-integration.js
# Test with live Salesforce data
node dist/test-mcp-chunking.js
```
**Type checking:**
```bash
npm run typecheck
```
### MCP Integration
To integrate with Claude Code or Claude Desktop, add this configuration to your MCP settings:
**For Claude Desktop** (add to `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "salesforce-rag": {
      "command": "node",
      "args": ["/path/to/sfdxrag/dist/index.js"],
      "cwd": "/path/to/sfdxrag",
      "env": {
        "NODE_ENV": "production",
        "SF_LOGIN_URL": "https://your-org.my.salesforce.com/",
        "SF_USERNAME": "your_username@company.com",
        "SF_PASSWORD": "your_password",
        "SF_SECURITY_TOKEN": "your_security_token",
        "DOTENV_SILENT": "true",
        "LOG_LEVEL": "error"
      }
    }
  }
}
```
**For Claude Code** (add to `.mcp.json` in your workspace):
```json
{
  "mcpServers": {
    "salesforce-rag": {
      "command": "npm",
      "args": ["run", "dev"],
      "cwd": "/path/to/sfdxrag",
      "env": {
        "SF_LOGIN_URL": "https://your-org.my.salesforce.com/",
        "SF_USERNAME": "your_username@company.com",
        "SF_PASSWORD": "your_password",
        "SF_SECURITY_TOKEN": "your_security_token",
        "DOTENV_SILENT": "true",
        "LOG_LEVEL": "error"
      }
    }
  }
}
```
**Adding to Claude Code MCP:**
1. **Create or update `.mcp.json`** in your workspace root:
```bash
# Navigate to your project directory
cd /path/to/your-project
# Create .mcp.json with the salesforce-rag server configuration above
# Update the "cwd" path to point to your sfdxrag installation directory
```
2. **Alternative: Use Claude Code MCP command**:
```bash
claude mcp add salesforce-rag \
  --env SF_LOGIN_URL=https://test.salesforce.com \
  --env SF_USERNAME=your_salesforce_username \
  --env SF_PASSWORD=your_salesforce_password \
  --env SF_SECURITY_TOKEN=your_security_token \
  --env NODE_ENV=development \
  --env LOG_LEVEL=info \
  -- npm run dev --prefix /path/to/sfdxrag
```
After setting up:
1. **Restart Claude Code/Desktop** to reload MCP configuration
2. **Test MCP tools**:
- `sf_describe_object` with `{"objectName": "Account"}`
- `sf_metadata_list` with `{"types": ["ApexClass", "Layout"]}`
- `rag_status` to check system health
### Project Structure
```
src/
├── salesforce/                  # Salesforce API clients
│   ├── connection.ts            # Authentication layer
│   ├── metadataClient.ts        # Metadata API wrapper
│   ├── toolingClient.ts         # Tooling API wrapper
│   └── restClient.ts            # REST API wrapper
├── chunking/                    # Metadata chunking system
│   ├── types.ts                 # Core interfaces and types
│   ├── base.ts                  # Base chunker implementation
│   ├── apexChunker.ts           # Apex class method-level chunking
│   ├── customObjectChunker.ts   # Object field-level chunking
│   ├── factory.ts               # Chunker selection factory
│   └── processor.ts             # Main processing pipeline
├── vector/                      # Vector storage and search
│   ├── embedding.ts             # Embedding model interface and implementations
│   └── store.ts                 # PostgreSQL + pgvector client
├── utils/                       # Utilities
│   ├── logger.ts                # Winston logging setup
│   ├── errorHandler.ts          # Global error handling
│   ├── rateLimiter.ts           # API rate limiting
│   └── packageGenerator.ts      # package.xml generation
├── config/                      # Configuration management
│   └── index.ts                 # Environment config loader
├── mcp/                         # MCP server implementation
│   └── server.ts                # MCP tool handlers
└── index.ts                     # Main entry point
```
### Environment Variables
Required for Salesforce connectivity:
- `SF_USERNAME` - Salesforce username
- `SF_PASSWORD` - Salesforce password
- `SF_SECURITY_TOKEN` - Salesforce security token
- `SF_LOGIN_URL` - Login URL (https://login.salesforce.com or https://test.salesforce.com)
Optional configuration:
- `NODE_ENV` - Environment (development/production)
- `LOG_LEVEL` - Logging level (debug/info/warn/error)
- `PORT` - Server port (default: 3000)
- `DB_HOST` - PostgreSQL host (default: localhost)
- `DB_PORT` - PostgreSQL port (default: 5433)
- `DB_NAME` - Database name (default: sfdxrag)
- `DB_USER` - Database user (default: postgres)
- `DB_PASSWORD` - Database password (default: postgres)
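As an illustration, `src/config/index.ts` could assemble these values roughly as follows. This is a sketch only: the actual loader may be organized differently, but the variable names and defaults match the lists above.

```typescript
// Sketch only: the real src/config/index.ts may differ in shape, but the
// environment variable names and defaults below match the lists above.
export interface AppConfig {
  salesforce: {
    username: string;
    password: string;
    securityToken: string;
    loginUrl: string;
  };
  db: { host: string; port: number; name: string; user: string; password: string };
  port: number;
  logLevel: string;
}

function required(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

export function loadConfig(): AppConfig {
  return {
    salesforce: {
      username: required("SF_USERNAME"),
      password: required("SF_PASSWORD"),
      securityToken: required("SF_SECURITY_TOKEN"),
      loginUrl: required("SF_LOGIN_URL"),
    },
    db: {
      host: process.env.DB_HOST ?? "localhost",
      port: Number(process.env.DB_PORT ?? 5433),
      name: process.env.DB_NAME ?? "sfdxrag",
      user: process.env.DB_USER ?? "postgres",
      password: process.env.DB_PASSWORD ?? "postgres",
    },
    port: Number(process.env.PORT ?? 3000),
    logLevel: process.env.LOG_LEVEL ?? "info",
  };
}
```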
## Architecture
### Data Flow
1. **Metadata Extraction**: Retrieve Salesforce metadata via API clients
2. **Intelligent Chunking**: Process metadata using type-specific chunkers
3. **Vector Indexing**: Generate embeddings and store in PostgreSQL + pgvector
4. **Search & Retrieval**: Multi-modal search (vector + keyword + symbol)
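In code, the flow might look like the following sketch. The interfaces are stand-ins for the real modules under `src/`; the method names here are assumptions, not the project's actual API.

```typescript
// Stand-in interfaces for the real modules under src/; method names
// are assumptions, not the project's actual API.
interface Chunk { id: string; text: string; symbols: string[] }
interface ToolingClient { getApexClasses(): Promise<unknown[]> }
interface ChunkProcessor { process(metadata: unknown): Chunk[] }
interface EmbeddingModel { embedBatch(texts: string[]): Promise<number[][]> }
interface VectorStore { upsertChunks(chunks: Chunk[], vectors: number[][]): Promise<void> }

async function indexApexClasses(
  tooling: ToolingClient,
  processor: ChunkProcessor,
  embedder: EmbeddingModel,
  store: VectorStore,
): Promise<void> {
  // 1. Metadata extraction via the Tooling API client
  const classes = await tooling.getApexClasses();
  for (const cls of classes) {
    // 2. Intelligent chunking with a type-specific chunker
    const chunks = processor.process(cls);
    // 3. Vector indexing: embed each chunk and persist it in pgvector
    const vectors = await embedder.embedBatch(chunks.map((c) => c.text));
    await store.upsertChunks(chunks, vectors);
  }
  // 4. Search & retrieval then happens at query time through the MCP tools
}
```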
### Chunking System
The system includes specialized chunkers for different metadata types:
- **ApexChunker**: Splits classes by methods, preserving signatures and docblocks
- **CustomObjectChunker**: Splits objects by fields, validation rules, and metadata
- **GenericChunker**: Fallback for unsupported types
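A rough sketch of the dispatch implied by `factory.ts` follows, with deliberately naive chunking logic; the real chunkers keep signatures, docblocks, and extracted symbols with each chunk.

```typescript
interface Chunk { text: string; kind: string }
interface Chunker { chunk(source: string): Chunk[] }

class ApexChunker implements Chunker {
  // Naive illustration: split a class body at access-modifier boundaries.
  // The real chunker keeps each method's signature and docblock together.
  chunk(source: string): Chunk[] {
    return source
      .split(/\n(?=\s*(?:public|private|protected|global)\s)/)
      .map((text) => ({ text, kind: "apex-method" }));
  }
}

class GenericChunker implements Chunker {
  // Fallback for unsupported types: fixed-size windows.
  chunk(source: string): Chunk[] {
    const chunks: Chunk[] = [];
    for (let i = 0; i < source.length; i += 1500) {
      chunks.push({ text: source.slice(i, i + 1500), kind: "generic" });
    }
    return chunks;
  }
}

// The real factory presumably also routes CustomObject metadata to
// CustomObjectChunker; only two branches are shown here.
function createChunker(metadataType: string): Chunker {
  return metadataType === "ApexClass" ? new ApexChunker() : new GenericChunker();
}
```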
### Search Modes
- **Vector Search**: Semantic similarity using sentence transformers
- **Keyword Search**: Full-text search with BM25 ranking
- **Symbol Search**: Exact matching for Salesforce symbols (objects, fields, classes)
- **Hybrid Search**: Combined search with intelligent reranking (70% vector, 30% keyword)
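The hybrid mode can be pictured as a weighted merge of the two result lists. Only the 70/30 weighting comes from this project; the normalization and merge details below are assumptions.

```typescript
interface Scored { chunkId: string; score: number }

// Assumes both score lists are normalized to a comparable range (e.g. 0-1)
// before merging; only the 70/30 split is documented.
function hybridRerank(vectorHits: Scored[], keywordHits: Scored[]): Scored[] {
  const combined = new Map<string, number>();
  for (const { chunkId, score } of vectorHits) {
    combined.set(chunkId, (combined.get(chunkId) ?? 0) + 0.7 * score);
  }
  for (const { chunkId, score } of keywordHits) {
    combined.set(chunkId, (combined.get(chunkId) ?? 0) + 0.3 * score);
  }
  return [...combined.entries()]
    .map(([chunkId, score]) => ({ chunkId, score }))
    .sort((a, b) => b.score - a.score);
}
```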
## Testing
### Current Test Coverage
✅ **Chunking System**: Apex classes split into method-level chunks with symbol extraction
✅ **Vector Storage**: PostgreSQL + pgvector integration with batch operations
✅ **Search Functions**: Vector, keyword, symbol, and hybrid search working
✅ **MCP Integration**: Live Salesforce data retrieval and processing
✅ **Symbol Detection**: Automatic discovery of custom objects and dependencies
### Example Results
From Apex class analysis:
- **Method-level chunking** with separate chunks for class declaration and each method
- **Symbol extraction** working for custom objects, standard objects, and system calls
- **Search functionality** verified across all modes: semantic, keyword, symbol, hybrid
## Production Deployment
For production use:
- Configure real embedding models using `SentenceTransformerEmbedding`
- Set up persistent PostgreSQL instance with appropriate resource allocation
- Configure proper authentication and security for multi-tenant access
- Implement monitoring and performance optimization for large metadata volumes
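For the first point, the wiring might look like the sketch below. `SentenceTransformerEmbedding` is the class named above, but the constructor options and model choice shown here are illustrative assumptions, not documented options of this project.

```typescript
import { SentenceTransformerEmbedding } from "./vector/embedding.js";

// Hypothetical configuration: the model name and batch size are
// assumptions, not documented options of this project.
const embedder = new SentenceTransformerEmbedding({
  model: "sentence-transformers/all-MiniLM-L6-v2", // compact, widely used default
  batchSize: 64,
});
```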