Salesforce Metadata-Aware RAG MCP

PRD.md•6.39 KiB

# Product Requirements Document (PRD)

## Product: Salesforce Metadata-Aware RAG MCP

### Author: Fascinating Concepts

### Date: August 2025

---

## 1. Overview

The Salesforce Metadata-Aware RAG MCP is a system that consumes Salesforce metadata and code using the Metadata API, Tooling API, and REST/Describe endpoints. It normalizes and chunks this content into RAG-ready artifacts stored in hybrid indexes (vector + BM25/FTS). An MCP interface exposes ingestion, retrieval, and where-used tools to developers and AI copilots, enabling Salesforce org-aware assistance with citations.

---

## 2. Goals & Objectives

- **Enable metadata ingestion**: Apex, Triggers, Layouts, Flows, Validation Rules, Profiles, PermissionSets, Custom Objects & Fields.
- **Support Salesforce-aware intelligence**: Ensure answers reflect org-specific configuration and data models.
- **Improve developer productivity**: Reduce time spent searching metadata/code and resolving errors.
- **Provide RAG foundation for copilots**: Supply grounded knowledge to VS Code, Slack, or in-org copilots via MCP.

---

## 3. Use Cases

1. **Where-Used Lookups**: “Where is `Account.Industry` used?” → Apex, layouts, flows, validation rules, profiles/permsets.
2. **Schema Exploration**: “What fields exist on Opportunity?” → REST `describeSObject` with citations.
3. **Error Tracing**: Developer pastes error → map stack trace to Apex classes and relevant metadata.
4. **SOQL Guidance**: Retrieve metadata to generate safe SOQL with field validation and picklist awareness.
5. **FLS/CRUD Checks**: Validate code snippets against profile and permission set access.

---

## 4. Functional Requirements

### 4.1 Metadata Ingestion

- **Metadata API**: `listMetadata`, `readMetadata`, `retrieve(package.xml)` for Layouts, Flows, CustomObjects, Profiles, PermissionSets.
- **Tooling API**: ApexClass, ApexTrigger, ValidationRule.
- **REST/Describe APIs**: Object schema, FLS/CRUD, picklists.
- **SOQL (optional)**: Sample data retrieval for context.

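
As a rough illustration of the `retrieve(package.xml)` path in 4.1, the sketch below builds a manifest for the metadata types listed there. `buildPackageXml`, `ManifestType`, and the API version `61.0` are assumptions made for this example and are not part of the PRD. Wildcard support varies by metadata type, so treat the member lists as placeholders.

```typescript
// Hypothetical helper: builds a package.xml manifest for a Metadata API retrieve.
// Type names come from section 4.1; the API version is an assumed placeholder.
interface ManifestType {
  name: string;      // Metadata type name, e.g. "Layout"
  members: string[]; // Component names, or ["*"] where the type supports wildcards
}

// Escape the five XML special characters so values are safe as element text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildPackageXml(types: ManifestType[], apiVersion: string): string {
  // One <types> block per metadata type: its members first, then the type name.
  const typeBlocks = types.map((t) => {
    const members = t.members
      .map((m) => `        <members>${escapeXml(m)}</members>`)
      .join("\n");
    return `    <types>\n${members}\n        <name>${escapeXml(t.name)}</name>\n    </types>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Package xmlns="http://soap.sforce.com/2006/04/metadata">',
    ...typeBlocks,
    `    <version>${escapeXml(apiVersion)}</version>`,
    "</Package>",
  ].join("\n");
}

// Example manifest covering the Metadata API types named in 4.1.
const phaseOneManifest = buildPackageXml(
  [
    { name: "Layout", members: ["*"] },
    { name: "Flow", members: ["*"] },
    // Standard objects are listed by name; "*" here would match only custom objects.
    { name: "CustomObject", members: ["Account", "Opportunity", "*"] },
    { name: "Profile", members: ["*"] },
    { name: "PermissionSet", members: ["*"] },
  ],
  "61.0",
);
```

The resulting string could be written to `package.xml` and handed to whichever retrieve call the ingestion layer ends up using. Standard objects such as `Account` are listed explicitly because a `*` wildcard on `CustomObject` does not include them.
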

### 4.2 Normalization & Chunking

- Apex → per method (include signature + docblock).
- Triggers → per event section.
- LWC → pair `.js + .html`; keep `*-meta.xml` separate.
- Layouts → per section.
- Validation Rules → per rule.
- Flows → per element (with connectors).
- Profiles/PermSets → per object CRUD/FLS block + ApexClassAccess.
- CustomObject/Field → per field (include picklists).

### 4.3 Feature Enrichment

- Extract symbols: `Object`, `Object.Field`, `Schema.SObjectType.X`.
- Identify DML ops, SOQL tables, FLS/CRUD usage.
- Build edges: `(Object.Field) → (Apex/Flow/Layout/Profile)`.

### 4.4 Indexing

- **Vector store** (pgvector/Qdrant) for semantic retrieval.
- **Keyword index** (Postgres FTS/OpenSearch) for symbol precision.
- Hybrid retrieval (union → dedup → rerank).

### 4.5 MCP Tools

- `sf.metadata.list/read/retrieve`
- `sf.tooling.soql`, `sf.describe.object`, `sf.soql`
- `rag.ingest_org`, `rag.search`, `rag.where_used`, `rag.open`, `rag.status`

---

## 5. Non-Functional Requirements

- **Performance**: Retrieval in <1.0s (p95); ingestion batch in <15m for mid-sized org.
- **Scalability**: Handle orgs with 100k+ metadata items.
- **Reliability**: Retry failed API calls, exponential backoff, API quota respect.
- **Security**: OAuth/JWT auth with read-only integration user; tenant isolation by org_id; encrypted storage; strip PII.

---

## 6. Roadmap

### Phase 1 (MVP)

- Basic ingestion (Apex, Layouts, Validation Rules, Objects).
- Chunk + embed + index.
- Expose MCP: `sf.metadata.*`, `rag.ingest_org`, `rag.search`, `rag.status`.

### Phase 2

- Add Profiles, PermissionSets, Flows.
- Implement hybrid retrieval with rerank.
- Add `rag.where_used`.

### Phase 3

- VS Code & Slack integration.
- Error-to-metadata mapping.
- PR impact summarization.

### Phase 4

- Multi-org tenancy.
- Governance layer (citation enforcement, hallucination guard).
- Graph DB for advanced where-used analysis.

---

## 7. Success Metrics

- **Coverage**: ≥ 90% of key metadata types ingested.
- **Accuracy**: ≥ 90% of answers cite valid metadata/code.
- **Adoption**: ≥ 50% dev team weekly use within 3 months.
- **Efficiency**: ≥ 30% reduction in time spent searching code/metadata.

---

## 8. Risks & Mitigations

- **API Limits**: Use incremental deltas + batch `listMetadata`.
- **Large Orgs**: Parallelize ingestion; shard indexes.
- **Security**: Guard against PII embedding; enforce org_id isolation.
- **Complexity Drift**: Enforce modular chunkers; validate with evaluation set.

---

## 9. Recommended Libraries & Projects

### 9.1 Salesforce Connectivity

- **[JSforce](https://github.com/jsforce/jsforce)** (Node.js)
  Mature and well-maintained; covers **Metadata API, Tooling API, REST, Bulk** in one SDK.
  → Recommended as the single ingestion layer.

### 9.2 Document Parsing & Normalization

- **[xml2js](https://github.com/Leonidas-from-XIV/node-xml2js)** (Node)
  Reliable XML parsing for Salesforce metadata (Layouts, Flows, Profiles, etc.).
  → Use alongside custom **Salesforce-aware chunkers** (per method, layout section, flow element, etc.).

### 9.3 Embeddings

- **[sentence-transformers](https://github.com/UKPLab/sentence-transformers)** (Python)
  Hugging Face embeddings (e.g., `all-MiniLM-L6-v2`): open source, fast, and a good balance of accuracy vs. cost.
  → Recommended default for local/self-hosted; can swap to OpenAI/Cohere if SaaS is acceptable.

### 9.4 Vector & Keyword Indexing

- **[pgvector](https://github.com/pgvector/pgvector)** (Postgres extension)
  Adds vector search to Postgres, so **vector search + built-in FTS** can live in one database. Note that built-in Postgres FTS ranks with `ts_rank`/`ts_rank_cd` rather than BM25; true BM25 scoring would need an additional extension or a separate engine such as OpenSearch.
  → Simplifies ops (one datastore instead of Qdrant + Elastic).

### 9.5 Retrieval & RAG Orchestration

- **[LangChain](https://github.com/langchain-ai/langchain)**
  Widely adopted; has retrievers, rerankers, chunkers, and vector store integrations.
  → Use only as plumbing inside MCP tools (`rag.search`, `rag.where_used`), not as the outer protocol.

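
The hybrid retrieval flow from section 4.4 (union → dedup → rerank), which `rag.search` would wrap, can be sketched without any framework. The PRD does not name a reranker, so the example below substitutes reciprocal rank fusion (RRF) as a simple stand-in. `fuseRankings`, the `k = 60` constant (a commonly used RRF default), the source labels, and the chunk IDs are all illustrative assumptions.

```typescript
// One ranked result list per retriever, best match first.
interface RankedList {
  source: string;     // e.g. "vector" or "keyword" (labels are illustrative)
  chunkIds: string[];
}

interface FusedHit {
  chunkId: string;
  score: number;
  sources: string[];  // which retrievers returned this chunk
}

// Union -> dedup -> rerank. RRF stands in for the rerank step here;
// a cross-encoder or other reranker could replace the scoring line.
function fuseRankings(lists: RankedList[], k: number = 60, limit: number = 10): FusedHit[] {
  const byId = new Map<string, FusedHit>();
  for (const list of lists) {
    const seen = new Set<string>(); // ignore repeats within a single list
    list.chunkIds.forEach((chunkId, index) => {
      if (seen.has(chunkId)) return;
      seen.add(chunkId);
      // Union + dedup: one entry per chunk ID across all lists.
      const hit: FusedHit = byId.get(chunkId) ?? { chunkId, score: 0, sources: [] };
      hit.score += 1 / (k + index + 1); // position in the list, 1-based
      hit.sources.push(list.source);
      byId.set(chunkId, hit);
    });
  }
  // Rerank: highest fused score first, chunk ID as a deterministic tiebreak.
  return Array.from(byId.values())
    .sort((a, b) => b.score - a.score || a.chunkId.localeCompare(b.chunkId))
    .slice(0, limit);
}

// Hypothetical chunk IDs for a query about Account.Industry.
const fusedExample = fuseRankings([
  {
    source: "vector",
    chunkIds: ["apex:AccountService.updateIndustry", "flow:Lead_Routing", "layout:Account-Sales"],
  },
  {
    source: "keyword",
    chunkIds: ["layout:Account-Sales", "vr:Account.Industry_Required"],
  },
]);
// The layout chunk ranks first because both retrievers returned it.
```

Keeping `sources` on each hit makes it easy to see whether a result came from semantic search, keyword search, or both, which may help when tuning the symbol-precision goal in 4.4.
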

### 9.6 MCP (Model Context Protocol)

- **[MCP Node SDK](https://github.com/modelcontextprotocol/ts-sdk)**
  Official TypeScript SDK; integrates cleanly with JSforce ingestion and Postgres/pgvector retrieval.
  → Recommended for implementing the `sf.*` and `rag.*` tools.

### 9.7 Observability & Evaluation

- **[DeepEval](https://github.com/confident-ai/deepeval)**
  Purpose-built for RAG evaluation (Recall@k, MRR, groundedness).
  → Lightweight and open source; ideal for building a Salesforce-specific golden set.
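
To make two of the metrics named in 9.7 concrete, here is a hand-rolled sketch of Recall@k and MRR over a tiny golden set. It does not use DeepEval's API; it only shows what the numbers mean. `GoldenCase`, `evaluateGoldenSet`, and every chunk ID below are hypothetical, with the queries borrowed from the use cases in section 3.

```typescript
interface GoldenCase {
  query: string;
  relevant: string[];  // chunk IDs a correct answer should cite
  retrieved: string[]; // chunk IDs returned by retrieval, best first
}

// Fraction of the relevant chunks that appear in the top k results.
function recallAtK(c: GoldenCase, k: number): number {
  const relevant = new Set(c.relevant);
  if (relevant.size === 0) return 0;
  const topK = new Set(c.retrieved.slice(0, k));
  let found = 0;
  relevant.forEach((id) => {
    if (topK.has(id)) found += 1;
  });
  return found / relevant.size;
}

// 1 / rank of the first relevant result, or 0 if none was retrieved.
function reciprocalRank(c: GoldenCase): number {
  const relevant = new Set(c.relevant);
  const idx = c.retrieved.findIndex((id) => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Average both metrics over the whole golden set.
function evaluateGoldenSet(cases: GoldenCase[], k: number): { recallAtK: number; mrr: number } {
  if (cases.length === 0) return { recallAtK: 0, mrr: 0 };
  const mean = (values: number[]) => values.reduce((sum, v) => sum + v, 0) / values.length;
  return {
    recallAtK: mean(cases.map((c) => recallAtK(c, k))),
    mrr: mean(cases.map((c) => reciprocalRank(c))),
  };
}

// Hypothetical golden set; IDs are placeholders, not real org metadata.
const goldenSample: GoldenCase[] = [
  {
    query: "Where is Account.Industry used?",
    relevant: ["vr:Account.Industry_Required", "layout:Account-Sales"],
    retrieved: ["layout:Account-Sales", "apex:AccountService.updateIndustry", "vr:Account.Industry_Required"],
  },
  {
    query: "What fields exist on Opportunity?",
    relevant: ["object:Opportunity"],
    retrieved: ["object:Account", "object:Opportunity"],
  },
];

const sampleReport = evaluateGoldenSet(goldenSample, 2);
```

A golden set in this shape could be versioned alongside the chunkers, so that a change to chunking or retrieval can be checked against the same queries before release.
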
