Berlin Group MCP Server
A Model Context Protocol (MCP) server that provides Berlin Group Open Finance API specifications as contextual information to AI assistants in VS Code and IntelliJ IDEA.
Overview
This MCP server loads and indexes Berlin Group Open Finance specifications from OpenAPI YAML files and PDF documentation, enabling LLMs to provide accurate, specification-compliant guidance during Open Finance Framework implementation.
The server features advanced AI-powered capabilities including:
Semantic Search via ChromaDB for intelligent, context-aware document retrieval
Graph Database via Neo4j for exploring complex relationships between API endpoints, schemas, and data models
Vector Embeddings for natural language queries across PDF documentation
Relationship Traversal for understanding dependencies and schema inheritance
Features
Complete Specification Access: Loads all Berlin Group OpenAPI specs (AIS, PIS, PIIS, BASK, Consent, etc.)
Powerful Search: Search across endpoints, schemas, and PDF documentation
Smart Filtering: Filter endpoints by method, tag, or specification
PDF Support: Extract and search content from implementation guides
Semantic Search: AI-powered semantic search using ChromaDB vector embeddings for natural language queries
Graph Database: Neo4j integration for exploring complex relationships and dependencies
Relationship Traversal: Navigate through schema references, endpoint dependencies, and API interconnections
Intelligent Text Chunking: Splits PDF documents into semantically meaningful chunks for better retrieval
Automatic Fallback: Gracefully falls back to in-memory storage when ChromaDB or Neo4j are unavailable
24 MCP Tools: Comprehensive toolset including 12 core tools + 6 semantic search tools + 6 graph database tools
Multi-IDE Support: Works in VS Code and IntelliJ IDEA
Available Specifications
The server indexes the following Berlin Group specifications:
Account Information Services (AIS) v2.3
Payment Initiation Services (PIS) v2.3
Confirmation of Funds (PIIS) v2.3
Bank Account Status Services (BASK) v2.2
Consent Management v2.1
Data Dictionary v2.3.1
Payment Update Status Hub (PUSH) v2.2
Installation
Prerequisites
Node.js v18 or higher
npm or yarn
VS Code or IntelliJ IDEA with MCP support
Optional: ChromaDB server for semantic search (runs on localhost:8000 by default)
Optional: Neo4j database for graph queries (runs on localhost:7687 by default)
Setup
Clone or navigate to the project directory:
cd path-of-the-repo/Berlin-group-mcp
Install dependencies:
npm install
Build the project:
npm run build
Configuration
VS Code
The configuration file is already created at .vscode/mcp-settings.json
Update the path if needed to match your project location:
{
  "mcpServers": {
    "berlin-group": {
      "command": "node",
      "args": [
        "absolute-path-of-the-repo/Berlin-group-mcp/build/index.js"
      ]
    }
  }
}
Restart VS Code or reload the window
The Berlin Group tools should now be available in GitHub Copilot Chat
IntelliJ IDEA
See INTELLIJ_SETUP.md for detailed configuration instructions.
Dependencies
The project uses the following key dependencies:
Core Dependencies
@modelcontextprotocol/sdk (^1.0.4): MCP protocol implementation
js-yaml (^4.1.0): YAML parsing for OpenAPI specifications
pdf-parse (^1.1.1): PDF document text extraction
Advanced Features
chromadb (^1.8.1): Vector database client for semantic search
Enables AI-powered document retrieval
Optional: Falls back to in-memory storage if unavailable
neo4j-driver (^5.27.0): Neo4j graph database driver
Enables complex relationship queries
Optional: Falls back to in-memory graph if unavailable
Development Dependencies
typescript (^5.7.3): TypeScript compiler
jest (^29.7.0): Testing framework
ts-jest (^29.1.2): TypeScript support for Jest
Type definitions for all major dependencies
All dependencies are automatically installed with npm install.
Optional: External Database Configuration
The Berlin Group MCP Server can optionally use external databases for enhanced capabilities. Both are optional: without them, the server falls back to in-memory storage.
Configuration Methods
The server supports two methods for configuration:
Environment Variables (Recommended): Create a .env file in the project root
Direct Configuration: Modify the configuration in src/index.ts
Environment Variables
Copy the .env.example file to .env and customize:
cp .env.example .env
Then edit .env with your settings:
# ChromaDB Configuration (for Semantic Search)
CHROMA_HOST=localhost
CHROMA_PORT=8000
CHROMA_COLLECTION=berlin_group_pdfs
# OpenAI Configuration (for embeddings)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Neo4j Configuration (for Graph Database)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
NEO4J_DATABASE=neo4j
NEO4J_MAX_POOL_SIZE=50
NEO4J_CONNECTION_TIMEOUT=60000
Direct Configuration
Alternatively, you can modify the configuration directly in src/index.ts:
const indexer = new SpecificationIndexer({
  vectorStore: {
    chromaHost: 'localhost',
    chromaPort: 8000,
    collectionName: 'my_collection',
    embeddingModel: 'text-embedding-3-small'
  },
  graphStore: {
    uri: 'bolt://localhost:7687',
    username: 'neo4j',
    password: 'password',
    database: 'neo4j',
    maxConnectionPoolSize: 50,
    connectionAcquisitionTimeout: 60000
  }
});
ChromaDB (for Semantic Search)
ChromaDB enables AI-powered semantic search across PDF documentation using vector embeddings.
Installation:
# Using pip
pip install chromadb
# Or using Docker
docker run -d -p 8000:8000 chromadb/chroma
Default Configuration:
Host: localhost
Port: 8000
Collection: berlin_group_pdfs
The server automatically connects during initialization. If ChromaDB is unavailable, semantic search falls back to keyword matching.
Neo4j (for Graph Database)
Neo4j enables complex relationship queries and graph traversal across specifications, endpoints, and schemas.
Installation:
# Using Docker (recommended)
# Host ports match the defaults used elsewhere in this README (7474 for the browser, 7687 for Bolt)
docker run -d \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  --name neo4j_berling_group_mcp \
  neo4j:latest
# Or download from https://neo4j.com/download/
Default Configuration:
URI: bolt://localhost:7687
Username: neo4j
Password: password
Database: neo4j
The server automatically connects during initialization. If Neo4j is unavailable, graph queries use in-memory implementation.
Neo4j Browser Access:
Once Neo4j is running, access the browser interface at http://localhost:7474 to visualize the graph:
// Example queries in Neo4j Browser
MATCH (s:Specification)-[:DEFINES_ENDPOINT]->(e:Endpoint)
RETURN s, e LIMIT 25

MATCH (e:Endpoint)-[:USES_SCHEMA]->(s:Schema)
WHERE e.path CONTAINS 'payment'
RETURN e, s

MATCH path = (s1:Schema)-[:REFERENCES*1..3]->(s2:Schema)
WHERE s1.name = 'PaymentInitiation'
RETURN path
Embedding Providers
For production deployments with ChromaDB, consider using an external embedding provider:
OpenAI Embeddings (highest quality):
```typescript
// First set the API key in your shell:
//   export OPENAI_API_KEY="your-api-key"
// Then modify vectorStore.ts to use OpenAIEmbeddingProvider
const embeddingProvider = new OpenAIEmbeddingProvider(
  process.env.OPENAI_API_KEY,
  'text-embedding-3-small' // or 'text-embedding-3-large'
);
```
Local Embeddings (default, no API required): The server includes a built-in TF-IDF-based embedding provider that works without external APIs. It is used automatically when no other provider is configured.
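To make the idea of a local, API-free embedding more concrete, here is a minimal sketch of a hashed term-frequency embedding. It is not the server's actual provider: the names `embedLocal` and `bucketFor`, the tokenizer, and the hashing scheme are assumptions for illustration, and it omits the IDF weighting a real TF-IDF provider would apply. Only the 384-dimension size is taken from this README.

```typescript
// Hypothetical sketch: hashed term-frequency embedding (no IDF weighting).
// Names and hashing scheme are illustrative, not taken from vectorStore.ts.
const DIMENSIONS = 384;

// Deterministic string hash (djb2 variant) mapped to a vector index.
function bucketFor(token: string): number {
  let hash = 5381;
  for (let i = 0; i < token.length; i++) {
    hash = ((hash * 33) ^ token.charCodeAt(i)) >>> 0;
  }
  return hash % DIMENSIONS;
}

function embedLocal(text: string): number[] {
  const vector: number[] = new Array(DIMENSIONS).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const token of tokens) {
    vector[bucketFor(token)] += 1; // count occurrences per hashed bucket
  }
  // L2-normalize so that a dot product between two vectors is their cosine similarity.
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}
```

Because every component is non-negative, cosine similarity between two such vectors stays between 0 and 1, which fits the relevance range quoted in this README.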
Deployment Scenarios
Scenario | ChromaDB | Neo4j | Tools Available | Best For |
Full Stack | Running | Running | 24 tools | Production, research, complex analysis |
Semantic Focus | Running | Not available | 18 tools | Documentation search, Q&A |
Graph Focus | Not available | Running | 18 tools | API architecture analysis |
Minimal/Dev | Not available | Not available | 12 tools | Development, basic queries |
Architecture
Core Components
The Berlin Group MCP Server is built with a modular architecture consisting of several specialized components:
1. YAML Parser (yamlParser.ts)
Parses Berlin Group OpenAPI specifications from YAML files, extracting:
API endpoints (paths, methods, parameters)
Schema definitions and data models
Tags, descriptions, and metadata
Request/response specifications
2. PDF Parser (pdfParser.ts)
Processes PDF documentation files using pdf-parse library:
Extracts full text content from PDF documents
Performs keyword-based text search
Provides document summaries and metadata
3. Text Chunker (textChunker.ts)
Implements intelligent document segmentation for vector embedding:
Recursive Character Splitting: Breaks text at natural boundaries (paragraphs, sentences, clauses)
Configurable Chunk Size: Default 1000 characters with 200 character overlap for context continuity
Metadata Preservation: Tracks source file, chunk index, section headers, and page estimates
Semantic Coherence: Maintains meaning by avoiding splits mid-sentence when possible
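The chunking behaviour described above can be approximated with a short sketch. This is a simplified, non-recursive variant, not the code in textChunker.ts: it slides a window over the text and cuts at the coarsest natural boundary it finds. The function name `chunkText`, the `Chunk` shape, and the boundary list are assumptions; the 1000/200 defaults come from this README.

```typescript
// Hypothetical sketch: boundary-aware sliding-window chunking with overlap.
interface Chunk {
  index: number;
  text: string;
  start: number; // offset of the chunk in the source text
}

// Boundaries from coarsest to finest: paragraph, sentence, word.
const BOUNDARIES = ["\n\n", ". ", " "];

function chunkText(text: string, chunkSize = 1000, overlap = 200): Chunk[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: Chunk[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    if (end < text.length) {
      for (const boundary of BOUNDARIES) {
        // Last boundary that still fits entirely inside the window.
        const cut = text.lastIndexOf(boundary, end - boundary.length);
        if (cut > start + overlap) {
          end = cut + boundary.length;
          break;
        }
      }
    }
    chunks.push({ index: chunks.length, text: text.slice(start, end), start });
    if (end === text.length) break;
    start = end - overlap; // the next chunk re-reads the last `overlap` characters
  }
  return chunks;
}
```

Requiring the cut to fall past `start + overlap` guarantees that every iteration moves forward, so the loop terminates even for text with no boundaries at all.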
4. Vector Store (vectorStore.ts)
Manages semantic search capabilities using ChromaDB:
ChromaDB Integration: Optional connection to ChromaDB server for persistent vector storage
Local Embedding Provider: Built-in TF-IDF-like embedding generation when external APIs are unavailable
OpenAI Embedding Support: Configurable integration with OpenAI's embedding models (text-embedding-3-small, text-embedding-3-large)
Automatic Fallback: Uses in-memory vector storage when ChromaDB server is unavailable
Semantic Search: Natural language queries with relevance scoring and distance metrics
Metadata Filtering: Search within specific files or document sections
How Vector Store Works:
PDF documents are split into chunks by the Text Chunker
Each chunk is converted to a vector embedding (384-3072 dimensions depending on provider)
Embeddings are stored in ChromaDB collection or in-memory fallback
User queries are embedded using the same model
Cosine similarity finds the most relevant chunks
Results are ranked by relevance score (0.0 to 1.0)
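As a rough illustration of the last two steps, the sketch below ranks stored chunks by cosine similarity against a query embedding. The types and function names (`StoredChunk`, `topK`) are assumptions for this example and do not mirror vectorStore.ts or the ChromaDB client.

```typescript
// Hypothetical sketch: brute-force cosine-similarity ranking over in-memory chunks.
interface StoredChunk {
  id: string;
  embedding: number[];
}

interface SearchHit {
  id: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, sort by descending similarity, keep the best k.
function topK(query: number[], chunks: StoredChunk[], k: number): SearchHit[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Cosine similarity can be negative for general embeddings; how the server maps scores into the 0.0 to 1.0 range is not specified here, so this sketch returns the raw value.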
5. Graph Store (graphStore.ts)
Manages graph database operations with Neo4j:
Neo4j Integration: Optional connection to Neo4j database for complex relationship queries
In-Memory Fallback: Complete graph implementation when Neo4j is unavailable
Connection Management: Handles driver lifecycle, sessions, and transactions
CRUD Operations: Create/read nodes and relationships with typed interfaces
Cypher Query Execution: Direct access to Neo4j's powerful query language
Statistics: Provides metrics on node counts, relationships, and graph density
Graph Node Types:
Specification: OpenAPI spec metadata (title, version, description)
Endpoint: API paths with HTTP methods
Schema: Data models and type definitions
Property: Schema fields with types and constraints
Parameter: Request parameters (query, header, path, cookie)
Response: HTTP response definitions with status codes
Tag: Endpoint categorization
Graph Relationship Types:
DEFINES_ENDPOINT: Specification → Endpoint
DEFINES_SCHEMA: Specification → Schema
HAS_PARAMETER: Endpoint → Parameter
HAS_RESPONSE: Endpoint → Response
USES_SCHEMA: Endpoint/Parameter/Response → Schema
REFERENCES: Schema → Schema (for $ref relationships)
HAS_PROPERTY: Schema → Property
TAGGED_WITH: Endpoint → Tag
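For intuition about how an in-memory fallback might follow these relationships, here is a minimal sketch of a Map-backed graph with a depth-limited breadth-first traversal, comparable in spirit to a Cypher pattern such as `[:REFERENCES*1..3]`. The class `InMemoryGraph` and its methods are hypothetical and do not reproduce graphStore.ts.

```typescript
// Hypothetical sketch: Map-backed adjacency list with depth-limited BFS.
type Edge = { type: string; to: string };

class InMemoryGraph {
  private edges = new Map<string, Edge[]>();

  addEdge(from: string, type: string, to: string): void {
    const list = this.edges.get(from) || [];
    list.push({ type, to });
    this.edges.set(from, list);
  }

  // Follow one relationship type outward from `start`, at most `maxDepth` hops.
  findRelated(start: string, type: string, maxDepth = 3): string[] {
    const visited = new Set<string>([start]);
    const found: string[] = [];
    let frontier = [start];
    for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
      const next: string[] = [];
      for (const node of frontier) {
        for (const edge of this.edges.get(node) || []) {
          if (edge.type === type && !visited.has(edge.to)) {
            visited.add(edge.to); // guards against cycles between schemas
            found.push(edge.to);
            next.push(edge.to);
          }
        }
      }
      frontier = next;
    }
    return found;
  }
}
```

The visited set matters because schemas can reference each other cyclically; without it a traversal could revisit the same nodes on every hop.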
6. Graph Indexer (graphIndexer.ts)
Transforms OpenAPI specifications into graph structures:
Specification Indexing: Creates nodes for each loaded specification file
Endpoint Extraction: Parses all API endpoints with full details
Schema Mapping: Extracts all data models and their properties
Relationship Building: Connects endpoints to schemas, parameters, and responses
Reference Resolution: Follows $ref pointers to build schema dependency graphs
Progress Tracking: Provides real-time feedback during indexing operations
Error Handling: Gracefully handles malformed specifications
Indexing Process:
Load YAML files from yml_files/ directory
Create Specification nodes for each file
Extract and create Endpoint nodes
Extract and create Schema nodes with properties
Build relationships between all entities
Index into Neo4j or in-memory store
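The relationship-building step can be sketched for the REFERENCES case: walk each parsed schema, collect its `$ref` pointers, and emit one edge per referenced schema. The helper names below (`collectRefs`, `schemaNameFromRef`, `buildReferenceEdges`) are assumptions for illustration rather than functions from graphIndexer.ts, and the sketch only handles local `#/components/schemas/...` references.

```typescript
// Hypothetical sketch: derive Schema -> Schema REFERENCES edges from $ref pointers.

// Recursively collect every "$ref" string inside a parsed OpenAPI fragment.
function collectRefs(node: unknown, refs: string[] = []): string[] {
  if (Array.isArray(node)) {
    node.forEach((item) => collectRefs(item, refs));
  } else if (node !== null && typeof node === "object") {
    const record = node as Record<string, unknown>;
    Object.keys(record).forEach((key) => {
      const value = record[key];
      if (key === "$ref" && typeof value === "string") {
        refs.push(value);
      } else {
        collectRefs(value, refs);
      }
    });
  }
  return refs;
}

// "#/components/schemas/Amount" -> "Amount"; any other target yields null.
function schemaNameFromRef(ref: string): string | null {
  const prefix = "#/components/schemas/";
  return ref.indexOf(prefix) === 0 ? ref.slice(prefix.length) : null;
}

// One [from, to] pair per reference found in a components.schemas map.
function buildReferenceEdges(schemas: Record<string, unknown>): Array<[string, string]> {
  const edges: Array<[string, string]> = [];
  Object.keys(schemas).forEach((name) => {
    collectRefs(schemas[name]).forEach((ref) => {
      const target = schemaNameFromRef(ref);
      if (target !== null) edges.push([name, target]);
    });
  });
  return edges;
}
```

The input would typically be the `components.schemas` object of a spec already parsed by a YAML library; the schema content used to exercise it is up to the caller.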
7. Graph Models (graphModels.ts)
Defines TypeScript interfaces for type-safe graph operations:
Node interfaces (SpecificationNode, EndpointNode, SchemaNode, etc.)
Relationship type enums
Query result types (GraphTraversalResult, PatternSearchResult, etc.)
DTO types for creating nodes
Utility functions for ID generation and reference extraction
8. Specification Indexer (indexer.ts)
Orchestrates all components and provides unified API:
Coordinates YAML Parser, PDF Parser, Vector Store, and Graph Indexer
Manages initialization sequence and error handling
Provides high-level search and query methods
Handles fallback scenarios when optional services are unavailable
Aggregates statistics across all subsystems
Data Flow
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ YAML Files      │────>│ YAML Parser      │────>│ Graph Indexer   │
│ (OpenAPI)       │     │                  │     │                 │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                          │
                                                          v
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ PDF Files       │────>│ PDF Parser +     │────>│ Vector Store    │
│ (Documentation) │     │ Text Chunker     │     │ (ChromaDB)      │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │
                                 v
                    ┌──────────────────────────┐
                    │ Specification Indexer    │
                    │ (Unified Interface)      │
                    └────────────┬─────────────┘
                                 │
                                 v
                    ┌──────────────────────────┐
                    │ MCP Server Tools         │
                    │ (24 Tools Available)     │
                    └──────────────────────────┘
Fallback Mechanisms
The server is designed to work in various deployment scenarios:
Full Stack (ChromaDB + Neo4j):
All 24 tools available
Best performance and capabilities
Semantic search with persistent vectors
Complex graph queries with Cypher
Vector Only (ChromaDB, no Neo4j):
18 tools available (core + semantic search)
Graph queries use in-memory implementation
Good for semantic search focused use cases
Graph Only (Neo4j, no ChromaDB):
18 tools available (core + graph database)
Semantic search falls back to keyword matching
Good for relationship exploration use cases
Minimal (no external databases):
12 core tools available
All operations use in-memory storage
Keyword-based search only
Suitable for basic queries and development
The server automatically detects available services during initialization and adjusts its capabilities accordingly. Users are informed which features are available through statistics and status endpoints.
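The four scenarios reduce to simple arithmetic over the tool groups (12 core, 6 semantic, 6 graph). The sketch below restates that mapping; `ServiceStatus` and `describeDeployment` are hypothetical names, and the real server's detection logic is not shown in this README.

```typescript
// Hypothetical sketch: map detected services to a scenario name and tool count.
interface ServiceStatus {
  chromaAvailable: boolean;
  neo4jAvailable: boolean;
}

const CORE_TOOLS = 12;
const SEMANTIC_TOOLS = 6;
const GRAPH_TOOLS = 6;

function describeDeployment(status: ServiceStatus): { scenario: string; tools: number } {
  const tools =
    CORE_TOOLS +
    (status.chromaAvailable ? SEMANTIC_TOOLS : 0) +
    (status.neo4jAvailable ? GRAPH_TOOLS : 0);

  let scenario = "Minimal";
  if (status.chromaAvailable && status.neo4jAvailable) {
    scenario = "Full Stack";
  } else if (status.chromaAvailable) {
    scenario = "Vector Only";
  } else if (status.neo4jAvailable) {
    scenario = "Graph Only";
  }
  return { scenario, tools };
}
```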
Available Tools
The server provides 24 MCP tools organized into three categories:
Core Tools (12 tools)
Search and Discovery
search_endpoints - Search for API endpoints across all specifications
Example: "Find all payment endpoints"
search_schemas - Search for data schemas and models
Example: "Find schemas related to transaction"
search_pdf_documentation - Search through PDF documentation using keyword matching
Example: "Search for SCA requirements"
search_all - Comprehensive keyword search across all sources (endpoints, schemas, PDFs)
Example: "Find everything about consent"
Endpoint Information
get_endpoint_details - Get detailed information about a specific endpoint
Parameters: path, method
Example: path="/v1/accounts", method="GET"
filter_endpoints_by_tag - Filter endpoints by tag
Example: tag="accounts"
filter_endpoints_by_method - Filter endpoints by HTTP method
Example: method="POST"
Schema Information
get_schema - Get a specific schema definition
Parameters: schemaName, specFile (optional)
Example: schemaName="AccountDetails"
Specification Management
list_specifications - List all available OpenAPI specifications
get_specification_details - Get comprehensive details about a specific spec
Parameters: fileName
list_pdf_documents - List all available PDF documentation
get_statistics - Get basic statistics about loaded specifications
Semantic Search Tools (6 tools)
These tools use ChromaDB and vector embeddings for intelligent, context-aware document retrieval. When ChromaDB is unavailable, they automatically fall back to keyword-based search.
search_pdf_semantic - Perform semantic search across PDF documentation
Parameters: query (string), topK (number, default: 10)
Example: "What are the authentication requirements for payment initiation?"
How it works:
- Converts your natural language query into a vector embedding
- Finds the most semantically similar document chunks
- Returns results ranked by relevance score (0.0-1.0)
- Understands synonyms and related concepts (e.g., "authenticate" matches "authorization")
search_pdf_semantic_filtered - Semantic search with metadata filters
Parameters: query (string), fileName (optional), section (optional), topK (optional)
Example: query="SCA exemptions", fileName="Implementation_Guide.pdf"
Use cases:
- Search within a specific document
- Filter by document section
- Narrow results to relevant portions
search_all_semantic - Comprehensive semantic search across all sources
Parameters: query (string), topK (number, default: 10)
Example: "How do I handle declined payments?"
Returns:
- Matching endpoints (keyword search)
- Matching schemas (keyword search)
- Semantically similar PDF content (vector search)
get_vector_store_stats - Get vector store statistics
Returns:
- enabled: Whether vector store is operational
- totalChunks: Number of indexed document chunks
- collectionName: ChromaDB collection name
- isInMemory: Whether using in-memory fallback
Graph Database Tools (6 tools)
These tools use Neo4j for exploring complex relationships between specifications, endpoints, schemas, and data models. When Neo4j is unavailable, they use an in-memory graph implementation.
graph_find_related_schemas - Find schemas related through $ref references
Parameters: schemaName (string), specFile (optional), maxDepth (number, default: 3)
Example: schemaName="AccountReference"
Use cases:
- Understand schema inheritance hierarchies
- Find all schemas that reference a particular type
- Discover composed data models
- Map schema dependencies
graph_get_endpoint_dependencies - Get all dependencies of an API endpoint
Parameters: path (string), method (string), specFile (optional)
Example: path="/v1/payments/sepa-credit-transfers", method="POST"
Returns:
- All request parameters (query, header, path, body)
- Request body schema and nested schemas
- All possible response codes and their schemas
- Complete dependency tree
graph_traverse_relationships - Execute custom graph traversal with filters
Parameters:
- startNodeType: Type of starting node (Specification, Endpoint, Schema, etc.)
- startNodeFilter: Property filters (e.g., {name: "AccountReference"})
- relationshipTypes: Optional list of relationship types to follow
- maxDepth: Maximum traversal depth (default: 3)
Example: startNodeType="Schema", startNodeFilter={name: "PaymentInitiation*"}, relationshipTypes=["REFERENCES", "USES_SCHEMA"]
Use cases:
- Custom relationship exploration
- Multi-hop dependency analysis
- Pattern-based graph queries
graph_get_specification_graph - Get complete graph for a specification
Parameters: fileName (string)
Example: fileName="BG_oFA_PIS_Version_2.3_20251128.openapi.yaml"
Returns:
- All endpoints in the specification
- All schemas and their properties
- All relationships between entities
- Complete specification structure as a graph
graph_search_by_pattern - Search graph nodes by property patterns
Parameters:
- nodeType: Type of node to search
- pattern: Property pattern with wildcards (e.g., {path: "/v1/accounts*"})
- limit: Maximum results (default: 50)
Example: nodeType="Endpoint", pattern={path: "/v1/payments/*", method: "POST"}
Supports wildcards:
- {name: "*Account*"} - Contains "Account"
- {path: "/v1/accounts*"} - Starts with "/v1/accounts"
- {method: "POST"} - Exact match
get_graph_store_stats - Get graph database statistics
Returns:
- enabled: Whether graph store is operational
- usingNeo4j: Whether connected to Neo4j (true) or using in-memory (false)
- Node counts by type (Specification, Endpoint, Schema, etc.)
- Relationship counts by type
- Indexing metrics (duration, errors)
- Graph density metrics
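The wildcard patterns accepted by graph_search_by_pattern can be pictured as anchored regular expressions in which `*` matches any run of characters. The sketch below is one way to implement that matching; `wildcardToRegExp` and `matchesPattern` are hypothetical helpers, and the server's own matching rules (for example case sensitivity) may differ.

```typescript
// Hypothetical sketch: match node properties against wildcard patterns.

// Turn "/v1/accounts*" into /^\/v1\/accounts.*$/ (everything except "*" is literal).
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + "$");
}

// A node matches when every property named in the pattern matches its wildcard.
function matchesPattern(
  node: Record<string, string>,
  pattern: Record<string, string>
): boolean {
  return Object.keys(pattern).every((key) => {
    const value = node[key];
    return value !== undefined && wildcardToRegExp(pattern[key]).test(value);
  });
}
```

A pattern without `*` behaves as an exact match, which corresponds to the `{method: "POST"}` case listed above.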
Tool Selection Guide
Use Core Tools when:
You need exact endpoint paths or schema names
You want to filter by tags or HTTP methods
You're looking for specific specification details
Use Semantic Search Tools when:
You have natural language questions
You're exploring concepts across documentation
You don't know the exact terminology
You want AI-powered relevance ranking
Use Graph Database Tools when:
You need to understand relationships and dependencies
You're exploring schema inheritance
You want to analyze endpoint complexity
You need to traverse multi-level references
Usage Examples
In VS Code with GitHub Copilot
Basic Queries
You: "What endpoints are available for account information?"
Copilot: [Uses search_endpoints tool to find AIS endpoints]
You: "Show me the schema for payment initiation request"
Copilot: [Uses search_schemas tool to find payment schemas]
Semantic Search Queries
You: "How do I implement Strong Customer Authentication?"
Copilot: [Uses search_pdf_semantic to find relevant SCA documentation with AI ranking]
You: "What are the requirements for payment authorization?"
Copilot: [Uses search_all_semantic to find endpoints, schemas, and semantically related PDF content]
You: "Find information about transaction status in the PIS specification"
Copilot: [Uses search_pdf_semantic_filtered with fileName filter]
Graph Database Queries
You: "What schemas does AccountReference depend on?"
Copilot: [Uses graph_find_related_schemas to traverse schema relationships]
You: "Show me all dependencies for the payment initiation endpoint"
Copilot: [Uses graph_get_endpoint_dependencies to get parameters, request/response schemas]
You: "Find all endpoints that use the Amount schema"
Copilot: [Uses graph_traverse_relationships starting from Amount schema]
You: "Get the complete API structure for the AIS specification"
Copilot: [Uses graph_get_specification_graph to return full specification graph]
Advanced Analysis
You: "Compare the complexity of payment endpoints vs account endpoints"
Copilot: [Uses graph_get_endpoint_dependencies for multiple endpoints and compares]
You: "What are all the possible error responses for account endpoints?"
Copilot: [Uses graph_traverse_relationships to find all response schemas]
You: "Show me all schemas that contain PII (personally identifiable information)"
Copilot: [Uses search_pdf_semantic to find PII references, then graph_search_by_pattern to find related schemas]
Programmatic Usage
The server can also be used programmatically via the MCP protocol:
// Example tool call (JSON-RPC 2.0 request)
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_endpoints",
    "arguments": {
      "query": "payment"
    }
  }
}
Project Structure
Berlin-group-mcp/
├── src/
│   ├── index.ts              # Main MCP server with 24 tool definitions
│   ├── indexer.ts            # Specification indexer orchestrating all components
│   ├── yamlParser.ts         # OpenAPI YAML parser
│   ├── pdfParser.ts          # PDF document parser
│   ├── textChunker.ts        # Intelligent text chunking for vector embeddings
│   ├── vectorStore.ts        # ChromaDB integration for semantic search
│   ├── graphStore.ts         # Neo4j integration and in-memory graph store
│   ├── graphIndexer.ts       # Graph database indexer
│   └── graphModels.ts        # TypeScript interfaces for graph entities
├── yml_files/                # Berlin Group OpenAPI specs (7 specifications)
│   ├── BG_oFA_AIS_Version_2.3_20250818.openapi.yaml
│   ├── BG_oFA_PIS_Version_2.3_20251128.openapi.yaml
│   ├── BG_oFA_PIIS_Version_2.3_20250818.openapi.yaml
│   ├── BG_oFA_BASK_Version_2.2_20251128.openapi.yaml
│   ├── BG_oFA_Consent_Version_2.1_20251128.openapi.yaml
│   ├── BG_oFA_dataDictionary_Version_2.2.6_20250818.openapi.yaml
│   └── BG_oFA_PUSH_Version_2.2_20250818.openapi.yaml
├── pdf_files/                # PDF documentation (implementation guides, frameworks)
├── tests/
│   ├── unit/                 # Unit tests for individual components
│   │   ├── vectorStore.test.ts
│   │   ├── graphStore.test.ts
│   │   ├── graphIndexer.test.ts
│   │   ├── graphModels.test.ts
│   │   └── textChunker.test.ts
│   └── integration/          # Integration tests
│       ├── semanticSearch.test.ts
│       └── graphSearch.test.ts
├── build/                    # Compiled JavaScript (generated)
├── docs/                     # Architecture documentation and diagrams
│   └── architecture/
│       └── diagrams/         # PlantUML diagrams for system architecture
├── postman/                  # Postman collection for testing MCP tools
├── package.json              # Dependencies: chromadb, neo4j-driver, pdf-parse, etc.
├── tsconfig.json
├── jest.config.js            # Test configuration
├── .vscode/
│   └── mcp-settings.json     # VS Code MCP configuration
├── INTELLIJ_SETUP.md         # IntelliJ configuration guide
└── README.md
Development
Running Tests
The project includes comprehensive unit and integration tests:
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run with coverage report
npm run test:coverage
Test Coverage:
Unit Tests: vectorStore.test.ts, graphStore.test.ts, graphIndexer.test.ts, graphModels.test.ts, textChunker.test.ts
Integration Tests: semanticSearch.test.ts, graphSearch.test.ts
Watch Mode
To automatically rebuild on file changes:
npm run watch
Debugging
To debug the server with Node.js inspector:
npm run inspector
Adding New Specifications
Add YAML files to yml_files/ directory
Add PDF files to pdf_files/ directory
Rebuild the project:
npm run build
Restart the MCP server (reload VS Code or restart IDE)
New specifications will be automatically indexed on next startup
Extending the Server
Adding a New Tool:
Define tool schema in src/index.ts TOOLS array
Add handler in CallToolRequestSchema handler
Implement business logic in src/indexer.ts
Update README documentation
Adding a New Embedding Provider:
Implement EmbeddingProvider interface in src/vectorStore.ts
Add embed() and embedQuery() methods
Configure in src/indexer.ts or via environment variables
Customizing Graph Schema:
Add new node types in src/graphModels.ts
Add relationships in RelationshipType enum
Update indexing logic in src/graphIndexer.ts
Add query methods in src/graphStore.ts
Troubleshooting
Server Not Starting
Check Node.js version: node --version (should be v18+)
Verify build completed: ls -la build/ (should see .js files)
Check for errors: Look in VS Code Developer Tools console (Help > Toggle Developer Tools)
Rebuild: npm run build
Tools Not Appearing
Ensure MCP settings file exists: Check .vscode/mcp-settings.json
Verify correct paths: Ensure the path to build/index.js is absolute and correct
Restart VS Code completely: Close all windows and reopen
Check GitHub Copilot: Ensure Copilot is enabled and working
Check console logs: Open Developer Tools and look for MCP connection errors
No Results from Search
Verify YAML and PDF files exist: ls -la yml_files/ pdf_files/
Check server logs: Look for initialization errors in console
Ensure files are readable: Check file permissions
Try reindexing: Delete and rebuild: rm -rf build && npm run build
Semantic Search Not Working
Check if ChromaDB is running (optional): curl http://localhost:8000/api/v1/heartbeat
Review initialization logs: Should see "Indexed X PDF chunks in vector store"
Check fallback mode: Server will fall back to keyword search if ChromaDB unavailable
Verify vector store stats: Use get_vector_store_stats tool
Check ChromaDB logs (if running via Docker): docker logs <chromadb-container-id>
Graph Database Not Working
Check if Neo4j is running (optional):
curl http://localhost:7474
# Or check Docker:
docker ps | grep neo4j
Verify credentials: A fresh Neo4j install defaults to neo4j/neo4j (change on first login); the Docker command in this README sets neo4j/password via NEO4J_AUTH
Review initialization logs: Should see "Graph indexing complete: X specs, Y endpoints, Z schemas"
Check fallback mode: Server will use in-memory graph if Neo4j unavailable
Verify graph store stats: Use get_graph_store_stats tool
Test Neo4j connection:
# Using cypher-shell
cypher-shell -u neo4j -p your-password
Performance Issues
Large PDF files: Consider splitting into smaller documents
ChromaDB slow:
Use local deployment instead of remote
Reduce topK parameter in semantic searches
Consider faster embedding provider
Neo4j slow:
Check if indexes are created
Reduce maxDepth in graph traversals
Optimize Cypher queries
Memory usage high:
Use external databases (ChromaDB + Neo4j) instead of in-memory
Reduce number of specifications loaded
Connection Errors
ChromaDB Connection Refused:
Error: connect ECONNREFUSED 127.0.0.1:8000
Solution: ChromaDB is not running or is running on a different port. The server will automatically fall back to in-memory mode.
Neo4j Connection Failed:
Neo4jError: Could not connect to bolt://localhost:7687
Solution: Neo4j is not running or the credentials are wrong. The server will automatically fall back to in-memory mode.
Permission Issues
# If index.js is not executable
chmod +x build/index.js
# If YAML/PDF files are not readable
chmod -R 644 yml_files/*.yaml pdf_files/*.pdf
Debugging Tips
Enable verbose logging: Set NODE_ENV=development before starting server
Check initialization sequence: Server logs show each phase
Test individual components:
npm test -- vectorStore.test.ts
npm test -- graphStore.test.ts
Verify tool availability: Use get_statistics, get_vector_store_stats, get_graph_store_stats tools
Check MCP communication: Look for JSON-RPC messages in developer console
Common Error Messages
Error | Cause | Solution |
"Specifications not yet loaded" | Server still initializing | Wait 5-10 seconds and retry |
"Semantic search is not available" | ChromaDB not connected | Normal, falls back to keyword search |
"Graph store is not available" | Neo4j not connected | Normal, falls back to in-memory |
"Collection not found" | ChromaDB collection missing | Server creates it automatically on startup |
"Authentication failed" | Wrong Neo4j credentials | Update credentials in code or use default |
Technical Details
MCP Protocol
This server implements the Model Context Protocol specification (2025-11-25):
Tools: 24 tools organized into core, semantic search, and graph database categories
Resources: Direct access to specification files via berlin-group:// URI scheme
Transport: stdio-based communication for IDE integration
Component Architecture
Parser Features
YAML Parser:
Extracts paths, operations, schemas, components from OpenAPI 3.0+ specs
Handles $ref pointer resolution
Validates specification structure
Indexes tags, parameters, and responses
PDF Parser:
Uses pdf-parse library for text extraction
Preserves document structure and metadata
Enables full-text keyword search
Provides page number estimation
Text Chunker:
Recursive character splitting algorithm
Configurable chunk size (default: 1000 chars) and overlap (default: 200 chars)
Maintains semantic coherence across chunks
Preserves metadata (file name, section, page number)
Vector Store Implementation
ChromaDB Integration:
HTTP client connection to ChromaDB server
Collection-based document organization
Metadata filtering support
Cosine similarity for relevance scoring
Embedding Providers:
LocalEmbeddingProvider: TF-IDF-based, 384 dimensions, no external dependencies
OpenAIEmbeddingProvider: uses OpenAI's text-embedding-3 models, 1536 (small) or 3072 (large) dimensions, requires API key
Pluggable architecture for custom providers
Search Algorithms:
Query embedding generation
K-nearest neighbors (KNN) search
Distance metrics (cosine similarity, L2 distance)
Relevance score normalization (0.0 to 1.0)
Graph Store Implementation
Neo4j Integration:
Bolt protocol driver (neo4j-driver v5.27.0)
Connection pooling for performance
Transaction management
Cypher query execution
Graph Schema:
Nodes: Specification, Endpoint, Schema, Property, Parameter, Response, Tag
Relationships: DEFINES_ENDPOINT, DEFINES_SCHEMA, HAS_PARAMETER, HAS_RESPONSE, USES_SCHEMA, REFERENCES, HAS_PROPERTY, TAGGED_WITH
In-Memory Fallback:
Complete graph implementation using Maps
Same API as Neo4j implementation
Supports all query patterns
Suitable for development and testing
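A Map-based graph along these lines might look like the sketch below. The node labels and relationship types come from the schema listed above, but the class name InMemoryGraph, its methods, and the node ID format are hypothetical and do not reflect the server's actual API.

```typescript
type NodeLabel =
  | "Specification" | "Endpoint" | "Schema" | "Property"
  | "Parameter" | "Response" | "Tag";

type RelationshipType =
  | "DEFINES_ENDPOINT" | "DEFINES_SCHEMA" | "HAS_PARAMETER" | "HAS_RESPONSE"
  | "USES_SCHEMA" | "REFERENCES" | "HAS_PROPERTY" | "TAGGED_WITH";

interface GraphNode {
  id: string;
  label: NodeLabel;
}

interface GraphEdge {
  from: string;
  to: string;
  type: RelationshipType;
}

class InMemoryGraph {
  private nodes = new Map<string, GraphNode>();
  private outgoing = new Map<string, GraphEdge[]>();

  addNode(node: GraphNode): void {
    this.nodes.set(node.id, node);
  }

  addRelationship(edge: GraphEdge): void {
    if (!this.nodes.has(edge.from) || !this.nodes.has(edge.to)) {
      throw new Error(`Unknown node in edge ${edge.from} -> ${edge.to}`);
    }
    const edges = this.outgoing.get(edge.from) ?? [];
    edges.push(edge);
    this.outgoing.set(edge.from, edges);
  }

  // Breadth-first traversal along one relationship type, up to maxDepth hops.
  findRelated(startId: string, type: RelationshipType, maxDepth = 2): GraphNode[] {
    const seen = new Set<string>([startId]);
    const result: GraphNode[] = [];
    let frontier = [startId];
    for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
      const next: string[] = [];
      for (const id of frontier) {
        for (const edge of this.outgoing.get(id) ?? []) {
          if (edge.type !== type || seen.has(edge.to)) continue;
          seen.add(edge.to);
          result.push(this.nodes.get(edge.to)!);
          next.push(edge.to);
        }
      }
      frontier = next;
    }
    return result;
  }
}
```

The seen set keeps the traversal from looping when schemas reference each other cyclically, which can happen with recursive $ref structures.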
Indexing Process
Initialization (parallel):
Load YAML files → Parse specifications → Extract endpoints/schemas
Load PDF files → Parse documents → Chunk text → Generate embeddings
Vector Store Indexing:
Chunk all PDF documents (typical: 200-500 chunks per document)
Generate embeddings for each chunk
Store in ChromaDB with metadata
Build search index
Graph Store Indexing:
Create Specification nodes
Create Endpoint nodes with relationships
Create Schema nodes with properties
Create Parameter and Response nodes
Build REFERENCES relationships for $ref pointers
Create Tag nodes and relationships
Error Handling:
Graceful degradation if databases unavailable
Detailed logging of indexing progress
Error collection without stopping process
Fallback to in-memory storage
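The error-handling behavior above can be illustrated with a small sketch. It is synchronous for brevity, whereas real database connections are asynchronous, and all names here (indexAll, withFallback, IndexingReport) are hypothetical and not taken from the server's code.

```typescript
interface IndexingError {
  item: string;
  message: string;
}

interface IndexingReport {
  indexed: number;
  errors: IndexingError[];
}

// Index every item; a failure is recorded and the loop moves on.
function indexAll(items: string[], indexOne: (item: string) => void): IndexingReport {
  const report: IndexingReport = { indexed: 0, errors: [] };
  for (const item of items) {
    try {
      indexOne(item);
      report.indexed += 1;
    } catch (error) {
      report.errors.push({
        item,
        message: error instanceof Error ? error.message : String(error),
      });
    }
  }
  return report;
}

// Try the primary store factory; fall back if it throws (e.g. database down).
function withFallback<T>(primary: () => T, fallback: () => T): T {
  try {
    return primary();
  } catch {
    return fallback();
  }
}
```

Returning a report instead of throwing lets the caller log every failure after indexing finishes, while the items that did parse remain searchable.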
Performance
Initial Load Time:
YAML parsing: ~500ms (7 specifications)
PDF parsing: ~1-2s (depends on file count/size)
Vector indexing: ~2-5s (depends on chunk count and embedding provider)
Graph indexing: ~1-3s (depends on database connection)
Total: ~5-10 seconds for full initialization
Query Performance:
Keyword search: <10ms (in-memory search)
Semantic search: 50-200ms (depends on ChromaDB response time and top-k)
Graph queries: 10-100ms (simple queries), 100-500ms (complex traversals)
In-memory fallback: <50ms for most operations
Memory Usage:
Base (specifications): ~20-30MB
Vector store (in-memory): +30-50MB
Graph store (in-memory): +20-40MB
Total: ~70-120MB (without external databases)
With external databases: ~30-50MB (stores data externally)
Scalability:
Can handle 100+ specifications
Supports 1000+ PDF pages
Graph queries scale with Neo4j (millions of nodes)
Vector search scales with ChromaDB (millions of chunks)
Quick Reference
Tool Categories Summary
Category | Count | Purpose | Requires |
Core Tools | 12 | Basic search, filtering, specification access | None (built-in) |
Semantic Search | 6 | AI-powered document retrieval, natural language queries | ChromaDB (optional) |
Graph Database | 6 | Relationship exploration, dependency analysis | Neo4j (optional) |
Total | 24 | Complete specification analysis toolkit | Node.js only |
Key Features Comparison
Feature | Without Databases | With ChromaDB | With Neo4j | With Both |
Endpoint Search | Keyword | Keyword | Keyword | Keyword |
Schema Search | Keyword | Keyword | Keyword | Keyword |
PDF Search | Keyword | Semantic + Keyword | Keyword | Semantic + Keyword |
Schema Relationships | In-memory | In-memory | Neo4j Graph | Neo4j Graph |
Endpoint Dependencies | In-memory | In-memory | Neo4j Graph | Neo4j Graph |
Graph Traversal | Limited | Limited | Full Cypher | Full Cypher |
Performance | Good | Excellent (PDF) | Excellent (Graph) | Excellent (Both) |
Memory Usage | ~120MB | ~70MB | ~80MB | ~50MB |
Common Queries Cheat Sheet
// Find endpoints
"search for payment endpoints"
→ Uses: search_endpoints
// Find schemas
"show me the AccountDetails schema"
→ Uses: get_schema or search_schemas
// Natural language search (semantic)
"how to handle authentication errors?"
→ Uses: search_pdf_semantic
// Find related schemas
"what schemas does PaymentInitiation reference?"
→ Uses: graph_find_related_schemas
// Analyze endpoint
"what are all the parameters and responses for POST /v1/payments?"
→ Uses: graph_get_endpoint_dependencies
// Explore relationships
"show me all schemas that use Address type"
→ Uses: graph_traverse_relationships
// Get overview
"show me statistics about the loaded specifications"
→ Uses: get_statistics, get_vector_store_stats, get_graph_store_stats
License
MIT
References
Model Context Protocol - MCP specification and documentation
Berlin Group Open Finance - Official Berlin Group website
MCP SDK Documentation - TypeScript SDK for MCP
ChromaDB Documentation - Vector database for AI applications
Neo4j Documentation - Graph database platform
OpenAPI Specification - API specification format
Contributing
Contributions are welcome! Please ensure:
TypeScript code follows project conventions
All tools have proper error handling
Documentation is updated for new features
Tests are added for new components (unit tests in tests/unit/, integration tests in tests/integration/)
New embedding providers implement the EmbeddingProvider interface
New graph node types are added to graphModels.ts
README is updated with examples and usage instructions
Areas for Contribution
Additional embedding providers (Cohere, HuggingFace, etc.)
Enhanced graph query capabilities
Additional Berlin Group specifications
Performance optimizations
Additional MCP tools
Documentation improvements
Support
For issues or questions:
Check the troubleshooting section
Review MCP documentation
Check Berlin Group specification documentation