Skip to main content
Glama
Borelli-7

Berlin Group MCP Server

by Borelli-7

Berlin Group MCP Server

A Model Context Protocol (MCP) server that provides Berlin Group Open Finance API specifications as contextual information to AI assistants in VS Code and IntelliJ IDEA.

Overview

This MCP server loads and indexes Berlin Group Open Finance specifications from OpenAPI YAML files and PDF documentation, enabling LLMs to provide accurate, specification-compliant guidance during Open Finance Framework implementation.

The server features advanced AI-powered capabilities including:

  • Semantic Search via ChromaDB for intelligent, context-aware document retrieval

  • Graph Database via Neo4j for exploring complex relationships between API endpoints, schemas, and data models

  • Vector Embeddings for natural language queries across PDF documentation

  • Relationship Traversal for understanding dependencies and schema inheritance

Related MCP server: Swagger Navigator MCP Server

Features

  • πŸ“š Complete Specification Access: Loads all Berlin Group OpenAPI specs (AIS, PIS, PIIS, BASK, Consent, etc.)

  • πŸ” Powerful Search: Search across endpoints, schemas, and PDF documentation

  • 🎯 Smart Filtering: Filter endpoints by method, tag, or specification

  • πŸ“– PDF Support: Extract and search content from implementation guides

  • 🧠 Semantic Search: AI-powered semantic search using ChromaDB vector embeddings for natural language queries

  • πŸ•ΈοΈ Graph Database: Neo4j integration for exploring complex relationships and dependencies

  • πŸ“Š Relationship Traversal: Navigate through schema references, endpoint dependencies, and API interconnections

  • βœ‚οΈ Intelligent Text Chunking: Splits PDF documents into semantically meaningful chunks for better retrieval

  • πŸ”„ Automatic Fallback: Gracefully falls back to in-memory storage when ChromaDB or Neo4j are unavailable

  • πŸ› οΈ 24 MCP Tools: Comprehensive toolset including 12 core tools + 6 semantic search tools + 6 graph database tools

  • πŸ”Œ Multi-IDE Support: Works in VS Code and IntelliJ IDEA

Available Specifications

The server indexes the following Berlin Group specifications:

  • Account Information Services (AIS) v2.3

  • Payment Initiation Services (PIS) v2.3

  • Confirmation of Funds (PIIS) v2.3

  • Bank Account Status Services (BASK) v2.2

  • Consent Management v2.1

  • Data Dictionary v2.3.1

  • Payment Update Status Hub (PUSH) v2.2

Installation

Prerequisites

  • Node.js v18 or higher

  • npm or yarn

  • VS Code or IntelliJ IDEA with MCP support

  • Optional: ChromaDB server for semantic search (runs on localhost:8000 by default)

  • Optional: Neo4j database for graph queries (runs on localhost:7687 by default)

Setup

  1. Clone or navigate to the project directory:

    cd path-of-the-repo/Berlin-group-mcp
  2. Install dependencies:

    npm install
  3. Build the project:

    npm run build

Configuration

VS Code

  1. The configuration file is already created at .vscode/mcp-settings.json

  2. Update the path if needed to match your project location:

    {
      "mcpServers": {
        "berlin-group": {
          "command": "node",
          "args": [
            "absolute-path-of-the-repo/Berlin-group-mcp/build/index.js"
          ]
        }
      }
    }
  3. Restart VS Code or reload the window

  4. The Berlin Group tools should now be available in GitHub Copilot Chat

IntelliJ IDEA

See INTELLIJ_SETUP.md for detailed configuration instructions.

Dependencies

The project uses the following key dependencies:

Core Dependencies

  • @modelcontextprotocol/sdk (^1.0.4): MCP protocol implementation

  • js-yaml (^4.1.0): YAML parsing for OpenAPI specifications

  • pdf-parse (^1.1.1): PDF document text extraction

Advanced Features

  • chromadb (^1.8.1): Vector database client for semantic search

    • Enables AI-powered document retrieval

    • Optional: Falls back to in-memory storage if unavailable

  • neo4j-driver (^5.27.0): Neo4j graph database driver

    • Enables complex relationship queries

    • Optional: Falls back to in-memory graph if unavailable

Development Dependencies

  • typescript (^5.7.3): TypeScript compiler

  • jest (^29.7.0): Testing framework

  • ts-jest (^29.1.2): TypeScript support for Jest

  • Type definitions for all major dependencies

All dependencies are automatically installed with npm install.

Optional: External Database Configuration

The Berlin Group MCP Server can optionally use external databases for enhanced capabilities. Both are completely optional – the server works perfectly without them using in-memory storage.

Configuration Methods

The server supports two methods for configuration:

  1. Environment Variables (Recommended): Create a .env file in the project root

  2. Direct Configuration: Modify the configuration in src/index.ts

Environment Variables

Copy the .env.example file to .env and customize:

cp .env.example .env

Then edit .env with your settings:

# ChromaDB Configuration (for Semantic Search)
CHROMA_HOST=localhost
CHROMA_PORT=8000
CHROMA_COLLECTION=berlin_group_pdfs

# OpenAI Configuration (for embeddings)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Neo4j Configuration (for Graph Database)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
NEO4J_DATABASE=neo4j
NEO4J_MAX_POOL_SIZE=50
NEO4J_CONNECTION_TIMEOUT=60000

Direct Configuration

Alternatively, you can modify the configuration directly in src/index.ts:

const indexer = new SpecificationIndexer({
  vectorStore: {
    chromaHost: 'localhost',
    chromaPort: 8000,
    collectionName: 'my_collection',
    embeddingModel: 'text-embedding-3-small'
  },
  graphStore: {
    uri: 'bolt://localhost:7687',
    username: 'neo4j',
    password: 'password',
    database: 'neo4j',
    maxConnectionPoolSize: 50,
    connectionAcquisitionTimeout: 60000
  }
});

ChromaDB enables AI-powered semantic search across PDF documentation using vector embeddings.

Installation:

# Using pip
pip install chromadb

# Or using Docker
docker run -d -p 8000:8000 chromadb/chroma

Default Configuration:

  • Host: localhost

  • Port: 8000

  • Collection: berlin_group_pdfs

The server automatically connects during initialization. If ChromaDB is unavailable, semantic search falls back to keyword matching.

Neo4j (for Graph Database)

Neo4j enables complex relationship queries and graph traversal across specifications, endpoints, and schemas.

Installation:

# Using Docker (recommended)
docker run -d \
  -p 7475:7474 -p 7688:7687 \
  -e NEO4J_AUTH=neo4j/password \
  --name neo4j_berling_group_mcp \
  neo4j:latest

# Or download from https://neo4j.com/download/

Default Configuration:

  • URI: bolt://localhost:7687

  • Username: neo4j

  • Password: password

  • Database: neo4j

The server automatically connects during initialization. If Neo4j is unavailable, graph queries use in-memory implementation.

Neo4j Browser Access: Once Neo4j is running, access the browser interface at http://localhost:7474 to visualize the graph:

// Example queries in Neo4j Browser
MATCH (s:Specification)-[:DEFINES_ENDPOINT]->(e:Endpoint)
RETURN s, e LIMIT 25

MATCH (e:Endpoint)-[:USES_SCHEMA]->(s:Schema) WHERE e.path CONTAINS 'payment' RETURN e, s

MATCH path = (s1:Schema)-[:REFERENCES*1..3]->(s2:Schema) WHERE s1.name = 'PaymentInitiation' RETURN path


### Embedding Providers

For production deployments with ChromaDB, consider using advanced embedding providers:

**OpenAI Embeddings** (highest quality):
```typescript
// Set environment variable
export OPENAI_API_KEY="your-api-key"

// Modify vectorStore.ts to use OpenAIEmbeddingProvider
const embeddingProvider = new OpenAIEmbeddingProvider(
  process.env.OPENAI_API_KEY,
  'text-embedding-3-small'  // or 'text-embedding-3-large'
);

Local Embeddings (default, no API required): The server includes a built-in TF-IDF-based embedding provider that works without external APIs. It's automatically used when no other provider is configured.

Deployment Scenarios

Scenario

ChromaDB

Neo4j

Tools Available

Best For

Full Stack

βœ… Running

βœ… Running

24 tools

Production, research, complex analysis

Semantic Focus

βœ… Running

❌ Not available

18 tools

Documentation search, Q&A

Graph Focus

❌ Not available

βœ… Running

18 tools

API architecture analysis

Minimal/Dev

❌ Not available

❌ Not available

12 tools

Development, basic queries

Architecture

Core Components

The Berlin Group MCP Server is built with a modular architecture consisting of several specialized components:

1. YAML Parser (yamlParser.ts)

Parses Berlin Group OpenAPI specifications from YAML files, extracting:

  • API endpoints (paths, methods, parameters)

  • Schema definitions and data models

  • Tags, descriptions, and metadata

  • Request/response specifications

2. PDF Parser (pdfParser.ts)

Processes PDF documentation files using pdf-parse library:

  • Extracts full text content from PDF documents

  • Performs keyword-based text search

  • Provides document summaries and metadata

3. Text Chunker (textChunker.ts)

Implements intelligent document segmentation for vector embedding:

  • Recursive Character Splitting: Breaks text at natural boundaries (paragraphs, sentences, clauses)

  • Configurable Chunk Size: Default 1000 characters with 200 character overlap for context continuity

  • Metadata Preservation: Tracks source file, chunk index, section headers, and page estimates

  • Semantic Coherence: Maintains meaning by avoiding splits mid-sentence when possible

4. Vector Store (vectorStore.ts)

Manages semantic search capabilities using ChromaDB:

  • ChromaDB Integration: Optional connection to ChromaDB server for persistent vector storage

  • Local Embedding Provider: Built-in TF-IDF-like embedding generation when external APIs are unavailable

  • OpenAI Embedding Support: Configurable integration with OpenAI's embedding models (text-embedding-3-small, text-embedding-3-large)

  • Automatic Fallback: Uses in-memory vector storage when ChromaDB server is unavailable

  • Semantic Search: Natural language queries with relevance scoring and distance metrics

  • Metadata Filtering: Search within specific files or document sections

How Vector Store Works:

  1. PDF documents are split into chunks by the Text Chunker

  2. Each chunk is converted to a vector embedding (384-3072 dimensions depending on provider)

  3. Embeddings are stored in ChromaDB collection or in-memory fallback

  4. User queries are embedded using the same model

  5. Cosine similarity finds the most relevant chunks

  6. Results are ranked by relevance score (0.0 to 1.0)

5. Graph Store (graphStore.ts)

Manages graph database operations with Neo4j:

  • Neo4j Integration: Optional connection to Neo4j database for complex relationship queries

  • In-Memory Fallback: Complete graph implementation when Neo4j is unavailable

  • Connection Management: Handles driver lifecycle, sessions, and transactions

  • CRUD Operations: Create/read nodes and relationships with typed interfaces

  • Cypher Query Execution: Direct access to Neo4j's powerful query language

  • Statistics: Provides metrics on node counts, relationships, and graph density

Graph Node Types:

  • Specification: OpenAPI spec metadata (title, version, description)

  • Endpoint: API paths with HTTP methods

  • Schema: Data models and type definitions

  • Property: Schema fields with types and constraints

  • Parameter: Request parameters (query, header, path, cookie)

  • Response: HTTP response definitions with status codes

  • Tag: Endpoint categorization

Graph Relationship Types:

  • DEFINES_ENDPOINT: Specification β†’ Endpoint

  • DEFINES_SCHEMA: Specification β†’ Schema

  • HAS_PARAMETER: Endpoint β†’ Parameter

  • HAS_RESPONSE: Endpoint β†’ Response

  • USES_SCHEMA: Endpoint/Parameter/Response β†’ Schema

  • REFERENCES: Schema β†’ Schema (for $ref relationships)

  • HAS_PROPERTY: Schema β†’ Property

  • TAGGED_WITH: Endpoint β†’ Tag

6. Graph Indexer (graphIndexer.ts)

Transforms OpenAPI specifications into graph structures:

  • Specification Indexing: Creates nodes for each loaded specification file

  • Endpoint Extraction: Parses all API endpoints with full details

  • Schema Mapping: Extracts all data models and their properties

  • Relationship Building: Connects endpoints to schemas, parameters, and responses

  • Reference Resolution: Follows $ref pointers to build schema dependency graphs

  • Progress Tracking: Provides real-time feedback during indexing operations

  • Error Handling: Gracefully handles malformed specifications

Indexing Process:

  1. Load YAML files from yml_files/ directory

  2. Create Specification nodes for each file

  3. Extract and create Endpoint nodes

  4. Extract and create Schema nodes with properties

  5. Build relationships between all entities

  6. Index into Neo4j or in-memory store

7. Graph Models (graphModels.ts)

Defines TypeScript interfaces for type-safe graph operations:

  • Node interfaces (SpecificationNode, EndpointNode, SchemaNode, etc.)

  • Relationship type enums

  • Query result types (GraphTraversalResult, PatternSearchResult, etc.)

  • DTO types for creating nodes

  • Utility functions for ID generation and reference extraction

8. Specification Indexer (indexer.ts)

Orchestrates all components and provides unified API:

  • Coordinates YAML Parser, PDF Parser, Vector Store, and Graph Indexer

  • Manages initialization sequence and error handling

  • Provides high-level search and query methods

  • Handles fallback scenarios when optional services are unavailable

  • Aggregates statistics across all subsystems

Data Flow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   YAML Files    │────>β”‚  YAML Parser     │────>β”‚  Graph Indexer  β”‚
β”‚  (OpenAPI)      β”‚     β”‚                  β”‚     β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                            β”‚
                                                            v
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   PDF Files     │────>β”‚  PDF Parser +    │────>β”‚  Vector Store   β”‚
β”‚ (Documentation) β”‚     β”‚  Text Chunker    β”‚     β”‚   (ChromaDB)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                            β”‚
                                                            v
                                             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                             β”‚   Specification Indexer  β”‚
                                             β”‚   (Unified Interface)    β”‚
                                             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                          β”‚
                                                          v
                                             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                             β”‚     MCP Server Tools     β”‚
                                             β”‚  (24 Tools Available)    β”‚
                                             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Fallback Mechanisms

The server is designed to work in various deployment scenarios:

  1. Full Stack (ChromaDB + Neo4j):

    • All 24 tools available

    • Best performance and capabilities

    • Semantic search with persistent vectors

    • Complex graph queries with Cypher

  2. Vector Only (ChromaDB, no Neo4j):

    • 18 tools available (core + semantic search)

    • Graph queries use in-memory implementation

    • Good for semantic search focused use cases

  3. Graph Only (Neo4j, no ChromaDB):

    • 18 tools available (core + graph database)

    • Semantic search falls back to keyword matching

    • Good for relationship exploration use cases

  4. Minimal (no external databases):

    • 12 core tools available

    • All operations use in-memory storage

    • Keyword-based search only

    • Suitable for basic queries and development

The server automatically detects available services during initialization and adjusts its capabilities accordingly. Users are informed which features are available through statistics and status endpoints.

Available Tools

The server provides 24 MCP tools organized into three categories:

Core Tools (12 tools)

Search and Discovery

  • search_endpoints - Search for API endpoints across all specifications

    Example: "Find all payment endpoints"
  • search_schemas - Search for data schemas and models

    Example: "Find schemas related to transaction"
  • search_pdf_documentation - Search through PDF documentation using keyword matching

    Example: "Search for SCA requirements"
  • search_all - Comprehensive keyword search across all sources (endpoints, schemas, PDFs)

    Example: "Find everything about consent"

Endpoint Information

  • get_endpoint_details - Get detailed information about a specific endpoint

    Parameters: path, method
    Example: path="/v1/accounts", method="GET"
  • filter_endpoints_by_tag - Filter endpoints by tag

    Example: tag="accounts"
  • filter_endpoints_by_method - Filter endpoints by HTTP method

    Example: method="POST"

Schema Information

  • get_schema - Get a specific schema definition

    Parameters: schemaName, specFile (optional)
    Example: schemaName="AccountDetails"

Specification Management

  • list_specifications - List all available OpenAPI specifications

  • get_specification_details - Get comprehensive details about a specific spec

    Parameters: fileName
  • list_pdf_documents - List all available PDF documentation

  • get_statistics - Get basic statistics about loaded specifications

Semantic Search Tools (6 tools)

These tools use ChromaDB and vector embeddings for intelligent, context-aware document retrieval. When ChromaDB is unavailable, they automatically fall back to keyword-based search.

  • search_pdf_semantic - Perform semantic search across PDF documentation

    Parameters: query (string), topK (number, default: 10)
    Example: "What are the authentication requirements for payment initiation?"
    
    How it works:
    - Converts your natural language query into a vector embedding
    - Finds the most semantically similar document chunks
    - Returns results ranked by relevance score (0.0-1.0)
    - Understands synonyms and related concepts (e.g., "authenticate" matches "authorization")
  • search_pdf_semantic_filtered - Semantic search with metadata filters

    Parameters: query (string), fileName (optional), section (optional), topK (optional)
    Example: query="SCA exemptions", fileName="Implementation_Guide.pdf"
    
    Use cases:
    - Search within a specific document
    - Filter by document section
    - Narrow results to relevant portions
  • search_all_semantic - Comprehensive semantic search across all sources

    Parameters: query (string), topK (number, default: 10)
    Example: "How do I handle declined payments?"
    
    Returns:
    - Matching endpoints (keyword search)
    - Matching schemas (keyword search)
    - Semantically similar PDF content (vector search)
  • get_vector_store_stats - Get vector store statistics

    Returns:
    - enabled: Whether vector store is operational
    - totalChunks: Number of indexed document chunks
    - collectionName: ChromaDB collection name
    - isInMemory: Whether using in-memory fallback

Graph Database Tools (6 tools)

These tools use Neo4j for exploring complex relationships between specifications, endpoints, schemas, and data models. When Neo4j is unavailable, they use an in-memory graph implementation.

  • graph_find_related_schemas - Find schemas related through $ref references

    Parameters: schemaName (string), specFile (optional), maxDepth (number, default: 3)
    Example: schemaName="AccountReference"
    
    Use cases:
    - Understand schema inheritance hierarchies
    - Find all schemas that reference a particular type
    - Discover composed data models
    - Map schema dependencies
  • graph_get_endpoint_dependencies - Get all dependencies of an API endpoint

    Parameters: path (string), method (string), specFile (optional)
    Example: path="/v1/payments/sepa-credit-transfers", method="POST"
    
    Returns:
    - All request parameters (query, header, path, body)
    - Request body schema and nested schemas
    - All possible response codes and their schemas
    - Complete dependency tree
  • graph_traverse_relationships - Execute custom graph traversal with filters

    Parameters: 
      - startNodeType: Type of starting node (Specification, Endpoint, Schema, etc.)
      - startNodeFilter: Property filters (e.g., {name: "AccountReference"})
      - relationshipTypes: Optional list of relationship types to follow
      - maxDepth: Maximum traversal depth (default: 3)
    
    Example: 
      startNodeType="Schema", 
      startNodeFilter={name: "PaymentInitiation*"},
      relationshipTypes=["REFERENCES", "USES_SCHEMA"]
    
    Use cases:
    - Custom relationship exploration
    - Multi-hop dependency analysis
    - Pattern-based graph queries
  • graph_get_specification_graph - Get complete graph for a specification

    Parameters: fileName (string)
    Example: fileName="BG_oFA_PIS_Version_2.3_20251128.openapi.yaml"
    
    Returns:
    - All endpoints in the specification
    - All schemas and their properties
    - All relationships between entities
    - Complete specification structure as a graph
  • graph_search_by_pattern - Search graph nodes by property patterns

    Parameters: 
      - nodeType: Type of node to search
      - pattern: Property pattern with wildcards (e.g., {path: "/v1/accounts*"})
      - limit: Maximum results (default: 50)
    
    Example: 
      nodeType="Endpoint", 
      pattern={path: "/v1/payments/*", method: "POST"}
    
    Supports wildcards:
    - {name: "*Account*"} - Contains "Account"
    - {path: "/v1/accounts*"} - Starts with "/v1/accounts"
    - {method: "POST"} - Exact match
  • get_graph_store_stats - Get graph database statistics

    Returns:
    - enabled: Whether graph store is operational
    - usingNeo4j: Whether connected to Neo4j (true) or using in-memory (false)
    - Node counts by type (Specification, Endpoint, Schema, etc.)
    - Relationship counts by type
    - Indexing metrics (duration, errors)
    - Graph density metrics

Tool Selection Guide

Use Core Tools when:

  • You need exact endpoint paths or schema names

  • You want to filter by tags or HTTP methods

  • You're looking for specific specification details

Use Semantic Search Tools when:

  • You have natural language questions

  • You're exploring concepts across documentation

  • You don't know the exact terminology

  • You want AI-powered relevance ranking

Use Graph Database Tools when:

  • You need to understand relationships and dependencies

  • You're exploring schema inheritance

  • You want to analyze endpoint complexity

  • You need to traverse multi-level references

Usage Examples

In VS Code with GitHub Copilot

Basic Queries

You: "What endpoints are available for account information?"
Copilot: [Uses search_endpoints tool to find AIS endpoints]

You: "Show me the schema for payment initiation request"
Copilot: [Uses search_schemas tool to find payment schemas]

Semantic Search Queries

You: "How do I implement Strong Customer Authentication?"
Copilot: [Uses search_pdf_semantic to find relevant SCA documentation with AI ranking]

You: "What are the requirements for payment authorization?"
Copilot: [Uses search_all_semantic to find endpoints, schemas, and semantically related PDF content]

You: "Find information about transaction status in the PIS specification"
Copilot: [Uses search_pdf_semantic_filtered with fileName filter]

Graph Database Queries

You: "What schemas does AccountReference depend on?"
Copilot: [Uses graph_find_related_schemas to traverse schema relationships]

You: "Show me all dependencies for the payment initiation endpoint"
Copilot: [Uses graph_get_endpoint_dependencies to get parameters, request/response schemas]

You: "Find all endpoints that use the Amount schema"
Copilot: [Uses graph_traverse_relationships starting from Amount schema]

You: "Get the complete API structure for the AIS specification"
Copilot: [Uses graph_get_specification_graph to return full specification graph]

Advanced Analysis

You: "Compare the complexity of payment endpoints vs account endpoints"
Copilot: [Uses graph_get_endpoint_dependencies for multiple endpoints and compares]

You: "What are all the possible error responses for account endpoints?"
Copilot: [Uses graph_traverse_relationships to find all response schemas]

You: "Show me all schemas that contain PII (personally identifiable information)"
Copilot: [Uses search_pdf_semantic to find PII references, then graph_search_by_pattern to find related schemas]

Programmatic Usage

The server can also be used programmatically via the MCP protocol:

// Example tool call
{
  "method": "tools/call",
  "params": {
    "name": "search_endpoints",
    "arguments": {
      "query": "payment"
    }
  }
}

Project Structure

Berlin-group-mcp/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ index.ts              # Main MCP server with 24 tool definitions
β”‚   β”œβ”€β”€ indexer.ts            # Specification indexer orchestrating all components
β”‚   β”œβ”€β”€ yamlParser.ts         # OpenAPI YAML parser
β”‚   β”œβ”€β”€ pdfParser.ts          # PDF document parser
β”‚   β”œβ”€β”€ textChunker.ts        # Intelligent text chunking for vector embeddings
β”‚   β”œβ”€β”€ vectorStore.ts        # ChromaDB integration for semantic search
β”‚   β”œβ”€β”€ graphStore.ts         # Neo4j integration and in-memory graph store
β”‚   β”œβ”€β”€ graphIndexer.ts       # Graph database indexer
β”‚   └── graphModels.ts        # TypeScript interfaces for graph entities
β”œβ”€β”€ yml_files/                # Berlin Group OpenAPI specs (7 specifications)
β”‚   β”œβ”€β”€ BG_oFA_AIS_Version_2.3_20250818.openapi.yaml
β”‚   β”œβ”€β”€ BG_oFA_PIS_Version_2.3_20251128.openapi.yaml
β”‚   β”œβ”€β”€ BG_oFA_PIIS_Version_2.3_20250818.openapi.yaml
β”‚   β”œβ”€β”€ BG_oFA_BASK_Version_2.2_20251128.openapi.yaml
β”‚   β”œβ”€β”€ BG_oFA_Consent_Version_2.1_20251128.openapi.yaml
β”‚   β”œβ”€β”€ BG_oFA_dataDictionary_Version_2.2.6_20250818.openapi.yaml
β”‚   └── BG_oFA_PUSH_Version_2.2_20250818.openapi.yaml
β”œβ”€β”€ pdf_files/                # PDF documentation (implementation guides, frameworks)
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ unit/                 # Unit tests for individual components
β”‚   β”‚   β”œβ”€β”€ vectorStore.test.ts
β”‚   β”‚   β”œβ”€β”€ graphStore.test.ts
β”‚   β”‚   β”œβ”€β”€ graphIndexer.test.ts
β”‚   β”‚   β”œβ”€β”€ graphModels.test.ts
β”‚   β”‚   └── textChunker.test.ts
β”‚   └── integration/          # Integration tests
β”‚       β”œβ”€β”€ semanticSearch.test.ts
β”‚       └── graphSearch.test.ts
β”œβ”€β”€ build/                    # Compiled JavaScript (generated)
β”œβ”€β”€ docs/                     # Architecture documentation and diagrams
β”‚   └── architecture/
β”‚       └── diagrams/         # PlantUML diagrams for system architecture
β”œβ”€β”€ postman/                  # Postman collection for testing MCP tools
β”œβ”€β”€ package.json              # Dependencies: chromadb, neo4j-driver, pdf-parse, etc.
β”œβ”€β”€ tsconfig.json
β”œβ”€β”€ jest.config.js            # Test configuration
β”œβ”€β”€ .vscode/
β”‚   └── mcp-settings.json     # VS Code MCP configuration
β”œβ”€β”€ INTELLIJ_SETUP.md         # IntelliJ configuration guide
└── README.md

Development

Running Tests

The project includes comprehensive unit and integration tests:

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run with coverage report
npm run test:coverage

Test Coverage:

  • Unit Tests: vectorStore.test.ts, graphStore.test.ts, graphIndexer.test.ts, graphModels.test.ts, textChunker.test.ts

  • Integration Tests: semanticSearch.test.ts, graphSearch.test.ts

Watch Mode

To automatically rebuild on file changes:

npm run watch

Debugging

To debug the server with Node.js inspector:

npm run inspector

Adding New Specifications

  1. Add YAML files to yml_files/ directory

  2. Add PDF files to pdf_files/ directory

  3. Rebuild the project: npm run build

  4. Restart the MCP server (reload VS Code or restart IDE)

  5. New specifications will be automatically indexed on next startup

Extending the Server

Adding a New Tool:

  1. Define tool schema in src/index.ts TOOLS array

  2. Add handler in CallToolRequestSchema handler

  3. Implement business logic in src/indexer.ts

  4. Update README documentation

Adding a New Embedding Provider:

  1. Implement EmbeddingProvider interface in src/vectorStore.ts

  2. Add embed() and embedQuery() methods

  3. Configure in src/indexer.ts or via environment variables

Customizing Graph Schema:

  1. Add new node types in src/graphModels.ts

  2. Add relationships in RelationshipType enum

  3. Update indexing logic in src/graphIndexer.ts

  4. Add query methods in src/graphStore.ts

Troubleshooting

Server Not Starting

  1. Check Node.js version: node --version (should be v18+)

  2. Verify build completed: ls -la build/ (should see .js files)

  3. Check for errors: Look in VS Code Developer Tools console (Help β†’ Toggle Developer Tools)

  4. Rebuild: npm run build

Tools Not Appearing

  1. Ensure MCP settings file exists: Check .vscode/mcp-settings.json

  2. Verify correct paths: Ensure the path to build/index.js is absolute and correct

  3. Restart VS Code completely: Close all windows and reopen

  4. Check GitHub Copilot: Ensure Copilot is enabled and working

  5. Check console logs: Open Developer Tools and look for MCP connection errors

  1. Verify YAML and PDF files exist:

    ls -la yml_files/ pdf_files/
  2. Check server logs: Look for initialization errors in console

  3. Ensure files are readable: Check file permissions

  4. Try reindexing: Delete and rebuild: rm -rf build && npm run build

Semantic Search Not Working

  1. Check if ChromaDB is running (optional):

    curl http://localhost:8000/api/v1/heartbeat
  2. Review initialization logs: Should see "Indexed X PDF chunks in vector store"

  3. Check fallback mode: Server will fall back to keyword search if ChromaDB unavailable

  4. Verify vector store stats: Use get_vector_store_stats tool

  5. Check ChromaDB logs (if running via Docker):

    docker logs <chromadb-container-id>

Graph Database Not Working

  1. Check if Neo4j is running (optional):

    curl http://localhost:7474
    # Or check Docker: docker ps | grep neo4j
  2. Verify credentials: Default is neo4j/neo4j (change on first login)

  3. Review initialization logs: Should see "Graph indexing complete: X specs, Y endpoints, Z schemas"

  4. Check fallback mode: Server will use in-memory graph if Neo4j unavailable

  5. Verify graph store stats: Use get_graph_store_stats tool

  6. Test Neo4j connection:

    # Using cypher-shell
    cypher-shell -u neo4j -p your-password

Performance Issues

  1. Large PDF files: Consider splitting into smaller documents

  2. ChromaDB slow:

    • Use local deployment instead of remote

    • Reduce topK parameter in semantic searches

    • Consider faster embedding provider

  3. Neo4j slow:

    • Check if indexes are created

    • Reduce maxDepth in graph traversals

    • Optimize Cypher queries

  4. Memory usage high:

    • Use external databases (ChromaDB + Neo4j) instead of in-memory

    • Reduce number of specifications loaded

Connection Errors

ChromaDB Connection Refused:

Error: connect ECONNREFUSED 127.0.0.1:8000

Solution: ChromaDB is not running or running on different port. Server will automatically fall back to in-memory mode.

Neo4j Connection Failed:

Neo4jError: Could not connect to bolt://localhost:7687

Solution: Neo4j is not running or wrong credentials. Server will automatically fall back to in-memory mode.

Permission Issues

# If index.js is not executable
chmod +x build/index.js

# If YAML/PDF files are not readable
chmod -R 644 yml_files/*.yaml pdf_files/*.pdf

Debugging Tips

  1. Enable verbose logging: Set NODE_ENV=development before starting server

  2. Check initialization sequence: Server logs show each phase

  3. Test individual components:

    npm test -- vectorStore.test.ts
    npm test -- graphStore.test.ts
  4. Verify tool availability: Use get_statistics, get_vector_store_stats, get_graph_store_stats tools

  5. Check MCP communication: Look for JSON-RPC messages in developer console

Common Error Messages

Error

Cause

Solution

"Specifications not yet loaded"

Server still initializing

Wait 5-10 seconds and retry

"Semantic search is not available"

ChromaDB not connected

Normal, falls back to keyword search

"Graph store is not available"

Neo4j not connected

Normal, falls back to in-memory

"Collection not found"

ChromaDB collection missing

Server creates it automatically on startup

"Authentication failed"

Wrong Neo4j credentials

Update credentials in code or use default

Technical Details

MCP Protocol

This server implements the Model Context Protocol specification (2025-11-25):

  • Tools: 24 tools organized into core, semantic search, and graph database categories

  • Resources: Direct access to specification files via berlin-group:// URI scheme

  • Transport: stdio-based communication for IDE integration

Component Architecture

Parser Features

  • YAML Parser:

    • Extracts paths, operations, schemas, components from OpenAPI 3.0+ specs

    • Handles $ref pointer resolution

    • Validates specification structure

    • Indexes tags, parameters, and responses

  • PDF Parser:

    • Uses pdf-parse library for text extraction

    • Preserves document structure and metadata

    • Enables full-text keyword search

    • Provides page number estimation

  • Text Chunker:

    • Recursive character splitting algorithm

    • Configurable chunk size (default: 1000 chars) and overlap (default: 200 chars)

    • Maintains semantic coherence across chunks

    • Preserves metadata (file name, section, page number)

Vector Store Implementation

  • ChromaDB Integration:

    • HTTP client connection to ChromaDB server

    • Collection-based document organization

    • Metadata filtering support

    • Cosine similarity for relevance scoring

  • Embedding Providers:

    • LocalEmbeddingProvider: TF-IDF-based, 384 dimensions, no external dependencies

    • OpenAIEmbeddingProvider: GPT-based, 1536 or 3072 dimensions, requires API key

    • Pluggable architecture for custom providers

  • Search Algorithms:

    • Query embedding generation

    • K-nearest neighbors (KNN) search

    • Distance metrics (cosine similarity, L2 distance)

    • Relevance score normalization (0.0 to 1.0)

Graph Store Implementation

  • Neo4j Integration:

    • Bolt protocol driver (neo4j-driver v5.27.0)

    • Connection pooling for performance

    • Transaction management

    • Cypher query execution

  • Graph Schema:

    Nodes: Specification, Endpoint, Schema, Property, Parameter, Response, Tag
    Relationships: DEFINES_ENDPOINT, DEFINES_SCHEMA, HAS_PARAMETER, 
                   HAS_RESPONSE, USES_SCHEMA, REFERENCES, HAS_PROPERTY, TAGGED_WITH
  • In-Memory Fallback:

    • Complete graph implementation using Maps

    • Same API as Neo4j implementation

    • Supports all query patterns

    • Suitable for development and testing

Indexing Process

  1. Initialization (parallel):

    • Load YAML files β†’ Parse specifications β†’ Extract endpoints/schemas

    • Load PDF files β†’ Parse documents β†’ Chunk text β†’ Generate embeddings

  2. Vector Store Indexing:

    • Chunk all PDF documents (typical: 200-500 chunks per document)

    • Generate embeddings for each chunk

    • Store in ChromaDB with metadata

    • Build search index

  3. Graph Store Indexing:

    • Create Specification nodes

    • Create Endpoint nodes with relationships

    • Create Schema nodes with properties

    • Create Parameter and Response nodes

    • Build REFERENCES relationships for $ref pointers

    • Create Tag nodes and relationships

  4. Error Handling:

    • Graceful degradation if databases unavailable

    • Detailed logging of indexing progress

    • Error collection without stopping process

    • Fallback to in-memory storage

Performance

  • Initial Load Time:

    • YAML parsing: ~500ms (7 specifications)

    • PDF parsing: ~1-2s (depends on file count/size)

    • Vector indexing: ~2-5s (depends on chunk count and embedding provider)

    • Graph indexing: ~1-3s (depends on database connection)

    • Total: ~5-10 seconds for full initialization

  • Query Performance:

    • Keyword search: <10ms (in-memory search)

    • Semantic search: 50-200ms (depends on ChromaDB response time and top-k)

    • Graph queries: 10-100ms (simple queries), 100-500ms (complex traversals)

    • In-memory fallback: <50ms for most operations

  • Memory Usage:

    • Base (specifications): ~20-30MB

    • Vector store (in-memory): +30-50MB

    • Graph store (in-memory): +20-40MB

    • Total: ~70-120MB (without external databases)

    • With external databases: ~30-50MB (stores data externally)

  • Scalability:

    • Can handle 100+ specifications

    • Supports 1000+ PDF pages

    • Graph queries scale with Neo4j (millions of nodes)

    • Vector search scales with ChromaDB (millions of chunks)

Quick Reference

Tool Categories Summary

Category

Count

Purpose

Requires

Core Tools

12

Basic search, filtering, specification access

None (built-in)

Semantic Search

6

AI-powered document retrieval, natural language queries

ChromaDB (optional)

Graph Database

6

Relationship exploration, dependency analysis

Neo4j (optional)

Total

24

Complete specification analysis toolkit

Node.js only

Key Features Comparison

Feature

Without Databases

With ChromaDB

With Neo4j

With Both

Endpoint Search

βœ… Keyword

βœ… Keyword

βœ… Keyword

βœ… Keyword

Schema Search

βœ… Keyword

βœ… Keyword

βœ… Keyword

βœ… Keyword

PDF Search

βœ… Keyword

βœ… Semantic + Keyword

βœ… Keyword

βœ… Semantic + Keyword

Schema Relationships

βœ… In-memory

βœ… In-memory

βœ… Neo4j Graph

βœ… Neo4j Graph

Endpoint Dependencies

βœ… In-memory

βœ… In-memory

βœ… Neo4j Graph

βœ… Neo4j Graph

Graph Traversal

βœ… Limited

βœ… Limited

βœ… Full Cypher

βœ… Full Cypher

Performance

Good

Excellent (PDF)

Excellent (Graph)

Excellent (Both)

Memory Usage

~120MB

~70MB

~80MB

~50MB

Common Queries Cheat Sheet

// Find endpoints
"search for payment endpoints"
β†’ Uses: search_endpoints

// Find schemas
"show me the AccountDetails schema"
β†’ Uses: get_schema or search_schemas

// Natural language search (semantic)
"how to handle authentication errors?"
β†’ Uses: search_pdf_semantic

// Find related schemas
"what schemas does PaymentInitiation reference?"
β†’ Uses: graph_find_related_schemas

// Analyze endpoint
"what are all the parameters and responses for POST /v1/payments?"
β†’ Uses: graph_get_endpoint_dependencies

// Explore relationships
"show me all schemas that use Address type"
β†’ Uses: graph_traverse_relationships

// Get overview
"show me statistics about the loaded specifications"
β†’ Uses: get_statistics, get_vector_store_stats, get_graph_store_stats

License

MIT

References

Contributing

Contributions are welcome! Please ensure:

  • TypeScript code follows project conventions

  • All tools have proper error handling

  • Documentation is updated for new features

  • Tests are added for new components (unit tests in tests/unit/, integration tests in tests/integration/)

  • New embedding providers implement the EmbeddingProvider interface

  • New graph node types are added to graphModels.ts

  • README is updated with examples and usage instructions

Areas for Contribution

  • Additional embedding providers (Cohere, HuggingFace, etc.)

  • Enhanced graph query capabilities

  • Additional Berlin Group specifications

  • Performance optimizations

  • Additional MCP tools

  • Documentation improvements

Support

For issues or questions:

  1. Check the troubleshooting section

  2. Review MCP documentation

  3. Check Berlin Group specification documentation

F
license - not found
-
quality - not tested
F
maintenance

Maintenance

–Maintainers
–Response time
–Release cycle
–Releases (12mo)
Commit activity
Issues opened vs closed

Resources

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Borelli-7/bg-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server