Knowledge Graph Builder MCP Server
A Knowledge Graph Builder that transforms text or web content into structured knowledge graphs using local AI models with MCP (Model Context Protocol) integration for persistent storage in Neo4j and Qdrant.
Features
Local AI Processing: Uses local models via Ollama or LM Studio for entity extraction
Large Content Support: Handles arbitrarily large content (300MB+) via intelligent chunking
Web Content Extraction: Scrapes and analyzes full web pages without size limits
Knowledge Graph Generation: Creates structured graphs with entities and relationships
Smart Chunking: Automatically chunks large content with sentence boundary detection
Entity Merging: Intelligently merges duplicate entities across chunks
Real-Time Visualization: Live SVG graph updates as chunks are processed
Interactive SVG Output: Color-coded entity types with progress tracking
MCP Integration: Stores data in Neo4j (graph database) and Qdrant (vector database)
UUID Tracking: Generates UUIDv8 for unified entity tracking across systems
Gradio Interface: User-friendly web interface with dual JSON/SVG output
Related MCP server: Neo4j MCP Server
Entity Types Extracted
PERSON: Names, individuals, key figures
ORGANIZATION: Companies, institutions, groups
LOCATION: Places, countries, regions, addresses
CONCEPT: Ideas, technologies, abstract concepts
EVENT: Specific events, occurrences, incidents
OTHER: Miscellaneous entities not fitting other categories
Setup
Requirements
pip install -r requirements.txt
# For full visualization capabilities:
pip install networkx matplotlib
Environment Variables
For detailed configuration instructions and complete environment variables reference, see the Configuration section below.
Quick Start Configuration:
# Basic setup (uses sensible defaults)
export MODEL_PROVIDER=ollama
export LOCAL_MODEL=llama3.2:latest
# Optional: Custom endpoints and processing limits
export OLLAMA_BASE_URL=http://localhost:11434
export CHUNK_SIZE=2000
export MAX_CHUNKS=0
Note: All environment variables are optional and have sensible defaults. The application will run without any configuration.
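As a rough illustration, the optional variables and the defaults listed above could be read at startup as in the following minimal Python sketch (illustrative only; the exact handling in app.py may differ):
import os

MODEL_PROVIDER = os.getenv("MODEL_PROVIDER", "ollama")            # "ollama" or "lmstudio"
LOCAL_MODEL = os.getenv("LOCAL_MODEL", "llama3.2:latest")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
LMSTUDIO_BASE_URL = os.getenv("LMSTUDIO_BASE_URL", "http://localhost:1234")
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "2000"))                 # characters per chunk
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "200"))            # overlap between chunks
MAX_CHUNKS = int(os.getenv("MAX_CHUNKS", "0"))                    # 0 = unlimited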
Local Model Setup
For Ollama:
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
# Pull a model
ollama pull llama3.2:latest
For LM Studio:
Download and install LM Studio
Load a model in the local server
Start the local server on port 1234
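Once a local model is running, a single extraction call looks roughly like the sketch below, written against Ollama's standard /api/generate endpoint (LM Studio instead exposes an OpenAI-compatible API on port 1234). The prompt, schema, and error handling here are illustrative; the real logic lives in app.py.
import json
import requests

def extract_entities(text: str, model: str = "llama3.2:latest",
                     base_url: str = "http://localhost:11434") -> dict:
    # Illustrative prompt; the actual prompt template is defined in app.py.
    prompt = ("Extract entities and relationships from the text below. "
              'Respond with JSON: {"entities": [...], "relationships": [...]}\n\n' + text)
    resp = requests.post(f"{base_url}/api/generate",
                         json={"model": model, "prompt": prompt, "stream": False},
                         timeout=300)
    resp.raise_for_status()
    return json.loads(resp.json()["response"])  # raises if the model drifts from pure JSON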
Running the Application
python app.py
The application will launch a Gradio interface with MCP server capabilities enabled.
Usage
Text Input
Paste any text content to analyze:
Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976. The company is headquartered in Cupertino, California.
URL Input
Provide a web URL to extract and analyze:
https://en.wikipedia.org/wiki/Artificial_intelligence
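For illustration, web extraction along the lines of extract_text_from_url() could look like the sketch below, assuming requests and beautifulsoup4 are available; the actual implementation in app.py may use different libraries and cleanup rules.
import requests
from bs4 import BeautifulSoup

def extract_text_from_url(url: str) -> str:
    # "KGB-mcp" is a placeholder User-Agent for this sketch.
    resp = requests.get(url, timeout=30, headers={"User-Agent": "KGB-mcp"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()                      # drop non-content markup
    return " ".join(soup.get_text(separator=" ").split())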
Large Content Processing (300MB+ Files)
For very large content like LLM conversation extracts:
# Example: Processing a 300MB conversation log
# The system will automatically:
# 1. Detect large content (>2000 chars by default)
# 2. Split into intelligent chunks at sentence boundaries
# 3. Process each chunk with the local AI model
# 4. Merge and deduplicate entities/relationships
# 5. Store with full lineage tracking in hKG
# Processing will show progress:
# "Processing large content (314,572,800 chars) in chunks..."
# "Processing 157,286 chunks..."
# "Processing chunk 1/157,286 (2000 chars)..."
# "Merged results: 45,231 entities, 128,904 relationships"Output Format
Output Format
The system returns a structured JSON knowledge graph:
{
"source": {
"type": "text|url",
"value": "input_value",
"content_preview": "first 200 characters..."
},
"knowledge_graph": {
"entities": [
{
"name": "Apple Inc.",
"type": "ORGANIZATION",
"description": "Technology company founded in 1976"
}
],
"relationships": [
{
"source": "Steve Jobs",
"target": "Apple Inc.",
"relationship": "FOUNDED",
"description": "Steve Jobs founded Apple Inc."
}
],
"entity_count": 5,
"relationship_count": 4
},
"visualization": {
"svg_content": "<svg>...</svg>",
"svg_file_path": "/path/to/knowledge_graph_12345678.svg",
"visualization_available": true,
"real_time_updates": false,
"incremental_files_saved": 0,
"entity_color_mapping": {
"ORGANIZATION": "#4ECDC4",
"PERSON": "#FF6B6B"
},
"svg_generation_timestamp": "2024-01-15T10:30:05Z",
"visualization_engine": "networkx+matplotlib"
},
"metadata": {
"model": "ollama:llama3.2:latest",
"content_length": 150,
"uuid": "xxxxxxxx-xxxx-8xxx-xxxx-xxxxxxxxxxxx",
"neo4j_stored": true,
"qdrant_stored": true,
"timestamp": "2024-01-15T10:30:00Z",
"hkg_metadata": {
"processing_method": "single",
"chunk_count": 1,
"chunk_size": 2000,
"chunk_overlap": 200,
"source_type": "text",
"supports_large_content": true,
"max_content_size": "unlimited",
"visualization_integration": {
"real_time_visualization": false,
"svg_files_generated": 1,
"entity_color_tracking": true,
"visualization_lineage": true,
"incremental_updates": false,
"neo4j_viz_metadata": true,
"qdrant_viz_metadata": true
}
}
}
}
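As a quick orientation to this structure, a caller could summarize the parsed result as follows (field names follow the example above; the function here is purely illustrative):
def summarize(result: dict) -> None:
    kg = result["knowledge_graph"]
    print(f"{kg['entity_count']} entities, {kg['relationship_count']} relationships")
    for entity in kg["entities"]:
        print(f"- {entity['name']} ({entity['type']}): {entity['description']}")
    for rel in kg["relationships"]:
        print(f"- {rel['source']} -[{rel['relationship']}]-> {rel['target']}")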
Real-Time Graph Visualization
SVG Generation Features
Color-Coded Entity Types: Each entity type has a distinct color (Person=Red, Organization=Teal, Location=Blue, Concept=Green, Event=Yellow, Other=Plum)
Interactive Layout: Automatic graph layout using NetworkX spring layout algorithm
Relationship Labels: Edge labels showing relationship types between entities
Entity Information: Node labels with entity names and types
Legend: Automatic legend generation based on entity types present
Statistics: Real-time entity and relationship counts
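The rendering idea behind these features is sketched below with NetworkX and Matplotlib (the engine named in the output metadata). Only the PERSON and ORGANIZATION hex values come from the example output above; the other colors, and the layout details, are illustrative and may differ from create_knowledge_graph_svg() in app.py.
import networkx as nx
import matplotlib
matplotlib.use("Agg")                        # headless rendering
import matplotlib.pyplot as plt

ENTITY_COLORS = {"PERSON": "#FF6B6B", "ORGANIZATION": "#4ECDC4", "LOCATION": "#45B7D1",
                 "CONCEPT": "#96CEB4", "EVENT": "#FFEAA7", "OTHER": "#DDA0DD"}

def render_svg(entities: list[dict], relationships: list[dict], path: str) -> None:
    graph = nx.DiGraph()
    for e in entities:
        graph.add_node(e["name"], color=ENTITY_COLORS.get(e["type"], ENTITY_COLORS["OTHER"]))
    for r in relationships:
        graph.add_edge(r["source"], r["target"], label=r["relationship"])
    pos = nx.spring_layout(graph, seed=42)   # spring layout, as described above
    colors = [graph.nodes[n].get("color", ENTITY_COLORS["OTHER"]) for n in graph.nodes]
    plt.figure(figsize=(12, 8))
    nx.draw(graph, pos, with_labels=True, node_color=colors, font_size=8)
    nx.draw_networkx_edge_labels(graph, pos,
                                 edge_labels=nx.get_edge_attributes(graph, "label"),
                                 font_size=6)
    plt.savefig(path, format="svg")
    plt.close()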
Real-Time Processing for Large Content
Progress Tracking: Visual progress bar showing chunk processing completion
Incremental Updates: Graph updates after each chunk is processed
Live Statistics: Running totals of entities and relationships discovered
Incremental File Saves: Each chunk creates a timestamped SVG file
Final Visualization: Complete graph saved as final SVG
File Output
Single Content:
knowledge_graph_<uuid8>.svg
Large Content (Chunked):
Incremental:
knowledge_graph_<uuid8>_chunk_0001.svg, knowledge_graph_<uuid8>_chunk_0002.svg, etc.
Final:
knowledge_graph_<uuid8>.svg
Example Large Content Processing
# Processing a 300MB conversation log:
# "Processing large content (314,572,800 chars) in chunks..."
# "Processing 157,286 chunks..."
#
# Real-time updates:
# "Processing chunk 1/157,286 (2000 chars)..."
# "Real-time graph updated: Updated graph: 5 entities, 3 relationships (Chunk 1/157,286)"
# "Saved incremental graph: knowledge_graph_12345678_chunk_0001.svg"
#
# "Processing chunk 2/157,286 (2000 chars)..."
# "Real-time graph updated: Updated graph: 12 entities, 8 relationships (Chunk 2/157,286)"
# "Saved incremental graph: knowledge_graph_12345678_chunk_0002.svg"
#
# ... continues for all chunks ...
#
# "Final results: 45,231 entities, 128,904 relationships"
# "Final SVG visualization saved: knowledge_graph_12345678.svg"ποΈ hKG (Hybrid Knowledge Graph) Storage with Visualization Integration
Neo4j Integration (Graph Database)
Stores entities as nodes with properties and enhanced metadata
Creates relationships between entities with lineage tracking
Maintains UUIDv8 for entity tracking across all databases
Tracks chunking metadata for large content processing
Records processing method (single vs chunked)
NEW: Visualization metadata in entity observations including:
SVG file paths and availability status
Entity color mappings for graph visualization
Real-time update tracking for chunked processing
Incremental file counts for large content processing
Accessible via MCP server tools
Qdrant Integration (Vector Database)
Stores knowledge graphs as vector embeddings with enhanced metadata
Enables semantic search across graphs of any size
Maintains metadata for each knowledge graph including chunk information
Tracks content length, processing method, and chunk count
Supports similarity search across large document collections
NEW: Visualization lineage tracking including:
Entity type and color mapping information
SVG generation timestamps and file paths
Real-time visualization update history
Incremental SVG file tracking for large content
Accessible via MCP server tools
hKG Unified Tracking with Visualization Lineage
UUIDv8 Across All Systems: Common ancestry-encoded identifiers
Content Lineage: Track how large content was processed and chunked
Processing Metadata: Record chunk size, overlap, and processing method
Entity Provenance: Track which chunks contributed to each entity
Relationship Mapping: Maintain relationships across chunk boundaries
Semantic Coherence: Ensure knowledge graph consistency across databases
NEW - Visualization Lineage: Complete tracking of visual representation:
SVG File Provenance: Track all generated visualization files
Color Mapping Consistency: Maintain entity color assignments across chunks
Real-Time Update History: Log all incremental visualization updates
Cross-Database Visual Metadata: Synchronized visualization tracking in both Neo4j and Qdrant
Incremental Visualization Tracking: Complete audit trail of real-time graph updates
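To make the "UUIDv8 across all systems" idea concrete, here is a minimal sketch of deriving a version-8 identifier from content, so the same ID can be reused in Neo4j, Qdrant, and SVG filenames. The actual generate_uuidv8() in app.py (around line 68) may encode different fields.
import hashlib
import uuid

def generate_uuidv8(content: str) -> str:
    digest = bytearray(hashlib.sha256(content.encode("utf-8")).digest()[:16])
    digest[6] = (digest[6] & 0x0F) | 0x80    # set version nibble to 8
    digest[8] = (digest[8] & 0x3F) | 0x80    # set RFC 4122 variant bits
    return str(uuid.UUID(bytes=bytes(digest)))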
Architecture
Core Components
app.py: Main application file with Gradio interface
extract_text_from_url(): Web scraping functionality (app.py:41)
chunk_text(): Smart content chunking with sentence boundary detection (app.py:214)
merge_extraction_results(): Intelligent merging of chunk results (app.py:250) — see the sketch below
get_entity_color(): Entity type color mapping (app.py:299)
create_knowledge_graph_svg(): SVG graph generation (app.py:311)
RealTimeGraphVisualizer: Real-time incremental visualization (app.py:453)
extract_entities_and_relationships(): AI-powered entity extraction with real-time updates (app.py:645)
extract_entities_and_relationships_single(): Single chunk processing (app.py:722)
build_knowledge_graph(): Main orchestration function with visualization (app.py:795)
generate_uuidv8(): UUID generation for entity tracking (app.py:68)
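The cross-chunk merging can be pictured with this minimal sketch of deduplication by entity name and relationship triple; the real merge_extraction_results() (app.py:250) may use fuzzier matching.
def merge_extraction_results(chunk_results: list[dict]) -> dict:
    entities, relationships = {}, {}
    for chunk in chunk_results:
        for e in chunk.get("entities", []):
            entities.setdefault(e["name"].lower(), e)            # first occurrence wins
        for r in chunk.get("relationships", []):
            key = (r["source"].lower(), r["relationship"], r["target"].lower())
            relationships.setdefault(key, r)
    return {"entities": list(entities.values()),
            "relationships": list(relationships.values())}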
Data Flow with hKG Integration and Real-Time Visualization
Input Processing: Text or URL input validation
Content Extraction: Web scraping for URLs, direct text for text input
Real-Time Visualizer Setup: Initialize incremental graph visualization system
Content Chunking: Smart chunking for large content (>2000 chars) with sentence boundary detection
AI Analysis with Live Updates: Local model processes each chunk for entities/relationships
Incremental Visualization: Real-time SVG graph updates after each chunk completion
Result Merging: Intelligent deduplication and merging of entities/relationships across chunks
hKG Metadata Creation: Generate processing metadata for lineage tracking
Graph Generation: Structured knowledge graph creation with enhanced metadata
Final Visualization: Generate complete SVG graph with all entities and relationships
hKG Storage: Persistence in Neo4j (graph) and Qdrant (vector) with unified UUIDv8 tracking
Output: JSON response with complete knowledge graph, hKG metadata, and SVG visualization
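Tying these steps together, the flow condenses roughly to the sketch below, which reuses the helper sketches shown earlier in this README; the real build_knowledge_graph() in app.py also handles real-time visualization updates, hKG storage, and full metadata.
def build_knowledge_graph(source: str) -> dict:
    text = extract_text_from_url(source) if source.startswith("http") else source
    chunks = chunk_text(text) if len(text) > 2000 else [text]   # >2000 chars triggers chunking
    results = [extract_entities(chunk) for chunk in chunks]     # per-chunk AI analysis
    merged = merge_extraction_results(results)                  # dedupe across chunks
    svg_path = f"knowledge_graph_{generate_uuidv8(text)[:8]}.svg"  # <uuid8> naming, illustrative
    render_svg(merged["entities"], merged["relationships"], svg_path)
    return {"knowledge_graph": merged, "visualization": {"svg_file_path": svg_path}}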
Configuration
Environment Variables Reference
All configuration is handled through environment variables. The application provides sensible defaults for all settings, allowing it to run without any configuration while still offering full customization.
Complete Environment Variables Table
Variable | Type | Default | Required | Description | Example Values
MODEL_PROVIDER | string | ollama | No | AI model provider to use | ollama, lmstudio
LOCAL_MODEL | string | llama3.2:latest | No | Local model identifier | llama3.2:latest, mistral:7b, codellama:13b, phi3:mini
OLLAMA_BASE_URL | string | http://localhost:11434 | No | Ollama API endpoint | http://localhost:11434, http://192.168.1.100:11434
LMSTUDIO_BASE_URL | string | http://localhost:1234 | No | LM Studio API endpoint | http://localhost:1234, http://localhost:8080
CHUNK_SIZE | integer | 2000 | No | Characters per chunk for AI processing | 1000, 2000, 4000, 8000
CHUNK_OVERLAP | integer | 200 | No | Overlap between chunks for context | 50, 100, 200, 400
MAX_CHUNKS | integer | 0 | No | Maximum chunks to process (0=unlimited) | 0, 100, 1000, 10000
(HuggingFace token) | string | unset | No | HuggingFace API token (legacy, unused) | n/a
Configuration Methods
1. Environment Variables (Recommended)
# Core Model Configuration
export MODEL_PROVIDER=ollama
export LOCAL_MODEL=llama3.2:latest
export OLLAMA_BASE_URL=http://localhost:11434
# Large Content Processing
export CHUNK_SIZE=2000
export CHUNK_OVERLAP=200
export MAX_CHUNKS=0
2. Shell Configuration (.bashrc/.zshrc)
# Add to ~/.bashrc or ~/.zshrc
export MODEL_PROVIDER=ollama
export LOCAL_MODEL=llama3.2:latest
export OLLAMA_BASE_URL=http://localhost:11434
export CHUNK_SIZE=2000
export CHUNK_OVERLAP=200
export MAX_CHUNKS=0
3. Python Environment File (.env)
# Create .env file in project root
MODEL_PROVIDER=ollama
LOCAL_MODEL=llama3.2:latest
OLLAMA_BASE_URL=http://localhost:11434
LMSTUDIO_BASE_URL=http://localhost:1234
CHUNK_SIZE=2000
CHUNK_OVERLAP=200
MAX_CHUNKS=0
Model Provider Configuration
Ollama Configuration (Default)
# Basic Ollama setup
export MODEL_PROVIDER=ollama
export LOCAL_MODEL=llama3.2:latest
export OLLAMA_BASE_URL=http://localhost:11434
# Alternative models
export LOCAL_MODEL=mistral:7b # Mistral 7B
export LOCAL_MODEL=codellama:13b # Code Llama 13B
export LOCAL_MODEL=llama3.2:3b # Llama 3.2 3B (faster)
export LOCAL_MODEL=phi3:mini # Phi-3 Mini (lightweight)
# Remote Ollama instance
export OLLAMA_BASE_URL=http://192.168.1.100:11434
LM Studio Configuration
# Basic LM Studio setup
export MODEL_PROVIDER=lmstudio
export LOCAL_MODEL=any-model-name # Model name is flexible for LM Studio
export LMSTUDIO_BASE_URL=http://localhost:1234
# Custom LM Studio port
export LMSTUDIO_BASE_URL=http://localhost:8080
# Remote LM Studio instance
export LMSTUDIO_BASE_URL=http://192.168.1.200:1234
Large Content Processing Configuration
Chunk Size Optimization
# Small chunks (faster processing, more chunks)
export CHUNK_SIZE=1000
export CHUNK_OVERLAP=100
# Medium chunks (balanced performance)
export CHUNK_SIZE=2000 # Default
export CHUNK_OVERLAP=200 # Default
# Large chunks (fewer chunks, more context)
export CHUNK_SIZE=4000
export CHUNK_OVERLAP=400
# Very large chunks (maximum context, slower)
export CHUNK_SIZE=8000
export CHUNK_OVERLAP=800
Processing Limits
# Unlimited processing (default)
export MAX_CHUNKS=0
# Process only first 100 chunks (testing)
export MAX_CHUNKS=100
# Process first 1000 chunks (moderate datasets)
export MAX_CHUNKS=1000
# Process first 10000 chunks (large datasets)
export MAX_CHUNKS=10000
Performance Tuning Guidelines
For Speed Optimization
# Smaller chunks, less overlap, limited processing
export CHUNK_SIZE=1000
export CHUNK_OVERLAP=50
export MAX_CHUNKS=500
export LOCAL_MODEL=llama3.2:3b # Faster model
For Quality Optimization
# Larger chunks, more overlap, unlimited processing
export CHUNK_SIZE=4000
export CHUNK_OVERLAP=400
export MAX_CHUNKS=0
export LOCAL_MODEL=llama3.2:latest # Full model
For Memory-Constrained Systems
# Balanced settings for limited resources
export CHUNK_SIZE=1500
export CHUNK_OVERLAP=150
export MAX_CHUNKS=1000
export LOCAL_MODEL=phi3:mini # Lightweight model
Configuration Validation
The application performs automatic validation of configuration settings:
Model Provider: Validates that MODEL_PROVIDER is either "ollama" or "lmstudio"
URLs: Validates that provider URLs are accessible
Numeric Values: Ensures CHUNK_SIZE, CHUNK_OVERLAP, and MAX_CHUNKS are valid integers
Model Availability: Checks if the specified model is available on the provider
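A minimal sketch of these startup checks is shown below; the actual validation in app.py may differ in detail (for example, how model availability is queried).
import os
import requests

def validate_config() -> list[str]:
    problems = []
    provider = os.getenv("MODEL_PROVIDER", "ollama")
    if provider not in ("ollama", "lmstudio"):
        problems.append(f"MODEL_PROVIDER must be 'ollama' or 'lmstudio', got '{provider}'")
    for name in ("CHUNK_SIZE", "CHUNK_OVERLAP", "MAX_CHUNKS"):
        value = os.getenv(name)
        if value is not None and not value.isdigit():
            problems.append(f"{name} must be an integer, got '{value}'")
    base = (os.getenv("OLLAMA_BASE_URL", "http://localhost:11434") if provider == "ollama"
            else os.getenv("LMSTUDIO_BASE_URL", "http://localhost:1234"))
    try:
        requests.get(base, timeout=5)        # is the provider endpoint reachable?
    except requests.RequestException:
        problems.append(f"Provider endpoint {base} is not reachable")
    return problems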
Configuration Troubleshooting
Common Issues and Solutions
1. Model Provider Not Responding
# Check if Ollama is running
curl http://localhost:11434/api/version
# Check if LM Studio is running
curl http://localhost:1234/v1/models
# Solution: Start the appropriate service
ollama serve # For Ollama
# Or start LM Studio GUI and enable local server
2. Model Not Found
# List available Ollama models
ollama list
# Pull missing model
ollama pull llama3.2:latest
# For LM Studio: Load model in GUI
3. Memory Issues with Large Content
# Reduce chunk size and set limits
export CHUNK_SIZE=1000
export MAX_CHUNKS=100
# Use lighter model
export LOCAL_MODEL=llama3.2:3b
4. Slow Processing
# Optimize for speed
export CHUNK_SIZE=1500
export CHUNK_OVERLAP=100
export MAX_CHUNKS=500
export LOCAL_MODEL=phi3:mini
Example Configuration Scenarios
Scenario 1: Development Setup
# Fast iteration, limited processing
export MODEL_PROVIDER=ollama
export LOCAL_MODEL=llama3.2:3b
export CHUNK_SIZE=1000
export CHUNK_OVERLAP=100
export MAX_CHUNKS=50
Scenario 2: Production Setup
# High quality, unlimited processing
export MODEL_PROVIDER=ollama
export LOCAL_MODEL=llama3.2:latest
export CHUNK_SIZE=3000
export CHUNK_OVERLAP=300
export MAX_CHUNKS=0
Scenario 3: Large Dataset Processing
# Optimized for 300MB+ files
export MODEL_PROVIDER=ollama
export LOCAL_MODEL=llama3.2:latest
export CHUNK_SIZE=2000
export CHUNK_OVERLAP=200
export MAX_CHUNKS=0
Scenario 4: Resource-Constrained Environment
# Minimal resource usage
export MODEL_PROVIDER=ollama
export LOCAL_MODEL=phi3:mini
export CHUNK_SIZE=800
export CHUNK_OVERLAP=50
export MAX_CHUNKS=200
Advanced Configuration
Custom Model Endpoints
# Docker-based Ollama
export OLLAMA_BASE_URL=http://ollama-container:11434
# Kubernetes service
export OLLAMA_BASE_URL=http://ollama-service.default.svc.cluster.local:11434
# Load balancer
export OLLAMA_BASE_URL=http://ollama-lb.example.com:11434
Dynamic Configuration
The application reads environment variables at startup. To change configuration:
Set new environment variables
Restart the application
The new values take effect on the next startup; configuration is not reloaded while the application is running
Error Handling
Comprehensive error handling for:
Invalid URLs or network failures
Missing local models or API endpoints
JSON parsing errors from LLM responses
Malformed or empty inputs
Database connection issues
Invalid configuration values
Model provider connectivity issues
Memory constraints during large content processing
hKG MCP Integration with Visual Lineage
The application integrates with MCP servers for hybrid knowledge graph storage with complete visualization tracking:
Neo4j: Graph database storage and querying with enhanced metadata + visualization lineage
Qdrant: Vector database for semantic search with chunk tracking + visual metadata
Unified Tracking: UUIDv8 across all storage systems for entity lineage + visualization provenance
Metadata Persistence: Processing method, chunk count, content lineage + SVG generation tracking
Large Content Support: Seamless handling of 300MB+ content via chunking + real-time visualization
Visualization Integration: Complete visual representation tracking across all storage systems
Enhanced hKG Features via MCP
Entity Provenance: Track which content chunks contributed to each entity + their visual representation
Relationship Lineage: Maintain relationships across chunk boundaries + visual edge tracking
Content Ancestry: UUIDv8 encoding for hierarchical content tracking + visualization file lineage
Processing Audit: Complete record of how large content was processed + visualization generation
Semantic Search: Vector similarity across knowledge graphs of any size + visual metadata search
NEW - Visual Lineage: Complete visualization tracking including:
SVG File Provenance: Track all generated visualization files with timestamps
Entity Color Consistency: Maintain color mappings across all chunks and storage systems
Real-Time Visualization History: Log every incremental graph update during processing
Cross-Database Visual Sync: Synchronized visualization metadata in Neo4j and Qdrant
Incremental Visualization Audit: Complete trail of real-time updates for large content
Visualization-Enhanced Storage
Neo4j Entity Observations now include:
SVG file paths and generation status
Entity color assignments for visual consistency
Real-time update counts for chunked processing
Visualization availability and engine information
Qdrant Vector Content now includes:
Entity color mapping information for similarity search
SVG generation timestamps and file paths
Real-time visualization update metadata
Incremental file tracking for large content visualization
MCP tools are automatically available when running in Claude Code environment with MCP servers configured.
hKG Visualization Architecture
Integrated Visualization Lineage System
The hKG system now maintains complete visualization lineage alongside traditional knowledge graph storage:
┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐
│ Source Text        │──▶│ Chunking + AI      │──▶│ Entity/Relation    │
│ (300MB+)           │   │ Processing         │   │ Extraction         │
└────────────────────┘   └────────────────────┘   └────────────────────┘
                                    │                         │
                                    ▼                         ▼
┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐
│ Real-Time SVG      │◀──│ Incremental Graph  │◀──│ Merged Results     │
│ Generation         │   │ Visualization      │   │ + Deduplication    │
└────────────────────┘   └────────────────────┘   └────────────────────┘
          │                         │                         │
          ▼                         ▼                         ▼
┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐
│ SVG File           │   │ Visualization      │   │ hKG Storage        │
│ Storage            │   │ Metadata Creation  │   │ (Neo4j + Qdrant)   │
│ (Incremental)      │   │                    │   │ + Viz Metadata     │
└────────────────────┘   └────────────────────┘   └────────────────────┘
Visualization Metadata Flow
Real-Time Updates: Each chunk generates incremental SVG with progress tracking
Color Consistency: Entity colors maintained across all chunks and storage systems
File Lineage: Complete audit trail of all generated SVG files
Cross-Database Sync: Visualization metadata synchronized in both Neo4j and Qdrant
Provenance Tracking: Link between source chunks, entities, and their visual representation
hKG Benefits for Large Content (300MB+)
Visual Progress Monitoring: Real-time graph evolution during processing
Chunk-Level Visualization: Individual SVG files for each processing stage
Complete Audit Trail: Full lineage from source text to final visualization
Cross-Reference Capability: Link entities back to their source chunks and visual appearance
Scalable Visualization: Handles arbitrarily large graphs with consistent performance
Development
Project Structure
KGB-mcp/
├── app.py               # Main application
├── requirements.txt     # Dependencies
├── CLAUDE.md            # Claude Code instructions
├── ARCHITECTURE.md      # System architecture
├── test_core.py         # Core functionality tests
└── test_integration.py  # Integration tests
Testing
# Run core tests
python test_core.py
# Run integration tests
python test_integration.py
Transform any content into structured knowledge graphs with the power of local AI and MCP integration!