Enables automatic deployment of file ingestion jobs with ConfigMaps for processing multi-format files into vector embeddings, with support for persistent volume claims and container orchestration.
Provides vector database integration for storing and managing document embeddings, with automatic collection creation, folder-based organization, and semantic search capabilities.
Provides optional AI features for enhanced storage management and file processing capabilities.
MCP 1.5 - Advanced Hammerspace Storage Management with AI
A production-ready Model Context Protocol (MCP) server for Hammerspace storage management with natural language interface, automatic file ingestion, and comprehensive monitoring capabilities.
🚀 Quick Start
Prerequisites
Python 3.8 or higher
Linux system (Ubuntu 20.04+ recommended)
Hammerspace storage cluster (HSTK CLI installed)
Anthropic API key (for natural language UI)
NVIDIA API key (optional, for AI features)
Kubernetes cluster (for automatic file ingestion)
Milvus vector database (for document embeddings)
Installation
Clone the repository:
git clone https://github.com/mbloomhammerspace/mcp-1.5.git
cd mcp-1.5
Create virtual environment:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Configure environment variables:
cp .env.example .env
nano .env
Set the following:
# Hammerspace Configuration
HS_ANVIL=10.200.120.90
# Anthropic API (for Web UI)
ANTHROPIC_API_KEY=your_anthropic_key_here
# NVIDIA API (optional)
NVIDIA_API_KEY=your_nvidia_key_here
Start all services with one command:
./start_all_services.sh start
Access Web UI at:
http://localhost:5000
📋 Features
🤖 Natural Language Interface
Web-based UI powered by Claude AI at http://localhost:5000
Direct MCP integration for all Hammerspace operations
Real-time debug log viewer at /debug
File ingest monitor dashboard at /monitor
Action-oriented responses - executes commands, reports results
📁 Advanced File Ingestion System
Multi-format support - BMP, DOCX, HTML, JPEG, JSON, MD, PDF, PNG, PPTX, SH, TIFF, TXT, MP3
Real-time file monitoring - Polling-based detection on NFS 4.2 mounts
Folder-based processing - New folders trigger batch processing of all supported files
Kubernetes job deployment - Automatic file ingestion jobs with ConfigMaps
Milvus vector database integration - Automatic embedding generation and storage
Intelligent collection naming - Collections named after folders or sequential numbering
Tag-based tier management - Automatic tier0 promotion for embedding files
Event streaming - Real-time monitoring of ingestion pipeline
📊 Comprehensive Monitoring UI
Real-time event streaming - Watch files being ingested live
Event filtering - Filter by event type, file pattern, or timestamp
Toast notifications - Get notified when new files arrive
Status dashboard - Monitor service health, file counts, CPU usage
Interactive UI - Beautiful, modern interface for tracking ingestion pipelines
Multi-service monitoring - Track all MCP servers and services
🔧 Persistent Service Management
Unified startup script - Start all services with one command
Screen session persistence - Services survive SSH disconnections
Auto-restart capability - Services automatically restart if they crash
Systemd integration - Optional auto-start on boot
Comprehensive logging - Detailed logs for all services
Health monitoring - Real-time service status and port monitoring
API Endpoints
/api/monitor/status - Get monitor service status
/api/monitor/events - Query ingestion events with filters
/api/monitor/events/stream - Server-Sent Events for real-time updates
/api/chat - Natural language chat interface
/api/tools - List available MCP tools
/api/logs/stream - Stream debug logs in real-time
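As an illustration of consuming /api/monitor/events/stream, the sketch below parses a Server-Sent Events stream line by line. It assumes each event's data payload is a JSON object, which the source does not confirm; the function name iter_sse_json is hypothetical and not part of the project.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_json(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one decoded JSON object per Server-Sent Event.

    Assumption: the server sends JSON in the `data:` field.
    """
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips a single leading space from the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
```

Any iterable of text lines works as input, for example the line iterator of a streaming HTTP response.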
Core MCP Tools
All tools use real HSTK CLI commands (no mock data):
tag_directory_recursive - Tag files and directories recursively
check_tagged_files_alignment - Check if tagged files are aligned to their objectives
apply_objective_to_path - Apply Hammerspace objectives (e.g., "Place-on-tier0", "placeonvolumes")
remove_objective_from_path - Remove objectives from paths
list_objectives_for_path - List all objectives applied to a path
ingest_new_files - Find new files by ctime/mtime, tag them, and place on Tier 1
refresh_mounts - Refresh NFS mounts to resolve stale file handles
Advanced Features
Automatic file monitoring - Polling service that auto-tags new files with MD5 hash and MIME type
Automatic stale file handle recovery - Detects and automatically refreshes mounts
Share-relative path support - Use /modelstore/dir instead of /mnt/se-lab/modelstore/dir
Multi-tier support - Tier 0 (high-performance) and Tier 1 (default storage)
Tag-based workflows - Tag files and manage them as collections
Intelligent batching - Groups file events into 15-second batches, or processes them immediately if traffic is light
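The batching behavior in the last item could look roughly like the sketch below. It is not the project's implementation: the EventBatcher class, the light_traffic_limit threshold, and the definition of "light traffic" (few events within the last window) are all assumptions made for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class EventBatcher:
    """Hold file events for a fixed window unless traffic is light."""

    def __init__(
        self,
        window_seconds: float = 15.0,
        light_traffic_limit: int = 3,  # hypothetical threshold
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self.light_traffic_limit = light_traffic_limit
        self._clock = clock
        self._recent: Deque[float] = deque()  # arrival times inside the window
        self._pending: List[str] = []
        self._pending_since = 0.0

    def add(self, event: str) -> List[str]:
        """Record an event and return whatever is ready to process now."""
        now = self._clock()
        while self._recent and now - self._recent[0] > self.window_seconds:
            self._recent.popleft()
        self._recent.append(now)
        # Light traffic and nothing queued: process the event immediately.
        if not self._pending and len(self._recent) <= self.light_traffic_limit:
            return [event]
        if not self._pending:
            self._pending_since = now
        self._pending.append(event)
        return self.flush_if_due()

    def flush_if_due(self) -> List[str]:
        """Return the queued batch once the window has elapsed, else nothing."""
        now = self._clock()
        if self._pending and now - self._pending_since >= self.window_seconds:
            batch, self._pending = self._pending, []
            return batch
        return []
```

A polling loop would call add() for each detected file and flush_if_due() on every tick, so a queued batch is released even when no new events arrive.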
🆕 Advanced File Ingestion Workflow
How It Works
File Detection: The file monitor polls NFS mounts every 5 seconds for new files
Tagging: New files are automatically tagged with:
user.ingestid=<md5hash> - MD5 hash for deduplication
user.mimeid=<mimetype> - MIME type for classification
user.embedding - Tag for files requiring embedding
Multi-format Processing: All supported file types trigger Kubernetes ingestion jobs
Collection Management: Files are organized into Milvus collections:
Folder-based: Collections named after folder (e.g., cold_0011)
Sequential: Collections named intel_1, intel_2, etc.
Tier Management: Files tagged for embedding are automatically promoted to tier0
Embedding Generation: Files are processed into vector embeddings
Storage: Embeddings stored in Milvus for semantic search
Tier Demotion: After embedding, files are moved back to default tier
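The tagging step above can be sketched in a few lines of Python. This is only an illustration of how the tag values might be derived: the build_ingest_tags helper is hypothetical, the MIME type is guessed from the file extension alone, and the fallback to application/octet-stream is an assumption. Applying the tags through the HSTK CLI is not shown.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Dict


def build_ingest_tags(path: Path, needs_embedding: bool = True) -> Dict[str, str]:
    """Return the tag names and values described above for one file."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)

    # guess_type() inspects only the file name; unknown extensions fall back
    # to a generic binary type (an assumption, not confirmed by the source).
    mime_type, _ = mimetypes.guess_type(path.name)
    tags = {
        "user.ingestid": digest.hexdigest(),
        "user.mimeid": mime_type or "application/octet-stream",
    }
    if needs_embedding:
        tags["user.embedding"] = ""  # marker tag with no value
    return tags
```

Because user.ingestid is derived from file contents, two copies of the same file produce the same value, which is what makes it usable for deduplication.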
Folder Processing
When a new folder is detected:
All supported files in the folder are processed as a batch
A single Milvus collection is created named after the folder
Collection names are sanitized (e.g., cold-0011 → cold_0011)
All files in the folder are uploaded to the same collection
Files are tagged with embedding and promoted to tier0 for processing
After embedding, tier0 objective is removed
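The name sanitization mentioned above might be implemented along these lines. Milvus collection names may contain only letters, digits, and underscores and must begin with a letter or underscore; the exact rules the file monitor applies are not documented here, so treat sanitize_collection_name as a hypothetical helper.

```python
import re


def sanitize_collection_name(folder_name: str) -> str:
    """Map a folder name to a string that is valid as a Milvus collection name."""
    # Replace every character outside [0-9A-Za-z_] with an underscore.
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", folder_name)
    # The first character must be a letter or an underscore.
    if not cleaned or not (cleaned[0].isalpha() or cleaned[0] == "_"):
        cleaned = "_" + cleaned
    return cleaned
```

With this sketch, sanitize_collection_name("cold-0011") returns "cold_0011", matching the example in the list.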
Configuration
The file monitor supports:
Polling intervals: 5 seconds (fast) during business hours, 30 seconds during retroactive hours
Retroactive tagging: Configurable time windows (default: disabled for testing)
Path monitoring: Monitors /mnt/anvil/hub/ by default
File filtering: Processes all supported file types (BMP, DOCX, HTML, JPEG, JSON, MD, PDF, PNG, PPTX, SH, TIFF, TXT, MP3)
Recursive folder tagging: Tags entire folder hierarchies for 40x performance improvement
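The two polling intervals could be selected with a small function like the one below. The source does not say which hours count as business hours, so the 08:00 to 18:00 window and the polling_interval name are assumptions for illustration only.

```python
from datetime import datetime, time

FAST_INTERVAL_SECONDS = 5    # business hours
SLOW_INTERVAL_SECONDS = 30   # retroactive hours
BUSINESS_START = time(8, 0)  # assumed start of business hours
BUSINESS_END = time(18, 0)   # assumed end of business hours (exclusive)


def polling_interval(now: datetime) -> int:
    """Return how many seconds to wait before the next poll."""
    if BUSINESS_START <= now.time() < BUSINESS_END:
        return FAST_INTERVAL_SECONDS
    return SLOW_INTERVAL_SECONDS
```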
Kubernetes Integration
Job Templates: Uses k8s-templates/ingest.yaml for PDF processing
ConfigMaps: File lists passed to jobs via Kubernetes ConfigMaps
PVC Integration: Uses hammerspace-hub-pvc for persistent storage
Path Mapping: Container paths mapped from /mnt/anvil/hub/ to /data/
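To make the path mapping and the ConfigMap hand-off concrete, here is a minimal sketch. The two root paths come from the list above; the function names and the files.txt data key are hypothetical, and the real job template may structure its ConfigMap differently.

```python
from pathlib import PurePosixPath
from typing import Dict, List

HOST_ROOT = PurePosixPath("/mnt/anvil/hub")
CONTAINER_ROOT = PurePosixPath("/data")


def to_container_path(host_path: str) -> str:
    """Translate a path under the NFS mount to its path inside the container."""
    # relative_to() raises ValueError if host_path is outside HOST_ROOT.
    relative = PurePosixPath(host_path).relative_to(HOST_ROOT)
    return str(CONTAINER_ROOT / relative)


def build_file_list_configmap(name: str, host_paths: List[str]) -> Dict:
    """Build a ConfigMap manifest that carries one container path per line."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        # "files.txt" is an assumed key name, not taken from the project.
        "data": {"files.txt": "\n".join(to_container_path(p) for p in host_paths)},
    }
```

The resulting dictionary can be serialized to YAML or JSON and applied with kubectl, or passed to a Kubernetes client library.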
🎯 Usage
Example Commands (Web UI)
Try these in the natural language interface:
🆕 File Ingestion Commands
Direct MCP Server Usage
Add to your MCP client configuration (e.g., Cursor IDE):
Available MCP Tools
Tag Management
tag_directory_recursive - Recursively tag all files in a directory
{ "path": "/mnt/se-lab/modelstore/my-models", "tag_name": "user.modelsetid", "tag_value": "my-tag-value" }
check_tagged_files_alignment - Find and check alignment of tagged files
{ "tag_name": "user.modelsetid", "tag_value": "my-tag-value", "share_path": "/mnt/se-lab/modelstore/" }
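For clients that speak MCP directly rather than through the Web UI, tool arguments like these travel inside a JSON-RPC 2.0 tools/call request. The sketch below only builds the message; build_tool_call is a hypothetical helper, and sending the message over the server's stdio transport is left out.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


request = build_tool_call(
    1,
    "tag_directory_recursive",
    {
        "path": "/mnt/se-lab/modelstore/my-models",
        "tag_name": "user.modelsetid",
        "tag_value": "my-tag-value",
    },
)
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for debugging or for reading protocol logs.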
Tier Management
apply_objective_to_path - Apply objectives like tier promotion
{ "objective_name": "Place-on-tier0", "path": "/mnt/se-lab/modelstore/my-models" }
remove_objective_from_path - Remove objectives
{ "objective_name": "Place-on-tier0", "path": "/mnt/se-lab/modelstore/my-models" }
File Ingestion
ingest_new_files - Find new files, tag them, and place on Tier 1
{ "path": "/mnt/se-lab/modelstore/", "tag_name": "user.modelsetid", "tag_value": "new-batch", "age_minutes": 60 }
System Utilities
refresh_mounts - Refresh NFS mounts to resolve stale file handles
{ "mount_type": "all" }
File Monitoring & Ingestion
start_inotify_monitor - Start automated file monitoring service
{}
Monitors all Hammerspace shares, automatically tags new files with:
user.ingestid=<md5hash> - MD5 hash for deduplication
user.mimeid=<mimetype> - MIME type for classification
Events are batched every 15 seconds or processed immediately if traffic is light.
get_file_monitor_status - Get monitor status
{}
Returns: running state, watched paths, pending events, files tagged, CPU usage
get_file_ingest_events - Query file ingest/tagging events
{ "limit": 100, "event_type": "NEW_FILE", "file_pattern": ".pdf", "since_timestamp": "2025-10-09T20:00:00" }
Returns recent file ingestion events with full metadata including:
Event type (NEW_FILE or RETROACTIVE_TAG)
File path and name
MD5 hash (ingestid)
MIME type (mimeid)
File size and timestamp
Use Cases:
Track which files have been ingested
Build automated workflows based on file events
Monitor ingestion pipelines
Audit file processing
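The filter parameters of get_file_ingest_events suggest logic along the following lines. This is a guess at the semantics, not the server's code: the event dictionary keys (event_type, file_path, timestamp), substring matching for file_pattern, and the assumption that events are stored in chronological order are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def filter_events(
    events: List[Dict[str, str]],
    limit: int = 100,
    event_type: Optional[str] = None,
    file_pattern: Optional[str] = None,
    since_timestamp: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Return up to `limit` of the most recent events that match every filter."""
    since = datetime.fromisoformat(since_timestamp) if since_timestamp else None
    matched = []
    for event in events:
        if event_type and event["event_type"] != event_type:
            continue
        # Assumed semantics: file_pattern is a plain substring of the path.
        if file_pattern and file_pattern not in event["file_path"]:
            continue
        if since and datetime.fromisoformat(event["timestamp"]) < since:
            continue
        matched.append(event)
    # Events are assumed to be in chronological order, so keep the tail.
    return matched[-limit:] if limit > 0 else []
```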
stop_inotify_monitor - Stop monitoring service
{}
list_objectives_for_path - List all objectives for a path
{ "path": "/mnt/se-lab/modelstore/my-models" }
🔧 Architecture
Components
MCP Server (src/aiq_hstk_mcp_server.py)
Implements MCP protocol
Executes Hammerspace HSTK CLI commands
Handles automatic retry on stale file handles
Uses real HSTK operations (no mock data)
Web UI (web_ui/app.py)
Flask-based natural language interface
Anthropic Claude integration for NL understanding
MCP client for tool execution
Real-time debug log streaming
🆕 File Monitor (src/file_monitor.py)
Polling-based file detection for NFS 4.2 mounts
Automatic tagging with MD5 and MIME type
Folder-based batch processing
Kubernetes job deployment for PDF ingestion
Milvus collection management
🆕 File Monitor Daemon (src/file_monitor_daemon.py)
Standalone daemon for running file monitor
Background service for continuous monitoring
Logging and error handling
Mount Refresh Script (refresh_mounts.sh)
Unmounts and remounts Hammerspace NFS shares
Resolves stale file handle errors
Supports selective mount refresh
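The stale-file-handle retry listed under the MCP Server component can be pictured with the sketch below. It is a simplified stand-in: it reacts to OSError with errno ESTALE raised by a Python callable, whereas the actual server runs HSTK CLI commands and may detect the condition differently. The names run_with_stale_retry and refresh_mounts (as a callable) are hypothetical.

```python
import errno
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_stale_retry(
    operation: Callable[[], T],
    refresh_mounts: Callable[[], None],
    max_attempts: int = 2,
) -> T:
    """Run `operation`, refreshing mounts and retrying on a stale NFS handle."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Re-raise anything that is not a stale handle, and give up
            # once the final attempt has failed.
            if exc.errno != errno.ESTALE or attempt == max_attempts:
                raise
            refresh_mounts()
    raise ValueError("max_attempts must be at least 1")
```

In this sketch refresh_mounts could be a thin wrapper that invokes refresh_mounts.sh through subprocess.run.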
Data Flow
🆕 File Ingestion Data Flow
🧪 Testing
Web UI Tests
Single Command Test
🆕 File Ingestion Tests
🛠️ Service Management
Unified Service Management
Individual Service Management
Service URLs
Once started, services are available at:
Web UI: http://localhost:5000
Web UI (LAN): http://10.0.0.236:5000
Hammerspace MCP: stdio-based (no HTTP endpoint)
Milvus MCP: http://localhost:9902/sse (if Milvus is running)
Kubernetes MCP: http://localhost:9903/sse (not implemented)
View Logs
🐛 Troubleshooting
Stale File Handle Errors
The MCP server automatically detects and retries operations when stale file handles occur. If issues persist:
Tag Operations Not Working
Ensure you're targeting a mounted directory
Verify the HSTK CLI is accessible:
/home/mike/hs-mcp-1.0/.venv/bin/hs --version
Check mounts:
mount | grep hammerspace
🆕 File Ingestion Issues
Files not detected: Check if file monitor is running
ps aux | grep file_monitor_daemon
Kubernetes jobs failing: Check pod status
kubectl get pods -l app=pdf-ingest
kubectl logs -l app=pdf-ingest
Empty Milvus collections: Check ingest service logs
kubectl logs -l app=ingestor-server
NFS mount issues: Verify mount points
mount | grep anvil
ls -la /mnt/anvil/hub/
Common Tag Formats
Tags in Hammerspace use the format: namespace.key=value
Examples:
user.modelsetid=my-demo
user.project=gtc-2025
user.tier=critical
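A small parser makes the namespace.key=value format explicit. The Tag type and parse_tag function are hypothetical helpers, and accepting a tag with no value (as with user.embedding elsewhere in this document) is an assumption about how marker tags are written.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    namespace: str
    key: str
    value: str


def parse_tag(text: str) -> Tag:
    """Split 'namespace.key=value' into its parts; the value may be empty."""
    name, _, value = text.partition("=")
    namespace, dot, key = name.partition(".")
    if not dot or not namespace or not key:
        raise ValueError(f"expected namespace.key=value, got {text!r}")
    return Tag(namespace, key, value)
```

For example, parse_tag("user.project=gtc-2025") yields namespace "user", key "project", and value "gtc-2025".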
📚 Documentation
🚀 Quick Start
Quick Reference Guide - Your go-to guide for common operations
Service Management - Complete service management guide
🏗️ Architecture & API
Architecture Documentation - Comprehensive system architecture
API Documentation - Complete API reference for all MCP servers
🔗 Integration Guides
Integration Guide - Connect to Cursor, Windsurf, NVIDIA Playground
📖 Feature Guides
🧪 Testing Guides
🔒 Production Deployment
Security Considerations
Run the Web UI behind a reverse proxy (nginx, Apache)
Use HTTPS for production deployments
Restrict access with authentication
Review and limit MCP tool permissions
Performance Tuning
Adjust max_files_to_check for large filesystems (default: 500)
Use specific share_path parameters to limit search scope
Monitor log files for performance issues
📞 Support
For support and questions:
Create an issue in the GitHub repository
Check the documentation in the docs/ folder
Review the troubleshooting section above
🎉 Recent Updates
🆕 October 23, 2025 - Advanced Service Management + Multi-Format Support
✅ NEW: Unified service management script (start_all_services.sh) for all MCP services
✅ NEW: Screen session persistence - services survive SSH disconnections
✅ NEW: Auto-restart capability for crashed services
✅ NEW: Systemd integration for auto-start on boot
✅ NEW: Multi-format file support (BMP, DOCX, HTML, JPEG, JSON, MD, PDF, PNG, PPTX, SH, TIFF, TXT, MP3)
✅ NEW: Tag-based tier management with automatic tier0 promotion/demotion
✅ NEW: Recursive folder tagging for 40x performance improvement
✅ NEW: Enhanced event filtering and monitoring UI
✅ NEW: Comprehensive health monitoring and port conflict resolution
✅ FIXED: Unicode handling in logs and CLI output
✅ FIXED: NFS timing issues with retry mechanisms
✅ FIXED: Hammerspace CLI tag operations with fallback methods
✅ FIXED: Individual file vs folder-level tagging optimization
🆕 October 16, 2025 - Automatic File Ingestion + Kubernetes + MCP
✅ NEW: Complete automatic file ingestion system with Kubernetes integration
✅ NEW: Polling-based file monitoring for NFS 4.2 mounts (inotify not supported)
✅ NEW: Folder-based batch processing - new folders trigger collection creation
✅ NEW: Milvus vector database integration for PDF embeddings
✅ NEW: Kubernetes job deployment with ConfigMaps for file processing
✅ NEW: Intelligent collection naming (folder-based and sequential)
✅ NEW: Real-time event streaming and monitoring dashboard
✅ NEW: Standalone file monitor daemon for continuous operation
✅ FIXED: NFS compatibility issues with polling-only approach
✅ FIXED: Kubernetes PVC integration and path mapping
✅ FIXED: Milvus collection naming conventions (underscores vs hyphens)
✅ FIXED: File path handling in container environments
✅ FIXED: Job completion tracking and error handling
✅ FIXED: Retroactive tagging time window configuration
✅ FIXED: Threading and async operation coordination
October 9, 2025 - File Ingestion Event System & Monitoring UI
✅ NEW: Agentic event consumption via get_file_ingest_events MCP tool
✅ NEW: Real-time monitoring dashboard at /monitor with live event streaming
✅ NEW: API endpoints for event querying and monitoring
✅ FIXED: Duplicate file processing - files now tagged exactly once
✅ FIXED: In-memory tracking prevents repeated tagging of same files
✅ Deduplication verification tools: check_duplicates.sh and find-dup.sh
✅ Full event metadata: timestamp, file path, MD5 hash, MIME type, file size
✅ Event filtering: by type, file pattern, or timestamp
✅ Toast notifications for new file arrivals
✅ Automated file monitoring service with MD5/MIME tagging
Previous Updates
✅ Added automatic mount refresh on stale file handle errors
✅ Suppressed Anthropic API deprecation warnings in logs
✅ Added copy buttons to all chat messages
✅ Fixed PARTIALLY ALIGNED status detection
✅ Web-based natural language console with Claude AI
MCP 1.5 - Production-ready Hammerspace storage management with natural language interface and automatic file ingestion.