# MCP Brain Service Documentation
## Overview
The MCP Brain Service is a pure infrastructure service for storing and retrieving documents with semantic search, graph relationships, and hybrid search capabilities.
**Key Principle**: The brain service is infrastructure, not application logic. Apps prepare the data; the brain service stores and retrieves it.
## Quick Links
### New in v1.1.0
- **[How to Use](how-to-use.md)** - Complete API usage guide with all 10 endpoints
- **[Batch Endpoints Guide](BATCH_ENDPOINTS_GUIDE.md)** - Detailed guide for new batch endpoints
- **[Implementation Summary](IMPLEMENTATION_SUMMARY.md)** - Technical implementation details
- **[Deployment Guide](DEPLOYMENT_GUIDE.md)** - Deployment instructions
- **[Changelog](../CHANGELOG.md)** - Version history and changes
### Architecture & Design
- [Architecture Decision](./architecture-decision.md) - Why data preparation belongs in the app
- [Retriv Integration Plan](./retriv-integration-plan.md) - How to enhance queries with hybrid search
- [API Contracts](./api-contracts.md) - Storage and query API specifications
### Implementation
- [Implementation Checklist](./implementation-checklist.md) - Step-by-step guide to add Retriv
### Operations
- [QUICKSTART](../QUICKSTART.md) - Get started quickly
- [PRODUCTION](../PRODUCTION.md) - Production deployment guide
- [WARP](../WARP.md) - Development commands and tips
## What is MCP Brain Service?
A Python-based service that provides:
1. **Storage**: Store documents with embeddings and relationships
2. **Semantic Search**: Find similar documents using Jina embeddings
3. **Hybrid Search**: Combine keyword (BM25) + semantic search with Retriv
4. **Graph Queries**: Navigate relationships using Neo4j
5. **MCP Protocol**: WebSocket-based tool interface for agents
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                    Application (Next.js)                    │
│                                                             │
│  - Prepares data (enrichment, transformation)               │
│  - Calls brain service for storage/retrieval                │
└─────────────────────────────────────────────────────────────┘
               │                             │
          (store API)                   (query API)
               ▼                             ▼
┌─────────────────────────────────────────────────────────────┐
│                 Brain Service (brain.ft.tc)                 │
│           Pure Storage & Retrieval Infrastructure           │
│                                                             │
│  ┌────────────────────┐      ┌────────────────────┐         │
│  │ Store API          │      │ Query API          │         │
│  │ - Store documents  │      │ - Hybrid search    │         │
│  │ - Batch storage    │      │ - Semantic search  │         │
│  └────────────────────┘      │ - Graph queries    │         │
│                              └────────────────────┘         │
│            ▼                           ▼                    │
│  ┌──────────────────────────────────────────────┐           │
│  │ Storage & Retrieval Services                 │           │
│  │ - RetrivService (hybrid search)              │           │
│  │ - Neo4jService (graph storage)               │           │
│  │ - JinaService (embeddings)                   │           │
│  └──────────────────────────────────────────────┘           │
└─────────────────────────────────────────────────────────────┘
```
## Key Concepts
### 1. Pure Infrastructure
Brain service is like a database - it doesn't understand your business logic:
```
✅ Good: App prepares character data → Brain service stores
❌ Bad: Brain service understands PayloadCMS structure
```
### 2. Document Format
All documents follow a standard format:
```typescript
{
id: string // Unique identifier
type: string // Document type (character, scene, etc.)
project_id: string // Project isolation
text: string // Searchable text (prepared by app)
metadata: object // Structured data (prepared by app)
relationships: array // Graph relationships (prepared by app)
}
```
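On the Python side, the same shape could be mirrored with a dataclass. This is only a sketch: the `BrainDocument` class and its `validate` method are hypothetical names, and the structure of each `relationships` entry is an assumption, since this README does not define it (see the API Contracts for the real schema).

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class BrainDocument:
    """Hypothetical mirror of the standard document format."""

    id: str
    type: str
    project_id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assumption: each relationship is a plain dict; the real shape is
    # defined in the API contracts, not here.
    relationships: list[dict[str, Any]] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a required string field is missing or blank."""
        for name in ("id", "type", "project_id", "text"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")


doc = BrainDocument(
    id="char_aladdin_proj123",
    type="character",
    project_id="proj123",
    text="Aladdin: A street-smart young man...",
)
doc.validate()
payload = asdict(doc)  # plain dict, ready to be serialized as JSON
```

Validating before sending keeps malformed documents out of the store, which fits the principle that the app, not the brain service, is responsible for data preparation.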
### 3. Hybrid Search
Retriv combines two search methods:
- **BM25 (Keyword)**: Matches exact terms like "vest", "scene 3"
- **Embeddings (Semantic)**: Understands meaning like "clothing", "appearance"
- **Combined**: Better results than either alone
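To make the "combined" idea concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a semantic ranking. It illustrates the general technique only and is not necessarily how Retriv fuses results internally; the function name, the document IDs, and the constant `k = 60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for one query, best match first.
keyword_hits = ["scene_3", "char_aladdin", "prop_vest"]
semantic_hits = ["char_aladdin", "costume_notes", "scene_3"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Documents that rank well in both lists (`char_aladdin`, `scene_3`) rise above documents that appear in only one, which is the intuition behind hybrid search beating either method alone.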
### 4. Project Isolation
All data is isolated by `project_id` - different projects don't see each other's data.
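Conceptually, isolation means every read is scoped by `project_id` before anything else happens. The in-memory sketch below only illustrates that rule; the real service applies it inside its storage layer, and the `filter_by_project` helper and sample documents are hypothetical.

```python
def filter_by_project(documents: list[dict], project_id: str) -> list[dict]:
    """Return only the documents that belong to the given project."""
    return [doc for doc in documents if doc.get("project_id") == project_id]


# Hypothetical documents from two different projects.
documents = [
    {"id": "char_aladdin_proj123", "project_id": "proj123"},
    {"id": "char_jasmine_proj456", "project_id": "proj456"},
]

visible = filter_by_project(documents, "proj123")
```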
## Getting Started
### For Brain Service Development
1. Read [Retriv Integration Plan](./retriv-integration-plan.md)
2. Follow [Implementation Checklist](./implementation-checklist.md)
3. Run tests: `pytest`
4. Deploy: See [PRODUCTION.md](../PRODUCTION.md)
### For App Integration
1. Read [Architecture Decision](./architecture-decision.md)
2. Review [API Contracts](./api-contracts.md)
3. Implement data preparation in your app
4. Call brain service storage/query APIs
## API Overview
### Storage
```bash
# Store single document
POST /store
{
"id": "char_aladdin_proj123",
"type": "character",
"project_id": "proj123",
"text": "Aladdin: A street-smart young man...",
"metadata": {...},
"relationships": [...]
}
# Store multiple documents
POST /store/batch
{
"documents": [...]
}
```
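From Python, the two storage calls could be prepared as follows. This sketch only builds the requests and does not send them. The helper names are hypothetical, the base URL is the host used elsewhere in this README, and the chunk size of 50 is an assumption borrowed from the batch limit mentioned in the Roadmap. The API-key header is left out because its name is not given here; check the API Contracts before using this against a real deployment.

```python
import json
import urllib.request

BASE_URL = "https://brain.ft.tc"  # host used elsewhere in this README


def _post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_store_request(document: dict) -> urllib.request.Request:
    """Request for POST /store with a single document."""
    return _post("/store", document)


def build_batch_requests(
    documents: list[dict], max_batch: int = 50
) -> list[urllib.request.Request]:
    """One POST /store/batch request per chunk of at most max_batch documents.

    Assumption: the 50-item default mirrors the batch limit in the Roadmap.
    """
    return [
        _post("/store/batch", {"documents": documents[start:start + max_batch]})
        for start in range(0, len(documents), max_batch)
    ]


single = build_store_request(
    {"id": "char_aladdin_proj123", "type": "character", "project_id": "proj123"}
)
```

Each request can then be sent with `urllib.request.urlopen(request)` once the authentication header has been added.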
### Query
```bash
# Hybrid search (recommended)
POST /query
{
"project_id": "proj123",
"query": "What does Aladdin wear in scene 3?",
"search_type": "hybrid",
"top_k": 5
}
# Character context
POST /query/character-context
{
"project_id": "proj123",
"character_name": "Aladdin",
"scene_number": 3
}
```
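A small helper can build the `/query` body and catch obviously invalid input before it reaches the service. This is a sketch: `build_query_payload` is a hypothetical name, and the client-side checks are assumptions rather than rules taken from the API contract.

```python
import json


def build_query_payload(
    project_id: str,
    query: str,
    search_type: str = "hybrid",
    top_k: int = 5,
) -> str:
    """Serialize a POST /query body, rejecting obviously invalid input."""
    if not project_id or not query.strip():
        raise ValueError("project_id and query are required")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return json.dumps(
        {
            "project_id": project_id,
            "query": query,
            "search_type": search_type,
            "top_k": top_k,
        }
    )


body = build_query_payload("proj123", "What does Aladdin wear in scene 3?")
```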
See [API Contracts](./api-contracts.md) for full details.
## MCP Tools
Brain service also exposes WebSocket-based MCP tools:
- `store_document` - Store a document
- `store_batch` - Store multiple documents
- `search` - Hybrid search
- `find_similar` - Find similar documents
- `get_relationships` - Get document relationships
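Assuming the service follows the standard MCP framing (JSON-RPC 2.0 messages with a `tools/call` method), a tool invocation sent over the WebSocket might be built like this. The `build_tool_call` helper and the argument names passed to `search` are hypothetical; confirm the actual tool schemas against the service before relying on them.

```python
import itertools
import json

# Monotonically increasing JSON-RPC request IDs.
_request_ids = itertools.count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


# Assumed argument names, modeled on the REST query body in this README.
message = build_tool_call(
    "search",
    {"project_id": "proj123", "query": "What does Aladdin wear in scene 3?"},
)
```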
## Technology Stack
- **FastAPI**: Web framework with WebSocket support
- **Neo4j**: Graph database for relationships
- **Jina AI**: Embedding generation
- **Retriv**: Hybrid search (BM25 + embeddings)
- **Pydantic**: Data validation
- **Pytest**: Testing framework
## Development Workflow
1. **Make changes** to brain service code
2. **Run tests**: `pytest`
3. **Test locally**: `python src/main.py`
4. **Deploy**: Follow deployment guide
## Testing
```bash
# Run all tests
pytest
# Run specific test type
pytest tests/unit/
pytest tests/integration/
pytest tests/performance/
# Run with coverage
pytest --cov=src
```
## Deployment
See [PRODUCTION.md](../PRODUCTION.md) for deployment instructions.
Quick deploy to Coolify:
1. Push to main branch
2. Coolify auto-deploys
3. Verify health: `curl https://brain.ft.tc/health`
## Monitoring
### Health Check
```bash
curl https://brain.ft.tc/health
```
Response:
```json
{
"status": "healthy",
"services": {
"neo4j": "connected",
"jina": "connected",
"retriv": "initialized"
}
}
```
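For automated checks, the response can be reduced to a list of failing services. This sketch assumes that `connected` and `initialized` are the only healthy states, because those are the only values shown in the sample above; the function name and the failure value used in the example are hypothetical.

```python
import json

# Assumption: the states shown in the sample response are the healthy ones.
OK_STATES = {"connected", "initialized"}


def unhealthy_services(raw_response: str) -> list[str]:
    """Return the names of failing services; an empty list means healthy."""
    payload = json.loads(raw_response)
    failing = [
        name
        for name, state in payload.get("services", {}).items()
        if state not in OK_STATES
    ]
    # Flag the overall status too if no individual service explains it.
    if payload.get("status") != "healthy" and not failing:
        failing.append("status")
    return sorted(failing)


sample = json.dumps(
    {
        "status": "healthy",
        "services": {
            "neo4j": "connected",
            "jina": "connected",
            "retriv": "initialized",
        },
    }
)
problems = unhealthy_services(sample)
```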
### Logs
```bash
# View logs in Coolify dashboard
# Or SSH to server and check Docker logs
docker logs mcp-brain-service
```
## Troubleshooting
### Retriv Not Initializing
```bash
# Check if Retriv package installed
pip list | grep retriv
# Check data directory permissions
ls -la data/retriv_index/
# Check logs for errors
docker logs mcp-brain-service | grep -i retriv
```
### Slow Queries
```bash
# Check Retriv index size
du -sh data/retriv_index/
# Monitor query performance
# Add logging in retriv_service.py
```
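One lightweight way to add that logging is a timing decorator. The sketch below is an assumption about how it could look, not existing code from `retriv_service.py`: the `log_duration` decorator and the stand-in `search` function are hypothetical, and an `async` method would need an awaiting variant of the wrapper.

```python
import functools
import logging
import time

logger = logging.getLogger("retriv_service")


def log_duration(func):
    """Log how long each call to the wrapped function takes, in milliseconds."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on exceptions, so failures are timed too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.1f ms", func.__name__, elapsed_ms)

    return wrapper


@log_duration
def search(query: str) -> list[str]:
    # Hypothetical stand-in for the real search method.
    return [query.upper()]


results = search("vest")
```

With the logger set to `INFO`, slow calls show up directly in `docker logs mcp-brain-service`.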
### Neo4j Connection Issues
```bash
# Check Neo4j connection
curl http://neo4j.ft.tc:7474
# Verify credentials in .env
echo $NEO4J_URI
echo $NEO4J_USER
```
## Contributing
1. Create feature branch
2. Make changes
3. Add tests
4. Update documentation
5. Submit PR
## Support
- **Issues**: Create GitHub issue
- **Questions**: Check documentation first
- **Urgent**: Contact team lead
## Roadmap
### ✅ Completed (v1.1.0) - January 2025
- ✅ Neo4j storage
- ✅ Jina embeddings v4
- ✅ MCP protocol
- ✅ REST API routes (10 endpoints)
- ✅ Authentication (API key)
- ✅ Batch node creation (up to 50 nodes)
- ✅ Semantic duplicate detection
- ✅ AI-powered department context aggregation (OpenRouter LLM)
- ✅ Content coverage analysis with LLM
- ✅ Comprehensive documentation
### In Progress (v1.2)
- 🚧 Retriv hybrid search
- 🚧 Rate limiting
- 🚧 Prometheus metrics
- 🚧 Redis caching layer
### Future (v2.0)
- [ ] Multi-modal search (images + text)
- [ ] Real-time updates via WebSocket
- [ ] Advanced analytics dashboard
- [ ] Multi-tenancy support
- [ ] GraphQL API
## License
[Your License Here]
## Contact
[Your Contact Info Here]