EMBEDDING_SYSTEM_README.mdโข14.5 kB
# ๐ Embedding System for MCP Conversation Intelligence
## Overview
The Embedding System is a powerful enhancement to your MCP Conversation Intelligence system that provides **semantic understanding** and **intelligent context matching** using state-of-the-art embedding technology.
## โจ Key Features
- **๐ Semantic Similarity Search**: Find contextually relevant information using semantic understanding
- **๐ง Intelligent Context Injection**: Automatically enhance prompts with the most relevant historical context
- **๐ Learning System**: Continuously improve context selection based on user interactions
- **โก High Performance**: FAISS-based vector search with fallback to database similarity
- **๐ Seamless Integration**: Works alongside your existing prompt generation and context systems
- **๐ก๏ธ Fallback Support**: Gracefully handles missing dependencies with basic functionality
## ๐๏ธ Architecture
```
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ MCP Conversation Intelligence โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Prompt โ โ Enhanced Prompt โ โ
โ โ Generator โโโโโบโ Generator โ โ
โ โ (Existing) โ โ (New) โ โ
โ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ โ
โ โผ โผ โ
โ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Context โ โ Embedding โ โ
โ โ Manager โ โ Manager โ โ
โ โ (Existing) โ โ (New) โ โ
โ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ โ
โ โผ โผ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Embedding Integration Layer โ โ
โ โ (Seamless Bridge) โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
```
## ๐ฆ Installation
### 1. Install Dependencies
```bash
# Install core dependencies
pip install -r requirements_embeddings.txt
# Or install manually
pip install sentence-transformers numpy faiss-cpu
```
### 2. Verify Installation
```bash
# Test the embedding system
python test_embedding_system.py
```
## ๐ Quick Start
### Basic Usage
```python
from embedding_integration import get_embedding_integration
# Get the integration instance
integration = get_embedding_integration()
# Enhance a prompt with semantic context
user_message = "How can I improve my machine learning system?"
enhanced_prompt = integration.enhance_prompt_with_embeddings(
user_message,
context_type="machine_learning",
use_semantic_search=True
)
print(enhanced_prompt)
```
### Advanced Usage
```python
from embedding_manager import create_embedding_manager
from enhanced_prompt_generator import create_enhanced_prompt_generator
# Create custom instances
embedding_manager = create_embedding_manager(
model_name="all-MiniLM-L6-v2",
max_embeddings=5000
)
enhanced_generator = create_enhanced_prompt_generator(embedding_manager)
# Generate semantic prompt
semantic_prompt = enhanced_generator.generate_semantic_prompt(
"What are the best practices for data preprocessing?",
context_type="data_science",
max_similar_contexts=5,
similarity_threshold=0.7
)
```
## ๐ง Configuration
### Embedding Manager Configuration
```python
from embedding_manager import EmbeddingManager
manager = EmbeddingManager(
model_name="all-MiniLM-L6-v2", # Embedding model
db_path="data/embeddings.db", # Database path
max_embeddings=10000, # Maximum embeddings to store
similarity_threshold=0.7 # Default similarity threshold
)
```
### Available Models
- **`all-MiniLM-L6-v2`** (default): Fast, lightweight, good quality
- **`all-mpnet-base-v2`**: Higher quality, slower
- **`paraphrase-multilingual-MiniLM-L12-v2`**: Multilingual support
## ๐ API Reference
### EmbeddingManager
#### Core Methods
```python
# Generate embeddings
embedding = manager.generate_embedding("Your text here")
# Add context embeddings
embedding_id = manager.add_embedding(
text="Context text",
context_type="conversation",
session_id="session_123",
user_id="user_456",
metadata={"source": "user_input"}
)
# Find similar contexts
similar_contexts = manager.find_similar_contexts(
query="Your query",
context_type="conversation",
limit=5,
min_similarity=0.7
)
# Get statistics
stats = manager.get_embedding_stats()
```
### EnhancedPromptGenerator
#### Enhanced Methods
```python
# Generate enhanced prompt with semantic context
enhanced_prompt = generator.generate_enhanced_prompt(
user_message,
context_type="smart",
use_semantic_search=True,
similarity_threshold=0.7
)
# Generate semantic-only prompt
semantic_prompt = generator.generate_semantic_prompt(
user_message,
context_type="conversation",
max_similar_contexts=5
)
# Learn from interactions
generator.learn_from_interaction(
user_message="User input",
enhanced_prompt="Generated prompt",
response_quality=0.8,
context_type="conversation"
)
```
### EmbeddingIntegration
#### Integration Methods
```python
# Test integration
test_results = integration.test_integration()
# Get system status
status = integration.get_integration_status()
# Get comprehensive statistics
stats = integration.get_system_statistics()
# Clear system cache
result = integration.clear_system_cache("conversation")
```
## ๐ Integration with Existing Systems
### 1. Enhanced Prompt Generation
The embedding system automatically enhances your existing prompt generation:
```python
# Before: Basic context injection
from prompt_generator import PromptGenerator
generator = PromptGenerator()
prompt = generator.generate_enhanced_prompt("User message", "smart")
# After: Semantic context injection
from enhanced_prompt_generator import EnhancedPromptGenerator
enhanced_generator = EnhancedPromptGenerator()
enhanced_prompt = enhanced_generator.generate_enhanced_prompt(
"User message", "smart", use_semantic_search=True
)
```
### 2. Context Learning System
Integrate with your existing context learning:
```python
from context_learning_system import ContextLearningSystem
from embedding_integration import get_embedding_integration
# Your existing context learning system
context_learner = ContextLearningSystem()
# Enhanced with embeddings
embedding_integration = get_embedding_integration()
# Learn from interactions with semantic context
result = embedding_integration.learn_from_interaction(
user_message="How to implement embeddings?",
enhanced_prompt="Enhanced prompt with semantic context",
response_quality=0.9,
context_type="technical"
)
```
### 3. Smart Caching System
Enhance your caching with semantic similarity:
```python
from smart_caching_system import SmartCachingSystem
from embedding_integration import find_semantic_contexts
# Your existing caching system
cache_system = SmartCachingSystem()
# Find semantically similar cached contexts
similar_contexts = find_semantic_contexts(
"Your query",
context_type="cached",
min_similarity=0.8
)
```
## ๐ Performance & Monitoring
### Performance Metrics
```python
# Get comprehensive statistics
stats = integration.get_system_statistics()
print(f"Total embeddings: {stats['embedding_system']['total_embeddings']}")
print(f"Semantic searches: {stats['enhanced_generator']['embedding_stats']['semantic_searches']}")
print(f"Context matches found: {stats['enhanced_generator']['embedding_stats']['context_matches_found']}")
```
### Monitoring Dashboard
```python
# Real-time monitoring
from simple_dashboard import start_simple_dashboard
# Start monitoring dashboard
dashboard = start_simple_dashboard(refresh_rate=5)
# Get current data
data = dashboard.get_dashboard_data()
```
## ๐งช Testing
### Run All Tests
```bash
python test_embedding_system.py
```
### Run Specific Test Categories
```python
import unittest
from test_embedding_system import TestEmbeddingManager, TestEnhancedPromptGenerator
# Test only embedding manager
suite = unittest.TestSuite()
suite.addTest(unittest.makeSuite(TestEmbeddingManager))
unittest.TextTestRunner().run(suite)
```
### Test Integration
```python
from embedding_integration import get_embedding_integration
integration = get_embedding_integration()
test_results = integration.test_integration()
print(f"Integration Status: {test_results['overall_status']}")
```
## ๐ง Troubleshooting
### Common Issues
#### 1. Missing Dependencies
```bash
# Error: No module named 'sentence_transformers'
pip install sentence-transformers
# Error: No module named 'faiss'
pip install faiss-cpu
```
#### 2. Fallback Mode
The system automatically falls back to basic functionality if dependencies are missing:
```python
# Check if using fallback mode
status = integration.get_integration_status()
print(f"Using fallback: {not status['embedding_manager_available']}")
```
#### 3. Database Issues
```python
# Check database status
stats = integration.get_system_statistics()
if 'error' in stats['embedding_system']:
print(f"Database error: {stats['embedding_system']['error']}")
```
### Performance Optimization
#### 1. Adjust Similarity Threshold
```python
# Lower threshold = more results, lower quality
# Higher threshold = fewer results, higher quality
enhanced_prompt = integration.enhance_prompt_with_embeddings(
user_message,
similarity_threshold=0.5 # More permissive
)
```
#### 2. Limit Context Results
```python
# Limit the number of similar contexts
similar_contexts = integration.find_semantic_contexts(
query, limit=3 # Only top 3 results
)
```
#### 3. Cache Management
```python
# Clear old embeddings periodically
integration.clear_system_cache()
# Or clear specific context types
integration.clear_system_cache("old_conversations")
```
## ๐ Advanced Features
### 1. Custom Embedding Models
```python
from sentence_transformers import SentenceTransformer
# Use custom model
custom_model = SentenceTransformer("your-custom-model")
manager = EmbeddingManager(model_name="custom")
```
### 2. Batch Processing
```python
# Process multiple texts at once
texts = ["Text 1", "Text 2", "Text 3"]
context_types = ["type1", "type2", "type3"]
for text, context_type in zip(texts, context_types):
integration.add_context_embedding(text, context_type)
```
### 3. Metadata Enrichment
```python
# Add rich metadata for better context understanding
metadata = {
"source": "user_input",
"session_id": "session_123",
"user_preferences": ["technical", "detailed"],
"response_quality": 0.9,
"timestamp": "2024-01-01T00:00:00Z"
}
integration.add_context_embedding(
"Your context text",
"conversation",
metadata=metadata
)
```
## ๐ Future Enhancements
### Planned Features
- **๐ Real-time Learning**: Continuous model updates based on user feedback
- **๐ Multi-language Support**: Enhanced multilingual embedding models
- **๐ Advanced Analytics**: Detailed performance and usage analytics
- **๐ External Integrations**: Connect with external knowledge bases
- **โก Performance Optimization**: GPU acceleration and model quantization
### Contributing
To contribute to the embedding system:
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request
## ๐ Additional Resources
### Documentation
- [Sentence Transformers Documentation](https://www.sbert.net/)
- [FAISS Documentation](https://github.com/facebookresearch/faiss)
- [MCP Protocol Documentation](https://modelcontextprotocol.io/)
### Research Papers
- [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
- [FAISS: A Library for Efficient Similarity Search](https://arxiv.org/abs/1702.08734)
### Community
- [MCP Community Discord](https://discord.gg/modelcontextprotocol)
- [GitHub Issues](https://github.com/your-repo/issues)
## ๐ฏ Conclusion
The Embedding System transforms your MCP Conversation Intelligence from a simple context injection system into an **intelligent, learning, and semantically-aware** conversation partner.
By understanding the **meaning** behind user queries and automatically selecting the most relevant historical context, your system will provide:
- **๐ฏ More Relevant Responses**: Context that actually matches user intent
- **๐ง Continuous Learning**: System that gets smarter with every interaction
- **โก Better Performance**: Faster context selection and higher quality results
- **๐ Seamless Integration**: Works with your existing systems without disruption
Start using the embedding system today and experience the power of **semantic intelligence** in your conversations! ๐