# 🏗️ `activated_rag` Architecture Design
## 🎯 Main Objective
Create a **single master tool** that replaces the 5 current tools:
- `injection_rag` (alias `index_project`)
- `update_project`
- `search_code` (replaced by the separate `recherche_rag` tool)
- `manage_projects` (integrated)
- `analyse_code` (integrated)
## 📋 Input/Output Schema
### Input (`activated_rag` Input Schema)
```typescript
interface ActivatedRagInput {
  // Operation mode
  mode: 'full' | 'incremental' | 'watch' | 'analyze_only';

  // Target
  project_path?: string;    // Auto-detected if empty
  file_patterns?: string[]; // Default: ['**/*']

  // Advanced options
  enable_phase0?: boolean;         // Automatic workspace detection
  enable_watcher?: boolean;        // Real-time file monitoring
  enable_llm_enrichment?: boolean; // Optional Phase 0.3

  // Filters
  content_types?: Array<'code' | 'doc' | 'config' | 'other'>;
  languages?: string[]; // ['typescript', 'python', ...]

  // Embedding configuration
  embedding_models?: {
    code?: string;   // Default: 'nomic-embed-code'
    text?: string;   // Default: 'nomic-embed-text'
    config?: string; // Default: 'bge-small'
  };

  // Chunking options
  chunking_strategy?: 'logical' | 'fixed' | 'ai_enhanced';
  max_chunk_size?: number; // Default: 1000 tokens

  // Metadata
  metadata_overrides?: Record<string, any>;
}
```
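A minimal sketch of how the optional fields could be resolved before the pipeline starts. The default values come from the comments above and from `rag-config.json` (`phase0.enabled`, `chunking.strategy`); the `resolveInput` helper and the rule that the watcher defaults to on only in `watch` mode are assumptions, not part of the specified API.

```typescript
type Mode = 'full' | 'incremental' | 'watch' | 'analyze_only';
type ChunkingStrategy = 'logical' | 'fixed' | 'ai_enhanced';

// Subset of ActivatedRagInput, repeated so the sketch is self-contained.
interface ActivatedRagInput {
  mode: Mode;
  project_path?: string;
  file_patterns?: string[];
  enable_phase0?: boolean;
  enable_watcher?: boolean;
  embedding_models?: { code?: string; text?: string; config?: string };
  chunking_strategy?: ChunkingStrategy;
  max_chunk_size?: number;
}

// Resolved input: every optional field has a value, except project_path,
// which Phase 0 is expected to auto-detect.
interface ResolvedInput {
  mode: Mode;
  project_path?: string;
  file_patterns: string[];
  enable_phase0: boolean;
  enable_watcher: boolean;
  embedding_models: { code: string; text: string; config: string };
  chunking_strategy: ChunkingStrategy;
  max_chunk_size: number;
}

// Hypothetical helper: fills each missing field with its documented default.
function resolveInput(input: ActivatedRagInput): ResolvedInput {
  return {
    mode: input.mode,
    project_path: input.project_path,
    file_patterns: input.file_patterns ?? ['**/*'],
    enable_phase0: input.enable_phase0 ?? true, // phase0.enabled = true in rag-config.json
    // Assumption: when not given, the watcher is on only in 'watch' mode.
    enable_watcher: input.enable_watcher ?? input.mode === 'watch',
    embedding_models: {
      code: input.embedding_models?.code ?? 'nomic-embed-code',
      text: input.embedding_models?.text ?? 'nomic-embed-text',
      config: input.embedding_models?.config ?? 'bge-small',
    },
    chunking_strategy: input.chunking_strategy ?? 'logical', // chunking.strategy in rag-config.json
    max_chunk_size: input.max_chunk_size ?? 1000,
  };
}
```

For example, `resolveInput({ mode: 'incremental', max_chunk_size: 800 })` keeps the caller's chunk size and fills in `['**/*']` and the three default models.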
### Output (`activated_rag` Output Schema)
```typescript
interface ActivatedRagOutput {
  success: boolean;
  version: string;
  duration_seconds: number;

  // Statistics
  stats: {
    total_files: number;
    indexed_files: number;
    ignored_files: number;
    errors: number;
    chunks_created: number;
    embeddings_generated: number;
  };

  // Executed pipeline (status of each phase)
  pipeline: {
    phase_0: '✅' | '❌' | 'N/A'; // Workspace detection
    phase_1: '✅' | '❌' | 'N/A'; // Static analysis
    phase_2: '✅' | '❌' | 'N/A'; // Intelligent chunking
    phase_3: '✅' | '❌' | 'N/A'; // Specialized embeddings
    phase_4: '✅' | '❌' | 'N/A'; // Injection & update
  };

  // Project metadata
  project_metadata: {
    project_path: string;
    project_hash: string;
    last_indexed: string;
    total_size_bytes: number;
    file_types: Record<string, number>;
  };

  // Configuration used
  config_used: RagConfig;

  // Errors (if any)
  errors?: Array<{
    file_path: string;
    error: string;
    timestamp: string;
  }>;
}
```
## 🔄 Internal Pipeline (5 Phases)
### Phase 0: Workspace Detection & Monitoring
```
Phase 0 - Workspace Detection
=============================
Input:  project_path (auto-detected if empty)
Output: WorkspaceConfig + FileWatcher

Components:
  1. WorkspaceDetector
     - VS Code workspace detection
     - Git repository detection
     - .gitignore/.ragignore parsing

  2. FileWatcher (chokidar)
     - Real-time monitoring
     - Events: add, change, unlink
     - Debouncing (500 ms)

  3. EventLogger
     - Structured logs
     - Performance metrics
```
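The debouncing rule can be illustrated as a pure function over timestamped events. This is a sketch under assumptions: `WatchEvent` and `coalesceEvents` are hypothetical names, and a live FileWatcher would instead reset a 500 ms timer on every chokidar event; only the event names `add`, `change` and `unlink` come from chokidar.

```typescript
// Event names as emitted by chokidar.
type WatchEventKind = 'add' | 'change' | 'unlink';

interface WatchEvent {
  kind: WatchEventKind;
  path: string;
  timeMs: number; // Reception time in milliseconds
}

/**
 * Splits events into batches separated by at least `debounceMs` of silence.
 * Within a batch only the latest event per path is kept, so several saves of
 * the same file lead to a single re-index.
 */
function coalesceEvents(events: WatchEvent[], debounceMs = 500): WatchEvent[][] {
  const sorted = [...events].sort((a, b) => a.timeMs - b.timeMs);
  const batches: WatchEvent[][] = [];
  let current = new Map<string, WatchEvent>();
  let lastTime = 0;

  for (const ev of sorted) {
    // A quiet period of at least debounceMs closes the current batch.
    if (current.size > 0 && ev.timeMs - lastTime >= debounceMs) {
      batches.push(Array.from(current.values()));
      current = new Map<string, WatchEvent>();
    }
    current.delete(ev.path); // Re-insert so iteration order follows the latest event
    current.set(ev.path, ev);
    lastTime = ev.timeMs;
  }
  if (current.size > 0) batches.push(Array.from(current.values()));
  return batches;
}
```

With this rule, two quick `change` events on the same file collapse into one, while an `unlink` that arrives after a quiet period starts a new batch.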
### Phase 1: Multi-Language Static Analysis
```
Phase 1 - Static Analysis
=========================
Input:  File paths + content
Output: AnalyzedFile[] with AST + symbols

Components:
  1. ContentDetector (existing)
     - Type: code/doc/config/other
     - Language: typescript/python/javascript/etc.

  2. TreeSitterAnalyzer (new)
     - Parsers: typescript (incl. tsx), javascript, python, rust, go, java, cpp
     - Extraction: functions, classes, imports, comments
     - Relations: hierarchy, dependencies

  3. SymbolExtractor (new)
     - Unique identifiers
     - Scope
     - Associated documentation
```
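How ContentDetector classifies files is not specified here. The following hypothetical, extension-based sketch only shows the `content_type`/`language` pair that Phase 1 hands to the later phases; the `EXTENSIONS` table and `detectContent` are illustrative assumptions, and the existing detector may use richer heuristics.

```typescript
type ContentType = 'code' | 'doc' | 'config' | 'other';

interface Detection {
  content_type: ContentType;
  language: string | null; // null when no language applies
}

// Hypothetical extension table (lowercase keys).
const EXTENSIONS: Record<string, Detection> = {
  '.ts': { content_type: 'code', language: 'typescript' },
  '.tsx': { content_type: 'code', language: 'typescript' },
  '.js': { content_type: 'code', language: 'javascript' },
  '.py': { content_type: 'code', language: 'python' },
  '.rs': { content_type: 'code', language: 'rust' },
  '.go': { content_type: 'code', language: 'go' },
  '.java': { content_type: 'code', language: 'java' },
  '.cpp': { content_type: 'code', language: 'cpp' },
  '.md': { content_type: 'doc', language: 'markdown' },
  '.json': { content_type: 'config', language: 'json' },
  '.yaml': { content_type: 'config', language: 'yaml' },
  '.yml': { content_type: 'config', language: 'yaml' },
};

// Expects POSIX-style paths; dotfiles such as ".gitignore" have no extension here.
function detectContent(filePath: string): Detection {
  const base = filePath.split('/').pop() ?? filePath;
  const dot = base.lastIndexOf('.');
  const ext = dot > 0 ? base.slice(dot).toLowerCase() : '';
  return EXTENSIONS[ext] ?? { content_type: 'other', language: null };
}
```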
### Phase 2: Intelligent Chunking by Logical Units
```
Phase 2 - Intelligent Chunking
==============================
Input:  AnalyzedFile[]
Output: Chunk[] with enriched metadata

Strategies by content type:

  1. CODE → chunking by logical unit
     - 1 function = 1 chunk
     - 1 class = N chunks (one per method)
     - Imports = separate chunk
     - Tests = separate chunk

  2. DOCUMENTATION → semantic chunking
     - 1 Markdown section (##) = 1 chunk
     - 1 paragraph = 1 chunk if long
     - Tables = separate chunk
     - Lists = kept whole in one chunk

  3. CONFIGURATION → structural chunking
     - 1 JSON object = 1 chunk
     - 1 array = 1 chunk
     - 1 YAML section = 1 chunk

  4. OTHER → fixed-size chunking (fallback)
     - 500 tokens by default
     - Overlap: 100 tokens
```
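A sketch of two of the strategies above, assuming Phase 1 provides symbols with 1-based inclusive line ranges (the `SymbolInfo` shape is hypothetical). `chunkCode` applies the CODE rules (one chunk for the imports, one per function or method, none for the class body itself), and `chunkFixed` implements the fallback window of 500 tokens with a 100-token overlap, approximating tokens by whitespace-separated words rather than a real tokenizer.

```typescript
// Hypothetical symbol shape produced by TreeSitterAnalyzer + SymbolExtractor.
interface SymbolInfo {
  name: string;
  type: 'import' | 'function' | 'class' | 'method';
  start_line: number; // 1-based, inclusive
  end_line: number;   // 1-based, inclusive
  parent?: string;    // Enclosing class for methods
}

interface Chunk {
  text: string;
  start_line: number;
  end_line: number;
  symbol_name?: string;
  symbol_type?: string;
  parent_symbol?: string;
}

function chunkCode(lines: string[], symbols: SymbolInfo[]): Chunk[] {
  const slice = (s: number, e: number) => lines.slice(s - 1, e).join('\n');
  const chunks: Chunk[] = [];

  // One chunk spanning from the first to the last import.
  const imports = symbols.filter((s) => s.type === 'import');
  if (imports.length > 0) {
    const start = Math.min(...imports.map((s) => s.start_line));
    const end = Math.max(...imports.map((s) => s.end_line));
    chunks.push({ text: slice(start, end), start_line: start, end_line: end, symbol_type: 'imports' });
  }
  for (const s of symbols) {
    if (s.type !== 'function' && s.type !== 'method') continue; // A class is covered by its methods
    chunks.push({
      text: slice(s.start_line, s.end_line),
      start_line: s.start_line,
      end_line: s.end_line,
      symbol_name: s.name,
      symbol_type: s.type,
      parent_symbol: s.parent,
    });
  }
  return chunks;
}

// Fallback: windows of `size` tokens, each starting `size - overlap` tokens after the previous one.
function chunkFixed(text: string, size = 500, overlap = 100): string[] {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const out: string[] = [];
  for (let start = 0; start < tokens.length; start += size - overlap) {
    out.push(tokens.slice(start, start + size).join(' '));
    if (start + size >= tokens.length) break; // This window already reaches the end
  }
  return out;
}
```

With 1,000 tokens, `chunkFixed` yields windows starting at tokens 0, 400 and 800, so every boundary is shared by two chunks.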
### Phase 3: Specialized Embeddings by Content Type
```
Phase 3 - Specialized Embeddings
================================
Input:  Chunk[] + detected type
Output: Vector[] + embedding metadata

Models by type:

  1. CODE → nomic-embed-code / codebert
     - Dimensions: 768
     - Optimized for: functions, syntax, code semantics
     - Alternative: starcoder2-embedding

  2. TEXT → nomic-embed-text / bge-small
     - Dimensions: 768 (nomic-embed-text; bge-small produces 384)
     - Optimized for: documentation, comments
     - Alternative: all-minilm

  3. CONFIG → bge-small / all-minilm
     - Dimensions: 384
     - Optimized for: JSON, YAML, configuration files

  4. FALLBACK → qwen3-embedding:8b
     - Dimensions: 1024 (truncated; the model's native output is 4096)
     - Usage: unknown or mixed type

Features:
  - Embedding cache (LRU, 1000 entries)
  - Ollama batching (max 10 chunks)
  - Automatic L2 normalization
  - Fallback to fake embeddings
```
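The EmbeddingRouter, the L2 normalization and the LRU cache listed above can be sketched as follows. The routing table copies the model names and dimensions from this document; `routeEmbedding`, `l2Normalize` and `LruCache` are hypothetical names. Because each model produces vectors in its own space (and, here, of different sizes), a search query would have to be embedded with the same model as the chunks it is compared against.

```typescript
type ContentType = 'code' | 'doc' | 'config' | 'other';

interface ModelChoice {
  model: string;
  dimensions: number;
}

// Routing table mirroring the "embeddings" section of rag-config.json.
const ROUTES: Record<ContentType, ModelChoice> = {
  code: { model: 'nomic-embed-code', dimensions: 768 },
  doc: { model: 'nomic-embed-text', dimensions: 768 },
  config: { model: 'bge-small', dimensions: 384 },
  other: { model: 'qwen3-embedding:8b', dimensions: 1024 },
};

function routeEmbedding(type: ContentType): ModelChoice {
  return ROUTES[type];
}

// Divides each component by the Euclidean norm so that cosine similarity
// becomes a plain dot product. A zero vector is returned unchanged.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Minimal LRU cache: a Map iterates in insertion order, and every hit re-inserts
// its key, so the first key is always the least recently used one.
class LruCache<K, V> {
  private readonly map = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    this.map.delete(key);
    this.map.set(key, value); // Mark as most recently used
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.maxEntries) {
      const oldest = this.map.keys().next().value as K; // Least recently used
      this.map.delete(oldest);
    }
    this.map.set(key, value);
  }

  get size(): number {
    return this.map.size;
  }
}
```

A cache key combining the model and the chunk hash (an assumption, e.g. `nomic-embed-text:<chunk_hash>`) would keep the same text embedded by two different models from colliding.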
### Phase 4: Injection & Automatic Update
```
Phase 4 - Injection & Automatic Update
======================================
Input:  Chunk[] + Vector[] + metadata
Output: PostgreSQL rows + statistics

Process:
  1. Compute the chunk hash (SHA-256)
  2. Check whether it already exists (by hash)
  3. Insert or update
  4. Update the project metadata

Chunk metadata (fields marked ? are optional):
  {
    chunk_hash: string,       // SHA-256 of the content
    content_type: string,     // 'code', 'doc', 'config', 'other'
    language: string,         // 'typescript', 'python'
    file_path: string,        // Relative path
    symbol_name?: string,     // Function/class name
    symbol_type?: string,     // 'function', 'class', etc.
    start_line: number,       // Start line
    end_line: number,         // End line
    ast_depth?: number,       // AST depth
    dependencies?: string[],  // Identified dependencies
    parent_symbol?: string,   // Parent symbol
    updated_at: string,       // ISO timestamp
    project_hash: string      // Project hash
  }

Optimizations:
  - Batched transactions (100 chunks)
  - zlib compression (if > 1 KB)
  - Optimized PostgreSQL indexes
  - Cleanup of old versions
```
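Steps 2 and 3 of the process can be sketched as a planning function that decides, per chunk, between insert, update and skip. It assumes `chunk_hash` was computed upstream (in Node.js, for example, with `crypto.createHash('sha256').update(text).digest('hex')`) and that a chunk is identified by its file plus symbol name or line range; `chunkKey`, `planUpserts`, `staleKeys` and `toBatches` are hypothetical helpers, not existing code.

```typescript
interface ChunkRecord {
  chunk_hash: string; // SHA-256 of the content, computed upstream
  file_path: string;
  start_line: number;
  end_line: number;
  symbol_name?: string;
}

type Action = 'insert' | 'update' | 'skip';

// Assumed stable identity: file + symbol name, or file + line range without a symbol.
function chunkKey(c: ChunkRecord): string {
  return `${c.file_path}#${c.symbol_name ?? `${c.start_line}-${c.end_line}`}`;
}

// `stored` maps chunkKey -> chunk_hash currently in the database.
function planUpserts(chunks: ChunkRecord[], stored: Map<string, string>): Map<Action, ChunkRecord[]> {
  const plan = new Map<Action, ChunkRecord[]>([['insert', []], ['update', []], ['skip', []]]);
  for (const c of chunks) {
    const previous = stored.get(chunkKey(c));
    const action: Action =
      previous === undefined ? 'insert' : previous === c.chunk_hash ? 'skip' : 'update';
    plan.get(action)!.push(c);
  }
  return plan;
}

// Keys stored in the database but absent from this run: candidates for cleanup.
function staleKeys(chunks: ChunkRecord[], stored: Map<string, string>): string[] {
  const seen = new Set(chunks.map(chunkKey));
  return Array.from(stored.keys()).filter((k) => !seen.has(k));
}

// Splits rows into groups of `batchSize`, one transaction per group.
function toBatches<T>(rows: T[], batchSize = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) batches.push(rows.slice(i, i + batchSize));
  return batches;
}
```

`staleKeys` is only meaningful for files that were fully re-indexed in the current run; applied to the whole store during an incremental run, it would flag every chunk of the untouched files.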
## 🗺️ Complete Flow Diagram
```mermaid
flowchart TD
A[activated_rag] --> B{Mode?}
B -->|full| C[Phase 0: Workspace Detection]
B -->|incremental| D[Phase 0: Git Diff Analysis]
B -->|watch| E[Phase 0: File Watcher Start]
B -->|analyze_only| F[Phase 1: Static Analysis Only]
C --> G[Phase 1: Static Analysis]
D --> G
E --> G
F --> Z[Return Analysis Results]
G --> H[Phase 2: Intelligent Chunking]
H --> I{Content Type?}
I -->|code| J[Code Embeddings<br/>nomic-embed-code]
I -->|doc| K[Text Embeddings<br/>nomic-embed-text]
I -->|config| L[Config Embeddings<br/>bge-small]
I -->|other| M[Fallback Embeddings<br/>qwen3-embedding:8b]
J --> N[Phase 4: Injection & Update]
K --> N
L --> N
M --> N
N --> O[Generate Statistics]
O --> P[Return Success]
E --> Q[Watch Mode Active]
Q --> R[Real-time Updates]
R --> G
```
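The mode branching at the top of the diagram could be expressed as a small dispatch function. `planExecution` and `ExecutionPlan` are hypothetical; the phase lists simply follow the arrows above, including the `analyze_only` shortcut that runs Phase 1 and returns its results.

```typescript
type Mode = 'full' | 'incremental' | 'watch' | 'analyze_only';
type Phase = 'phase_0' | 'phase_1' | 'phase_2' | 'phase_3' | 'phase_4';

interface ExecutionPlan {
  phases: Phase[];
  phase0Variant?: 'workspace_detection' | 'git_diff' | 'watcher_start';
  keepWatching: boolean; // true when file events loop back into Phase 1
}

function planExecution(mode: Mode): ExecutionPlan {
  const all: Phase[] = ['phase_0', 'phase_1', 'phase_2', 'phase_3', 'phase_4'];
  switch (mode) {
    case 'full':
      return { phases: all, phase0Variant: 'workspace_detection', keepWatching: false };
    case 'incremental':
      return { phases: all, phase0Variant: 'git_diff', keepWatching: false };
    case 'watch':
      return { phases: all, phase0Variant: 'watcher_start', keepWatching: true };
    case 'analyze_only':
      // Skips Phase 0 and stops after the static analysis.
      return { phases: ['phase_1'], keepWatching: false };
    default: {
      const unreachable: never = mode; // Compile-time exhaustiveness check
      throw new Error(`Unknown mode: ${String(unreachable)}`);
    }
  }
}
```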
## 🔧 Integration with Existing Components
### Reused Components
1. **ContentDetector** (existing) → type/language detection
2. **VectorStore** (modified) → multi-model support
3. **Indexer** (modified) → intelligent chunking
4. **LLM Cache** (existing) → embedding cache
5. **PostgreSQL** (existing) → v2 storage (`rag_store_v2`)
### New Components to Build
1. **WorkspaceDetector** → automatic VS Code/Git detection
2. **TreeSitterAnalyzer** → multi-language AST analysis
3. **SymbolExtractor** → symbol extraction
4. **IntelligentChunker** → chunking by logical units
5. **EmbeddingRouter** → routing to the appropriate model
## 🎛️ Configuration (`rag-config.json`)
### New Parameters
```json
{
  "phase0": {
    "enabled": true,
    "auto_detect_workspace": true,
    "watch_files": false,
    "watch_debounce_ms": 500,
    "ignore_patterns": ["node_modules", ".git", "*.log"]
  },
  "analysis": {
    "tree_sitter": {
      "enabled": true,
      "parsers": ["typescript", "python", "javascript", "rust", "go", "java", "cpp"],
      "extract_symbols": true,
      "extract_comments": true
    },
    "max_file_size_mb": 10
  },
  "chunking": {
    "strategy": "logical",
    "max_chunk_size_tokens": 1000,
    "overlap_tokens": 100,
    "code": {
      "chunk_by_function": true,
      "chunk_by_class": true,
      "include_imports": true
    },
    "documentation": {
      "chunk_by_section": true,
      "chunk_by_paragraph": true,
      "min_paragraph_length": 50
    },
    "configuration": {
      "chunk_by_object": true,
      "chunk_by_array": true
    }
  },
  "embeddings": {
    "provider": "ollama",
    "models": {
      "code": "nomic-embed-code",
      "text": "nomic-embed-text",
      "config": "bge-small",
      "fallback": "qwen3-embedding:8b"
    },
    "dimensions": {
      "code": 768,
      "text": 768,
      "config": 384,
      "fallback": 1024
    },
    "cache": {
      "enabled": true,
      "max_entries": 1000,
      "ttl_seconds": 3600
    }
  },
  "database": {
    "table_name": "rag_store_v2",
    "enable_compression": true,
    "batch_size": 100,
    "cleanup_old_versions": true
  }
}
```
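Some of these parameters depend on each other. A hypothetical validation pass over a subset of the file could catch inconsistencies at startup; the rules below (overlap smaller than the chunk size, one positive dimension per declared model, positive cache and batch sizes) are inferred from this document, not taken from an existing validator.

```typescript
// Hypothetical subset of the rag-config.json structure, limited to the checked fields.
interface RagConfigSubset {
  chunking: { max_chunk_size_tokens: number; overlap_tokens: number };
  embeddings: {
    models: Record<string, string>;
    dimensions: Record<string, number>;
    cache: { max_entries: number; ttl_seconds: number };
  };
  database: { batch_size: number };
}

// Returns human-readable problems; an empty list means the subset is consistent.
function validateConfig(cfg: RagConfigSubset): string[] {
  const problems: string[] = [];
  const { max_chunk_size_tokens, overlap_tokens } = cfg.chunking;
  if (overlap_tokens >= max_chunk_size_tokens) {
    problems.push('chunking.overlap_tokens must be smaller than max_chunk_size_tokens');
  }
  // Every declared model needs a positive dimension (a missing key reads as undefined).
  for (const key of Object.keys(cfg.embeddings.models)) {
    const dim = cfg.embeddings.dimensions[key];
    if (!(dim > 0)) problems.push(`embeddings.dimensions.${key} is missing or not positive`);
  }
  if (cfg.embeddings.cache.max_entries <= 0) problems.push('embeddings.cache.max_entries must be > 0');
  if (cfg.embeddings.cache.ttl_seconds <= 0) problems.push('embeddings.cache.ttl_seconds must be > 0');
  if (cfg.database.batch_size <= 0) problems.push('database.batch_size must be > 0');
  return problems;
}
```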
## 🔁 Backward Compatibility
### Old → New Mapping
| Old tool | New equivalent | Notes |
|----------|----------------|-------|
| `injection_rag` | `activated_rag` with `mode: 'full'` | Transparent migration |
| `index_project` | `activated_rag` with `mode: 'full'` | Alias kept |
| `update_project` | `activated_rag` with `mode: 'incremental'` | Uses Git diff |
| `search_code` | `recherche_rag` | Separate tool (read-only) |
| `manage_projects` | Integrated into the `activated_rag` output | Statistics included |
| `analyse_code` | `activated_rag` with `mode: 'analyze_only'` | Static analysis only (Phase 1) |
### Automatic Migration
1. **Registry alias**: `index_project` → `activated_rag`
2. **Parameter conversion**: automatic mapping of the legacy parameters
3. **Existing data**: compatible with `rag_store_v2`
4. **Fallback**: the old pipeline remains available if an error occurs
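The registry alias and tool-to-mode mapping could look like this. The sketch assumes legacy calls carry at most a `project_path` argument (the actual legacy parameters are not listed here), and mapping `analyse_code` to `analyze_only` is an inference from the mode names; `translateLegacyCall` is a hypothetical helper.

```typescript
type Mode = 'full' | 'incremental' | 'watch' | 'analyze_only';

interface ActivatedRagCall {
  mode: Mode;
  project_path?: string;
}

// Legacy tool name -> activated_rag mode. A Map avoids matching inherited
// object keys such as "toString".
const LEGACY_MODES = new Map<string, Mode>([
  ['injection_rag', 'full'],
  ['index_project', 'full'],
  ['update_project', 'incremental'],
  ['analyse_code', 'analyze_only'], // Assumption
]);

function translateLegacyCall(tool: string, args: { project_path?: string }): ActivatedRagCall | null {
  const mode = LEGACY_MODES.get(tool);
  // search_code is served by recherche_rag; manage_projects maps to output statistics, not to a call.
  if (mode === undefined) return null;
  return { mode, project_path: args.project_path };
}
```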
## 📊 Target Performance Metrics
### Execution Time
- **Phase 0**: < 100 ms (workspace detection)
- **Phase 1**: ~50 ms/file (static analysis)
- **Phase 2**: ~20 ms/chunk (intelligent chunking)
- **Phase 3**: ~100 ms/chunk (embeddings, with cache)
- **Phase 4**: ~10 ms/chunk (DB injection)
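To see what these targets imply for a whole run, here is a back-of-the-envelope estimate for a hypothetical project of 200 files producing 2,000 chunks, taking the Phase 0 upper bound as its cost. `estimateRunMs` is an illustrative helper.

```typescript
// Per-unit targets from the list above, in milliseconds.
const TARGET_MS = {
  phase0: 100,         // Once per run (upper bound)
  phase1PerFile: 50,
  phase2PerChunk: 20,
  phase3PerChunk: 100, // Embeddings, with cache
  phase4PerChunk: 10,
};

// Estimated duration of a run that analyzes `files` files into `chunks` chunks.
function estimateRunMs(files: number, chunks: number): number {
  return (
    TARGET_MS.phase0 +
    files * TARGET_MS.phase1PerFile +
    chunks * (TARGET_MS.phase2PerChunk + TARGET_MS.phase3PerChunk + TARGET_MS.phase4PerChunk)
  );
}
```

For this example the estimate is 100 + 200 × 50 + 2,000 × (20 + 100 + 10) = 270,100 ms, about 4.5 minutes, of which Phase 3 alone accounts for 200 s; cache hits and Ollama batching therefore have the largest effect on total indexing time.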
### Memory
- **Embedding cache**: max 100 MB
- **In-memory ASTs**: max 50 files at a time
- **Chunk batches**: max 100 chunks/batch
### Quality
- **Search precision**: +20% vs the old system
- **Recall**: +15% vs fixed-size chunking
- **Search latency**: < 100 ms
## 🚀 Implementation Plan
### Step 1: Base Structure
1. Create `src/tools/rag/activated-rag.ts`
2. Define the input/output schema
3. Implement routing to the phases
### Step 2: Phase 0 - Workspace Detection
1. Implement `Workspace