CURRENT_SYSTEM_OVERVIEW.md•11.6 kB
# Current Query Processing System - Complete Overview
## System Architecture
### Core Components
```
┌──────────────────────────────────────────────────────────────┐
│ Query API Endpoint │
│ (/api/v1/query) │
└────────────────────────┬─────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Authentication Layer │
│ (auth_service + token_service) │
└────────────────────────┬─────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Query Router (query.py) │
│ │
│ if simple_greeting: ──► Pattern-based Processor │
│ elif USE_UNIVERSAL: ──► Universal LLM Processor │
│ else: ──► Specific Tools LLM Processor │
└────────────────────────┬─────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Universal LLM Processor (Main) │
│ │
│ 1. Domain Identification (LLM) │
│ 2. Date Extraction (LLM → Pattern fallback) │
│ 3. Data Fetching (Universal Query Builder) │
│ 4. LLM Processing (Generate response) │
└────────────────────────┬─────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Universal Query Builder │
│ │
│ • Fetches COMPLETE datasets for domains │
│ • No filtering, no aggregation │
│ • Returns ALL data │
└────────────────────────┬─────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ MongoDB Database │
│ │
│ Collections: products, orders, customers, │
│ inventory, warehouse, etc. │
└──────────────────────────────────────────────────────────────┘
```
## Detailed Query Processing Flow
### Step 1: API Entry Point (`query.py`)
```python
POST /api/v1/query
{
"query": "How much revenue did we generate in July?"
}
```
### Step 2: Authentication & Token Management
1. Validate bearer token with platform API
2. Check subscription status
3. Verify token limits (100,000 tokens/month for pro plan)
4. Estimate query token usage
### Step 3: Conversation Management
1. Create or retrieve conversation
2. Add user message
3. Track message history
### Step 4: Query Processing Selection
```python
if query in ["hi", "hello", "hey"]:
→ Pattern-based processor (simple responses)
elif settings.USE_UNIVERSAL_PROCESSOR:
→ Universal LLM Processor (default: true)
else:
→ Specific Tools LLM Processor
```
### Step 5: Universal LLM Processing
#### 5.1 Domain Identification
```python
Query: "How much revenue in July?"
↓
LLM Prompt: "Identify domains needed..."
↓
Response: ["sales"] # Sometimes fails, returns explanations
↓
Fallback: Keyword matching
```
#### 5.2 Date Extraction
```python
Query: "How much revenue in July?"
↓
LLM Prompt: "Extract date range..."
↓
Response: {"start": "2025-07-01", "end": "2025-07-31"}
↓
If fails: Pattern matching fallback
```
#### 5.3 Data Fetching (Universal Query Builder)
```python
For domain "sales" with July date range:
↓
Fetches:
- ALL orders in July (could be thousands)
- ALL order items
- Daily sales aggregation
- Customer purchase summaries
↓
Returns: Complete dataset with statistics
```
#### 5.4 LLM Processing
```python
Input to LLM:
{
"query": "How much revenue in July?",
"data": {
"sales": {
"orders": [...10,000 items...],
"order_items": [...50,000 items...],
"daily_sales": [...31 days...],
"statistics": {
"total_revenue": 50000,
"total_orders": 1000
}
}
}
}
↓
LLM analyzes and generates:
{
"answer": "We generated $50,000 in revenue in July.",
"intent": "sales_inquiry",
"confidence": 0.9
}
```
### Step 6: Response Formatting
```python
Return to user:
{
"success": true,
"response": "We generated $50,000 in revenue in July.",
"metadata": {
"model_used": "qwen2.5-3b",
"tokens_used": 500,
"execution_time_ms": 3000
}
}
```
## System Limitations
### 1. **Token Overflow (CRITICAL)**
```
Problem: Sending entire datasets to LLM
Example: "Which products are low on stock?"
- Fetches: 10,000 products + 10,000 inventory records
- Token count: 59,082 tokens
- Model limit: 2,048 tokens
- Result: CRASH! 💥
```
### 2. **Single-Pass Processing**
```
Problem: Can only fetch data once
Example: "Compare July revenue with June"
- Current: Cannot fetch two different date ranges
- Result: Query fails or gives incorrect answer
```
### 3. **No Real Calculations**
```
Problem: LLM performs calculations on text
Example: "Calculate month-over-month growth"
- Issue: Small LLMs bad at math
- Result: Incorrect calculations
```
### 4. **Inefficient Data Transfer**
```
Problem: Transfers ALL data even when unnecessary
Example: "Total revenue in July"
- Transfers: 10,000 order records
- Actually needs: Just the sum (1 number)
- Waste: 99.9% of data transfer
```
### 5. **No Query Optimization**
```
Problem: No aggregation at database level
Example: "Average order value"
- Current: Fetch all orders, LLM calculates
- Better: MongoDB aggregation pipeline
```
### 6. **Limited Context Window**
```
Model: qwen2.5-3b
Context: 2,048 tokens (very small)
Result: Cannot handle real-world data volumes
```
### 7. **No Memory/State**
```
Problem: Each query is independent
Example: User asks follow-up questions
- No context from previous queries
- Cannot build on previous results
```
### 8. **LLM Reliability Issues**
```
Problem: LLM returns inconsistent formats
Examples:
- Date extraction: Returns explanations instead of JSON
- Domain identification: Returns prose instead of array
- Final response: Sometimes not JSON formatted
```
## Real-World Failure Scenarios
### Scenario 1: Inventory Query
```
Query: "Which products are low on stock?"
What happens:
1. Fetches ALL products (10,000 items)
2. Fetches ALL inventory (10,000 items)
3. Attempts to send 59,082 tokens to LLM
4. Model crashes (limit: 2,048)
5. Falls back to error response
```
### Scenario 2: Comparison Query
```
Query: "Is July revenue higher than June?"
What happens:
1. Can only fetch one date range
2. Defaults to 90-day range
3. Cannot compare specific months
4. Returns generic response
```
### Scenario 3: Complex Calculation
```
Query: "What's the average price of top 10 selling products?"
What happens:
1. Fetches ALL products
2. Fetches ALL sales data
3. Token overflow
4. Even if it worked, LLM can't accurately:
- Identify top 10 by sales
- Calculate average price
```
## Performance Metrics
### Token Usage
| Query Type | Data Size | Tokens Generated | Status |
|------------|-----------|-----------------|---------|
| Simple product count | 100 products | 2,000 | ✅ Works |
| Revenue query | 1,000 orders | 15,000 | ❌ Fails |
| Inventory check | 10,000 items | 59,082 | ❌ Fails |
| Customer analysis | 5,000 customers | 35,000 | ❌ Fails |
### Response Times
| Query Type | Time | Bottleneck |
|------------|------|------------|
| Simple greeting | <1s | None |
| Single domain | 3-5s | Data transfer |
| Multiple domains | 5-10s | LLM processing |
| Large dataset | Timeout | Token overflow |
## Why Current Approach Fails
### Fundamental Design Flaw
```
Current Design:
"Fetch ALL data → Send to LLM → LLM does everything"
Problems:
1. LLM context windows are limited
2. LLMs are bad at math/counting
3. Transferring all data is inefficient
4. No ability to iterate or refine
```
### The MongoDB Challenge
```
MongoDB Aggregations Available:
- $match, $group, $sum, $avg
- $lookup for joins
- $project for field selection
Current System Uses: NONE
Result: Massive inefficiency
```
## Immediate Pain Points
1. **Production Readiness**: System fails on real data volumes
2. **User Experience**: Many queries simply don't work
3. **Resource Usage**: Wastes tokens on unnecessary data
4. **Scalability**: Gets worse as data grows
## Solutions Required
### Short-term (Without LangChain)
1. Pre-aggregate data at database level
2. Implement smart data filtering
3. Add calculation functions outside LLM
4. Chunk large datasets
### Long-term (With LangChain or similar)
1. Agent-based architecture
2. Tool-based processing
3. Multi-step reasoning
4. Proper MongoDB aggregation usage
## Conclusion
The current system works for:
- ✅ Simple queries on small datasets
- ✅ Basic questions with limited data
The current system fails for:
- ❌ Real-world data volumes
- ❌ Complex calculations
- ❌ Comparisons
- ❌ Multi-step reasoning
- ❌ Any query needing >2,048 tokens
**Critical Issue**: The system is fundamentally not scalable. As data grows, more queries will fail due to token overflow.
---
*Document generated: September 2024*
*System Status: Requires architectural changes for production use*