Log Analyzer MCP Server

FAISS_QUICKSTART.md•5.78 KiB

# Quick Start: FAISS-Enabled Log Analyzer

## Installation

1. Install dependencies:
```bash
pip install -r requirements.txt
```

This installs:
- `faiss-cpu>=1.7.4` - Vector similarity search
- `numpy>=1.24.0` - Array operations
- Other required packages

## Basic Workflow

### 1. Fetch and Chunk Logs

```python
# Process log files into chunks
fetch_local_logs(
    input_folder="./logs",  # Your log directory
    chunk_size=4096,        # Characters per chunk
    overlap=1024            # Overlap between chunks
)
```

### 2. Vectorize with FAISS

```python
# Create embeddings and build FAISS index
store_chunks_as_vectors(
    use_cache=True,      # Use embedding cache (faster)
    clear_cache=False    # Keep existing cache
)
```

**Output:**
```
✅ Vectorization complete
Total: 5,234
Cached: 2,100
Newly embedded: 3,134
Time: 45.2s
Output: ./logs/vectors_20251211_143022.json
FAISS Index: IVFFlat with 5,234 vectors
```

### 3. Query Logs (FAISS-powered)

```python
# Semantic search with FAISS
query_SFlogs(query="authentication errors in production")
```

**Benefits:**
- ✅ 30-150x faster than traditional search
- ✅ Handles millions of log entries
- ✅ Automatic fallback if FAISS unavailable

## Configuration

### Environment Variables (.env)

```bash
# FAISS Index Type
FAISS_INDEX_TYPE=IVFFlat  # Options: Flat, IVFFlat, IVFPQ, HNSW

# IVF Parameters (for IVFFlat/IVFPQ)
FAISS_NLIST=100    # Number of clusters (more = better accuracy, slower build)
FAISS_NPROBE=10    # Search clusters (more = better accuracy, slower search)

# Search Parameters
FAISS_TOP_K=150    # Maximum results to retrieve
```

### Index Types Guide

| Type | Best For | Speed | Accuracy | Memory |
|------|----------|-------|----------|--------|
| **Flat** | < 1K vectors | Slow | Exact | High |
| **IVFFlat** | 1K-100K vectors | Fast | ~99% | Medium |
| **IVFPQ** | > 100K vectors | Very Fast | ~95% | Low |
| **HNSW** | Read-heavy | Very Fast | ~99% | Medium |

**Recommendation**: Use `IVFFlat` for most cases (default)

## Performance Comparison

### Traditional Search (No FAISS)
- 10,000 chunks: ~10-15 seconds per query
- 100,000 chunks: ~100+ seconds per query
- Memory: O(n) - all vectors in memory

### FAISS Search
- 10,000 chunks: ~0.1-0.5 seconds per query
- 100,000 chunks: ~0.5-2 seconds per query
- Memory: Compressed, efficient indexing

**Speed improvement: 30-150x faster!**

## Common Use Cases

### Small Dataset (< 1K logs)
```bash
FAISS_INDEX_TYPE=Flat
```
Use exact search - no approximation needed.

### Medium Dataset (1K-100K logs)
```bash
FAISS_INDEX_TYPE=IVFFlat
FAISS_NLIST=100
FAISS_NPROBE=10
```
Default settings work well.

### Large Dataset (> 100K logs)
```bash
FAISS_INDEX_TYPE=IVFPQ
FAISS_NLIST=1000
FAISS_NPROBE=20
```
Memory-efficient with good accuracy.

### Maximum Accuracy
```bash
FAISS_INDEX_TYPE=HNSW
```
Graph-based search, excellent for production.

### Maximum Speed
```bash
FAISS_INDEX_TYPE=IVFFlat
FAISS_NLIST=500
FAISS_NPROBE=5
```
Trade accuracy for speed.

## Troubleshooting

### "FAISS not installed"
```bash
pip install faiss-cpu>=1.7.4
```

### "FAISS index not found"
Run vectorization to create index:
```python
store_chunks_as_vectors()
```

### "Out of memory"
Use compressed index:
```bash
FAISS_INDEX_TYPE=IVFPQ
```

### Slow searches
Reduce search scope:
```bash
FAISS_NPROBE=5      # Search fewer clusters
FAISS_TOP_K=50      # Return fewer results
```

### Low accuracy
Increase search scope:
```bash
FAISS_NPROBE=20     # Search more clusters
FAISS_INDEX_TYPE=HNSW  # Use more accurate index
```

## Advanced Features

### Rebuild Index
```python
# Clear cache and rebuild everything
store_chunks_as_vectors(
    use_cache=False,
    clear_cache=True
)
```

### Check Index Stats
After vectorization, check logs for:
```
✅ FAISS index saved to ./logs/faiss_index_20251211_143022
Index type: IVFFlat
Total vectors: 5,234
Dimension: 1024
```

### Multiple Indexes
The system keeps timestamped backups:
```
logs/
  faiss_index_latest.faiss      # Always use this
  faiss_index_20251211.faiss    # Backup
  faiss_index_20251210.faiss    # Backup
```

## Files Created

After vectorization:

```
logs/
├── vectors_20251211_143022.json      # Vector JSON (backward compatible)
├── faiss_index_latest.faiss          # FAISS index binary
├── faiss_index_latest.metadata       # Index metadata (pickle)
├── faiss_index_20251211_143022.faiss # Timestamped backup
└── faiss_index_20251211_143022.metadata
```

## GPU Acceleration (Optional)

For 10-100x additional speedup:

1. Install GPU version:
```bash
pip uninstall faiss-cpu
pip install faiss-gpu
```

2. No code changes needed - automatically uses GPU

3. Requires NVIDIA GPU with CUDA support

## Migration from Old System

✅ **No migration needed!**

- FAISS is automatically used when available
- Falls back to traditional search if FAISS fails
- Existing vector JSON files work as before
- No breaking changes to API

## Best Practices

1. **Build index once** - Reuse for multiple queries
2. **Use caching** - Speed up vectorization with `use_cache=True`
3. **Monitor performance** - Check logs for timing info
4. **Tune parameters** - Adjust based on dataset size
5. **Keep backups** - Timestamped indexes preserved automatically

## Performance Tips

🚀 **For Best Performance:**
- Use IVFFlat for most cases
- Increase FAISS_NPROBE for accuracy
- Use IVFPQ for large datasets
- Enable GPU if available

📊 **For Large Scale:**
- Use IVFPQ index type
- Set FAISS_NLIST=1000
- Enable compression
- Consider distributed setup

🎯 **For High Accuracy:**
- Use HNSW index
- Set FAISS_NPROBE=20
- Accept slower indexing

## Next Steps

1. Install dependencies: `pip install -r requirements.txt`
2. Configure settings in `.env`
3. Run vectorization: `store_chunks_as_vectors()`
4. Start querying: `query_SFlogs("your query")`

Enjoy 30-150x faster log analysis! 🚀

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/suriya-ML/log-checker-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

FAISS_QUICKSTART.md•5.78 KiB

# Quick Start: FAISS-Enabled Log Analyzer

## Installation

1. Install dependencies:
```bash
pip install -r requirements.txt
```

This installs:
- `faiss-cpu>=1.7.4` - Vector similarity search
- `numpy>=1.24.0` - Array operations
- Other required packages

## Basic Workflow

### 1. Fetch and Chunk Logs

```python
# Process log files into chunks
fetch_local_logs(
    input_folder="./logs",  # Your log directory
    chunk_size=4096,        # Characters per chunk
    overlap=1024            # Overlap between chunks
)
```

### 2. Vectorize with FAISS

```python
# Create embeddings and build FAISS index
store_chunks_as_vectors(
    use_cache=True,      # Use embedding cache (faster)
    clear_cache=False    # Keep existing cache
)
```

**Output:**
```
✅ Vectorization complete
Total: 5,234
Cached: 2,100
Newly embedded: 3,134
Time: 45.2s
Output: ./logs/vectors_20251211_143022.json
FAISS Index: IVFFlat with 5,234 vectors
```

### 3. Query Logs (FAISS-powered)

```python
# Semantic search with FAISS
query_SFlogs(query="authentication errors in production")
```

**Benefits:**
- ✅ 30-150x faster than traditional search
- ✅ Handles millions of log entries
- ✅ Automatic fallback if FAISS unavailable

## Configuration

### Environment Variables (.env)

```bash
# FAISS Index Type
FAISS_INDEX_TYPE=IVFFlat  # Options: Flat, IVFFlat, IVFPQ, HNSW

# IVF Parameters (for IVFFlat/IVFPQ)
FAISS_NLIST=100    # Number of clusters (more = better accuracy, slower build)
FAISS_NPROBE=10    # Search clusters (more = better accuracy, slower search)

# Search Parameters
FAISS_TOP_K=150    # Maximum results to retrieve
```

### Index Types Guide

| Type | Best For | Speed | Accuracy | Memory |
|------|----------|-------|----------|--------|
| **Flat** | < 1K vectors | Slow | Exact | High |
| **IVFFlat** | 1K-100K vectors | Fast | ~99% | Medium |
| **IVFPQ** | > 100K vectors | Very Fast | ~95% | Low |
| **HNSW** | Read-heavy | Very Fast | ~99% | Medium |

**Recommendation**: Use `IVFFlat` for most cases (default)

## Performance Comparison

### Traditional Search (No FAISS)
- 10,000 chunks: ~10-15 seconds per query
- 100,000 chunks: ~100+ seconds per query
- Memory: O(n) - all vectors in memory

### FAISS Search
- 10,000 chunks: ~0.1-0.5 seconds per query
- 100,000 chunks: ~0.5-2 seconds per query
- Memory: Compressed, efficient indexing

**Speed improvement: 30-150x faster!**

## Common Use Cases

### Small Dataset (< 1K logs)
```bash
FAISS_INDEX_TYPE=Flat
```
Use exact search - no approximation needed.

### Medium Dataset (1K-100K logs)
```bash
FAISS_INDEX_TYPE=IVFFlat
FAISS_NLIST=100
FAISS_NPROBE=10
```
Default settings work well.

### Large Dataset (> 100K logs)
```bash
FAISS_INDEX_TYPE=IVFPQ
FAISS_NLIST=1000
FAISS_NPROBE=20
```
Memory-efficient with good accuracy.

### Maximum Accuracy
```bash
FAISS_INDEX_TYPE=HNSW
```
Graph-based search, excellent for production.

### Maximum Speed
```bash
FAISS_INDEX_TYPE=IVFFlat
FAISS_NLIST=500
FAISS_NPROBE=5
```
Trade accuracy for speed.

## Troubleshooting

### "FAISS not installed"
```bash
pip install faiss-cpu>=1.7.4
```

### "FAISS index not found"
Run vectorization to create index:
```python
store_chunks_as_vectors()
```

### "Out of memory"
Use compressed index:
```bash
FAISS_INDEX_TYPE=IVFPQ
```

### Slow searches
Reduce search scope:
```bash
FAISS_NPROBE=5      # Search fewer clusters
FAISS_TOP_K=50      # Return fewer results
```

### Low accuracy
Increase search scope:
```bash
FAISS_NPROBE=20     # Search more clusters
FAISS_INDEX_TYPE=HNSW  # Use more accurate index
```

## Advanced Features

### Rebuild Index
```python
# Clear cache and rebuild everything
store_chunks_as_vectors(
    use_cache=False,
    clear_cache=True
)
```

### Check Index Stats
After vectorization, check logs for:
```
✅ FAISS index saved to ./logs/faiss_index_20251211_143022
Index type: IVFFlat
Total vectors: 5,234
Dimension: 1024
```

### Multiple Indexes
The system keeps timestamped backups:
```
logs/
  faiss_index_latest.faiss      # Always use this
  faiss_index_20251211.faiss    # Backup
  faiss_index_20251210.faiss    # Backup
```

## Files Created

After vectorization:

```
logs/
├── vectors_20251211_143022.json      # Vector JSON (backward compatible)
├── faiss_index_latest.faiss          # FAISS index binary
├── faiss_index_latest.metadata       # Index metadata (pickle)
├── faiss_index_20251211_143022.faiss # Timestamped backup
└── faiss_index_20251211_143022.metadata
```

## GPU Acceleration (Optional)

For 10-100x additional speedup:

1. Install GPU version:
```bash
pip uninstall faiss-cpu
pip install faiss-gpu
```

2. No code changes needed - automatically uses GPU

3. Requires NVIDIA GPU with CUDA support

## Migration from Old System

✅ **No migration needed!**

- FAISS is automatically used when available
- Falls back to traditional search if FAISS fails
- Existing vector JSON files work as before
- No breaking changes to API

## Best Practices

1. **Build index once** - Reuse for multiple queries
2. **Use caching** - Speed up vectorization with `use_cache=True`
3. **Monitor performance** - Check logs for timing info
4. **Tune parameters** - Adjust based on dataset size
5. **Keep backups** - Timestamped indexes preserved automatically

## Performance Tips

🚀 **For Best Performance:**
- Use IVFFlat for most cases
- Increase FAISS_NPROBE for accuracy
- Use IVFPQ for large datasets
- Enable GPU if available

📊 **For Large Scale:**
- Use IVFPQ index type
- Set FAISS_NLIST=1000
- Enable compression
- Consider distributed setup

🎯 **For High Accuracy:**
- Use HNSW index
- Set FAISS_NPROBE=20
- Accept slower indexing

## Next Steps

1. Install dependencies: `pip install -r requirements.txt`
2. Configure settings in `.env`
3. Run vectorization: `store_chunks_as_vectors()`
4. Start querying: `query_SFlogs("your query")`

Enjoy 30-150x faster log analysis! 🚀