Parquet MCP Server

# MCP Parquet Server - Audit Log Implementation

## Summary

Implemented an efficient audit log system that replaces the full-snapshot-on-every-write approach. It reduces backup storage by more than 99% while providing complete change history and rollback capabilities.

## What Changed

### Storage Efficiency

**Before:**
- Every write operation created a full copy of the Parquet file
- 10 MB file × 100 operations = 1 GB storage
- No detailed change tracking

**After:**
- Every write operation creates ~1 KB audit log entry
- 1 KB × 100 operations = ~100 KB storage
- Complete change history with old/new values
- 99.99% storage reduction in this example
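
The figures above can be checked with a few lines of arithmetic. This is only a worked version of the example, assuming decimal units (1 MB = 1000 KB) and the approximate 1 KB entry size quoted above; real entry sizes will vary with the record.

```python
# Worked version of the storage estimate, using decimal units (1 MB = 1000 KB).
FILE_SIZE_KB = 10 * 1000   # 10 MB data file
ENTRY_SIZE_KB = 1          # ~1 KB per audit log entry (approximate)
OPERATIONS = 100

snapshot_total_kb = FILE_SIZE_KB * OPERATIONS   # full file copy on every write
audit_total_kb = ENTRY_SIZE_KB * OPERATIONS     # one small log entry per write

# Fraction of storage saved by logging changes instead of copying the file.
reduction_pct = (1 - audit_total_kb / snapshot_total_kb) * 100

print(f"snapshots: {snapshot_total_kb / 1_000_000:.0f} GB")
print(f"audit log: {audit_total_kb} KB")
print(f"reduction: {reduction_pct:.2f}%")
```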

### New Capabilities

1. **Audit Trail**: Every change tracked with timestamp, operation type, affected fields, old/new values
2. **Rollback**: Undo specific operations using audit ID (inverse operations)
3. **Change History**: View complete modification history for any record
4. **Configurable Snapshots**: Optional periodic full snapshots for additional safety
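
To make the audit trail concrete, here is a minimal sketch of what one entry might hold. The names `audit_id`, `data_type`, `operation`, and `record_id` appear elsewhere in this file; `AuditEntry`, `audit_update`, and the remaining field names are hypothetical and may not match `data/schemas/audit_log_schema.json`.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class AuditEntry:
    """Illustrative audit record; field names are assumptions, not the real schema."""
    data_type: str
    operation: str                       # "add", "update", or "delete" (assumed spellings)
    record_id: str
    affected_fields: list[str]
    old_values: Optional[dict[str, Any]]
    new_values: Optional[dict[str, Any]]
    audit_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def audit_update(data_type: str, record_id: str,
                 before: dict[str, Any], updates: dict[str, Any]) -> AuditEntry:
    """Build an entry for an update, recording only the fields that changed."""
    changed = [k for k, v in updates.items() if before.get(k) != v]
    return AuditEntry(
        data_type=data_type,
        operation="update",
        record_id=record_id,
        affected_fields=changed,
        old_values={k: before.get(k) for k in changed},
        new_values={k: updates[k] for k in changed},
    )


entry = audit_update(
    "beliefs", "test-123",
    before={"name": "Test", "notes": None},
    updates={"notes": "Updated"},
)
print(entry.affected_fields, entry.old_values, entry.new_values)
```

Storing only the changed fields, rather than the whole record, is what keeps each entry small in this sketch.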

## Files Modified

### Core Implementation
- **`parquet_mcp_server.py`**: Added audit log system, rollback functionality, new tools

### New Files Created
- **`data/schemas/audit_log_schema.json`**: Schema for audit log entries
- **`AUDIT_LOG_GUIDE.md`**: Complete documentation for audit log system
- **`IMPLEMENTATION_SUMMARY.md`**: Implementation details and testing procedures
- **`test_audit_log.py`**: Automated test script
- **`CHANGES.md`**: This file

### Updated Files
- **`README.md`**: Updated with audit log documentation and configuration

## New MCP Tools

### `read_audit_log`
View audit log entries with filters:
- Filter by data_type, operation, record_id
- View complete change history
- Track what changed and when
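
The filtering described for `read_audit_log` can be sketched in plain Python. This is not the server's implementation: `filter_entries` is a hypothetical helper, the `timestamp` key and the `tasks` data type are assumptions, and returning the newest entries first is only one plausible ordering.

```python
from typing import Any, Optional


def filter_entries(entries: list[dict[str, Any]],
                   data_type: Optional[str] = None,
                   operation: Optional[str] = None,
                   record_id: Optional[str] = None,
                   limit: Optional[int] = None) -> list[dict[str, Any]]:
    """Return entries matching every filter that was supplied, newest first."""
    wanted = {"data_type": data_type, "operation": operation, "record_id": record_id}
    active = {k: v for k, v in wanted.items() if v is not None}
    matches = [e for e in entries if all(e.get(k) == v for k, v in active.items())]
    # ISO 8601 timestamps in one format and timezone sort correctly as strings.
    matches.sort(key=lambda e: e["timestamp"], reverse=True)
    return matches if limit is None else matches[:limit]


entries = [
    {"audit_id": "a1", "timestamp": "2025-12-17T10:00:00+00:00",
     "data_type": "beliefs", "operation": "add", "record_id": "test-123"},
    {"audit_id": "a2", "timestamp": "2025-12-17T10:05:00+00:00",
     "data_type": "beliefs", "operation": "update", "record_id": "test-123"},
    {"audit_id": "a3", "timestamp": "2025-12-17T10:06:00+00:00",
     "data_type": "tasks", "operation": "add", "record_id": "t-1"},
]

# Change history for one record, newest entry first.
history = filter_entries(entries, record_id="test-123")
print([e["audit_id"] for e in history])
```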

### `rollback_operation`
Undo specific operations:
- Add → Delete record
- Update → Restore old values
- Delete → Restore record
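
The inverse-operation mapping above might look roughly like the following. It is a sketch under stated assumptions: the table is an in-memory dict keyed by record ID rather than a Parquet file, and the `rollback` helper and the `old_values` and `new_values` keys are hypothetical names.

```python
from typing import Any


def rollback(table: dict[str, dict[str, Any]], entry: dict[str, Any]) -> None:
    """Apply the inverse of one audited operation to an in-memory table."""
    op = entry["operation"]
    rid = entry["record_id"]
    if op == "add":
        # Inverse of add: delete the record that was created.
        table.pop(rid, None)
    elif op == "update":
        # Inverse of update: write the old values back over the changed fields.
        table[rid].update(entry["old_values"])
    elif op == "delete":
        # Inverse of delete: restore the full record saved at deletion time.
        table[rid] = dict(entry["old_values"])
    else:
        raise ValueError(f"unknown operation: {op!r}")


table = {"test-123": {"name": "Test", "notes": "Updated"}}

update_entry = {
    "operation": "update",
    "record_id": "test-123",
    "old_values": {"notes": None},
    "new_values": {"notes": "Updated"},
}
rollback(table, update_entry)
print(table)
```

Note that a delete can only be undone if the audit entry kept the complete record, which is why this sketch stores the full old record for delete operations.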

## Configuration

### Default (Development)
```bash
# No environment variables needed
# Uses audit log only, no periodic snapshots
```

### Production
```bash
export MCP_FULL_SNAPSHOTS=true
export MCP_SNAPSHOT_FREQUENCY=weekly
```

Add to Cursor MCP config:
```json
{
  "mcpServers": {
    "parquet": {
      "command": "python",
      "args": ["/path/to/parquet_mcp_server.py"],
      "env": {
        "MCP_FULL_SNAPSHOTS": "true",
        "MCP_SNAPSHOT_FREQUENCY": "weekly"
      }
    }
  }
}
```
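
For orientation, a server could read these two variables along the following lines. This is a hedged sketch, not the code in `parquet_mcp_server.py`: `SnapshotConfig` and `load_snapshot_config` are hypothetical names, and treating anything other than the string `true` as disabled is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SnapshotConfig:
    full_snapshots: bool
    frequency: Optional[str]   # e.g. "weekly"; None when snapshots are off or unset


def load_snapshot_config(env: Optional[Mapping[str, str]] = None) -> SnapshotConfig:
    """Read snapshot settings from the environment (or a mapping, for testing)."""
    env = os.environ if env is None else env
    # Assumption: only the literal string "true" (any case) enables snapshots.
    enabled = env.get("MCP_FULL_SNAPSHOTS", "false").strip().lower() == "true"
    raw_frequency = env.get("MCP_SNAPSHOT_FREQUENCY") if enabled else None
    frequency = raw_frequency.strip().lower() if raw_frequency else None
    return SnapshotConfig(full_snapshots=enabled, frequency=frequency)


# Default (development): no variables set, audit log only.
print(load_snapshot_config({}))

# Production: periodic full snapshots enabled.
print(load_snapshot_config({
    "MCP_FULL_SNAPSHOTS": "true",
    "MCP_SNAPSHOT_FREQUENCY": "weekly",
}))
```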

## Next Steps

### 1. Restart MCP Server

The MCP server must be restarted for changes to take effect:

**Option A: Restart Cursor**
- Close and reopen Cursor
- All MCP servers will restart with new code

**Option B: Check Cursor Settings**
- Look for MCP server restart option in settings

### 2. Test Implementation

After restart, run:
```bash
python3 mcp-servers/parquet/test_audit_log.py
```

Or test manually via MCP tools:

```python
# Note: these are MCP tool invocations issued through the client, not importable Python functions.
# Add test record
mcp_parquet_add_record(
    data_type="beliefs",
    record={
        "belief_id": "test-123",
        "name": "Test",
        "categories": "Testing",
        "confidence_level": "High",
        "date": "2025-12-17",
        "import_date": "2025-12-17",
        "import_source_file": "test"
    }
)

# View audit log
mcp_parquet_read_audit_log(limit=10)

# Update record
mcp_parquet_update_records(
    data_type="beliefs",
    filters={"belief_id": "test-123"},
    updates={"notes": "Updated"}
)

# Rollback update
mcp_parquet_rollback_operation(audit_id="<from_update_response>")

# Clean up
mcp_parquet_delete_records(
    data_type="beliefs",
    filters={"belief_id": "test-123"}
)
```

### 3. Verify Audit Log

Check that audit log was created:
```bash
ls -lh data/logs/audit_log.parquet
```

View contents:
```python
import pandas as pd
df = pd.read_parquet("data/logs/audit_log.parquet")
print(df)
```

## Benefits Realized

1. **Storage**: 99%+ reduction in backup storage
2. **Audit Trail**: Complete change history for compliance
3. **Rollback**: Granular undo without full restore
4. **Performance**: Faster operations (no full file copy)
5. **Analysis**: Track modification patterns

## Backward Compatibility

- Existing snapshots preserved in `data/snapshots/`
- No data migration required
- System automatically uses audit log for new operations
- Old snapshots remain available for recovery

## Monitoring

### Check Audit Log Size
```bash
ls -lh data/logs/audit_log.parquet
```

### View Recent Operations
```python
mcp_parquet_read_audit_log(limit=20)
```

### Operation Breakdown
```python
import pandas as pd

df = pd.read_parquet("data/logs/audit_log.parquet")
print(df["operation"].value_counts())
print(df["data_type"].value_counts())
```

## Rollback if Needed

If issues arise, restore previous version:
```bash
# Assumes the audit log change is the most recent commit to this file
git checkout HEAD~1 -- mcp-servers/parquet/parquet_mcp_server.py
# Restart MCP server
```

## Documentation

Complete documentation available in:
- **[AUDIT_LOG_GUIDE.md](AUDIT_LOG_GUIDE.md)** - Usage, configuration, examples
- **[IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)** - Technical details
- **[README.md](README.md)** - Updated with audit log features

## Status

✅ Implementation complete
✅ Documentation created
✅ Tests created
⏳ Awaiting MCP server restart
⏳ Testing pending

## Questions?

Refer to:
1. **[AUDIT_LOG_GUIDE.md](AUDIT_LOG_GUIDE.md)** for usage questions
2. **[IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)** for technical details
3. **[test_audit_log.py](test_audit_log.py)** for testing procedures
