# Audit Log Guide

## Overview

The MCP Parquet Server now uses an efficient **audit log** approach for tracking data changes, replacing the previous full-snapshot-on-every-write strategy.

## Key Changes

### Before (Full Snapshots)

- Every add/update/delete operation created a complete copy of the parquet file
- 10 MB file × 100 operations = 1 GB storage
- No detailed change history
- Difficult to track what changed

### After (Audit Log + Optional Snapshots)

- Every operation creates a lightweight audit log entry (~1 KB)
- 1 KB × 100 operations = ~100 KB storage (99%+ reduction)
- Complete audit trail with old/new values
- Easy rollback with inverse operations
- Optional periodic full snapshots for safety

## Storage Efficiency

| Dataset Size | Operations | Full Snapshots | Audit Log | Savings |
|--------------|------------|----------------|-----------|---------|
| 10 MB        | 100        | 1 GB           | ~100 KB   | 99.99%  |
| 10 MB        | 1,000      | 10 GB          | ~1 MB     | 99.99%  |
| 100 MB       | 100        | 10 GB          | ~100 KB   | 99.999% |

## Audit Log Schema

Location: `data/logs/audit_log.parquet`

Fields:

- `audit_id`: Unique identifier for each operation
- `timestamp`: When the operation occurred
- `operation`: Type (add, update, delete)
- `data_type`: Which parquet file was modified
- `record_id`: Identifier of affected record
- `affected_fields`: JSON array of modified field names
- `old_values`: JSON object with field:value pairs before change
- `new_values`: JSON object with field:value pairs after change
- `user`: Who performed the operation
- `snapshot_reference`: Link to periodic snapshot (if created)
- `notes`: Additional context

## Configuration

### Environment Variables

```bash
# Enable periodic full snapshots (default: false)
export MCP_FULL_SNAPSHOTS=true

# Snapshot frequency: daily, weekly, monthly, never (default: weekly)
export MCP_SNAPSHOT_FREQUENCY=weekly
```

### Recommended Settings

**Development/Testing:**

```bash
export MCP_FULL_SNAPSHOTS=false
export MCP_SNAPSHOT_FREQUENCY=never
```
**Production:**

```bash
export MCP_FULL_SNAPSHOTS=true
export MCP_SNAPSHOT_FREQUENCY=weekly
```

## New Tools

### 1. `read_audit_log`

View audit log entries with optional filters.

**Parameters:**

- `data_type` (optional): Filter by data type
- `operation` (optional): Filter by operation (add/update/delete)
- `record_id` (optional): Filter by specific record
- `limit` (default: 100): Maximum entries to return

**Example:**

```json
{
  "data_type": "beliefs",
  "operation": "update",
  "limit": 50
}
```

### 2. `rollback_operation`

Roll back a specific operation using its audit ID.

**Parameters:**

- `audit_id` (required): The audit ID of the operation to roll back

**Rollback Logic:**

- `add` → Delete the record
- `update` → Restore old values
- `delete` → Restore the record

**Example:**

```json
{
  "audit_id": "abc123def456"
}
```

## Usage Examples

### Track Changes to a Record

```python
# 1. Read audit log for specific record
mcp_parquet_read_audit_log(
    data_type="transactions",
    record_id="txn_123"
)

# Shows complete history:
# - When added
# - All updates with old/new values
# - If/when deleted
```

### Rollback Recent Change

```python
# 1. Find recent operations
mcp_parquet_read_audit_log(
    data_type="fixed_costs",
    limit=10
)

# 2. Roll back a specific operation
mcp_parquet_rollback_operation(
    audit_id="abc123def456"
)
```

### Audit Trail Analysis

```python
# View all delete operations across all data types
mcp_parquet_read_audit_log(
    operation="delete",
    limit=100
)

# Find all changes in last period (filter client-side by timestamp)
```

## Benefits

1. **Storage Efficiency**: 99%+ reduction in backup storage
2. **Audit Trail**: Complete history of who changed what, and when
3. **Granular Rollback**: Undo specific operations without a full restore
4. **Compliance**: Supports audit requirements for financial data
5. **Analysis**: Track patterns in data modifications
6. **Performance**: Faster operations (no full file copy)

## Backup Strategy

### Audit Log Only (Default)

- Best for: Development, frequent changes, storage-constrained environments
- Recovery: Roll back operations via the audit log
- Risk: Cannot recover from audit log corruption

### Hybrid (Recommended)

- Best for: Production, critical data
- Full snapshots: Weekly (configurable)
- Audit log: Every operation
- Recovery: Restore from snapshot + replay audit log since snapshot
- Risk: Minimal (multiple recovery paths)

## Migration from Old Approach

Existing snapshots in `data/snapshots/` are preserved and remain accessible. The new audit log system works alongside them.

**No action required** - the system automatically starts using the audit log for new operations.

## Monitoring

Check audit log size:

```bash
ls -lh data/logs/audit_log.parquet
```

View recent operations:

```python
mcp_parquet_read_audit_log(limit=20)
```

Count operations by type:

```python
import pandas as pd

# Load the audit log and tally entries per operation type
df = pd.read_parquet("data/logs/audit_log.parquet")
print(df["operation"].value_counts())
```

## Troubleshooting

### Audit Log Not Created

Check:

1. The `data/logs/` directory exists and is writable
2. The Python process has permission to write files
3. No schema conflicts with `audit_log_schema.json`

### Rollback Fails

Common causes:

1. Record ID changed after the original operation
2. `old_values` missing from the audit entry (shouldn't happen with the new system)
3. Data type schema changed since the operation

### Large Audit Log File

The audit log grows over time. To archive:

```bash
# Move to archive
mv data/logs/audit_log.parquet data/logs/audit_log_2025_archive.parquet

# System will create fresh audit log on next operation
```

## Future Enhancements

Potential improvements:

- Automatic audit log rotation/archiving
- Batch rollback (multiple operations)
- Audit log compression
- Change replay for testing
- Diff visualization tools
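
For reference, the rollback rules listed under `rollback_operation` (`add` → delete, `update` → restore old values, `delete` → restore the record) can be illustrated with a minimal in-memory sketch. This is not the server's implementation: the function name `invert_audit_entry` and the dict-based record store are hypothetical, and the sketch assumes that `old_values` is stored as a JSON string and, for deletes, holds the full record.

```python
import json


def invert_audit_entry(records, entry):
    """Apply the inverse of one audit entry to an in-memory record store.

    `records` maps record_id -> dict of field values (a hypothetical stand-in
    for one parquet file). `entry` uses the audit log field names from this
    guide; `old_values` is assumed to be a JSON string (or None).
    """
    op = entry["operation"]
    record_id = entry["record_id"]
    old_values = json.loads(entry["old_values"] or "{}")

    if op == "add":
        # Inverse of add: delete the record that was created.
        records.pop(record_id, None)
    elif op == "update":
        # Inverse of update: write the pre-change values back.
        if record_id not in records:
            raise KeyError(f"record {record_id!r} no longer exists")
        records[record_id].update(old_values)
    elif op == "delete":
        # Inverse of delete: restore the record from its old values.
        records[record_id] = dict(old_values)
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return records


# Example: undo an update that changed `amount` from 100 to 250.
store = {"txn_123": {"amount": 250, "category": "rent"}}
entry = {
    "audit_id": "abc123def456",
    "operation": "update",
    "record_id": "txn_123",
    "old_values": json.dumps({"amount": 100}),
    "new_values": json.dumps({"amount": 250}),
}
invert_audit_entry(store, entry)
print(store["txn_123"])  # {'amount': 100, 'category': 'rent'}
```

Only the fields named in `old_values` are touched on an update, so fields changed by later operations are left alone. The missing-record check corresponds to the first failure cause listed under "Rollback Fails".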
