# Database Documentation
This document provides comprehensive information about the MCP Sigmund database setup, schema, and import procedures.
## Database Setup
### Database Dump
This repository includes a complete PostgreSQL database dump (`mcp-sigmund-dump.sql`) that contains:
- **Complete Schema** - All tables, indexes, constraints, and sequences
- **Sample Data** - Realistic financial data for testing and development
- **All Relationships** - Foreign key constraints and referential integrity
- **Performance Indexes** - Optimized indexes for common queries
### Import Instructions
**Prerequisites:**
- PostgreSQL 12+ installed and running
- `psql` command-line tool available
**Import Steps:**
1. **Create the database:**
```bash
createdb mcp-sigmund
```
2. **Import the dump:**
```bash
psql -d mcp-sigmund -f mcp-sigmund-dump.sql
```
3. **Verify the import:**
```bash
psql -d mcp-sigmund -c "\dt"
```
**Alternative Import Methods:**
- **With specific connection string:**
```bash
psql postgresql://username:password@host:port/mcp-sigmund -f mcp-sigmund-dump.sql
```
- **With custom database name:**
```bash
createdb your-database-name
psql -d your-database-name -f mcp-sigmund-dump.sql
```
**Troubleshooting:**
- If you get permission errors, ensure your PostgreSQL user has `CREATEDB` privileges
- If the import fails, check that PostgreSQL is running and accessible
- The dump includes `--clean` flag, so it will drop existing objects before recreating them
## Database Schema
The PostgreSQL database contains normalized financial data from multiple Open Banking sources (directly from banks or via account aggregators like TrueLayer and Tink). The schema is designed for efficient querying and analysis while maintaining data integrity and flexibility.
Database population is not subject to this project.
### Core Tables
#### `transactions` Table
Stores all financial transactions with comprehensive metadata:
```sql
CREATE TABLE transactions (
id VARCHAR(255) PRIMARY KEY, -- Unique transaction identifier
account_id VARCHAR(255) NOT NULL, -- Reference to accounts table
user_id VARCHAR(255) NOT NULL, -- User identifier
provider_id VARCHAR(255) NOT NULL, -- Banking provider identifier
external_id VARCHAR(255), -- External provider transaction ID
amount DECIMAL(15,2) NOT NULL, -- Transaction amount (positive for credits, negative for debits)
currency VARCHAR(3), -- ISO 4217 currency code (e.g., 'EUR', 'GBP', 'USD')
date DATE NOT NULL, -- Transaction date
booking_date DATE, -- Booking date (may differ from transaction date)
description TEXT, -- Original transaction description
enhanced_description TEXT, -- AI-enhanced or cleaned description
counterparty_name VARCHAR(255), -- Name of the counterparty
counterparty_iban VARCHAR(34), -- IBAN of the counterparty
counterparty_bic VARCHAR(11), -- BIC/SWIFT code of the counterparty
category VARCHAR(100), -- Primary transaction category
subcategory VARCHAR(100), -- Secondary transaction category
transaction_type VARCHAR(50), -- Type: 'DEBIT', 'CREDIT', 'TRANSFER', etc.
transaction_method VARCHAR(50), -- Method: 'CARD', 'TRANSFER', 'DIRECT_DEBIT', etc.
status VARCHAR(50) DEFAULT 'posted', -- Status: 'posted', 'pending', 'cancelled'
transaction_code VARCHAR(20), -- Bank-specific transaction code
domain_code VARCHAR(20), -- PSD2 domain code
family_code VARCHAR(20), -- PSD2 family code
subfamily_code VARCHAR(20), -- PSD2 subfamily code
reference VARCHAR(255), -- Payment reference
end_to_end_id VARCHAR(255), -- End-to-end identification
mandate_id VARCHAR(255), -- Direct debit mandate ID
creditor_id VARCHAR(255), -- Creditor identifier
payment_id VARCHAR(255), -- Payment identifier
enrichment_confidence DECIMAL(3,2), -- Confidence score for AI enrichment (0.00-1.00)
metadata JSONB, -- Additional provider-specific metadata
raw_data JSONB, -- Original raw data from provider
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- Foreign key constraints
FOREIGN KEY (account_id) REFERENCES accounts(id),
FOREIGN KEY (provider_id) REFERENCES providers(id),
FOREIGN KEY (user_id) REFERENCES users(id)
);
```
**Key Features:**
- **Amount Convention**: Positive for credits (money in), negative for debits (money out)
- **Multi-Currency Support**: ISO 4217 currency codes with proper decimal precision
- **Rich Counterparty Data**: IBAN, BIC, and name for comprehensive transaction tracking
- **Multi-Level Categorization**: Primary and secondary categories for flexible analysis
- **PSD2 Compliance**: Standardized transaction codes for European banking
- **AI Enhancement**: Confidence-scored enriched descriptions
- **Flexible Metadata**: JSONB fields for provider-specific data
#### `accounts` Table
Stores account information with comprehensive banking details:
```sql
CREATE TABLE accounts (
id VARCHAR(255) PRIMARY KEY, -- Unique account identifier
user_id VARCHAR(255) NOT NULL, -- Reference to users table
provider_id VARCHAR(255) NOT NULL, -- Banking provider identifier
external_id VARCHAR(255), -- External provider account ID
iban VARCHAR(34), -- International Bank Account Number
bic VARCHAR(11), -- Bank Identifier Code (SWIFT)
account_number VARCHAR(50), -- Local account number
sort_code VARCHAR(20), -- UK sort code
routing_number VARCHAR(20), -- US routing number
bsb VARCHAR(20), -- Australian BSB code
account_type VARCHAR(50), -- Type: 'CURRENT', 'SAVINGS', 'CREDIT_CARD', etc.
account_subtype VARCHAR(50), -- Subtype: 'BUSINESS', 'PERSONAL', 'STUDENT', etc.
display_name VARCHAR(255), -- User-friendly account name
product_name VARCHAR(255), -- Bank product name
product_description TEXT, -- Detailed product description
currency VARCHAR(3), -- ISO 4217 currency code
current_balance DECIMAL(15,2), -- Current account balance
available_balance DECIMAL(15,2), -- Available balance (excluding pending)
pending_balance DECIMAL(15,2), -- Pending transactions balance
cleared_balance DECIMAL(15,2), -- Cleared balance
overdraft_limit DECIMAL(15,2), -- Overdraft limit
credit_limit DECIMAL(15,2), -- Credit limit
amount_due DECIMAL(15,2), -- Amount due (for credit accounts)
minimum_payment DECIMAL(15,2), -- Minimum payment required
status VARCHAR(50) DEFAULT 'active', -- Account status: 'active', 'closed', 'suspended'
opening_date DATE, -- Account opening date
closure_date DATE, -- Account closure date
institution_name VARCHAR(255), -- Financial institution name
institution_bic VARCHAR(11), -- Institution BIC code
interest_rate DECIMAL(5,2), -- Interest rate percentage
interest_type VARCHAR(50), -- Interest type: 'FIXED', 'VARIABLE'
accrued_interest DECIMAL(15,2), -- Accrued interest amount
masked_pan VARCHAR(20), -- Masked Primary Account Number
card_expiry_date VARCHAR(10), -- Card expiry date (MM/YY)
metadata JSONB, -- Additional provider-specific metadata
raw_data JSONB, -- Original raw data from provider
last_synced_at TIMESTAMP, -- Last synchronization timestamp
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- Foreign key constraints
FOREIGN KEY (provider_id) REFERENCES providers(id),
FOREIGN KEY (user_id) REFERENCES users(id)
);
```
**Key Features:**
- **International Banking**: Support for IBAN, BIC, sort codes, routing numbers, BSB
- **Multiple Balance Types**: Current, available, pending, and cleared balances
- **Account Lifecycle**: Opening and closure dates with status tracking
- **Credit Account Support**: Credit limits, amounts due, minimum payments
- **Interest Tracking**: Rates, types, and accrued interest
- **Card Information**: Masked PAN and expiry dates for card accounts
- **Sync Monitoring**: Last synchronization timestamps
#### `providers` Table
Stores banking provider information:
```sql
CREATE TABLE providers (
id VARCHAR(255) PRIMARY KEY, -- Unique provider identifier
name VARCHAR(255) NOT NULL, -- Provider display name
display_name VARCHAR(255), -- User-friendly display name
country_code VARCHAR(2), -- ISO 3166-1 alpha-2 country code
logo_url VARCHAR(500), -- Provider logo URL
website_url VARCHAR(500), -- Provider website URL
api_base_url VARCHAR(500), -- API base URL
supported_features JSONB, -- Supported API features
rate_limits JSONB, -- API rate limiting information
metadata JSONB, -- Additional provider metadata
raw_data JSONB, -- Original raw data from provider
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
**Key Features:**
- **Multi-Country Support**: Country-specific provider information
- **API Integration**: Base URLs and supported features
- **Rate Limiting**: API rate limit configuration
- **Branding**: Logo and website URLs for UI integration
#### `users` Table
Stores user information and preferences:
```sql
CREATE TABLE users (
id VARCHAR(255) PRIMARY KEY, -- Unique user identifier
provider_id VARCHAR(255) NOT NULL, -- Reference to providers table
external_id VARCHAR(255), -- External provider user ID
email VARCHAR(255), -- User email address
phone VARCHAR(20), -- User phone number
status VARCHAR(50) DEFAULT 'active', -- User status: 'active', 'inactive', 'suspended'
preferences JSONB, -- User preferences and settings
metadata JSONB, -- Additional user metadata
raw_data JSONB, -- Original raw data from provider
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- Foreign key constraints
FOREIGN KEY (provider_id) REFERENCES providers(id)
);
```
**Key Features:**
- **User Preferences**: JSONB field for flexible preference storage
- **Status Management**: User account status tracking
- **Provider Integration**: Links to specific banking providers
### Additional Tables
#### `balance_history` Table
Tracks historical balance changes:
```sql
CREATE TABLE balance_history (
id SERIAL PRIMARY KEY,
account_id VARCHAR(255) NOT NULL,
user_id VARCHAR(255) NOT NULL,
provider_id VARCHAR(255) NOT NULL,
balance DECIMAL(15,2) NOT NULL,
balance_type VARCHAR(50) NOT NULL, -- 'current', 'available', 'pending'
snapshot_date DATE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (account_id) REFERENCES accounts(id),
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (provider_id) REFERENCES providers(id)
);
```
#### `categories` Table
Hierarchical transaction categorization:
```sql
CREATE TABLE categories (
id VARCHAR(255) PRIMARY KEY,
name VARCHAR(255) NOT NULL,
parent_id VARCHAR(255),
level INTEGER NOT NULL,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (parent_id) REFERENCES categories(id)
);
```
#### `merchant_mappings` Table
Merchant name standardization:
```sql
CREATE TABLE merchant_mappings (
id VARCHAR(255) PRIMARY KEY,
original_name VARCHAR(255) NOT NULL,
standardized_name VARCHAR(255) NOT NULL,
category VARCHAR(100),
confidence DECIMAL(3,2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
#### `transaction_patterns` Table
AI-detected transaction patterns:
```sql
CREATE TABLE transaction_patterns (
id VARCHAR(255) PRIMARY KEY,
user_id VARCHAR(255) NOT NULL,
pattern_type VARCHAR(100) NOT NULL,
pattern_data JSONB NOT NULL,
confidence DECIMAL(3,2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id)
);
```
#### `sync_log` Table
Data synchronization tracking:
```sql
CREATE TABLE sync_log (
id SERIAL PRIMARY KEY,
provider_id VARCHAR(255) NOT NULL,
sync_type VARCHAR(50) NOT NULL,
status VARCHAR(50) NOT NULL,
records_processed INTEGER,
records_created INTEGER,
records_updated INTEGER,
error_message TEXT,
started_at TIMESTAMP NOT NULL,
completed_at TIMESTAMP,
FOREIGN KEY (provider_id) REFERENCES providers(id)
);
```
#### `cache_metadata` Table
Caching system metadata:
```sql
CREATE TABLE cache_metadata (
id VARCHAR(255) PRIMARY KEY,
cache_key VARCHAR(500) NOT NULL,
cache_type VARCHAR(100) NOT NULL,
expires_at TIMESTAMP NOT NULL,
data_size INTEGER,
hit_count INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
## Indexes and Performance
The database includes comprehensive indexing for optimal query performance:
```sql
-- Transaction indexes
CREATE INDEX idx_transactions_account_date ON transactions(account_id, date DESC);
CREATE INDEX idx_transactions_user_id ON transactions(user_id);
CREATE INDEX idx_transactions_provider_id ON transactions(provider_id);
CREATE INDEX idx_transactions_date ON transactions(date);
CREATE INDEX idx_transactions_amount ON transactions(amount);
CREATE INDEX idx_transactions_category ON transactions(category);
CREATE INDEX idx_transactions_counterparty ON transactions(counterparty_name);
-- Account indexes
CREATE INDEX idx_accounts_provider_id ON accounts(provider_id);
CREATE INDEX idx_accounts_user_id ON accounts(user_id);
CREATE INDEX idx_accounts_iban ON accounts(iban);
-- Balance history indexes
CREATE INDEX idx_balances_account_id ON balance_history(account_id);
CREATE INDEX idx_balances_snapshot_date ON balance_history(snapshot_date);
-- Cache indexes
CREATE INDEX idx_cache_expires ON cache_metadata(expires_at);
```
## Data Relationships
```mermaid
erDiagram
USERS ||--o{ ACCOUNTS : owns
USERS ||--o{ TRANSACTIONS : has
PROVIDERS ||--o{ USERS : manages
PROVIDERS ||--o{ ACCOUNTS : provides
PROVIDERS ||--o{ TRANSACTIONS : sources
ACCOUNTS ||--o{ TRANSACTIONS : contains
ACCOUNTS ||--o{ BALANCE_HISTORY : tracks
USERS ||--o{ BALANCE_HISTORY : monitors
PROVIDERS ||--o{ BALANCE_HISTORY : sources
USERS ||--o{ TRANSACTION_PATTERNS : generates
PROVIDERS ||--o{ SYNC_LOG : logs
CATEGORIES ||--o{ CATEGORIES : parent_of
USERS {
varchar id PK
varchar provider_id FK
varchar external_id
varchar email
varchar phone
varchar status
jsonb preferences
jsonb metadata
timestamp created_at
timestamp updated_at
}
ACCOUNTS {
varchar id PK
varchar user_id FK
varchar provider_id FK
varchar external_id
varchar iban
varchar bic
varchar account_number
varchar sort_code
varchar routing_number
varchar bsb
varchar account_type
varchar account_subtype
varchar display_name
varchar product_name
text product_description
varchar currency
decimal current_balance
decimal available_balance
decimal pending_balance
decimal cleared_balance
decimal overdraft_limit
decimal credit_limit
decimal amount_due
decimal minimum_payment
varchar status
date opening_date
date closure_date
varchar institution_name
varchar institution_bic
decimal interest_rate
varchar interest_type
decimal accrued_interest
varchar masked_pan
varchar card_expiry_date
jsonb metadata
jsonb raw_data
timestamp last_synced_at
timestamp last_updated
timestamp created_at
timestamp updated_at
}
TRANSACTIONS {
varchar id PK
varchar account_id FK
varchar user_id FK
varchar provider_id FK
varchar external_id
decimal amount
varchar currency
date date
date booking_date
text description
text enhanced_description
varchar counterparty_name
varchar counterparty_iban
varchar counterparty_bic
varchar category
varchar subcategory
varchar transaction_type
varchar transaction_method
varchar status
varchar transaction_code
varchar domain_code
varchar family_code
varchar subfamily_code
varchar reference
varchar end_to_end_id
varchar mandate_id
varchar creditor_id
varchar payment_id
decimal enrichment_confidence
jsonb metadata
jsonb raw_data
timestamp created_at
timestamp updated_at
}
PROVIDERS {
varchar id PK
varchar name
varchar display_name
varchar country_code
varchar logo_url
varchar website_url
varchar api_base_url
jsonb supported_features
jsonb rate_limits
jsonb metadata
jsonb raw_data
timestamp created_at
timestamp updated_at
}
BALANCE_HISTORY {
serial id PK
varchar account_id FK
varchar user_id FK
varchar provider_id FK
decimal balance
varchar balance_type
date snapshot_date
timestamp created_at
}
CATEGORIES {
varchar id PK
varchar name
varchar parent_id FK
integer level
text description
timestamp created_at
}
MERCHANT_MAPPINGS {
varchar id PK
varchar original_name
varchar standardized_name
varchar category
decimal confidence
timestamp created_at
}
TRANSACTION_PATTERNS {
varchar id PK
varchar user_id FK
varchar pattern_type
jsonb pattern_data
decimal confidence
timestamp created_at
}
SYNC_LOG {
serial id PK
varchar provider_id FK
varchar sync_type
varchar status
integer records_processed
integer records_created
integer records_updated
text error_message
timestamp started_at
timestamp completed_at
}
CACHE_METADATA {
varchar id PK
varchar cache_key
varchar cache_type
timestamp expires_at
integer data_size
integer hit_count
timestamp created_at
}
```
## Enhanced Features
### JSONB Metadata Examples
**Transaction Metadata:**
```json
{
"merchant": {
"name": "AMAZON UK",
"category": "ONLINE_SHOPPING",
"location": {
"country": "GB",
"city": "London"
}
},
"card": {
"last_four": "1234",
"type": "DEBIT"
},
"ai_analysis": {
"confidence": 0.95,
"suggested_category": "SHOPPING",
"keywords": ["amazon", "online", "purchase"]
}
}
```
**Account Metadata:**
```json
{
"features": {
"overdraft_available": true,
"online_banking": true,
"mobile_app": true
},
"limits": {
"daily_withdrawal": 500.00,
"daily_transfer": 1000.00
},
"banking_details": {
"branch_code": "123456",
"branch_name": "Main Street Branch"
}
}
```
### Data Validation and Constraints
The schema includes several data validation features:
- **Currency Codes**: Validated against ISO 4217 standard
- **IBAN Format**: Proper IBAN structure validation
- **BIC Codes**: SWIFT BIC format validation
- **Date Constraints**: Logical date ordering (opening_date < closure_date)
- **Balance Constraints**: Non-negative limits and positive interest rates
- **Status Enums**: Controlled vocabulary for status fields
### Security Considerations
- **Sensitive Data**: Account numbers, IBANs, and personal information are stored securely
- **Data Retention**: Configurable retention policies for historical data
- **Access Control**: Row-level security can be implemented based on user_id
- **Audit Trail**: Comprehensive created_at/updated_at timestamps
- **Data Encryption**: Sensitive fields can be encrypted at the application level
### Query Optimization
The database is optimized for common financial queries:
- **Time-based Analysis**: Efficient date range queries with proper indexing
- **User-specific Data**: Fast user_id-based filtering
- **Provider Aggregation**: Optimized provider-based queries
- **Balance Calculations**: Efficient balance history queries
- **Transaction Search**: Full-text search capabilities on descriptions
### Data Migration and Maintenance
- **Schema Versioning**: Migration scripts for schema updates
- **Data Archiving**: Automated archiving of old transaction data
- **Index Maintenance**: Regular index rebuilding and statistics updates
- **Backup Strategy**: Point-in-time recovery capabilities
- **Performance Monitoring**: Query performance tracking and optimization