# Existing Duplicates Scanner
## Problem Statement
The user discovered that duplicate entries already existed in the database (from imports before conflict detection was implemented). These existing duplicates were not being reported or flagged since the conflict detection only worked during new imports.
**Example:**
```
31 Dec 2025 N/A Book Voucher ₹659.94 - ₹170.54
31 Dec 2025 N/A Book Voucher ₹659.94 - ₹170.54
```
These two identical ledger entries existed in account_id=1 but were not showing up in the conflicts page.
## Solution
Created a database scanner that finds existing duplicate entries and creates conflict records for them, allowing users to review and clean them up.
## Implementation
### 1. New API Endpoint: Scan for Existing Duplicates
**File:** `equity/app/api/conflicts/scan-existing/route.ts`
**Features:**
- Scans tradebook or ledger tables for duplicate entries
- Groups duplicates and creates conflict records
- Avoids creating duplicate conflict records
- Supports both exact duplicates and data mismatches
**Duplicate Detection Logic:**
#### Tradebook
```sql
SELECT trade_id, symbol, COUNT(*) as count
FROM trades
WHERE account_id = ?
AND trade_id IS NOT NULL
GROUP BY trade_id, symbol
HAVING count > 1
```
#### Ledger
```sql
SELECT posting_date, particular, debit, credit, COUNT(*) as count
FROM ledger
WHERE account_id = ?
GROUP BY posting_date, particular, debit, credit
HAVING count > 1
```
**Conflict Types Generated:**
- `exact_duplicate_existing` - Exact match between multiple DB entries
- `duplicate_trade_id_existing` - Same trade_id but different data
### 2. Enhanced Conflicts Page UI
**File:** `equity/app/conflicts/page.tsx`
**New Features:**
#### A. Scan Section
```
┌────────────────────────────────────────────────┐
│ Scan Database for Duplicates │
│ │
│ [Select Account ▼] [Select Type ▼] [Scan] │
└────────────────────────────────────────────────┘
```
- Dropdown to select account
- Dropdown to select import type (Tradebook/Ledger)
- Scan button triggers the duplicate detection
- Shows success/error messages after scan
#### B. Different UI for Existing Duplicates
**Color Coding:**
- **Yellow Box** = First Entry (Keep this one)
- **Red Box** = Duplicate Entry (Delete this one)
- **Blue Icon** = Exact duplicate
- **Database IDs** shown for transparency
**Button Labels:**
- Import Conflicts: "Keep Existing" vs "Use New (CSV)"
- Existing Duplicates: "Keep First Entry" vs "Delete Duplicate"
**Visual Example:**
```
┌────────────────────────────────────────────────────────────┐
│ 🔵 Ledger Conflict │
│ Account: HDFC • Type: Exact Duplicate (Already in DB) │
├────────────────────────────────────────────────────────────┤
│ ⚠️ Exact Duplicate Entry │
│ This entry exists multiple times in your database. │
│ Found during database scan. │
├──────────────────────┬─────────────────────────────────────┤
│ First Entry (Keep) │ Duplicate Entry (Delete) │
│ [Yellow Background] │ [Red Background] │
├──────────────────────┼─────────────────────────────────────┤
│ Date: 31 Dec 2025 │ Date: 31 Dec 2025 │
│ Particular: N/A │ Particular: N/A │
│ Debit: ₹659.94 │ Debit: ₹659.94 │
│ Credit: ₹0 │ Credit: ₹0 │
│ Balance: ₹170.54 │ Balance: ₹170.54 │
│ Database ID: 1234 │ Database ID: 5678 │
└──────────────────────┴─────────────────────────────────────┘
│ 💡 Recommendation: Click "Delete Duplicate" to clean up. │
├────────────────────────────────────────────────────────────┤
│ [Keep First Entry] [Delete Duplicate] [Ignore] [Delete..] │
└────────────────────────────────────────────────────────────┘
```
### 3. Updated Conflict Resolution
**File:** `equity/app/api/conflicts/[id]/route.ts`
**Logic Change:**
- **Import Conflicts** (`use_new`): Updates existing record with CSV data
- **Existing Duplicates** (`use_new`): Deletes the duplicate entry
```typescript
if (isExistingDuplicate) {
// Delete the duplicate entry (stored in new_data)
DELETE FROM trades/ledger WHERE id = newData.id
} else {
// Update with CSV data (normal import conflict)
UPDATE trades/ledger SET ... WHERE ...
}
```
## User Workflow
### Step 1: Scan for Duplicates
1. Go to `/conflicts` page
2. Select account from dropdown (e.g., "HDFC Bank")
3. Select type (Tradebook or Ledger)
4. Click "Scan for Duplicates"
5. See success message: "Scan complete. Found 5 duplicate(s)"
### Step 2: Review Duplicates
Each duplicate is shown as a conflict with:
- Clear indication it's an "Exact Duplicate (Already in DB)"
- Side-by-side comparison with database IDs
- Yellow box = Keep, Red box = Delete
- Recommendation to delete duplicate
### Step 3: Resolve Duplicates
**Options:**
1. **Delete Duplicate** (Recommended)
- Removes the duplicate entry from database
- Keeps the first entry
2. **Keep First Entry**
- Does nothing to database
- Just marks conflict as resolved
3. **Ignore**
- Skips this conflict
- Leaves both entries in database
4. **Delete Conflict**
- Removes conflict record only
- Does not touch database entries
## Benefits
### 1. Historical Cleanup
Users can now clean up duplicates that slipped through before conflict detection was implemented.
### 2. Full Database Integrity
Scan entire accounts to ensure no hidden duplicates exist.
### 3. Transparency
Shows exact database IDs so users know which specific entries are being kept/deleted.
### 4. Safe Operation
- First entry is always kept by default
- Clear visual indicators (yellow vs red)
- Confirmation before deletion
- Can ignore if unsure
## Technical Details
### Conflict Type Naming Convention
- `exact_duplicate` - Import conflict (CSV vs DB)
- `exact_duplicate_existing` - DB scan conflict (DB vs DB)
- `_existing` suffix indicates it was found via scan
### Duplicate Detection Criteria
**Tradebook:**
- Same `account_id`
- Same `trade_id`
- Optionally same `symbol`
**Ledger:**
- Same `account_id`
- Same `posting_date`
- Same `particular`
- Same `debit`
- Same `credit`
### Performance Considerations
- Scans one account at a time
- Uses SQL GROUP BY for efficiency
- Only creates conflicts that don't already exist
- Indexed columns used for grouping
## Testing Guide
### Test Case 1: Find Ledger Duplicates
```bash
# 1. Insert duplicate ledger entries manually
INSERT INTO ledger (account_id, posting_date, particular, debit, credit, net_balance)
VALUES (1, '2025-12-31', 'N/A', 659.94, 0, 170.54);
INSERT INTO ledger (account_id, posting_date, particular, debit, credit, net_balance)
VALUES (1, '2025-12-31', 'N/A', 659.94, 0, 170.54);
# 2. Go to /conflicts page
# 3. Select account 1, type "Ledger"
# 4. Click "Scan for Duplicates"
# 5. Should show 1 conflict found
# 6. Review and click "Delete Duplicate"
# 7. Verify only one entry remains in database
```
### Test Case 2: Find Tradebook Duplicates
```bash
# 1. Insert duplicate trades with same trade_id
INSERT INTO trades (account_id, trade_id, symbol, quantity, price, trade_date, trade_type)
VALUES (1, 'T123', 'RELIANCE', 10, 2450.50, '2026-01-10', 'buy');
INSERT INTO trades (account_id, trade_id, symbol, quantity, price, trade_date, trade_type)
VALUES (1, 'T123', 'RELIANCE', 10, 2450.50, '2026-01-10', 'buy');
# 2. Scan for duplicates
# 3. Should find and create conflict
```
### Test Case 3: No Duplicates Found
```bash
# 1. Scan an account with no duplicates
# 2. Should show "Scan complete. Found 0 duplicate(s)"
# 3. No conflicts should be added
```
## Migration Notes
**Backward Compatible:** ✅
- No database schema changes
- Existing conflicts still work
- New conflict types added
- No migration needed
## Related Files
- `/equity/app/api/conflicts/scan-existing/route.ts` - Scanner API
- `/equity/app/conflicts/page.tsx` - UI with scan feature
- `/equity/app/api/conflicts/[id]/route.ts` - Updated resolution logic
## Future Enhancements
1. **Auto-scan on page load** - Automatically check for duplicates
2. **Bulk resolution** - Delete all exact duplicates at once
3. **Scheduled scans** - Periodic background scanning
4. **Export duplicates** - Download CSV of all duplicates found
5. **Duplicate prevention** - Add database constraints to prevent future duplicates
6. **Analytics** - Dashboard showing duplicate statistics over time