duplicate_detection
Detect duplicate rows in a database table by specifying columns to check for duplicates. Returns duplicate records for data cleaning.
Instructions
Detect duplicate rows in a table based on specified columns.
LEVEL: Table ↔ Column (requires table and columns parameters)
USE FOR: finding duplicates, duplicate rows, "are there duplicate emails?", detecting duplicate records, data deduplication analysis.
DO NOT USE FOR: general data quality (use data_quality_report), schema structure (use get_schema), null analysis (use data_quality_report with include='nulls').
Examples:
duplicate_detection(table='users', columns='email', schema='public')
duplicate_detection(table='orders', columns='customer_id,product_id', schema='shipment')
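For intuition about what a check like this computes, the following is a minimal sketch using Python's built-in `sqlite3` module. It assumes duplicates are found by grouping on the chosen columns and keeping groups that occur more than once; the tool's actual query is not documented here, and `find_duplicates` is a hypothetical helper, not part of the tool.

```python
import sqlite3

def find_duplicates(conn, table, columns):
    """Return (value..., occurrences) rows for column combinations seen more than once.

    Hypothetical helper for illustration only; not the tool's implementation.
    """
    # Split the comma-separated list, mirroring the format of the `columns` parameter.
    cols = [c.strip() for c in columns.split(",") if c.strip()]
    if not cols:
        raise ValueError("at least one column is required")
    # Quote identifiers so table and column names are not interpreted as SQL.
    quoted_cols = ", ".join('"' + c.replace('"', '""') + '"' for c in cols)
    quoted_table = '"' + table.replace('"', '""') + '"'
    sql = (
        f"SELECT {quoted_cols}, COUNT(*) AS occurrences "
        f"FROM {quoted_table} "
        f"GROUP BY {quoted_cols} "
        f"HAVING COUNT(*) > 1 "
        f"ORDER BY occurrences DESC"
    )
    return conn.execute(sql).fetchall()

# Small in-memory table with one repeated email (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [("a@example.com", "Ann"), ("b@example.com", "Bob"), ("a@example.com", "Ann B.")],
)

dupes = find_duplicates(conn, "users", "email")
print(dupes)  # [('a@example.com', 2)]
```

Checking `email,name` together on the same data returns no rows, because no (email, name) pair repeats. The choice of columns therefore decides what counts as a duplicate.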
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| table | Yes | Table name to check | |
| columns | Yes | Comma-separated column names (e.g., 'email,name') | |
| schema | No | Schema containing the table. Not enforced by the input schema, but should always be supplied. Use get_schema() to list available schemas. | |
| format | No | Output format: 'json' or 'markdown' | json |
| url | No | Database URL for auto-connection | |
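As a rough illustration of how the parameters in this table fit together, the sketch below assembles an arguments dictionary on the client side. `build_arguments` is a hypothetical helper, not part of the tool; it assumes only what the table states (required `table` and `columns`, a comma-separated `columns` string, and `format` limited to 'json' or 'markdown' with 'json' as the default).

```python
def build_arguments(table, columns, schema=None, format="json", url=None):
    """Assemble arguments for duplicate_detection (hypothetical client-side helper)."""
    if not table:
        raise ValueError("table is required")
    # Accept either a list of names or an already comma-separated string.
    if isinstance(columns, str):
        names = [c.strip() for c in columns.split(",")]
    else:
        names = [str(c).strip() for c in columns]
    names = [n for n in names if n]
    if not names:
        raise ValueError("columns is required")
    if format not in ("json", "markdown"):
        raise ValueError("format must be 'json' or 'markdown'")

    args = {"table": table, "columns": ",".join(names), "format": format}
    # Optional parameters are included only when a value was given.
    if schema is not None:
        args["schema"] = schema
    if url is not None:
        args["url"] = url
    return args

args = build_arguments("orders", ["customer_id", "product_id"], schema="shipment")
print(args)
```

Joining the names without spaces matches the form used in the examples ('customer_id,product_id').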
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |