sample_data
Retrieve random document samples from MongoDB collections for data exploration, testing, and analysis in JSON or CSV formats.
Instructions
Get a random sample of documents from a collection.

Supports both JSON and CSV output formats:
- Use outputFormat="json" for standard JSON (default)
- Use outputFormat="csv" for comma-separated values export

Useful for:
- Exploratory data analysis
- Testing with representative data
- Understanding data distribution
- Performance testing with realistic data subsets

Example - JSON Sample: use_mcp_tool with server_name: "mongodb", tool_name: "sample_data", arguments: { "collection": "users", "size": 50 }

Example - CSV Export: use_mcp_tool with server_name: "mongodb", tool_name: "sample_data", arguments: { "collection": "users", "size": 100, "outputFormat": "csv", "formatOptions": { "includeHeaders": true, "delimiter": "," } }
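The two example calls above can also be built programmatically. The following is a minimal sketch, not part of the server: the `SampleDataArgs`, `FormatOptions`, and `McpToolCall` types and the `buildSampleDataCall` helper are hypothetical names introduced for illustration. Only the field names and values come from the examples and the input schema.

```typescript
// Hypothetical types for illustration; only the field names come from the tool's schema.
interface FormatOptions {
  delimiter?: string;
  includeHeaders?: boolean;
}

interface SampleDataArgs {
  database?: string;
  collection: string;
  size?: number;
  outputFormat?: "json" | "csv";
  formatOptions?: FormatOptions;
}

interface McpToolCall {
  server_name: string;
  tool_name: string;
  arguments: SampleDataArgs;
}

// Wrap the arguments in the use_mcp_tool shape used by the examples (hypothetical helper).
function buildSampleDataCall(args: SampleDataArgs): McpToolCall {
  return { server_name: "mongodb", tool_name: "sample_data", arguments: args };
}

// JSON sample: 50 random documents from "users".
const jsonSample = buildSampleDataCall({ collection: "users", size: 50 });

// CSV export: 100 random documents with a header row and comma delimiter.
const csvExport = buildSampleDataCall({
  collection: "users",
  size: 100,
  outputFormat: "csv",
  formatOptions: { includeHeaders: true, delimiter: "," },
});
```

Typing `outputFormat` as a union lets the compiler reject values other than `json` or `csv` before the request is sent.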
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| database | No | Database name (optional if default database is configured) | |
| collection | Yes | Collection name | |
| size | No | Number of random documents to sample (1 to 1000) | 10 |
| outputFormat | No | Output format for results (json or csv) | json |
| formatOptions | No | Format-specific options for CSV output: delimiter (default: comma) and includeHeaders (default: true) | |
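The defaults and limits in the table can be summarized as a small normalization step. This is a sketch under stated assumptions rather than the server's code: `resolveSampleArgs` and `ResolvedSampleArgs` are hypothetical names, the lower bound of 1 is taken from the schema's `minimum` (the handler itself only applies the upper cap), and rounding down to an integer is an added assumption.

```typescript
type OutputFormat = "json" | "csv";

// Hypothetical shape holding the fully resolved options.
interface ResolvedSampleArgs {
  size: number;
  outputFormat: OutputFormat;
  delimiter: string;
  includeHeaders: boolean;
}

const MIN_SAMPLE_SIZE = 1; // schema minimum
const MAX_SAMPLE_SIZE = 1000; // schema maximum, also the handler's cap

// Apply the documented defaults and clamp size into [1, 1000] (hypothetical helper).
function resolveSampleArgs(raw: {
  size?: number;
  outputFormat?: string;
  formatOptions?: { delimiter?: string; includeHeaders?: boolean };
}): ResolvedSampleArgs {
  const requested = raw.size ?? 10; // default sample size
  // Assumption: round down so the sample size is always a whole number.
  const size = Math.min(Math.max(Math.floor(requested), MIN_SAMPLE_SIZE), MAX_SAMPLE_SIZE);
  // Anything other than "csv" (case-insensitive) falls back to JSON, as in the handler.
  const outputFormat: OutputFormat =
    (raw.outputFormat ?? "json").toLowerCase() === "csv" ? "csv" : "json";
  const options = raw.formatOptions ?? {};
  return {
    size,
    outputFormat,
    delimiter: options.delimiter || ",", // default delimiter: comma
    includeHeaders: options.includeHeaders !== false, // default: true
  };
}
```

For instance, `resolveSampleArgs({ size: 5000 })` yields a size of 1000, and an empty argument object yields a JSON sample of 10 documents.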
Implementation Reference
- src/index.ts:1306-1357 (handler) The main handler for the 'sample_data' tool. It draws a random sample of documents from the specified collection using MongoDB's $sample aggregation stage, caps the sample size at 1000 for safety, and supports both JSON and CSV output. For JSON output it also appends visualization hints for the sampled data.

      case 'sample_data': {
        const { database, collection, size = 10, outputFormat = 'json', formatOptions = {} } = request.params.arguments as {
          database?: string;
          collection: string;
          size?: number;
          outputFormat?: 'json' | 'csv';
          formatOptions?: any;
        };

        const dbName = database || this.defaultDatabase;
        if (!dbName) {
          throw new McpError(
            ErrorCode.InvalidRequest,
            'Database name is required when no default database is configured'
          );
        }

        const db = client.db(dbName);
        const sampleSize = Math.min(size, 1000); // Cap sample size for safety

        const results = await db.collection(collection).aggregate([
          { $sample: { size: sampleSize } }
        ]).toArray();

        // Handle different output formats
        if (outputFormat.toLowerCase() === 'csv') {
          return {
            content: [
              {
                type: 'text',
                text: this.documentsToCsv(results, formatOptions),
              },
            ],
          };
        } else {
          // Default JSON format
          const vizHint = this.generateVisualizationHint(results);
          return {
            content: [
              {
                type: 'text',
                text: JSON.stringify(results, null, 2) + (vizHint ? `\n\nVisualization Hint:\n${vizHint}` : ''),
              },
            ],
          };
        }
      }
- src/index.ts:681-755 (registration) Registration of the 'sample_data' tool in the list_tools response, including the detailed description and the complete input schema definition with parameters for database, collection, sample size, output format, and CSV options.

        name: 'sample_data',
        description: `Get a random sample of documents from a collection.
      Supports both JSON and CSV output formats:
      - Use outputFormat="json" for standard JSON (default)
      - Use outputFormat="csv" for comma-separated values export
      Useful for:
      - Exploratory data analysis
      - Testing with representative data
      - Understanding data distribution
      - Performance testing with realistic data subsets
      Example - JSON Sample: use_mcp_tool with server_name: "mongodb", tool_name: "sample_data", arguments: { "collection": "users", "size": 50 }
      Example - CSV Export: use_mcp_tool with server_name: "mongodb", tool_name: "sample_data", arguments: { "collection": "users", "size": 100, "outputFormat": "csv", "formatOptions": { "includeHeaders": true, "delimiter": "," } }`,
        inputSchema: {
          type: 'object',
          properties: {
            database: {
              type: 'string',
              description: 'Database name (optional if default database is configured)',
            },
            collection: {
              type: 'string',
              description: 'Collection name',
            },
            size: {
              type: 'number',
              description: 'Number of random documents to sample (default: 10)',
              minimum: 1,
              maximum: 1000,
            },
            outputFormat: {
              type: 'string',
              description: 'Output format for results (json or csv)',
              enum: ['json', 'csv'],
            },
            formatOptions: {
              type: 'object',
              description: 'Format-specific options',
              properties: {
                delimiter: {
                  type: 'string',
                  description: 'CSV delimiter character (default: comma)',
                },
                includeHeaders: {
                  type: 'boolean',
                  description: 'Whether to include header row in CSV (default: true)',
                },
              },
            },
          },
          required: ['collection'],
        },
      },
- src/index.ts:716-754 (schema) Input schema definition for the 'sample_data' tool, specifying parameters, types, descriptions, constraints, and required fields.

      inputSchema: {
        type: 'object',
        properties: {
          database: {
            type: 'string',
            description: 'Database name (optional if default database is configured)',
          },
          collection: {
            type: 'string',
            description: 'Collection name',
          },
          size: {
            type: 'number',
            description: 'Number of random documents to sample (default: 10)',
            minimum: 1,
            maximum: 1000,
          },
          outputFormat: {
            type: 'string',
            description: 'Output format for results (json or csv)',
            enum: ['json', 'csv'],
          },
          formatOptions: {
            type: 'object',
            description: 'Format-specific options',
            properties: {
              delimiter: {
                type: 'string',
                description: 'CSV delimiter character (default: comma)',
              },
              includeHeaders: {
                type: 'boolean',
                description: 'Whether to include header row in CSV (default: true)',
              },
            },
          },
        },
        required: ['collection'],
      },
- src/index.ts:254-289 (helper) Helper function that converts MongoDB documents to CSV format, handling varying schemas, proper escaping, and configurable options. Used by sample_data for CSV output.

      private documentsToCsv(docs: any[], options: {
        includeHeaders?: boolean;
        delimiter?: string;
      } = {}): string {
        if (!Array.isArray(docs) || docs.length === 0) return '';

        const delimiter = options.delimiter || ',';
        const includeHeaders = options.includeHeaders !== false;

        // Extract all possible field names from all documents (handles varying schemas)
        const fieldsSet = new Set<string>();
        docs.forEach(doc => {
          Object.keys(doc).forEach(key => fieldsSet.add(key));
        });
        const fields = Array.from(fieldsSet);

        let result = '';

        // Add headers
        if (includeHeaders) {
          result += fields.map(field => this.escapeCsvField(field, delimiter)).join(delimiter) + '\n';
        }

        // Add data rows
        docs.forEach(doc => {
          const row = fields.map(field => {
            const value = doc[field];
            if (value === undefined || value === null) return '';
            if (typeof value === 'object') return this.escapeCsvField(JSON.stringify(value), delimiter);
            return this.escapeCsvField(String(value), delimiter);
          });
          result += row.join(delimiter) + '\n';
        });

        return result;
      }
- src/index.ts:194-246 (helper) Helper function that analyzes sampled data and generates visualization recommendations based on data types (time series, numeric, categorical, geospatial). Used by sample_data to append hints to JSON output.

      private generateVisualizationHint(data: any[]): string {
        if (!Array.isArray(data) || data.length === 0) return '';

        // Check if the data looks like time series
        const hasDateFields = Object.keys(data[0]).some(key =>
          data[0][key] instanceof Date ||
          (typeof data[0][key] === 'string' && !isNaN(Date.parse(data[0][key])))
        );

        // Check if the data has numeric fields
        const numericFields = Object.keys(data[0]).filter(key =>
          typeof data[0][key] === 'number'
        );

        // Check if the data has categorical fields
        const categoricalFields = Object.keys(data[0]).filter(key =>
          typeof data[0][key] === 'string' &&
          data.every(item => typeof item[key] === 'string')
        );

        // Check if the data has geospatial fields
        const hasGeoData = Object.keys(data[0]).some(key => {
          const value = data[0][key];
          return value && typeof value === 'object' &&
            (('type' in value && value.type === 'Point' && 'coordinates' in value) ||
             (Array.isArray(value) && value.length === 2 &&
              typeof value[0] === 'number' && typeof value[1] === 'number'));
        });

        let hints = [];

        if (hasDateFields && numericFields.length > 0) {
          hints.push('Time Series Visualization:\n- Consider line charts for temporal trends\n- Time-based heat maps for density patterns\n- Area charts for cumulative values over time');
        }

        if (categoricalFields.length > 0 && numericFields.length > 0) {
          hints.push('Categorical Analysis:\n- Bar charts for comparing categories\n- Box plots for distribution analysis\n- Heat maps for category correlations\n- Treemaps for hierarchical data');
        }

        if (numericFields.length >= 2) {
          hints.push('Numerical Analysis:\n- Scatter plots for correlation analysis\n- Bubble charts if three numeric dimensions\n- Correlation matrices for multiple variables\n- Histograms for distribution analysis');
        }

        if (hasGeoData) {
          hints.push('Geospatial Visualization:\n- Map overlays for location data\n- Choropleth maps for regional analysis\n- Heat maps for density visualization\n- Cluster maps for point concentration');
        }

        if (data.length > 1000) {
          hints.push('Large Dataset Considerations:\n- Consider sampling for initial visualization\n- Use aggregation for summary views\n- Implement pagination or infinite scroll\n- Consider server-side rendering');
        }

        return hints.join('\n\n');
      }
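The CSV path calls `this.escapeCsvField`, which is not included in the excerpts above. To show how the conversion behaves on documents with differing fields, here is a standalone sketch that follows the same steps as the documentsToCsv helper. The `escapeCsvField` body below is a hypothetical stand-in that uses common CSV quoting (wrap in double quotes and double any embedded quotes); the server's actual implementation may differ.

```typescript
type Doc = Record<string, unknown>;

// Hypothetical stand-in for the server's escapeCsvField, which is not shown in the reference.
function escapeCsvField(field: string, delimiter: string): string {
  const needsQuoting =
    field.indexOf(delimiter) !== -1 ||
    field.indexOf('"') !== -1 ||
    field.indexOf("\n") !== -1 ||
    field.indexOf("\r") !== -1;
  return needsQuoting ? `"${field.replace(/"/g, '""')}"` : field;
}

// Standalone version of the conversion: union of keys as columns, one row per document.
function documentsToCsv(
  docs: Doc[],
  options: { includeHeaders?: boolean; delimiter?: string } = {}
): string {
  if (docs.length === 0) return "";
  const delimiter = options.delimiter || ",";
  const includeHeaders = options.includeHeaders !== false;

  // Collect every field name in first-seen order so varying schemas share one header.
  const fieldsSet = new Set<string>();
  docs.forEach(doc => Object.keys(doc).forEach(key => fieldsSet.add(key)));
  const fields = Array.from(fieldsSet);

  const lines: string[] = [];
  if (includeHeaders) {
    lines.push(fields.map(f => escapeCsvField(f, delimiter)).join(delimiter));
  }
  docs.forEach(doc => {
    const row = fields.map(field => {
      const value = doc[field];
      if (value === undefined || value === null) return ""; // missing field: empty cell
      const text = typeof value === "object" ? JSON.stringify(value) : String(value);
      return escapeCsvField(text, delimiter);
    });
    lines.push(row.join(delimiter));
  });
  // Each line, including the last, ends with a newline, as in the helper above.
  return lines.map(line => line + "\n").join("");
}

// Two documents with different fields; nested values are serialized as JSON.
const csv = documentsToCsv([
  { name: "Ada", tags: ["a", "b"] },
  { name: "Smith, J.", age: 41 },
]);
console.log(csv);
```

Because columns are the union of all keys, a document that lacks a field produces an empty cell rather than shifting later values into the wrong column.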