MongoDB MCP Server

by jonfreeland

sample_data

Retrieve random document samples from MongoDB collections for data exploration, testing, and analysis in JSON or CSV formats.

Instructions

Get a random sample of documents from a collection.

Supports both JSON and CSV output formats:

  • Use outputFormat="json" for standard JSON (default)

  • Use outputFormat="csv" for comma-separated values export

Useful for:

  • Exploratory data analysis

  • Testing with representative data

  • Understanding data distribution

  • Performance testing with realistic data subsets

Example - JSON Sample:

    use_mcp_tool with
      server_name: "mongodb",
      tool_name: "sample_data",
      arguments: {
        "collection": "users",
        "size": 50
      }

Example - CSV Export:

    use_mcp_tool with
      server_name: "mongodb",
      tool_name: "sample_data",
      arguments: {
        "collection": "users",
        "size": 100,
        "outputFormat": "csv",
        "formatOptions": {
          "includeHeaders": true,
          "delimiter": ","
        }
      }
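
The examples above use the use_mcp_tool wrapper syntax. For a client that speaks MCP directly, the same CSV call could be expressed as a JSON-RPC "tools/call" request. The sketch below is illustrative only: buildSampleDataRequest and the interface names are hypothetical helpers, not part of this server.

```typescript
// Arguments accepted by sample_data, as listed in the input schema.
interface SampleDataArgs {
  database?: string;
  collection: string;
  size?: number;
  outputFormat?: "json" | "csv";
  formatOptions?: { delimiter?: string; includeHeaders?: boolean };
}

// Envelope of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: SampleDataArgs };
}

// Hypothetical helper: wraps sample_data arguments in a request envelope.
function buildSampleDataRequest(id: number, args: SampleDataArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "sample_data", arguments: args },
  };
}

// Same arguments as the "CSV Export" example.
const csvRequest = buildSampleDataRequest(1, {
  collection: "users",
  size: 100,
  outputFormat: "csv",
  formatOptions: { includeHeaders: true, delimiter: "," },
});

console.log(JSON.stringify(csvRequest, null, 2));
```

Typing the arguments this way lets the compiler reject an unsupported outputFormat before the request is sent.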

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| database | No | Database name (optional if default database is configured) | |
| collection | Yes | Collection name | |
| size | No | Number of random documents to sample (default: 10) | 10 |
| outputFormat | No | Output format for results (json or csv) | json |
| formatOptions | No | Format-specific options | |
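
A caller may want to check arguments against these constraints before invoking the tool. The following is a minimal sketch of such a check, assuming the 1 to 1000 bounds on size and the json/csv enum declared in the registered schema. validateSampleDataInput is a hypothetical client-side helper; the server is not shown performing this validation itself.

```typescript
type Json = Record<string, unknown>;

// Hypothetical client-side check mirroring the published input schema.
// Returns a list of problems; an empty list means the input looks valid.
function validateSampleDataInput(input: Json): string[] {
  const errors: string[] = [];

  if (typeof input["collection"] !== "string") {
    errors.push("collection is required and must be a string");
  }

  const database = input["database"];
  if (database !== undefined && typeof database !== "string") {
    errors.push("database must be a string");
  }

  const size = input["size"];
  if (size !== undefined) {
    if (typeof size !== "number" || Number.isNaN(size)) {
      errors.push("size must be a number");
    } else if (size < 1 || size > 1000) {
      errors.push("size must be between 1 and 1000");
    }
  }

  const outputFormat = input["outputFormat"];
  if (outputFormat !== undefined && outputFormat !== "json" && outputFormat !== "csv") {
    errors.push('outputFormat must be "json" or "csv"');
  }

  const formatOptions = input["formatOptions"];
  if (formatOptions !== undefined) {
    if (typeof formatOptions !== "object" || formatOptions === null || Array.isArray(formatOptions)) {
      errors.push("formatOptions must be an object");
    } else {
      const opts = formatOptions as Json;
      if (opts["delimiter"] !== undefined && typeof opts["delimiter"] !== "string") {
        errors.push("formatOptions.delimiter must be a string");
      }
      if (opts["includeHeaders"] !== undefined && typeof opts["includeHeaders"] !== "boolean") {
        errors.push("formatOptions.includeHeaders must be a boolean");
      }
    }
  }

  return errors;
}

console.log(validateSampleDataInput({ collection: "users", size: 50 }));
console.log(validateSampleDataInput({ size: 5000, outputFormat: "xml" }));
```

The second call reports three problems: the missing collection, the out-of-range size, and the unsupported output format.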

Implementation Reference

  • The main handler for the 'sample_data' tool. It draws a random sample from the specified collection using MongoDB's $sample aggregation stage, caps the sample size at 1000 for safety, and returns the result as JSON or CSV. Visualization hints are appended to JSON output only.
    case 'sample_data': {
      const { 
        database, 
        collection, 
        size = 10,
        outputFormat = 'json',
        formatOptions = {}
      } = request.params.arguments as {
        database?: string;
        collection: string;
        size?: number;
        outputFormat?: 'json' | 'csv';
        formatOptions?: any;
      };
      const dbName = database || this.defaultDatabase;
      if (!dbName) {
        throw new McpError(
          ErrorCode.InvalidRequest,
          'Database name is required when no default database is configured'
        );
      }
      
      const db = client.db(dbName);
      const sampleSize = Math.min(size, 1000); // Cap sample size for safety
      
      const results = await db.collection(collection).aggregate([
        { $sample: { size: sampleSize } }
      ]).toArray();
      
      // Handle different output formats
      if (outputFormat.toLowerCase() === 'csv') {
        return {
          content: [
            {
              type: 'text',
              text: this.documentsToCsv(results, formatOptions),
            },
          ],
        };
      } else {
        // Default JSON format
        const vizHint = this.generateVisualizationHint(results);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(results, null, 2) + (vizHint ? `\n\nVisualization Hint:\n${vizHint}` : ''),
            },
          ],
        };
      }
    }
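
The sampling step of the handler can be isolated into a pure function that is easy to test without a database. In the sketch below, the upper cap of 1000 and the default of 10 come from the handler above. The lower bound and the rounding are assumptions added here: the schema declares minimum: 1, but the handler as shown applies only the upper cap. clampSampleSize and buildSamplePipeline are hypothetical names.

```typescript
const DEFAULT_SAMPLE_SIZE = 10; // default used by the handler
const MAX_SAMPLE_SIZE = 1000;   // cap used by the handler

// Assumption: also enforce a lower bound of 1 and whole numbers,
// which the handler shown above does not do.
function clampSampleSize(size: number = DEFAULT_SAMPLE_SIZE): number {
  if (!Number.isFinite(size)) return DEFAULT_SAMPLE_SIZE;
  return Math.max(1, Math.min(Math.floor(size), MAX_SAMPLE_SIZE));
}

// Builds the single-stage aggregation pipeline passed to aggregate().
function buildSamplePipeline(size?: number): [{ $sample: { size: number } }] {
  return [{ $sample: { size: clampSampleSize(size) } }];
}

console.log(JSON.stringify(buildSamplePipeline(5000)));
console.log(JSON.stringify(buildSamplePipeline()));
```

The resulting array has the same shape as the pipeline the handler passes to db.collection(collection).aggregate(...).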
  • src/index.ts:681-755 (registration)
    Registration of the 'sample_data' tool in the list_tools response, including detailed description and complete input schema definition with parameters for database, collection, sample size, output format, and CSV options.
              name: 'sample_data',
              description: `Get a random sample of documents from a collection.
      
    Supports both JSON and CSV output formats:
    - Use outputFormat="json" for standard JSON (default)
    - Use outputFormat="csv" for comma-separated values export
    
    Useful for:
    - Exploratory data analysis
    - Testing with representative data
    - Understanding data distribution
    - Performance testing with realistic data subsets
    
    Example - JSON Sample:
    use_mcp_tool with
      server_name: "mongodb",
      tool_name: "sample_data",
      arguments: {
        "collection": "users",
        "size": 50
      }
    
    Example - CSV Export:
    use_mcp_tool with
      server_name: "mongodb",
      tool_name: "sample_data",
      arguments: {
        "collection": "users",
        "size": 100,
        "outputFormat": "csv",
        "formatOptions": {
          "includeHeaders": true,
          "delimiter": ","
        }
      }`,
              inputSchema: {
                type: 'object',
                properties: {
                  database: {
                    type: 'string',
                    description: 'Database name (optional if default database is configured)',
                  },
                  collection: {
                    type: 'string',
                    description: 'Collection name',
                  },
                  size: {
                    type: 'number',
                    description: 'Number of random documents to sample (default: 10)',
                    minimum: 1,
                    maximum: 1000,
                  },
                  outputFormat: {
                    type: 'string',
                    description: 'Output format for results (json or csv)',
                    enum: ['json', 'csv'],
                  },
                  formatOptions: {
                    type: 'object',
                    description: 'Format-specific options',
                    properties: {
                      delimiter: {
                        type: 'string',
                        description: 'CSV delimiter character (default: comma)',
                      },
                      includeHeaders: {
                        type: 'boolean',
                        description: 'Whether to include header row in CSV (default: true)',
                      },
                    },
                  },
                },
                required: ['collection'],
              },
            },
  • Helper function to convert MongoDB documents to CSV format, handling varying schemas, proper escaping, and configurable options. Used by sample_data for CSV output.
    private documentsToCsv(docs: any[], options: {
      includeHeaders?: boolean;
      delimiter?: string;
    } = {}): string {
      if (!Array.isArray(docs) || docs.length === 0) return '';
      
      const delimiter = options.delimiter || ',';
      const includeHeaders = options.includeHeaders !== false;
      
      // Extract all possible field names from all documents (handles varying schemas)
      const fieldsSet = new Set<string>();
      docs.forEach(doc => {
        Object.keys(doc).forEach(key => fieldsSet.add(key));
      });
      
      const fields = Array.from(fieldsSet);
      let result = '';
      
      // Add headers
      if (includeHeaders) {
        result += fields.map(field => this.escapeCsvField(field, delimiter)).join(delimiter) + '\n';
      }
      
      // Add data rows
      docs.forEach(doc => {
        const row = fields.map(field => {
          const value = doc[field];
          if (value === undefined || value === null) return '';
          if (typeof value === 'object') return this.escapeCsvField(JSON.stringify(value), delimiter);
          return this.escapeCsvField(String(value), delimiter);
        });
        result += row.join(delimiter) + '\n';
      });
      
      return result;
    }
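
documentsToCsv relies on an escapeCsvField helper that is not shown in this reference. The sketch below pairs a standalone version of the conversion logic with one plausible escaping rule (quote a field when it contains the delimiter, a double quote, or a line break, and double any embedded quotes). The escaping function is an assumption, not the server's actual implementation.

```typescript
type Doc = Record<string, unknown>;

// Assumed implementation: the server's own escapeCsvField is not shown here.
function escapeCsvField(field: string, delimiter: string): string {
  const needsQuoting =
    field.includes(delimiter) ||
    field.includes('"') ||
    field.includes("\n") ||
    field.includes("\r");
  return needsQuoting ? `"${field.replace(/"/g, '""')}"` : field;
}

// Standalone version of the conversion logic shown above.
function documentsToCsv(
  docs: Doc[],
  options: { includeHeaders?: boolean; delimiter?: string } = {}
): string {
  if (docs.length === 0) return "";
  const delimiter = options.delimiter || ",";
  const includeHeaders = options.includeHeaders !== false;

  // Union of keys across all documents, in first-seen order.
  const fieldsSet = new Set<string>();
  docs.forEach(doc => Object.keys(doc).forEach(key => fieldsSet.add(key)));
  const fields = Array.from(fieldsSet);

  const lines: string[] = [];
  if (includeHeaders) {
    lines.push(fields.map(f => escapeCsvField(f, delimiter)).join(delimiter));
  }
  for (const doc of docs) {
    const row = fields.map(field => {
      const value = doc[field];
      if (value === undefined || value === null) return ""; // missing field: empty cell
      const text = typeof value === "object" ? JSON.stringify(value) : String(value);
      return escapeCsvField(text, delimiter);
    });
    lines.push(row.join(delimiter));
  }
  return lines.map(line => line + "\n").join("");
}

// Two documents with different fields, one value containing the delimiter.
const csv = documentsToCsv([
  { name: "Ada", city: "London, UK" },
  { name: "Linus", tags: ["kernel", "git"] },
]);
console.log(csv);
```

Because the column set is the union of all keys, a document that lacks a field produces an empty cell rather than shifting its row.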
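  • Helper function that inspects sampled documents and generates visualization recommendations based on detected data types (time series, numeric, categorical, geospatial). Field types are inferred from the first document, with categorical fields additionally checked across all documents. Used by sample_data to append hints to JSON output.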
    private generateVisualizationHint(data: any[]): string {
      if (!Array.isArray(data) || data.length === 0) return '';
    
      // Check if the data looks like time series
      const hasDateFields = Object.keys(data[0]).some(key => 
        data[0][key] instanceof Date || 
        (typeof data[0][key] === 'string' && !isNaN(Date.parse(data[0][key])))
      );
    
      // Check if the data has numeric fields
      const numericFields = Object.keys(data[0]).filter(key => 
        typeof data[0][key] === 'number'
      );
    
      // Check if the data has categorical fields
      const categoricalFields = Object.keys(data[0]).filter(key => 
        typeof data[0][key] === 'string' && 
        data.every(item => typeof item[key] === 'string')
      );
    
      // Check if the data has geospatial fields
      const hasGeoData = Object.keys(data[0]).some(key => {
        const value = data[0][key];
        return value && typeof value === 'object' && 
          (('type' in value && value.type === 'Point' && 'coordinates' in value) ||
           (Array.isArray(value) && value.length === 2 && 
            typeof value[0] === 'number' && typeof value[1] === 'number'));
      });
    
      const hints: string[] = [];
    
      if (hasDateFields && numericFields.length > 0) {
        hints.push('Time Series Visualization:\n- Consider line charts for temporal trends\n- Time-based heat maps for density patterns\n- Area charts for cumulative values over time');
      }
    
      if (categoricalFields.length > 0 && numericFields.length > 0) {
        hints.push('Categorical Analysis:\n- Bar charts for comparing categories\n- Box plots for distribution analysis\n- Heat maps for category correlations\n- Treemaps for hierarchical data');
      }
    
      if (numericFields.length >= 2) {
        hints.push('Numerical Analysis:\n- Scatter plots for correlation analysis\n- Bubble charts if three numeric dimensions\n- Correlation matrices for multiple variables\n- Histograms for distribution analysis');
      }
    
      if (hasGeoData) {
        hints.push('Geospatial Visualization:\n- Map overlays for location data\n- Choropleth maps for regional analysis\n- Heat maps for density visualization\n- Cluster maps for point concentration');
      }
    
      // Not triggered via sample_data, which caps samples at 1000 documents
      if (data.length > 1000) {
        hints.push('Large Dataset Considerations:\n- Consider sampling for initial visualization\n- Use aggregation for summary views\n- Implement pagination or infinite scroll\n- Consider server-side rendering');
      }
    
      return hints.join('\n\n');
    }
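
To see which hints a given sample would trigger, the field detection in generateVisualizationHint can be reproduced in isolation. The sketch below follows the same checks as the helper above; summarizeFields, isGeoValue, and the sample documents are hypothetical and exist only for illustration.

```typescript
type Doc = Record<string, unknown>;

interface FieldSummary {
  hasDateFields: boolean;
  numericFields: string[];
  categoricalFields: string[];
  hasGeoData: boolean;
}

// GeoJSON-style Point object, or a two-element numeric array.
function isGeoValue(value: unknown): boolean {
  if (value === null || typeof value !== "object") return false;
  if (Array.isArray(value)) {
    return value.length === 2 && typeof value[0] === "number" && typeof value[1] === "number";
  }
  const candidate = value as { type?: unknown };
  return candidate.type === "Point" && "coordinates" in candidate;
}

// Mirrors the helper's detection: keys and most types come from the first document.
function summarizeFields(docs: Doc[]): FieldSummary | null {
  const first = docs[0];
  if (first === undefined) return null;
  const keys = Object.keys(first);
  return {
    hasDateFields: keys.some(key => {
      const v = first[key];
      return v instanceof Date || (typeof v === "string" && !Number.isNaN(Date.parse(v)));
    }),
    numericFields: keys.filter(key => typeof first[key] === "number"),
    // A field counts as categorical only if it is a string in every document.
    categoricalFields: keys.filter(key => docs.every(doc => typeof doc[key] === "string")),
    hasGeoData: keys.some(key => isGeoValue(first[key])),
  };
}

const summary = summarizeFields([
  { createdAt: "2024-01-15T10:00:00Z", plan: "pro", amount: 42,
    location: { type: "Point", coordinates: [-0.12, 51.5] } },
  { createdAt: "2024-02-01T08:30:00Z", plan: "free", amount: 0,
    location: { type: "Point", coordinates: [2.35, 48.85] } },
]);
console.log(summary);
```

For this sample the time series, categorical, and geospatial hints would apply, but not the numerical one, since only a single numeric field is present. Note that the ISO date string also counts as categorical under these rules, because it is a string in every document.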

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonfreeland/mongodb-mcp'
