generate-dataset

Create structured datasets with realistic mock data for testing databases, APIs, and development scenarios. Supports multiple entity types with referential integrity and relationships.

Instructions

Generate a structured dataset with multiple related entities and referential integrity. Supports person, company, and custom entity types with one-to-many and many-to-many relationships. Perfect for creating test databases, mock APIs, and complex data scenarios.

Input Schema

Name    Required  Description  Default
schema  Yes
seed    No
locale  No                     en
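
As a rough illustration of the parameters above, the sketch below builds one possible arguments object. The entity names (`users`, `orders`), the field names, and the string values `'person'`, `'custom'`, and `'one-to-many'` are assumptions made for this example; the actual values accepted for entity and relationship types are defined by the server's enums, which are not shown on this page. The local types are hypothetical stand-ins, not the project's own definitions.

```typescript
// Hypothetical shapes mirroring the documented parameters (assumed, not the
// project's real types, which are inferred from its Zod schemas).
type Relationship = { references: string; type: string; nullable?: boolean };
type Entity = {
  count: number;
  type: string;
  fields?: string[];
  relationships?: Record<string, Relationship>;
};
type GenerateDatasetArgs = {
  schema: { entities: Record<string, Entity> };
  seed?: number;
  locale?: string;
};

// Example arguments: 5 people and 20 orders, each order pointing at a user.
// The enum strings below are assumed values for illustration only.
const exampleArgs: GenerateDatasetArgs = {
  schema: {
    entities: {
      users: { count: 5, type: 'person' },
      orders: {
        count: 20,
        type: 'custom',
        fields: ['id', 'total'],
        relationships: {
          userId: { references: 'users', type: 'one-to-many' },
        },
      },
    },
  },
  seed: 42, // optional integer seed
  locale: 'en', // optional; the table above lists "en" as the default
};

// Collect relationship targets that do not name an entity in the same schema.
const entityNames = Object.keys(exampleArgs.schema.entities);
const danglingReferences: string[] = [];
for (const name of entityNames) {
  const rels = exampleArgs.schema.entities[name].relationships || {};
  for (const field of Object.keys(rels)) {
    if (entityNames.indexOf(rels[field].references) === -1) {
      danglingReferences.push(`${name}.${field}`);
    }
  }
}
```

Keeping every `references` value equal to a key of `entities` is what lets a generator resolve relationships, so a quick check like the last loop can catch typos before the arguments are sent.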

Implementation Reference

  • The handler function that implements the core logic of the generate-dataset tool, including validation, dataset generation, and response formatting.
    export function handleGenerateDataset(params: unknown) {
      try {
        // Validate parameters
        const validatedParams = GenerateDatasetParamsSchema.parse(params);
    
        // Additional schema validation (referential integrity, circular dependencies)
        const schemaValidation = validateDatasetSchema(validatedParams.schema);
        if (!schemaValidation.valid) {
          throw new Error(`Invalid dataset schema: ${schemaValidation.errors.join(', ')}`);
        }
    
        // Create generator
        const generator = new DatasetGenerator({
          seed: validatedParams.seed,
          locale: validatedParams.locale,
        });
    
        // Generate dataset
        const result = generator.generateDataset(validatedParams.schema);
    
        // Log generation (no console.log, following linter rules - will log in server.ts instead)
    
        // Return response
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      } catch (error) {
        // Error handling
        if (error instanceof z.ZodError) {
          // Flatten each Zod issue into "path: message" and join them into one line
          const details = error.issues
            .map((e) => `${e.path.join('.')}: ${e.message}`)
            .join(', ');
          const errorMessage = `Validation error: ${details}`;
          // Log error (will be handled by server)
          throw new Error(errorMessage);
        }
    
        if (error instanceof Error) {
          // Log error (will be handled by server)
          throw error;
        }
    
        // Log unknown error (will be handled by server)
        throw new Error('Unknown error occurred during dataset generation');
      }
    }
  • Zod validation schemas for the generate-dataset tool parameters: the relationship, entity, and dataset schema definitions, plus the top-level params schema.
    const RelationshipDefinitionSchema = z.object({
      references: z.string().min(1, 'Relationship references must be a non-empty string'),
      type: z.nativeEnum(RelationshipType),
      nullable: z.boolean().optional(),
    });
    
    /**
     * Zod validation schema for entity definitions within datasets.
     *
     * @constant
     * @type {z.ZodObject}
     */
    const EntityDefinitionSchema = z.object({
      count: z
        .number()
        .int('Count must be an integer')
        .min(1, 'Count must be at least 1')
        .max(10000, 'Count must not exceed 10000'),
      type: z.nativeEnum(EntityType),
      fields: z.array(z.string()).optional(),
      relationships: z.record(z.string(), RelationshipDefinitionSchema).optional(),
    });
    
    /**
     * Zod validation schema for complete dataset schemas.
     *
     * @constant
     * @type {z.ZodObject}
     */
    const DatasetSchemaSchema = z.object({
      entities: z
        .record(z.string(), EntityDefinitionSchema)
        .refine((entities) => Object.keys(entities).length > 0, {
          message: 'Schema must contain at least one entity',
        }),
    });
    
    /**
     * Zod validation schema for generate-dataset tool parameters.
     *
     * @constant
     * @type {z.ZodObject}
     */
    export const GenerateDatasetParamsSchema = z.object({
      schema: DatasetSchemaSchema,
      seed: z.number().int().optional(),
      locale: z.nativeEnum(SupportedLocale).optional().default(SupportedLocale.EN),
    });
    
    /**
     * Type definition for generate-dataset parameters, inferred from Zod schema.
     *
     * @typedef {z.infer<typeof GenerateDatasetParamsSchema>} GenerateDatasetParams
     */
    export type GenerateDatasetParams = z.infer<typeof GenerateDatasetParamsSchema>;
  • The Tool object definition for generate-dataset, including the name, description, and an inputSchema derived from the Zod schema.
    export const generateDatasetTool: Tool = {
      name: 'generate-dataset',
      description:
        'Generate a structured dataset with multiple related entities and referential integrity. ' +
        'Supports person, company, and custom entity types with one-to-many and many-to-many relationships. ' +
        'Perfect for creating test databases, mock APIs, and complex data scenarios.',
      inputSchema: zodToJsonSchema(GenerateDatasetParamsSchema) as Tool['inputSchema'],
    };
  • src/index.ts:24-28 (registration)
    The server.registerTool call that registers the generate-dataset tool and its handler with the MCP server.
    // Register User Story 2 tool: generate-dataset
    server.registerTool(generateDatasetTool, async (args) => {
      await Promise.resolve();
      return handleGenerateDataset(args);
    });
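
The handler's comments mention an additional validation pass for referential integrity and circular dependencies, but the body of `validateDatasetSchema` is not shown on this page. The sketch below is one way such a check could be written, assuming each relationship's `references` value names another entity in the same schema. The function `checkSchema`, its local types, and its error messages are hypothetical; only the `{ valid, errors }` result shape is taken from how the handler uses the result.

```typescript
// Hypothetical minimal types; the real project infers these from Zod schemas.
type Relationship = { references: string; type: string; nullable?: boolean };
type Entity = { count: number; type: string; relationships?: Record<string, Relationship> };
type Schema = { entities: Record<string, Entity> };
type ValidationResult = { valid: boolean; errors: string[] };

function checkSchema(schema: Schema): ValidationResult {
  const errors: string[] = [];
  const names = Object.keys(schema.entities);

  // Build a dependency graph (entity -> entities it references) and
  // report any reference to an entity that is not defined.
  const deps: Record<string, string[]> = {};
  for (const name of names) {
    deps[name] = [];
    const rels = schema.entities[name].relationships || {};
    for (const field of Object.keys(rels)) {
      const target = rels[field].references;
      if (names.indexOf(target) === -1) {
        errors.push(`${name}.${field} references unknown entity "${target}"`);
      } else {
        deps[name].push(target);
      }
    }
  }

  // Depth-first search: reaching an entity that is still on the current
  // path means the references form a cycle.
  const state: Record<string, 'visiting' | 'done'> = {};
  const visit = (name: string, path: string[]): void => {
    if (state[name] === 'done') return;
    if (state[name] === 'visiting') {
      errors.push(`Circular dependency: ${path.concat(name).join(' -> ')}`);
      return;
    }
    state[name] = 'visiting';
    for (const target of deps[name]) visit(target, path.concat(name));
    state[name] = 'done';
  };
  for (const name of names) visit(name, []);

  return { valid: errors.length === 0, errors };
}
```

Note that this sketch treats a self-reference (an entity whose relationship points back at itself) as a cycle; whether the actual tool allows that case is not stated in the source.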

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/funsjanssen/faker-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.