generate-dataset

Create structured datasets with realistic mock data for testing databases, APIs, and development scenarios. Supports multiple entity types with referential integrity and relationships.

Instructions

Generate a structured dataset with multiple related entities and referential integrity. Supports person, company, and custom entity types with one-to-many and many-to-many relationships. Perfect for creating test databases, mock APIs, and complex data scenarios.

Input Schema

Name      Required  Description  Default
schema    Yes
seed      No
locale    No                     en
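
The schema accepts nested entity definitions. A hypothetical invocation might look like the following; the entity names, fields, and the relationship type string are illustrative assumptions, not values taken from the tool's documentation:

```json
{
  "schema": {
    "entities": {
      "users": { "type": "person", "count": 10 },
      "posts": {
        "type": "custom",
        "count": 50,
        "fields": ["title", "body"],
        "relationships": {
          "author_id": { "references": "users", "type": "one-to-many" }
        }
      }
    }
  },
  "seed": 42,
  "locale": "en"
}
```

Per the handler shown below, the result is returned as pretty-printed JSON in a text content block; whether the same seed reproduces the same dataset depends on the generator's implementation.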

Implementation Reference

  • The handler function that implements the core logic of the generate-dataset tool, including validation, dataset generation, and response formatting.
    export function handleGenerateDataset(params: unknown) {
      try {
        // Validate parameters
        const validatedParams = GenerateDatasetParamsSchema.parse(params);
    
        // Additional schema validation (referential integrity, circular dependencies)
        const schemaValidation = validateDatasetSchema(validatedParams.schema);
        if (!schemaValidation.valid) {
          throw new Error(`Invalid dataset schema: ${schemaValidation.errors.join(', ')}`);
        }
    
        // Create generator
        const generator = new DatasetGenerator({
          seed: validatedParams.seed,
          locale: validatedParams.locale,
        });
    
        // Generate dataset
        const result = generator.generateDataset(validatedParams.schema);
    
        // Log generation (no console.log, following linter rules - will log in server.ts instead)
    
        // Return response
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      } catch (error) {
        // Error handling
        if (error instanceof z.ZodError) {
          const errorMessage = `Validation error: ${error.errors.map((e) => `${e.path.join('.')}: ${e.message}`).join(', ')}`;
          // Log error (will be handled by server)
          throw new Error(errorMessage);
        }
    
        if (error instanceof Error) {
          // Log error (will be handled by server)
          throw error;
        }
    
        // Log unknown error (will be handled by server)
        throw new Error('Unknown error occurred during dataset generation');
      }
    }
  • Zod validation schemas for the generate-dataset tool parameters, including relationship, entity, dataset schema definitions, and the top-level params schema.
    const RelationshipDefinitionSchema = z.object({
      references: z.string().min(1, 'Relationship references must be a non-empty string'),
      type: z.nativeEnum(RelationshipType),
      nullable: z.boolean().optional(),
    });
    
    /**
     * Zod validation schema for entity definitions within datasets.
     *
     * @constant
     * @type {z.ZodObject}
     */
    const EntityDefinitionSchema = z.object({
      count: z
        .number()
        .int('Count must be an integer')
        .min(1, 'Count must be at least 1')
        .max(10000, 'Count must not exceed 10000'),
      type: z.nativeEnum(EntityType),
      fields: z.array(z.string()).optional(),
      relationships: z.record(z.string(), RelationshipDefinitionSchema).optional(),
    });
    
    /**
     * Zod validation schema for complete dataset schemas.
     *
     * @constant
     * @type {z.ZodObject}
     */
    const DatasetSchemaSchema = z.object({
      entities: z
        .record(z.string(), EntityDefinitionSchema)
        .refine((entities) => Object.keys(entities).length > 0, {
          message: 'Schema must contain at least one entity',
        }),
    });
    
    /**
     * Zod validation schema for generate-dataset tool parameters.
     *
     * @constant
     * @type {z.ZodObject}
     */
    export const GenerateDatasetParamsSchema = z.object({
      schema: DatasetSchemaSchema,
      seed: z.number().int().optional(),
      locale: z.nativeEnum(SupportedLocale).optional().default(SupportedLocale.EN),
    });
    
    /**
     * Type definition for generate-dataset parameters, inferred from Zod schema.
     *
     * @typedef {z.infer<typeof GenerateDatasetParamsSchema>} GenerateDatasetParams
     */
    export type GenerateDatasetParams = z.infer<typeof GenerateDatasetParamsSchema>;
  • The Tool object definition for generate-dataset, including name, description, and inputSchema derived from Zod schema.
    export const generateDatasetTool: Tool = {
      name: 'generate-dataset',
      description:
        'Generate a structured dataset with multiple related entities and referential integrity. ' +
        'Supports person, company, and custom entity types with one-to-many and many-to-many relationships. ' +
        'Perfect for creating test databases, mock APIs, and complex data scenarios.',
      inputSchema: zodToJsonSchema(GenerateDatasetParamsSchema) as Tool['inputSchema'],
    };
  • src/index.ts:24-28 (registration)
    The server.registerTool call that registers the generate-dataset tool and its handler with the MCP server.
    // Register User Story 2 tool: generate-dataset
    server.registerTool(generateDatasetTool, async (args) => {
      await Promise.resolve();
      return handleGenerateDataset(args);
    });

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. While it mentions that the tool generates datasets with referential integrity and supports specific entity and relationship types, it doesn't describe important behavioral aspects such as what format the output takes (JSON, CSV, etc.), whether generation is deterministic for a given seed, performance characteristics, or any limitations. For a complex data generation tool with no annotation coverage, these are significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in two sentences that each earn their place. The first sentence establishes core functionality, the second provides usage context. No wasted words, appropriately front-loaded with the main purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 3 parameters (including nested objects), 0% schema description coverage, no annotations, and no output schema, the description is insufficient. It doesn't explain the output format, parameter interactions, or behavioral constraints needed for effective use. The mention of use cases helps but doesn't compensate for the missing technical details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage across 3 parameters, the description doesn't explicitly mention any parameter or provide semantic context beyond what's implied by the tool's purpose. It touches on 'schema' concepts (entities, relationships) and implies 'locale', but doesn't explain parameter meanings, defaults, or constraints. This leaves significant gaps given the complex nested parameter structure.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool generates structured datasets with multiple related entities and referential integrity, specifying the supported entity types (person, company, custom) and relationship types (one-to-many, many-to-many). It is distinguished from sibling tools by handling multiple entity types rather than single types like generate-person or generate-company, though it doesn't explicitly name those alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'Perfect for creating test databases, mock APIs, and complex data scenarios.' This gives practical guidance on appropriate use cases. However, it doesn't explicitly state when NOT to use it or directly compare it to the sibling single-entity generators.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
