
generateDataModel

Create statistical models from sample documents or text descriptions to generate realistic data for MongoDB-compatible databases without actual storage.

Instructions

Create a statistical model from sample documents or a text description for data generation

Input Schema

Name         Required  Description                                         Default
name         Yes       Name for the model
description  No        Natural language description of the data structure
samples      No        Sample documents to train the model
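
As a hedged illustration of how these parameters fit together, the sketch below builds a JSON-RPC `tools/call` request of the kind an MCP client would send. The `buildGenerateDataModelRequest` helper, the request id, and the sample documents are hypothetical; only the tool name and the three parameter names come from the schema above.

```javascript
// Hypothetical helper: builds an MCP `tools/call` request for generateDataModel.
function buildGenerateDataModelRequest(id, { name, description, samples }) {
    // `name` is the only parameter the schema marks as required.
    if (typeof name !== 'string' || name.length === 0) {
        throw new Error('name is required');
    }
    // Only include optional parameters that were actually supplied.
    const args = { name };
    if (description !== undefined) args.description = description;
    if (samples !== undefined) args.samples = samples;
    return {
        jsonrpc: '2.0',
        id,
        method: 'tools/call',
        params: { name: 'generateDataModel', arguments: args }
    };
}

// Illustrative sample documents, not taken from the project.
const request = buildGenerateDataModelRequest(1, {
    name: 'customers',
    samples: [
        { name: 'Ada', age: 36 },
        { name: 'Grace', age: 45 }
    ]
});
console.log(JSON.stringify(request, null, 2));
```

Although the schema lists only `name` as required, the handler shown under Implementation Reference rejects a call that supplies neither `samples` nor `description`, so a client should send at least one of them.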

Implementation Reference

  • The core handler function that implements the generateDataModel tool logic. It processes input arguments to either infer a JSON schema from sample data or generate one from a description, persists the model using storage, caches it in memory, and returns a success response with model properties.
    async generateDataModel(args) {
        const { name, description, samples } = args;
        
        if (samples && samples.length > 0) {
            const model = this.inferrer.inferSchema(samples);
            model.title = name;
            model.description = description || `DataFlood model: ${name}`;
            
            await this.storage.saveModel(config.storage.defaultDatabase, name, model);
            this.models.set(name, model);
            
            return {
                success: true,
                message: `Model '${name}' created from ${samples.length} samples`,
                properties: Object.keys(model.properties || {})
            };
        } else if (description) {
            const model = this.generateFromDescription(description);
            model.title = name;
            
            await this.storage.saveModel(config.storage.defaultDatabase, name, model);
            this.models.set(name, model);
            
            return {
                success: true,
                message: `Model '${name}' generated from description`,
                properties: Object.keys(model.properties || {})
            };
        } else {
            throw new Error('Either samples or description required');
        }
    }
  • The input schema and metadata definition for the generateDataModel tool, used for validation and advertised via the tools/list endpoint.
    {
        name: 'generateDataModel',
        description: 'Generate a DataFlood model from sample data or description',
        inputSchema: {
            type: 'object',
            properties: {
                name: { type: 'string', description: 'Name for the model' },
                description: { type: 'string', description: 'Natural language description of the data structure' },
                samples: { type: 'array', description: 'Sample documents to train the model', items: { type: 'object' } }
            },
            required: ['name']
        }
    },
  • Registration and dispatch point within the handleToolCall method's switch statement that routes tool calls named 'generateDataModel' to the handler function.
    case 'generateDataModel':
        result = await this.generateDataModel(args);
        break;
  • Supporting helper function called by the handler to generate a model schema from a natural language description using a prompt analyzer.
    generateFromDescription(description) {
        const analysis = this.promptAnalyzer.analyze(description);
        return analysis.schema;
    }
  • Alternative input schema definition for generateDataModel, found in the tool definitions array in index.js.
    {
      name: 'generateDataModel',
      description: 'Create a statistical model from sample documents or a text description for data generation',
      inputSchema: {
        type: 'object',
        properties: {
          name: { type: 'string', description: 'Name for the model' },
          description: { type: 'string', description: 'Natural language description of the data structure' },
          samples: { type: 'array', description: 'Sample documents to train the model', items: { type: 'object' } }
        },
        required: ['name']
      }
    },
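
The branching in the handler can be exercised in isolation. The following is a minimal sketch, not the server's actual code: `StubServer`, its in-memory `saved` array, the `'default'` database name, and the trivial `inferSchema` and `analyze` stand-ins are all hypothetical replacements for the real inferrer, prompt analyzer, storage, and config. The returned `source` field is also added purely for illustration. It demonstrates only the control flow shown above: non-empty `samples` take precedence, an empty `samples` array falls through to `description`, and supplying neither throws.

```javascript
// Hypothetical stand-ins for the real inferrer, prompt analyzer, and storage.
class StubServer {
    constructor() {
        this.models = new Map();
        this.saved = [];
        this.inferrer = {
            // Stub: records only the keys of the first sample.
            inferSchema: (samples) => ({
                type: 'object',
                properties: Object.fromEntries(
                    Object.keys(samples[0]).map((k) => [k, {}])
                )
            })
        };
        this.promptAnalyzer = {
            // Stub: ignores the text and returns a fixed schema.
            analyze: () => ({ schema: { type: 'object', properties: { text: {} } } })
        };
        this.storage = {
            // Stub: keeps saved models in memory instead of writing them anywhere.
            saveModel: async (db, name, model) => {
                this.saved.push({ db, name, model });
            }
        };
    }

    // Same branching as the handler above; 'default' is an assumed database name.
    async generateDataModel({ name, description, samples }) {
        let model;
        let source;
        if (samples && samples.length > 0) {
            model = this.inferrer.inferSchema(samples);
            model.description = description || `DataFlood model: ${name}`;
            source = 'samples';
        } else if (description) {
            model = this.promptAnalyzer.analyze(description).schema;
            source = 'description';
        } else {
            throw new Error('Either samples or description required');
        }
        model.title = name;
        await this.storage.saveModel('default', name, model);
        this.models.set(name, model);
        return { success: true, source, properties: Object.keys(model.properties || {}) };
    }
}

async function demo() {
    const server = new StubServer();
    // Both inputs supplied: the samples branch wins.
    const a = await server.generateDataModel({ name: 'a', description: 'x', samples: [{ id: 1 }] });
    // Empty samples array: falls through to the description branch.
    const b = await server.generateDataModel({ name: 'b', description: 'x', samples: [] });
    // Neither input: the handler throws.
    let error = null;
    try {
        await server.generateDataModel({ name: 'c' });
    } catch (e) {
        error = e.message;
    }
    return { a, b, error, savedCount: server.saved.length };
}

demo().then((result) => console.log(result));
```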
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool creates a model but lacks details on permissions, rate limits, whether the model is saved or transient, or what happens if inputs are invalid. For a creation tool with zero annotation coverage, this is a significant gap, as it doesn't address key behavioral aspects beyond the basic action.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
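
One way to close this gap is to state the side effects in the description and attach MCP tool annotations. The sketch below is a hypothetical revision of the tool definition, not something the project ships. The `readOnlyHint` value follows from the handler, which saves the model; the `destructiveHint` value is an assumption that reusing a name replaces the earlier model (the in-memory cache does, but the storage layer is not shown).

```javascript
// Hypothetical revision of the tool definition; annotation values are assumptions.
const generateDataModelTool = {
    name: 'generateDataModel',
    description:
        'Create a statistical model from sample documents or a text description ' +
        'for data generation. Persists the model under the given name. ' +
        'Requires at least one of samples or description; fails if both are missing.',
    annotations: {
        // The handler saves the model, so the tool is not read-only.
        readOnlyHint: false,
        // Assumption: calling again with an existing name replaces that model.
        destructiveHint: true
    },
    inputSchema: {
        type: 'object',
        properties: {
            name: { type: 'string', description: 'Name for the model' },
            description: { type: 'string', description: 'Natural language description of the data structure' },
            samples: { type: 'array', description: 'Sample documents to train the model', items: { type: 'object' } }
        },
        required: ['name']
    }
};

console.log(JSON.stringify(generateDataModelTool.annotations));
```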

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded with the key action ('Create a statistical model') and specifies the sources concisely. Every part of the sentence contributes to understanding, making it well-structured and appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of creating a statistical model, no annotations, and no output schema, the description is incomplete. It doesn't explain what the created model entails (e.g., format, storage, usage), potential side effects, or how to handle errors. For a tool with significant implications and no structured support, more detail is needed to guide effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters ('name', 'description', 'samples') with descriptions. The description adds marginal value by implying that 'samples' are for training and 'description' is for natural language input, but it doesn't provide additional syntax, format, or constraints beyond what the schema states. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Create a statistical model from sample documents or a text description for data generation.' It specifies the verb ('create'), resource ('statistical model'), and sources ('sample documents or a text description'), making the action concrete. However, it doesn't explicitly differentiate from sibling tools like 'trainModel' or 'queryModel', which prevents a score of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, such as needing sample documents or a description, nor does it compare to siblings like 'trainModel' (which might involve training an existing model) or 'queryModel' (which might use a model). Without any context for selection, the score reflects minimal guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/smallmindsco/MongTap'
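
The same request can be made from JavaScript. This is a rough sketch that assumes a runtime with a global `fetch` (for example Node.js 18 or later) and assumes the endpoint returns JSON; the response shape is not described here, so the result is returned as-is. The `serverUrl` and `fetchServerInfo` helpers are hypothetical names.

```javascript
// Base URL taken from the curl example above.
const API_BASE = 'https://glama.ai/api/mcp/v1/servers';

// Hypothetical helper: builds the URL for one server entry.
function serverUrl(owner, repo) {
    return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Hypothetical helper: performs the GET request and parses the body as JSON.
async function fetchServerInfo(owner, repo) {
    const res = await fetch(serverUrl(owner, repo));
    if (!res.ok) {
        throw new Error(`Request failed with status ${res.status}`);
    }
    return res.json(); // assumption: the API responds with JSON
}

console.log(serverUrl('smallmindsco', 'MongTap'));
```

Calling `fetchServerInfo('smallmindsco', 'MongTap')` would issue the request shown in the curl command.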

If you have feedback or need assistance with the MCP directory API, please join our Discord server.