# Data Model: Fake Data Generation MCP Server

**Date**: 2025-11-05
**Feature**: 001-faker-mcp-server

## Overview

This document defines the data entities, types, and relationships for the Faker MCP server. Since this is a stateless data generation service, the data model focuses on request/response structures, generation configurations, and output data schemas rather than persistent storage entities.

---

## Core Entities

### 1. GenerationRequest

Represents a client request for fake data generation.

**Fields**:
- `toolName` (string, required): Name of the MCP tool being invoked
- `parameters` (object, required): Tool-specific parameters (validated against tool schema)
- `requestId` (string, auto-generated): Unique identifier for tracking requests
- `timestamp` (ISO 8601 string, auto-generated): Request timestamp

**Relationships**:
- Has one `GenerationResponse` (output)

**Validation Rules**:
- `toolName` must be one of: `generate-person`, `generate-company`, `generate-dataset`, `generate-custom`
- `parameters` must conform to tool-specific JSON Schema
- Request size must not exceed 1MB

**State Transitions**: N/A (stateless)

---

### 2. GenerationResponse

Represents the result of a data generation request.

**Fields**:
- `content` (array of objects, required): Generated fake data records
- `metadata` (object, required): Response metadata
  - `count` (number): Number of records generated
  - `seed` (number, optional): Seed used for generation (if specified or auto-generated)
  - `locale` (string): Locale used for data generation
  - `generationTimeMs` (number): Time taken to generate data in milliseconds
- `isError` (boolean): Whether response represents an error
- `text` (string): Human-readable summary of generated data

**Relationships**:
- Belongs to one `GenerationRequest` (input)

**Validation Rules**:
- `content` array must not exceed 10,000 elements
- `count` must match actual number of records in `content`
- `generationTimeMs` must be positive number

---

### 3. PersonData

Represents generated data for a person entity.

**Fields**:
- `id` (string, required): Unique identifier (format: `person_${seed}_${index}`)
- `firstName` (string, required): Person's first name
- `lastName` (string, required): Person's last name
- `fullName` (string, required): Complete name
- `email` (string, required): Email address (format validated)
- `phone` (string, optional): Phone number
- `dateOfBirth` (ISO 8601 string, optional): Date of birth
- `address` (object, optional): Address information
  - `street` (string): Street address
  - `city` (string): City name
  - `state` (string): State/province
  - `postalCode` (string): Postal/ZIP code
  - `country` (string): Country name

**Validation Rules**:
- `email` must match regex: `^[^\s@]+@[^\s@]+\.[^\s@]+$`
- `phone` must be valid phone format for specified locale
- `dateOfBirth` if provided must be in past and reasonable (18-100 years ago by default)
- All string fields must be non-empty and ≤ 255 characters

**Example**:
```json
{
  "id": "person_12345_0",
  "firstName": "John",
  "lastName": "Doe",
  "fullName": "John Doe",
  "email": "john.doe@example.com",
  "phone": "+1-555-123-4567",
  "dateOfBirth": "1985-06-15",
  "address": {
    "street": "123 Main St",
    "city": "Springfield",
    "state": "IL",
    "postalCode": "62701",
    "country": "United States"
  }
}
```

---

### 4. CompanyData

Represents generated data for a company entity.

**Fields**:
- `id` (string, required): Unique identifier (format: `company_${seed}_${index}`)
- `name` (string, required): Company name
- `industry` (string, required): Industry/sector
- `email` (string, required): Company email address
- `phone` (string, optional): Company phone number
- `website` (string, optional): Company website URL
- `address` (object, optional): Company address (same structure as PersonData.address)
- `founded` (number, optional): Year founded (1900-current year)
- `employeeCount` (number, optional): Number of employees

**Validation Rules**:
- `name` must be non-empty and ≤ 255 characters
- `email` must match email format regex
- `website` must be valid URL if provided
- `founded` must be between 1900 and current year
- `employeeCount` must be positive integer if provided

**Example**:
```json
{
  "id": "company_12345_0",
  "name": "Acme Corporation",
  "industry": "Technology",
  "email": "contact@acme.example.com",
  "phone": "+1-555-987-6543",
  "website": "https://acme.example.com",
  "address": {
    "street": "456 Business Blvd",
    "city": "San Francisco",
    "state": "CA",
    "postalCode": "94105",
    "country": "United States"
  },
  "founded": 2010,
  "employeeCount": 250
}
```

---

### 5. DatasetSchema

Defines the structure for multi-entity dataset generation.

**Fields**:
- `entities` (object, required): Map of entity names to entity definitions
  - `[entityName]` (object): Entity definition
    - `count` (number, required): Number of records to generate (1-10,000)
    - `type` (string, required): Entity type (`person`, `company`, or `custom`)
    - `fields` (array of strings, optional): Specific fields to include (defaults to all)
    - `relationships` (object, optional): Foreign key relationships
      - `[fieldName]` (object): Relationship definition
        - `references` (string, required): Name of parent entity
        - `type` (string, required): Relationship type (`one-to-many`, `many-to-many`)
        - `nullable` (boolean, optional): Whether FK can be null (default: false)

**Validation Rules**:
- Must define at least one entity
- Total record count across all entities must not exceed 10,000
- Referenced entities in relationships must exist in schema
- Relationship references must point to entities defined before the current entity (no circular dependencies in generation order)
- Entity names must be valid identifiers (alphanumeric + underscore)

**Example**:
```json
{
  "entities": {
    "users": {
      "count": 100,
      "type": "person",
      "fields": ["id", "fullName", "email", "phone"]
    },
    "orders": {
      "count": 500,
      "type": "custom",
      "fields": ["id", "userId", "productName", "price", "orderDate"],
      "relationships": {
        "userId": {
          "references": "users",
          "type": "one-to-many",
          "nullable": false
        }
      }
    }
  }
}
```

---

### 6. CustomPattern

Defines custom data generation patterns for specialized fields.

**Fields**:
- `fieldName` (string, required): Name of the field to generate
- `pattern` (object, required): Pattern definition
  - `type` (string, required): Pattern type (`regex`, `enum`, `format`, `range`)
  - `value` (string | array | object, required): Type-specific pattern value
    - For `regex`: Regular expression string
    - For `enum`: Array of possible values
    - For `format`: Format string with placeholders
    - For `range`: Object with `min` and `max` properties

**Validation Rules**:
- `fieldName` must be valid identifier
- `type` must be one of supported pattern types
- `value` must be valid for specified type
- Regex patterns must be valid and not catastrophically backtracking
- Enum arrays must have at least one element
- Range min must be less than max

**Examples**:
```json
// Regex pattern
{
  "fieldName": "productCode",
  "pattern": {
    "type": "regex",
    "value": "PRD-[0-9]{4}-[A-Z]{2}"
  }
}

// Enum pattern
{
  "fieldName": "status",
  "pattern": {
    "type": "enum",
    "value": ["pending", "active", "completed", "cancelled"]
  }
}

// Format pattern
{
  "fieldName": "reference",
  "pattern": {
    "type": "format",
    "value": "REF-{{year}}-{{random:5}}"
  }
}

// Range pattern
{
  "fieldName": "age",
  "pattern": {
    "type": "range",
    "value": { "min": 18, "max": 65 }
  }
}
```

---

### 7. SeedConfiguration

Manages seed values for reproducible data generation.

**Fields**:
- `seed` (number, optional): User-provided seed value
- `autoGenerated` (boolean): Whether seed was auto-generated
- `seedString` (string, optional): Alternative string seed (hashed to number)

**Validation Rules**:
- If provided, `seed` must be safe integer (Number.isSafeInteger)
- `seedString` if provided must be non-empty and ≤ 100 characters
- Only one of `seed` or `seedString` should be provided

**Behavior**:
- If no seed provided: auto-generate from current timestamp
- If seedString provided: hash to consistent numeric seed
- Seed is propagated to all Faker instances in the request

---

## Type Hierarchies

### GeneratedData (Union Type)

```typescript
type GeneratedData = PersonData | CompanyData | CustomData;
```

### EntityType (Enum)

```typescript
enum EntityType {
  PERSON = 'person',
  COMPANY = 'company',
  CUSTOM = 'custom'
}
```

### RelationshipType (Enum)

```typescript
enum RelationshipType {
  ONE_TO_MANY = 'one-to-many',
  MANY_TO_MANY = 'many-to-many'
}
```

### PatternType (Enum)

```typescript
enum PatternType {
  REGEX = 'regex',
  ENUM = 'enum',
  FORMAT = 'format',
  RANGE = 'range'
}
```

### SupportedLocale (Enum)

```typescript
enum SupportedLocale {
  EN = 'en',
  FR = 'fr',
  DE = 'de',
  ES = 'es',
  JA = 'ja'
}
```

---

## Data Flow Diagrams

### Single Entity Generation Flow

```
Client Request
    ↓
[Validate Parameters] → [Error Response] (if invalid)
    ↓
[Generate Seed] (if not provided)
    ↓
[Initialize Faker Instance]
    ↓
[Generate N Records] (batched for large N)
    ↓
[Build Response with Metadata]
    ↓
Response to Client
```

### Multi-Entity Dataset Generation Flow

```
Client Request with Schema
    ↓
[Validate Schema] → [Error Response] (if invalid)
    ↓
[Sort Entities by Dependencies] → [Error Response] (if circular)
    ↓
[Generate Seed] (if not provided)
    ↓
[For Each Entity in Order]:
    ↓
    [Initialize Entity Generator]
    ↓
    [Generate Records with FK References]
    ↓
    [Store IDs for Child Relationships]
    ↓
[Build Dataset Response]
    ↓
Response to Client
```

---

## Validation Matrix

| Entity | Field | Validation | Error Code |
|--------|-------|------------|------------|
| GenerationRequest | toolName | Must be known tool | -32602 |
| GenerationRequest | parameters | Must match tool schema | -32602 |
| PersonData | email | Must match email regex | -32602 |
| PersonData | phone | Must be valid for locale | -32602 |
| CompanyData | website | Must be valid URL | -32602 |
| CompanyData | founded | Must be 1900-current year | -32602 |
| DatasetSchema | entities | Must have ≥1 entity | -32602 |
| DatasetSchema | count | Total ≤ 10,000 records | -32602 |
| DatasetSchema | relationships | No circular dependencies | -32602 |
| CustomPattern | regex | Must be valid regex | -32602 |
| SeedConfiguration | seed | Must be safe integer | -32602 |

---

## Performance Considerations

### Memory Estimates (per record)

- PersonData (full): ~500 bytes
- PersonData (minimal): ~200 bytes
- CompanyData (full): ~600 bytes
- CompanyData (minimal): ~250 bytes

### Batch Sizes

- 1-1,000 records: Single batch, no streaming
- 1,001-5,000 records: Batch size 1,000
- 5,001-10,000 records: Batch size 1,000 with GC hints

### Memory Limits

- Target: <100MB total memory usage
- Reserve: 20MB for overhead and temporary buffers
- Per-record budget: ~8KB (conservative estimate for 10k records)

---

## Extension Points

Future enhancements may include:

1. **Additional Entity Types**: Product, Event, Transaction, etc.
2. **Advanced Patterns**: Weighted distributions, conditional generation, template-based composition
3. **Export Formats**: CSV, SQL INSERT statements, Parquet files
4. **Streaming Responses**: Server-sent events for very large datasets (>10k records)
5. **Custom Faker Plugins**: User-defined data generators

---

**Data Model Version**: 1.0.0
**Last Updated**: 2025-11-05
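
---

## Appendix: DatasetSchema Validation Sketch

The DatasetSchema validation rules listed under Core Entities can be illustrated with a short TypeScript sketch. This is only one possible reading of those rules, not the server's actual implementation: the interface names, the `validateDatasetSchema` function, and the error strings are hypothetical, and the "defined before" rule is checked against the declaration order of keys in `entities`, which is an assumption.

```typescript
// Hypothetical types mirroring the DatasetSchema fields described in this document.
interface RelationshipDef {
  references: string;
  type: "one-to-many" | "many-to-many";
  nullable?: boolean;
}

interface EntityDef {
  count: number;
  type: "person" | "company" | "custom";
  fields?: string[];
  relationships?: Record<string, RelationshipDef>;
}

interface DatasetSchema {
  entities: Record<string, EntityDef>;
}

const MAX_TOTAL_RECORDS = 10_000;

// Returns a list of problems; an empty list means every rule passed.
// Assumption: "defined before" means earlier in the key order of `entities`.
// Note that purely numeric names would be reordered by JavaScript engines.
function validateDatasetSchema(schema: DatasetSchema): string[] {
  const errors: string[] = [];
  const entries = Object.entries(schema.entities);
  const known = new Set(entries.map(([name]) => name));
  const seen = new Set<string>();
  let total = 0;

  if (entries.length === 0) {
    errors.push("schema must define at least one entity");
  }

  for (const [name, entity] of entries) {
    if (!/^[A-Za-z0-9_]+$/.test(name)) {
      errors.push(`entity name "${name}" is not a valid identifier`);
    }
    if (!Number.isInteger(entity.count) || entity.count < 1 || entity.count > MAX_TOTAL_RECORDS) {
      errors.push(`${name}.count must be an integer between 1 and 10,000`);
    }
    total += entity.count;

    for (const [field, rel] of Object.entries(entity.relationships ?? {})) {
      if (!known.has(rel.references)) {
        errors.push(`${name}.${field} references unknown entity "${rel.references}"`);
      } else if (!seen.has(rel.references)) {
        errors.push(`${name}.${field} references "${rel.references}", which is not defined before "${name}"`);
      }
    }
    seen.add(name);
  }

  if (total > MAX_TOTAL_RECORDS) {
    errors.push(`total record count ${total} exceeds ${MAX_TOTAL_RECORDS}`);
  }
  return errors;
}

// Usage with the users/orders example from the DatasetSchema section.
const example: DatasetSchema = {
  entities: {
    users: { count: 100, type: "person", fields: ["id", "fullName", "email", "phone"] },
    orders: {
      count: 500,
      type: "custom",
      fields: ["id", "userId", "productName", "price", "orderDate"],
      relationships: { userId: { references: "users", type: "one-to-many", nullable: false } },
    },
  },
};
console.log(validateDatasetSchema(example)); // → []
```

Collecting all problems rather than throwing on the first one is a design choice made for this sketch; a server following the Validation Matrix would presumably map any non-empty result to error code -32602.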
