Skip to main content
Glama
spec.md7.43 kB
# Feature Specification: Fake Data Generation MCP Server **Feature Branch**: `001-faker-mcp-server` **Created**: November 5, 2025 **Status**: Draft **Input**: User description: "Build an MCP server that can be used to generate Fake or Mock data using faker in typescript" ## User Scenarios & Testing *(mandatory)* ### User Story 1 - Basic Data Generation (Priority: P1) A developer working on an application needs realistic test data for their database or API testing. They connect to the Faker MCP server and request generation of common data types like names, emails, addresses, and phone numbers to populate their test environment. **Why this priority**: Core functionality that provides immediate value - developers can generate basic test data without writing custom scripts or managing faker libraries themselves. **Independent Test**: Can be fully tested by connecting to the MCP server, requesting generation of basic data types (name, email, address), and receiving properly formatted fake data that matches expected patterns. **Acceptance Scenarios**: 1. **Given** the MCP server is running, **When** a client requests 10 fake user names, **Then** the server returns 10 unique, realistic names in the requested format 2. **Given** the MCP server is running, **When** a client requests fake email addresses, **Then** the server returns valid email format addresses that are unique 3. **Given** the MCP server is running, **When** a client requests fake addresses, **Then** the server returns complete address objects with street, city, state, and postal code --- ### User Story 2 - Structured Dataset Generation (Priority: P2) A developer needs to generate complete datasets with related entities (e.g., users with profiles, orders with customer information) for integration testing or demo purposes. They specify a schema and receive a structured dataset with consistent relationships between entities. **Why this priority**: Builds on basic functionality to provide more sophisticated testing scenarios that reflect real-world application data structures. **Independent Test**: Can be tested by defining a multi-entity schema (users + orders), requesting dataset generation, and verifying that returned data maintains referential integrity and realistic relationships. **Acceptance Scenarios**: 1. **Given** a schema defining user and order entities, **When** requesting generation of 50 users with 200 orders, **Then** the server returns structured data where each order references a valid user ID 2. **Given** a complex schema with nested relationships, **When** requesting dataset generation, **Then** all foreign key relationships are maintained correctly --- ### User Story 3 - Custom Data Patterns (Priority: P3) A developer working with domain-specific applications needs fake data that follows custom patterns or business rules (e.g., product codes with specific formats, industry-specific terminology). They can define custom generators or patterns and receive data that matches their application's specific requirements. **Why this priority**: Provides flexibility for specialized use cases, making the tool valuable for diverse development contexts beyond standard web applications. **Independent Test**: Can be tested by defining custom data patterns, requesting generation, and verifying that returned data follows the specified custom rules and formats. **Acceptance Scenarios**: 1. **Given** a custom pattern for product SKUs, **When** requesting 100 product records, **Then** all SKUs follow the specified pattern exactly 2. **Given** industry-specific field requirements, **When** requesting specialized data, **Then** generated values are contextually appropriate for the specified industry --- ### Edge Cases - What happens when requesting generation of extremely large datasets (10,000+ records)? - How does the system handle invalid or malformed schema definitions? - What occurs when requesting data types that aren't supported by the faker library? - How does the system respond to concurrent requests from multiple clients? - What happens when memory limits are approached during large data generation? ## Requirements *(mandatory)* ### Functional Requirements - **FR-001**: System MUST provide MCP-compliant server interface for client connections - **FR-002**: System MUST generate fake data for common types (names, emails, addresses, phone numbers, dates) - **FR-003**: System MUST support batch generation of multiple records in a single request - **FR-004**: System MUST allow specification of data schemas for structured dataset generation - **FR-005**: System MUST maintain referential integrity in multi-entity datasets - **FR-006**: System MUST provide configurable localization for generated data with support for English by default and additional locales as requested - **FR-007**: System MUST validate input schemas and return appropriate error messages for invalid requests - **FR-008**: System MUST support custom data patterns and formats for specialized use cases - **FR-009**: System MUST handle concurrent client connections without data corruption - **FR-010**: System MUST provide consistent seed-based generation for reproducible test data, accepting client-provided seeds or generating automatic seeds when none specified ### Key Entities - **DataGenerator**: Represents the core generation engine, manages faker instances and handles different data types - **Schema**: Defines the structure and relationships for complex dataset generation, includes field types and constraints - **GenerationRequest**: Represents a client request with parameters like count, type, schema, and optional seed - **GeneratedDataset**: Contains the generated fake data with metadata like generation timestamp and seed used ## Success Criteria *(mandatory)* ### Measurable Outcomes - **SC-001**: Developers can generate 1,000 basic records (names, emails, addresses) in under 2 seconds - **SC-002**: System maintains 99.9% uptime during continuous operation for 24 hours - **SC-003**: Generated data passes format validation for 100% of standard data types (email, phone, address formats) - **SC-004**: Developers can connect and start generating data within 5 minutes of server startup - **SC-005**: System handles 10 concurrent client connections without performance degradation - **SC-006**: Memory usage remains stable during generation of datasets up to 10,000 records - **SC-007**: 95% of developers can successfully generate their first dataset without consulting documentation ## Assumptions - **Localization**: English locale will be the primary default, with additional locales (French, German, Spanish, Japanese) available as common alternatives for international development teams - **Seed Management**: Client-provided seeds enable reproducible testing scenarios, while server-generated seeds provide convenience for one-time data generation needs - **Data Volume**: Most use cases involve datasets under 10,000 records; larger datasets will be handled with appropriate memory management and streaming techniques - **Client Environment**: Clients connecting to the MCP server have basic understanding of structured data concepts and can provide valid schema definitions - **Performance Expectations**: Response times under 2 seconds for standard operations align with typical developer workflow expectations during testing and development

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/funsjanssen/faker-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server