Skip to main content
Glama
research.md16.6 kB
# Research Document: Fake Data Generation MCP Server **Date**: 2025-11-05 **Feature**: 001-faker-mcp-server ## Overview This document consolidates research findings for implementing a TypeScript-based MCP server that provides fake data generation capabilities using the Faker library. Research covers MCP protocol implementation patterns, Faker.js best practices, TypeScript project setup with Vite, and performance optimization strategies. --- ## 1. MCP Protocol Implementation ### Decision: Use @modelcontextprotocol/sdk for TypeScript **Rationale**: - Official SDK maintained by Anthropic provides type-safe MCP protocol implementation - Handles transport layer (stdio, SSE) abstraction automatically - Built-in support for tool registration, request/response validation - Includes TypeScript definitions for all MCP protocol types - Reduces boilerplate and protocol compliance errors **Alternatives Considered**: - **Custom implementation from scratch**: Rejected due to high complexity of implementing full MCP protocol specification, increased risk of protocol violations, and significant development time - **Generic WebSocket/RPC library**: Rejected because MCP has specific message formats, capability negotiation, and error handling patterns that would require extensive customization **Key Implementation Patterns**: 1. **Server Setup**: Use `Server` class from SDK with `StdioServerTransport` for CLI integration 2. **Tool Registration**: Define tools with JSON Schema validation for parameters 3. **Error Handling**: Use MCP error codes (-32600 to -32603, -32000 to -32099) for standard errors 4. **Resource Management**: MCP tools are stateless; use seed-based generation for reproducibility **References**: - MCP Protocol Specification: https://spec.modelcontextprotocol.io/ - MCP TypeScript SDK: https://github.com/modelcontextprotocol/typescript-sdk --- ## 2. Faker.js Library Usage ### Decision: Use @faker-js/faker v8+ with locale management **Rationale**: - Industry-standard fake data generation library with extensive data types - Type-safe TypeScript API with comprehensive type definitions - Supports 70+ locales for internationalized data generation - Provides seed-based generation for reproducible test data - Active maintenance and regular updates - Minimal dependencies (zero runtime dependencies) **Alternatives Considered**: - **Chance.js**: Rejected due to limited TypeScript support, smaller community, and less comprehensive data type coverage - **Casual**: Rejected due to abandoned maintenance (last update 2016) and lack of TypeScript support - **Custom generators**: Rejected for common data types due to maintenance burden and lower quality/realism compared to Faker **Best Practices**: 1. **Seed Management**: Create separate Faker instances per request with optional seed parameter for reproducibility 2. **Locale Handling**: Default to 'en' locale, allow clients to specify alternative locales via tool parameters 3. **Performance**: Reuse Faker instances when possible, but create new instances for seed-based generation 4. **Data Quality**: Use Faker's semantic methods (faker.person.fullName()) over generic methods (faker.random.word()) **Key API Patterns**: ```typescript import { faker } from '@faker-js/faker'; // Seed-based generation for reproducibility const seededFaker = new Faker({ locale: en, seed: 12345 }); // Common data types faker.person.fullName() faker.internet.email() faker.location.streetAddress() faker.phone.number() faker.company.name() faker.date.between({ from: '2020-01-01', to: '2024-12-31' }) ``` **References**: - Faker.js Documentation: https://fakerjs.dev/ - Faker.js GitHub: https://github.com/faker-js/faker --- ## 3. TypeScript + Vite Project Setup ### Decision: Use Vite with minimal configuration for TypeScript compilation **Rationale**: - Vite provides fast development builds with native ESM support - Simple configuration compared to Webpack or Rollup - Built-in TypeScript support without additional plugins - Lightweight output suitable for Node.js server applications - Aligns with user requirement for "minimal number of libraries" **Alternatives Considered**: - **tsc (TypeScript compiler) only**: Rejected because Vite provides better development experience with HMR and faster incremental builds - **Webpack**: Rejected due to heavy configuration overhead and slower build times - **esbuild directly**: Rejected because Vite provides better defaults and tooling integration while using esbuild under the hood **Configuration Strategy**: 1. **TypeScript**: Enable strict mode, target ES2022 for Node.js 18+ compatibility 2. **Vite**: Use library mode for building Node.js executable 3. **Module System**: Use ES modules (type: "module" in package.json) 4. **Build Output**: Single executable file for distribution **Sample vite.config.ts**: ```typescript import { defineConfig } from 'vite'; import { resolve } from 'path'; export default defineConfig({ build: { lib: { entry: resolve(__dirname, 'src/index.ts'), formats: ['es'], fileName: 'index' }, rollupOptions: { external: [/^node:.*/, '@modelcontextprotocol/sdk', '@faker-js/faker'] }, target: 'node18', outDir: 'dist' } }); ``` **References**: - Vite Library Mode: https://vitejs.dev/guide/build.html#library-mode - TypeScript Configuration: https://www.typescriptlang.org/tsconfig --- ## 4. Testing Strategy ### Decision: Use Vitest for unit/integration tests, MCP Inspector for protocol compliance **Rationale**: - **Vitest**: Native Vite integration, fast execution, Jest-compatible API, built-in TypeScript support - **MCP Inspector**: Official tool for validating MCP protocol compliance and testing tools interactively - Minimal testing dependencies while providing comprehensive coverage **Alternatives Considered**: - **Jest**: Rejected due to ESM module handling complexity and slower execution compared to Vitest - **Mocha + Chai**: Rejected due to additional configuration needed for TypeScript and less modern API - **Manual MCP testing**: Rejected because MCP Inspector provides standardized validation and debugging capabilities **Testing Layers**: 1. **Unit Tests**: Test individual generators and utilities in isolation 2. **Integration Tests**: Test MCP tool handlers with mock SDK calls 3. **Contract Tests**: Validate MCP protocol compliance using MCP Inspector 4. **Performance Tests**: Benchmark data generation throughput and memory usage **Sample Test Structure**: ```typescript import { describe, it, expect } from 'vitest'; import { PersonGenerator } from '../src/generators/person-generator'; describe('PersonGenerator', () => { it('generates person with all required fields', () => { const generator = new PersonGenerator({ seed: 123 }); const person = generator.generate(); expect(person).toHaveProperty('name'); expect(person).toHaveProperty('email'); expect(person.email).toMatch(/^[^\s@]+@[^\s@]+\.[^\s@]+$/); }); }); ``` **References**: - Vitest Documentation: https://vitest.dev/ - MCP Inspector: https://github.com/modelcontextprotocol/inspector --- ## 5. Schema Definition & Validation ### Decision: Use TypeScript types + Zod for runtime validation **Rationale**: - TypeScript provides compile-time type safety for internal code - Zod enables runtime validation of client inputs with automatic type inference - MCP SDK expects JSON Schema for tool parameters; Zod can generate JSON Schema - Minimal additional dependency (Zod is ~14KB minified) **Alternatives Considered**: - **JSON Schema only**: Rejected because maintaining separate JSON Schema and TypeScript types leads to duplication and drift - **Joi**: Rejected due to larger bundle size and less TypeScript integration - **Manual validation**: Rejected because error-prone and high maintenance burden - **AJV**: Rejected because Zod provides better TypeScript integration and developer experience **Implementation Pattern**: ```typescript import { z } from 'zod'; // Define schema with Zod const PersonSchema = z.object({ count: z.number().min(1).max(10000).default(1), locale: z.enum(['en', 'fr', 'de', 'es', 'ja']).default('en'), seed: z.number().optional(), includeAddress: z.boolean().default(true) }); // Extract TypeScript type type PersonRequest = z.infer<typeof PersonSchema>; // Runtime validation const validatePersonRequest = (input: unknown): PersonRequest => { return PersonSchema.parse(input); }; // Convert to JSON Schema for MCP tool definition import { zodToJsonSchema } from 'zod-to-json-schema'; const jsonSchema = zodToJsonSchema(PersonSchema); ``` **References**: - Zod Documentation: https://zod.dev/ - zod-to-json-schema: https://github.com/StefanTerdell/zod-to-json-schema --- ## 6. Performance Optimization ### Decision: Implement batch generation with memory pooling and streaming for large datasets **Rationale**: - Generating 10,000 records in memory could exceed 100MB constraint - Batch processing allows for memory-efficient generation - Seed-based generation must be deterministic regardless of batch size **Key Strategies**: 1. **Memory Management**: Generate records in batches of 1000, clear intermediate results 2. **Caching**: Reuse Faker instances within a single request (but not across requests for isolation) 3. **Lazy Evaluation**: Generate data on-demand rather than pre-computing 4. **Profiling**: Use Node.js built-in profiler to identify bottlenecks **Performance Targets**: - Single record: <1ms generation time - 1,000 records: <1 second total time - 10,000 records: <10 seconds total time - Memory: <100MB for 10,000 records **Optimization Techniques**: ```typescript // Batch generation with memory management function* generateBatch(count: number, batchSize: number = 1000) { const batches = Math.ceil(count / batchSize); for (let i = 0; i < batches; i++) { const currentBatchSize = Math.min(batchSize, count - i * batchSize); const batch = Array.from({ length: currentBatchSize }, () => generateRecord()); yield batch; // Allow GC between batches for large datasets if (count > 5000) { setImmediate(() => {}); } } } ``` **References**: - Node.js Performance Best Practices: https://nodejs.org/en/docs/guides/simple-profiling - V8 Memory Management: https://v8.dev/blog/trash-talk --- ## 7. Referential Integrity for Multi-Entity Datasets ### Decision: Use ID pool management with foreign key tracking **Rationale**: - Structured datasets require relationships between entities (e.g., orders referencing users) - Must maintain referential integrity while using seed-based generation - Need to support various relationship types (one-to-many, many-to-many) **Implementation Approach**: 1. **ID Generation**: Generate unique IDs deterministically from seed 2. **Entity Order**: Generate parent entities before child entities 3. **Foreign Key Assignment**: Randomly select from available parent IDs for child entities 4. **Seed Propagation**: Derive entity-specific seeds from root seed for consistency **Pattern Example**: ```typescript interface DatasetSchema { users: { count: number; fields: string[] }; orders: { count: number; fields: string[]; foreignKeys: { userId: 'users' } }; } function generateDataset(schema: DatasetSchema, seed: number) { const userIds: string[] = []; // Generate users first const users = generateWithSeed(schema.users.count, seed, (index) => { const id = `user_${seed}_${index}`; userIds.push(id); return { id, ...generateUser() }; }); // Generate orders with foreign key references const orders = generateWithSeed(schema.orders.count, seed + 1, (index) => { const userId = userIds[index % userIds.length]; // Distribute evenly return { id: `order_${seed}_${index}`, userId, ...generateOrder() }; }); return { users, orders }; } ``` --- ## 8. Error Handling & Validation ### Decision: Use MCP standard error codes with detailed error messages **Rationale**: - MCP protocol defines standard error code ranges for interoperability - Detailed error messages improve developer experience - Structured errors enable programmatic error handling by clients **Error Categories**: 1. **Invalid Parameters** (-32602): Schema validation failures, invalid data types 2. **Internal Error** (-32603): Unexpected failures in data generation 3. **Custom Errors** (-32000 to -32099): Domain-specific errors (e.g., unsupported locale, seed overflow) **Error Response Pattern**: ```typescript import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js'; // Invalid parameter example throw new McpError( ErrorCode.InvalidParams, 'Invalid count parameter: must be between 1 and 10000', { received: 50000, max: 10000 } ); // Unsupported locale example throw new McpError( -32001, // Custom error code `Unsupported locale: ${locale}. Supported locales: en, fr, de, es, ja`, { received: locale, supported: ['en', 'fr', 'de', 'es', 'ja'] } ); ``` --- ## 9. Development Workflow ### Decision: Use conventional commits with automated testing on commit **Rationale**: - Conventional commits enable automated changelog generation - Pre-commit testing prevents broken commits from entering history - Aligns with TDD principles from constitution **Git Workflow**: 1. Create feature branch from main 2. Write failing tests (Red) 3. Implement minimal code to pass (Green) 4. Refactor for quality 5. Commit with conventional commit message 6. Pre-commit hook runs linting + tests 7. Push and create PR for review **Tooling**: - **Husky**: Git hooks management - **lint-staged**: Run linters on staged files only - **Commitlint**: Validate commit messages **Sample Configuration**: ```json // package.json { "lint-staged": { "*.ts": ["eslint --fix", "prettier --write", "vitest related --run"] }, "commitlint": { "extends": ["@commitlint/config-conventional"] } } ``` --- ## 10. Deployment & Distribution ### Decision: Distribute as npm package with MCP server configuration **Rationale**: - npm is standard distribution method for Node.js applications - Users can install globally or locally per project - MCP clients can reference server via package name in configuration **Distribution Strategy**: 1. **Package Structure**: Include built JS files, type definitions, and README 2. **Binary Entry Point**: Define bin field in package.json for CLI execution 3. **Version Management**: Use semantic versioning 4. **MCP Configuration**: Provide example configuration for MCP clients (Claude Desktop, etc.) **Sample package.json**: ```json { "name": "faker-mcp-server", "version": "1.0.0", "type": "module", "bin": { "faker-mcp-server": "./dist/index.js" }, "files": ["dist", "README.md", "LICENSE"], "engines": { "node": ">=18.0.0" } } ``` **MCP Client Configuration Example**: ```json { "mcpServers": { "faker": { "command": "npx", "args": ["faker-mcp-server"] } } } ``` --- ## Summary of Key Decisions | Area | Decision | Key Rationale | |------|----------|---------------| | MCP Implementation | @modelcontextprotocol/sdk | Official SDK, type-safe, reduces protocol errors | | Data Generation | @faker-js/faker v8+ | Industry standard, 70+ locales, seed support | | Build Tool | Vite + TypeScript strict mode | Fast builds, minimal config, ESM support | | Testing | Vitest + MCP Inspector | Native Vite integration, protocol validation | | Validation | TypeScript + Zod | Compile-time + runtime safety, minimal deps | | Performance | Batch generation with pooling | Memory efficiency for large datasets | | Referential Integrity | ID pool with foreign key tracking | Maintains relationships in multi-entity data | | Error Handling | MCP standard codes + details | Protocol compliance, good DX | | Development | Conventional commits + pre-commit | TDD alignment, quality gates | | Distribution | npm package with CLI entry point | Standard Node.js distribution | --- ## Open Questions & Risks ### Open Questions None - all technical clarifications have been resolved through research. ### Identified Risks 1. **Memory constraints for very large datasets**: Mitigated by batch generation and documentation guidance 2. **Locale data quality variance**: Faker.js quality varies by locale; will document well-supported locales 3. **MCP protocol evolution**: Monitor MCP spec updates and SDK changes; pin SDK version initially --- **Research Complete**: 2025-11-05 **Next Phase**: Phase 1 - Design & Contracts

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/funsjanssen/faker-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server