MCP Server Whisper

architecture.md•10.3 KiB

# MCP Server Whisper Architecture ## Overview MCP Server Whisper is an MCP-compatible server that provides audio transcription and processing capabilities using OpenAI's Whisper and GPT-4o models. It follows a **layered architecture** based on Domain-Driven Design principles with clean separation of concerns. ``` ┌─────────────────┐ │ MCP Client │ (e.g., Claude Desktop) │ (Claude) │ └────────┬────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ MCP Server (FastMCP) │ │ server.py (19 lines) │ └────────┬────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Tools Layer │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ file_tools │ │ audio_tools │ │transcription │ │ │ │ │ │ │ │ tools │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ └────────┬────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Services Layer │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ FileService │ │AudioService │ │ Transcription│ │ │ │ │ │ │ │ Service │ │ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ └─────────┼──────────────────┼──────────────────┼─────────┘ │ │ │ ▼ ▼ ▼ ┌─────────────────────────────────────────────────────────┐ │ Domain Layer │ │ ┌──────────────────┐ ┌──────────────────┐ │ │ │ FileFilterSorter │ │ AudioProcessor │ │ │ │ (pure logic) │ │ (pure logic) │ │ │ └──────────────────┘ └──────────────────┘ │ └─────────────────────────────────────────────────────────┘ │ │ ▼ ▼ ┌─────────────────────────────────────────────────────────┐ │ Infrastructure Layer │ │ ┌──────────────────┐ ┌──────────────────────────┐ │ │ │FileSystemRepo │ │ OpenAIClientWrapper │ │ │ │(File I/O) │ │ (API Integration) │ │ │ └────────┬─────────┘ └──────────┬───────────────┘ │ └───────────┼────────────────────────┼───────────────────┘ │ │ ▼ ▼ ┌────────────────┐ ┌────────────────┐ │ Audio Storage │ │ OpenAI API │ │ (File System) │ │ │ └────────────────┘ └────────────────┘ ``` ## Layered Architecture ### 1. MCP Server Layer (server.py - 19 lines) - Minimal FastMCP server initialization - Tool registration only - No business logic ### 2. Tools Layer (tools/) - **file_tools.py**: File management MCP tools - **audio_tools.py**: Audio conversion/compression MCP tools - **transcription_tools.py**: Transcription MCP tools - **tts_tools.py**: Text-to-speech MCP tools - Thin layer: validation + service orchestration + response formatting ### 3. Services Layer (services/) - **FileService**: File discovery, filtering, sorting orchestration - **AudioService**: Audio conversion/compression workflows - **TranscriptionService**: Transcription API coordination - **TTSService**: Text-to-speech generation workflows - Orchestrates domain logic + infrastructure ### 4. Domain Layer (domain/) - **AudioProcessor**: Pure audio processing algorithms (conversion, compression, concatenation) - **FileFilterSorter**: Pure filtering and sorting logic - **Zero external dependencies** - only business rules - Highly testable and reusable ### 5. Infrastructure Layer (infrastructure/) - **FileSystemRepository**: All file I/O operations - **OpenAIClientWrapper**: OpenAI API integration - **AudioFileCache**: LRU caching for metadata - Abstracts external dependencies ### 6. Supporting Modules - **models/**: Pydantic models for data validation - **constants.py**: All constants, enums, type aliases - **exceptions.py**: Custom exception hierarchy - **config.py**: Configuration management - **utils/**: Pure utility functions ## Data Flow Example: Transcribe Audio 1. **MCP Client Request** → Claude calls `transcribe_audio` tool 2. **Tools Layer** → `transcription_tools.py` validates input parameters 3. **Services Layer** → `TranscriptionService` orchestrates the workflow: - Uses `FileSystemRepository` to read audio file - Uses `OpenAIClientWrapper` to call API 4. **Infrastructure Layer** → Executes I/O and API calls: - `FileSystemRepository.read_audio_file()` → async file read - `OpenAIClientWrapper.transcribe_audio()` → API call to OpenAI 5. **Response** → Flows back up: Infrastructure → Service → Tool → MCP → Client ## Architectural Benefits ### Separation of Concerns - Each layer has a single, well-defined responsibility - Changes in one layer don't affect others - Easy to understand and navigate ### Testability - **Domain Layer**: Pure functions, no mocking needed - **Infrastructure Layer**: Easy to mock external dependencies - **Services Layer**: Test orchestration logic with mocked infrastructure - **Tools Layer**: Integration tests with real services ### Maintainability - **server.py**: Reduced from 892 → 19 lines (97.6% reduction) - **Clear boundaries**: Each module has a specific purpose - **Type safety**: Full mypy compliance with native types - **Extensibility**: Add new features without modifying existing code ### Performance - **Structured Concurrency**: Task groups using anyio for safe parallel processing - **LRU Caching**: Smart caching with mtime-based invalidation - **Async I/O**: Non-blocking file operations with aiofiles - **Concurrent API Calls**: Parallel OpenAI requests with result collection ## Project Structure ``` src/mcp_server_whisper/ ├── server.py # MCP server (19 lines) ├── config.py # Configuration management ├── constants.py # Constants & enums ├── exceptions.py # Custom exceptions │ ├── models/ # Pydantic models (4 files) │ ├── base.py │ ├── audio.py │ ├── transcription.py │ └── tts.py │ ├── domain/ # Business logic (2 files) │ ├── audio_processor.py │ └── file_filter.py │ ├── infrastructure/ # External dependencies (4 files) │ ├── file_system.py │ ├── openai_client.py │ └── cache.py │ ├── services/ # Orchestration (4 files) │ ├── file_service.py │ ├── audio_service.py │ ├── transcription_service.py │ └── tts_service.py │ ├── tools/ # MCP tools (4 files) │ ├── file_tools.py │ ├── audio_tools.py │ ├── transcription_tools.py │ └── tts_tools.py │ └── utils/ # Utilities (1 file) └── text_utils.py ``` **Total: 31 files, 2,453 lines** (vs original: 1 file, 892 lines) ## Technology Stack - **Language**: Python 3.10+ - **Audio Processing**: pydub, audioop-lts (Python 3.13+) - **Async Framework**: anyio (structured concurrency), aiofiles - **Concurrency**: aioresult for result collection from task groups - **MCP Framework**: FastMCP - **API Integration**: OpenAI Python client - **Data Validation**: Pydantic models - **Configuration**: pydantic-settings - **Type Checking**: mypy (100% compliant, native types only) - **Code Quality**: ruff for linting and formatting ## Design Patterns - **Repository Pattern**: FileSystemRepository abstracts file I/O - **Dependency Injection**: Services receive dependencies via constructor - **Strategy Pattern**: Enhancement templates for different transcription styles - **Facade Pattern**: Services provide simple interfaces to complex operations - **Singleton Pattern**: Configuration and cache instances

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/arcaputo3/mcp-server-whisper'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

architecture.md•10.3 KiB