mcp-json-yaml-toml

ARCHITECTURE.md•27.1 KiB

# Architecture Research: Loguru Logging & Enhanced Schema Validation **Domain:** MCP server for structured data manipulation (JSON/YAML/TOML) **Researched:** 2026-02-17 **Focus:** Integration of loguru logging replacement and enhanced schema validation into existing layered architecture **Confidence:** HIGH (codebase analysis + verified library documentation) ## Current Architecture Snapshot The codebase completed a v1.1 refactoring that decomposed a monolithic `server.py` into a layered architecture: ``` server.py ~84 lines Entry point: FastMCP init + tool imports tools/ 6 files Thin @mcp.tool decorators delegating to services services/ 7 files Business logic (get, mutation, query, diff, schema_validation, pagination) backends/ 3 files yq binary management + execution abstraction formats/ 2 files Format detection, content parsing, value parsing schemas/ 5 files Schema discovery, loading, scanning, IDE cache, manager facade models/ 2 files Pydantic response models, schema dataclasses config.py Environment-based format configuration telemetry.py OpenTelemetry tracer helper (no-op when SDK absent) lmql_constraints.py LMQL constraint validation ``` ### Current Logging State Three stdlib `logging` usage sites exist, all at DEBUG level: | File | Usage | Logger Pattern | | ---------------------------- | ------------------------------------------------------------------------------- | ------------------------------ | | `backends/binary_manager.py` | `logger = logging.getLogger(__name__)` with `.debug()`, `.info()`, `.warning()` | Module-level named logger | | `schemas/manager.py` | `logging.debug()` (module-level calls) | Direct `logging.debug()` calls | | `schemas/scanning.py` | `logging.debug()` (1 call) | Direct `logging.debug()` call | No centralized logging configuration exists. No log output is visible by default because no handlers are configured for the `mcp_json_yaml_toml` namespace. ### Current Schema Validation State Schema validation lives in `services/schema_validation.py`: - `_validate_against_schema(data, schema_path)` returns `tuple[bool, str]` - Uses `Draft7Validator` or `Draft202012Validator` based on `$schema` field - `referencing.Registry` with httpx retrieval for remote `$ref` resolution - Catches **first** `ValidationError` only, returns `e.message` as a string - Called from `_validate_and_write_content()` in mutation_operations.py (pre-write validation) - Called from `_handle_schema_validate()` in tools/schema.py (explicit validation action) ## Integration Analysis: Loguru Logging ### What Changes **NEW file:** `packages/mcp_json_yaml_toml/logging.py` -- centralized logging configuration This module provides: 1. Loguru configuration (remove default sink, add controlled sinks) 2. InterceptHandler for stdlib logging compatibility 3. Environment-variable-driven sink setup 4. A single `configure_logging()` function called from server startup **MODIFIED files (logging consumers):** | File | Current | After | | ---------------------------- | ------------------------------------------------ | -------------------------------------------------- | | `backends/binary_manager.py` | `import logging` + `logging.getLogger(__name__)` | `from loguru import logger` | | `schemas/manager.py` | `import logging` + `logging.debug(...)` | `from loguru import logger` | | `schemas/scanning.py` | `import logging` + `logging.debug(...)` | `from loguru import logger` | | `server.py` | No logging | Add `configure_logging()` call before tool imports | **NEW logging sites (should be added during integration):** | File | What to Log | Level | | --------------------------------- | ----------------------------------------------- | ------------- | | `backends/yq.py` | yq subprocess commands, execution times, errors | DEBUG/WARNING | | `services/schema_validation.py` | Validation results, schema load failures | DEBUG/WARNING | | `services/mutation_operations.py` | File write operations, optimization results | DEBUG | | `tools/data.py` | Tool invocations with file paths | DEBUG | | `config.py` | Format enable/disable decisions | DEBUG | ### Architecture of logging.py ```python """Centralized logging configuration for MCP JSON/YAML/TOML server. Provides loguru-based structured logging with: - Stdlib logging interception (captures httpx, jsonschema, etc.) - Environment-variable-driven configuration - MCP-safe defaults (stderr only, never stdout) """ from __future__ import annotations import logging import os import sys from loguru import logger class InterceptHandler(logging.Handler): """Route stdlib logging to loguru. Captures log messages from third-party libraries (httpx, jsonschema, opentelemetry) and routes them through loguru's unified pipeline. """ def emit(self, record: logging.LogRecord) -> None: # Get corresponding loguru level try: level = logger.level(record.levelname).name except ValueError: level = str(record.levelno) # Find caller from where the logged message originated frame, depth = logging.currentframe(), 0 while frame and (depth == 0 or frame.f_code.co_filename == logging.__file__): frame = frame.f_back depth += 1 logger.opt(depth=depth, exception=record.exc_info).log( level, record.getMessage() ) def configure_logging() -> None: """Configure logging for the MCP server. Environment variables: MCP_LOG_LEVEL: Minimum log level for stderr output (default: none/disabled) MCP_LOG_DIR: Directory for file logging with rotation (default: disabled) MCP_LOG_FORMAT: Log format string (default: loguru default) CRITICAL: Never writes to stdout -- MCP protocol uses stdout for messages. """ # Remove loguru's default stderr handler logger.remove() # Stderr logging (opt-in via MCP_LOG_LEVEL) log_level = os.getenv("MCP_LOG_LEVEL", "").strip().upper() if log_level: logger.add( sys.stderr, level=log_level, format="<level>{level: <8}</level> | " "<cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - " "<level>{message}</level>", diagnose=False, # No sensitive data in tracebacks ) # File logging (opt-in via MCP_LOG_DIR) log_dir = os.getenv("MCP_LOG_DIR", "").strip() if log_dir: logger.add( f"{log_dir}/mcp-json-yaml-toml.log", rotation="10 MB", retention="3 days", level=log_level or "DEBUG", format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | " "{name}:{function}:{line} - {message}", serialize=False, # Human-readable by default; set True for JSON diagnose=False, ) # Intercept stdlib logging -> loguru logging.basicConfig(handlers=[InterceptHandler()], level=0, force=True) ``` ### Integration Point: server.py ```python # server.py -- add before tool imports from mcp_json_yaml_toml.logging import configure_logging configure_logging() # Must run before any library imports that log mcp = FastMCP("mcp-json-yaml-toml") schema_manager = SchemaManager() ``` ### MCP Protocol Safety The MCP stdio transport uses stdout for protocol messages. Loguru's default handler writes to stderr, which is correct. The `configure_logging()` function explicitly targets `sys.stderr` and never adds a stdout sink. This is enforced by design, not by accident. ### Stdlib Interception Scope The InterceptHandler captures logging from: - `httpx` (HTTP request logging in binary_manager.py and schemas/manager.py) - `jsonschema` (validation warnings) - `opentelemetry` (if SDK configured) - Any future library that uses stdlib logging ### Dependency Impact Adding `loguru>=0.7.3` to `dependencies` in pyproject.toml. Loguru has zero dependencies (pure Python). No transitive dependency conflicts. ### Type Checking Compatibility Loguru provides type stubs in its package. The `loguru-mypy` plugin (optional, add to dev dependencies) enables better type inference for `logger.opt()` and `logger.bind()`. Basedpyright works with loguru's built-in stubs without additional plugins. ## Integration Analysis: Enhanced Schema Validation ### What Changes **MODIFIED file:** `services/schema_validation.py` -- the core change Current signature and behavior: ```python def _validate_against_schema(data: Any, schema_path: Path) -> tuple[bool, str] ``` New signature: ```python def _validate_against_schema( data: Any, schema_path: Path, *, collect_all: bool = True ) -> ValidationResult ``` **NEW model:** Add `ValidationResult` dataclass to `models/responses.py` (or a new `models/validation.py`): ```python @dataclass class SchemaValidationError: """A single schema validation error with path context.""" message: str json_path: str # e.g., "$.database.port" schema_path: str # e.g., "/properties/database/properties/port/type" validator: str # e.g., "type", "required", "enum" context: list[str] = field(default_factory=list) # Sub-errors for anyOf/oneOf @dataclass class ValidationResult: """Result of schema validation with all errors collected.""" is_valid: bool errors: list[SchemaValidationError] = field(default_factory=list) error_count: int = 0 @property def message(self) -> str: """Backward-compatible single message string.""" if self.is_valid: return "Schema validation passed" if not self.errors: return "Schema validation failed" return self.errors[0].message def to_dict(self) -> dict[str, Any]: """Structured error output for tool responses.""" ... ``` ### Changes to Callers **`services/mutation_operations.py` -- `_validate_and_write_content()`:** ```python # Before: is_valid, msg = _validate_against_schema(validation_data, schema_path) if not is_valid: raise ToolError(f"Schema validation failed: {msg}") # After: result = _validate_against_schema(validation_data, schema_path) if not result.is_valid: # Include structured errors in ToolError for AI context error_details = "; ".join( f"{e.json_path}: {e.message}" for e in result.errors[:5] ) raise ToolError(f"Schema validation failed ({result.error_count} errors): {error_details}") ``` **`tools/schema.py` -- `_handle_schema_validate()`:** ```python # Before: is_valid, message = _validate_against_schema(result.data, schema_file) validation_results["schema_validated"] = is_valid validation_results["schema_message"] = message # After: validation = _validate_against_schema(result.data, schema_file) validation_results["schema_validated"] = validation.is_valid validation_results["schema_message"] = validation.message validation_results["schema_errors"] = [ { "path": e.json_path, "message": e.message, "validator": e.validator, } for e in validation.errors ] validation_results["schema_error_count"] = validation.error_count ``` ### Implementation Detail: iter_errors The key change in `_validate_against_schema` is replacing: ```python # Current: raises on first error Draft202012Validator(schema, registry=registry).validate(data) ``` With: ```python # New: collects all errors validator_cls = Draft7Validator if "draft-07" in schema_dialect else Draft202012Validator validator = validator_cls(schema, registry=registry) errors = sorted(validator.iter_errors(data), key=lambda e: list(e.absolute_path)) ``` `iter_errors()` is a built-in method on all jsonschema validators. It yields `ValidationError` objects without raising. Each error provides `.absolute_path`, `.schema_path`, `.message`, `.validator`, and `.context` (sub-errors for compound validators like `anyOf`/`oneOf`). ### Backward Compatibility The `ValidationResult.message` property provides a single-string summary identical to the current output. Callers that previously destructured `tuple[bool, str]` need updating, but there are exactly **2 call sites** (identified above). The response dict format is additive -- `schema_message` and `schema_validated` fields are preserved, new `schema_errors` and `schema_error_count` fields are added. ### Response Model Changes The `ValidationResponse` model in `models/responses.py` gains optional fields: ```python class ValidationResponse(ToolResponse): format: str | None = None syntax_valid: bool = False schema_validated: bool = False syntax_message: str | None = None schema_message: str | None = None schema_file: str | None = None overall_valid: bool = False # NEW: schema_errors: list[dict[str, str]] | None = None schema_error_count: int = 0 ``` ## Component Boundary Map ### New Components | Component | Location | Responsibility | Depends On | | ----------------------- | ------------------------------------------------- | ---------------------------------------------------------------- | ---------------------- | | `logging.py` | `packages/mcp_json_yaml_toml/logging.py` | Loguru configuration, InterceptHandler, environment-driven sinks | loguru, stdlib logging | | `SchemaValidationError` | `models/responses.py` (or `models/validation.py`) | Single validation error with path context | None (dataclass) | | `ValidationResult` | `models/responses.py` (or `models/validation.py`) | Aggregated validation result with backward-compatible `.message` | SchemaValidationError | ### Modified Components | Component | What Changes | Why | | --------------------------------- | ---------------------------------------------------------------------------- | --------------------------------------------- | | `server.py` | Add `configure_logging()` call | Initialize logging before any module loads | | `backends/binary_manager.py` | Replace `import logging` with `from loguru import logger` | Unified logging pipeline | | `schemas/manager.py` | Replace `logging.debug()` with `logger.debug()` | Unified logging pipeline | | `schemas/scanning.py` | Replace `logging.debug()` with `logger.debug()` | Unified logging pipeline | | `services/schema_validation.py` | Return `ValidationResult` instead of `tuple[bool, str]`; use `iter_errors()` | Collect all errors, provide structured output | | `services/mutation_operations.py` | Update `_validate_and_write_content()` to use `ValidationResult` | Caller adapts to new return type | | `tools/schema.py` | Update `_handle_schema_validate()` to include structured errors | Caller adapts to new return type | | `models/responses.py` | Add `schema_errors` and `schema_error_count` to `ValidationResponse` | Structured error output | ### Unchanged Components | Component | Why Unchanged | | ---------------------- | ------------------------------------------------------------------------------------ | | `backends/base.py` | No logging, no schema validation | | `backends/yq.py` | No stdlib logging (uses telemetry spans); logging additions are optional enhancement | | `formats/base.py` | Pure utilities, no logging needed | | `tools/data.py` | Delegates to services; logging happens at service level | | `tools/query.py` | Delegates to services | | `tools/convert.py` | Delegates to services | | `tools/diff.py` | Delegates to services | | `config.py` | Pure functions; logging additions are optional enhancement | | `telemetry.py` | OpenTelemetry traces are orthogonal to loguru logging | | `lmql_constraints.py` | No logging, no schema validation | | `schemas/models.py` | Pure dataclasses | | `schemas/loading.py` | Pure extraction functions | | `schemas/ide_cache.py` | No direct logging | ## Data Flow Changes ### Logging Data Flow (New) ``` Application startup (server.py) | v configure_logging() |-- Remove loguru default sink |-- Add stderr sink (if MCP_LOG_LEVEL set) |-- Add file sink (if MCP_LOG_DIR set) |-- Install InterceptHandler on stdlib root logger | v All subsequent code: | |-- loguru.logger.debug/info/warning/error() --> loguru pipeline |-- stdlib logging.debug/info/warning/error() --> InterceptHandler --> loguru pipeline |-- httpx internal logging --> InterceptHandler --> loguru pipeline | v Sinks (configured): |-- sys.stderr (filtered by MCP_LOG_LEVEL) |-- file (rotated, if MCP_LOG_DIR set) ``` ### Schema Validation Data Flow (Changed) ``` BEFORE: _validate_against_schema(data, schema_path) -> (bool, str) |-- Validator.validate(data) # raises on FIRST error |-- catch ValidationError -> return (False, e.message) AFTER: _validate_against_schema(data, schema_path) -> ValidationResult |-- validator.iter_errors(data) # yields ALL errors |-- for each error: | |-- extract absolute_path -> json_path string | |-- extract schema_path -> schema location | |-- extract validator keyword | |-- extract context (sub-errors for anyOf/oneOf) | |-- build SchemaValidationError |-- return ValidationResult(is_valid, errors, error_count) ``` ## Patterns to Follow ### Pattern: Environment-Gated Logging Logging is disabled by default. No file output, no stderr output unless opted in. This matches the existing telemetry pattern (`telemetry.py` returns no-op tracer when OTEL SDK not configured). ```python # Zero overhead when logging disabled logger.remove() # Remove default handler # Opt-in only if os.getenv("MCP_LOG_LEVEL"): logger.add(sys.stderr, level=...) ``` ### Pattern: Structured Context with bind() Loguru's `bind()` adds context to log messages without changing the message string: ```python # In binary_manager.py logger.bind(yq_version=version, platform=f"{system}/{arch}").info("Downloading yq binary") # In mutation_operations.py logger.bind(file=str(path), operation="set", format=input_format).debug("Executing mutation") ``` ### Pattern: Backward-Compatible Return Type Evolution The schema validation change uses a dataclass with a `.message` property that returns the same string the old `tuple[bool, str]` provided. This allows incremental migration of callers. ## Anti-Patterns to Avoid ### Anti-Pattern: Logging to stdout MCP servers use stdout for protocol messages. Any loguru sink pointing to `sys.stdout` will corrupt the MCP protocol. The `configure_logging()` function explicitly prevents this. ### Anti-Pattern: Per-Module Logger Configuration Loguru uses a single global `logger` object. Do not configure sinks in individual modules. All configuration happens in `logging.py` and is called once from `server.py`. ```python # WRONG: configuring loguru in a module # backends/binary_manager.py from loguru import logger logger.add("binary.log") # NO -- creates duplicate sinks # RIGHT: just use the global logger from loguru import logger logger.info("message") # Sinks configured centrally ``` ### Anti-Pattern: Catching All Validation Errors in a Loop Don't manually iterate and try/except. Use `iter_errors()` which is specifically designed for collecting all errors: ```python # WRONG: catching one at a time errors = [] try: validator.validate(data) except ValidationError as e: errors.append(e) # Only gets the first one # RIGHT: iter_errors yields all errors = list(validator.iter_errors(data)) ``` ## Build Order The two features are independent and can be built in either order. However, building loguru first provides logging for schema validation debugging. ### Recommended Order **Step 1: Loguru logging infrastructure (logging.py + server.py integration)** - Create `logging.py` with `configure_logging()` - Add `configure_logging()` call to `server.py` - Add `loguru>=0.7.3` to dependencies - No behavior change until env vars are set **Step 2: Replace stdlib logging in existing files** - `backends/binary_manager.py`: Replace `import logging` / `logging.getLogger` with `from loguru import logger` - `schemas/manager.py`: Replace `logging.debug()` with `logger.debug()` - `schemas/scanning.py`: Replace `logging.debug()` with `logger.debug()` - Total: 3 files, mechanical replacement **Step 3: Add logging to key code paths (optional, can be deferred)** - `backends/yq.py`: Log subprocess commands at DEBUG - `services/mutation_operations.py`: Log file write operations - `tools/*.py`: Log tool invocations **Step 4: Schema validation models** - Create `SchemaValidationError` and `ValidationResult` in `models/` - Update `ValidationResponse` in `models/responses.py` **Step 5: Enhanced schema validation logic** - Modify `_validate_against_schema()` in `services/schema_validation.py` - Replace `validate()` with `iter_errors()` - Build `ValidationResult` from error list **Step 6: Update callers** - `services/mutation_operations.py`: Update `_validate_and_write_content()` - `tools/schema.py`: Update `_handle_schema_validate()` **Step 7: Tests** - Test `configure_logging()` with env vars - Test InterceptHandler captures stdlib logging - Test `_validate_against_schema()` returns multiple errors - Test backward compatibility of `ValidationResult.message` - Test `schema_errors` field in validation response ### Dependency Graph ``` Step 1 (logging.py) | v Step 2 (replace stdlib logging) Step 4 (validation models) | | v v Step 3 (add new logging sites) Step 5 (iter_errors logic) | v Step 6 (update callers) | v Step 7 (tests) ``` Steps 1-3 (logging) and Steps 4-6 (schema validation) are independent tracks. Step 7 covers both. ## Testing Strategy ### Logging Tests ```python # Test that configure_logging removes default handler def test_configure_logging_removes_default(): configure_logging() # No output when MCP_LOG_LEVEL not set # Test stderr sink activation def test_configure_logging_with_level(monkeypatch, capsys): monkeypatch.setenv("MCP_LOG_LEVEL", "DEBUG") configure_logging() logger.debug("test message") assert "test message" in capsys.readouterr().err # Test InterceptHandler routes stdlib to loguru def test_intercept_handler_captures_stdlib(monkeypatch, capsys): monkeypatch.setenv("MCP_LOG_LEVEL", "DEBUG") configure_logging() import logging logging.getLogger("test").warning("stdlib message") assert "stdlib message" in capsys.readouterr().err # Test file sink rotation def test_file_logging(monkeypatch, tmp_path): monkeypatch.setenv("MCP_LOG_DIR", str(tmp_path)) monkeypatch.setenv("MCP_LOG_LEVEL", "DEBUG") configure_logging() logger.info("file test") log_file = tmp_path / "mcp-json-yaml-toml.log" assert log_file.exists() ``` ### Schema Validation Tests ```python # Test multiple errors collected def test_validate_collects_all_errors(tmp_path): schema = {"type": "object", "properties": {"a": {"type": "integer"}, "b": {"type": "string"}}, "required": ["a", "b"]} schema_file = tmp_path / "schema.json" schema_file.write_text(json.dumps(schema)) result = _validate_against_schema({"a": "not_int", "b": 42}, schema_file) assert not result.is_valid assert result.error_count == 2 # both type violations # Test backward-compatible message def test_validation_result_message_compat(tmp_path): # ... setup with one error result = _validate_against_schema(invalid_data, schema_file) assert isinstance(result.message, str) assert len(result.message) > 0 # Test json_path extraction def test_validation_error_includes_path(tmp_path): # ... nested schema with deep path result = _validate_against_schema(data, schema_file) assert any("database.port" in e.json_path for e in result.errors) ``` ## Sources - [Loguru GitHub](https://github.com/Delgan/loguru) -- HIGH confidence (primary source) - [Loguru migration guide](https://loguru.readthedocs.io/en/stable/resources/migration.html) -- HIGH confidence (official docs) - [Loguru InterceptHandler recipe](https://github.com/Delgan/loguru/issues/78) -- HIGH confidence (author's recipe) - [Loguru type hints](https://loguru.readthedocs.io/en/stable/api/type_hints.html) -- HIGH confidence (official docs) - [loguru-mypy plugin](https://github.com/kornicameister/loguru-mypy) -- MEDIUM confidence (community plugin) - [jsonschema error handling](https://python-jsonschema.readthedocs.io/en/latest/errors/) -- HIGH confidence (official docs) - [jsonschema iter_errors](https://python-jsonschema.readthedocs.io/en/stable/errors/) -- HIGH confidence (official docs) - [MCP stdio transport logging constraint](https://medium.com/@laurentkubaski/understanding-mcp-stdio-transport-protocol-ae3d5daf64db) -- MEDIUM confidence (community article, verified against MCP spec) - [FastMCP logging docs](https://gofastmcp.com/servers/logging) -- HIGH confidence (official FastMCP docs) - [Loguru PyPI (v0.7.3)](https://pypi.org/project/loguru/) -- HIGH confidence (primary source) - Current codebase analysis: direct file reading of all source modules -- HIGH confidence (primary evidence) --- _Architecture research for: Loguru logging replacement and enhanced schema validation integration_ _Researched: 2026-02-17_

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bitflight-devops/mcp-json-yaml-toml'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

ARCHITECTURE.md•27.1 KiB