# Checksum Services Architecture
## Overview
AIDB uses checksum-based cache invalidation to avoid unnecessary rebuilds and reinstallations. The checksum service pattern provides a shared foundation for services that track file changes and determine when cached artifacts need regeneration.
**Key Benefits:**
- Avoid unnecessary Docker image rebuilds (saves minutes per build)
- Skip redundant dependency installations (especially important in containers)
- Track container lifecycle (detect when containers restart)
- Make hash-based decisions consistently across all services
**For implementation examples:** See [checksum-service-implementations.md](checksum-service-implementations.md)
## ChecksumServiceBase Pattern
### Abstract Base Class
**Location:** `src/aidb_common/io/checksum_service_base.py`
Provides common functionality for tracking file checksums and managing cache files. All checksum services inherit from this base class.
**Core Responsibilities:**
- Compute and cache file hashes
- Compare current vs cached hashes
- Track artifact context (e.g., container lifecycle)
- Determine when updates are needed
- Manage cache file I/O
### Abstract Methods (Must Implement)
Subclasses must implement these three methods:
#### 1. `_get_hash_cache_file(identifier: str) -> Path`
Return the cache file path for a given identifier.
**Purpose:** Define where to store hash cache for each artifact
**Example:**
```python
def _get_hash_cache_file(self, identifier: str) -> Path:
    return self.cache_dir / f"{identifier}-image-hash"
```
#### 2. `_compute_hash(identifier: str) -> str`
Compute the current hash for a given identifier.
**Purpose:** Define what files/content to track for changes
**Example:**
```python
def _compute_hash(self, identifier: str) -> str:
    files_to_track = [Path("pyproject.toml"), Path("Dockerfile")]
    return compute_files_hash(files_to_track)
```
#### 3. `_exists(identifier: str) -> bool`
Check if the cached artifact exists.
**Purpose:** Determine if artifact needs creation vs update
**Example:**
```python
def _exists(self, identifier: str) -> bool:
    image_name = f"aidb-test-{identifier}:latest"
    result = docker_inspect(image_name)  # helper wrapping `docker image inspect`
    return result.returncode == 0
```
### Optional Override
#### `_get_artifact_context(identifier: str) -> dict[str, str]`
Return context metadata that invalidates cache when changed.
**Purpose:** Track environment-specific state (e.g., container ID)
**Default:** Returns empty dict (no context tracking)
**Example:**
```python
def _get_artifact_context(self, identifier: str) -> dict[str, str]:
    # container_marker_file: path to a marker file written at container start
    if container_marker_file.exists():
        container_id = container_marker_file.read_text().strip()
        return {"container_id": container_id}
    return {}
```
### Public API
#### `needs_update(identifier: str) -> tuple[bool, str]`
Check if cached artifact needs updating.
**Returns:** `(needs_update: bool, reason: str)`
**Possible Reasons:**
- `"Artifact '{identifier}' not found"` - Artifact doesn't exist
- `"Artifact context changed (e.g., container restart)"` - Context mismatch
- `"No cached hash found (first run)"` - No cached hash file
- `"Source files changed (hash mismatch)"` - Files changed
- `"Up-to-date"` - No update needed
**Decision Logic:**
1. Check if artifact exists → if not, return `(True, "not found")`
1. Compare artifact context → if changed, return `(True, "context changed")`
1. Compare file hashes → if different, return `(True, "hash mismatch")`
1. Otherwise → return `(False, "up-to-date")`
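In sketch form, the decision logic inside the base class plausibly looks like this (`_exists()`, `_compute_hash()`, `_get_artifact_context()`, and `_get_cached_hash()` are documented on this page; the `_get_cached_context()` helper name is an assumption):
```python
def needs_update(self, identifier: str) -> tuple[bool, str]:
    if not self._exists(identifier):
        return True, f"Artifact '{identifier}' not found"
    # _get_cached_context() is a hypothetical helper that parses the
    # optional JSON line of the cache file (see "Cache File Format")
    if self._get_artifact_context(identifier) != self._get_cached_context(identifier):
        return True, "Artifact context changed (e.g., container restart)"
    cached_hash = self._get_cached_hash(identifier)
    if cached_hash is None:
        return True, "No cached hash found (first run)"
    if self._compute_hash(identifier) != cached_hash:
        return True, "Source files changed (hash mismatch)"
    return False, "Up-to-date"
```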
**Example:**
```python
service = DockerImageChecksumService(repo_root)
needs_rebuild, reason = service.needs_update("python")
if needs_rebuild:
    print(f"Rebuild required: {reason}")
    # Build image...
    service.mark_updated("python")
```
#### `mark_updated(identifier: str) -> None`
Mark artifact as updated by saving current hash.
**Purpose:** Update cache after successful artifact generation
**What it does:**
1. Compute current hash
1. Get current artifact context
1. Save both to cache file
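In sketch form (assuming `_save_hash()`, shown under File I/O below, persists both the hash and the context line):
```python
def mark_updated(self, identifier: str) -> None:
    current_hash = self._compute_hash(identifier)
    # _save_hash() writes the hash plus the current artifact context
    self._save_hash(identifier, current_hash)
```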
**Example:**
```python
# After successfully building Docker image
service.mark_updated("python")
```
## Cache File Format
### Two-Line Format
```
<sha256_hash>
{"container_id": "abc123", "other_context": "value"}
```
**Line 1:** SHA256 hash of tracked files (hex digest)
**Line 2:** JSON object with context metadata (optional)
### Examples
**Without context:**
```
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```
**With container context:**
```
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
{"container_id": "aidb-test-python-abc123"}
```
### File I/O
**Reading:**
```python
def _get_cached_hash(self, identifier: str) -> str | None:
    cache_file = self._get_hash_cache_file(identifier)
    content = read_cache_file(cache_file)
    if not content:
        return None
    lines = content.strip().split("\n")
    return lines[0] if lines else None
```
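The optional context line can be read the same way; here is a sketch of a counterpart helper (the name `_get_cached_context()` is an assumption, matching the decision-logic sketch above):
```python
import json

def _get_cached_context(self, identifier: str) -> dict[str, str]:
    content = read_cache_file(self._get_hash_cache_file(identifier))
    if not content:
        return {}
    lines = content.strip().split("\n")
    # Line 2, when present, holds the JSON context object
    return json.loads(lines[1]) if len(lines) > 1 else {}
```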
**Writing:**
```python
def _save_hash(self, identifier: str, hash_value: str) -> None:
    cache_file = self._get_hash_cache_file(identifier)
    content = hash_value + "\n"
    context = self._get_artifact_context(identifier)
    if context:
        content += json.dumps(context) + "\n"
    write_cache_file(cache_file, content.strip())
```
## Concrete Implementations
AIDB includes two concrete checksum service implementations:
1. **DockerImageChecksumService** - Tracks Docker test image rebuilds
1. **FrameworkDepsChecksumService** - Tracks framework dependency installations with container lifecycle awareness
**See:** [checksum-service-implementations.md](checksum-service-implementations.md) for detailed documentation, usage examples, and integration patterns
## Creating a New Checksum Service
### Step-by-Step Guide
1. **Identify what to track:**
- What files affect the artifact?
- Is the artifact durable (e.g., a Docker image) or ephemeral (e.g., dependencies installed inside a container)?
- Does environment context matter (container ID, runtime version)?
2. **Create service class:**
```python
import sys
from pathlib import Path

from aidb_common.io import ChecksumServiceBase, compute_files_hash


class MyChecksumService(ChecksumServiceBase):
    def __init__(self, repo_root: Path):
        cache_dir = repo_root / ".cache" / "my-service"
        super().__init__(cache_dir)
        self.repo_root = repo_root

    def _get_hash_cache_file(self, identifier: str) -> Path:
        return self.cache_dir / f"{identifier}-hash"

    def _compute_hash(self, identifier: str) -> str:
        files_to_track = [
            self.repo_root / "relevant-file.txt",
            self.repo_root / "another-file.yaml",
        ]
        return compute_files_hash(files_to_track)

    def _exists(self, identifier: str) -> bool:
        # Check if the artifact exists on disk
        artifact_path = self.repo_root / "artifacts" / identifier
        return artifact_path.exists()

    # Optional: track environment context
    def _get_artifact_context(self, identifier: str) -> dict[str, str]:
        # Example: track the Python version
        return {"python_version": f"{sys.version_info.major}.{sys.version_info.minor}"}
```
3. **Add public convenience methods:**
```python
# These go on MyChecksumService, alongside the methods above:

def needs_regeneration(self, identifier: str) -> tuple[bool, str]:
    """Alias for needs_update with domain-specific name."""
    return self.needs_update(identifier)

def mark_generated(self, identifier: str) -> None:
    """Alias for mark_updated with domain-specific name."""
    self.mark_updated(identifier)
```
4. **Use the service:**
```python
service = MyChecksumService(Path("/repo"))
needs_regen, reason = service.needs_regeneration("my-artifact")
if needs_regen:
    # Generate artifact...
    service.mark_generated("my-artifact")
```
### Best Practices
**✅ DO:**
- Use descriptive identifier names (e.g., "python-image" not "py")
- Log hash values (first 8-12 chars) for debugging
- Handle missing files gracefully in `_compute_hash()` (see the sketch after these lists)
- Validate cache directory exists (base class handles this)
- Use `compute_files_hash()` for consistent hashing
**❌ DON'T:**
- Include volatile data in hash computation (timestamps, random values)
- Track files that change frequently without affecting artifact (logs, temp files)
- Forget to call `mark_updated()` after successful artifact creation
- Assume cache files exist (base class handles missing files)
- Mix artifact types in same service (create separate services)
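For example, `_compute_hash()` can tolerate optional inputs by filtering out missing paths before hashing (a minimal sketch; the file names are hypothetical):
```python
def _compute_hash(self, identifier: str) -> str:
    candidates = [
        self.repo_root / "pyproject.toml",
        self.repo_root / "optional-overrides.yaml",  # hypothetical optional input
    ]
    # Hash only files that exist so a missing optional file doesn't raise;
    # a file appearing or disappearing changes the input set, and
    # therefore the resulting hash.
    return compute_files_hash([p for p in candidates if p.exists()])
```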
## Performance Considerations
### Hash Computation Cost
- **Small files (\<1MB)**: ~1-5ms per file
- **Large files (>10MB)**: ~50-100ms per file
- **Many files (>100)**: Consider caching or sampling
**Optimization tips:**
- Cache hash computation results within same process
- Only track files that actually affect artifact
- Use glob patterns efficiently (avoid recursive scans)
- Consider file modification time as a quick pre-check (sketched below)
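A minimal sketch of such a pre-check (this is not part of `ChecksumServiceBase`; `last_checked` would need to be persisted alongside the hash, which the current cache format does not do):
```python
from pathlib import Path

def maybe_changed(paths: list[Path], last_checked: float) -> bool:
    # If no tracked file has an mtime newer than the last successful check,
    # the (more expensive) content hash can be skipped entirely.
    return any(p.stat().st_mtime > last_checked for p in paths if p.exists())
```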
### Cache File I/O
- **Read cost**: ~0.1-1ms per cache file
- **Write cost**: ~1-5ms per cache file (includes ensure_dir)
- **Negligible** compared to artifact generation (seconds to minutes)
### When to Use Checksums
**✅ Good use cases:**
- Docker image builds (minutes saved)
- Dependency installations (tens to hundreds of seconds saved)
- Code generation (seconds saved)
- File transformations (compilation, transpilation)
**❌ Poor use cases:**
- Operations already \<100ms (overhead not worth it)
- Volatile inputs (constant cache misses)
- Write-once artifacts (no benefit from caching)
## Related Files
**Base Pattern:**
- `src/aidb_common/io/checksum_service_base.py` - Abstract base class
- `src/aidb_common/io/hashing.py` - `compute_files_hash()` utility
- `src/aidb_common/io/files.py` - Cache file I/O utilities (`read_cache_file()`, `write_cache_file()`, `ensure_dir()`)
**Implementations:** See [checksum-service-implementations.md](checksum-service-implementations.md)
**Further Reading:**
- [checksum-service-implementations.md](checksum-service-implementations.md) - DockerImageChecksumService and FrameworkDepsChecksumService details
- [docker-compose-generation.md](docker-compose-generation.md) - Compose generation system
- [service-patterns.md](service-patterns.md) - Service architecture patterns