# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.4.0] - 2026-02-01
### Added
- Markdown support — indexes `.md` files by heading-based sections (`ChunkType.section`)
- Setext heading support (`Title\n=====` syntax) in addition to ATX headings (`# Title`)
- `ChunkType.section` enum value for document sections (backward-compatible with existing indexes)
- YAML frontmatter correctly excluded from chunks
- Decision doc: `docs/decisions/005-markdown-chunking.md`
### Changed
- `tree-sitter-markdown` added as dependency
- `CompositeChunker` now routes `.md` files to `MarkdownChunker`
## [0.3.0] - 2026-01-31
### Added
- Multi-language chunker architecture (`BaseTreeSitterChunker` + `CompositeChunker` dispatcher)
- Rust support — functions, structs, enums, traits, impl blocks, `//!` module doc comments
- Published to PyPI — installable via `uvx semantic-code-mcp`
- GitHub Actions workflow for automated publishing on tag push (trusted publishers OIDC)
- Platform-specific install docs (macOS/Windows vs Linux CPU-only torch)
- "Adding a New Language" guide in README
### Changed
- Flattened package — `chunkers/` and `embedder.py` at top level, `indexer/` collapsed to single module
- `IndexService` owns scanning, change detection, chunking, status; `Indexer` handles embedding and storage only
- `CompositeChunker` with extension collision detection (renamed from `MultiLanguageChunker`)
- `IndexService` takes `CompositeChunker` directly — no separate `supported_extensions` parameter
- All enums standardized to `StrEnum` + `auto()` — `ChunkType`, Python `NodeType`, Rust `NodeType`
- `__init__.py` files are docstring-only (no re-exports); all imports use full module paths
- Container uses `cached_property` for model/embedder singletons, per-project store caching
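
The dispatcher with extension collision detection described above could look roughly like the sketch below. It is not the project's code: the `Chunker` protocol, the `extensions` attribute, `chunk_file`, and the class name `CompositeChunkerSketch` are assumptions made for illustration.

```python
from pathlib import Path
from typing import Protocol


class Chunker(Protocol):
    """Hypothetical minimal interface a language chunker exposes."""

    extensions: tuple[str, ...]

    def chunk_string(self, source: str) -> list[str]: ...


class CompositeChunkerSketch:
    """Routes a file to the chunker registered for its extension."""

    def __init__(self, chunkers: list[Chunker]) -> None:
        self._by_extension: dict[str, Chunker] = {}
        for chunker in chunkers:
            for ext in chunker.extensions:
                if ext in self._by_extension:
                    # Two chunkers claiming one extension is a configuration error.
                    raise ValueError(f"extension {ext!r} is claimed by more than one chunker")
                self._by_extension[ext] = chunker

    @property
    def supported_extensions(self) -> frozenset[str]:
        # Derived from the registered chunkers, so callers need no separate list.
        return frozenset(self._by_extension)

    def chunk_file(self, path: Path, source: str) -> list[str]:
        chunker = self._by_extension.get(path.suffix)
        if chunker is None:
            return []  # unsupported extension: nothing to index
        return chunker.chunk_string(source)
```

Deriving `supported_extensions` from the registry is one plausible reason a service can take the composite directly without a separate extensions parameter.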
### Fixed
- Tree-sitter `Parser` thread-safety — create fresh parser per `chunk_string` call (was shared, mutates on `parse()`)
- `mock_embedder.embed_batch` returns correct embedding count via `side_effect` (was hardcoded to 1)
- `Indexer` is now a pure data pipeline: no `Settings`, no `FileChangeCache`, no `cache_dir`; all cache bookkeeping is owned by `IndexService`
- Clean shutdown on Ctrl+C (SIGINT handler instead of traceback)
- Removed redundant `.value` calls on `StrEnum` members
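
One way to express the `embed_batch` mock fix from the list above is a `side_effect` that returns one vector per input text. This is a sketch with `unittest.mock`; the list-of-lists return shape and the helper name `fake_embed_batch` are assumptions, not the project's test code.

```python
from unittest.mock import MagicMock

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2


def fake_embed_batch(texts: list[str]) -> list[list[float]]:
    """Return one zero vector per input text (assumed return shape)."""
    return [[0.0] * EMBEDDING_DIM for _ in texts]


mock_embedder = MagicMock()
# side_effect is called with the mock's arguments, so the number of
# returned vectors tracks the batch size instead of being fixed at 1.
mock_embedder.embed_batch.side_effect = fake_embed_batch
```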
## [0.1.0] - 2026-01-31
### Added
- `IndexService` orchestrating full index pipeline (scan → detect → chunk → embed) with progress callbacks
- `SearchService` with auto-indexing via `IndexService` (replaces manual orchestration in `server.py`)
- `services/` package replacing `search/` directory
- `duration_seconds` on `IndexResult` for end-to-end timing
- Strict ruff rules: C901, DTZ, ASYNC, SLF, PIE, T20, PERF, FURB, PLC0415
- ty type-checking rules in pre-commit and pyproject.toml
- Profiling support with pyinstrument for dev (enable with `SEMANTIC_CODE_MCP_PROFILE=1`)
- MCP server with three tools: `search_code`, `index_codebase`, `index_status`
- Semantic code search using sentence-transformers embeddings (`all-MiniLM-L6-v2`)
- LanceDB vector storage for embeddings
- Module-level docstring chunking — conceptual queries now match files by their self-description (decision 003)
- Tree-sitter based AST chunking for Python (functions, classes, methods, module docstrings)
- Incremental indexing with mtime-based change detection
- Debug timing info in search results (`status_check_ms`, `embedding_ms`, `search_ms`)
- Hybrid search: keyword boost (up to 20%) and recency boost (up to 5% for files < 1 week old)
- Score threshold filtering (results scoring below 0.3 are dropped as noise)
- Result truncation (results over 50 lines show "... truncated")
- Results grouped by file for cleaner output
- Pre-load embedding model at startup (avoids 2s cold start)
- Parallel file chunking with `asyncio.gather`
- Project documentation structure (`CLAUDE.md`, `README.md`, `TODO.md`, `CHANGELOG.md`)
- Claude Code rules in `.claude/rules/`
- Architecture decision records in `docs/decisions/`
- Pre-commit hooks (ruff, bandit, conventional commits)
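
The mtime-based change detection behind incremental indexing can be sketched as a comparison of two path-to-mtime snapshots. The function names and the dict-based cache shape here are hypothetical; the project's actual cache format is not described in this changelog.

```python
from pathlib import Path


def snapshot_mtimes(paths: list[Path]) -> dict[str, float]:
    """Record the current modification time of each file."""
    return {str(path): path.stat().st_mtime for path in paths}


def detect_changes(
    previous: dict[str, float], current: dict[str, float]
) -> tuple[list[str], list[str]]:
    """Return (changed_or_new, removed) paths between two snapshots."""
    # A file needs re-chunking when it is new or its mtime differs.
    changed = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    # A file needs its chunks deleted when it no longer exists.
    removed = sorted(p for p in previous if p not in current)
    return changed, removed
```

Only the paths in the first list would need re-chunking and re-embedding; untouched files keep their stored vectors.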
### Fixed
- SQL injection in `delete_by_file` — escape single quotes in file paths
- Force reindex now clears vector store to prevent duplicate results
- Atomic cache writes (tempfile + rename) to prevent corruption on crash
- FTS index failures logged at WARNING instead of DEBUG
- Timezone-aware datetimes throughout (DTZ compliance)
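
The atomic cache write (tempfile + rename) mentioned above follows a common pattern, sketched here with the standard library. The function name `atomic_write_json` and the JSON payload are assumptions for illustration; the `fsync` call is an extra durability step that the changelog does not mention.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # replaces any existing file in one step
    except BaseException:
        # Leave no stray temp file behind if anything failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

A crash before `os.replace` leaves the previous cache file untouched, which is the property the fix is after.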
### Changed
- CPU-only PyTorch via `[tool.uv.sources]` — venv reduced from 7.8GB to 1.7GB (no CUDA/nvidia/triton)
- Lazy `sentence-transformers` import — startup no longer loads torch (~4s saved)
- `server.py` is now a thin tool layer delegating to `IndexService`/`SearchService`
- `SearchOutcome.index_result` always present (default zeros) — eliminates None guards
- `container.create_search_service()` reuses indexer instead of creating duplicates
- Removed unused `status_cache_ttl` config option
- Chunker complexity reduced by extracting `_extract_decorated` and `_extract_class_with_methods`
- File scanning uses `git ls-files` for a 100x speedup (falls back to `os.walk` for non-git repos)
- Removed sync `index()` method; only the async version remains (no code duplication)
- Removed unused `Searcher` class
- Skip FTS index rebuild if the index already exists (~80ms saved per search)
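
File scanning via `git ls-files` with an `os.walk` fallback can be sketched as follows. The function name `list_source_files` is hypothetical, and the extra flags (`--cached --others --exclude-standard`, which also list untracked but non-ignored files) are an assumption; the project may invoke `git ls-files` differently.

```python
import os
import subprocess
from pathlib import Path


def list_source_files(root: Path) -> list[Path]:
    """List files under `root`, preferring git's index over a tree walk."""
    try:
        result = subprocess.run(
            # -z separates paths with NUL, which avoids quoting of unusual names.
            ["git", "ls-files", "-z", "--cached", "--others", "--exclude-standard"],
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing or `root` is not a repository: walk the tree instead.
        return sorted(
            Path(dirpath) / name
            for dirpath, _dirs, names in os.walk(root)
            for name in names
        )
    # git prints paths relative to the working directory it ran in.
    return sorted(root / rel for rel in result.stdout.split("\0") if rel)
```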