codebrain_scan_repo
Scan a repository's source files to generate or refresh .brain files, skipping unchanged files via hash comparison. Filters by file extension and excludes directories like .git and node_modules.
Instructions
Scan every source file under root and generate/refresh its .brain file.
Walks the directory tree, filters by file extension, prunes excluded
directories, and runs codebrain_scan_file on each match. Hash-gated:
unchanged files skip the model call. Per-file failures do not abort the
batch — they are reported at the end.
Defaults:
- extensions: .py .js .ts .tsx .jsx .java .go .rs
- exclude_dirs: .git .venv venv node_modules `__pycache__` dist build target

Args:
- root: Directory to scan recursively.
- force: If true, regenerate every brain file even when source hash matches.
- extensions: Override default source extensions (e.g. [".py", ".rb"]).
- exclude_dirs: Override default directory-name exclusion list.
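The hash gate described above can be sketched as follows. This is a minimal illustration, not the project's code: `compute_source_hash` mirrors the helper listed in the implementation reference, while `needs_regeneration` and the `stored_hash` argument are hypothetical, since this page does not show how the hash is recorded in a .brain file.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def compute_source_hash(content: bytes) -> str:
    """Return a `sha256:<hex>` digest of raw file bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def needs_regeneration(source: Path, stored_hash: str | None, force: bool = False) -> bool:
    """Hypothetical gate: True when the model call should run for `source`."""
    # A forced scan, or a file with no recorded hash, always regenerates.
    if force or stored_hash is None:
        return True
    # Otherwise regenerate only when the bytes on disk have changed.
    return compute_source_hash(source.read_bytes()) != stored_hash


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "example.py"
    src.write_text("print('hello')\n")
    recorded = compute_source_hash(src.read_bytes())

    unchanged = needs_regeneration(src, recorded)            # same bytes
    forced = needs_regeneration(src, recorded, force=True)   # force overrides the gate
    src.write_text("print('changed')\n")
    changed = needs_regeneration(src, recorded)              # bytes differ now
```

Hashing raw bytes rather than decoded text means any edit, including whitespace-only changes, invalidates the gate, which is the conservative choice for a cache that guards a model call.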
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
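| root | Yes | Directory to scan recursively. | |
| force | No | If true, regenerate every brain file even when source hash matches. | False |
| extensions | No | Override default source extensions (e.g. [".py", ".rb"]). | None (falls back to .py .js .ts .tsx .jsx .java .go .rs) |
| exclude_dirs | No | Override default directory-name exclusion list. | None (falls back to .git .venv venv node_modules `__pycache__` dist build target) |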
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
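| result | Yes | Plain-text summary string: counts of generated, skipped, and failed files, followed by the generated and failed paths, or a `[codebrain error]` message when root is missing or not a directory. | |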
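Because the result is a plain string, a caller that wants numbers has to parse it. The sketch below assumes the first line follows the `Scanned N files: ...` format built in `scan_repo` (see the implementation reference); `ScanSummary` and `parse_scan_summary` are hypothetical names introduced here for illustration, and the sample strings are made up.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class ScanSummary(NamedTuple):
    total: int
    generated: int
    skipped: int
    failed: int


# Matches only the first line of the summary; the path lists that follow are ignored.
_SUMMARY_RE = re.compile(
    r"^Scanned (\d+) files: (\d+) generated, (\d+) skipped, (\d+) failed\."
)


def parse_scan_summary(result: str) -> ScanSummary | None:
    """Return the counts from a scan result, or None for error strings."""
    match = _SUMMARY_RE.match(result)
    if match is None:
        return None
    return ScanSummary(*(int(group) for group in match.groups()))


sample = "Scanned 3 files: 1 generated, 2 skipped, 0 failed.\n\nGenerated:\n - src/app.py"
summary = parse_scan_summary(sample)
error = parse_scan_summary("[codebrain error] root not found: /missing")
```

Returning `None` for anything that does not match keeps the `[codebrain error]` case distinguishable from a successful scan with zero files.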
Implementation Reference
- codebrain/server.py:301-327 (handler): MCP tool handler for codebrain_scan_repo. Decorated with @mcp.tool(), it delegates to brain_scanner.scan_repo().

      @mcp.tool()
      async def codebrain_scan_repo(
          root: str,
          force: bool = False,
          extensions: list[str] | None = None,
          exclude_dirs: list[str] | None = None,
      ) -> str:
          """Scan every source file under `root` and generate/refresh its `.brain` file.

          Walks the directory tree, filters by file extension, prunes excluded
          directories, and runs `codebrain_scan_file` on each match. Hash-gated:
          unchanged files skip the model call. Per-file failures do not abort the
          batch — they are reported at the end.

          Defaults:
            - extensions: .py .js .ts .tsx .jsx .java .go .rs
            - exclude_dirs: .git .venv venv node_modules __pycache__ dist build target

          Args:
              root: Directory to scan recursively.
              force: If true, regenerate every brain file even when source hash matches.
              extensions: Override default source extensions (e.g. [".py", ".rb"]).
              exclude_dirs: Override default directory-name exclusion list.
          """
          return await brain_scanner.scan_repo(
              root, force=force, extensions=extensions, exclude_dirs=exclude_dirs
          )

- codebrain/brain_scanner.py:353-397 (handler): Core logic implementing scan_repo. Walks source files via iter_source_files(), calls scan_file() for each, and reports generated/skipped/failed counts.

      async def scan_repo(
          root: str,
          force: bool = False,
          extensions: list[str] | None = None,
          exclude_dirs: list[str] | None = None,
      ) -> str:
          """Scan every source file under `root` and generate/refresh its `.brain` file.

          Hash-gated: files whose source hash matches the existing `.brain` are
          skipped without invoking the model. Use `force=True` to override.
          Per-file failures do not abort the batch — they are reported at the end.
          """
          root_path = Path(root)
          if not root_path.exists():
              return f"[codebrain error] root not found: {root}"
          if not root_path.is_dir():
              return f"[codebrain error] root is not a directory: {root}"

          generated: list[str] = []
          skipped: list[str] = []
          failed: list[tuple[str, str]] = []

          for source in iter_source_files(root_path, extensions, exclude_dirs):
              display = resolve_display_path(source)
              result = await scan_file(str(source), force=force)
              if result.startswith("generated:"):
                  generated.append(display)
              elif result.startswith("skipped"):
                  skipped.append(display)
              else:
                  failed.append((display, result))

          total = len(generated) + len(skipped) + len(failed)
          lines = [
              f"Scanned {total} files: {len(generated)} generated, "
              f"{len(skipped)} skipped, {len(failed)} failed."
          ]
          if generated:
              lines.append("\nGenerated:")
              lines.extend(f" - {p}" for p in generated)
          if failed:
              lines.append("\nFailed:")
              lines.extend(f" - {p} — {reason}" for p, reason in failed)
          return "\n".join(lines)

- codebrain/brain_scanner.py:67-69 (helper): Computes the SHA-256 hash of the source file content, used for skip-gating unchanged files.

      def compute_source_hash(content: bytes) -> str:
          """Return `sha256:<hex>` digest of raw file bytes."""
          return "sha256:" + hashlib.sha256(content).hexdigest()

- codebrain/brain_scanner.py:332-350 (helper): Iterates source files under root matching extensions, pruning excluded dirs via os.walk.

      def iter_source_files(
          root: Path,
          extensions: list[str] | None = None,
          exclude_dirs: list[str] | None = None,
      ) -> Iterator[Path]:
          """Yield source files under `root` matching `extensions`, pruning `exclude_dirs`.

          Walks the tree with `os.walk` and mutates the dirs list in-place to prune
          excluded directories before descending. Does NOT yield `.brain` files
          (the extension whitelist takes care of that implicitly).
          """
          ext_set = _normalise_extensions(extensions)
          exclude_set = frozenset(exclude_dirs) if exclude_dirs is not None else DEFAULT_EXCLUDE_DIRS
          for dirpath, dirnames, filenames in os.walk(root):
              dirnames[:] = [d for d in dirnames if d not in exclude_set]
              for fname in filenames:
                  if Path(fname).suffix.lower() in ext_set:
                      yield Path(dirpath) / fname
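
The in-place pruning that `iter_source_files` relies on can be tried in isolation. The sketch below is a standalone approximation, not the project's helper: `walk_sources` is a hypothetical name, the extension and exclusion sets are passed in directly because `_normalise_extensions` and `DEFAULT_EXCLUDE_DIRS` are not shown on this page, and the file names in the temporary tree are invented.

```python
import os
import tempfile
from pathlib import Path


def walk_sources(root: Path, extensions: frozenset, exclude_dirs: frozenset) -> list:
    """Return sorted POSIX-style relative paths of matching files under `root`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment mutates the list os.walk holds, so excluded
        # directories are never descended into (requires the default topdown=True).
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for fname in filenames:
            # Lower-casing the suffix makes the extension match case-insensitive.
            if Path(fname).suffix.lower() in extensions:
                found.append((Path(dirpath) / fname).relative_to(root).as_posix())
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("src/app.py", "src/README.md", "node_modules/pkg/index.js", "web/Main.TS"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")

    matches = walk_sources(
        root,
        extensions=frozenset({".py", ".js", ".ts"}),
        exclude_dirs=frozenset({"node_modules"}),
    )
```

Rebinding the name instead (`dirnames = [...]`) would leave the list that `os.walk` iterates untouched, so the excluded directories would still be visited; the slice assignment is what makes the pruning effective.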