
What Is It?

ConTXT BOX is a strict, local-first knowledge layer that sits beside any project or document folder. It gives coding agents such as Claude Code, Codex, Cursor, and other MCP clients a fast external memory: indexed filenames, folders, neighbors, summaries, cached document/image context, and durable chat preservation.

The design is intentionally narrow. Documents and images are the core path because they cover most real user context. Heavy extraction uses exactly one configured engine, either MarkItDown or Docling; core extraction never falls back through a chain of tools.

Features

  • Lazy indexing with rel_path, filename, folder, mtime, size, type, neighbors, folder summaries, and cheap file summaries.

  • On-demand extraction only through MarkItDown or Docling.

  • Permanent Markdown sidecars under .contextbox/history/media/.

  • MCP tools for coding agents.

  • Watchdog-based watch command for continuous index updates.

  • Preview-only smart reorganization.

  • Auto preservation into .contextbox/CONTEXT.md plus JSONL history.

Quick Start

uv sync
uv run contxtbox --help
uv run contxtbox init --root "S:\Papers"
uv run contxtbox config-show --root "S:\Papers"
uv run contxtbox index --root "S:\Papers"
uv run contxtbox health --root "S:\Papers"
uv run contxtbox search "computer vision" --root "S:\Papers"

When commands are run from inside the target workspace, --root can be omitted.

Install the document/image engines:

uv sync --extra media

Extract one file with the strict default engine:

uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers"

Use Docling explicitly:

uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers" --engine docling

Watch a folder:

uv run contxtbox watch --root "S:\Papers"

Run production readiness checks:

uv run contxtbox health --root "S:\Papers" --fail-on-error

Show the effective workspace config:

uv run contxtbox config-show --root "S:\Papers"

Production and MCP setup guides:

How It Works

workspace/
`-- .contextbox/
    |-- index.json
    |-- config.toml
    |-- CONTEXT.md
    |-- preservation.jsonl
    `-- history/
        `-- media/
            `-- sanitized__file__path.context.md

Indexing Rules

index, update_index, and watch always record:

  • rel_path

  • filename

  • folder_path

  • mtime

  • size

  • file_type

  • neighbors

  • parent_folder_summary

  • last_indexed

  • context_summary

The default summary is cheap and deterministic. It uses filename, folder name, and 5-7 nearby files. It does not open PDFs or images during indexing.
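As an illustration of what a cheap, deterministic summary could look like, here is a minimal sketch that works from names alone and never opens a file. The function name `cheap_summary`, the output wording, and the choice of "sibling files in the same folder" as the neighbor rule are assumptions made for this example, not ConTXT BOX's actual implementation.

```python
from pathlib import PurePosixPath


def cheap_summary(rel_path: str, all_paths: list[str], max_neighbors: int = 7) -> str:
    """Build a summary from the filename, folder name, and nearby filenames only."""
    path = PurePosixPath(rel_path)
    folder = path.parent
    # Hypothetical neighbor rule: other files in the same folder,
    # sorted so the same inputs always give the same summary.
    neighbors = sorted(
        PurePosixPath(p).name
        for p in all_paths
        if PurePosixPath(p).parent == folder and p != rel_path
    )[:max_neighbors]
    folder_name = folder.name or "(root)"
    summary = f"{path.name} in folder '{folder_name}'"
    if neighbors:
        summary += "; near: " + ", ".join(neighbors)
    return summary


paths = [
    "Computer Vision/paper.pdf",
    "Computer Vision/notes.md",
    "Computer Vision/figures.png",
    "NLP/survey.pdf",
]
print(cheap_summary("Computer Vision/paper.pdf", paths))
# paper.pdf in folder 'Computer Vision'; near: figures.png, notes.md
```

Because the summary depends only on path strings, it stays fast on large folders and can be recomputed at any time without touching PDFs or images.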

Configuration

init creates .contextbox/config.toml:

extraction_engine = "markitdown"
max_inline_bytes = 512000
large_file_bytes = 50000000
max_neighbors = 10
debounce_seconds = 2.0
auto_watch = true

ignored_dirs = [
  ".git",
  ".venv",
  "node_modules",
]

priority_folders = [
  "codebases/",
  "research/",
  "specs/",
  "decisions/",
  "assets/images/",
]

Use "docling" when you want Docling as the strict extraction engine.

Extraction Rules

Heavy extraction happens only when one of the following occurs:

  • the extract-media command is run on a path, or

  • an MCP client calls get_file(path, depth="full").

The result is cached as Markdown in .contextbox/history/media/, and index.json receives:

  • extracted_at

  • context_ref

  • extraction_method

  • extraction_status

  • extraction_warnings

  • extraction_duration_seconds

Sidecars include the same audit header before extracted content. Status values are conservative: success, partial, metadata-only, or cached.
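The cache-first behavior described above can be sketched as follows. This is an illustration under stated assumptions: `sidecar_name`, `extract_cached`, and the sanitizing scheme (path separators become `__`) are hypothetical, and the `extractor` argument stands in for whichever engine is configured. Only the `.contextbox/history/media/` location, the `.context.md` suffix, and the status strings come from this README.

```python
import re
import tempfile
from pathlib import Path


def sidecar_name(rel_path: str) -> str:
    # Hypothetical scheme: separators become "__", other unsafe characters "_".
    flat = re.sub(r"[\\/]+", "__", rel_path.strip("\\/"))
    flat = re.sub(r"[^A-Za-z0-9._-]", "_", flat)
    return f"{flat}.context.md"


def extract_cached(root: Path, rel_path: str, extractor, force: bool = False):
    """Return (status, markdown), reusing an existing sidecar unless force is set."""
    media_dir = root / ".contextbox" / "history" / "media"
    sidecar = media_dir / sidecar_name(rel_path)
    if sidecar.exists() and not force:
        return "cached", sidecar.read_text(encoding="utf-8")
    media_dir.mkdir(parents=True, exist_ok=True)
    markdown = extractor(root / rel_path)  # the single configured engine
    sidecar.write_text(markdown, encoding="utf-8")
    return "success", markdown


calls = []


def fake_extractor(path: Path) -> str:
    # Stand-in for MarkItDown or Docling; records each call and opens nothing.
    calls.append(path)
    return f"# Extracted from {path.name}\n"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = extract_cached(root, "Computer Vision/paper.pdf", fake_extractor)
    second = extract_cached(root, "Computer Vision/paper.pdf", fake_extractor)
    print(first[0], second[0], len(calls))
    # success cached 1
```

The second call returns the sidecar without invoking the extractor again, which is why repeated full-depth requests stay cheap. Passing `force=True` mirrors the `force` argument of the extract_media tool listed under MCP Tools.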

MCP Tools

  • update_index()

  • server_info()

  • set_root(root, index=true)

  • health()

  • search(query, limit=10)

  • get_file(path, depth="metadata" | "full")

  • pull_context(task, limit=5)

  • extract_media(path, force=false)

  • reorganize(instruction)

  • auto_preserve_context(summary, metadata=null)

Start the MCP server:

uv run contxtbox mcp --root "S:\Papers"
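MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests, so a call to one of the tools above has roughly the shape built below. This sketch only constructs the request messages. The `tool_call` helper is a hypothetical name, the argument values are examples, and the initialization handshake that a real client performs first is omitted. In practice an MCP client library handles all of this for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, **arguments) -> str:
    """Serialize an MCP tools/call request for the given tool and arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Cheap lookup first, then a full-depth read that may trigger heavy extraction.
search_req = tool_call("search", query="computer vision", limit=10)
full_req = tool_call("get_file", path="Computer Vision/paper.pdf", depth="full")
print(search_req)
print(full_req)
```

Ordering calls this way keeps agents on the indexed metadata path by default and reserves `depth="full"` for files that actually need extracted content.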

Roadmap

  • Stronger semantic search over sidecars.

  • Reorganization scoring based on folder summaries and neighbor cues.

  • MCP client recipes for Claude Code, Codex, Cursor, and others.

  • Safe apply/undo flow for reorganization.

  • Configurable ignore rules and extraction engine policy.

Contributing

New ideas, bug fixes, documentation improvements, integration recipes, and production hardening work are welcome. Open an issue for discussion, or submit a focused pull request with a clear description, tests where relevant, and the verification commands you ran.

Useful contribution areas:

  • MCP client setup recipes for more coding tools.

  • Better document/image extraction quality checks.

  • Faster indexing and retrieval for large workspaces.

  • Safer reorganization previews and apply/undo flows.

  • Clearer docs, examples, and real-world testing notes.

See CONTRIBUTING.md for the development checks.

License

MIT. See LICENSE.

Release

PyPI publishing is configured for Trusted Publishing through GitHub Actions. See Production readiness.
