loom

Connect Loom to Claude Desktop and ask your vault anything. Loom is a local MCP server that gives Claude cited answers from your notes — nothing leaves your machine. Claude Desktop, Cursor, or any MCP client works. → Setup below.

Point it at your notes folder. Loom indexes everything, maps typed relationships between ideas (which notes cite the same sources, which contradict each other, which extend the same argument), and builds a persistent graph that gets smarter as you add more. Your vault, vector index, and graph stay on your machine. Only the specific passages behind an answer go to Claude for synthesis — or skip that and run a local model instead.

What this lets you do that search can't: ask "what connects my notes on topic A and topic B" and get the actual relationship chain, not just documents that mention both. Ask "what have I been reading extensively but never synthesized" and get your blind spots surfaced through graph centrality. Those are structural queries — they need a graph, not a bigger context window.

Who this is for: researchers, lawyers, clinicians, journalists, and anyone with thousands of documents who needs to reason across all of them privately, on their own machine. If 50 notes and grep is enough, this is overkill. If you have 5,000 and you've started forgetting what you know — that's exactly what Loom was built for.

Query: "Search my vault first, then answer" — a clinical polypharmacy question about a 74-year-old CHF patient. Loom retrieved passages from her clinical reference notes on Beers Criteria and drug interactions. Claude synthesized across them — flagging the ibuprofen contraindication, oxybutynin's ACB score of 3 as the likely cause of confusion and falls, and the triple AKI risk.

Query across a research vault — Loom pulled from Cornell CS4780 lectures, a standalone SGD note, and papers, then identified a gap: no dedicated note on adaptive methods (Adam, AdaGrad, RMSProp).

Related MCP server: brainMD

Download

Windows: Download loom-installer.exe and run it. No Python or Neo4j required. The installer handles setup and connects Claude Desktop automatically.

Mac / Linux: Install from source with pipx:

git clone https://github.com/KlossKarl/loom
cd loom
pipx install .   # includes the embedded graph backend (FalkorDB) — no server needed
loom init        # interactive wizard: pick your vault folder
loom chat

loom init walks you through picking a vault folder, then loom chat is live. On Mac/Linux the full knowledge graph is on by default via an embedded FalkorDB backend that installs with Loom itself — no Neo4j, no Docker, no server (requires Python ≥ 3.12; older Pythons run vector-only). On Windows, embedded vector-only mode is the default and Neo4j is the optional graph upgrade. Full options (Docker, dev install) are in Install below. On a Mac? See the Mac setup guide for the pip3/python3 conventions and SSL/cert notes.

Connect to Claude Desktop (the primary way to use Loom)

Loom runs as a local MCP server (loom_mcp.py) that connects your vault and knowledge graph to Claude Desktop, Cursor, Continue.dev, or any MCP-compatible client. You ask questions in Claude Desktop; Loom does the retrieval locally and returns cited passages and graph connections.

Setup (Claude Desktop):

Install Loom from source (see Install) and run loom init once. The wizard auto-detects your OS paths, writes config.yaml, and registers Loom in Claude Desktop's config automatically on Windows and Mac. The next two steps are only needed if you want to wire it up by hand.
Add Loom to your Claude Desktop config: %APPDATA%\Claude\claude_desktop_config.json on Windows, ~/Library/Application Support/Claude/claude_desktop_config.json on Mac:

{
  "mcpServers": {
    "loom": {
      "command": "python",
      "args": ["C:\\path\\to\\loom\\loom_mcp.py"]
    }
  }
}

Restart Claude Desktop. Ask: "Search my vault for anything about knowledge graphs."

On Windows, if python doesn't resolve in Claude Desktop's environment, use the full interpreter path (e.g. C:\\Python314\\python.exe). Full instructions (Mac paths, the python resolution fix, and troubleshooting) are in docs/claude-desktop-config.md.

What stays local: your vault, the ChromaDB vector index, and the Neo4j graph never leave your machine. Claude Desktop only ever sees the specific passages Loom retrieves for a given question. It's the same data flow as any retrieval-augmented chat.

The 14 tools Claude gets:

Tool	What it does
`search_vault`	Hybrid search (semantic + keyword + graph) with citations
`get_entity_neighbors`	Explore the knowledge graph around any concept
`get_archaeology_report`	Find notes that are semantically related but never linked
`ingest_url`	Add a web URL to your vault
`get_vault_stats`	Vault statistics and top concepts
`add_fact`	Write an entity or typed relationship into the graph
`delete_edge`	Remove a single relationship by edge id
`forget_entity`	Remove an entity and all its edges
`get_vault_health`	Orphan notes, broken links, stale content, tag hygiene
`create_note`	Create a new markdown file in the vault
`edit_section`	Edit a specific section of a note by heading
`read_section`	Read a specific section of a note by heading
`append_to_note`	Append content to an existing note
`run_loop`	Trigger a background loop on demand (archaeology, audit, etc.)

Quick Start: UI Mode

Prefer a browser to Claude Desktop? Loom also exposes an OpenAI-compatible API (loom_serve.py), and Open WebUI connects to it for a polished chat experience: citations, model switching, conversation history, no terminal.

Prerequisites: Loom installed (see above), Docker Desktop for Open WebUI.

One-time setup:

# 1. Start Open WebUI (one Docker command)
docker run -d -p 3030:8080 --add-host=host.docker.internal:host-gateway ^
  -v open-webui:/app/backend/data --name open-webui --restart always ^
  ghcr.io/open-webui/open-webui:main

# 2. Start the Loom API server
pip install fastapi uvicorn
python loom_serve.py

Connect Open WebUI to Loom (one-time, in the browser):

Open http://localhost:3030
Admin Settings → Connections → + (OpenAI-compatible)
URL: http://host.docker.internal:11435 · Key: loom
Add model ID loom → Save

Daily use, just run:

python loom_serve.py
# then open http://localhost:3030 and pick the "loom" model

On Windows, double-click Loom.bat in the install directory for one-click launch.

Every reply runs through the full retrieval pipeline (vector + graph + BM25 + HyDE + rerank) and cites the specific vault files behind each answer.

Synthesis Modes

Retrieval (embedding, vector search, BM25, graph traversal, reranking) always runs locally and in-process: sentence-transformers + ChromaDB, no external services required for indexing or search. Only the final answer-generation step needs a model, and you choose where that runs:

API mode (recommended). Set ANTHROPIC_API_KEY in a .env file at the repo root. Loom uses Claude for synthesis: fast, high quality, no local GPU required. Because embedding and retrieval are in-process, this is the lightest possible setup.

Local mode. Start Ollama and pull a chat model (ollama pull deepseek-r1:14b). Synthesis runs fully on-device: slower and needs a capable GPU, but zero API cost and fully air-gapped.

Why Loom

Most LLM-over-notes tools are stateless: they read your files fresh every conversation, re-derive connections on every query, and degrade as your vault grows past a few hundred notes.

Loom is stateful. A persistent vector index and knowledge graph accumulate over time. The relationships between your ideas are stored, traversable, and queryable, not re-derived on every request. When your vault hits 5,000 notes and you need to know what you forgot you knew two years ago, stateless tools can't help. Loom can.

What Loom does:

Ask your notes anything: semantic search + knowledge graph traversal with cited answers. Retrieval runs locally; synthesis uses Claude by default (or a local model)
See what you forgot: weekly reports find notes you read months ago that connect to what you're working on now (Knowledge Archaeology, live)
Track how your thinking evolved: monthly reports show how your understanding of a topic shifted over time (Temporal Reasoning, live)
Know where you're thin: weekly reports rank concepts where your coverage is shallow (Epistemic Audit, live)
Ingest everything automatically: drop PDFs, audio, and web URLs into one folder. Loom does the rest.

What Loom will never do:

Store your vault, index, or graph anywhere but your machine
Require a subscription or a vendor account
Stop working because a vendor got acquired

The only thing that ever leaves your machine is the handful of retrieved passages sent to Claude for synthesis. Run Ollama locally to keep even that on-device.

How Loom differs from agent-memory tools. Mem0, Graphiti, and Cognee store facts extracted from conversation histories. They're optimized for "remember what this user prefers across sessions." Loom is built for document corpora: thousands of notes, papers, transcripts, and web threads you've accumulated over years. The retrieval problem is different, the graph schema is different, and the use case is different. If you need your agent to remember a conversation, use Mem0. If you need to reason across your document library, that's Loom.

How Loom Compares

	Loom	NotebookLM	Khoj	Smart Connections	InfraNodus	Mem0
Surfaces forgotten connections	Yes (local)	No	No	Semantic only	Yes (cloud, €12+/mo)	No
Knowledge graph	Full entity-relation	No	No	Semantic neighborhoods	Yes	Cloud only ($249/mo)
Temporal reasoning	Yes (local)	No	No	No	No	No
Knowledge gap detection	Yes (local)	No	No	No	Yes (cloud)	No
Local-first (your data stays on device)	Yes	No (Google)	Self-host	Yes (plugin)	No (EU cloud)	No (cloud)
Automated intake	Yes	Manual upload	Connectors	No	Import-based	No
Works without Obsidian	Yes	Yes	No	No	Partial	Yes

How Loom is different from InfraNodus

InfraNodus is the closest competitor: it also surfaces structural gaps and forgotten connections in a knowledge graph, and it ships as an Obsidian plugin. The difference is shape, not feature checkboxes. InfraNodus is cloud SaaS, single-purpose, and import-based: you upload a body of text, it analyzes that snapshot, you read the result. Loom is local, continuous, and integrated: it runs on your machine with no subscription, the graph grows every time you drop a file into intake, and the analysis lives inside a queryable second brain you can chat with. You don't export to Loom. Loom is where your notes live.

Features

	Feature	Status
⚡	Semantic search: ask your notes anything, get cited answers	✅ Live
🕸️	Knowledge graph: typed relationships; embedded FalkorDB (Mac/Linux) or Neo4j (Windows)	✅ Live
🎙️	Whisper transcription: audio and video → indexed notes	✅ Live
📄	PDF extraction: drop any PDF, it lands in your vault	✅ Live
🌐	Web digest: Wikipedia, arXiv, HN, SEC, IRS, LessWrong	✅ Live
📚	Topic packs: 46 curated corpora via `loom pack install`	✅ Live
📦	Loom Capture: standalone intake daemon, pipx installable	✅ Live
🖥️	Web UI via Open WebUI: chat interface, no terminal needed	✅ Live
🏛️	Knowledge Archaeology: weekly report of forgotten connections	✅ Live (loop)
⏱️	Temporal Reasoning: monthly arc report of concept evolution	✅ Live (loop)
🧠	Epistemic Audit: weekly knowledge gap analysis	✅ Live (loop)
🔍	Vault health diagnostics: orphans, broken links, stale notes	✅ Live
⏰	Temporal search lane: "last week" queries activate recency scoring	✅ Live
📝	Note write tools: create, edit, append via MCP	✅ Live
🔄	Real-time vault sync: file watcher re-indexes on save	✅ Live
🔁	Loop runner: 6 scheduled background analyses	✅ Live

Install

The one-click Windows installer is live (v0.2.1). On Mac/Linux, or prefer source? The pipx path below is the quickest, and Docker brings up the full stack. Works on Windows, Mac, and Linux.

Prefer to inspect the installer before running it? See Installer Trust — exactly what it touches, how to read it yourself, and how to pin to a release.

Quickest: install from PyPI

pip install loom-pkm
loom init
loom chat

Loom is on PyPI as loom-pkm — no clone needed. pipx install loom-pkm works too if you prefer an isolated install. All the source-based paths below still work.

1. Claude Desktop (MCP): the recommended way to use Loom

Install Loom from source (the pipx step below is fine), then wire it into Claude Desktop and ask your vault questions right from the chat. Full walkthrough: Connect to Claude Desktop.

git clone https://github.com/KlossKarl/loom
cd loom
LOOM_EMBEDDED=true pipx install .   # or: pip install -e .
loom init                           # creates config.yaml (required before connecting)
# then add loom to claude_desktop_config.json (see the section linked above)

2. pipx: the default first run (no Neo4j, no Ollama, no server)

pipx install .
loom init
loom chat

On Mac/Linux (Python ≥ 3.12) this includes the embedded FalkorDB graph backend, so full graph features — typed relationships, graph search, archaeology — work out of the box with zero extra services. On Windows (or older Pythons) the same install runs in vector-only embedded mode, and Neo4j is the optional graph upgrade — nothing to migrate, your index carries over. Backend selection is automatic (second_brain.graph_backend: auto).

3. Docker: full graph stack (Neo4j + Ollama), no manual dependency setup

Want the optional full graph features from day one? Docker brings up the vector store, Neo4j graph database, and local LLM with no manual Python/Neo4j/Ollama install:

git clone https://github.com/KlossKarl/loom
cd loom

# 1. Configure: copy the env template and set your vault path
cp .env.example .env
# then edit .env: LOOM_VAULT_PATH is required (point it at your notes folder)

# 2. Bring up the services (Neo4j + Ollama + Loom)
#    Ollama auto-detects the accelerator (Metal on Apple Silicon, else CPU);
#    no GPU config needed. Nvidia homelab users can add passthrough via a
#    docker-compose.nvidia.yml override.
docker compose up -d

docker compose up starts the services but does not index or chat on its own. The Loom container idles until you run commands against it:

docker compose run --rm loom python src/second_brain/second_brain.py --index
docker compose run --rm loom python src/second_brain/second_brain.py --chat

⚠️ Never run cat .env in a shared screen, terminal recording, or chat session. Your .env file contains your Anthropic API key. If exposed, rotate it immediately at console.anthropic.com → API Keys. To safely inspect your config without printing secrets: grep -v API_KEY .env

4. Development install (from source)

git clone https://github.com/KlossKarl/loom
cd loom
pip install -e .
LOOM_EMBEDDED=true loom init
loom chat

Drop files into intake/, then ask your notes anything.

Gotchas

First graph index is slow: the --graph-index pass uses your local LLM (or Claude Haiku) to extract entities from every file. On a mid-range GPU, expect it to run overnight. Vector index is fast (~10 minutes). For a free, fast first pass, --quick-graph-index uses GLiNER v2.1 locally. See below.
Tested on Windows: Loom uses Path() throughout and should work on Mac/Linux, but it's only been tested on Windows. PRs welcome.
Graph backends: on Mac/Linux the knowledge graph runs on an embedded FalkorDB backend by default (installs with Loom, no server). On Windows, Neo4j is the graph backend; without it Loom runs vector-only. Force a choice with second_brain.graph_backend: falkor|neo4j, or set LOOM_EMBEDDED=true for vector-only mode anywhere.

Entity extraction: local and free, or Claude Haiku. The knowledge graph is built by --graph-index (Claude Haiku via the Anthropic API when a key is set, falling back to your local chat_model) or by --quick-graph-index, which uses GLiNER v2.1 (urchade/gliner_multi-v2.1), a 209M-param local NER model with zero API cost that runs entirely on your machine. See docs/usage.md for the trade-offs.

Control Panel

Loom ships with a desktop control panel (src/whisper/intake_tray.py):

Intake tab: drag-and-drop files + YouTube URL queue
Search tab: vector and graph search against your vault
Chat tab: persistent local LLM conversation
Status tab: vault stats, watcher status, index runner

Runs in the system tray. Start it with:

python src/whisper/intake_tray.py

The Windows installer launches this automatically on first run.

What it actually does

Drop a file into the intake folder. It routes itself.

lecture.mp3          ->  Whisper transcription -> vault -> indexed
paper.pdf            ->  PDF to markdown -> vault -> indexed
thread.txt           ->  Web digest -> structured note -> vault -> indexed  
https://...url       ->  Same as above
note.md              ->  Copied directly to vault -> indexed

Then ask questions:

> what did the stanford cs229 lecture say about attention mechanisms?
> compare the risk frameworks across my last 5 papers
> what connects OODA loop to predictive coding?
> find everything I've read about CLO structures

Retrieves from ChromaDB (vector search), traverses Neo4j (knowledge graph), and generates the answer with Claude by default (or a local LLM via Ollama). Indexing, retrieval, and your data stay on your hardware.

Example: 30 Papers on RAG Over 6 Months

You've been reading about retrieval-augmented generation for months. Papers, blog posts, podcast transcripts, HN threads. All dropped into intake/ as they came in. You never organized them.

Six months later, you need to write a synthesis. You type:

loom chat
> What are the main approaches to RAG and how do they compare?

Loom searches the vector index, walks the knowledge graph, and returns an answer citing 14 of your notes across 8 sources, including a podcast transcript from January you completely forgot about, and a connection between two papers you never would have made manually.

The answer includes citations back to the exact vault files. You click through, verify, and start writing. The system got smarter while you weren't looking, because the graph accumulated relationships every time you fed it something new.

That's what stateful retrieval means. A stateless tool would have started from scratch.

Architecture

intake/                     <- drop anything here
    |
src/whisper/intake_watcher.py   <- watches folder, routes by file type
    |
+--------------------------------------------------+
|  transcribe.py   pdf_to_md.py   web_digest.py   |
|        Whisper      pymupdf        Claude Code   |
+--------------------------------------------------+
    |
Obsidian Vault              <- all content lands here as markdown
    |
src/second_brain/second_brain.py --index         <- chunks + embeds into ChromaDB
src/second_brain/second_brain.py --graph-index   <- extracts entities/relationships into Neo4j
    |
    +--> src/second_brain/second_brain.py --chat <- CLI: hybrid retrieval (vector + graph + HyDE + rerank)
    |
    +--> loom_serve.py                            <- OpenAI-compatible HTTP API
              |
              +--> Open WebUI (browser)           <- full chat UI with citations
              +--> Continue.dev / any OpenAI client

graph LR
    A[📁 intake/] --> B[Folder Watcher]
    C[🎙️ Audio] --> D[Whisper]
    E[📄 PDF] --> F[PyMuPDF]
    G[🌐 Web URL] --> H[Web Digest]
    B --> I[(ChromaDB\nVector Index)]
    D --> I
    F --> I
    H --> I
    I --> J[Neo4j\nKnowledge Graph]
    I --> K[💬 loom chat]
    J --> K
    I --> M[loom_serve.py\nOpenAI-compatible API]
    J --> M
    M --> N[🖥️ Open WebUI]
    K --> L[📝 Obsidian Vault]

For details on the knowledge graph schema, see docs/graph_schema.md.

What's actually different

Loom is opinionated about graph schema and entity resolution. It trades flexibility for long-term coherence: every relationship is typed, every entity is resolved, and the graph gets smarter the more you feed it.

A few specific things, since "local-first RAG" is a crowded space.

Constrained typed relationships, not unconstrained predicate generation. The Neo4j schema uses a fixed set of relationship types: CITES, INFLUENCES, EXTENDS, CONTRASTS_WITH, UNCERTAIN_SAME_AS, UNTYPED_RELATION, CO_OCCURS_WITH, and REFERS_TO. Anything the LLM tries to emit outside that set is rewritten to UNTYPED_RELATION with the original predicate preserved on r.raw_type. This is more restrictive than letting the model invent edge types, but the graph stays coherent at scale instead of fragmenting into thousands of one-off predicate names. The validation happens at the graph write layer, not just in the prompt.

Evidence-backed graph extraction. Every semantic edge (and every MENTIONS link from a chunk to an entity) carries an evidence_span (a ≤200-char verbatim quote from the source chunk) and an edge_confidence score. Every :Entity node carries an extraction_confidence (max of all scores ever seen for it). You can ask "where did this come from?" and get the actual line of text the LLM was looking at when it made the claim. Most graph-RAG tools throw this provenance away the moment extraction finishes.

Entity resolution with canonical keys + aliases. "PAC-learning", "PAC learning", and "pac learning" all collapse to the same canonical key in the graph (lowercase, hyphens → spaces, stripped possessives/articles/accents, conservative plural strip), with the original surface forms preserved as Alias nodes linked via HAS_ALIAS. This handles the entity dedup problem most LLM-extracted graphs ignore. Without it, the graph fills with near-duplicate nodes and cross-document traversal breaks down.

Wikilink-aware graph. Obsidian [[wikilinks]] between vault notes are extracted and written as Document→Document REFERS_TO edges: user-curated structure, no LLM call. The graph respects the connections you drew by hand, not just the ones a model inferred.

Adaptive query routing, not blind hybrid retrieval. Most personal RAG tools run the same retrieval pipeline regardless of query type. This one classifies the query first (semantic, relational, or hybrid), routes to the appropriate store (ChromaDB, Neo4j, or both), then runs a sufficiency check and loops up to 3 times if context is insufficient. The router falls back to vector if graph comes up empty, or expands into graph if vector results don't answer the question. The route taken is logged so you can see how the system is thinking.

Cross-document queries vector search cannot answer. Because entities are shared nodes across documents, you can traverse Document -> Chunk -> Entity <- Chunk <- Document to find pairs of documents that both reference the same concept. That's a single Cypher traversal. Pure vector RAG cannot answer this structurally no matter how big the context window gets.

Retrieval Benchmarks

Benchmarks: Recall@10 0.947 · MRR@10 0.861 (hybrid + rerank, measured on an NVIDIA RTX 4070 Ti with CUDA — Mac/MPS and CPU runs rerank slower and may score slightly differently) · full methodology →

Note: most AI memory benchmarks test conversation recall — a different task from document retrieval.

Requirements

This list is for source installs. The Windows installer (v0.2.1, live) bundles Python and handles setup automatically.

Python 3.10+

Ollama - local LLM inference for chat. Pull a model based on your hardware:

Profile	VRAM	Chat model	Quality
Budget	4-8GB	`ollama pull llama3:8b`	Good for chat, basic graph
Mid (default)	8-16GB	`ollama pull deepseek-r1:14b`	Solid all-around
High	16-24GB	`ollama pull deepseek-r1:32b`	Better local graph extraction
Workstation	48GB+	`ollama pull llama3:70b`	Near-frontier quality

Embeddings run in-process via sentence-transformers (current default mxbai-embed-large). No separate Ollama pull required. For new installs we recommend qwen3-embedding, which leads current retrieval benchmarks; mxbai-embed-large remains a solid fallback.

Graph extraction defaults to Claude Haiku via the Anthropic API when anthropic.api_key is set in config.yaml. This is the recommended path: it's faster, more accurate, has prompt caching (~5× cost reduction), and costs about $5 for a full 18K-chunk index. If no API key is configured, Loom falls back to your local chat_model for extraction.

Neo4j Desktop (optional; Windows graph backend) - on Mac/Linux the knowledge graph uses the embedded FalkorDB backend instead (installs with Loom, no server). Install Neo4j on Windows when you want the knowledge graph (see below)
Obsidian - vault is just a folder of markdown, Obsidian is optional but recommended
Claude Code - used for free-tier web digest processing (optional but recommended)
Decent hardware. 16GB RAM minimum. A GPU with 8GB+ VRAM makes graph indexing significantly faster.

Loom Capture

If you only want the intake half (automatic Whisper transcription, PDF → markdown, web/HN/Reddit/Wikipedia digests, all dropping into any folder you point at), there's a standalone product:

pipx install loom-capture
loom-capture init
loom-capture watch

→ loom-capture/. Free, MIT, no API key, no vector DB, no graph. Works with Obsidian, Logseq, or any folder of markdown files.

What a day with Loom Capture looks like

Morning: you listen to a podcast and drop the mp3 into your vault. Capture transcribes it via Whisper and files it as markdown.

Afternoon: you paste three URLs into a text file in intake/: an arXiv paper, a Wikipedia article, and an HN thread. Capture digests all three into clean markdown notes.

Evening: you open Obsidian and everything is there, searchable, formatted, filed. You did zero manual work.

Manual config

The loom init wizard handles this for you, but if you prefer to set things up by hand:

cp config.template.yaml config.yaml
# edit config.yaml with your paths

Neo4j setup (optional upgrade for full graph features — the default embedded mode doesn't need it):

Install Neo4j Desktop
Create a new Project, add a Local DBMS
Set a password, start the instance
Put the password in config.yaml under second_brain.neo4j_password

config.yaml reference

paths:
  obsidian_vault: C:\Users\you\Documents\Obsidian Vault
  chroma_dir: C:\Users\you\Documents\second_brain_db

second_brain:
  vaults:
    - C:\Users\you\Documents\Obsidian Vault
  
  embed_model: mxbai-embed-large
  chat_model: deepseek-r1:14b
  
  neo4j_uri: neo4j://127.0.0.1:7687
  neo4j_password: yourpassword
  
  # customize entity types for your domain
  entity_types:
    - Person
    - Concept
    - Method
    - Paper
    - Organization
    - Dataset

intake:
  folder: C:\Users\you\Documents\loom\intake
  auto_index: true
  web_digest_free: true    # true = Claude Code (free), false = Anthropic API

Usage

The fastest path is the intake watcher. Run it once and drop files into intake/; they get transcribed/digested and indexed automatically:

python src/whisper/intake_watcher.py

The full command reference covers indexing, graph building (LLM and GLiNER), chat, web digests, transcription, research-source batches, and the in-chat retrieval toggles. See docs/usage.md.

Pre-built topic packs

loom ships with 46 pre-built semantic corpora, curated knowledge bases (AI, quant finance, mathematics, law, philosophy, and more) you can ingest in a few hours to start with a connected, queryable foundation instead of an empty vault.

List and install them straight from the CLI:

loom pack list                 # show all 46 packs with URL and install counts
loom pack install ai_agents    # ingest a pack via the web-digest pipeline, then index

Installs are resumable, so a failed URL or an interrupted run picks up where it left off.

→ See docs/topic-packs.md for the full list and usage.

Known issues

Tested on Windows. Paths use Path() throughout so it should work on Mac/Linux, but that hasn't been tested. PRs welcome.
Graph indexing is slow on large vaults: roughly 0.5 seconds per chunk on the current default hardware profile. For a vault with thousands of files this means running overnight. Batched UNWIND writes (Phase 2) will cut this significantly on multi-core machines.
~~No web UI.~~ Web UI is now live via loom_serve.py + Open WebUI. See Quick Start: UI Mode.

Roadmap

P0: Onboarding (shipped)

docker-compose.yml: one command full stack
LOOM_EMBEDDED=true: zero-dependency first run
pipx installable: pipx install .
loom init wizard: OS-aware path detection, auto-configures Claude Desktop
One-click Windows installer (v0.2.1): bundles Python, no Neo4j required
loom pack list / loom pack install: 46 curated topic packs from the CLI

P1: Loom Capture (shipped)

Standalone intake pipeline: pipx install loom-capture
Three commands: loom-capture init, watch, digest

Extraction Quality: Phase 0 + Phase 1 (shipped 2026-05-20)

Claude Haiku as default graph extraction model (with prompt caching, ~5× cost reduction)
Evidence spans + confidence scores on every entity and every semantic/MENTIONS edge
Typed exception handling + quality metrics summary at end of every --graph-index run
UNCERTAIN_SAME_AS / UNTYPED_RELATION split (retires POSSIBLY_SAME_AS)
Hallucination guard: entity names validated against source text
Obsidian [[wikilinks]] → Document→Document REFERS_TO edges
In-process sentence-transformers embedding (eliminates the old Ollama HTTP 500s under load)
BM25 disk cache (avoids cold rebuild every session)
ingested_at on every graph node: temporal foundation

Extraction Quality: Phase 2 (pending, new hardware)

Batched UNWIND Neo4j writes: utilises full CPU core count
Reranker upgrade to bge-reranker-v2-m3 (Recall@10 0.853 → 0.947 on CUDA, now on by default)
Degree cap on graph traversal (hub-node protection)
Retrieval deduplication
First full clean graph index run on the new PC

Performance

CUDA-accelerated reranking (CUDA PyTorch wheel; reranked queries now ~350ms on an RTX 4070 Ti — CPU fallback still works, just slower)

P2: Next Generation

Knowledge Archaeology: weekly loop surfaces forgotten notes ✅ shipped
Epistemic Audit: weekly knowledge gap report ✅ shipped
Temporal Reasoning: monthly arc report of how your thinking evolved ✅ shipped
loom_serve.py: OpenAI-compatible HTTP API (Open WebUI, Continue.dev) ✅
Argument layer: extract claims/evidence/debate structure into the graph
Graph visualization UI
Mac/Linux testing and fixes
Browser extension for web digest

Infrastructure

Embedded graph backend (Mac/Linux): FalkorDBLite (falkordblite) ships as the default graph backend — pip-only, no server, no JVM. The 2026-06 FalkorDB spike failed because the hyphenated package didn't exist; falkordblite does, and passed the full compat + parity gates (FALKORDBLITE_SPIKE_RESULTS.md).
Embedded graph backend (Windows): falkordblite has no win32 support, so Windows still needs Neo4j for graph features. SQLite-graph spike spec remains the candidate path to close this last gap.

Project structure

loom/
├── src/                  # all Python modules
│   ├── whisper/          # audio/video to markdown + intake watcher/tray
│   ├── second_brain/     # core: index, chat, graph
│   └── web_digest/       # all ingestion scripts + topic files
├── experiments/          # exploratory scripts (lightrag_test, etc.)
├── docs/                 # guides, ADRs, usage, topic packs
├── config.template.yaml  # starting point, copy to config.yaml
└── config.yaml           # your config (gitignored)

Support

If Loom is useful to you, a ⭐ on GitHub helps more people find it.

Questions or ideas? Open an issue.

Advanced: GPU Acceleration

The installer uses CPU-only PyTorch by default (~200 MB). For GPU-accelerated embeddings, install CUDA torch manually:

pip install torch --index-url https://download.pytorch.org/whl/cu121

Note: Loom's LLM inference (chat, analysis, graph extraction) runs through Ollama, which manages GPU access independently. CUDA torch only affects the embedding model (sentence-transformers).

Benchmarking Embedding Models

To compare embedding models on your actual vault content:

python experiments/embedding_benchmark.py --sample 500

Samples chunks from your existing ChromaDB collection, re-embeds them with each candidate model, runs a set of queries against each, and reports latency + retrieval results. Defaults to comparing mxbai-embed-large, nomic-embed-text-v1.5, and bge-m3. Results saved to experiments/embedding_benchmark_results.json.

Contributing

Loom is a solo project but issues and PRs are welcome.

Bug reports: open a GitHub issue with steps to reproduce
Feature requests: check the roadmap first, then open an issue
Pull requests: keep them focused. One thing per PR

License

MIT

loom

loom

Table of Contents

Download

Connect to Claude Desktop (the primary way to use Loom)

Quick Start: UI Mode

Synthesis Modes

Why Loom

How Loom Compares

How Loom is different from InfraNodus

Features

Install

Gotchas

Control Panel

What it actually does

Example: 30 Papers on RAG Over 6 Months

Architecture

What's actually different

Retrieval Benchmarks

Requirements

Loom Capture

What a day with Loom Capture looks like

Manual config

config.yaml reference

Usage

Pre-built topic packs

Known issues

Roadmap

Project structure

Support

Advanced: GPU Acceleration

Benchmarking Embedding Models

Contributing

License

Maintenance

Resources

Tools

Latest Blog Posts

MCP directory API