Skip to main content
Glama

loom

Loom shows you what you forgot and how your thinking changed: a local-first second brain that builds a knowledge graph of your notes and surfaces the connections you'd never find by searching.

Connect Loom to Claude Desktop and ask your vault anything — cited answers from your local knowledge graph, your knowledge base never leaves your machine. Loom runs as a local MCP server: Claude Desktop, Cursor, or any MCP client searches your vault, walks your knowledge graph, and surfaces forgotten connections without you leaving the chat. → 60-second setup below.

Point Loom at your notes folder and it indexes everything, maps the relationships between your ideas, and surfaces forgotten connections automatically. Your vault, vector index, and knowledge graph live entirely on your machine; only the specific passages behind an answer are sent to Claude for synthesis — and you can keep even that on-device with a local model. Prefer a web UI or the terminal? Those work too.

Download Loom

Windows installer coming soon. A one-click .exe that auto-installs Python, Ollama, and a starter model is in the works. Until it lands, install from source with pipx — it's two commands and works on Windows, Mac, and Linux. See Install.

LOOM_EMBEDDED=true pipx install .
loom init
loom chat

Who this is for: researchers, lawyers, quants, journalists, and anyone with thousands of documents who needs to reason across all of them privately, on their own machine. If you have 50 notes, a simple LLM-over-files setup works fine. If you have 5,000 and need to find what you forgot you knew two years ago, you need infrastructure underneath. That's Loom.

I built this because I kept losing things I had already read.

A local-first personal knowledge base. It ingests audio, PDFs, web threads, browser history, and markdown notes, indexes everything with vector search and a knowledge graph, and lets you chat with all of it — synthesis runs through Claude by default, or a local LLM via Ollama if you'd rather stay fully offline. Your vault, index, and graph stay on your machine. No subscription, no lock-in.

Most RAG tools are stateless retrieval pipelines: they find chunks and forget them. loom builds durable semantic memory. It accumulates an entity-resolved, relation-typed knowledge graph that answers structural questions pure vector systems can't express, and grows more connected the more you feed it.


Related MCP server: MemoryMesh

Table of Contents


Download

Windows installer coming soon. A one-click .exe — auto-downloads Python, silently installs Ollama, pulls a starter model, and drops Desktop + Start Menu shortcuts — is in progress. It's not published yet, so don't wait on it.

For now, the primary install path is pipx from source (Windows, Mac, Linux):

git clone https://github.com/KlossKarl/loom
cd loom
LOOM_EMBEDDED=true pipx install .
loom init     # interactive wizard: pick your vault folder
loom chat

loom init walks you through picking a vault folder, then loom chat is live. Full options (Docker, dev install) are in Install below.


Connect to Claude Desktop (the primary way to use Loom)

This is the fastest and best way to use Loom. Loom runs as a local MCP server (loom_mcp.py) that exposes your vault and knowledge graph to Claude Desktop, Cursor, Continue.dev, or any MCP-compatible assistant. You ask questions in Claude Desktop; Loom does the retrieval locally and hands back cited passages and graph connections — no app-switching, no copy-paste.

Setup (Claude Desktop):

  1. Install Loom from source (see Install) and run loom init once so a config.yaml exists.

  2. Add Loom to your Claude Desktop config — %APPDATA%\Claude\claude_desktop_config.json on Windows, ~/Library/Application Support/Claude/claude_desktop_config.json on Mac:

{
  "mcpServers": {
    "loom": {
      "command": "python",
      "args": ["C:\\path\\to\\loom\\loom_mcp.py"]
    }
  }
}
  1. Restart Claude Desktop. Ask: "Search my vault for anything about knowledge graphs."

On Windows, if python doesn't resolve in Claude Desktop's environment, use the full interpreter path (e.g. C:\\Python314\\python.exe). Full instructions — Mac paths, the python resolution fix, and troubleshooting — are in docs/claude-desktop-config.md.

What stays local: your vault, the ChromaDB vector index, and the Neo4j graph never leave your machine. Claude Desktop only ever sees the specific passages Loom retrieves for a given question — the same data flow as any retrieval-augmented chat.

The 8 tools Claude gets:

Tool

What it does

search_vault

Hybrid search (semantic + keyword + graph) with citations

get_entity_neighbors

Explore the knowledge graph around any concept

get_archaeology_report

Surface forgotten connections between your notes

ingest_url

Add a web URL to your vault

get_vault_stats

Vault statistics and top concepts

add_fact

Write an entity or typed relationship into the graph

delete_edge

Remove a single relationship by edge id

forget_entity

Remove an entity and all its edges


Quick Start — UI Mode

Prefer a browser to Claude Desktop? Loom also exposes an OpenAI-compatible API (loom_serve.py), and Open WebUI connects to it for a polished chat experience: citations, model switching, conversation history, no terminal.

Prerequisites: Loom installed (see above), Docker Desktop for Open WebUI.

One-time setup:

# 1. Start Open WebUI (one Docker command)
docker run -d -p 3030:8080 --add-host=host.docker.internal:host-gateway ^
  -v open-webui:/app/backend/data --name open-webui --restart always ^
  ghcr.io/open-webui/open-webui:main

# 2. Start the Loom API server
pip install fastapi uvicorn
python loom_serve.py

Connect Open WebUI to Loom (one-time, in the browser):

  1. Open http://localhost:3030

  2. Admin Settings → Connections → + (OpenAI-compatible)

  3. URL: http://host.docker.internal:11435  ·  Key: loom

  4. Add model ID loom → Save

Daily use, just run:

python loom_serve.py
# then open http://localhost:3030 and pick the "loom" model

On Windows, double-click Loom.bat in the install directory for one-click launch.

Every reply goes through Loom's full retrieval pipeline (vector + graph + BM25 + HyDE + rerank) and comes back cited against your vault files.


Synthesis Modes

Retrieval (embedding, vector search, BM25, graph traversal, reranking) always runs locally and in-process: sentence-transformers + ChromaDB, no external services required for indexing or search. Only the final answer-generation step needs a model, and you choose where that runs:

API mode (recommended). Set ANTHROPIC_API_KEY in a .env file at the repo root. Loom uses Claude for synthesis: fast, high quality, no local GPU required. Because embedding and retrieval are in-process, this is the lightest possible setup.

Local mode. Start Ollama and pull a chat model (ollama pull deepseek-r1:14b). Synthesis runs fully on-device: slower and needs a capable GPU, but zero API cost and fully air-gapped.


Why Loom

Most LLM-over-notes tools are stateless: they read your files fresh every conversation, re-derive connections on every query, and degrade as your vault grows past a few hundred notes.

Loom is stateful. A persistent vector index and knowledge graph accumulate over time. The relationships between your ideas are stored, traversable, and queryable, not re-derived on every request. When your vault hits 5,000 notes and you need to know what you forgot you knew two years ago, stateless tools can't help. Loom can.

What Loom does:

  • Ask your notes anything: semantic search + knowledge graph traversal with cited answers — retrieval runs locally; synthesis uses Claude by default (or a local model)

  • See what you forgot: graph algorithms surface connections across thousands of notes you'd never find manually (Knowledge Archaeology, coming in P2)

  • Track how your thinking evolved: temporal reasoning shows how your understanding of a topic changed over time (coming in P2)

  • Know where you're thin: a weekly audit ranks concepts where your coverage is shallow (Epistemic Audit, coming in P2)

  • Ingest everything automatically: drop PDFs, audio, and web URLs into one folder. Loom does the rest.

What Loom will never do:

  • Store your vault, index, or graph anywhere but your machine

  • Require a subscription or a vendor account

  • Stop working because a vendor got acquired

Your knowledge base is local-first. The only thing that ever leaves your machine is the handful of retrieved passages sent to Claude for the final answer — and you can run synthesis fully on-device with Ollama to keep even that local.


How Loom Compares

Loom

NotebookLM

Khoj

Smart Connections

InfraNodus

Surfaces forgotten connections

Yes (local)

No

No

Semantic only

Yes (cloud, €12+/mo)

Knowledge graph

Full entity-relation

No

No

Semantic neighborhoods

Yes

Temporal reasoning

Coming

No

No

No

No

Knowledge gap detection

Coming

No

No

No

Yes (cloud)

Local-first (your data stays on device)

Yes

No (Google)

Self-host

Yes (plugin)

No (EU cloud)

Automated intake

Yes

Manual upload

Connectors

No

Import-based

Works without Obsidian

Yes

Yes

No

No

Partial

How Loom is different from InfraNodus

InfraNodus is the closest competitor: it also surfaces structural gaps and forgotten connections in a knowledge graph, and it ships as an Obsidian plugin. The difference is shape, not feature checkboxes. InfraNodus is cloud SaaS, single-purpose, and import-based: you upload a body of text, it analyzes that snapshot, you read the result. Loom is local, continuous, and integrated: it runs on your machine with no subscription, the graph grows every time you drop a file into intake, and the analysis lives inside a queryable second brain you can chat with. You don't export to Loom. Loom is where your notes live.


Features

Feature

Status

Semantic search — ask your notes anything, get cited answers

✅ Live

🕸️

Knowledge graph — typed relationships, Neo4j-powered

✅ Live

🎙️

Whisper transcription — audio and video → indexed notes

✅ Live

📄

PDF extraction — drop any PDF, it lands in your vault

✅ Live

🌐

Web digest — Wikipedia, arXiv, HN, SEC, IRS, LessWrong

✅ Live

📦

Loom Capture — standalone intake daemon, pipx installable

✅ Live

🖥️

Web UI via Open WebUI — chat interface, no terminal needed

✅ Live

🏛️

Knowledge Archaeology — surface forgotten connections

🔜 P2

⏱️

Temporal Reasoning — how your thinking evolved over time

🔜 P2

🧠

Epistemic Audit — weekly report on knowledge gaps

🔜 P2


Install

A one-click Windows installer is coming soon. For now, install from source — the pipx path below is the quickest, and Docker brings up the full stack. Works on Windows, Mac, and Linux.

1. Claude Desktop (MCP) — the recommended way to use Loom

Install Loom from source (the pipx step below is fine), then wire it into Claude Desktop and ask your vault questions right from the chat. Full walkthrough: Connect to Claude Desktop.

git clone https://github.com/KlossKarl/loom
cd loom
LOOM_EMBEDDED=true pipx install .   # or: pip install -e .
loom init                           # creates config.yaml (required before connecting)
# then add loom to claude_desktop_config.json — see the section linked above

2. Docker: full stack, no manual dependency setup

Brings up the vector store, graph database, and local LLM with no manual Python/Neo4j/Ollama install:

git clone https://github.com/KlossKarl/loom
cd loom

# 1. Configure: copy the env template and set your vault path
cp .env.example .env
# then edit .env — LOOM_VAULT_PATH is required (point it at your notes folder)

# 2. CPU-only or Mac? Comment out the `deploy: resources: ... nvidia` block
#    in docker-compose.yml first, or the Ollama container will fail to start.

# 3. Bring up the services (Neo4j + Ollama + Loom)
docker compose up -d

docker compose up starts the services but does not index or chat on its own — the Loom container idles until you run commands against it:

docker compose run --rm loom python src/second_brain/second_brain.py --index
docker compose run --rm loom python src/second_brain/second_brain.py --chat

3. pipx: quickest start (no Neo4j, no Ollama required)

LOOM_EMBEDDED=true pipx install .
loom init
loom chat

Embedded mode skips Neo4j entirely. You get full vector search immediately, and graph features unlock when you're ready.

4. Development install (from source)

git clone https://github.com/KlossKarl/loom
cd loom
pip install -e .
LOOM_EMBEDDED=true loom init
loom chat

Drop files into intake/, then ask your notes anything.

Gotchas

  • First graph index is slow: the --graph-index pass uses your local LLM (or Claude Haiku) to extract entities from every file. On a mid-range GPU, expect it to run overnight. Vector index is fast (~10 minutes). For a free, fast first pass, --quick-graph-index uses GLiNER v2.1 locally. See below.

  • Tested on Windows: Loom uses Path() throughout and should work on Mac/Linux, but it's only been tested on Windows. PRs welcome.

  • Neo4j is optional: LOOM_EMBEDDED=true mode skips Neo4j entirely. You get full vector search without it. Graph features unlock when you're ready.

Entity extraction: local and free, or Claude Haiku. The knowledge graph is built by --graph-index (Claude Haiku via the Anthropic API when a key is set, falling back to your local chat_model) or by --quick-graph-index, which uses GLiNER v2.1 (urchade/gliner_multi-v2.1), a 209M-param local NER model with zero API cost that runs entirely on your machine. See docs/usage.md for the trade-offs.


Control Panel

Loom ships with a desktop control panel (src/whisper/intake_tray.py):

  • Intake tab: drag-and-drop files + YouTube URL queue

  • Search tab: vector and graph search against your vault

  • Chat tab: persistent local LLM conversation

  • Status tab: vault stats, watcher status, index runner

Runs in the system tray. Start it with:

python src/whisper/intake_tray.py

Once the Windows installer ships, it will launch this automatically on first run.


What it actually does

Drop a file into the intake folder. It routes itself.

lecture.mp3          ->  Whisper transcription -> vault -> indexed
paper.pdf            ->  PDF to markdown -> vault -> indexed
thread.txt           ->  Web digest -> structured note -> vault -> indexed  
https://...url       ->  Same as above
note.md              ->  Copied directly to vault -> indexed

Then ask questions:

> what did the stanford cs229 lecture say about attention mechanisms?
> compare the risk frameworks across my last 5 papers
> what connects OODA loop to predictive coding?
> find everything I've read about CLO structures

Retrieves from ChromaDB (vector search), traverses Neo4j (knowledge graph), and generates the answer with Claude by default (or a local LLM via Ollama). Indexing, retrieval, and your data stay on your hardware.


Example: 30 Papers on RAG Over 6 Months

You've been reading about retrieval-augmented generation for months. Papers, blog posts, podcast transcripts, HN threads. All dropped into intake/ as they came in. You never organized them.

Six months later, you need to write a synthesis. You type:

loom chat
> What are the main approaches to RAG and how do they compare?

Loom searches the vector index, walks the knowledge graph, and returns an answer citing 14 of your notes across 8 sources, including a podcast transcript from January you completely forgot about, and a connection between two papers you never would have made manually.

The answer includes citations back to the exact vault files. You click through, verify, and start writing. The system got smarter while you weren't looking, because the graph accumulated relationships every time you fed it something new.

That's what stateful retrieval means. A stateless tool would have started from scratch.


Architecture

intake/                     <- drop anything here
    |
src/whisper/intake_watcher.py   <- watches folder, routes by file type
    |
+--------------------------------------------------+
|  transcribe.py   pdf_to_md.py   web_digest.py   |
|        Whisper      pymupdf        Claude Code   |
+--------------------------------------------------+
    |
Obsidian Vault              <- all content lands here as markdown
    |
src/second_brain/second_brain.py --index         <- chunks + embeds into ChromaDB
src/second_brain/second_brain.py --graph-index   <- extracts entities/relationships into Neo4j
    |
    +--> src/second_brain/second_brain.py --chat <- CLI: hybrid retrieval (vector + graph + HyDE + rerank)
    |
    +--> loom_serve.py                            <- OpenAI-compatible HTTP API
              |
              +--> Open WebUI (browser)           <- full chat UI with citations
              +--> Continue.dev / any OpenAI client
graph LR
    A[📁 intake/] --> B[Folder Watcher]
    C[🎙️ Audio] --> D[Whisper]
    E[📄 PDF] --> F[PyMuPDF]
    G[🌐 Web URL] --> H[Web Digest]
    B --> I[(ChromaDB\nVector Index)]
    D --> I
    F --> I
    H --> I
    I --> J[Neo4j\nKnowledge Graph]
    I --> K[💬 loom chat]
    J --> K
    I --> M[loom_serve.py\nOpenAI-compatible API]
    J --> M
    M --> N[🖥️ Open WebUI]
    K --> L[📝 Obsidian Vault]

For details on the knowledge graph schema, see docs/graph_schema.md.


What's actually different

Loom is opinionated about graph schema and entity resolution. It trades flexibility for long-term coherence: every relationship is typed, every entity is resolved, and the graph gets smarter the more you feed it.

A few specific things, since "local-first RAG" is a crowded space.

Constrained typed relationships, not LLM-emergent slop. The Neo4j schema uses a fixed set of relationship types: CITES, INFLUENCES, EXTENDS, CONTRASTS_WITH, UNCERTAIN_SAME_AS, UNTYPED_RELATION, CO_OCCURS_WITH, and REFERS_TO. Anything the LLM tries to emit outside that set is rewritten to UNTYPED_RELATION with the original predicate preserved on r.raw_type. This is more restrictive than letting the model invent edge types, but the graph stays coherent at scale instead of fragmenting into thousands of one-off predicate names. The validation happens at the graph write layer, not just in the prompt.

Evidence-backed graph extraction. Every semantic edge (and every MENTIONS link from a chunk to an entity) carries an evidence_span (a ≤200-char verbatim quote from the source chunk) and an edge_confidence score. Every :Entity node carries an extraction_confidence (max of all scores ever seen for it). You can ask "where did this come from?" and get the actual line of text the LLM was looking at when it made the claim. Most graph-RAG tools throw this provenance away the moment extraction finishes.

Entity resolution with canonical keys + aliases. "PAC-learning", "PAC learning", and "pac learning" all collapse to the same canonical key in the graph (lowercase, hyphens → spaces, stripped possessives/articles/accents, conservative plural strip), with the original surface forms preserved as Alias nodes linked via HAS_ALIAS. This handles the entity dedup problem most LLM-extracted graphs ignore. Without it, the graph fills with near-duplicate nodes and cross-document traversal breaks down.

Wikilink-aware graph. Obsidian [[wikilinks]] between vault notes are extracted and written as Document→Document REFERS_TO edges: user-curated structure, no LLM call. The graph respects the connections you drew by hand, not just the ones a model inferred.

Adaptive query routing, not blind hybrid retrieval. Most personal RAG tools run the same retrieval pipeline regardless of query type. This one classifies the query first (semantic, relational, or hybrid), routes to the appropriate store (ChromaDB, Neo4j, or both), then runs a sufficiency check and loops up to 3 times if context is insufficient. The router falls back to vector if graph comes up empty, or expands into graph if vector results don't answer the question. The route taken is logged so you can see how the system is thinking.

Cross-document queries vector search cannot answer. Because entities are shared nodes across documents, you can traverse Document -> Chunk -> Entity <- Chunk <- Document to find pairs of documents that both reference the same concept. That's a single Cypher traversal. Pure vector RAG cannot answer this structurally no matter how big the context window gets.


Retrieval Benchmarks

Loom ships with a reproducible retrieval-evaluation harness (eval/). It samples 50 factual chunks from the vault, uses Claude to generate 150 questions whose answers are stated in those chunks, then runs every question through five retrieval modes built from Loom's own retrieve() primitives and scores Recall@5/@10 and MRR@10.

Latest run — 150 queries, 18,175-chunk corpus, mxbai-embed-large:

Mode

Recall@5

Recall@10

MRR@10

BM25 only

0.773

0.827

0.648

Vector only

0.653

0.720

0.468

Hybrid (BM25 + vector + RRF)

0.760

0.853

0.589

Hybrid + rerank

0.593

0.673

0.508

Hybrid + rerank + HyDE

pending

pending

pending

Honest takeaways: RRF fusion gives the best Recall@10 (0.853), and on this query distribution BM25 is a remarkably strong baseline — the eval questions reuse source-passage vocabulary, which favours lexical matching. The current small cross-encoder reranker (ms-marco-MiniLM-L-6-v2) actually regresses on this corpus: adding it drops Recall@10 from 0.853 to 0.673 (and MRR@10 from 0.589 to 0.508). That's a real negative result, not a tuning artifact — and it's exactly the motivation for the planned reranker upgrade to bge-reranker-v2-m3 (Roadmap). The Hybrid + rerank + HyDE row is pending a valid ANTHROPIC_API_KEY — that mode makes one Claude call per query, so it only runs when a working key is configured.

Full methodology, analysis, and reproduction steps: eval/results.md.

python eval/sample_chunks.py   # sample factual chunks (seeded, reproducible)
python eval/gen_queries.py     # Claude generates 3 Qs/chunk -> eval/queries.jsonl
python eval/run_eval.py        # score all modes -> eval/results.md

Requirements

This list is for source installs. The Windows installer (coming soon) will handle Python, Ollama, and the starter model automatically.

Be honest with yourself about this list before starting.

  • Python 3.10+

  • Ollama - local LLM inference for chat. Pull a model based on your hardware:

    Profile

    VRAM

    Chat model

    Quality

    Budget

    4-8GB

    ollama pull llama3:8b

    Good for chat, basic graph

    Mid (default)

    8-16GB

    ollama pull deepseek-r1:14b

    Solid all-around

    High

    16-24GB

    ollama pull deepseek-r1:32b

    Better local graph extraction

    Workstation

    48GB+

    ollama pull llama3:70b

    Near-frontier quality

    Embeddings run in-process via sentence-transformers (current default mxbai-embed-large). No separate Ollama pull required. For new installs we recommend qwen3-embedding, which leads current retrieval benchmarks; mxbai-embed-large remains a solid fallback.

    Graph extraction defaults to Claude Haiku via the Anthropic API when anthropic.api_key is set in config.yaml. This is the recommended path: it's faster, more accurate, has prompt caching (~5× cost reduction), and costs about $5 for a full 18K-chunk index. If no API key is configured, Loom falls back to your local chat_model for extraction.

  • Neo4j Desktop - knowledge graph database

    • Free, but requires manual setup (see below)

  • Obsidian - vault is just a folder of markdown, Obsidian is optional but recommended

  • Claude Code - used for free-tier web digest processing (optional but recommended)

  • Decent hardware. 16GB RAM minimum. A GPU with 8GB+ VRAM makes graph indexing significantly faster.


Loom Capture

If you only want the intake half (automatic Whisper transcription, PDF → markdown, web/HN/Reddit/Wikipedia digests, all dropping into any folder you point at), there's a standalone product:

pipx install loom-capture
loom-capture init
loom-capture watch

loom-capture/. Free, MIT, no API key, no vector DB, no graph. Works with Obsidian, Logseq, or any folder of markdown files.

What a day with Loom Capture looks like

Morning: you listen to a podcast and drop the mp3 into your vault. Capture transcribes it via Whisper and files it as markdown.

Afternoon: you paste three URLs into a text file in intake/: an arXiv paper, a Wikipedia article, and an HN thread. Capture digests all three into clean markdown notes.

Evening: you open Obsidian and everything is there, searchable, formatted, filed. You did zero manual work.


Manual config

The loom init wizard handles this for you, but if you prefer to set things up by hand:

cp config.template.yaml config.yaml
# edit config.yaml with your paths

Neo4j setup (only needed for the full stack, not embedded mode):

  1. Install Neo4j Desktop

  2. Create a new Project, add a Local DBMS

  3. Set a password, start the instance

  4. Put the password in config.yaml under second_brain.neo4j_password


config.yaml reference

paths:
  obsidian_vault: C:\Users\you\Documents\Obsidian Vault
  chroma_dir: C:\Users\you\Documents\second_brain_db

second_brain:
  vaults:
    - C:\Users\you\Documents\Obsidian Vault
  
  embed_model: mxbai-embed-large
  chat_model: deepseek-r1:14b
  
  neo4j_uri: neo4j://127.0.0.1:7687
  neo4j_password: yourpassword
  
  # customize entity types for your domain
  entity_types:
    - Person
    - Concept
    - Method
    - Paper
    - Organization
    - Dataset

intake:
  folder: C:\Users\you\Documents\loom\intake
  auto_index: true
  web_digest_free: true    # true = Claude Code (free), false = Anthropic API

Usage

The fastest path is the intake watcher. Run it once and drop files into intake/; they get transcribed/digested and indexed automatically:

python src/whisper/intake_watcher.py

The full command reference covers indexing, graph building (LLM and GLiNER), chat, web digests, transcription, research-source batches, and the in-chat retrieval toggles. See docs/usage.md.


Pre-built topic packs

loom ships with ~28 pre-built semantic corpora, curated knowledge bases (AI, quant finance, mathematics, law, philosophy, and more) you can ingest in a few hours to start with a connected, queryable foundation instead of an empty vault. All free via Claude Code.

→ See docs/topic-packs.md for the full list and usage.


Known issues

  • Tested on Windows. Paths use Path() throughout so it should work on Mac/Linux, but that hasn't been tested. PRs welcome.

  • Graph indexing is slow on large vaults: roughly 0.5 seconds per chunk on the current default hardware profile. For a vault with thousands of files this means running overnight. Batched UNWIND writes (Phase 2) will cut this significantly on multi-core machines.

  • No web UI. Web UI is now live via loom_serve.py + Open WebUI. See Quick Start — UI Mode.


Roadmap

P0: Onboarding (shipped)

  • docker-compose.yml: one command full stack

  • LOOM_EMBEDDED=true: zero-dependency first run

  • pipx installable: pipx install .

  • loom init wizard

P1: Loom Capture (shipped)

  • Standalone intake pipeline: pipx install loom-capture

  • Three commands: loom-capture init, watch, digest

Extraction Quality: Phase 0 + Phase 1 (shipped 2026-05-20)

  • Claude Haiku as default graph extraction model (with prompt caching, ~5× cost reduction)

  • Evidence spans + confidence scores on every entity and every semantic/MENTIONS edge

  • Typed exception handling + quality metrics summary at end of every --graph-index run

  • UNCERTAIN_SAME_AS / UNTYPED_RELATION split (retires POSSIBLY_SAME_AS)

  • Hallucination guard: entity names validated against source text

  • Obsidian [[wikilinks]] → Document→Document REFERS_TO edges

  • In-process sentence-transformers embedding (eliminates the old Ollama HTTP 500s under load)

  • BM25 disk cache (avoids cold rebuild every session)

  • ingested_at on every graph node: temporal foundation

Extraction Quality: Phase 2 (pending, new hardware)

  • Batched UNWIND Neo4j writes: utilises full CPU core count

  • Reranker upgrade to bge-reranker-v2-m3

  • Degree cap on graph traversal (hub-node protection)

  • Retrieval deduplication

  • First full clean graph index run on the new PC

P2: Next Generation (coming)

  • Knowledge Archaeology: graph algorithms surface forgotten notes

  • Epistemic Audit: weekly knowledge gap report

  • Temporal Reasoning: reconstruct how your thinking evolved

  • loom_serve.py: OpenAI-compatible HTTP API (Open WebUI, Continue.dev) ✅

  • Argument layer: extract claims/evidence/debate structure into the graph

  • Graph visualization UI

  • Mac/Linux testing and fixes

  • Browser extension for web digest


Project structure

loom/
├── src/                  # all Python modules
│   ├── whisper/          # audio/video to markdown + intake watcher/tray
│   ├── second_brain/     # core: index, chat, graph
│   └── web_digest/       # all ingestion scripts + topic files
├── experiments/          # exploratory scripts (lightrag_test, etc.)
├── docs/                 # guides, ADRs, usage, topic packs
├── config.template.yaml  # starting point — copy to config.yaml
└── config.yaml           # your config (gitignored)

Support

If Loom is useful to you, a ⭐ on GitHub helps more people find it.

Questions or ideas? Open an issue.


Advanced: GPU Acceleration

The installer uses CPU-only PyTorch by default (~200 MB). For GPU-accelerated embeddings, install CUDA torch manually:

pip install torch --index-url https://download.pytorch.org/whl/cu121

Note: Loom's LLM inference (chat, analysis, graph extraction) runs through Ollama, which manages GPU access independently. CUDA torch only affects the embedding model (sentence-transformers).

Benchmarking Embedding Models

To compare embedding models on your actual vault content:

python experiments/embedding_benchmark.py --sample 500

Samples chunks from your existing ChromaDB collection, re-embeds them with each candidate model, runs a set of queries against each, and reports latency + retrieval results. Defaults to comparing mxbai-embed-large, nomic-embed-text-v1.5, and bge-m3. Results saved to experiments/embedding_benchmark_results.json.


Contributing

Loom is a solo project but issues and PRs are welcome.

  • Bug reports: open a GitHub issue with steps to reproduce

  • Feature requests: check the roadmap first, then open an issue

  • Pull requests: keep them focused. One thing per PR


License

MIT

A
license - permissive license
-
quality - not tested
C
maintenance

Maintenance

Maintainers
Response time
Release cycle
1Releases (12mo)
Commit activity

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/KlossKarl/loom'

If you have feedback or need assistance with the MCP directory API, please join our Discord server