Memorised them All


🧠 What is this?

Imagine you could hand Claude a filing cabinet of your documents and say "remember all of this." Later you just ask questions, and Claude answers from what it remembers — citing which document each fact came from.

That's Memorised them All. It's a small add-on (an MCP server) for Claude Desktop and Claude Code that:

  1. Reads your files — PDFs, Word/Excel/PowerPoint, web pages, images (with OCR), even audio — and converts them to clean text on your computer.

  2. Builds a memory — a searchable map of the people, topics, and facts inside them (a "knowledge graph"), plus a tidy summary and an interactive mind map.

  3. Lets you ask — Claude recalls just the relevant bits when you ask, instead of you pasting whole files into the chat.

The one-line idea: Claude tokens cost money; your computer's effort is free. So all the heavy lifting happens locally, and Claude only ever receives a tiny answer. Memorising a 500-page folder costs roughly zero chat tokens.

💬 See it in action

Once it's installed, you just talk to Claude normally:

You:    Memorise everything in ~/Documents/contracts.
Claude: ✅ Digested 38 files → 421 facts across 7 themes. (took ~30s, all local)

You:    Which contracts mention an auto-renewal clause, and when do they renew?
Claude: Three do — the Globex MSA (renews 1 Jan, 60-day notice), … [cites each source]

You:    Open the mind map.
Claude: Here's your interactive map: /…/mindmap.html

Nothing left your machine. Claude never saw the 38 files — only the small answers.


🚀 Get started in about a minute

You need Python 3.10 or newer; versions 3.10 to 3.12 are the ones tested in CI. Most Macs and Linux PCs already have it; Windows users can install it from python.org (tick "Add to PATH"). Everything else installs automatically the first time you use it.

Pick whichever matches how you use Claude:

▶ Claude Desktop (easiest — no terminal)

  1. Download memorised-them-all.mcpb from the latest release.

  2. Double-click it. Claude Desktop opens and offers to install the extension — click Install.

  3. Done. Start a chat and say "Memorise my Documents folder."

▶ Claude Code

claude
# then, inside Claude Code:
/plugin marketplace add GRU-953/memorised-them-all
/plugin install memorised-them-all

▶ Any other setup (pip)

pip install memorised-them-all

Then register it with Claude — easiest is to let it configure itself:

mta setup-claude     # writes the MCP server into Claude Desktop (and Claude Code) config

(The install.sh installer runs this for you automatically.) Or add it by hand — it just runs mta serve:

{
  "mcpServers": {
    "memorised-them-all": { "command": "mta", "args": ["serve"] }
  }
}
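
If you would rather script the hand edit, the merge can be sketched as below. This is a minimal sketch, not the project's own setup code: the helper names add_mta_server and update_config_file are hypothetical, and the config file path is left for you to supply. Only the mcpServers entry itself comes from the JSON above.

```python
import json
from pathlib import Path

# Entry copied from the JSON snippet above.
MTA_ENTRY = {"command": "mta", "args": ["serve"]}


def add_mta_server(config: dict) -> dict:
    """Return a copy of `config` with the memorised-them-all server registered.

    Existing servers and unrelated settings are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["memorised-them-all"] = {"command": "mta", "args": ["serve"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read, merge, and rewrite a JSON config file (hypothetical helper)."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_mta_server(existing), indent=2), encoding="utf-8")
```

Merging into a copy instead of overwriting the file keeps any other MCP servers you have already registered.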

💡 Prefer Homebrew or Docker? brew install GRU-953/memorised-them-all/mta, or see Run it in Docker. All paths give you the same thing.

Do I need to install AI models?

No — it works the moment it's installed. Out of the box it uses fast, built-in techniques (no downloads, fully offline).

For sharper summaries and search, it can optionally use a free local AI model via Ollama (still 100% on your machine). If Ollama is present it's used automatically; if not, you're never blocked. To check what you have and get one-line setup tips, run:

mta doctor

📁 Your first memory

  1. Tell Claude what to remember — point it at a folder, a file, or a pattern:

    "Memorise everything in ~/Documents/research."

    (Behind the scenes Claude calls the digest tool. The first run may take a little longer while it sets things up.)

  2. Ask away — in plain language:

    "What do my documents say about the Q3 budget?" "Summarise everything about Project Apollo." "Who is mentioned most often, and in which files?"

  3. Explore visually (optional):

    "Open the mind map." — an interactive, offline map of how everything connects.

  4. Keep it tidy — separate memories per topic with projects:

    "Memorise ~/work/clientA into a project called clientA." "Using the clientA project, what were the agreed deliverables?"

Your memory lives in a folder on your computer (~/.memorised-them-all by default) and persists between chats. Re-running "memorise" updates it.


🎯 What can I use it for?

  • 📚 Research & study — digest a pile of papers or a textbook, then ask for explanations, comparisons, and citations.

  • 📑 Contracts & policies — load all your agreements and ask "which ones auto-renew?" or "what are the termination clauses?"

  • 🗂️ Personal knowledge base — point it at years of notes, receipts, or manuals and actually find things.

  • 🎧 Meetings & lectures — drop in audio recordings; it transcribes locally and remembers the content.

  • 🖼️ Scanned documents & images — it reads text from photos and scans (OCR) so they become searchable.

  • 🔒 Sensitive material — legal, medical, financial, or confidential files that must never leave your machine.


✨ Why is it free of token cost?

When you normally share a document with Claude, the whole thing is sent into the conversation — and you pay (in tokens) for every word, every time. A few big PDFs can blow your whole context window.

Memorised them All flips that around:

  • Converting, reading, and summarising your files happens on your computer.

  • Claude only ever receives a tiny result — a count, a short summary, or a few relevant snippets (capped small).

  • So memorising a giant folder, and asking about it again and again, costs close to zero context tokens.

It's the difference between mailing someone an entire library versus asking a librarian a question.
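
To put rough numbers on that, here is a back-of-the-envelope sketch. The constants are assumptions, not measurements from this project: about 4 characters per token and about 3,000 characters per page are common rules of thumb, and the snippet size and count are made up for illustration.

```python
CHARS_PER_TOKEN = 4    # assumption: rough rule of thumb for English text
CHARS_PER_PAGE = 3000  # assumption: a dense page of text


def estimate_tokens(chars: int) -> int:
    """Round up to whole tokens."""
    return -(-chars // CHARS_PER_TOKEN)


def pasted_cost(pages: int, questions: int) -> int:
    """Tokens spent if the whole document sits in context for every question."""
    return estimate_tokens(pages * CHARS_PER_PAGE) * questions


def recalled_cost(snippet_chars: int, snippets: int, questions: int) -> int:
    """Tokens spent if each answer only carries a few short snippets."""
    return estimate_tokens(snippet_chars) * snippets * questions


print(pasted_cost(pages=500, questions=10))
print(recalled_cost(snippet_chars=400, snippets=5, questions=10))
```

Under these assumptions, ten questions against a pasted 500-page folder come to 3,750,000 tokens, against 5,000 tokens for ten recalls of five 400-character snippets each.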


🔒 Is my data private?

Yes — that's the whole point. By default:

  • 100% local. Your files are read, converted, and remembered on your own machine. Their contents are never sent to Claude's servers, to us, or to anyone.

  • No telemetry, no tracking, no accounts, no API keys.

  • Works fully offline. Disconnect the internet and it still memorises and answers.

  • Open source (MIT). You (or anyone) can read exactly what it does.

Anything that touches the network is optional: installing or updating software, a daily check for a new version (on by default; turn off with MTA_AUTO_UPDATE=off), or a remote AI backend if you explicitly point it at one. With the defaults, your documents stay with you. See SECURITY.md for the full threat model.


🛡️ Built to be reliable

It's been hardened through repeated, deliberate stress-testing so it stays calm on real, messy folders:

  • It always finishes. A broken, enormous, or stuck file can't freeze the job — every file has a time limit and is skipped if it jams, so the rest still go through.

  • One bad file won't sink the batch. Unreadable files, looping shortcuts, and odd or over-long filenames are skipped, never fatal.

  • Your memory is crash-safe. If the computer is interrupted mid-write, your memory isn't corrupted or lost — writes are atomic, and anything that looks damaged is backed up before it's touched.

  • It's honest about its mode. If the local AI engine isn't responding, it tells you plainly and labels the memory as "basic" instead of pretending it's fine (see the FAQ below).

  • It reads awkward files. Windows "Unicode" (UTF-16) text, scanned images (OCR), audio, and even legacy Bengali (Bijoy/SutonnyMJ) fonts are handled; empty and binary files are skipped cleanly.

  • It can't be tricked into reading elsewhere. A shortcut planted in a folder that points outside that folder is ignored — a digest only ever reads what you pointed it at.
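
Two of the guarantees above, atomic writes and staying inside the folder you pointed at, follow well-known patterns. The sketch below illustrates those patterns in general; it is not taken from the project's source, and the names atomic_write_json and is_inside are hypothetical. The "back up anything that looks damaged" step is not covered.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON so a crash leaves either the old file or the new one, never half."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)     # atomic swap into place
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def is_inside(root: Path, candidate: Path) -> bool:
    """True if `candidate`, after resolving symlinks and '..', is still under `root`."""
    return candidate.resolve().is_relative_to(root.resolve())
```

A digest loop built this way would skip any path for which is_inside(root, path) is false, which is how a planted shortcut pointing elsewhere ends up ignored.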


❓ Questions & troubleshooting

Is it really free?

Yes. The software is free and open source (MIT), and it runs on your own computer, so there are no per-use fees or token charges. The only "cost" is a little of your computer's time and disk space.

Claude doesn't seem to have the tools. What should I do?

Make sure the extension/plugin is installed and enabled, then fully restart Claude Desktop (or your Claude Code session). To confirm the engine itself works, run mta status in a terminal; it should print your setup. Still stuck? Run mta doctor.

Why is the first run slow?

The first run sets things up (and, if you have Ollama, may load a model). Later runs are much faster, and re-memorising only processes what changed. Add fast ("memorise … in fast mode") to skip the AI step entirely for a quick, fully deterministic pass.
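
One common way to "only process what changed" is to keep a manifest of content hashes and compare it on the next run. The sketch below shows that general technique; how Memorised them All actually detects changes is not stated here, so treat SHA-256 hashing, the manifest shape, and the name changed_files as assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Content hash of one file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(folder: Path, manifest: dict[str, str]) -> tuple[list[Path], dict[str, str]]:
    """Return (files that are new or modified, updated manifest).

    `manifest` maps a path relative to `folder` to the hash seen on the last run.
    """
    new_manifest: dict[str, str] = {}
    changed: list[Path] = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        key = path.relative_to(folder).as_posix()
        digest = file_digest(path)
        new_manifest[key] = digest
        if manifest.get(key) != digest:
            changed.append(path)
    return changed, new_manifest
```

On a second run with the saved manifest, unchanged files produce the same hash and drop out of the work list.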

Which file types can it read?

PDFs, Word/Excel/PowerPoint, plain text/Markdown, HTML, CSV/JSON/XML, RTF, EPUB, common images (via OCR), and audio (transcribed locally). Beyond those, any other text-based file is digested too (source code, .log, .ini, .tex, …); only genuine binaries are skipped, so a whole folder gets captured. Ask Claude to "list what's digestible in this folder" to see.

Which languages does it support?

Text in any language works (it's Unicode throughout). For scanned documents and images, OCR runs English + Bangla by default (eng+ben); set MTA_OCR_LANG to other Tesseract codes (e.g. eng+hin+ara). Any language pack you don't have installed is dropped automatically, so it never errors.
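
The "drop what isn't installed" behaviour can be pictured as a simple filter over the + separated codes. This is a sketch under assumptions: the function name effective_ocr_langs is hypothetical, the set of installed packs is passed in by the caller, and what happens when every pack is missing is not specified in this document (the sketch just returns an empty string).

```python
import os


def effective_ocr_langs(requested: str, installed: set[str]) -> str:
    """Keep only the requested Tesseract codes that are actually installed, in order."""
    kept = [code for code in requested.split("+") if code and code in installed]
    return "+".join(kept)


# Usage: read the setting (default from this README) and filter it.
requested = os.environ.get("MTA_OCR_LANG", "eng+ben")
print(effective_ocr_langs(requested, installed={"eng", "ben"}))
```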

How do I delete a memory?

Tell Claude "forget the clientA project" (it asks you to name the project, on purpose). It deletes that project's memory from your disk, irreversibly.

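Does it need an internet connection?

No. It's built to work completely offline. The network is only used for optional things such as installing updates or the daily version check (disable it with MTA_AUTO_UPDATE=off).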

Why does it say "basic mode"?

It's being honest with you. "Basic mode" (also called classical) means a memory was built without the local AI model, either because that's the default for your machine (the safe micro profile), or because the AI engine (Ollama) wasn't responding. The memory is still complete and searchable; it's just less detailed than the AI-assisted "accurate" mode.

If you expected the sharper mode, ask Claude to "check memory status". The health line tells you plainly what's wrong — most often "Ollama is running but its AI engine isn't responding", which is fixed by fully quitting and reopening Ollama and making sure your model is installed (ollama pull <model>). The tool no longer hides this behind a long silent run — it warns you at the start of memorising and labels the result.
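
If you want to check by hand whether Ollama is answering and has your model, a probe along these lines works. It is a sketch, not the tool's own health check: it assumes Ollama's default local address (port 11434) and its GET /api/tags endpoint, which lists installed models; the function names and the basic/accurate mapping are illustrative.

```python
import http.client
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://127.0.0.1:11434"  # Ollama's default local address


def ollama_models(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> Optional[list]:
    """Return installed model names, or None if the server does not answer sensibly."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


def memory_mode(models: Optional[list], wanted: str) -> str:
    """Label the run: 'accurate' only if the engine answered and has the model."""
    if models is None or wanted not in models:
        return "basic"
    return "accurate"
```

Calling memory_mode(ollama_models(), "qwen3:4b-instruct") gives "basic" both when Ollama is down and when the model has not been pulled yet, which matches the two fixes suggested above.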


🧰 The tools Claude gets

Once installed, Claude can use these nine tools on your behalf (you just talk normally — Claude picks the right one):

| Tool | What it does for you |
| --- | --- |
| digest | Reads files/folders and builds (or updates) the memory. |
| convert | Just converts files to clean Markdown (no memory); handy for exporting or fixing legacy Bengali. |
| recall | Answers a question from memory with a few relevant, cited snippets. |
| memory_overview | Gives the big picture: a synopsis and the main themes. |
| list_digestible | Shows which files in a folder it can read. |
| open_mindmap | Opens the interactive, offline mind map. |
| export_memory | Saves the memory as portable Markdown notes you can keep or share. |
| memory_status | Reports your local setup (models, tools, projects). |
| forget | Deletes a project's memory (you name it explicitly). |

Every tool returns only small results — never your documents' contents.


🛠️ For power users

You don't need any of this to use the app — but it's here if you want it.

The same engine ships as an mta command:

mta digest ~/Documents/research        # build/update memory (--fast to skip the LLM)
mta recall "what about the Q3 budget?" # query it
mta overview                            # synopsis + themes
mta status                              # local stack health   ·   mta doctor  (fix deps)
mta export ./notes                      # export portable Markdown
mta mindmap --open                      # open the mind map

The same nine tools can be served beyond Claude:

mta serve --http     # MCP over HTTP (loopback + an auto-generated bearer token)
mta serve --rest     # plain JSON:  POST http://127.0.0.1:8765/tools/<name>
mta export-schema    # tool schemas as OpenAI / Gemini / OpenAPI 3.1 (no drift)
mta recipes          # copy-paste connection snippets for every client

Both HTTP modes are loopback-only by default and require a bearer token. See mta recipes for ready-to-paste setup.
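
For a feel of what a REST call looks like, the sketch below builds (but does not send) a request for one tool. The URL pattern and the bearer token requirement come from the text above; the JSON body shape and the "question" argument name are assumptions, so check mta recipes or mta export-schema for the real schema.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # address from the `mta serve --rest` comment above


def build_tool_request(name: str, arguments: dict, token: str) -> urllib.request.Request:
    """Prepare POST /tools/<name> with a bearer token and a JSON body."""
    body = json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tools/{name}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical arguments; the real field names come from the exported schema.
request = build_tool_request("recall", {"question": "what about the Q3 budget?"}, "PASTE-TOKEN-HERE")
# To send it: urllib.request.urlopen(request, timeout=30)
```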

A multi-arch image (amd64 + arm64) is published to GHCR:

docker run -d --name mta -p 127.0.0.1:8765:8765 -v mta-data:/data \
  ghcr.io/gru-953/memorised-them-all:latest
docker logs mta     # copy the printed bearer token + the `claude mcp add …` line

It serves the tools over MCP HTTP and keeps memory in the /data volume. Mount documents read-only (-v /path/to/docs:/docs:ro) and digest the in-container path.

By default the optional AI step runs on local Ollama. To use another local server (LM Studio, llama.cpp, vLLM, …) set MTA_BACKEND:

MTA_BACKEND=lmstudio mta digest ~/docs          # OpenAI-compatible server on :1234
MTA_BACKEND=openai MTA_BACKEND_URL=http://127.0.0.1:8080/v1 mta digest ~/docs

Set MTA_EXTRACT_MODEL / MTA_EMBED_MODEL to that server's model names. Pointing it at a non-local URL sends content off your machine — that's your explicit choice (you'll get a one-time warning).
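
Whether a backend URL counts as "on your machine" can be decided from its host alone. The sketch below shows one plausible way to make that call; it is not the project's actual check, and the name is_local_url is hypothetical. It treats only localhost and loopback addresses as local, so a server elsewhere on your LAN counts as remote.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_url(url: str) -> bool:
    """True only for localhost or a loopback IP address."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other DNS name is treated as remote


for url in ("http://127.0.0.1:8080/v1", "https://api.example.com/v1"):
    print(url, "local" if is_local_url(url) else "remote: content would leave this machine")
```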

Everything has sensible defaults. Common knobs (set as environment variables):

| Variable | Default | Meaning |
| --- | --- | --- |
| MTA_HOME | ~/.memorised-them-all | where memory is stored |
| MTA_OCR_LANG | eng+ben | OCR languages (Tesseract codes; missing packs dropped automatically) |
| MTA_EXTRACT_MODEL | (set by profile) | extraction LLM; overrides the profile (alternatives are listed in the model tables below) |
| MTA_EMBED_MODEL | qwen3-embedding:0.6b | multilingual embeddings (1024-d, incl. Bangla) |
| MTA_VISION_MODEL | qwen3-vl:4b-instruct | image caption / OCR-assist (32-language) |
| MTA_WHISPER_MODEL | small | on-device speech-to-text size |
| MTA_NO_OLLAMA | unset | force fully-offline mode (no AI model) |
| MTA_AUTO_UPDATE | on | daily update check (off to disable) |
| MTA_PROFILE | micro | sizing tier: micro (4 GB / no GPU, the safe default) · auto (size to this machine) · small · standard (16 GB) · large (32 GB+) · offline |
| MTA_CONVERT_TIMEOUT | 120 | per-file conversion timeout (seconds); a file that hangs the parser is skipped, never stalls the batch. 0 disables |
| MTA_MEMORY_GB | auto | override detected RAM (for containers/VMs that misreport it, to pick the right profile) |
| MTA_BACKEND / MTA_BACKEND_URL | auto | use another local model server (see above) |
| MTA_HTTP_* | off | options for the opt-in HTTP/REST servers |

The default is safe on a 4 GB machine with no graphics card. Out of the box (profile micro) the plugin runs fully offline — classical extraction + a tiny embedding model — so a digest always completes and never thrashes, on any computer. To use sharper local AI, pick a bigger profile with one setting — the easiest is MTA_PROFILE=auto, which sizes the models to your computer (a 16 GB machine gets the qwen3 stack below; a 4 GB one stays on micro). You can also set any model directly (env var or extension setting), which overrides the profile. All tags are verified-real Ollama models pulled on demand. Sizes are q4-class downloads.

| Profile | Good for | What it uses |
| --- | --- | --- |
| micro (default) | 4 GB, no GPU | offline/classical + tiny embedder + vision off |
| auto | recommended | sizes the stack to your machine's RAM |
| standard | ~16 GB | the qwen3 stack below |
| large | 32 GB+ | qwen3:8b + vision |
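
The way MTA_PROFILE and MTA_MEMORY_GB interact can be sketched as follows. This is an illustration, not the project's sizing code: the 16 GB and 32 GB cut-offs come from the table above, but the 8 GB threshold for the small tier is an assumption (this README does not say where small begins), and the function names are hypothetical.

```python
from typing import Mapping


def pick_profile(ram_gb: float) -> str:
    """Map available RAM to a sizing tier."""
    if ram_gb >= 32:
        return "large"
    if ram_gb >= 16:
        return "standard"
    if ram_gb >= 8:  # assumption: the README does not state where "small" starts
        return "small"
    return "micro"


def resolve_profile(env: Mapping[str, str], detected_ram_gb: float) -> str:
    """Apply the environment settings: explicit profile first, then auto-sizing."""
    profile = env.get("MTA_PROFILE", "micro")  # micro is the documented default
    if profile != "auto":
        return profile
    override = env.get("MTA_MEMORY_GB", "auto")
    ram_gb = detected_ram_gb if override == "auto" else float(override)
    return pick_profile(ram_gb)
```

The MTA_MEMORY_GB override matters in containers and VMs, where the detected RAM may not match what the models can actually use.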

The model tables below apply when a profile (or you) turns the local LLM on:

Extraction LLM — MTA_EXTRACT_MODEL (entity/relation/fact extraction + summaries):

| Model | Size | Best for |
| --- | --- | --- |
| qwen3:4b-instruct (default) | 2.5 GB | Optimum on 16 GB: newer-gen, non-thinking (clean JSON), 119 languages incl. Bangla |
| qwen3:8b | 5.2 GB | Higher quality if you have RAM; best Bangla + instruction-following |
| gemma3:4b-it-qat | 4.0 GB | QAT ≈ BF16 quality, 140+ languages |
| llama3.2:3b | 2.0 GB | Lightest solid English-centric option |
| qwen2.5:7b | 4.7 GB | Previous default (older generation) |

Pin the -instruct builds. Bare qwen3:4b / qwen3-vl:4b are thinking models that emit chain-of-thought (bad for strict JSON / captions). Newest/experimental (mid-2026, less battle-tested): qwen3.5:4b (text) and gemma4:e2b-it-qat (text+vision) exist now — fine to try, but the picks above are the stable, instruct-guaranteed defaults.

Embeddings — MTA_EMBED_MODEL (entity resolution + recall):

| Model | Size | Best for |
| --- | --- | --- |
| qwen3-embedding:0.6b (default) | 0.64 GB | Optimum: 1024-d, 100+ languages incl. Bangla, top multilingual retrieval (MMTEB ≈ 64) |
| bge-m3 | 1.2 GB | Explicit Bengali + hybrid dense/sparse (helps fuzzy entity matching) |
| embeddinggemma:300m | 0.62 GB | 768-d multilingual; smaller footprint |
| nomic-embed-text | 0.27 GB | English-only (previous default) |

Switching the embedding model changes the vector dimension, so re-digest with reset: true afterwards — recall transparently falls back to lexical scoring until you do (it never errors).
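
The fallback described above can be pictured as a scoring function that checks vector sizes before comparing them. The sketch is illustrative only: cosine similarity and a token-overlap (Jaccard) score are stand-ins chosen for clarity, not the project's actual scoring, and all names are hypothetical.

```python
import math
import re
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_score(question: str, text: str) -> float:
    """Share of distinct words the question and the text have in common."""
    q = set(re.findall(r"\w+", question.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q | t) if q | t else 0.0


def score(question: str, q_vec: Optional[Sequence[float]],
          text: str, stored_vec: Optional[Sequence[float]]) -> float:
    """Use vectors when their dimensions match; otherwise fall back to words."""
    if q_vec is not None and stored_vec is not None and len(q_vec) == len(stored_vec):
        return cosine(q_vec, stored_vec)
    return lexical_score(question, text)
```

A stored 768-d vector and a fresh 1024-d query vector cannot be compared, so the function quietly uses the lexical path until the memory is re-digested.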

Vision — MTA_VISION_MODEL (captions images OCR can't read):

| Model | Size | Best for |
| --- | --- | --- |
| qwen3-vl:4b-instruct (default) | 3.3 GB | 32-language OCR incl. Bangla; reads charts / diagrams / forms |
| qwen3-vl:2b-instruct | 1.9 GB | Same OCR engine, lighter |
| gemma3:4b | 3.3 GB | 140+ languages |
| granite3.2-vision:2b | 2.4 GB | Document / table / chart OCR (IBM) |
| moondream | 1.7 GB | Tiniest / fastest (English-only; previous default) |

Speech-to-text — MTA_WHISPER_MODEL: default small (good speed/accuracy on 16 GB); medium or large-v3-turbo for maximum accuracy, tiny/base for low-resource. On Apple silicon it runs on the GPU via MLX-Whisper.

The default stack is already optimal for 16 GB. To favour maximum quality (needs more RAM), escalate the extractor and re-digest:

MTA_EXTRACT_MODEL=qwen3:8b mta digest ~/docs --reset

Millions of Bengali documents were typed with the Bijoy keyboard in SutonnyMJ (and 110+ other ANSI fonts); read as plain text they come out as mojibake. Memorised them All upgrades them to standard Unicode Bengali automatically during conversion, so digest / recall / embeddings work on real text instead of garbage.

  • Font-aware for Office files (.docx / .pptx / .xlsx): only runs whose font is a Bijoy-family font are converted, so mixed English + Bengali documents come out clean (the English is left exactly as-is). Plain text uses a conservative density check that never touches ordinary English.

  • A faithful pure-Python port of the Mukti converter — no new dependency, fully local, on by default (MTA_BANGLA_LEGACY=off to disable).
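
The font-aware rule for Office files amounts to "convert a run only if its font is a legacy one". The sketch below shows just that selection step, with the character conversion itself passed in as a function. Everything here is illustrative: the Run type, the LEGACY_FONTS set (which lists only SutonnyMJ, the one font named above), and upgrade_runs are hypothetical names, and the real converter is the Mukti port, not shown.

```python
from typing import Callable, Iterable, List, NamedTuple


class Run(NamedTuple):
    """A stretch of text with a single font, as found in Office documents."""
    text: str
    font: str


# Only SutonnyMJ is named in this README; a real list would cover the other legacy fonts.
LEGACY_FONTS = {"sutonnymj"}


def upgrade_runs(runs: Iterable[Run], convert: Callable[[str], str]) -> List[Run]:
    """Apply `convert` to legacy-font runs only; every other run is returned unchanged."""
    return [
        Run(convert(run.text), run.font) if run.font.lower() in LEGACY_FONTS else run
        for run in runs
    ]
```

Because the decision is made per run, an English sentence set in an ordinary font passes through untouched even when it sits next to legacy Bengali text.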

Convert a folder to Markdown (with the legacy upgrade) without building memory:

mta convert ~/docs                 # writes ~/docs/markdown_converted/*.md
mta convert ~/docs --out ~/md_out  # …or choose the output folder

digest runs the very same conversion as its first step, so converting to Markdown is the default everywhere — reach for convert only when you want the .md files themselves.

convert (files → Markdown, locally) → extract (entities, relations, facts) → graph (build + detect communities/themes) → summarise (layered: per-theme + a global synopsis) → embed (vectors for search) → materialise (memory.md, per-doc notes, mind map). recall embeds your question and returns the closest, capped, cited snippets. With no AI model available it falls back to fast classical techniques, so a digest always succeeds. See CHANGELOG.md and SECURITY.md for details.
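
The graph step of that pipeline can be pictured with a toy example. This is a deliberately simplified sketch: it groups entities by plain connected components, which is a much cruder stand-in for real community detection, and the Fact type, function names, and sample facts are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str
    source: str  # kept so an answer can cite the document it came from


def build_graph(facts: Iterable[Fact]) -> Dict[str, Set[str]]:
    """Undirected entity graph: two entities are linked if a fact mentions both."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for fact in facts:
        graph[fact.subject].add(fact.obj)
        graph[fact.obj].add(fact.subject)
    return graph


def themes(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group entities into connected components (a stand-in for community detection)."""
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        group: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups


facts = [
    Fact("Globex", "signed", "MSA", "a.pdf"),
    Fact("MSA", "renews on", "1 Jan", "a.pdf"),
    Fact("Apollo", "has budget for", "Q3", "b.docx"),
]
print(themes(build_graph(facts)))
```

The two groups that come out (the contract entities and the project entities) are the kind of clusters that the summarise step would then describe one theme at a time.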


💻 Platforms

macOS (Apple-silicon optimised), Linux, and Windows · Python 3.10–3.12 · tested on all three in CI.

🙏 Credits & license

Built on the shoulders of MarkItDown, Ollama, NetworkX, and the Model Context Protocol. Optional community-detection extras (python-igraph, leidenalg) are GPL-licensed and not installed by the MIT core. See ACKNOWLEDGEMENTS.md.

MIT licensed · made by GRU-953. Issues and contributions welcome — start with SECURITY.md for the security model.
