Grove
Provides tools for searching, reading, writing, and analyzing notes in an Obsidian vault, with features like hybrid search, frontmatter validation, git versioning, and graph analysis.
Open-source MCP server that makes your Obsidian vault accessible from any AI client.
One URL. Claude, ChatGPT, Cursor, or any MCP-compatible client connects and gets structured access — search, read, write-back, graph analysis. Your vault stays yours: markdown files in a git repo, versioned forever.
Connect any client:
https://api.grove.md/mcp
┌─────────────────────────────────────────────────────────┐
│                  Claude (any surface)                   │
│              phone · web · desktop · Code               │
└────────────────────────────┬────────────────────────────┘
                             │ MCP (Streamable HTTP + OAuth)
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      Grove Server                       │
│   Auth · Rate Limiting · Write Queue · Trails · Graph   │
└────────────────────────────┬────────────────────────────┘
                             │
                  ┌──────────┼──────────┐
                  ▼          ▼          ▼
                Hybrid     Vault      Write
                Search     (git)      Queue
               BM25+Vec              (mutex)
                  │          │          │
                  ▼          ▼          ▼
┌─────────────────────────────────────────────────────────┐
│            Your Obsidian Vault (git-tracked)            │
│        markdown files · frontmatter · wikilinks         │
└─────────────────────────────────────────────────────────┘
Karpathy's LLM Knowledge Bases thread described the problem: "I think there is room here for an incredible new product instead of a hacky collection of scripts." He's using brute-force context windows and LLM-maintained index files to manage a personal knowledge base in Obsidian. That works at 100 articles. It doesn't work at 1,000 notes accumulated over years (journal entries, concept notes, people, recipes, projects), where the connections between ideas matter as much as the ideas themselves.
Grove is the infrastructure layer. Not another note-taking app. Not a RAG pipeline. A self-hosted server that exposes six structured MCP tools — carefully designed to compose into higher-level workflows without overwhelming agent tool selection.
How I got here
I keep my life in an Obsidian vault. ~1,000 notes, PARA-organized, git-tracked. Journal entries going back years. Concept notes on ideas I've been developing across multiple jobs. People, recipes, a financial plan, business notes. Connected with wikilinks into a knowledge graph.
I built a set of Claude skills to tend this vault like a garden — searching for notes, planting new concepts, harvesting entities from journal entries, detecting withering ideas that need attention. It worked. But only from my laptop. Only in Claude Code. Only when Obsidian was open and the local search server was running.
Then I opened Claude on my phone during a conversation and realized: it had no idea who I was. Every concept, every person, every connection — gone. I was starting from zero in every conversation that wasn't on my one machine.
So I put the search engine on a VPS. Added auth. Added write-back so Claude could plant notes from anywhere, not just read them. Added frontmatter validation so agents couldn't corrupt the vault. Added a graph analyzer so Claude could understand the shape of my knowledge, not just search it. Every write creates a git commit. Every commit is auditable. The vault is the source of truth — the index is derived.
Three weeks in, I haven't manually searched my own notes once. Claude finds what I need in ~30ms from any surface. When it learns something new in a conversation, it plants it. The knowledge compounds.
The six tools
Grove exposes exactly six MCP tools. Not twelve, not twenty. Six. Agent tool selection degrades past ~10 tools, so these are carefully designed to compose into higher-level workflows.
query — hybrid search
Combines BM25 keyword matching and vector embeddings via Reciprocal Rank Fusion. Supports three sub-query types: lex (exact terms), vec (semantic meaning), and hyde (hypothetical document — describe what the answer looks like).
{
  "searches": [
    {"type": "lex", "query": "taste graph"},
    {"type": "vec", "query": "how design preferences propagate through social networks"}
  ],
  "intent": "research on aesthetic preference modeling"
}
~30ms per query. Embeddings via Voyage AI (voyage-4-large, 1024-dim). No OpenAI dependency. Your notes stay on your server. The only text that leaves it is what gets embedded: note content at index time and query text at search time.
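Reciprocal Rank Fusion scores a document by summing 1/(k + rank) over every sub-query list it appears in. Notes that rank well on both lexical and semantic queries therefore rise to the top. Below is a minimal sketch. It assumes each sub-query has already returned a best-first list of note paths, and it uses k = 60, the conventional constant from the RRF literature. Grove's actual value is not stated here.

```typescript
// Reciprocal Rank Fusion: merge several best-first ranked lists into one.
// Assumptions (not from Grove's source): inputs are lists of note paths, and
// k = 60, the conventional RRF constant.
interface Fused {
  path: string;
  score: number;
}

function reciprocalRankFusion(lists: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((path, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(path, (scores.get(path) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties broken by path so output is deterministic.
  return Array.from(scores, ([path, score]) => ({ path, score })).sort(
    (a, b) => b.score - a.score || a.path.localeCompare(b.path),
  );
}

// Usage: a lexical (BM25) list and a vector list for the same question.
const lexHits = ["Taste Graph.md", "Design Systems.md", "Recipes.md"];
const vecHits = ["Social Contagion.md", "Taste Graph.md", "Design Systems.md"];
const fused = reciprocalRankFusion([lexHits, vecHits]);
console.log(fused.map((r) => r.path).join(", "));
// → Taste Graph.md, Design Systems.md, Social Contagion.md, Recipes.md
```

RRF uses only ranks. It therefore needs no calibration between BM25 scores and cosine similarities, which live on different scales.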
get — read a note
Fuzzy path resolution. Say "Taste Graph" instead of "Resources/Concepts/Taste Graph.md". Returns frontmatter (parsed), content, and a content hash for optimistic concurrency on writes.
Resolution order: direct path → strip prefixes → journal date patterns → case-insensitive basename → alias lookup → BM25 fallback.
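The resolution order maps naturally onto a chain of resolvers where the first hit wins. The sketch below makes three assumptions, none of which come from Grove itself:

- dated notes use a hypothetical `Journal/YYYY-MM-DD.md` layout;
- the prefixes to strip come from a hypothetical list;
- a stub stands in for the BM25 fallback.

```typescript
// Sketch of fuzzy note-path resolution: steps run in order, first hit wins.
// Assumptions (hypothetical, not Grove's code): journal notes live at
// Journal/YYYY-MM-DD.md, STRIP_PREFIXES lists leading junk to remove, and
// bm25Fallback stands in for a real search call.
type Resolver = (input: string) => string | undefined;

const STRIP_PREFIXES = ["./", "/", "vault/"]; // hypothetical

function makeResolver(
  paths: string[],
  aliases: Map<string, string>, // lowercase alias -> path
  bm25Fallback: (q: string) => string | undefined,
): Resolver {
  const known = new Set(paths);
  const withExt = (p: string) => (p.endsWith(".md") ? p : `${p}.md`);
  const basename = (p: string) => p.slice(p.lastIndexOf("/") + 1).replace(/\.md$/, "");

  const steps: Resolver[] = [
    // 1. Direct path, with or without the .md extension.
    (q) => (known.has(withExt(q)) ? withExt(q) : undefined),
    // 2. Strip a known leading prefix, then retry as a direct path.
    (q) => {
      for (const prefix of STRIP_PREFIXES) {
        const rest = withExt(q.slice(prefix.length));
        if (q.startsWith(prefix) && known.has(rest)) return rest;
      }
      return undefined;
    },
    // 3. Journal date pattern, e.g. "2026-04-03".
    (q) => {
      const m = /^\d{4}-\d{2}-\d{2}$/.exec(q.trim());
      const candidate = m ? `Journal/${m[0]}.md` : undefined;
      return candidate !== undefined && known.has(candidate) ? candidate : undefined;
    },
    // 4. Case-insensitive basename match.
    (q) => paths.find((p) => basename(p).toLowerCase() === basename(q).toLowerCase()),
    // 5. Alias lookup (in a real vault, aliases come from frontmatter).
    (q) => aliases.get(q.toLowerCase()),
    // 6. Last resort: top BM25 hit.
    (q) => bm25Fallback(q),
  ];

  return (input) => {
    for (const step of steps) {
      const hit = step(input);
      if (hit) return hit;
    }
    return undefined;
  };
}

// Usage
const resolve = makeResolver(
  ["Resources/Concepts/Taste Graph.md", "Journal/2026-04-03.md"],
  new Map([["tg", "Resources/Concepts/Taste Graph.md"]]),
  () => undefined, // no search index in this sketch
);
console.log(resolve("Taste Graph")); // → Resources/Concepts/Taste Graph.md
```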
multi_get — batch read
Read multiple notes by glob pattern or comma-separated list. "Resources/People/*.md" returns all people notes. Capped at 50 per request.
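Here is one way the pattern argument could be expanded, sketched with hypothetical helpers. In this sketch:

- `*` matches within a single path segment;
- `**` is not handled;
- comma-separated entries are passed through untouched.

A production server would more likely lean on a glob library.

```typescript
// Sketch: expand a multi_get argument into note paths, capped at 50.
// Assumptions (hypothetical helpers, not Grove's code): "*" matches any run of
// characters except "/", "**" is unsupported, and comma-separated entries are
// passed through as-is (a real server would resolve each one).
const MAX_BATCH = 50; // documented per-request cap

function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters (everything except "*"), then map "*" to "[^/]*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[^/]*");
  return new RegExp(`^${escaped}$`);
}

function expandPattern(pattern: string, vaultPaths: string[]): string[] {
  const matches = pattern.includes("*")
    ? vaultPaths.filter((p) => globToRegExp(pattern).test(p))
    : pattern
        .split(",")
        .map((s) => s.trim())
        .filter((s) => s.length > 0);
  return matches.slice(0, MAX_BATCH); // enforce the per-request cap
}

// Usage
const paths = [
  "Resources/People/Ada Lovelace.md",
  "Resources/People/Grace Hopper.md",
  "Resources/People/Archive/Old.md", // nested: "*" does not cross "/"
  "Resources/Concepts/Taste Graph.md",
];
console.log(expandPattern("Resources/People/*.md", paths).join(", "));
// → Resources/People/Ada Lovelace.md, Resources/People/Grace Hopper.md
```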
write_note — create or update
The part that makes this more than a search engine. Claude creates a note → server validates frontmatter (type, tags, required fields, path/type consistency) → writes to disk → git add → git commit → synchronous reindex → fire-and-forget embedding → return.
Every write goes through a serialized mutex queue. No concurrent git operations, ever. Optimistic concurrency via content hashing — if the note changed since you last read it, the write is rejected.
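A serialized write queue can be as small as a promise chain. Each write starts only after the previous one settles, so two git operations never overlap. This sketch assumes a single Node process and SHA-256 as the content hash. An in-memory `Map` stands in for files and git. `WriteQueue` and `writeNote` are illustrative names, not Grove's API.

```typescript
import { createHash } from "node:crypto";

// Sketch of a serialized write queue with optimistic concurrency.
// Assumptions (hypothetical, not Grove's classes): one Node process, SHA-256
// content hashes, and an in-memory Map standing in for files + git.
class WriteQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Start `task` only after every previously queued task has settled.
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(() => task());
    this.tail = result.catch(() => undefined); // a failed write must not block later ones
    return result;
  }
}

const sha256 = (text: string) => createHash("sha256").update(text).digest("hex");
const vault = new Map<string, string>(); // path -> content
const queue = new WriteQueue();

function writeNote(path: string, content: string, ifHash?: string): Promise<string> {
  return queue.run(async () => {
    const current = vault.get(path);
    // Optimistic concurrency: reject if the note changed since the caller read it.
    if (ifHash !== undefined && (current === undefined || sha256(current) !== ifHash)) {
      throw new Error(`conflict: ${path} changed since it was read`);
    }
    vault.set(path, content); // real server: write file, git add, git commit
    return sha256(content);
  });
}

// Usage: two callers race to update the same note with the same hash.
async function demo(): Promise<void> {
  const h1 = await writeNote("Inbox/idea.md", "v1");
  const results = await Promise.allSettled([
    writeNote("Inbox/idea.md", "v2 from A", h1),
    writeNote("Inbox/idea.md", "v2 from B", h1),
  ]);
  console.log(results.map((r) => r.status).join(", ")); // → fulfilled, rejected
}
demo();
```

FIFO ordering is the point. One slow commit delays everything behind it, and that is the trade a single git working tree needs.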
{
  "path": "Resources/Concepts/Context Engineering.md",
  "frontmatter": "{\"type\": \"concept\", \"tags\": [\"concept\", \"ai\"]}",
  "content": "The practice of shaping what goes into an LLM's context window..."
}
list_notes — browse the vault
Glob-based listing with metadata. Check for duplicates before creating. Get the entity vocabulary for a folder. Scan inbox items.
vault_status — five modes
Five views of the vault, one per mode:
Note count, last commit, vault path
Recent git log, filterable by date and path
Orphan notes, broken wikilinks, missing frontmatter, stale inbox
Brandes' centrality, BFS clusters, bridge detection, most-connected hubs
digest: lifecycle classification (seeds → sprouts → growing → mature → dormant → withering)
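Graph analysis starts from the wikilinks themselves. The sketch below does three things:

- parses `[[links]]`;
- treats them as undirected edges;
- uses breadth-first search to find clusters, orphans, and broken links.

It assumes notes are keyed by title, and that `[[Target|alias]]` and `[[Target#heading]]` both point at `Target`. Brandes' centrality and bridge detection are omitted for length.

```typescript
// Sketch: parse [[wikilinks]], then find clusters (BFS), orphans, and broken links.
// Assumptions: notes are keyed by title (file name without .md), links are treated
// as undirected for clustering, and [[Target|alias]] / [[Target#heading]] both
// point at Target. Brandes' betweenness centrality is left out for brevity.
function extractLinks(markdown: string): string[] {
  const pattern = /\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]/g;
  return Array.from(markdown.matchAll(pattern), (m) => m[1].trim());
}

function analyzeGraph(notes: Map<string, string>) {
  const adjacency = new Map<string, Set<string>>();
  const broken: string[] = [];
  for (const title of notes.keys()) adjacency.set(title, new Set());

  for (const [title, body] of notes) {
    for (const target of extractLinks(body)) {
      if (!notes.has(target)) {
        broken.push(`${title} -> ${target}`);
        continue;
      }
      adjacency.get(title)!.add(target);
      adjacency.get(target)!.add(title);
    }
  }

  // Each BFS from an unvisited note collects one connected cluster.
  const seen = new Set<string>();
  const clusters: string[][] = [];
  for (const start of adjacency.keys()) {
    if (seen.has(start)) continue;
    const cluster: string[] = [];
    const frontier = [start];
    seen.add(start);
    while (frontier.length > 0) {
      const node = frontier.shift()!;
      cluster.push(node);
      for (const next of adjacency.get(node)!) {
        if (!seen.has(next)) {
          seen.add(next);
          frontier.push(next);
        }
      }
    }
    clusters.push(cluster);
  }

  // Orphans: no resolved links in either direction.
  const orphans = Array.from(adjacency)
    .filter(([, neighbors]) => neighbors.size === 0)
    .map(([title]) => title);
  return { clusters, orphans, broken };
}

// Usage
const vaultNotes = new Map([
  ["Taste Graph", "Builds on [[Social Contagion]]."],
  ["Social Contagion", "See [[Taste Graph|the taste graph]] and [[Missing Note]]."],
  ["Sourdough", "A recipe with no links."],
]);
console.log(analyzeGraph(vaultNotes).orphans); // → [ 'Sourdough' ]
```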
The digest mode is what powers the daily garden practice. It stratifies every note by age, backlink count, word count, and modification recency. Seeds are ideas less than a week old. Withering notes haven't been touched in six months and have almost no connections. The agent surfaces what needs attention without you having to remember what's in the vault.
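The stratification can be sketched as a small classifier. Only two thresholds come from the description above: seeds are under a week old, and withering notes are untouched for six months with almost no connections. Everything else is a hypothetical placeholder. That includes every other cut-off and the reading of "almost no connections" as at most one backlink.

```typescript
// Sketch of digest-style lifecycle classification.
// From the README: seed = created under a week ago; withering = untouched for
// ~six months with almost no connections. Everything else is HYPOTHETICAL.
type Stage = "seed" | "sprout" | "growing" | "mature" | "dormant" | "withering";

interface NoteStats {
  ageDays: number; // days since creation
  daysSinceModified: number; // days since last edit
  backlinks: number; // incoming wikilinks
  wordCount: number;
}

function classify(n: NoteStats): Stage {
  if (n.ageDays < 7) return "seed";
  // "Almost no connections" read as at most one backlink (assumption).
  if (n.daysSinceModified >= 180 && n.backlinks <= 1) return "withering";
  if (n.daysSinceModified >= 90) return "dormant"; // hypothetical cut-off
  if (n.ageDays < 30) return "sprout"; // hypothetical cut-off
  if (n.backlinks >= 5 && n.wordCount >= 500) return "mature"; // hypothetical cut-off
  return "growing";
}

console.log(classify({ ageDays: 3, daysSinceModified: 1, backlinks: 0, wordCount: 40 })); // → seed
```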
Why not the 24 existing Obsidian MCP servers?
There are already 24 Obsidian MCP servers on the registry. Every one of them is:
Local-only. Runs on your laptop, works from that laptop. Open Claude on your phone — nothing.
Read-only. Search and retrieve, but no write-back. Knowledge flows one direction.
Flat filesystem. Treats your vault as a bag of text files. No frontmatter validation, no type system, no vault conventions.
Grove is remote, bidirectional, and opinionated. It treats your vault as a structured knowledge base with rules — types, required fields, path conventions — that agents must respect. It won't let Claude corrupt your vault.
Capability | Local MCP servers | Grove |
Works from phone/web | No | Yes |
Write-back | No | Yes, with validation + git commit |
Search | Keyword or vector | Hybrid BM25 + vector (RRF) |
Graph analysis | No | Centrality, clusters, lifecycle |
Auth | None needed | OAuth 2.0 + bearer tokens |
Scoped sharing | No | Trails: topic-scoped access with deny lists |
Embeddings | Usually OpenAI API | Voyage AI, called from your server |
The write flow
This is the part I spent the most time getting right. Agents are eager writers and sloppy validators. The write path is intentionally strict:
1. Proxy validates bearer token, checks rate limit (20 writes/min)
2. Server validates:
- Path: no traversal, inside vault, .md only, no symlinks
- Frontmatter: type in whitelist, required fields present, tags include type
- Path/type consistency (Resources/Concepts/* must be type:concept)
- File size < 100KB
- Optimistic concurrency: if_hash must match the recorded source_hash
3. Write queue (mutex):
- Write file to disk
- git add → git commit (with API key identity)
- Record provenance (path → source_hash + commit_sha)
- Fire-and-forget: QMD reindex, embed new content
4. Return source_hash + content_hash + commit SHA
5. Batched git push every 30 seconds
The server is the sole writer to git. Local machines pull. One direction. No split-brain.
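Step 2 can be sketched as one pure function that collects every violation instead of stopping at the first. Several pieces are assumptions:

- The type whitelist and folder-to-type table are hypothetical examples. Only the `Resources/Concepts/` → `concept` rule appears above.
- 100KB is read as 100 × 1024 bytes.
- The symlink check is left out because it needs `fs.lstat` against real files.

```typescript
import * as path from "node:path";

// Sketch of the step-2 checks. HYPOTHETICAL config: ALLOWED_TYPES and
// FOLDER_TYPES are examples, not Grove's real whitelist. Symlink checks need
// fs.lstat on real files and are left out; required-field checks stop at `type`.
const ALLOWED_TYPES = new Set(["concept", "person", "project", "journal"]);
const FOLDER_TYPES: Record<string, string> = { "Resources/Concepts/": "concept" };
const MAX_BYTES = 100 * 1024; // "100KB" read as KiB (assumption)

interface Frontmatter {
  type?: string;
  tags?: string[];
}

function validateWrite(vaultRoot: string, notePath: string, fm: Frontmatter, content: string): string[] {
  const errors: string[] = [];
  const root = path.resolve(vaultRoot);
  // Resolving first catches "../" traversal and absolute paths alike.
  if (!path.resolve(root, notePath).startsWith(root + path.sep)) errors.push("path escapes the vault");
  if (!notePath.endsWith(".md")) errors.push("only .md files may be written");
  if (!fm.type || !ALLOWED_TYPES.has(fm.type)) errors.push(`missing or unknown type: ${fm.type ?? "(none)"}`);
  if (fm.type && !(fm.tags ?? []).includes(fm.type)) errors.push("tags must include the type");
  for (const [prefix, required] of Object.entries(FOLDER_TYPES)) {
    if (notePath.startsWith(prefix) && fm.type !== required) {
      errors.push(`${prefix}* must be type:${required}`);
    }
  }
  if (Buffer.byteLength(content, "utf8") >= MAX_BYTES) errors.push("file must be < 100KB");
  return errors;
}

// Usage: the write_note example shown earlier passes every check.
console.log(
  validateWrite("/vault", "Resources/Concepts/Context Engineering.md", { type: "concept", tags: ["concept", "ai"] }, "The practice of shaping..."),
); // → []
```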
The two-hash model
The discovery worker mutates notes after they land (it auto-wires wikilinks based on concept extraction). That makes the on-disk content hash unstable: a caller that writes content X and receives hash H may find that two seconds later the file on disk has a different hash, because the worker rewrote links.
Grove solves this with two hashes in every write/get response:
source_hash: hash of what the caller wrote, pinned to caller intent. It stays stable across discovery mutations. Use it as if_hash and If-Match for subsequent updates.
content_hash: hash of the on-disk content at return time. It equals source_hash immediately after a write and may diverge once discovery runs. Useful if you specifically care about the current file bytes.
The server validates if_hash against the recorded source_hash first, and
falls back to the disk hash for files that have no provenance entry yet
(pre-migration notes, or notes created directly by the discovery worker).
The REST API returns source_hash as the HTTP ETag, so standard
If-Match round-trips work correctly.
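The precondition check reduces to a lookup with a fallback. This sketch assumes provenance is an in-memory map from path to the last recorded `source_hash`, and that hashes are SHA-256 over the note text. Grove's actual hash function and storage are not specified here.

```typescript
import { createHash } from "node:crypto";

// Sketch of the if_hash precondition under the two-hash model.
// Assumptions: provenance maps path -> source_hash recorded at write time,
// hashes are SHA-256 of the note text, and readDisk returns current file text.
const hashOf = (text: string) => createHash("sha256").update(text).digest("hex");

function ifHashMatches(
  notePath: string,
  ifHash: string,
  provenance: Map<string, string>,
  readDisk: (p: string) => string | undefined,
): boolean {
  const recorded = provenance.get(notePath);
  // Prefer the caller-intent hash: discovery rewrites do not invalidate it.
  if (recorded !== undefined) return recorded === ifHash;
  // No provenance yet (pre-migration or worker-created notes): compare disk bytes.
  const onDisk = readDisk(notePath);
  return onDisk !== undefined && hashOf(onDisk) === ifHash;
}

// Usage: the discovery worker appended a wikilink after the caller's write.
const written = "Context engineering shapes what the model sees.";
const provenance = new Map([["Resources/Concepts/Context Engineering.md", hashOf(written)]]);
const disk = new Map([
  ["Resources/Concepts/Context Engineering.md", `${written} See [[Taste Graph]].`],
  ["Resources/Concepts/Legacy.md", "Created before provenance existed."],
]);
const readFromDisk = (p: string) => disk.get(p);
console.log(ifHashMatches("Resources/Concepts/Context Engineering.md", hashOf(written), provenance, readFromDisk)); // → true
```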
Batch writes
For multi-note workflows (creating a cluster of related notes, or chaining
create + update), write_note accepts an operations: [] array executed in
a single mutex acquisition:
write_note({
  operations: [
    { path: "a.md", frontmatter: ..., content: "..." },
    { path: "b.md", frontmatter: ..., content: "..." },
    { path: "a.md", frontmatter: ..., content: "...", if_hash_from_op: 0 }
  ],
  atomic: true
})
if_hash_from_op chains an op's if_hash to an earlier op's source_hash, so no round-trip is needed for the intermediate hash.
atomic: true rolls the entire batch back on any failure (git reset + provenance restore).
atomic: false (default) leaves earlier successes committed when a later op fails.
Pre-flight validation runs before the mutex: invalid ops fail fast, with no partial state.
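Chain resolution needs only the source hashes of earlier operations in the same batch, so it can be computed before any file is touched. The sketch rests on three assumptions:

- `source_hash` is SHA-256 of the op's content alone; the real server may also hash frontmatter.
- Ops apply in array order.
- A chain must point backwards.

Rollback for `atomic: true` is not shown.

```typescript
import { createHash } from "node:crypto";

// Sketch of batch pre-flight and if_hash_from_op resolution.
// Assumptions (hypothetical, not Grove's code): source_hash is SHA-256 of the
// op's content only, ops apply in array order, and a chain may only point at
// an earlier op. Rollback for atomic: true (git reset + provenance restore) is not shown.
interface WriteOp {
  path: string;
  content: string;
  if_hash?: string;
  if_hash_from_op?: number;
}

const sourceHash = (content: string) => createHash("sha256").update(content).digest("hex");

// Runs before the mutex is taken, so malformed batches fail fast with no partial state.
function preflight(ops: WriteOp[]): string[] {
  const errors: string[] = [];
  ops.forEach((op, i) => {
    if (!op.path.endsWith(".md")) errors.push(`op ${i}: path must end in .md`);
    const ref = op.if_hash_from_op;
    if (ref !== undefined && !(Number.isInteger(ref) && ref >= 0 && ref < i)) {
      errors.push(`op ${i}: if_hash_from_op must name an earlier op`);
    }
  });
  return errors;
}

// Fill each chained op's if_hash with the source_hash an earlier op will produce.
function resolveChains(ops: WriteOp[]): WriteOp[] {
  const hashes = ops.map((op) => sourceHash(op.content));
  return ops.map((op) =>
    op.if_hash_from_op === undefined ? op : { ...op, if_hash: hashes[op.if_hash_from_op] },
  );
}

// Usage: create a.md, create b.md, then update a.md without a round-trip.
const batch: WriteOp[] = [
  { path: "a.md", content: "first draft" },
  { path: "b.md", content: "related note" },
  { path: "a.md", content: "second draft", if_hash_from_op: 0 },
];
if (preflight(batch).length === 0) {
  const resolved = resolveChains(batch);
  console.log(resolved[2].if_hash === sourceHash("first draft")); // → true
}
```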
Architecture
              ┌──────────────┐
              │  Auth Proxy  │ :8420
              │ OAuth + PKCE │
              │ Rate limiting│
              │  Audit log   │
              └──────┬───────┘
                     │
              ┌──────┴───────┐
              │ Grove Server │ :8190
              │  6 MCP tools │
              │  Write queue │
              └──────┬───────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
   ┌────┴────┐  ┌────┴─────┐  ┌───┴────┐
   │   QMD   │  │  Voyage  │  │  git   │
   │  BM25   │  │ Vectors  │  │ vault  │
   │  :8177  │  │  (API)   │  │ ~/life │
   └─────────┘  └──────────┘  └────────┘
TypeScript, ~7,600 LOC, raw node:http (no frameworks)
Auth: OAuth 2.0 with PKCE for Claude.ai custom connectors, bearer tokens for CLI/API
Search: QMD (BM25) + Voyage AI (voyage-4-large embeddings) + RRF fusion
Persistence: Git repo. Every note is a markdown file. Every mutation is a commit.
Rate limiting: 120 reads/min, 20 writes/min per API key. LRU idempotency cache.
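A per-key limiter could be as simple as a fixed one-minute window per key and request kind. The README gives the limits but not the algorithm, so treat this as one plausible shape. A sliding window or token bucket would smooth out bursts at window edges. The limits are constructor arguments, which also covers trail tokens with their own defaults.

```typescript
// Sketch of per-key rate limiting with fixed one-minute windows.
// Assumption: Grove's real algorithm is not documented; limits mirror the README
// (120 reads/min, 20 writes/min per API key). Class and method names are hypothetical.
type Kind = "read" | "write";

const WINDOW_MS = 60_000;

class RateLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(private readonly limits: Record<Kind, number> = { read: 120, write: 20 }) {}

  // True if the request fits in the current window; false means reject (e.g. HTTP 429).
  allow(apiKey: string, kind: Kind, now: number = Date.now()): boolean {
    const id = `${apiKey}:${kind}`;
    const current = this.windows.get(id);
    if (!current || now - current.start >= WINDOW_MS) {
      // Start a fresh window; this request is its first.
      this.windows.set(id, { start: now, count: 1 });
      return this.limits[kind] > 0;
    }
    if (current.count >= this.limits[kind]) return false;
    current.count += 1;
    return true;
  }
}

// Usage: 25 writes in the same instant; only the first 20 get through.
const limiter = new RateLimiter();
let allowed = 0;
for (let i = 0; i < 25; i++) {
  if (limiter.allow("grove_live_example", "write", 0)) allowed++;
}
console.log(allowed); // → 20
```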
Self-hosting
Grove runs on an AWS t3.medium (~$30/mo). Here's how to deploy your own:
Prerequisites
Node.js >= 22
Git
A git-tracked Obsidian vault (or any folder of markdown files)
QMD for search indexing
A Voyage AI API key for vector embeddings (optional; without one, search falls back to BM25 only)
Setup
git clone https://github.com/jmilinovich/grove.git
cd grove
npm install
# Create an API key
grove keys create my-key
# → grove_live_abc123... (save this, it's shown once)
# Start QMD (search engine)
qmd serve --vault ~/your-vault --port 8177
# Start Grove
npm run proxy
VPS deployment
# On your VPS:
git clone https://github.com/jmilinovich/grove.git
cd grove && npm install && npm run build
# PM2 for process management
pm2 start dist/proxy.js --name grove-proxy
pm2 start "qmd serve --vault /root/vault --port 8177" --name qmd-server
# Nginx for TLS (use certbot for Let's Encrypt)
# Proxy pass to localhost:8420
Vault syncs every 5 minutes via cron. Embeddings are computed on the server via the Voyage AI API at index time.
The garden: how I actually use this
Grove is the infrastructure. The garden is the practice. I have seven Claude skills that compose Grove's six tools into a daily knowledge workflow:
What the seven skills do:
Daily review: surfaces seeds, sprouts, and withering notes
Hybrid search across the vault
Create new entity notes with proper scaffolding
Extract entities from journal entries and wire up wikilinks
Vault diagnostics: orphans, broken links, stale inbox
Random walks on the knowledge graph
Evaluate bookmarks and promote aligned ones to vault notes
The lifecycle: forage brings in raw material → plant scaffolds new entities → harvest extracts connections from journal entries → garden surfaces what needs attention → tend finds structural issues → wander discovers unexpected connections. Knowledge compounds across every conversation.
These skills are open source and designed to be adapted. The pattern generalizes — you don't need my vault structure to use Grove.
The lineage
Obsidian (5M+ users, local-first markdown vaults)
│
├── 24 MCP servers (local-only, read-only, flat filesystem)
│
├── Karpathy's "LLM Knowledge Bases" (Apr 2026)
│   Context windows + LLM-maintained indices
│   "There is room for an incredible new product"
│
└── Grove (this)
    Remote, bidirectional, vault-aware
    6 structured tools, hybrid search, graph analysis
    Self-hosted, privacy-first, git-native
MCP is now an open standard under the Linux Foundation, backed by Anthropic, Google, and OpenAI. "Knowledge & Memory" is the largest category in the MCP registry at 283 servers. The protocol is infrastructure, not a fad.
What's next
Multi-vault — Add a second vault (work knowledge base) with per-vault keys and cross-vault search
grove.md — A hosted version where you connect your GitHub repo and get an MCP endpoint. No VPS required. Cross-client: Claude, ChatGPT, Cursor, anything MCP.
Trail portal — Web dashboard for managing trails, viewing per-trail usage, and a consumer onboarding page with MCP connection instructions
Semantic filtering — LLM judge for trail edge cases where tag/type/path filtering is too coarse (deferred until real edge cases prove the need)
Trails: scoped sharing
Share slices of your knowledge without exposing the whole vault. A trail is a topic-scoped window into your grove — you define what's visible (tags, types, paths) and what's hidden, then hand someone a token. They connect via MCP and see only what the trail allows.
┌─────────────────────────────────────────────────────────┐
│  Your vault (1,000 notes)                               │
│                                                         │
│   ┌─────────────┐  ┌──────────────┐  ┌──────────────┐   │
│   │ AI Research │  │   Journal    │  │   Finances   │   │
│   │   trail ✓   │  │   hidden ✗   │  │   hidden ✗   │   │
│   └─────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼ trail-scoped token
┌─────────────────────────────────────────────────────────┐
│  Consumer sees: 200 notes on AI, ML, tech, design       │
│  Consumer can't see: journal, health, finances, private │
│  Consumer gets 404 (not 403) for hidden notes           │
└─────────────────────────────────────────────────────────┘
Creating a trail
grove trails create "AI Research" \
--allow-tags ai,ml,tech,design \
--deny-tags private,personal,finance,health \
--allow-paths "Resources/" \
--deny-paths "Journal/,Areas/Finances/,Areas/Health/"
# → Trail created: trail_a1b2c3d4
# → Token (shown once, give to consumer):
# → grove_live_abc123...
The token is a standard Grove bearer token, scoped to this trail. Give it to a collaborator, plug it into a Claude custom connector, or use it in any MCP client. They'll never know what they can't see: hidden notes return 404, not 403.
How tools behave under a trail
Every one of the six MCP tools respects trail scope automatically:
Tool | Trail behavior |
query | Searches the full index for recall, then strips non-trail notes before returning results |
get | Returns 404 for notes outside the trail (doesn't leak that the note exists) |
multi_get | Silently omits non-trail notes from results |
list_notes | Only returns trail-visible notes |
write_note | Constrains writes to trail-allowed paths/tags (if write access is enabled) |
vault_status | Returns scoped stats: note count and types within the trail only |
Filtering model
Trails use a deterministic prefilter — no LLM in the loop, sub-millisecond per note. Filters combine with AND logic:
allow_tags — note must have at least one matching tag
deny_tags — note must NOT have any of these tags
allow_types — note type must be one of these (empty = all types)
deny_types — note type must NOT be one of these
allow_paths — note path must start with one of these prefixes (empty = all paths)
deny_paths — note path must NOT start with any of these prefixes
The test suite verifies 100% precision (zero sensitive notes leak through) and 100% recall (all on-topic notes pass through) on labeled datasets of 20 sensitive and 20 on-topic notes.
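Because every rule is a plain predicate, the whole prefilter fits in one function. The sketch assumes notes expose `path`, `type`, and `tags`. It reads an empty `allow_tags` as "no tag restriction", although the empty-list rule is stated above only for types and paths. The trail object mirrors the `grove trails create` example.

```typescript
// Sketch of the deterministic trail prefilter: six rules, combined with AND.
// Assumption: an empty allow_tags list means "no tag restriction".
interface Note {
  path: string;
  type: string;
  tags: string[];
}

interface Trail {
  allow_tags: string[];
  deny_tags: string[];
  allow_types: string[];
  deny_types: string[];
  allow_paths: string[];
  deny_paths: string[];
}

function isVisible(note: Note, trail: Trail): boolean {
  const hasAnyTag = (tags: string[]) => note.tags.some((tag) => tags.includes(tag));
  const underAny = (prefixes: string[]) => prefixes.some((prefix) => note.path.startsWith(prefix));
  return (
    (trail.allow_tags.length === 0 || hasAnyTag(trail.allow_tags)) &&
    !hasAnyTag(trail.deny_tags) &&
    (trail.allow_types.length === 0 || trail.allow_types.includes(note.type)) &&
    !trail.deny_types.includes(note.type) &&
    (trail.allow_paths.length === 0 || underAny(trail.allow_paths)) &&
    !underAny(trail.deny_paths)
  );
}

// The "AI Research" trail from the CLI example above.
const aiResearch: Trail = {
  allow_tags: ["ai", "ml", "tech", "design"],
  deny_tags: ["private", "personal", "finance", "health"],
  allow_types: [],
  deny_types: [],
  allow_paths: ["Resources/"],
  deny_paths: ["Journal/", "Areas/Finances/", "Areas/Health/"],
};

console.log(
  isVisible({ path: "Resources/Concepts/Context Engineering.md", type: "concept", tags: ["concept", "ai"] }, aiResearch),
); // → true
```

In a design like this, returning 404 for hidden notes falls out naturally: `get` treats a note that fails `isVisible` exactly like a missing file.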
Managing trails
grove trails # list all trails
grove trails disable trail_id # temporarily disable (consumer gets auth errors)
grove trails delete trail_id # permanently remove trail + revoke its key
Each trail has independent rate limits (default: 60 reads/min, 0 writes/min). All trail access is logged with the trail ID, tool used, total results found, and how many were filtered.
Consumer setup
Give your consumer the token and the MCP endpoint. That's it:
{
  "mcpServers": {
    "grove": {
      "url": "https://api.grove.md/mcp",
      "headers": { "Authorization": "Bearer grove_live_abc123..." }
    }
  }
}
They get the same six tools, same hybrid search, same write validation, just scoped to what you've chosen to share.
When you don't need Grove
You don't need it if your vault is small enough to paste into a context window. You don't need it if you only use one AI client on one machine. You don't need it if your notes don't have structure worth preserving.
You need it when you want your AI to know who you are regardless of which surface you're talking to it from. When you want knowledge to compound across conversations instead of evaporating. When you have a vault with real structure — types, conventions, connections — and you want agents to respect that structure, not steamroll it.
License
MIT