cowork-semantic-search

by ZhuBit

If you find this useful, consider giving it a ⭐ — it helps others discover the project.

Local semantic search for your documents. No API keys. No cloud. Works with any MCP client.

Why

AI coding tools are powerful, but they have blind spots when it comes to your local files:

Frozen knowledge -- training data has a cutoff. Your latest reports, notes, and contracts don't exist in the model's world.
Context window limits -- you can't paste 500 documents into a prompt.
No cross-file search -- your AI tool can read one file at a time, but can't search across your entire document library for the relevant pieces.

This plugin bridges that gap. It indexes your local documents into a small, fast vector database. When you ask a question, it retrieves only the relevant pieces -- so your AI tool can answer with your actual data.

Your documents --> chunked --> embedded --> local vector DB
                                                 |
         Your question --> embedded --> similarity search --> relevant chunks --> AI answers
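
The similarity-search step in the diagram can be illustrated with a toy example. This is only a sketch: the three-dimensional vectors are hand-made stand-ins for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helpers, not functions from this project.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, indexed, k=2):
    """Return the texts of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy "vector DB": (chunk text, made-up embedding) pairs.
index = [
    ("Q3 revenue grew 12%", [0.9, 0.1, 0.0]),
    ("Office party is on Friday", [0.0, 0.2, 0.9]),
    ("Revenue forecast for Q4", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # made-up embedding of a revenue question
best = top_k(query, index)
print(best)
```

Only the two revenue chunks are returned, which is the point of the pipeline: the AI tool sees a handful of relevant pieces rather than every document.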

Features

Fully offline -- one-time model download (~120MB), then no network calls. No data leaves your machine.
Incremental indexing -- SHA-256 content hashing. Only changed files get reprocessed. Re-indexing 1000 files where 3 changed takes seconds.
Multilingual -- handles 50+ languages natively. Search in one language, find results in another.
Hybrid search -- combines semantic similarity with full-text keyword search via Reciprocal Rank Fusion. Catches what pure vector search misses.
Multiple formats -- txt, md, pdf, docx, pptx, csv out of the box.
Any MCP client -- works with Claude Code, Cursor, Windsurf, Cline, and any other MCP-compatible tool.
Zero infrastructure -- LanceDB stores everything as local files. No server, no Docker, no database to manage.
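
To make the hybrid search feature concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not the project's code: the function name, the chunk IDs, and the constant `k=60` (the value commonly used for RRF) are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranked list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

vector_hits = ["chunk-a", "chunk-b", "chunk-c"]   # hypothetical semantic ranking
keyword_hits = ["chunk-d", "chunk-a", "chunk-b"]  # hypothetical full-text ranking
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['chunk-a', 'chunk-b', 'chunk-d', 'chunk-c']
```

Chunks that appear in both rankings rise to the top, while `chunk-d`, found only by the keyword search, still makes the list. That is how a fused ranking can catch what pure vector search misses.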

Supported Formats

Format	Extension	Details
Plain text	`.txt`	UTF-8 with fallback
Markdown	`.md`	Raw text preserved
PDF	`.pdf`	Page-level extraction with metadata
Word	`.docx`	Full paragraph extraction
PowerPoint	`.pptx`	Slide-level extraction with metadata
CSV	`.csv`	Row-based text extraction

Quick Start

1. Install

git clone https://github.com/ZhuBit/cowork-semantic-search.git
cd cowork-semantic-search
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[all]"

2. Configure your MCP client

Add the server to your MCP client's config. Replace paths with your own.

{
  "mcpServers": {
    "semantic-search": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["-m", "server.main"],
      "cwd": "/absolute/path/to/cowork-semantic-search",
      "env": {
        "PYTHONPATH": "/absolute/path/to/cowork-semantic-search"
      }
    }
  }
}

Open Cline > MCP Servers icon > Configure > Advanced MCP Settings, then add:

{
  "mcpServers": {
    "semantic-search": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["-m", "server.main"],
      "env": {
        "PYTHONPATH": "/absolute/path/to/cowork-semantic-search"
      }
    }
  }
}
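
If you would rather not type the absolute paths by hand, a small script can print the config for you. This is a convenience sketch, not part of the project: `build_mcp_config` is a hypothetical helper, and it assumes the `.venv` layout from the install step. It includes `cwd` as in the first configuration above; the Cline configuration omits it.

```python
import json
from pathlib import Path

def build_mcp_config(repo_dir, server_name="semantic-search"):
    """Build the mcpServers entry with absolute paths for a given checkout."""
    # Resolve only the repo directory. The venv's python is usually a symlink,
    # and resolving it would point at the system interpreter instead.
    repo = Path(repo_dir).expanduser().resolve()
    return {
        "mcpServers": {
            server_name: {
                "command": str(repo / ".venv" / "bin" / "python"),
                "args": ["-m", "server.main"],
                "cwd": str(repo),
                "env": {"PYTHONPATH": str(repo)},
            }
        }
    }

# Run from inside the cloned repository, then paste the output into your client's config.
config = build_mcp_config(".")
print(json.dumps(config, indent=2))
```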

3. Restart your MCP client and go

"Index all documents in ~/Documents/projects"

"Search for 'quarterly revenue report'"

First run downloads the embedding model (~120MB), then everything runs offline.

Example: Search Your Obsidian Vault

If you keep notes in Obsidian (or any folder of Markdown files), this plugin turns your AI tool into a search engine for your knowledge base.

You: "Index my vault at ~/Documents/ObsidianVault"
AI:  Indexed 847 files -> 3,291 chunks in 42s

You: "What did I write about API rate limiting?"
AI:  Found 6 relevant chunks across 3 files:
       - notes/backend/rate-limiting-strategies.md
       - projects/acme-api/design-decisions.md
       - daily/2025-11-03.md
       ...

You: "Find anything about the client meeting last November, use hybrid search"
AI:  Found 4 results using hybrid search (vector + keyword):
       - meetings/2025-11-12-acme-kickoff.md
       - daily/2025-11-12.md
       ...

Works the same with PDFs, Word docs, PowerPoints, and CSVs -- just point it at a folder.

Tools

Tool	Description
`index_folder`	Index or re-index all documents in a folder. Incremental -- skips unchanged files.
`semantic_search`	Search indexed documents using natural language. Supports `vector` and `hybrid` modes.
`get_index_status`	Show total chunks, file count, and list of indexed files.
`reindex_file`	Force re-index a single file, bypassing the hash cache.
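
The hash cache that `index_folder` consults and `reindex_file` bypasses could work roughly as follows. This is a sketch under assumptions: the function names, the in-memory dict, and the `force` parameter are hypothetical, and the README does not say how the project actually stores its hashes.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path):
    """Return the SHA-256 hex digest of a file's bytes, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()

def files_to_reindex(folder, hash_cache, force=()):
    """List files that are new, changed, or forced; update hash_cache in place."""
    forced = {str(Path(p)) for p in force}
    changed = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        key = str(path)
        current = file_sha256(path)
        if key in forced or hash_cache.get(key) != current:
            changed.append(key)
        hash_cache[key] = current
    return changed

with tempfile.TemporaryDirectory() as docs:
    note = Path(docs) / "note.md"
    note.write_text("first draft")
    cache = {}
    first_run = files_to_reindex(docs, cache)    # new file: indexed
    second_run = files_to_reindex(docs, cache)   # unchanged: skipped
    note.write_text("second draft")
    third_run = files_to_reindex(docs, cache)    # edited: re-indexed
    forced_run = files_to_reindex(docs, cache, force=[note])  # hash check bypassed
    print(len(first_run), len(second_run), len(third_run), len(forced_run))  # 1 0 1 1
```

Hashing content rather than comparing modification times means a file that was merely touched or copied is not reprocessed.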

How It Works

1. Parse -- extract text from each document, preserving structure (pages, slides)
2. Chunk -- split into ~400-character overlapping pieces for precise retrieval
3. Embed -- convert each chunk into a 384-dimensional vector using paraphrase-multilingual-MiniLM-L12-v2
4. Store -- save chunks + vectors in a LanceDB database (a local file, no server needed)
5. Search -- embed your query, find nearest chunks by cosine similarity, optionally combine with full-text keyword search via RRF
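
The Chunk step can be pictured with a plain sliding window. Note that this is a simplified stand-in: the project lists langchain-text-splitters for chunking, which splits recursively on separators, and the 50-character overlap below is an assumed value, since only the ~400-character size is stated.

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, so a sentence cut at
    one boundary still appears whole in the neighbouring chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(1000))  # 1000 characters of dummy text
pieces = chunk_text(sample)
print([len(p) for p in pieces])  # [400, 400, 300]
```

Small chunks keep each embedding focused on one idea, and the overlap reduces the chance that a relevant passage is split across a boundary and missed.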

Advanced Usage

from server.indexer import index_folder
from server.search import semantic_search

# Index a folder
result = index_folder("/path/to/docs")
print(f"{result['files_indexed']} files -> {result['total_chunks']} chunks")

# Search
results = semantic_search("project deadline", mode="hybrid")
for r in results["results"]:
    print(f"  {r['file_name']}: {r['text'][:100]}...")

Architecture

server/
  main.py       # MCP server + tool definitions
  parsers.py    # Per-format text extraction
  chunker.py    # Text splitting with metadata
  indexer.py    # Discovery, hashing, embedding pipeline
  store.py      # LanceDB vector store + FTS + hybrid search
  search.py     # Query embedding + search orchestration

Component	Choice	Why
MCP framework	FastMCP	Clean tool definitions, async support
Embeddings	sentence-transformers	Offline, multilingual, fast
Vector DB	LanceDB	Serverless, embedded, FTS built-in
Chunking	langchain-text-splitters	Battle-tested recursive splitting
PDF	PyMuPDF	Fast, accurate extraction
DOCX	python-docx	Lightweight, no system deps
PPTX	python-pptx	Slide-level extraction

Development

source .venv/bin/activate
pytest tests/ -v

56 tests covering parsers, chunking, indexing, search, and MCP tool integration.

Contributions welcome -- open an issue or submit a PR.

Roadmap

ONNX runtime for faster embeddings (drop PyTorch dependency)
Configurable chunk size and overlap via tool params
Multi-folder named indexes
Metadata filtering (date ranges, tags, custom fields)
Watch mode (auto-reindex on file changes)

Support

If this is useful to you, consider giving it a ⭐ — it helps others find the project.

License

AGPL-3.0 -- free to use, modify, and self-host. If you offer this as a network service, you must share your source code. See LICENSE for details.
