Folio
Folio is a privacy-first document assistant for interacting with files in explicitly granted folders using your own LLM — no data leaves your machine. It can:
List authorized folders (list_roots): Discover which directories the assistant may access.
Browse directory contents (list_dir): List files and sub-folders within a granted root.
Read file contents (read_file): Retrieve the full text of any file within a granted root.
Search documents (search): Perform keyword-based searches across granted files, returning matching lines with file name and line number.
Answer natural-language questions (answer): Ask questions in plain English — the server retrieves relevant context and uses the host's LLM to generate a grounded answer.
Summarize files (summarize): Generate a concise summary of a file's key points using the host's own LLM.
Edit files (edit_file): Make targeted in-place edits by specifying an exact string to find and replace.
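To make the shape of the search results concrete, a keyword search that returns matching lines with file name and line number could be sketched roughly as follows. This is a minimal illustration, not Folio's implementation: the helper name `search_lines` and its return type are assumptions, and a real tool would also apply the granted-folder check before reading anything.

```python
from pathlib import Path


def search_lines(root: str, keyword: str) -> list[tuple[str, int, str]]:
    """Return (relative file name, 1-based line number, line text) per match.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    base = Path(root).resolve()
    needle = keyword.lower()
    hits = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():  # case-insensitive keyword match
                name = path.relative_to(base).as_posix()
                hits.append((name, number, line.strip()))
    return hits
```

Calling `search_lines("sample-docs", "backups")` would yield one tuple per matching line, which is enough for a host to cite the file and line in its answer.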
A privacy-first, zero-AI-cost "chat with your documents" assistant. One engine, two front doors: a fully offline command-line app, and an OAuth-secured web app.
Point Folio at a folder and it can read, search, summarize, answer questions about, and reformat the files inside it — and nothing else. Your documents never leave your machine, and the generation runs on your own model, so there is no per-token AI bill.
Folio is built on the Model Context Protocol (MCP) and is designed to exercise its most advanced features in a way that is essential to the use case, not bolted on.

The Aurora web UI answering with live search progress. Demo recorded on a cloud provider for speed — Folio runs identically on local Ollama (just slower).
Why the design is meaningful
| Feature | Why it matters here |
| --- | --- |
| Roots | The assistant can only touch the folders you explicitly grant — enforced on every file operation. The rest of your disk is unreachable. This is privacy by construction. |
| Sampling | The generation is done by the host's own model (local Ollama by default), not the server. A hosted Folio therefore never racks up AI bills, and each person's documents are processed by their own model. |
| Dual transport | The same server runs locally over stdio (the CLI) or remotely over Streamable HTTP (the web app). |
| OAuth 2.1 | The web app identifies users with GitHub sign-in; the server validates every request's bearer token before running a tool. |
| Logging & progress | Long jobs ("search the whole folder") stream live status, so you can see real work happening instead of a frozen spinner. |
Features
🔒 Granted-folder access only — a non-negotiable path guard on every tool.
🔎 Search across files with live progress.
📝 Summarize a file and answer questions grounded in your documents.
✏️ Reformat / edit files in place.
💻 Offline CLI (stdio) — private, $0, works with no internet.
🌐 Web app (FastAPI + GitHub login) — a designed browser UI: upload or pick documents, ask questions, and watch live search progress stream in over SSE.
🔁 Provider-agnostic — local Ollama by default, with Cerebras and OpenRouter as drop-in cloud fallbacks, plus Anthropic and OpenAI as optional bring-your-own-key upgrades in the CLI (switch via two lines in .env).
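The in-place edit feature is described as an exact find-and-replace. A minimal sketch of that idea is shown below; the signature, the choice to replace every occurrence, and the error behaviour are all assumptions made for illustration, not Folio's actual `edit_file` tool.

```python
from pathlib import Path


def edit_file(path: str, find: str, replace: str) -> int:
    """Replace every exact occurrence of `find`; return how many were replaced.

    Hypothetical signature; a real tool would first check that `path`
    lies inside a granted root.
    """
    if not find:
        raise ValueError("find must be a non-empty string")
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    count = text.count(find)
    if count == 0:
        # Refuse to write anything when the exact string is absent.
        raise ValueError(f"{find!r} not found in {path}")
    file.write_text(text.replace(find, replace), encoding="utf-8")
    return count
```

Failing loudly when the string is missing keeps a model's slightly wrong guess from silently leaving the file unchanged.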
Architecture
                    ┌───────────────────────────────────────┐
CLI (local):        │ mcp_server.py                         │
main.py ──stdio────▶│ ONE FastMCP engine:                   │
                    │  • tools (list/read/search/...)       │
Web (remote):       │  • resources (roots, files)           │
Browser ⇄ FastAPI ──┤  • prompts (/summarize, /format)      │
host                │ roots guard · sampling · logging ·    │
──HTTP + OAuth─────▶│ progress · OAuth (HTTP mode)          │
                    └───────────────────────────────────────┘
Both hosts speak to the same server; only the transport differs. In each case the host (the CLI or the FastAPI app) is the MCP client: it runs the agent loop, holds the model keys, and answers the server's sampling / roots / logging / progress callbacks.
Requirements
First, build and run the previous repository to set up the local Ollama + litellm tool-calling environment: repo-here
ollama pull qwen2.5:7b
(No API key needed for the local path. Cerebras / OpenRouter are optional cloud fallbacks.)
Python 3.10+ (developed on 3.13)
uv for dependency + run management
Install
uv sync
Configure
Copy the template and fill in values locally (the real .env is git-ignored):
cp .env.example .env
The default configuration uses local Ollama and needs no keys:
LLM_PROVIDER=ollama
LLM_MODEL=ollama_chat/qwen2.5:7b
Switch provider by changing LLM_PROVIDER + LLM_MODEL together (see .env.example for the Cerebras / OpenRouter forms).
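Because the two settings have to change together, a host could sanity-check them at startup. The sketch below shows one way that might look. It is not Folio's code: `check_llm_config` and the prefix table are hypothetical. Only the `ollama_chat/` and `anthropic/` prefixes appear in this README; the others are assumed to follow litellm's `provider/model` naming.

```python
import os

# Assumed mapping from LLM_PROVIDER to the model-string prefix litellm routes on.
# Only "ollama_chat/" and "anthropic/" are taken from this README.
EXPECTED_PREFIX = {
    "ollama": "ollama_chat/",
    "cerebras": "cerebras/",
    "openrouter": "openrouter/",
    "anthropic": "anthropic/",
    "openai": "openai/",
}


def check_llm_config(env=None) -> tuple[str, str]:
    """Return (provider, model) or raise ValueError if the two disagree."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama")
    model = env.get("LLM_MODEL", "ollama_chat/qwen2.5:7b")
    prefix = EXPECTED_PREFIX.get(provider)
    if prefix is None:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")
    if not model.startswith(prefix):
        raise ValueError(
            f"LLM_MODEL {model!r} does not match LLM_PROVIDER {provider!r}"
        )
    return provider, model
```

With an empty environment this returns the local default pair; changing only one of the two variables raises an error instead of routing to an unexpected backend.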
Run the CLI
Grant one or more folders and start chatting:
uv run main.py path/to/your/folder
With no folder it defaults to the bundled sample-docs/. At the > prompt you can:
ask questions in plain English (e.g. "how long are backups retained?"),
mention a file with @, e.g. @policies/data-retention.md what does this say?,
run a command, e.g. /summarize README.md.
Exit with Ctrl+C.
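The three kinds of input at the prompt (plain question, @ mention, / command) suggest a small dispatch step in the host. The following is a sketch under stated assumptions, not the CLI's actual parser: `classify_input` is a hypothetical name, and it only recognises a mention or command at the start of the line.

```python
def classify_input(line: str) -> tuple[str, str, str]:
    """Return (kind, target, rest); kind is 'command', 'mention', or 'question'.

    Hypothetical helper for illustration only.
    """
    text = line.strip()
    if text.startswith("/"):
        # "/summarize README.md" -> command name plus its argument string
        name, _, rest = text[1:].partition(" ")
        return ("command", name, rest.strip())
    if text.startswith("@"):
        # "@path question..." -> file path plus the question about it
        path, _, rest = text[1:].partition(" ")
        return ("mention", path, rest.strip())
    return ("question", "", text)


print(classify_input("/summarize README.md"))
print(classify_input("@policies/data-retention.md what does this say?"))
```

The first call yields a command named `summarize` with argument `README.md`; the second yields a mention of `policies/data-retention.md` followed by the question text.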
Run the web app
The OAuth-secured FastAPI web app runs with:
uv run uvicorn web.app:app --port 8000
Then open http://localhost:8000 and sign in with GitHub. From there you can load the bundled sample documents or upload your own, click a file to ground a question, and watch live search progress as Folio answers. It runs the same MCP engine as the CLI, just over HTTP.
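For readers unfamiliar with how live progress reaches the browser, the sketch below shows the Server-Sent Events wire format for a stream of progress updates. It is only an illustration: the event name `progress`, the payload fields, and both helper names are assumptions, and in Folio the framing is handled by sse-starlette rather than by hand.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Encode one SSE frame: an event name, one data line, then a blank line."""
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


def progress_frames(files: list[str]):
    """Yield one hypothetical 'progress' frame per file searched."""
    total = len(files)
    for done, name in enumerate(files, start=1):
        yield format_sse("progress", {"done": done, "total": total, "file": name})


for frame in progress_frames(["README.md", "policies/data-retention.md"]):
    print(frame, end="")
```

Each frame ends with a blank line, which is what tells the browser's `EventSource` that one event is complete and can be delivered.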
Screenshots
[Screenshot: Sign in → load the bundled sample set or upload your own.]
[Screenshot: Click a document to drop its exact …]
[Screenshot: Live search log + progress stream while Folio works.]
[Screenshot: The fully-offline CLI host (stdio), grounded in your docs.]
Benchmarks & tradeoffs
Folio is provider-agnostic, so which model you point it at is a real tradeoff. These numbers come
from running the actual agent loop over a small e-commerce document set (5 grounded questions
with known answers), paced to respect free-tier rate limits — see benchmarks/ for
the reproducible harness and full results.
| Model | Correct | Median latency/call | Notes |
| --- | --- | --- | --- |
| Cerebras | 5/5 | ~0.5s | fast + accurate |
| Cerebras | 5/5 | ~0.7s | fast + accurate |
| OpenRouter | 3/5* | ~3.1s | *2 misses were free-tier rate-limit 429s, not wrong answers |
| Ollama | 3/5 | ~9.5s | private + $0, but ~15–20× slower and less consistent |
Three takeaways:
Speed — Cerebras answers ~15–20× faster per call than the local 7B (~0.5s vs ~9.5s).
Accuracy — the bigger cloud models are consistently correct; the small local 7B is inconsistent (it confabulated a non-existent file path and sometimes answered "no information").
Free-tier reality — free cloud tiers rate-limit/throttle under load (OpenRouter's free llama-3.3-70b was entirely unusable in a burst). For real throughput, bring your own key.
The honest tradeoff triangle: privacy (local Ollama) ↔ speed + quality (Cerebras) ↔ cost (free,
but throttled). Reproduce with uv run python benchmarks/benchmark.py.
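The speed multiple quoted above can be checked directly from the median latencies in the table. The snippet below is just that arithmetic with illustrative variable names. It uses both Cerebras rows, since the faster one gives the headline figure and the slower one a somewhat smaller multiple.

```python
# Median latency per call, in seconds, copied from the benchmark table.
cerebras_fast = 0.5
cerebras_slow = 0.7
ollama_local = 9.5

# How many times slower the local model is per call.
ratio_vs_fast = ollama_local / cerebras_fast
ratio_vs_slow = ollama_local / cerebras_slow

print(f"local is {ratio_vs_slow:.1f}x to {ratio_vs_fast:.1f}x slower per call")
```

The range comes out at roughly 14× to 19×, in line with the rounded "~15–20×" figure.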
Which provider should I use?
Privacy / offline / $0 → ollama (local; slower and less consistent, but nothing leaves your machine).
Fast + accurate, free → cerebras (near-instant; free tier throttles under heavy use).
Maximum quality (paid, CLI only) → anthropic or openai with your own key (e.g. LLM_MODEL=anthropic/claude-opus-4-8). The web app never accepts keys — this is a CLI upgrade.
Limitations (honest)
Local 7B is slow and inconsistent. qwen2.5:7b is private and free but answers in seconds-to-tens-of-seconds and occasionally mis-uses tools (confabulates a path, or gives up). For reliable, fast answers, use a cloud provider.
Free cloud tiers throttle. Cerebras and OpenRouter free tiers rate-limit under sustained/burst use; the benchmark above was captured with fresh quota — re-running on an exhausted free tier shows worse numbers (a quota artifact, not the models). Bring your own key for real throughput.
The web app is a shared/hosted convenience, not the fully-private path. Uploaded documents go to the server (isolated per user, deleted on logout + a TTL sweep). For fully offline / private use, run the CLI with local Ollama.
Text documents only. Folio reads text files (Markdown, .txt, .csv, code, …) — no images/audio/video.
Anthropic / OpenAI need paid API credits. They are optional CLI upgrades, not required.
Tech stack
MCP Python SDK (FastMCP) — the server engine, the client session, both transports, and the OAuth modules.
litellm — one OpenAI-shaped API over Ollama, Cerebras, OpenRouter, Anthropic, and OpenAI (routes by the model-string prefix).
FastAPI + uvicorn — the async web host; its native async + SSE match the MCP SDK and the live-progress requirement.
sse-starlette — streams live log/progress events to the browser over Server-Sent Events.
itsdangerous — the web app remembers your GitHub sign-in in a small signed-cookie session; itsdangerous cryptographically signs that cookie so it can't be tampered with (a tamper-evident seal). It's what makes "stay logged in" trustworthy.
prompt-toolkit — the interactive CLI prompt, autocompletion, and history.
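The "tamper-evident seal" idea behind the signed session cookie can be illustrated with the standard library alone. The sketch below deliberately uses a plain HMAC instead of itsdangerous, so it shows the principle rather than the library's API or cookie format. The function names and the `value.tag` layout are assumptions.

```python
import base64
import hashlib
import hmac


def sign(value: str, secret: bytes) -> str:
    """Append an HMAC-SHA256 tag so any later change to the value is detectable."""
    tag = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return value + "." + base64.urlsafe_b64encode(tag).decode("ascii")


def unsign(cookie: str, secret: bytes) -> str:
    """Return the original value, or raise ValueError if the tag does not match."""
    value, sep, _tag = cookie.rpartition(".")
    expected = sign(value, secret).encode("utf-8")
    # compare_digest avoids leaking how many leading bytes matched
    if not sep or not hmac.compare_digest(expected, cookie.encode("utf-8")):
        raise ValueError("bad signature")
    return value
```

A cookie produced by `sign("user=octocat", secret)` round-trips through `unsign`, while editing the user name without knowing the secret makes `unsign` raise.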
Security notes
The roots guard (is_path_allowed) is enforced in every file tool — the SDK provides the roots mechanism, but Folio enforces the policy.
OAuth applies to the HTTP transport only; the local stdio CLI needs none (you launched the process yourself).
Secrets live only in the git-ignored .env. .env.example ships blank placeholders.
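A path guard of the kind described above might look like the following sketch. The name `is_path_allowed` comes from this README, but the signature and body are assumptions. The key step is resolving the candidate path before comparing, so `..` segments and symlinks cannot step outside a granted root.

```python
from pathlib import Path


def is_path_allowed(candidate: str, roots: list[str]) -> bool:
    """True only if candidate, once resolved, is a granted root or inside one.

    Assumed signature; Folio's real guard may differ.
    """
    target = Path(candidate).resolve()  # collapses ".." and follows symlinks
    for root in roots:
        base = Path(root).resolve()
        if target == base or base in target.parents:
            return True
    return False
```

Comparing against `target.parents` rather than a string prefix means a sibling folder such as `docs-private` is not mistaken for something inside `docs`.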
Project status
Complete: the MCP engine (roots, sampling, logging/progress, dual transport, OAuth), the offline CLI, and the OAuth-secured FastAPI web app. A hosted public deployment is intentionally not provided — a shared demo on free model tiers would burn the operator's quota, and the web app deliberately never accepts a visitor's API key — so run it locally (it works fully on your own machine, with the steps above).
License & credits
MIT. Built by extending ollama-mcp-chat-cli, an earlier MCP chat-CLI project.