What can you do with this server?

Folio is a privacy-first document assistant for interacting with files in explicitly granted folders using your own LLM; no data leaves your machine. It can:

* List authorized folders (list_roots): Discover which directories the assistant may access.
* Browse directory contents (list_dir): List files and sub-folders within a granted root.
* Read file contents (read_file): Retrieve the full text of any file within a granted root.
* Search documents (search): Perform keyword-based searches across granted files, returning matching lines with file name and line number.
* Answer natural-language questions (answer): Ask questions in plain English; the server retrieves relevant context and uses the host's LLM to generate a grounded answer.
* Summarize files (summarize): Generate a concise summary of a file's key points using the host's own LLM.
* Edit files (edit_file): Make targeted in-place edits by specifying an exact string to find and replace.
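The edit_file behaviour described above (find an exact string and replace it) can be sketched in a few lines. This is an illustration only, not Folio's implementation: the function name `replace_exact` and the rule that the string must occur exactly once are assumptions made here so that an edit never lands in the wrong place.

```python
def replace_exact(text: str, find: str, replacement: str) -> str:
    """Replace one exact occurrence of `find` in `text`.

    Hypothetical helper: the single-match rule is an assumption,
    not documented Folio behaviour.
    """
    if not find:
        raise ValueError("the string to find must not be empty")
    count = text.count(find)
    if count == 0:
        raise ValueError("string not found; nothing was changed")
    if count > 1:
        raise ValueError(f"string is ambiguous ({count} matches); add more context")
    return text.replace(find, replacement, 1)


if __name__ == "__main__":
    print(replace_exact("retain backups for 30 days", "30 days", "90 days"))
```

Refusing ambiguous matches trades a little convenience for predictability, which matters when a model rather than a person chooses the string.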

Which integrations are available for this server?

* Provides OAuth-based authentication via GitHub for the web application.
* Allows using local Ollama models for document processing without external API costs.
* Allows using OpenAI's models for document generation with a bring-your-own-key approach.

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Folio summarize the file README.md".

That's it! The server will respond to your query, and you can continue using it as needed.

Folio

by Shahrukh19S


Python

Hybrid


A privacy-first, zero-AI-cost "chat with your documents" assistant. One engine, two front doors: a fully offline command-line app, and an OAuth-secured web app.

Point Folio at a folder and it can read, search, summarize, answer questions about, and reformat the files inside it — and nothing else. Your documents never leave your machine, and the generation runs on your own model, so there is no per-token AI bill.

Folio is built on the Model Context Protocol (MCP) and is designed to exercise its most advanced features in a way that is essential to the use case, not bolted on.


The Aurora web UI answering with live search progress. Demo recorded on a cloud provider for speed — Folio runs identically on local Ollama (just slower).

Why the design is meaningful

Feature	Why it matters here
Roots	The assistant can only touch the folders you explicitly grant — enforced on every file operation. The rest of your disk is unreachable. This is privacy by construction.
Sampling	The generation is done by the host's own model (local Ollama by default), not the server. A hosted Folio therefore never racks up AI bills, and each person's documents are processed by their own model.
Dual transport	The same server runs locally over stdio (the CLI) or remotely over Streamable HTTP (the web app).
OAuth 2.1	The web app identifies users with GitHub sign-in; the server validates every request's bearer token before running a tool.
Logging & progress	Long jobs ("search the whole folder") stream live status, so you can see real work happening instead of a frozen spinner.
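To make the "Logging & progress" row concrete, here is a minimal sketch of how per-file progress could be framed as Server-Sent Events for the browser. It only shows the SSE wire format (an `event:` line, a `data:` line, a blank line). The event names `progress` and `done` and the payload fields are assumptions for illustration, and the real web app uses sse-starlette rather than hand-built strings.

```python
import json


def sse_event(event: str, payload: dict) -> str:
    """Format one SSE frame: event name, JSON data line, terminating blank line."""
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


def progress_frames(files: list[str]):
    """Yield one 'progress' frame per file, then a final 'done' frame.

    The event names and payload keys are hypothetical.
    """
    total = len(files)
    for done, name in enumerate(files, start=1):
        yield sse_event("progress", {"file": name, "done": done, "total": total})
    yield sse_event("done", {"total": total})


if __name__ == "__main__":
    for frame in progress_frames(["README.md", "policies/data-retention.md"]):
        print(frame, end="")
```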


Features

🔒 Granted-folder access only — a non-negotiable path guard on every tool.
🔎 Search across files with live progress.
📝 Summarize a file and answer questions grounded in your documents.
✏️ Reformat / edit files in place.
💻 Offline CLI (stdio) — private, $0, works with no internet.
🌐 Web app (FastAPI + GitHub login) — a designed browser UI: upload or pick documents, ask questions, and watch live search progress stream in over SSE.
🔁 Provider-agnostic — local Ollama by default, with Cerebras and OpenRouter as drop-in cloud fallbacks, plus Anthropic and OpenAI as optional bring-your-own-key upgrades in the CLI (switch via two lines in .env).
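As a rough sketch of the search feature listed above, the function below scans every readable text file under a folder and returns each matching line with its file name and line number, reporting progress after each file. The name `search_folder`, the case-insensitive match, and the `(done, total)` callback shape are assumptions, not Folio's actual tool signature.

```python
from pathlib import Path
from typing import Callable, Optional


def search_folder(
    root: Path,
    keyword: str,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[tuple[str, int, str]]:
    """Return (relative file name, 1-based line number, line) per matching line."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    needle = keyword.lower()
    hits: list[tuple[str, int, str]] = []
    for index, path in enumerate(files, start=1):
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            text = ""  # skip binary or unreadable files
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), line_no, line.strip()))
        if on_progress is not None:
            on_progress(index, len(files))  # hypothetical progress hook
    return hits


if __name__ == "__main__":
    for name, line_no, line in search_folder(Path("."), "backup"):
        print(f"{name}:{line_no}: {line}")
```

In the real server the granted-folder guard would run before any file is opened; that check is left out here to keep the sketch short.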

Architecture

                       ┌───────────────────────────────────────┐
  CLI (local):         │            mcp_server.py              │
   main.py ──stdio────▶│   ONE FastMCP engine:                 │
                       │     • tools   (list/read/search/...)  │
  Web (remote):        │     • resources (roots, files)        │
   Browser ⇄ FastAPI ──┤     • prompts (/summarize, /format)   │
        host           │   roots guard · sampling · logging ·  │
   ──HTTP + OAuth─────▶│   progress · OAuth (HTTP mode)        │
                       └───────────────────────────────────────┘

Both hosts speak to the same server; only the transport differs. In each case the host (the CLI or the FastAPI app) is the MCP client: it runs the agent loop, holds the model keys, and answers the server's sampling / roots / logging / progress callbacks.

Requirements

First, build and run the previous repository (repo-here) to set up the local Ollama + litellm tool-calling environment, then pull the default model:
```
ollama pull qwen2.5:7b
```
(No API key needed for the local path. Cerebras / OpenRouter are optional cloud fallbacks.)
Python 3.10+ (developed on 3.13)
uv for dependency + run management

Install

uv sync

Configure

Copy the template and fill in values locally (the real .env is git-ignored):

cp .env.example .env

The default configuration uses local Ollama and needs no keys:

LLM_PROVIDER=ollama
LLM_MODEL=ollama_chat/qwen2.5:7b

Switch provider by changing LLM_PROVIDER + LLM_MODEL together (see .env.example for the Cerebras / OpenRouter forms).
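Because the two settings must change together, a small startup check can catch a mismatched pair early. The sketch below is an assumption-laden illustration: only the Ollama pair comes from this README, the other prefixes are guesses at the litellm model-string forms and should be checked against .env.example, and `check_llm_config` is a hypothetical helper, not part of Folio.

```python
import os

# Assumed provider -> model-string prefix pairs. Only the "ollama" entry is
# taken from the README; verify the others against .env.example.
EXPECTED_PREFIX = {
    "ollama": "ollama_chat/",
    "cerebras": "cerebras/",
    "openrouter": "openrouter/",
    "anthropic": "anthropic/",
    "openai": "openai/",
}


def check_llm_config(env: dict[str, str]) -> tuple[str, str]:
    """Return (provider, model) or raise if the two settings disagree."""
    provider = env.get("LLM_PROVIDER", "ollama")
    model = env.get("LLM_MODEL", "ollama_chat/qwen2.5:7b")
    prefix = EXPECTED_PREFIX.get(provider)
    if prefix is None:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")
    if not model.startswith(prefix):
        raise ValueError(
            f"LLM_MODEL {model!r} does not match LLM_PROVIDER {provider!r} "
            f"(expected a model string starting with {prefix!r})"
        )
    return provider, model


if __name__ == "__main__":
    print(check_llm_config(dict(os.environ)))
```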

Run the CLI

Grant one or more folders and start chatting:

uv run main.py path/to/your/folder

With no folder it defaults to the bundled sample-docs/. At the > prompt you can:

ask questions in plain English (e.g. "how long are backups retained?"),
mention a file with @, e.g. @policies/data-retention.md what does this say?,
run a command, e.g. /summarize README.md.
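The three input forms above can be told apart with very little parsing. The sketch below is one possible way to do it, not the CLI's actual code: `ParsedInput`, `parse_line`, and the rule that an `@` only counts as a mention at the start of a word are all assumptions.

```python
import re
from dataclasses import dataclass, field

# An "@" counts as a mention only at the start of the line or after whitespace,
# so an email address like a@b.com is not treated as a file mention.
MENTION = re.compile(r"(?<!\S)@(\S+)")


@dataclass
class ParsedInput:
    kind: str                      # "command" or "question"
    text: str                      # command arguments, or the full question
    command: str = ""              # e.g. "summarize"; empty for questions
    mentions: list[str] = field(default_factory=list)


def parse_line(line: str) -> ParsedInput:
    """Classify one line typed at the prompt (hypothetical helper)."""
    line = line.strip()
    if line.startswith("/"):
        name, _, rest = line[1:].partition(" ")
        return ParsedInput("command", rest.strip(), command=name)
    return ParsedInput("question", line, mentions=MENTION.findall(line))


if __name__ == "__main__":
    print(parse_line("/summarize README.md"))
    print(parse_line("@policies/data-retention.md what does this say?"))
```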

Exit with Ctrl+C.

Run the web app

The OAuth-secured FastAPI web app runs with:

uv run uvicorn web.app:app --port 8000

Then open http://localhost:8000 and sign in with GitHub. From there you can load the bundled sample documents or upload your own, click a file to ground a question, and watch live search progress as Folio answers. It runs the same MCP engine as the CLI, just over HTTP.

Screenshots



Sign in → load the bundled sample set or upload your own.
Click a document to drop its exact `@mention` into the question.
Live search log + progress stream while Folio works.
The fully-offline CLI host (stdio), grounded in your docs.

Benchmarks & tradeoffs

Folio is provider-agnostic, so which model you point it at is a real tradeoff. These numbers come from running the actual agent loop over a small e-commerce document set (5 grounded questions with known answers), paced to respect free-tier rate limits — see benchmarks/ for the reproducible harness and full results.

Model	Correct	Median latency/call	Notes
Cerebras `gpt-oss-120b`	5/5	~0.5s	fast + accurate
Cerebras `zai-glm-4.7`	5/5	~0.7s	fast + accurate
OpenRouter `gpt-oss-120b:free`	3/5*	~3.1s	*2 misses were free-tier rate-limit 429s, not wrong answers
Ollama `qwen2.5:7b` (local)	3/5	~9.5s	private + $0, but ~15–20× slower and less consistent

Three takeaways:

Speed — Cerebras answers ~15–20× faster per call than the local 7B (~0.5s vs ~9.5s).
Accuracy — the bigger cloud models are consistently correct; the small local 7B is inconsistent (it confabulated a non-existent file path and sometimes answered "no information").
Free-tier reality — free cloud tiers rate-limit/throttle under load (OpenRouter's free llama-3.3-70b was entirely unusable in a burst). For real throughput, bring your own key.
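The speed comparison can be reproduced from the table with a few lines of arithmetic. The latencies below are copied from the table; the per-call samples in the demo are made up purely to show how a median is taken, and none of this is the code in benchmarks/.

```python
from statistics import median

# Median latency per call in seconds, copied from the benchmark table.
MEDIAN_LATENCY_S = {
    "Cerebras gpt-oss-120b": 0.5,
    "Cerebras zai-glm-4.7": 0.7,
    "OpenRouter gpt-oss-120b:free": 3.1,
    "Ollama qwen2.5:7b (local)": 9.5,
}


def speedup(slow: str, fast: str) -> float:
    """How many times faster `fast` is than `slow`, per call."""
    return MEDIAN_LATENCY_S[slow] / MEDIAN_LATENCY_S[fast]


def median_latency(samples_s: list[float]) -> float:
    """Median of per-call latencies; robust to one slow outlier."""
    return median(samples_s)


if __name__ == "__main__":
    local = "Ollama qwen2.5:7b (local)"
    for fast in ("Cerebras gpt-oss-120b", "Cerebras zai-glm-4.7"):
        print(f"{fast}: {speedup(local, fast):.1f}x faster than local")
    # Hypothetical samples, not measured data.
    print(median_latency([0.4, 0.5, 0.9, 0.5, 0.6]))
```

With the table's figures the ratio comes out at 19× for gpt-oss-120b and about 13.6× for zai-glm-4.7, which is the range the rounded "~15–20×" describes.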

The honest tradeoff triangle: privacy (local Ollama) ↔ speed + quality (Cerebras) ↔ cost (free, but throttled). Reproduce with uv run python benchmarks/benchmark.py.

Which provider should I use?

Privacy / offline / $0 → ollama (local; slower and less consistent, but nothing leaves your machine).
Fast + accurate, free → cerebras (near-instant; free tier throttles under heavy use).
Maximum quality (paid, CLI only) → anthropic or openai with your own key (e.g. LLM_MODEL=anthropic/claude-opus-4-8). The web app never accepts keys — this is a CLI upgrade.

Limitations (honest)

Local 7B is slow and inconsistent. qwen2.5:7b is private and free, but it takes seconds to tens of seconds per answer and occasionally misuses tools (it confabulates a path, or gives up). For reliable, fast answers, use a cloud provider.
Free cloud tiers throttle. Cerebras and OpenRouter free tiers rate-limit under sustained/burst use; the benchmark above was captured with fresh quota — re-running on an exhausted free tier shows worse numbers (a quota artifact, not the models). Bring your own key for real throughput.
The web app is a shared/hosted convenience, not the fully-private path. Uploaded documents go to the server (isolated per user, deleted on logout + a TTL sweep). For fully offline / private use, run the CLI with local Ollama.
Text documents only. Folio reads text files (Markdown, .txt, .csv, code, …) — no images/audio/video.
Anthropic / OpenAI need paid API credits. They are optional CLI upgrades, not required.

Tech stack

MCP Python SDK (FastMCP) — the server engine, the client session, both transports, and the OAuth modules.
litellm — one OpenAI-shaped API over Ollama, Cerebras, OpenRouter, Anthropic, and OpenAI (routes by the model-string prefix).
FastAPI + uvicorn — the async web host; its native async + SSE match the MCP SDK and the live-progress requirement.
sse-starlette — streams live log/progress events to the browser over Server-Sent Events.
itsdangerous — the web app remembers your GitHub sign-in in a small signed-cookie session; itsdangerous cryptographically signs that cookie so it can't be tampered with (a tamper-evident seal). It's what makes "stay logged in" trustworthy.
prompt-toolkit — the interactive CLI prompt, autocompletion, and history.
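The "tamper-evident seal" idea behind the signed cookie can be shown with the standard library alone. This sketch deliberately does not use itsdangerous and does not reproduce its cookie format; it only illustrates the principle that a value signed with a server-side secret is rejected if either the value or the signature is altered. The function names and the `payload.signature` layout are assumptions.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def sign_session(data: dict, secret: bytes) -> str:
    """Encode `data` and append an HMAC-SHA256 signature (illustrative format)."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    tag = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{tag}"


def load_session(cookie: str, secret: bytes) -> Optional[dict]:
    """Return the session data, or None if the signature does not verify."""
    payload, sep, tag = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    cookie = sign_session({"user": "octocat"}, b"server-side-secret")
    print(load_session(cookie, b"server-side-secret"))
    print(load_session(cookie, b"wrong-secret"))
```

Signing makes tampering detectable but does not hide the contents, so a session like this should carry only non-secret identifiers.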

Security notes

The roots guard (is_path_allowed) is enforced in every file tool — the SDK provides the roots mechanism, but Folio enforces the policy.
OAuth applies to the HTTP transport only; the local stdio CLI needs none (you launched the process yourself).
Secrets live only in the git-ignored .env. .env.example ships blank placeholders.
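A minimal sketch of what a roots guard can look like is shown below. The name `is_path_allowed` comes from the note above, but the signature and body are assumptions rather than Folio's code. The key step is resolving the path before comparing, so that `..` segments and symlinks cannot escape a granted folder, and comparing whole path components rather than string prefixes.

```python
from pathlib import Path


def is_path_allowed(candidate: str, roots: list[Path]) -> bool:
    """True only if `candidate` resolves to a location inside a granted root.

    Assumed signature; the real guard may differ.
    """
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    # is_relative_to compares path components, so "/docs-evil" does not
    # pass as being inside "/docs" the way a string prefix check would.
    return any(resolved.is_relative_to(root.resolve()) for root in roots)


if __name__ == "__main__":
    granted = [Path("sample-docs")]
    print(is_path_allowed("sample-docs/README.md", granted))
    print(is_path_allowed("sample-docs/../.env", granted))
```

Calling a check like this at the top of every file tool, before any open or write, is what turns the granted-roots list into an enforced policy.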

Project status

Complete: the MCP engine (roots, sampling, logging/progress, dual transport, OAuth), the offline CLI, and the OAuth-secured FastAPI web app. A hosted public deployment is intentionally not provided: a shared demo on free model tiers would burn the operator's quota, and the web app deliberately never accepts a visitor's API key. Run it locally instead, following the steps above; it works fully on your own machine.

License & credits

MIT. Built by extending ollama-mcp-chat-cli, an earlier MCP chat-CLI project.
