
docuflow

A unified interface for AI agents and scripts to work with office documents: .docx, .xlsx, .pdf via MCP tools and a Python library. Markdown is the lingua franca for reading, templating, and patching.



Why

Documents in .docx/.xlsx share the same logic (headings, paragraphs, tables), but each library has its own API — and any non-trivial edit usually breaks the formatting (fonts, line spacing, alignment, table borders, headers/footers, sectPr).

docuflow solves this with three ideas:

  1. Markdown as the lingua franca. read turns a document into markdown + outline + tables; create/render turn markdown into .docx/.xlsx. The agent works with familiar text instead of raw XML.

  2. Patches by stable IDs. apply_patches addresses elements by ID (target_id="h:0.1.0" for a heading, "t:0" for a table) and applies replace / insert_after / delete / append without losing formatting.

  3. .docx templates as a Normal-style source. Point at a reference document — the new file inherits the font, size, line spacing, page margins, and headers/footers. The "Times New Roman 14 / 1.5 / justified" template look is no longer lost.



Features

  • 📖 Read documents as markdown + structural outline + tables (.docx, .xlsx, .pdf)

  • ✍️ Create from markdown (with Jinja2 templating and YAML frontmatter)

  • 🔧 In-place patches by stable IDs — replace, insert_after, delete, append

  • 🛡 Run-aware in-place replacement that preserves bold, italic, color — both inside and outside the match

  • 🧬 Inline styles (Pandoc-like): {#id .class key=value} on headings, {color="red" bold}…{/} on paragraphs

  • 📐 Template inheritance — fonts, sizes, margins, sectPr from a reference .docx

  • 🔍 Bulk search/replace (including regex) with run-aware formatting preservation

  • 🔒 Sandbox with a directory whitelist and path-traversal protection

  • 📕 PDF reading via opendataloader-pdf (Java 11+)

  • 🤖 MCP server with 11 tools for Claude Desktop / Cursor / Claude Code

  • 🎓 Claude skill at .claude/skills/docuflow-using/SKILL.md — teaching an AI agent how to pick the right tool


Installation

pip install docuflow

# optional: PDF reading via opendataloader-pdf
pip install 'docuflow[pdf]'

Requirements:

  • Python 3.11+

  • Java 11+ (only for PDF reading; everything else works without it)


Quick start

Python API

from pathlib import Path
from docuflow import Sandbox
from docuflow.core import EditPatch
from docuflow.formats import build_default_registry

sb = Sandbox(roots=[Path.home() / "Documents" / "Work"])
reg = build_default_registry()

# Read
content = reg.get_reader("docx").read(sb.resolve("report.docx"))
print(content.markdown)
for node in content.outline:
    print(node.id, node.title)

# Create from markdown; type(content) reuses the reader's document-content class
reg.get_writer("docx").write(
    sb.resolve("new.docx"),
    type(content)(format="docx", markdown="# Title\n\nBody.",
                  outline=[], tables=[], metadata={}),
    template=sb.resolve("reference.docx"),   # ← inherits fonts, margins, sectPr
)

# In-place patches (preserve fonts, alignment, table borders)
reg.get_editor("docx").apply(sb.resolve("report.docx"), [
    EditPatch(op="replace",      target_id="h:0",   content="# Renamed"),
    EditPatch(op="replace",      target_id="t:0",   content="| A | B |\n|---|---|\n| 1 | 2 |"),
    EditPatch(op="insert_after", target_id="h:0.0", content="## New section"),
])

MCP server

docuflow-mcp --root ~/Documents/Work --root ~/Documents/Templates

Or via environment variable:

export DOCUFLOW_ROOTS="$HOME/Documents/Work:$HOME/Documents/Templates"
docuflow-mcp --env-roots

Claude Desktop config on macOS (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "docuflow": {
      "command": "docuflow-mcp",
      "args": ["--root", "/Users/me/Documents/Work"]
    }
  }
}
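To expose several directories, repeat `--root` once per directory, as in the command-line example above. The following is a small Python sketch that builds such a config entry and prints it. The helper name `docuflow_server_entry` and the paths are placeholders. The sketch does not merge the result into an existing claude_desktop_config.json.

```python
import json


def docuflow_server_entry(roots: list[str]) -> dict:
    """Build an mcpServers entry that passes each root with its own --root flag."""
    args: list[str] = []
    for root in roots:
        args += ["--root", root]
    return {"command": "docuflow-mcp", "args": args}


config = {
    "mcpServers": {
        "docuflow": docuflow_server_entry(
            ["/Users/me/Documents/Work", "/Users/me/Documents/Templates"]
        )
    }
}

# Paste (or merge) the printed JSON into claude_desktop_config.json.
print(json.dumps(config, indent=2))
```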

MCP tools (11)

| Group  | Tool                    | Purpose                                 |
|--------|-------------------------|-----------------------------------------|
| Read   | read_document           | markdown + outline + tables             |
|        | get_outline             | structure only (cheap for large files)  |
|        | extract_tables          | tables only                             |
| Create | create_document         | new docx/xlsx from markdown             |
|        | render_template         | Jinja2 + YAML frontmatter → docx/xlsx   |
|        | list_template_variables | which {{variables}} a template needs    |
| Edit   | edit_document           | rewrite from markdown (legacy)          |
|        | apply_patches           | in-place patches by ID (recommended)    |
|        | find_and_replace        | in-place search/replace (run-aware)     |
| Meta   | get_document_info       | author, size, pages                     |
|        | get_sandbox_roots       | available directories                   |

Edit pattern: outline → patches

[
  {"op": "replace",      "target_id": "h:0",    "content": "# Renamed"},
  {"op": "insert_after", "target_id": "h:1.0",  "content": "## New section\n\nBody."},
  {"op": "replace",      "target_id": "t:0",    "content": "| A | B |\n|---|---|\n| X | Y |"},
  {"op": "delete",       "target_id": "h:2.1"},
  {"op": "append",       "content": "# Tail\n\nFinal paragraph."}
]

Identifiers:

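  • h:N.N.N: a heading. Each N is the 0-indexed position among its siblings, and the number of segments equals the heading level.

  • t:N: a table, where N is its 0-indexed position among all tables in the document.

  • h:0 / h:0.0 / h:0.0.0: the first level-1 heading, its first level-2 subheading, and that subheading's first level-3 subheading.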
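The ID grammar above is regular enough to check on the client before sending patches. The following is a minimal sketch of such a parser. It assumes that auto-generated IDs are exactly `h:` followed by dot-separated non-negative integers, or `t:` followed by one integer. Custom `{#id}` overrides are not covered, and docuflow's own parser may behave differently. `TargetId` and `parse_target_id` are hypothetical names.

```python
import re
from dataclasses import dataclass

_HEADING = re.compile(r"h:(\d+(?:\.\d+)*)")
_TABLE = re.compile(r"t:(\d+)")


@dataclass(frozen=True)
class TargetId:
    kind: str                 # "heading" or "table"
    path: tuple[int, ...]     # sibling indices from the top level down

    @property
    def level(self) -> int:
        # For headings, the number of segments is the heading level.
        return len(self.path)

    def parent(self) -> "TargetId | None":
        # h:1.0.2 -> h:1.0; top-level headings and tables have no parent.
        if self.kind != "heading" or len(self.path) == 1:
            return None
        return TargetId("heading", self.path[:-1])


def parse_target_id(raw: str) -> TargetId:
    if m := _HEADING.fullmatch(raw):
        return TargetId("heading", tuple(int(p) for p in m.group(1).split(".")))
    if m := _TABLE.fullmatch(raw):
        return TargetId("table", (int(m.group(1)),))
    raise ValueError(f"not an auto-generated docuflow ID: {raw!r}")
```

For example, `parse_target_id("h:0.1.0")` describes a level-3 heading whose parent is `h:0.1`.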

Why not edit_document for patches? A full rewrite loses styles, fonts, headers/footers, and embedded images. Use apply_patches for targeted changes.


Templates

Jinja2 + YAML frontmatter

templates/contract.md:

---
template_version: 1
style:
  default_font: Times New Roman
  default_size: 12
  align: justify
---

# Contract No. {{number}}

City of {{city}}, {{date}}

{% for stage in stages %}
- **Stage {{loop.index}}**: {{stage.description}} — {{stage.amount}} ₽
{% endfor %}

Render it:

render_template(
    template_path="templates/contract.md",
    output_path="contracts/2026-001.docx",
    data={"number": "2026-001", "city": "Moscow", "date": "2026-06-17",
          "stages": [{"description": "Prepayment", "amount": "100000"},
                     {"description": "Settlement", "amount": "900000"}]},
)

render_template is strict-undefined: a missing variable raises TemplateRenderError. For large templates call list_template_variables first and diff against data.
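The "diff against data" step can be simple set arithmetic. This sketch assumes the variable names are available as a collection of top-level names; the exact shape that list_template_variables returns is an assumption. Nested fields such as `stage.description` are not checked. `check_template_data` is a hypothetical helper, not part of docuflow. It raises `KeyError` where docuflow itself raises TemplateRenderError.

```python
from collections.abc import Iterable, Mapping


def check_template_data(required: Iterable[str], data: Mapping[str, object]) -> set[str]:
    """Fail early on missing variables; return keys in data the template never uses."""
    required_set = set(required)
    provided = set(data)
    missing = required_set - provided
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return provided - required_set


data = {"number": "2026-001", "city": "Moscow", "date": "2026-06-17", "stages": []}
unused = check_template_data(["number", "city", "date", "stages"], data)
```

If the helper returns a non-empty set, that usually points to a typo in either the template or the data keys.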

.docx template as a formatting source

If you need a specific look ("Times New Roman 14, line spacing 1.5, 2 cm margins, table borders"), pass a reference .docx to template=:

writer.write(out, content, template=Path("reference.docx"))

The writer clears the reference body and fills it with markdown, inheriting the Normal style, sectPr (page margins, page size, headers/footers), and table-style definitions. If the reference is minimal (missing Heading 1, List Bullet, Table Grid), those styles are auto-created.
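To predict which styles will be auto-created, you can inspect the reference file yourself. A .docx is a ZIP archive, and its style definitions live in `word/styles.xml`. The sketch below uses only the standard library. It assumes the English built-in style IDs `Heading1`, `ListBullet` and `TableGrid`; localized Word installations may use different IDs. docuflow's own check may also differ. `missing_styles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REQUIRED_STYLE_IDS = ("Heading1", "ListBullet", "TableGrid")


def missing_styles(reference: Path, required: tuple[str, ...] = REQUIRED_STYLE_IDS) -> list[str]:
    """Return the required style IDs that word/styles.xml does not define."""
    with zipfile.ZipFile(reference) as zf:
        try:
            root = ET.fromstring(zf.read("word/styles.xml"))
        except KeyError:  # the archive has no styles part at all
            return list(required)
    # Namespaced attributes are exposed by ElementTree as "{uri}name".
    defined = {
        style.get(f"{{{W_NS}}}styleId")
        for style in root.iter(f"{{{W_NS}}}style")
    }
    return [sid for sid in required if sid not in defined]
```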


Inline styles (Pandoc-like)

Heading attributes

# Introduction {#intro}

## Methods {#methods color="#1f4e79" align="center" bold}

# Sheet: Data {#hdr bold color="#1f4e79"}

  • #id — fixes the node ID in the outline.

  • .class — classes (.foo .bar).

  • key=value — properties (color, align, bold, size, font, italic, underline).

  • A bare token (no value) means =true: bold is equivalent to bold=true.

Color is #rrggbb or #rgb only. For .docx the attributes are applied to all runs of the heading. For .xlsx — to the first header row of the sheet.

Inline spans in paragraphs (.docx only)

Text with {color="red" bold}important{/} part.

Syntax: {attrs}text{/}. The text between {attrs} and {/} receives the attributes; the closing {/} is required. Supported attributes: color, bold, italic, underline, size, font.
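The following sketch shows how a paragraph could be split into plain and styled segments. It assumes that spans do not nest and that attribute values contain no braces. It also assumes that any `{...}` opener left in plain text means a missing `{/}`. `split_spans` is a hypothetical helper, and the bare-token rule copies the heading rule above.

```python
import re
import shlex

_SPAN = re.compile(r"\{([^{}/][^{}]*)\}([^{]*)\{/\}")
_OPENER = re.compile(r"\{[^{}/][^{}]*\}")

Segment = tuple[str, dict[str, object]]


def _attrs(text: str) -> dict[str, object]:
    props: dict[str, object] = {}
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        props[key] = value if sep else True  # bare token -> True
    return props


def _add_plain(segments: list[Segment], text: str) -> None:
    if _OPENER.search(text):
        raise ValueError(f"unclosed span in {text!r}: the closing {{/}} is required")
    if text:
        segments.append((text, {}))


def split_spans(paragraph: str) -> list[Segment]:
    """Return (text, attributes) segments; plain text gets an empty dict."""
    segments: list[Segment] = []
    pos = 0
    for m in _SPAN.finditer(paragraph):
        _add_plain(segments, paragraph[pos:m.start()])
        segments.append((m.group(2), _attrs(m.group(1))))
        pos = m.end()
    _add_plain(segments, paragraph[pos:])
    return segments
```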

Document style (YAML frontmatter)

---
title: Quarterly Report
author: Analytics
style:
  default_font: Arial
  default_size: 11
  align: justify
---

For .docx, default_font / default_size / align apply to the Normal style. For .xlsx, they apply to cells from the 2nd row on (the header row stays as is).

Heading IDs

Auto-generated as 0-indexed hierarchical (h:0, h:0.0, h:0.1.0). {#custom-id} overrides.
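The numbering scheme can be reproduced from heading levels alone. This sketch assumes that a heading's sibling index counts only earlier headings under the same parent. It raises an error on a skipped level, such as an H3 directly under an H1 or a document that starts with an H2, because how docuflow numbers those cases is not documented here. `{#custom-id}` overrides are ignored. `heading_ids` is a hypothetical helper.

```python
from collections import defaultdict


def heading_ids(levels: list[int]) -> list[str]:
    """Map heading levels (1 = H1) in document order to hierarchical h:... IDs."""
    next_index: dict[tuple[int, ...], int] = defaultdict(int)
    path: tuple[int, ...] = ()
    ids: list[str] = []
    for level in levels:
        if level < 1 or level > len(path) + 1:
            raise ValueError(f"level {level} skips a level after h-path {path!r}")
        parent = path[: level - 1]          # ancestors of this heading
        index = next_index[parent]          # 0-indexed position among siblings
        next_index[parent] += 1
        path = parent + (index,)
        ids.append("h:" + ".".join(map(str, path)))
    return ids
```

For example, `heading_ids([1, 2, 2, 3])` yields `h:0`, `h:0.0`, `h:0.1`, `h:0.1.0`.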


Architecture

docuflow/
├── core/                ← models, sandbox, markdown, registry
├── formats/
│   ├── docx/            ← Reader, Writer, Editor (in-place)
│   ├── xlsx/            ← Reader, Writer, Editor
│   └── pdf/             ← Reader (opendataloader-pdf)
└── mcp_server.py        ← MCP tools

.claude/skills/
└── docuflow-using/      ← Claude skill for agent integration

Two editor modes:

  • markdown-roundtrip (apply_markdown): rewrite the document from markdown — for "create from scratch".

  • in-place (apply, find_and_replace_in_place): modify XML elements directly, preserving run fonts, table borders, line spacing, sectPr, headers/footers. Recommended for edits.

In-place find_and_replace runs in run-aware mode: for each match it splits the first and last run at the match boundaries, removes runs that fall entirely inside the match, and inserts a single new run with the replacement text, inheriting the first affected run's formatting. Bold/italic/color outside the matched fragment is preserved. Paragraph-level format (alignment, indent, line-spacing, style) is always preserved.
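The run-aware steps can be illustrated on a simplified model. In this model, a paragraph is a list of runs, and each run holds text plus a set of formatting flags. The sketch is not docuflow's implementation: real runs are `w:r` XML elements, and `Run`, `_replace_span` and `replace_in_runs` are hypothetical stand-ins.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    text: str
    fmt: frozenset[str] = frozenset()  # e.g. frozenset({"bold"})


def _replace_span(runs: list[Run], start: int, end: int, new: str) -> list[Run]:
    out: list[Run] = []
    inserted = False
    pos = 0
    for run in runs:
        r_start, r_end = pos, pos + len(run.text)
        pos = r_end
        if r_end <= start or r_start >= end:
            out.append(run)  # run lies entirely outside the match
            continue
        # Split the first affected run at the match start ...
        if r_start < start:
            out.append(Run(run.text[: start - r_start], run.fmt))
        # ... emit one new run, formatted like the first affected run ...
        if not inserted:
            if new:
                out.append(Run(new, run.fmt))
            inserted = True
        # ... and split the last affected run at the match end.
        if r_end > end:
            out.append(Run(run.text[end - r_start :], run.fmt))
        # Runs fully inside the match add nothing, i.e. they are removed.
    return out


def replace_in_runs(runs: list[Run], old: str, new: str) -> list[Run]:
    """Replace every occurrence of `old`, even when it spans several runs."""
    if not old:
        return list(runs)
    full = "".join(r.text for r in runs)
    starts = [m.start() for m in re.finditer(re.escape(old), full)]
    # Work right to left so earlier offsets stay valid after each edit.
    for start in reversed(starts):
        runs = _replace_span(runs, start, start + len(old), new)
    return list(runs)
```

Consider the runs `"Hello "`, bold `"wor"`, `"ld and "`, italic `"world"`, and replace "world" with "Earth". The first match straddles the bold run and the plain run, so its replacement inherits bold from the first affected run. The leftover `" and "` stays plain, and the second replacement stays italic.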


Security

All paths pass through Sandbox.resolve(). Attempts to escape the allowed roots raise PathSecurityError. Symlinks are resolved; relative paths are resolved against the first root.


Limitations

| Preserved                                   | NOT preserved from .docx template            |
|---------------------------------------------|----------------------------------------------|
| Normal style (font, size)                   | Page headers/footers (unless via template=)  |
| sectPr (margins, page size, orientation)    | Section breaks, page numbering               |
| paragraph_format (line-spacing, alignment)  | Embedded images, OLE objects                 |
| Table Grid (borders, if the style exists)   | Track changes, comments                      |
| Headers/footers (when template= is used)    |                                              |

For full visual fidelity with a complex template, create the document with template= and then edit it with apply_patches. In-place patches keep most of the template's formatting because they modify existing elements instead of recreating the document.


Common errors

  • Path outside sandbox → PathSecurityError. Don't try to bypass it with ..; that is blocked. Ask to add a root.

  • Unknown format → UnsupportedFormatError. Supported: docx, xlsx, pdf (read-only).

  • Missing template variable → TemplateRenderError. Strict-undefined by design.

  • edit_document used instead of apply_patches, and formatting was lost → restore from a backup; next time use patches.

  • PDF without Java → PdfJavaNotFoundError. Install JDK 11+.


Claude skill

A Claude skill ships with the repo at .claude/skills/docuflow-using/SKILL.md. It teaches the agent:

  • which MCP tool to pick for a given task (read / create / patch / template),

  • the outline→patches workflow and how to read stable IDs (h:N.N.N, t:N),

  • inline-styles syntax (Pandoc-like),

  • the correct workflow for full template fidelity (template= + apply_patches),

  • common errors and their fixes.

For Russian-speaking agents: the skill at .claude/skills/docuflow-using/SKILL.md is authored in Russian and references README_RU.md.


Development

# Clone
git clone https://github.com/deja111vu/docuflow.git
cd docuflow

# Create venv
python -m venv .venv
. .venv/bin/activate          # Linux/macOS
.venv\Scripts\activate.bat    # Windows

# Install editable + dev deps
pip install -e '.[dev,pdf]'

# Tests + coverage (≥85%)
pytest --cov=docuflow

# Lint
ruff check src tests
mypy src/docuflow

Test stack

  • pytest + pytest-cov

  • hypothesis (property-based tests for sandbox and outline)

  • All tests are isolated: tmp_path + Sandbox(roots=[tmp_path])

CI

GitHub Actions runs ruff, mypy --strict, pytest --cov on Python 3.11 / 3.12 / 3.13.


Roadmap

  • v0.2.0 (current): in-place editor, run-aware replace, template-aware writer, auto-style creation, inline styles, PDF reader.

  • v0.3.0: pptx (read/write/edit), track changes, comments.

  • v0.4.0: bulk operations (merge, split), benchmarks, performance budgets.

See CHANGELOG.md for the full version history.


License

MIT. See LICENSE.

Acknowledgements
