diffgrab

Korean documentation · llms.txt

Web page change tracking with structured diffs. markgrab + snapgrab integration, MCP native.

# Uses top-level await: run inside an async function or the `python -m asyncio` REPL.
from diffgrab import DiffTracker

tracker = DiffTracker()
await tracker.track("https://example.com")
changes = await tracker.check()
for c in changes:
    if c.changed:
        print(c.summary)     # "3 lines added, 1 lines removed in sections: Introduction."
        print(c.unified_diff) # Standard unified diff output
await tracker.close()

Features

  • Change detection — track any URL, detect content changes via content hashing

  • Structured diffs — unified diff + section-level analysis (which headings changed)

  • Human-readable summaries — "5 lines added, 2 removed in sections: Intro, Methods"

  • Snapshot history — SQLite storage, browse past versions of any page

  • markgrab powered — HTML/YouTube/PDF/DOCX extraction via markgrab

  • Visual diff — optional screenshot comparison via snapgrab

  • MCP server — 5 tools for Claude Code / MCP clients

  • CLI included: diffgrab track, check, diff, history, untrack
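
To make "section-level analysis (which headings changed)" concrete, here is a minimal sketch of one way to find changed headings in two markdown snapshots. It is not diffgrab's implementation; the function names `split_sections` and `changed_sections` and the "(top)" placeholder are hypothetical, and the sketch treats any line starting with `#` as a heading (so it would misread `#` comments inside code blocks).

```python
def split_sections(markdown: str) -> dict[str, list[str]]:
    """Map each heading title to the lines under it.

    Text before the first heading is filed under "(top)".
    Sections with the same title are merged (a simplification).
    """
    sections: dict[str, list[str]] = {"(top)": []}
    current = "(top)"
    for line in markdown.splitlines():
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


def changed_sections(before: str, after: str) -> list[str]:
    """Return headings whose body differs, or that were added or removed."""
    old, new = split_sections(before), split_sections(after)
    # Headings of the new version first, then headings that only exist in the old one.
    ordered = list(new) + [h for h in old if h not in new]
    return [h for h in ordered if old.get(h) != new.get(h)]
```

A result such as `["Intro", "Methods"]` is the kind of list that could feed the "in sections: Intro, Methods" part of a summary line.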

How It Works

flowchart TD
    A["diffgrab track URL"] --> B["Fetch initial snapshot\n(markgrab + snapgrab)"]
    B --> C["Store baseline"]
    C --> D["diffgrab check"]
    D --> E["Fetch current page"]
    E --> F{"Content\nhash match?"}
    F -->|"changed"| G["Compute structured diff\n+ section analysis"]
    F -->|"unchanged"| H["No changes"]
    G --> I["📊 DiffResult\nadded / removed / modified"]
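
The hash-then-diff step in the flowchart can be sketched with the standard library alone. This is an illustration of the general technique, not diffgrab's code: SHA-256 as the hash and the names `content_hash` and `compare_snapshots` are assumptions.

```python
import difflib
import hashlib


def content_hash(text: str) -> str:
    """Return a hex digest of the text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compare_snapshots(before: str, after: str) -> dict:
    """Cheap hash comparison first; compute the full diff only on a mismatch."""
    if content_hash(before) == content_hash(after):
        return {"changed": False, "added": 0, "removed": 0, "unified_diff": ""}

    diff_lines = list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )
    # The first two lines are the "---" / "+++" file headers, not content.
    body = diff_lines[2:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return {
        "changed": True,
        "added": added,
        "removed": removed,
        "unified_diff": "\n".join(diff_lines),
    }
```

Comparing hashes before diffing keeps the common "nothing changed" case cheap, which matters when many URLs are checked in one pass.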

Install

pip install diffgrab

Optional extras:

pip install 'diffgrab[cli]'      # CLI with click + rich
pip install 'diffgrab[visual]'   # Visual diff with snapgrab
pip install 'diffgrab[mcp]'      # MCP server with fastmcp
pip install 'diffgrab[all]'      # Everything

Usage

Python API

import asyncio
from diffgrab import DiffTracker

async def main():
    tracker = DiffTracker()

    # Track a URL (takes initial snapshot)
    await tracker.track("https://example.com", interval_hours=12)

    # Check for changes
    changes = await tracker.check()
    for change in changes:
        if change.changed:
            print(change.summary)
            print(change.unified_diff)

    # Get diff between specific snapshots
    result = await tracker.diff("https://example.com", before_id=1, after_id=2)

    # Browse snapshot history
    history = await tracker.history("https://example.com", count=20)

    # Stop tracking
    await tracker.untrack("https://example.com")

    await tracker.close()

asyncio.run(main())
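
The README does not say whether diffgrab runs checks on its own schedule. If you need to drive checks yourself, a small polling loop is one option. The sketch below relies only on what the example above shows (an awaitable `check()` that returns results with a `changed` attribute); the helper name `poll_for_changes` and its parameters are hypothetical.

```python
import asyncio


async def poll_for_changes(tracker, rounds: int, delay_seconds: float) -> list:
    """Call tracker.check() `rounds` times and collect results flagged as changed."""
    found = []
    for i in range(rounds):
        for result in await tracker.check():
            if result.changed:
                found.append(result)
        # Sleep between rounds, but not after the last one.
        if i < rounds - 1:
            await asyncio.sleep(delay_seconds)
    return found
```

Inside `main()` this could be called as `await poll_for_changes(tracker, rounds=4, delay_seconds=3600)` before `tracker.close()`.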

Convenience Functions

# Module-level shortcuts; like the calls above, these must be awaited inside an async function.
from diffgrab import track, check, diff, history, untrack

await track("https://example.com")
changes = await check()
result = await diff("https://example.com")
snaps = await history("https://example.com")
await untrack("https://example.com")

CLI

# Track a URL
diffgrab track https://example.com --interval 12

# Check all tracked URLs for changes
diffgrab check

# Check a specific URL
diffgrab check https://example.com

# Show diff between snapshots
diffgrab diff https://example.com
diffgrab diff https://example.com --before 1 --after 3

# View snapshot history
diffgrab history https://example.com --count 20

# Stop tracking
diffgrab untrack https://example.com

MCP Server

Add to your Claude Code MCP config:

{
  "mcpServers": {
    "diffgrab": {
      "command": "diffgrab-mcp",
      "args": []
    }
  }
}

Or with uvx:

{
  "mcpServers": {
    "diffgrab": {
      "command": "uvx",
      "args": ["--from", "diffgrab[mcp]", "diffgrab-mcp"]
    }
  }
}

MCP Tools:

| Tool | Description |
| --- | --- |
| track_url | Register a URL for change tracking |
| check_changes | Check tracked URLs for changes |
| get_diff | Get structured diff between snapshots |
| get_history | Browse snapshot history |
| untrack_url | Stop tracking a URL |

DiffResult

Every diff operation returns a DiffResult:

@dataclass
class DiffResult:
    url: str                           # The tracked URL
    changed: bool                      # Whether content changed
    added_lines: int                   # Lines added
    removed_lines: int                 # Lines removed
    changed_sections: list[str]        # Markdown headings with changes
    unified_diff: str                  # Standard unified diff
    summary: str                       # Human-readable summary
    before_snapshot_id: int | None     # DB ID of older snapshot
    after_snapshot_id: int | None      # DB ID of newer snapshot
    before_timestamp: str              # When older snapshot was taken
    after_timestamp: str               # When newer snapshot was taken
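
As an example of consuming these fields, the sketch below renders a short text report. It reads only attributes listed in the dataclass above, so it works with any object that has them; `format_report` is a hypothetical helper, not part of diffgrab.

```python
def format_report(result) -> str:
    """Render a short text report from a DiffResult-like object."""
    if not result.changed:
        return f"{result.url}: no changes"
    # changed_sections may be empty when no heading could be matched.
    sections = ", ".join(result.changed_sections) or "(none detected)"
    return "\n".join(
        [
            f"{result.url}: +{result.added_lines} / -{result.removed_lines}",
            f"sections: {sections}",
            f"snapshots: {result.before_snapshot_id} -> {result.after_snapshot_id}",
        ]
    )
```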

Storage

Snapshots are stored in SQLite at ~/.local/share/diffgrab/diffgrab.db (auto-created). Custom path:

tracker = DiffTracker(db_path="/path/to/custom.db")
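
Because the store is a plain SQLite file, it can be inspected with Python's built-in `sqlite3` module. The README does not document the schema, so this sketch only lists table names instead of assuming any; `list_tables` is a hypothetical helper. It opens the file read-only, which also means it raises an error rather than creating a missing database.

```python
import sqlite3
from pathlib import Path

# Default location stated in the README.
DEFAULT_DB = Path.home() / ".local/share/diffgrab/diffgrab.db"


def list_tables(db_path=DEFAULT_DB) -> list[str]:
    """Return the names of all tables in the SQLite file, opened read-only."""
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```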

QuartzUnit Ecosystem

| Package | Role | PyPI |
| --- | --- | --- |
| markgrab | HTML/YouTube/PDF/DOCX to markdown | pip install markgrab |
| snapgrab | URL to screenshot + metadata | pip install snapgrab |
| docpick | OCR + LLM document extraction | pip install docpick |
| feedkit | RSS feed collection | pip install feedkit |
| diffgrab | Web page change tracking | pip install diffgrab |
| browsegrab | Browser agent for LLMs | Coming soon |

Used in

  • newswatch — RSS news monitoring pipeline (feedkit → markgrab → embgrep → diffgrab)

  • watchdeck — Web page monitoring with visual diffs and safety guards

License

MIT


Part of the QuartzUnit ecosystem — composable Python libraries for data collection, extraction, search, and AI agent safety.
