
extract_revisions

Read-only

Extract tracked changes and comments from DOCX files as structured JSON with before/after text per paragraph and revision details. Supports pagination for large documents without modifying the original file.

Instructions

Extract tracked changes as structured JSON with before/after text per paragraph, revision details, and comments. Supports pagination via offset and limit. Read-only - does not modify the document.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| file_path | Yes | Path to the DOCX file. | — |
| offset | No | 0-based offset for pagination. | 0 |
| limit | No | Max entries per page (1–500). | 50 |
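The offset/limit pair supports a standard fetch-all loop. A minimal sketch, assuming a generic `callTool` function standing in for your MCP client's invoke call (the page field names match the tool's documented response; everything else here is hypothetical):

```typescript
// Hypothetical MCP client signature; substitute your client's actual invoke call.
type RevisionPage = { changes: unknown[]; total_changes: number; has_more: boolean };
type CallTool = (name: string, params: Record<string, unknown>) => Promise<RevisionPage>;

// Pages through extract_revisions until has_more turns false, concatenating changes.
async function fetchAllRevisions(
  callTool: CallTool,
  filePath: string,
  limit = 50,
): Promise<unknown[]> {
  const all: unknown[] = [];
  let offset = 0;
  for (;;) {
    const page = await callTool('extract_revisions', { file_path: filePath, offset, limit });
    all.push(...page.changes);
    if (!page.has_more) break;
    offset += limit;
  }
  return all;
}
```

Keeping `limit` at or below the schema's maximum of 500 avoids an `INVALID_LIMIT` error from the handler.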

Implementation Reference

  • The tool handler for "extract_revisions": it resolves the session, validates input, calls the core library to extract revisions, and paginates the result.
    export async function extractRevisions_tool(
      manager: SessionManager,
      params: {
        file_path?: string;
        offset?: number;
        limit?: number;
      },
    ): Promise<ToolResponse> {
      const resolved = await resolveSessionForTool(manager, params, { toolName: 'extract_revisions' });
      if (!resolved.ok) return resolved.response;
      const { session, metadata } = resolved;
    
      // Validate limit
      const limit = params.limit ?? 50;
      if (typeof limit !== 'number' || limit < 1 || limit > 500) {
        return err('INVALID_LIMIT', `limit must be between 1 and 500, got ${limit}`, 'Provide a limit in the range 1–500.');
      }
    
      // Validate offset
      const offset = params.offset ?? 0;
      if (typeof offset !== 'number' || offset < 0) {
        return err('INVALID_OFFSET', `offset must be >= 0, got ${offset}`, 'Provide a non-negative offset.');
      }
    
      try {
        // Check extraction cache
        const cached = manager.getExtractionCache(session);
        let allChanges;
    
        if (cached) {
          allChanges = cached.changes;
        } else {
          // Compute extraction from DOM clones
          const docClone = session.doc.getDocumentXmlClone();
          const comments = await session.doc.getComments();
          const result = extractRevisions(docClone, comments);
          allChanges = result.changes;
          // Cache the full result for pagination
          manager.setExtractionCache(session, allChanges);
        }
    
        // Apply pagination
        const totalChanges = allChanges.length;
        const page = allChanges.slice(offset, offset + limit);
        const hasMore = offset + limit < totalChanges;
    
        return ok(mergeSessionResolutionMetadata({
          changes: page,
          total_changes: totalChanges,
          has_more: hasMore,
          edit_revision: session.editRevision,
          file_path: manager.normalizePath(session.originalPath),
        }, metadata));
      } catch (e: unknown) {
        return err('EXTRACTION_ERROR', errorMessage(e));
      }
    }
  • The core extraction logic, which compares two clones of the document: one with all tracked changes accepted and one with all rejected.
    export function extractRevisions(
      doc: Document,
      comments: Comment[],
      opts?: { offset?: number; limit?: number },
    ): ExtractRevisionsResult {
      const body = doc.getElementsByTagNameNS(W_NS, 'body').item(0);
      if (!body) {
        return { changes: [], total_changes: 0, has_more: false };
      }
    
      // Clone DOM twice and apply accept/reject
      const acceptedDoc = doc.cloneNode(true) as Document;
      const rejectedDoc = doc.cloneNode(true) as Document;
      acceptChanges(acceptedDoc);
      rejectChanges(rejectedDoc);
    
      // Build comment lookup by anchoredParagraphId
      const commentsByParaId = new Map<string, Comment[]>();
      for (const c of comments) {
        if (c.anchoredParagraphId) {
          const existing = commentsByParaId.get(c.anchoredParagraphId);
          if (existing) {
            existing.push(c);
          } else {
            commentsByParaId.set(c.anchoredParagraphId, [c]);
          }
        }
      }
    
      // Walk all paragraphs in the original tracked DOM
      const allParagraphs = Array.from(body.getElementsByTagNameNS(W_NS, 'p'));
      const changedParagraphs: ParagraphRevision[] = [];
    
      for (const p of allParagraphs) {
        if (!paragraphHasRevisions(p)) continue;
    
        const paraId = getParagraphBookmarkId(p);
        if (!paraId) continue; // All paragraphs should have bookmarks from session resolution
    
        // Detect entirely-inserted/deleted paragraphs to avoid stale bookmark lookups.
        // When rejectChanges() removes an inserted paragraph, it relocates bookmarks
        // to adjacent paragraphs, which would give the wrong before_text.
        const isFullyInserted = paragraphIsEntirelyInserted(p);
        const isFullyDeleted = paragraphIsEntirelyDeleted(p);
    
        // Look up before_text in rejected clone by bookmark
        let beforeText: string;
        if (isFullyInserted) {
          beforeText = ''; // Didn't exist before
        } else {
          const rejectedP = findParagraphByBookmarkId(rejectedDoc, paraId);
          beforeText = rejectedP ? getParagraphText(rejectedP) : '';
        }
    
        // Look up after_text in accepted clone by bookmark
        let afterText: string;
        if (isFullyDeleted) {
          afterText = ''; // Doesn't exist after
        } else {
          const acceptedP = findParagraphByBookmarkId(acceptedDoc, paraId);
          afterText = acceptedP ? getParagraphText(acceptedP) : '';
        }
    
        // Collect revision entries
        const revisions = collectRevisionEntries(p);
    
        // Skip structurally-empty paragraphs with only paragraph-level markers
        // (e.g. empty inserted paragraphs from comparison engines with pPr/rPr/ins only)
        if (revisions.length === 0 && beforeText === '' && afterText === '') continue;
    
        // Associate comments
        const paraComments = commentsByParaId.get(paraId) ?? [];
        const revisionComments = paraComments.map(commentToRevisionComment);
    
        changedParagraphs.push({
          para_id: paraId,
          before_text: beforeText,
          after_text: afterText,
          revisions,
          comments: revisionComments,
        });
      }
    
      // Apply pagination
      const totalChanges = changedParagraphs.length;
      const offset = opts?.offset ?? 0;
      const limit = opts?.limit ?? totalChanges;
      const page = changedParagraphs.slice(offset, offset + limit);
      const hasMore = offset + limit < totalChanges;
    
      return {
        changes: page,
        total_changes: totalChanges,
        has_more: hasMore,
      };
    }
  • Type definition for the result returned by extractRevisions.
    export type ExtractRevisionsResult = {
      changes: ParagraphRevision[];
      total_changes: number;
      has_more: boolean;
    };
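For orientation, a hypothetical result value (only `ExtractRevisionsResult` mirrors the excerpt above; the `ParagraphRevision`, revision-entry, and comment field shapes below are illustrative assumptions, not the server's actual type definitions):

```typescript
// Assumed shapes for illustration; only ExtractRevisionsResult matches the source excerpt.
type RevisionEntry = { type: string; author: string; text: string };
type RevisionComment = { author: string; text: string };
type ParagraphRevision = {
  para_id: string;
  before_text: string;
  after_text: string;
  revisions: RevisionEntry[];
  comments: RevisionComment[];
};
type ExtractRevisionsResult = {
  changes: ParagraphRevision[];
  total_changes: number;
  has_more: boolean;
};

// One paragraph where "brown" was replaced with "red" (illustrative values).
const sample: ExtractRevisionsResult = {
  changes: [
    {
      para_id: 'para-1',
      before_text: 'The quick brown fox.',
      after_text: 'The quick red fox.',
      revisions: [
        { type: 'deletion', author: 'Reviewer', text: 'brown' },
        { type: 'insertion', author: 'Reviewer', text: 'red' },
      ],
      comments: [],
    },
  ],
  total_changes: 1,
  has_more: false,
};
```

Note that `has_more` is computed as `offset + limit < total_changes`, so a caller can keep requesting pages until it turns false.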
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description confirms the read-only safety profile (consistent with the annotations) and, crucially, discloses output-structure details (before/after text per paragraph, comments), compensating for the missing output schema. It also clarifies pagination behavior. It could improve by stating what happens when no tracked changes exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences efficiently structured: core functionality and output format first, pagination mechanics second, safety confirmation third. No redundancy or filler. Every sentence earns its place by conveying critical information not fully captured in structured fields.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description effectively compensates by detailing the JSON structure (before/after text, revision details, comments). Combined with complete schema coverage and annotations, the description provides sufficient context for an agent to understand both inputs and expected returns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema fully documents file_path, offset, and limit. The description adds little semantic value beyond noting that offset and limit support pagination, which is largely inferable from the parameter names and schema descriptions. A baseline 3 is appropriate given that the schema carries the documentation burden.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action (extract), resource (tracked changes), output format (structured JSON), and key content components (before/after text, revision details, comments). It distinguishes from siblings like accept_changes (modifies state) and has_tracked_changes (boolean check) by emphasizing extraction and analysis of change content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions pagination support, implying usage patterns for large documents. However, it lacks explicit guidance on when to use this versus siblings like has_tracked_changes (existence check) or accept_changes (applying changes), or prerequisites like requiring an open file.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

