extract_revisions

Read-only

Retrieve tracked changes and comments from a DOCX file as structured JSON with before and after text per paragraph. Supports pagination via offset and limit.

Instructions

Extract tracked changes as structured JSON with before/after text per paragraph, revision details, and comments. Supports pagination via offset and limit. Read-only - does not modify the document.

Input Schema

Name       Required  Description                                    Default
file_path  Yes       Path to the DOCX file.                         (none)
offset     No        0-based offset for pagination.                 0
limit      No        Max entries per page (1-500).                  50
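A call to this tool might pass arguments like the following sketch (the file path is invented for illustration; any path to a .docx works):

```typescript
// Hypothetical MCP tool-call arguments for extract_revisions.
// Only file_path is required; offset and limit fall back to 0 and 50.
const args = {
  file_path: "/tmp/contract.docx", // required: path to the DOCX file
  offset: 0,                       // optional: 0-based pagination offset
  limit: 50,                       // optional: 1-500 entries per page
};
```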

Implementation Reference

  • MCP tool handler for 'extract_revisions'. Resolves the session, validates limit/offset parameters, delegates to extractRevisions() from @usejunior/docx-core, caches results, applies pagination, and returns structured revision data with metadata.
    export async function extractRevisions_tool(
      manager: SessionManager,
      params: {
        file_path?: string;
        offset?: number;
        limit?: number;
      },
    ): Promise<ToolResponse> {
      const resolved = await resolveSessionForTool(manager, params, { toolName: 'extract_revisions' });
      if (!resolved.ok) return resolved.response;
      const { session, metadata } = resolved;
    
      // Validate limit
      const limit = params.limit ?? 50;
      if (typeof limit !== 'number' || limit < 1 || limit > 500) {
        return err('INVALID_LIMIT', `limit must be between 1 and 500, got ${limit}`, 'Provide a limit in the range 1–500.');
      }
    
      // Validate offset
      const offset = params.offset ?? 0;
      if (typeof offset !== 'number' || offset < 0) {
        return err('INVALID_OFFSET', `offset must be >= 0, got ${offset}`, 'Provide a non-negative offset.');
      }
    
      try {
        // Check extraction cache
        const cached = manager.getExtractionCache(session);
        let allChanges;
    
        if (cached) {
          allChanges = cached.changes;
        } else {
          // Compute extraction from DOM clones
          const docClone = session.doc.getDocumentXmlClone();
          const comments = await session.doc.getComments();
          const result = extractRevisions(docClone, comments);
          allChanges = result.changes;
          // Cache the full result for pagination
          manager.setExtractionCache(session, allChanges);
        }
    
        // Apply pagination
        const totalChanges = allChanges.length;
        const page = allChanges.slice(offset, offset + limit);
        const hasMore = offset + limit < totalChanges;
    
        return ok(mergeSessionResolutionMetadata({
          changes: page,
          total_changes: totalChanges,
          has_more: hasMore,
          edit_revision: session.editRevision,
          file_path: manager.normalizePath(session.originalPath),
        }, metadata));
      } catch (e: unknown) {
        return err('EXTRACTION_ERROR', errorMessage(e));
      }
    }
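The handler's pagination arithmetic can be sketched in isolation; `paginate` and the sample data below are illustrative helpers, not part of the module:

```typescript
// Minimal sketch of the slice/has_more arithmetic used by the handler.
function paginate<T>(items: T[], offset: number, limit: number) {
  return {
    page: items.slice(offset, offset + limit), // the requested window
    total: items.length,                       // total_changes equivalent
    has_more: offset + limit < items.length,   // true if another page exists
  };
}

const sample = ["a", "b", "c", "d", "e"];
const { page, has_more } = paginate(sample, 3, 2);
// page = ["d", "e"], has_more = false
```

Because the full result set is cached per session, subsequent pages re-slice the cached array rather than re-walking the DOM.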
  • Core implementation of extractRevisions(). Clones the DOM, applies acceptChanges/rejectChanges to get before/after text per paragraph, collects revision entries (INSERTION, DELETION, MOVE_FROM, MOVE_TO, FORMAT_CHANGE), associates comments, and applies pagination.
    export function extractRevisions(
      doc: Document,
      comments: Comment[],
      opts?: { offset?: number; limit?: number },
    ): ExtractRevisionsResult {
      const body = doc.getElementsByTagNameNS(W_NS, 'body').item(0);
      if (!body) {
        return { changes: [], total_changes: 0, has_more: false };
      }
    
      // Clone DOM twice and apply accept/reject
      const acceptedDoc = doc.cloneNode(true) as Document;
      const rejectedDoc = doc.cloneNode(true) as Document;
      acceptChanges(acceptedDoc);
      rejectChanges(rejectedDoc);
    
      // Build comment lookup by anchoredParagraphId
      const commentsByParaId = new Map<string, Comment[]>();
      for (const c of comments) {
        if (c.anchoredParagraphId) {
          const existing = commentsByParaId.get(c.anchoredParagraphId);
          if (existing) {
            existing.push(c);
          } else {
            commentsByParaId.set(c.anchoredParagraphId, [c]);
          }
        }
      }
    
      // Walk all paragraphs in the original tracked DOM
      const allParagraphs = Array.from(body.getElementsByTagNameNS(W_NS, 'p'));
      const changedParagraphs: ParagraphRevision[] = [];
    
      for (const p of allParagraphs) {
        if (!paragraphHasRevisions(p)) continue;
    
        const paraId = getParagraphBookmarkId(p);
        if (!paraId) continue; // All paragraphs should have bookmarks from session resolution
    
        // Detect entirely-inserted/deleted paragraphs to avoid stale bookmark lookups.
        // When rejectChanges() removes an inserted paragraph, it relocates bookmarks
        // to adjacent paragraphs, which would give the wrong before_text.
        const isFullyInserted = paragraphIsEntirelyInserted(p);
        const isFullyDeleted = paragraphIsEntirelyDeleted(p);
    
        // Look up before_text in rejected clone by bookmark
        let beforeText: string;
        if (isFullyInserted) {
          beforeText = ''; // Didn't exist before
        } else {
          const rejectedP = findParagraphByBookmarkId(rejectedDoc, paraId);
          beforeText = rejectedP ? getParagraphText(rejectedP) : '';
        }
    
        // Look up after_text in accepted clone by bookmark
        let afterText: string;
        if (isFullyDeleted) {
          afterText = ''; // Doesn't exist after
        } else {
          const acceptedP = findParagraphByBookmarkId(acceptedDoc, paraId);
          afterText = acceptedP ? getParagraphText(acceptedP) : '';
        }
    
        // Collect revision entries
        const revisions = collectRevisionEntries(p);
    
        // Skip structurally-empty paragraphs with only paragraph-level markers
        // (e.g. empty inserted paragraphs from comparison engines with pPr/rPr/ins only)
        if (revisions.length === 0 && beforeText === '' && afterText === '') continue;
    
        // Associate comments
        const paraComments = commentsByParaId.get(paraId) ?? [];
        const revisionComments = paraComments.map(commentToRevisionComment);
    
        changedParagraphs.push({
          para_id: paraId,
          before_text: beforeText,
          after_text: afterText,
          revisions,
          comments: revisionComments,
        });
      }
    
      // Apply pagination
      const totalChanges = changedParagraphs.length;
      const offset = opts?.offset ?? 0;
      const limit = opts?.limit ?? totalChanges;
      const page = changedParagraphs.slice(offset, offset + limit);
      const hasMore = offset + limit < totalChanges;
    
      return {
        changes: page,
        total_changes: totalChanges,
        has_more: hasMore,
      };
    }
  • Type definitions for the extract_revisions tool: RevisionType, RevisionEntry, RevisionComment, ParagraphRevision, and ExtractRevisionsResult.
    export type RevisionType = 'INSERTION' | 'DELETION' | 'MOVE_FROM' | 'MOVE_TO' | 'FORMAT_CHANGE';
    
    export type RevisionEntry = {
      type: RevisionType;
      text: string;
      author: string;
    };
    
    export type RevisionComment = {
      author: string;
      text: string;
      date: string | null;
      replies?: RevisionComment[];
    };
    
    export type ParagraphRevision = {
      para_id: string;
      before_text: string;
      after_text: string;
      revisions: RevisionEntry[];
      comments: RevisionComment[];
    };
    
    export type ExtractRevisionsResult = {
      changes: ParagraphRevision[];
      total_changes: number;
      has_more: boolean;
    };
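To make these shapes concrete, here is a hypothetical result entry conforming to the types above (all values are invented; the type aliases are restated locally so the sketch is self-contained):

```typescript
// Shapes mirrored from the type definitions above; values are invented.
type RevisionType = 'INSERTION' | 'DELETION' | 'MOVE_FROM' | 'MOVE_TO' | 'FORMAT_CHANGE';
type RevisionEntry = { type: RevisionType; text: string; author: string };
type RevisionComment = { author: string; text: string; date: string | null; replies?: RevisionComment[] };
type ParagraphRevision = {
  para_id: string;
  before_text: string;
  after_text: string;
  revisions: RevisionEntry[];
  comments: RevisionComment[];
};

// One paragraph where "5%" was replaced with "7%" under tracked changes.
const example: ParagraphRevision = {
  para_id: "bm_0042",
  before_text: "The fee is 5%.",
  after_text: "The fee is 7%.",
  revisions: [
    { type: "DELETION", text: "5%", author: "Alice" },
    { type: "INSERTION", text: "7%", author: "Alice" },
  ],
  comments: [{ author: "Bob", text: "Confirm the new rate.", date: null }],
};
```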
  • Tool catalog registration for 'extract_revisions' with description, input schema (file_path, offset, limit), and annotations (readOnlyHint: true).
    {
      name: 'extract_revisions',
      description:
        'Extract tracked changes as structured JSON with before/after text per paragraph, revision details, and comments. Supports pagination via offset and limit. Read-only - does not modify the document.',
      input: z.object({
        ...FILE_FIELD,
        offset: z.number().optional().describe('0-based offset for pagination. Default: 0.'),
        limit: z.number().optional().describe('Max entries per page (1-500). Default: 50.'),
      }),
      annotations: { readOnlyHint: true, destructiveHint: false },
    },
  • Import statement for extractRevisions_tool from the tool module.
    import { extractRevisions_tool } from './tools/extract_revisions.js';
  • Case branch in the MCP server dispatcher that routes 'extract_revisions' calls to extractRevisions_tool.
    case 'extract_revisions':
      return await extractRevisions_tool(sessions, args as Parameters<typeof extractRevisions_tool>[1]);
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint=true), the description adds output structure details and pagination support, but does not disclose rate limits or error behavior. The read-only confirmation is consistent with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no waste: first sentence defines purpose and output, second adds pagination and read-only nature. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description appropriately describes output format. Parameters are covered; pagination limits and read-only nature are stated. No major gaps for a read-only tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description repeats pagination via offset/limit but does not add new semantic meaning beyond the schema's parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool extracts tracked changes as structured JSON with specific elements (before/after text, revision details, comments), making its purpose distinct from siblings like accept_changes or get_comments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance is given. The read-only note helps distinguish from write tools, but alternatives are not mentioned, leaving the agent to infer context from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
