Validate Tagged PDF

validate_tagged

Read-onlyIdempotent

Validate PDF/UA tagged structure requirements by checking document tagging, heading hierarchy, figure tags, table structure, and more. Identify missing or incorrect tag issues to assess PDF accessibility quality.

Instructions

Validate PDF/UA tagged structure requirements.

Args:

file_path (string): Absolute path to a local PDF file
response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Validation results including: whether the PDF is tagged, total checks performed, pass/fail counts, detailed issues with severity levels (error/warning/info), and a summary.

Checks performed:

Document marked as tagged
Structure tree root existence
Document root tag presence
Heading hierarchy (H1-H6) sequential order
Figure tags for images
Paragraph tag presence
Structure element count
Table tag structure (TR/TH/TD)

Examples:

Check if a PDF meets PDF/UA accessibility requirements
Identify missing or incorrect tag structure
Assess document accessibility quality

Input Schema

TableJSON Schema

Name	Required	Description	Default
`file_path`	Yes	Absolute path to a local PDF file (e.g., "/path/to/document.pdf")
`response_format`	No	Output format: "markdown" for human-readable, "json" for structured data	markdown

Implementation Reference

src/services/validation-service.ts:39-327 (handler)

Core handler function that performs PDF/UA tagged structure validation. It loads the document, analyzes tags via analyzeTagsFromDoc, counts images, then runs 8 checks: tagged status, structure tree root, Document tag, heading hierarchy, Figure tags, Paragraph tags, minimum element count, and Table tag structure. Returns a TaggedValidation result.

export async function validateTagged(filePath: string): Promise<TaggedValidation> {
  const issues: ValidationIssue[] = [];
  let totalChecks = 0;
  let passed = 0;
  let failed = 0;
  let warnings = 0;

  // ドキュメントを1回だけロードし、タグ解析と画像カウントで共有
  const doc = await loadDocument(filePath);
  let tagsAnalysis: Awaited<ReturnType<typeof analyzeTagsFromDoc>>;
  let imageCount: number;

  try {
    // タグ解析と画像カウントを並列実行
    [tagsAnalysis, imageCount] = await Promise.all([
      analyzeTagsFromDoc(doc),
      countImagesFromDoc(doc),
    ]);
  } finally {
    await doc.destroy();
  }

  // Check 1: Is the document tagged?
  totalChecks++;
  if (!tagsAnalysis.isTagged) {
    failed++;
    issues.push({
      severity: 'error',
      code: 'TAG-001',
      message: 'Document is not tagged',
      details:
        'The PDF does not have the Marked flag set in the MarkInfo dictionary. Tagged PDF is required for PDF/UA compliance.',
    });

    return {
      isTagged: false,
      totalChecks,
      passed,
      failed,
      warnings,
      issues,
      summary: 'Document is not tagged. PDF/UA validation cannot proceed.',
    };
  }
  passed++;
  issues.push({
    severity: 'info',
    code: 'TAG-001',
    message: 'Document is marked as tagged',
  });

  // Check 2: Structure tree root exists
  totalChecks++;
  if (!tagsAnalysis.rootTag) {
    failed++;
    issues.push({
      severity: 'error',
      code: 'TAG-002',
      message: 'No structure tree root found',
      details: 'The document is marked as tagged but has no StructTreeRoot.',
    });
  } else {
    passed++;
    issues.push({
      severity: 'info',
      code: 'TAG-002',
      message: 'Structure tree root exists',
    });
  }

  // Check 3: Document-level root tag
  totalChecks++;
  const hasDocumentTag = tagsAnalysis.roleCounts.Document !== undefined;
  if (!hasDocumentTag) {
    warnings++;
    issues.push({
      severity: 'warning',
      code: 'TAG-003',
      message: 'No Document root tag found',
      details: 'PDF/UA recommends a single Document tag as the root of the structure tree.',
    });
  } else {
    passed++;
    issues.push({
      severity: 'info',
      code: 'TAG-003',
      message: 'Document root tag present',
    });
  }

  // Check 4: Heading hierarchy
  totalChecks++;
  const headingLevels: number[] = [];
  for (let i = 1; i <= 6; i++) {
    if (tagsAnalysis.roleCounts[`H${i}`]) {
      headingLevels.push(i);
    }
  }
  // Also check generic H tag
  const hasGenericH = tagsAnalysis.roleCounts.H !== undefined;

  if (headingLevels.length === 0 && !hasGenericH) {
    warnings++;
    issues.push({
      severity: 'warning',
      code: 'TAG-004',
      message: 'No heading tags found',
      details:
        'Document has no heading tags (H1-H6 or H). Headings are recommended for document navigation.',
    });
  } else if (headingLevels.length > 0) {
    // Check for skipped levels
    let isSequential = true;
    if (headingLevels[0] !== 1) {
      isSequential = false;
    }
    for (let i = 1; i < headingLevels.length; i++) {
      if (headingLevels[i] - headingLevels[i - 1] > 1) {
        isSequential = false;
        break;
      }
    }

    if (!isSequential) {
      warnings++;
      issues.push({
        severity: 'warning',
        code: 'TAG-004',
        message: `Heading hierarchy has gaps: ${headingLevels.map((l) => `H${l}`).join(', ')}`,
        details:
          'Heading levels should be sequential without skipping levels (e.g., H1 → H2 → H3).',
      });
    } else {
      passed++;
      issues.push({
        severity: 'info',
        code: 'TAG-004',
        message: `Heading hierarchy is sequential: ${headingLevels.map((l) => `H${l}`).join(', ')}`,
      });
    }
  } else {
    passed++;
    issues.push({
      severity: 'info',
      code: 'TAG-004',
      message: 'Generic H tags used for headings',
    });
  }

  // Check 5: Figure tags for images
  totalChecks++;
  const figureCount = tagsAnalysis.roleCounts.Figure ?? 0;

  if (imageCount > 0 && figureCount === 0) {
    failed++;
    issues.push({
      severity: 'error',
      code: 'TAG-005',
      message: `Document has ${imageCount} image(s) but no Figure tags`,
      details: 'Images must be tagged as Figure with appropriate alt text for PDF/UA compliance.',
    });
  } else if (imageCount > 0 && figureCount < imageCount) {
    warnings++;
    issues.push({
      severity: 'warning',
      code: 'TAG-005',
      message: `${imageCount} image(s) found but only ${figureCount} Figure tag(s)`,
      details:
        'Some images may not be properly tagged. Decorative images should be marked as artifacts.',
    });
  } else if (imageCount > 0) {
    passed++;
    issues.push({
      severity: 'info',
      code: 'TAG-005',
      message: `${figureCount} Figure tag(s) for ${imageCount} image(s)`,
    });
  } else {
    passed++;
    issues.push({
      severity: 'info',
      code: 'TAG-005',
      message: 'No images detected; Figure tag check not applicable',
    });
  }

  // Check 6: Paragraph tags
  totalChecks++;
  const hasParagraphs = tagsAnalysis.roleCounts.P !== undefined;
  if (!hasParagraphs) {
    warnings++;
    issues.push({
      severity: 'warning',
      code: 'TAG-006',
      message: 'No P (Paragraph) tags found',
      details: 'Text content should be wrapped in P tags for proper reading order.',
    });
  } else {
    passed++;
    issues.push({
      severity: 'info',
      code: 'TAG-006',
      message: `${tagsAnalysis.roleCounts.P} Paragraph tag(s) found`,
    });
  }

  // Check 7: Minimum element count
  totalChecks++;
  if (tagsAnalysis.totalElements < 2) {
    warnings++;
    issues.push({
      severity: 'warning',
      code: 'TAG-007',
      message: `Only ${tagsAnalysis.totalElements} structure element(s) found`,
      details:
        'A properly tagged document should have multiple structure elements reflecting its content.',
    });
  } else {
    passed++;
    issues.push({
      severity: 'info',
      code: 'TAG-007',
      message: `${tagsAnalysis.totalElements} structure elements found`,
    });
  }

  // Check 8: Table tags
  totalChecks++;
  const tableCount = tagsAnalysis.roleCounts.Table ?? 0;
  const trCount = tagsAnalysis.roleCounts.TR ?? 0;
  const tdCount = tagsAnalysis.roleCounts.TD ?? 0;
  const thCount = tagsAnalysis.roleCounts.TH ?? 0;

  if (tableCount > 0) {
    if (trCount === 0) {
      warnings++;
      issues.push({
        severity: 'warning',
        code: 'TAG-008',
        message: `${tableCount} Table tag(s) but no TR (Table Row) tags`,
        details: 'Tables should contain TR, TH, and TD tags for proper structure.',
      });
    } else if (thCount === 0) {
      warnings++;
      issues.push({
        severity: 'warning',
        code: 'TAG-008',
        message: `Table has ${trCount} row(s) but no TH (Table Header) tags`,
        details: 'PDF/UA requires tables to have header cells (TH) for accessibility.',
      });
    } else {
      passed++;
      issues.push({
        severity: 'info',
        code: 'TAG-008',
        message: `Table structure: ${tableCount} table(s), ${trCount} row(s), ${thCount} header(s), ${tdCount} cell(s)`,
      });
    }
  } else {
    passed++;
    issues.push({
      severity: 'info',
      code: 'TAG-008',
      message: 'No Table tags found; table check not applicable',
    });
  }

  const errorCount = issues.filter((i) => i.severity === 'error').length;
  const warnCount = issues.filter((i) => i.severity === 'warning').length;

  let summary: string;
  if (errorCount === 0 && warnCount === 0) {
    summary = `All ${totalChecks} checks passed. Document appears well-structured for PDF/UA compliance.`;
  } else if (errorCount === 0) {
    summary = `${passed}/${totalChecks} checks passed with ${warnCount} warning(s). Review warnings for full PDF/UA compliance.`;
  } else {
    summary = `${passed}/${totalChecks} checks passed, ${errorCount} error(s), ${warnCount} warning(s). Critical issues must be resolved for PDF/UA compliance.`;
  }

  return {
    isTagged: true,
    totalChecks,
    passed,
    failed,
    warnings,
    issues,
    summary,
  };
}

src/tools/tier3/validate-tagged.ts:12-68 (handler)

MCP tool registration for 'validate_tagged'. Registers the tool with schema, metadata, and a handler that calls validateTagged service function, formats output as JSON or Markdown, and handles errors.

export function registerValidateTagged(server: McpServer): void {
  server.registerTool(
    'validate_tagged',
    {
      title: 'Validate Tagged PDF',
      description: `Validate PDF/UA tagged structure requirements.

Args:
  - file_path (string): Absolute path to a local PDF file
  - response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns:
  Validation results including: whether the PDF is tagged, total checks performed, pass/fail counts, detailed issues with severity levels (error/warning/info), and a summary.

Checks performed:
  - Document marked as tagged
  - Structure tree root existence
  - Document root tag presence
  - Heading hierarchy (H1-H6) sequential order
  - Figure tags for images
  - Paragraph tag presence
  - Structure element count
  - Table tag structure (TR/TH/TD)

Examples:
  - Check if a PDF meets PDF/UA accessibility requirements
  - Identify missing or incorrect tag structure
  - Assess document accessibility quality`,
      inputSchema: ValidateTaggedSchema,
      annotations: {
        readOnlyHint: true,
        destructiveHint: false,
        idempotentHint: true,
        openWorldHint: false,
      },
    },
    async (params: ValidateTaggedInput) => {
      try {
        const result = await validateTagged(params.file_path);

        const raw =
          params.response_format === ResponseFormat.JSON
            ? JSON.stringify(result, null, 2)
            : formatTaggedValidationMarkdown(result);

        const { text } = truncateIfNeeded(raw);
        return { content: [{ type: 'text' as const, text }] };
      } catch (error) {
        const err = handleStructuredError(error);
        return {
          content: [{ type: 'text' as const, text: JSON.stringify(err, null, 2) }],
          isError: true,
        };
      }
    },
  );
}

src/schemas/tier3.ts:9-14 (schema)

Zod schema for validate_tagged input: file_path (string) and response_format ('markdown' | 'json').

export const ValidateTaggedSchema = z
  .object({
    file_path: FilePathSchema,
    response_format: ResponseFormatSchema,
  })
  .strict();

src/types.ts:259-267 (schema)

TaggedValidation output type: isTagged, totalChecks, passed, failed, warnings, issues array, and summary string.

export interface TaggedValidation {
  isTagged: boolean;
  totalChecks: number;
  passed: number;
  failed: number;
  warnings: number;
  issues: ValidationIssue[];
  summary: string;
}

src/tools/index.ts:25-52 (registration)

Top-level registration: imports registerValidateTagged and calls it on line 49 within registerAllTools.

import { registerValidateTagged } from './tier3/validate-tagged.js';

/**
 * Register all tools with the MCP server.
 */
export function registerAllTools(server: McpServer): void {
  // Tier 1: Basic PDF operations
  registerGetPageCount(server);
  registerGetMetadata(server);
  registerReadText(server);
  registerSearchText(server);
  registerReadImages(server);
  registerReadUrl(server);
  registerSummarize(server);

  // Tier 2: Structure analysis
  registerInspectStructure(server);
  registerInspectTags(server);
  registerInspectFonts(server);
  registerInspectAnnotations(server);
  registerInspectSignatures(server);
  registerExtractTables(server);

  // Tier 3: Validation & analysis
  registerValidateTagged(server);
  registerValidateMetadata(server);
  registerCompareStructure(server);
}

@shuji-bonji/pdf-reader-mcp

Validate Tagged PDF

Instructions

Input Schema

Implementation Reference

Tool Definition Quality

Other Tools

Latest Blog Posts

MCP directory API