Validate Tagged PDF
validate_taggedValidate PDF/UA tagged structure requirements by checking document tagging, heading hierarchy, figure tags, table structure, and more. Identify missing or incorrect tag issues to assess PDF accessibility quality.
Instructions
Validate PDF/UA tagged structure requirements.
Args:
file_path (string): Absolute path to a local PDF file
response_format ('markdown' | 'json'): Output format (default: 'markdown')
Returns: Validation results including: whether the PDF is tagged, total checks performed, pass/fail counts, detailed issues with severity levels (error/warning/info), and a summary.
Checks performed:
Document marked as tagged
Structure tree root existence
Document root tag presence
Heading hierarchy (H1-H6) sequential order
Figure tags for images
Paragraph tag presence
Structure element count
Table tag structure (TR/TH/TD)
Examples:
Check if a PDF meets PDF/UA accessibility requirements
Identify missing or incorrect tag structure
Assess document accessibility quality
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Absolute path to a local PDF file (e.g., "/path/to/document.pdf") | |
| response_format | No | Output format: "markdown" for human-readable, "json" for structured data | markdown |
Implementation Reference
- Core handler function that performs PDF/UA tagged structure validation. It loads the document, analyzes tags via analyzeTagsFromDoc, counts images, then runs 8 checks: tagged status, structure tree root, Document tag, heading hierarchy, Figure tags, Paragraph tags, minimum element count, and Table tag structure. Returns a TaggedValidation result.
export async function validateTagged(filePath: string): Promise<TaggedValidation> { const issues: ValidationIssue[] = []; let totalChecks = 0; let passed = 0; let failed = 0; let warnings = 0; // ドキュメントを1回だけロードし、タグ解析と画像カウントで共有 const doc = await loadDocument(filePath); let tagsAnalysis: Awaited<ReturnType<typeof analyzeTagsFromDoc>>; let imageCount: number; try { // タグ解析と画像カウントを並列実行 [tagsAnalysis, imageCount] = await Promise.all([ analyzeTagsFromDoc(doc), countImagesFromDoc(doc), ]); } finally { await doc.destroy(); } // Check 1: Is the document tagged? totalChecks++; if (!tagsAnalysis.isTagged) { failed++; issues.push({ severity: 'error', code: 'TAG-001', message: 'Document is not tagged', details: 'The PDF does not have the Marked flag set in the MarkInfo dictionary. Tagged PDF is required for PDF/UA compliance.', }); return { isTagged: false, totalChecks, passed, failed, warnings, issues, summary: 'Document is not tagged. PDF/UA validation cannot proceed.', }; } passed++; issues.push({ severity: 'info', code: 'TAG-001', message: 'Document is marked as tagged', }); // Check 2: Structure tree root exists totalChecks++; if (!tagsAnalysis.rootTag) { failed++; issues.push({ severity: 'error', code: 'TAG-002', message: 'No structure tree root found', details: 'The document is marked as tagged but has no StructTreeRoot.', }); } else { passed++; issues.push({ severity: 'info', code: 'TAG-002', message: 'Structure tree root exists', }); } // Check 3: Document-level root tag totalChecks++; const hasDocumentTag = tagsAnalysis.roleCounts.Document !== undefined; if (!hasDocumentTag) { warnings++; issues.push({ severity: 'warning', code: 'TAG-003', message: 'No Document root tag found', details: 'PDF/UA recommends a single Document tag as the root of the structure tree.', }); } else { passed++; issues.push({ severity: 'info', code: 'TAG-003', message: 'Document root tag present', }); } // Check 4: Heading hierarchy totalChecks++; const headingLevels: number[] = []; for (let i = 1; i <= 6; i++) { if (tagsAnalysis.roleCounts[`H${i}`]) { headingLevels.push(i); } } // Also check generic H tag const hasGenericH = tagsAnalysis.roleCounts.H !== undefined; if (headingLevels.length === 0 && !hasGenericH) { warnings++; issues.push({ severity: 'warning', code: 'TAG-004', message: 'No heading tags found', details: 'Document has no heading tags (H1-H6 or H). Headings are recommended for document navigation.', }); } else if (headingLevels.length > 0) { // Check for skipped levels let isSequential = true; if (headingLevels[0] !== 1) { isSequential = false; } for (let i = 1; i < headingLevels.length; i++) { if (headingLevels[i] - headingLevels[i - 1] > 1) { isSequential = false; break; } } if (!isSequential) { warnings++; issues.push({ severity: 'warning', code: 'TAG-004', message: `Heading hierarchy has gaps: ${headingLevels.map((l) => `H${l}`).join(', ')}`, details: 'Heading levels should be sequential without skipping levels (e.g., H1 → H2 → H3).', }); } else { passed++; issues.push({ severity: 'info', code: 'TAG-004', message: `Heading hierarchy is sequential: ${headingLevels.map((l) => `H${l}`).join(', ')}`, }); } } else { passed++; issues.push({ severity: 'info', code: 'TAG-004', message: 'Generic H tags used for headings', }); } // Check 5: Figure tags for images totalChecks++; const figureCount = tagsAnalysis.roleCounts.Figure ?? 0; if (imageCount > 0 && figureCount === 0) { failed++; issues.push({ severity: 'error', code: 'TAG-005', message: `Document has ${imageCount} image(s) but no Figure tags`, details: 'Images must be tagged as Figure with appropriate alt text for PDF/UA compliance.', }); } else if (imageCount > 0 && figureCount < imageCount) { warnings++; issues.push({ severity: 'warning', code: 'TAG-005', message: `${imageCount} image(s) found but only ${figureCount} Figure tag(s)`, details: 'Some images may not be properly tagged. Decorative images should be marked as artifacts.', }); } else if (imageCount > 0) { passed++; issues.push({ severity: 'info', code: 'TAG-005', message: `${figureCount} Figure tag(s) for ${imageCount} image(s)`, }); } else { passed++; issues.push({ severity: 'info', code: 'TAG-005', message: 'No images detected; Figure tag check not applicable', }); } // Check 6: Paragraph tags totalChecks++; const hasParagraphs = tagsAnalysis.roleCounts.P !== undefined; if (!hasParagraphs) { warnings++; issues.push({ severity: 'warning', code: 'TAG-006', message: 'No P (Paragraph) tags found', details: 'Text content should be wrapped in P tags for proper reading order.', }); } else { passed++; issues.push({ severity: 'info', code: 'TAG-006', message: `${tagsAnalysis.roleCounts.P} Paragraph tag(s) found`, }); } // Check 7: Minimum element count totalChecks++; if (tagsAnalysis.totalElements < 2) { warnings++; issues.push({ severity: 'warning', code: 'TAG-007', message: `Only ${tagsAnalysis.totalElements} structure element(s) found`, details: 'A properly tagged document should have multiple structure elements reflecting its content.', }); } else { passed++; issues.push({ severity: 'info', code: 'TAG-007', message: `${tagsAnalysis.totalElements} structure elements found`, }); } // Check 8: Table tags totalChecks++; const tableCount = tagsAnalysis.roleCounts.Table ?? 0; const trCount = tagsAnalysis.roleCounts.TR ?? 0; const tdCount = tagsAnalysis.roleCounts.TD ?? 0; const thCount = tagsAnalysis.roleCounts.TH ?? 0; if (tableCount > 0) { if (trCount === 0) { warnings++; issues.push({ severity: 'warning', code: 'TAG-008', message: `${tableCount} Table tag(s) but no TR (Table Row) tags`, details: 'Tables should contain TR, TH, and TD tags for proper structure.', }); } else if (thCount === 0) { warnings++; issues.push({ severity: 'warning', code: 'TAG-008', message: `Table has ${trCount} row(s) but no TH (Table Header) tags`, details: 'PDF/UA requires tables to have header cells (TH) for accessibility.', }); } else { passed++; issues.push({ severity: 'info', code: 'TAG-008', message: `Table structure: ${tableCount} table(s), ${trCount} row(s), ${thCount} header(s), ${tdCount} cell(s)`, }); } } else { passed++; issues.push({ severity: 'info', code: 'TAG-008', message: 'No Table tags found; table check not applicable', }); } const errorCount = issues.filter((i) => i.severity === 'error').length; const warnCount = issues.filter((i) => i.severity === 'warning').length; let summary: string; if (errorCount === 0 && warnCount === 0) { summary = `All ${totalChecks} checks passed. Document appears well-structured for PDF/UA compliance.`; } else if (errorCount === 0) { summary = `${passed}/${totalChecks} checks passed with ${warnCount} warning(s). Review warnings for full PDF/UA compliance.`; } else { summary = `${passed}/${totalChecks} checks passed, ${errorCount} error(s), ${warnCount} warning(s). Critical issues must be resolved for PDF/UA compliance.`; } return { isTagged: true, totalChecks, passed, failed, warnings, issues, summary, }; } - src/tools/tier3/validate-tagged.ts:12-68 (handler)MCP tool registration for 'validate_tagged'. Registers the tool with schema, metadata, and a handler that calls validateTagged service function, formats output as JSON or Markdown, and handles errors.
export function registerValidateTagged(server: McpServer): void { server.registerTool( 'validate_tagged', { title: 'Validate Tagged PDF', description: `Validate PDF/UA tagged structure requirements. Args: - file_path (string): Absolute path to a local PDF file - response_format ('markdown' | 'json'): Output format (default: 'markdown') Returns: Validation results including: whether the PDF is tagged, total checks performed, pass/fail counts, detailed issues with severity levels (error/warning/info), and a summary. Checks performed: - Document marked as tagged - Structure tree root existence - Document root tag presence - Heading hierarchy (H1-H6) sequential order - Figure tags for images - Paragraph tag presence - Structure element count - Table tag structure (TR/TH/TD) Examples: - Check if a PDF meets PDF/UA accessibility requirements - Identify missing or incorrect tag structure - Assess document accessibility quality`, inputSchema: ValidateTaggedSchema, annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: false, }, }, async (params: ValidateTaggedInput) => { try { const result = await validateTagged(params.file_path); const raw = params.response_format === ResponseFormat.JSON ? JSON.stringify(result, null, 2) : formatTaggedValidationMarkdown(result); const { text } = truncateIfNeeded(raw); return { content: [{ type: 'text' as const, text }] }; } catch (error) { const err = handleStructuredError(error); return { content: [{ type: 'text' as const, text: JSON.stringify(err, null, 2) }], isError: true, }; } }, ); } - src/schemas/tier3.ts:9-14 (schema)Zod schema for validate_tagged input: file_path (string) and response_format ('markdown' | 'json').
export const ValidateTaggedSchema = z .object({ file_path: FilePathSchema, response_format: ResponseFormatSchema, }) .strict(); - src/types.ts:259-267 (schema)TaggedValidation output type: isTagged, totalChecks, passed, failed, warnings, issues array, and summary string.
export interface TaggedValidation { isTagged: boolean; totalChecks: number; passed: number; failed: number; warnings: number; issues: ValidationIssue[]; summary: string; } - src/tools/index.ts:25-52 (registration)Top-level registration: imports registerValidateTagged and calls it on line 49 within registerAllTools.
import { registerValidateTagged } from './tier3/validate-tagged.js'; /** * Register all tools with the MCP server. */ export function registerAllTools(server: McpServer): void { // Tier 1: Basic PDF operations registerGetPageCount(server); registerGetMetadata(server); registerReadText(server); registerSearchText(server); registerReadImages(server); registerReadUrl(server); registerSummarize(server); // Tier 2: Structure analysis registerInspectStructure(server); registerInspectTags(server); registerInspectFonts(server); registerInspectAnnotations(server); registerInspectSignatures(server); registerExtractTables(server); // Tier 3: Validation & analysis registerValidateTagged(server); registerValidateMetadata(server); registerCompareStructure(server); }