Skip to main content
Glama

Validate Tagged PDF

validate_tagged
Read-onlyIdempotent

Validate PDF/UA tagged structure requirements by checking document tagging, heading hierarchy, figure tags, table structure, and more. Identify missing or incorrect tag issues to assess PDF accessibility quality.

Instructions

Validate PDF/UA tagged structure requirements.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Validation results including: whether the PDF is tagged, total checks performed, pass/fail counts, detailed issues with severity levels (error/warning/info), and a summary.

Checks performed:

  • Document marked as tagged

  • Structure tree root existence

  • Document root tag presence

  • Heading hierarchy (H1-H6) sequential order

  • Figure tags for images

  • Paragraph tag presence

  • Structure element count

  • Table tag structure (TR/TH/TD)

Examples:

  • Check if a PDF meets PDF/UA accessibility requirements

  • Identify missing or incorrect tag structure

  • Assess document accessibility quality

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
file_pathYesAbsolute path to a local PDF file (e.g., "/path/to/document.pdf")
response_formatNoOutput format: "markdown" for human-readable, "json" for structured datamarkdown

Implementation Reference

  • Core handler function that performs PDF/UA tagged structure validation. It loads the document, analyzes tags via analyzeTagsFromDoc, counts images, then runs 8 checks: tagged status, structure tree root, Document tag, heading hierarchy, Figure tags, Paragraph tags, minimum element count, and Table tag structure. Returns a TaggedValidation result.
    export async function validateTagged(filePath: string): Promise<TaggedValidation> {
      const issues: ValidationIssue[] = [];
      let totalChecks = 0;
      let passed = 0;
      let failed = 0;
      let warnings = 0;
    
      // ドキュメントを1回だけロードし、タグ解析と画像カウントで共有
      const doc = await loadDocument(filePath);
      let tagsAnalysis: Awaited<ReturnType<typeof analyzeTagsFromDoc>>;
      let imageCount: number;
    
      try {
        // タグ解析と画像カウントを並列実行
        [tagsAnalysis, imageCount] = await Promise.all([
          analyzeTagsFromDoc(doc),
          countImagesFromDoc(doc),
        ]);
      } finally {
        await doc.destroy();
      }
    
      // Check 1: Is the document tagged?
      totalChecks++;
      if (!tagsAnalysis.isTagged) {
        failed++;
        issues.push({
          severity: 'error',
          code: 'TAG-001',
          message: 'Document is not tagged',
          details:
            'The PDF does not have the Marked flag set in the MarkInfo dictionary. Tagged PDF is required for PDF/UA compliance.',
        });
    
        return {
          isTagged: false,
          totalChecks,
          passed,
          failed,
          warnings,
          issues,
          summary: 'Document is not tagged. PDF/UA validation cannot proceed.',
        };
      }
      passed++;
      issues.push({
        severity: 'info',
        code: 'TAG-001',
        message: 'Document is marked as tagged',
      });
    
      // Check 2: Structure tree root exists
      totalChecks++;
      if (!tagsAnalysis.rootTag) {
        failed++;
        issues.push({
          severity: 'error',
          code: 'TAG-002',
          message: 'No structure tree root found',
          details: 'The document is marked as tagged but has no StructTreeRoot.',
        });
      } else {
        passed++;
        issues.push({
          severity: 'info',
          code: 'TAG-002',
          message: 'Structure tree root exists',
        });
      }
    
      // Check 3: Document-level root tag
      totalChecks++;
      const hasDocumentTag = tagsAnalysis.roleCounts.Document !== undefined;
      if (!hasDocumentTag) {
        warnings++;
        issues.push({
          severity: 'warning',
          code: 'TAG-003',
          message: 'No Document root tag found',
          details: 'PDF/UA recommends a single Document tag as the root of the structure tree.',
        });
      } else {
        passed++;
        issues.push({
          severity: 'info',
          code: 'TAG-003',
          message: 'Document root tag present',
        });
      }
    
      // Check 4: Heading hierarchy
      totalChecks++;
      const headingLevels: number[] = [];
      for (let i = 1; i <= 6; i++) {
        if (tagsAnalysis.roleCounts[`H${i}`]) {
          headingLevels.push(i);
        }
      }
      // Also check generic H tag
      const hasGenericH = tagsAnalysis.roleCounts.H !== undefined;
    
      if (headingLevels.length === 0 && !hasGenericH) {
        warnings++;
        issues.push({
          severity: 'warning',
          code: 'TAG-004',
          message: 'No heading tags found',
          details:
            'Document has no heading tags (H1-H6 or H). Headings are recommended for document navigation.',
        });
      } else if (headingLevels.length > 0) {
        // Check for skipped levels
        let isSequential = true;
        if (headingLevels[0] !== 1) {
          isSequential = false;
        }
        for (let i = 1; i < headingLevels.length; i++) {
          if (headingLevels[i] - headingLevels[i - 1] > 1) {
            isSequential = false;
            break;
          }
        }
    
        if (!isSequential) {
          warnings++;
          issues.push({
            severity: 'warning',
            code: 'TAG-004',
            message: `Heading hierarchy has gaps: ${headingLevels.map((l) => `H${l}`).join(', ')}`,
            details:
              'Heading levels should be sequential without skipping levels (e.g., H1 → H2 → H3).',
          });
        } else {
          passed++;
          issues.push({
            severity: 'info',
            code: 'TAG-004',
            message: `Heading hierarchy is sequential: ${headingLevels.map((l) => `H${l}`).join(', ')}`,
          });
        }
      } else {
        passed++;
        issues.push({
          severity: 'info',
          code: 'TAG-004',
          message: 'Generic H tags used for headings',
        });
      }
    
      // Check 5: Figure tags for images
      totalChecks++;
      const figureCount = tagsAnalysis.roleCounts.Figure ?? 0;
    
      if (imageCount > 0 && figureCount === 0) {
        failed++;
        issues.push({
          severity: 'error',
          code: 'TAG-005',
          message: `Document has ${imageCount} image(s) but no Figure tags`,
          details: 'Images must be tagged as Figure with appropriate alt text for PDF/UA compliance.',
        });
      } else if (imageCount > 0 && figureCount < imageCount) {
        warnings++;
        issues.push({
          severity: 'warning',
          code: 'TAG-005',
          message: `${imageCount} image(s) found but only ${figureCount} Figure tag(s)`,
          details:
            'Some images may not be properly tagged. Decorative images should be marked as artifacts.',
        });
      } else if (imageCount > 0) {
        passed++;
        issues.push({
          severity: 'info',
          code: 'TAG-005',
          message: `${figureCount} Figure tag(s) for ${imageCount} image(s)`,
        });
      } else {
        passed++;
        issues.push({
          severity: 'info',
          code: 'TAG-005',
          message: 'No images detected; Figure tag check not applicable',
        });
      }
    
      // Check 6: Paragraph tags
      totalChecks++;
      const hasParagraphs = tagsAnalysis.roleCounts.P !== undefined;
      if (!hasParagraphs) {
        warnings++;
        issues.push({
          severity: 'warning',
          code: 'TAG-006',
          message: 'No P (Paragraph) tags found',
          details: 'Text content should be wrapped in P tags for proper reading order.',
        });
      } else {
        passed++;
        issues.push({
          severity: 'info',
          code: 'TAG-006',
          message: `${tagsAnalysis.roleCounts.P} Paragraph tag(s) found`,
        });
      }
    
      // Check 7: Minimum element count
      totalChecks++;
      if (tagsAnalysis.totalElements < 2) {
        warnings++;
        issues.push({
          severity: 'warning',
          code: 'TAG-007',
          message: `Only ${tagsAnalysis.totalElements} structure element(s) found`,
          details:
            'A properly tagged document should have multiple structure elements reflecting its content.',
        });
      } else {
        passed++;
        issues.push({
          severity: 'info',
          code: 'TAG-007',
          message: `${tagsAnalysis.totalElements} structure elements found`,
        });
      }
    
      // Check 8: Table tags
      totalChecks++;
      const tableCount = tagsAnalysis.roleCounts.Table ?? 0;
      const trCount = tagsAnalysis.roleCounts.TR ?? 0;
      const tdCount = tagsAnalysis.roleCounts.TD ?? 0;
      const thCount = tagsAnalysis.roleCounts.TH ?? 0;
    
      if (tableCount > 0) {
        if (trCount === 0) {
          warnings++;
          issues.push({
            severity: 'warning',
            code: 'TAG-008',
            message: `${tableCount} Table tag(s) but no TR (Table Row) tags`,
            details: 'Tables should contain TR, TH, and TD tags for proper structure.',
          });
        } else if (thCount === 0) {
          warnings++;
          issues.push({
            severity: 'warning',
            code: 'TAG-008',
            message: `Table has ${trCount} row(s) but no TH (Table Header) tags`,
            details: 'PDF/UA requires tables to have header cells (TH) for accessibility.',
          });
        } else {
          passed++;
          issues.push({
            severity: 'info',
            code: 'TAG-008',
            message: `Table structure: ${tableCount} table(s), ${trCount} row(s), ${thCount} header(s), ${tdCount} cell(s)`,
          });
        }
      } else {
        passed++;
        issues.push({
          severity: 'info',
          code: 'TAG-008',
          message: 'No Table tags found; table check not applicable',
        });
      }
    
      const errorCount = issues.filter((i) => i.severity === 'error').length;
      const warnCount = issues.filter((i) => i.severity === 'warning').length;
    
      let summary: string;
      if (errorCount === 0 && warnCount === 0) {
        summary = `All ${totalChecks} checks passed. Document appears well-structured for PDF/UA compliance.`;
      } else if (errorCount === 0) {
        summary = `${passed}/${totalChecks} checks passed with ${warnCount} warning(s). Review warnings for full PDF/UA compliance.`;
      } else {
        summary = `${passed}/${totalChecks} checks passed, ${errorCount} error(s), ${warnCount} warning(s). Critical issues must be resolved for PDF/UA compliance.`;
      }
    
      return {
        isTagged: true,
        totalChecks,
        passed,
        failed,
        warnings,
        issues,
        summary,
      };
    }
  • MCP tool registration for 'validate_tagged'. Registers the tool with schema, metadata, and a handler that calls validateTagged service function, formats output as JSON or Markdown, and handles errors.
    export function registerValidateTagged(server: McpServer): void {
      server.registerTool(
        'validate_tagged',
        {
          title: 'Validate Tagged PDF',
          description: `Validate PDF/UA tagged structure requirements.
    
    Args:
      - file_path (string): Absolute path to a local PDF file
      - response_format ('markdown' | 'json'): Output format (default: 'markdown')
    
    Returns:
      Validation results including: whether the PDF is tagged, total checks performed, pass/fail counts, detailed issues with severity levels (error/warning/info), and a summary.
    
    Checks performed:
      - Document marked as tagged
      - Structure tree root existence
      - Document root tag presence
      - Heading hierarchy (H1-H6) sequential order
      - Figure tags for images
      - Paragraph tag presence
      - Structure element count
      - Table tag structure (TR/TH/TD)
    
    Examples:
      - Check if a PDF meets PDF/UA accessibility requirements
      - Identify missing or incorrect tag structure
      - Assess document accessibility quality`,
          inputSchema: ValidateTaggedSchema,
          annotations: {
            readOnlyHint: true,
            destructiveHint: false,
            idempotentHint: true,
            openWorldHint: false,
          },
        },
        async (params: ValidateTaggedInput) => {
          try {
            const result = await validateTagged(params.file_path);
    
            const raw =
              params.response_format === ResponseFormat.JSON
                ? JSON.stringify(result, null, 2)
                : formatTaggedValidationMarkdown(result);
    
            const { text } = truncateIfNeeded(raw);
            return { content: [{ type: 'text' as const, text }] };
          } catch (error) {
            const err = handleStructuredError(error);
            return {
              content: [{ type: 'text' as const, text: JSON.stringify(err, null, 2) }],
              isError: true,
            };
          }
        },
      );
    }
  • Zod schema for validate_tagged input: file_path (string) and response_format ('markdown' | 'json').
    export const ValidateTaggedSchema = z
      .object({
        file_path: FilePathSchema,
        response_format: ResponseFormatSchema,
      })
      .strict();
  • TaggedValidation output type: isTagged, totalChecks, passed, failed, warnings, issues array, and summary string.
    export interface TaggedValidation {
      isTagged: boolean;
      totalChecks: number;
      passed: number;
      failed: number;
      warnings: number;
      issues: ValidationIssue[];
      summary: string;
    }
  • Top-level registration: imports registerValidateTagged and calls it on line 49 within registerAllTools.
    import { registerValidateTagged } from './tier3/validate-tagged.js';
    
    /**
     * Register all tools with the MCP server.
     */
    export function registerAllTools(server: McpServer): void {
      // Tier 1: Basic PDF operations
      registerGetPageCount(server);
      registerGetMetadata(server);
      registerReadText(server);
      registerSearchText(server);
      registerReadImages(server);
      registerReadUrl(server);
      registerSummarize(server);
    
      // Tier 2: Structure analysis
      registerInspectStructure(server);
      registerInspectTags(server);
      registerInspectFonts(server);
      registerInspectAnnotations(server);
      registerInspectSignatures(server);
      registerExtractTables(server);
    
      // Tier 3: Validation & analysis
      registerValidateTagged(server);
      registerValidateMetadata(server);
      registerCompareStructure(server);
    }
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, non-destructive, idempotent behavior. The description adds substantial behavioral context by listing all performed checks (8 specific checks) and return values. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Args, Returns, Checks performed, Examples). It is somewhat lengthy but every section adds value. Front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple checks, no output schema), the description fully conveys inputs, outputs (detailed return fields), and examples. Agent can reliably invoke and interpret results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters fully described. The description restates parameter info but adds default value. No additional semantic value beyond schema for parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The title 'Validate Tagged PDF' and description explicitly state validation of PDF/UA tagged structure requirements. The description lists specific checks (e.g., heading hierarchy, table tags) that clearly differentiate it from sibling tools like 'inspect_tags' or 'validate_metadata'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear examples of when to use the tool (e.g., check PDF/UA compliance, identify missing tags). It does not explicitly contrast with alternatives, but the examples give sufficient context for agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shuji-bonji/pdf-reader-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server