@shuji-bonji/pdf-spec-mcp

get_tables

Extract table structures from a specified section of the PDF specification, returning tables with headers, rows, and optional captions.

Instructions

Extract table structures from a specified section of the PDF specification (ISO 32000-2). Returns tables with headers, rows, and optional captions.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| spec | No | Specification ID (e.g., "iso32000-2", "ts32002", "pdfua2"). Use list_specs to see available specs. | "iso32000-2" (PDF 2.0) |
| section | Yes | Section identifier (e.g., "7.3.4", "12.8", "Annex A") | |
| table_index | No | Optional 0-based index to retrieve a specific table. If omitted, returns all tables in the section. | |
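
To make the schema concrete, here is a minimal sketch of how a client might package these arguments into an MCP `tools/call` request. The `GetTablesArgs` shape is copied from the implementation reference on this page; `buildGetTablesRequest` is a hypothetical helper written for illustration and is not part of pdf-spec-mcp.

```typescript
// Argument shape, copied from the GetTablesArgs interface shown on this page.
interface GetTablesArgs {
  spec?: string;
  section: string;
  table_index?: number;
}

// Hypothetical helper (not part of pdf-spec-mcp): wraps the arguments in a
// JSON-RPC 2.0 "tools/call" request envelope as used by MCP.
function buildGetTablesRequest(id: number, args: GetTablesArgs) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'get_tables', arguments: args },
  };
}

// Every table in a section of the default spec (spec and table_index omitted).
const allTablesRequest = buildGetTablesRequest(1, { section: '7.3.4' });

// Only the first table (0-based index) of an annex, naming the spec explicitly.
const firstTableRequest = buildGetTablesRequest(2, {
  spec: 'iso32000-2',
  section: 'Annex A',
  table_index: 0,
});

console.log(JSON.stringify(allTablesRequest));
console.log(JSON.stringify(firstTableRequest));
```

Only `section` is required, so the first request relies on the documented default spec and returns all tables the tool finds in that section.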

Implementation Reference

  • The tool handler function for 'get_tables'. Receives GetTablesArgs, validates inputs (spec, section, table_index), then delegates to the getTables service function.
    async function handleGetTables(args: GetTablesArgs) {
      const specId = validateSpecId(args.spec);
      validateSectionId(args.section);
      const tableIndex = validateTableIndex(args.table_index);
      await ensureRegistryInitialized();
      return getTables(args.section, tableIndex, specId);
    }
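  • The main implementation in PDFSpecService.getTables(). It first tries StructTree-based extraction (collectStructTreeTables) and falls back to text-based detection (detectTablesFromText) only if that finds no tables. When the optional table_index is given, it returns that single table (with totalTables set to 1) or throws a ContentError if the index is out of range.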
    public async getTables(
      sectionId: string,
      tableIndex?: number,
      specId?: string
    ): Promise<TablesResult> {
      const id = this.registry.resolveSpecId(specId);
      const result = await this.getSectionContent(sectionId, id);
    
      // Collect tables from StructTree (type: 'table')
      let tables: TableInfo[] = collectStructTreeTables(result.content);
    
      // Fallback: text-based table detection if StructTree has no tables
      if (tables.length === 0) {
        tables = detectTablesFromText(result.content);
      }
    
      if (tableIndex !== undefined) {
        if (tableIndex >= tables.length) {
          throw new ContentError(
            `table_index ${tableIndex} out of range. Section "${sectionId}" has ${tables.length} table(s).`
          );
        }
        return {
          section: result.sectionNumber,
          sectionTitle: result.title,
          totalTables: 1,
          tables: [tables[tableIndex]],
        };
      }
    
      return {
        section: result.sectionNumber,
        sectionTitle: result.title,
        totalTables: tables.length,
        tables,
      };
    }
  • Input type definition for get_tables arguments: spec (optional), section (required), table_index (optional).
    export interface GetTablesArgs {
      spec?: string;
      section: string;
      table_index?: number;
    }
  • Return type definition for get_tables: section number, section title, totalTables count, and array of TableInfo objects.
    export interface TablesResult {
      section: string;
      sectionTitle: string;
      totalTables: number;
      tables: TableInfo[];
    }
  • MCP tool definition (name, description, inputSchema) for 'get_tables'. Registers it with the SDK tool list.
    {
      name: 'get_tables',
      description:
        'Extract table structures from a specified section of the PDF specification (ISO 32000-2). ' +
        'Returns tables with headers, rows, and optional captions.',
      inputSchema: {
        type: 'object',
        properties: {
          spec: SPEC_PARAM,
          section: {
            type: 'string',
            description: 'Section identifier (e.g., "7.3.4", "12.8", "Annex A")',
          },
          table_index: {
            type: 'number',
            description:
              'Optional 0-based index to retrieve a specific table. ' +
              'If omitted, returns all tables in the section.',
          },
        },
        required: ['section'],
      },
    },
  • Tool handler registry that maps the 'get_tables' tool name to the handleGetTables function.
    export const toolHandlers = {
      list_specs: handleListSpecs,
      get_structure: handleGetStructure,
      get_section: handleGetSection,
      search_spec: handleSearchSpec,
      get_requirements: handleGetRequirements,
      get_definitions: handleGetDefinitions,
      get_tables: handleGetTables,
      compare_versions: handleCompareVersions,
    } as const;
  • Helper: collectStructTreeTables - Extracts tables from StructTree-based content elements (type: 'table'), merges continuation tables with same headers, and attaches preceding captions.
    function collectStructTreeTables(content: ContentElement[]): TableInfo[] {
      const tables: TableInfo[] = [];
    
      for (let i = 0; i < content.length; i++) {
        const element = content[i];
        if (element.type !== 'table') continue;
    
        // Check for caption in preceding paragraph
        let caption: string | null = null;
        if (i > 0) {
          const prev = content[i - 1];
          if (prev.type === 'paragraph' && /^Table\s+\d+/.test(prev.text)) {
            caption = prev.text;
          }
        }
    
        // Merge with previous table if this is a continuation (same headers, no caption)
        if (
          !caption &&
          tables.length > 0 &&
          element.headers.length > 0 &&
          arraysEqual(tables[tables.length - 1].headers, element.headers)
        ) {
          tables[tables.length - 1].rows.push(...element.rows);
          continue;
        }
    
        tables.push({
          index: tables.length,
          caption,
          headers: element.headers,
          rows: element.rows,
        });
      }
    
      return tables;
    }
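  • Helper: detectTablesFromText - Fallback text-based table detection. It looks for paragraphs matching the caption format "Table N - Title" (the separator may be an em dash, en dash, or hyphen) and parses the paragraphs that follow as tab- or multi-space-delimited rows; the first such row becomes the headers.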
    function detectTablesFromText(content: ContentElement[]): TableInfo[] {
      const tables: TableInfo[] = [];
      const TABLE_CAPTION_RE = /^(Table\s+\d+)\s*[—–-]\s*(.+)/;
    
      for (let i = 0; i < content.length; i++) {
        const el = content[i];
        if (el.type !== 'paragraph') continue;
    
        const captionMatch = el.text.match(TABLE_CAPTION_RE);
        if (!captionMatch) continue;
    
        const caption = el.text;
    
        const rows: string[][] = [];
        let headers: string[] = [];
        let j = i + 1;
    
        while (j < content.length) {
          const next = content[j];
          if (next.type !== 'paragraph') break;
          if (TABLE_CAPTION_RE.test(next.text)) break;
          if (next.text.length > 300 && !next.text.includes('\t')) break;
    
          let cells: string[];
          if (next.text.includes('\t')) {
            cells = next.text
              .split('\t')
              .map((c) => c.trim())
              .filter(Boolean);
          } else {
            cells = next.text
              .split(/\s{2,}/)
              .map((c) => c.trim())
              .filter(Boolean);
          }
    
          if (cells.length >= 2) {
            if (headers.length === 0) {
              headers = cells;
            } else {
              rows.push(cells);
            }
          } else {
            break;
          }
          j++;
        }
    
        if (headers.length > 0 || rows.length > 0) {
          tables.push({
            index: tables.length,
            caption,
            headers,
            rows,
          });
        }
      }
    
      return tables;
    }
  • Backward-compatible exported function getTables that delegates to defaultPdfService.getTables().
    export async function getTables(
      sectionId: string,
      tableIndex?: number,
      specId?: string
    ): Promise<TablesResult> {
      return defaultPdfService.getTables(sectionId, tableIndex, specId);
    }
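
The caption and continuation-merge rules in collectStructTreeTables are easier to follow on sample data. The sketch below is a condensed restatement of those rules, not the project's code: the `ContentElement` and `TableInfo` types are inferred from the helpers above (their real definitions are not shown on this page), `sameHeaders` stands in for `arraysEqual`, and the sample content is invented.

```typescript
// Types inferred from the helpers above; treat them as assumptions.
type ContentElement =
  | { type: 'paragraph'; text: string }
  | { type: 'table'; headers: string[]; rows: string[][] };

interface TableInfo {
  index: number;
  caption: string | null;
  headers: string[];
  rows: string[][];
}

// Stand-in for the arraysEqual helper, which is not shown on this page.
const sameHeaders = (a: string[], b: string[]): boolean =>
  a.length === b.length && a.every((v, i) => v === b[i]);

// Condensed restatement of the caption and merge rules.
function collect(content: ContentElement[]): TableInfo[] {
  const tables: TableInfo[] = [];
  content.forEach((el, i) => {
    if (el.type !== 'table') return;
    const prev = i > 0 ? content[i - 1] : undefined;
    const caption =
      prev && prev.type === 'paragraph' && /^Table\s+\d+/.test(prev.text) ? prev.text : null;
    const last = tables[tables.length - 1];
    if (!caption && last && el.headers.length > 0 && sameHeaders(last.headers, el.headers)) {
      last.rows.push(...el.rows); // continuation: append rows to the previous table
      return;
    }
    // Rows are copied so merging never mutates the input (a choice made for this sketch).
    tables.push({ index: tables.length, caption, headers: el.headers, rows: [...el.rows] });
  });
  return tables;
}

// Invented sample content: one captioned table, an uncaptioned continuation,
// then a second captioned table with the same headers.
const sampleContent: ContentElement[] = [
  { type: 'paragraph', text: 'Table 1 - Example entries' },
  { type: 'table', headers: ['Key', 'Type'], rows: [['A', 'name']] },
  { type: 'paragraph', text: 'Some intervening text.' },
  { type: 'table', headers: ['Key', 'Type'], rows: [['B', 'integer']] },
  { type: 'paragraph', text: 'Table 2 - Other entries' },
  { type: 'table', headers: ['Key', 'Type'], rows: [['C', 'array']] },
];

const collected = collect(sampleContent);
console.log(collected.length); // 2
console.log(collected[0].rows.length); // 2
```

The second table element has no caption and the same headers as the first, so its row is folded into table 0 even though a paragraph sits between them. The third element carries its own caption, so it starts a new table despite the identical headers.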
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden of disclosure. It implies that the tool is read-only (it only extracts) and details what is returned. It does not disclose limitations such as rate limits or authentication requirements, which is acceptable for a retrieval tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with no extraneous information. It front-loads the main action and result, making it efficient for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While there is no output schema, the description adequately explains the return type. It lacks details on edge cases (e.g., section without tables) but covers the core functionality sufficiently for a simple extraction tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
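
The edge cases mentioned above can be read off the getTables implementation shown in the implementation reference. The following is a minimal sketch of just its selection branch, run on in-memory data. It assumes section lookup succeeds for a section that contains no tables; `selectTables` is a hypothetical name, the types are inferred from this page, a plain `Error` stands in for the project's `ContentError`, and the section title is a placeholder.

```typescript
// Types inferred from this page; treat them as assumptions.
interface TableInfo {
  index: number;
  caption: string | null;
  headers: string[];
  rows: string[][];
}

interface TablesResult {
  section: string;
  sectionTitle: string;
  totalTables: number;
  tables: TableInfo[];
}

// Hypothetical helper mirroring the table_index branch of getTables.
// A plain Error stands in for ContentError, which is not shown here.
function selectTables(
  section: string,
  sectionTitle: string,
  tables: TableInfo[],
  tableIndex?: number
): TablesResult {
  if (tableIndex !== undefined) {
    if (tableIndex >= tables.length) {
      throw new Error(
        `table_index ${tableIndex} out of range. Section "${section}" has ${tables.length} table(s).`
      );
    }
    // Note: totalTables is 1 here, not the number of tables in the section.
    return { section, sectionTitle, totalTables: 1, tables: [tables[tableIndex]] };
  }
  return { section, sectionTitle, totalTables: tables.length, tables };
}

// A section with no tables yields an empty result rather than an error.
const noTables = selectTables('7.3.4', 'Placeholder title', []);
console.log(noTables.totalTables); // 0

// Asking for a specific table in that same section throws.
try {
  selectTables('7.3.4', 'Placeholder title', [], 0);
} catch (err) {
  console.log((err as Error).message);
  // table_index 0 out of range. Section "7.3.4" has 0 table(s).
}
```

Under these assumptions, a caller that omits table_index should be prepared for an empty `tables` array, while a caller that passes table_index should be prepared for an error.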

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage with clear explanations for each parameter. The tool description adds minimal extra value beyond stating the return format, which is not parameter-specific. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
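
As an illustration of a valid value range that the schema alone does not express: table_index is declared as type 'number', which also admits negative and fractional values, and the getTables code shown on this page only checks the upper bound. The project's validateTableIndex is not shown here, so the sketch below is a hypothetical validator (`checkTableIndex`) showing one way the "0-based index" constraint could be enforced; it is not the actual implementation.

```typescript
// Hypothetical validator; the real validateTableIndex in pdf-spec-mcp
// is not shown on this page and may behave differently.
function checkTableIndex(value: unknown): number | undefined {
  if (value === undefined) return undefined; // omitted: the caller wants every table
  if (typeof value !== 'number' || !Number.isInteger(value) || value < 0) {
    throw new RangeError(`table_index must be a non-negative integer, got ${String(value)}`);
  }
  return value;
}

console.log(checkTableIndex(undefined)); // undefined
console.log(checkTableIndex(2)); // 2

// Values the 'number' schema type allows but a 0-based index should not.
for (const bad of [-1, 1.5, Number.NaN]) {
  try {
    checkTableIndex(bad);
  } catch (err) {
    console.log((err as Error).message);
  }
}
```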

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool extracts table structures from a specified section of a PDF specification and what it returns (headers, rows, optional captions). Its specific focus on tables clearly distinguishes it from sibling tools such as get_section and get_definitions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when table structures are needed from a PDF spec section, but it does not provide explicit guidance on when to use this tool over alternatives like get_section or search_spec, nor does it mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shuji-bonji/pdf-spec-mcp'