Skip to main content
Glama

brand_extract_pdf

Extracts brand colors, typography, spacing, and rules from PDF brand guidelines via text extraction and pattern matching. Writes authoritative values to core-identity.yaml, outranking web sources.

Instructions

Extract brand colors, typography, spacing, and guideline rules from a PDF brand guidelines document. Accepts a local file path to a PDF. Uses text extraction and pattern matching to identify hex color values, font names, size specifications, and spacing rules. Writes extracted values to core-identity.yaml with source='guidelines' and updates source-catalog.json. Guidelines source outranks web extraction by default based on brand.config.yaml source_priority. Use when the user has brand guidelines as a PDF file — this is the most accurate extraction source. Use after brand_extract_web to merge authoritative guideline values with web-extracted data. Run brand_resolve_conflicts afterward to review any disagreements between sources. NOT for website extraction — use brand_extract_web. NOT for Figma — use brand_extract_figma.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
file_pathYesPath to a PDF brand guidelines document.
pagesNoPage range to parse: "all", "3", or "1-5".all

Implementation Reference

  • Main handler function for brand_extract_pdf tool. Reads the .brand/ directory config and existing identity, calls extractPdfBrandData from pdf-extractor, merges extracted colors/typography/spacing with priority over existing identity, writes updated identity to core-identity.yaml, updates source catalog, and returns extraction results.
    async function handler(input: Params) {
      const brandDir = new BrandDir(process.cwd());
    
      if (!(await brandDir.exists())) {
        return buildResponse({
          what_happened: "No .brand/ directory found",
          next_steps: ["Run brand_start or brand_init first to create the brand system"],
          data: { error: ERROR_CODES.NOT_INITIALIZED },
        });
      }
    
      const config = await brandDir.readConfig();
      const sourcePriority = getConfiguredSourcePriority(config);
      const identity = await brandDir.readCoreIdentity();
      let extracted;
      try {
        extracted = await extractPdfBrandData(input.file_path, input.pages);
      } catch (error) {
        return buildResponse({
          what_happened: `PDF extraction failed for ${input.file_path}`,
          next_steps: [
            "Check that the file exists and is a readable PDF",
            "If the PDF is image-only, try a selectable-text export or narrower page range",
          ],
          data: {
            success: false,
            error: ERROR_CODES.FETCH_FAILED,
            details: error instanceof Error ? error.message : String(error),
          },
        });
      }
    
      let colors = [...identity.colors];
      for (const color of extracted.colors) {
        colors = mergeColorWithPriority(colors, color, sourcePriority);
      }
    
      let typography = [...identity.typography];
      for (const entry of extracted.typography) {
        typography = mergeTypographyWithPriority(typography, entry, sourcePriority);
      }
    
      const spacing = extracted.spacing && (
        !identity.spacing
        || sourcePriority.indexOf("guidelines") <= sourcePriority.indexOf(identity.spacing.source)
      )
        ? extracted.spacing
        : identity.spacing;
    
      await brandDir.writeCoreIdentity({
        ...identity,
        colors,
        typography,
        spacing,
      });
    
      await upsertSourceCatalog(
        brandDir,
        buildSourceCatalogRecords({
          colors: extracted.colors,
          typography: extracted.typography,
          spacing: extracted.spacing,
        }),
      );
    
      return buildResponse({
        what_happened: `PDF guideline extraction complete for ${extracted.filePath}`,
        next_steps: [
          "Review the extracted colors, typography, spacing, and rule snippets",
          "Run brand_resolve_conflicts with mode \"show\" to inspect differences against existing sources",
          "Run brand_compile to refresh tokens, DESIGN.md, and runtime artifacts",
        ],
        data: {
          success: true,
          file_path: extracted.filePath,
          pages: extracted.pages,
          page_count: extracted.pageCount,
          extracted: {
            colors: extracted.colors,
            typography: extracted.typography,
            spacing: extracted.spacing,
            logos: extracted.logos,
            brand_rules: extracted.rules,
          },
        },
      });
    }
  • Registers the 'brand_extract_pdf' tool on the MCP server with schema params (file_path, pages), a description string explaining usage, and a wrapper that validates inputs via Zod schema before delegating to the handler.
    export function register(server: McpServer) {
      server.tool(
        "brand_extract_pdf",
        "Extract brand colors, typography, spacing, and guideline rules from a PDF brand guidelines document. Accepts a local file path to a PDF. Uses text extraction and pattern matching to identify hex color values, font names, size specifications, and spacing rules. Writes extracted values to core-identity.yaml with source='guidelines' and updates source-catalog.json. Guidelines source outranks web extraction by default based on brand.config.yaml source_priority. Use when the user has brand guidelines as a PDF file — this is the most accurate extraction source. Use after brand_extract_web to merge authoritative guideline values with web-extracted data. Run brand_resolve_conflicts afterward to review any disagreements between sources. NOT for website extraction — use brand_extract_web. NOT for Figma — use brand_extract_figma.",
        paramsShape,
        async (args) => {
          const result = safeParseParams(ParamsSchema, args);
          if (!result.success) return result.response;
          return handler(result.data);
        },
      );
    }
  • Zod schema defining input parameters: file_path (required string, path to PDF) and pages (optional string, default 'all', supports '3' or '1-5' range).
    const paramsShape = {
      file_path: z.string().min(1).describe("Path to a PDF brand guidelines document."),
      pages: z.string().default("all").describe('Page range to parse: "all", "3", or "1-5".'),
    };
    
    const ParamsSchema = z.object(paramsShape);
    type Params = z.infer<typeof ParamsSchema>;
  • Core PDF extraction logic using pdfjs-dist. Reads the PDF file, parses pages, extracts text content, then runs extractors for colors (hex values), typography (font families/sizes), spacing (base unit/scale), logo hints, and rules (dos/donts/guidance). Returns a PdfBrandExtraction object.
    export async function extractPdfBrandData(filePath: string, pages = "all"): Promise<PdfBrandExtraction> {
      const resolvedPath = resolve(filePath);
      const dataBuffer = await readFile(resolvedPath);
    
      // Dynamic import — pdfjs-dist is ESM-only in v5+
      const pdfjsLib = await import("pdfjs-dist/legacy/build/pdf.mjs");
    
      const loadingTask = pdfjsLib.getDocument({
        data: new Uint8Array(dataBuffer),
        useSystemFonts: true,
      });
      const doc = await loadingTask.promise;
    
      const range = parsePageRange(pages);
      const maxPage = range.end ?? doc.numPages;
      const pageTexts: string[] = [];
    
      for (let i = range.start; i <= Math.min(maxPage, doc.numPages); i++) {
        const page = await doc.getPage(i);
        const textContent = await page.getTextContent({
          includeMarkedContent: false,
        });
        const parts: string[] = [];
        let lastY = 0;
        for (const item of textContent.items) {
          if ("str" in item) {
            const y = (item as { str: string; transform: number[] }).transform[5];
            if (lastY && y !== lastY) {
              parts.push(`\n${item.str}`);
            } else {
              parts.push(item.str);
            }
            lastY = y;
          }
        }
        pageTexts.push(parts.join(""));
      }
    
      const text = pageTexts.join("\n\n").replace(/\n{3,}/g, "\n\n").trim();
    
      return {
        filePath: resolvedPath,
        pages,
        pageCount: doc.numPages,
        text,
        colors: extractColors(text),
        typography: extractTypography(text),
        spacing: extractSpacing(text),
        logos: extractLogoHints(text),
        rules: extractRules(text),
      };
    }
  • src/server.ts:39-64 (registration)
    Import and registration call for brand_extract_pdf in the main server setup, placed in the 'Session 1: Core Identity' section.
    import { register as registerExtractPdf } from "./tools/brand-extract-pdf.js";
    import { register as registerResolveConflicts } from "./tools/brand-resolve-conflicts.js";
    import { register as registerBrandcodeAuth } from "./tools/brand-brandcode-auth.js";
    import { register as registerBrandcodeConnect } from "./tools/brand-brandcode-connect.js";
    import { register as registerBrandcodeSync } from "./tools/brand-brandcode-sync.js";
    import { register as registerBrandcodeStatus } from "./tools/brand-brandcode-status.js";
    import { register as registerBrandcodeLive } from "./tools/brand-brandcode-live.js";
    import { register as registerRepoConnect } from "./tools/brand-repo-connect.js";
    import { register as registerRepoStatus } from "./tools/brand-repo-status.js";
    import { register as registerEnrichSkill } from "./tools/brand-enrich-skill.js";
    
    export function createServer(): McpServer {
      const server = new McpServer({
        name: "brandsystem",
        version: getVersion(),
      });
    
      // ── Entry points (register first — agents see these first) ──
      registerStart(server);       // #1: Entry point for new brands
      registerStatus(server);      // #2: "What can I do?" / resume point
    
      // ── Session 1: Core Identity ──
      registerExtractWeb(server);  // Extract from website (CSS parsing)
      registerExtractVisual(server); // Extract from website (headless Chrome + vision)
      registerExtractSite(server); // Extract from website (multi-page rendered crawl)
      registerExtractPdf(server); // Extract from PDF guidelines
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It details that it writes to core-identity.yaml and source-catalog.json, uses text extraction and pattern matching, and that guideline source outranks web extraction. It could be more explicit about side effects like overwriting, but covers key behaviors well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph with 9 sentences, starting with the main action. It is fairly concise but could be more structured (e.g., using separate sections or bullet points for purpose, usage, and exclusions). Most sentences add value, though some repetition of writing to files could be streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains what is extracted and where it is stored. It covers the tool's purpose, input, processing, output files, source priority, and usage order. Missing details on error handling or performance, but overall complete for a specialized extraction tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with both parameters documented. The description adds minimal extra meaning beyond the schema mentions (e.g., 'Accepts a local file path'). Baseline 3 is appropriate as the description does not significantly enhance understanding of the parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it extracts brand colors, typography, spacing, and guideline rules from a PDF. It uses a specific verb ('Extract') and resource ('PDF brand guidelines'), and distinguishes from siblings like brand_extract_web and brand_extract_figma by explicitly stating what it is NOT for.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use: when the user has a PDF brand guidelines document. It indicates this is the most accurate source and provides ordering instructions: use after brand_extract_web, run brand_resolve_conflicts afterward. It also states not for website or Figma extraction, directing to other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Brandcode-Studio/brandsystem-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server