Glama
166,954 tools. Last updated 2026-06-02 20:00

"Tools and methods for extracting data from PDF files" matching MCP tools:

  • Read the **text content** of an attached file. Works for .txt, .md, .json, code files, and PDFs (after `files.ingest` extracts the text). DO NOT call on binary files: for IMAGES use `files.get_base64`; AUDIO/VIDEO cannot be transcribed via this tool; for non-PDF DOCUMENTS run `files.ingest` first, THEN `files.read`. Calling it on a binary MIME type returns an error, so reading this routing hint before deciding saves a turn.
    Connector
  • Returns the four behavioral data-source buckets - Search & attention, Conversation & pain, Adoption & spend, Capital & hiring - with each bucket's tagline and what it captures. Use when a user asks "what data sources do you use?", "where does the Demand Score come from?", or wants to understand how Demand Discovery AI differs from passive validation tools (which only triangulate the first two buckets). This four-bucket framing is the core competitive moat. The specific connector list is intentionally not public. Trigger phrases: "what data sources", "where does the demand score come from", "behavioral data sources", "the four buckets", "search and attention bucket", "conversation and pain bucket", "adoption and spend bucket", "capital and hiring bucket", "how many data sources", "what kind of data sources".
    Connector
  • Get a human's FULL profile including contact info (email, Telegram, Signal), crypto wallets, fiat payment methods (PayPal, Venmo, etc.), and social links. Requires agent_key from register_agent. Rate limited: PRO = 50/day. Alternative: $0.05 via x402. Use this before create_job_offer to see how to pay the human. The human_id comes from search_humans results.
    Connector
  • Fetch the result JSON for a completed brand audit. With `target` set, returns the per-target CheckResult; without, returns the audit-level aggregate. Returns notReady when polling an in-flight audit. When a rendered PDF sidecar exists and the R2 binding is configured, metadata includes a signed PDF URL; completed targets without a PDF URL include pdfPending so callers can poll again.
    Connector
  • Download a completed report as PDF. Returns base64-encoded PDF content. Confirm report status='completed' via atlas_get_report(report_id) first. report_id from atlas_start_report response or atlas_list_reports. Free.
    Connector
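
Several of the tools above return a report as base64-encoded PDF content. As a minimal sketch of how a caller might handle such a payload, the Python below decodes the text and checks for the `%PDF-` header before writing it to disk. The function names `decode_pdf_payload` and `save_pdf` are hypothetical, and the listing does not say which response field carries the base64 string, so the caller is assumed to have extracted it already.

```python
import base64
from pathlib import Path

# Every PDF file begins with this marker, followed by a version number.
PDF_MAGIC = b"%PDF-"


def decode_pdf_payload(b64_content: str) -> bytes:
    """Decode base64 text and verify that the result looks like a PDF.

    Hypothetical helper: the tool listing only says the content is
    base64-encoded, not how the response is structured.
    """
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    raw = base64.b64decode(b64_content, validate=True)
    if not raw.startswith(PDF_MAGIC):
        raise ValueError("decoded payload does not start with the PDF header")
    return raw


def save_pdf(b64_content: str, destination: str) -> int:
    """Decode the payload and write it to `destination`.

    Returns the number of bytes written.
    """
    data = decode_pdf_payload(b64_content)
    return Path(destination).write_bytes(data)
```

Checking the header before saving is a cheap way to catch an error message or a JSON body that was passed along in place of the document.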

Matching MCP Connectors

  • Markdown to PDF: headings, bold, code, lists, rules. A4/Letter/Legal. Free 30/hr. MCP + REST.

  • Send transactional PDFs for AI agents via SMTP. Templates included.

  • Upload one or more images to a Wix site's Media Manager. Returns the uploaded file URL (wixstatic.com) and media ID usable in other Wix APIs. ⚠️ You MUST provide image data — calling this tool without image data will fail. ⚠️ NEVER call this tool more than once when uploading multiple images. Always pass ALL images together in a single call using the image array. Choose ONE of the two supported input methods: Option A — image array (use when the user attaches image files OR provides image URLs): Pass siteId + image array with ALL images at once. Each item requires download_url. If you are a ChatGPT/OpenAI client: user-attached files are automatically resolved to download_urls — just pass them in the image array. Even for a single image, wrap it in an array. Option B — imageBase64 (use only when you can read and encode the file yourself): Read the file, encode it as base64, and pass siteId + imageBase64 + mimeType. Supports one image at a time.
    Connector
  • Find working SOURCE CODE examples from 37 indexed Senzing GitHub repositories. REQUIRED: either `query` (string, for search) or `repo` with `file_path` or `list_files=true` — the call WILL FAIL without one. Three modes: (1) Search: pass `query` to find examples across all repos, (2) File listing: pass `repo` + `list_files=true`, (3) File retrieval: pass `repo` + `file_path`. Indexes source code (.py, .java, .cs, .rs) and READMEs — NOT build/data files. For sample data, use get_sample_data. Covers Python, Java, C#, Rust SDK patterns: initialization, ingestion, search, redo, configuration, message queues, REST APIs. Use max_lines to limit large files. Returns GitHub raw URLs for file retrieval.
    Connector
  • Confirm a narrative lens and generate targeted CV edits with trade-offs (5 credits, takes 20-30s). Returns an array of section edits with before/after text, trade-off notes, and optionally clean + review PDF download URLs. This is step 3 (final step) of the positioning pipeline. Pass confirmed_lens from ceevee_analyze_positioning, and optionally positioning_snapshot, detected_lens_full, recruiter_inference, selected_opportunities from prior steps for richer edits. Use ceevee_explain_change to understand any specific edit.
    Connector
  • Trigger background datasheet extraction for multiple parts at once (up to 20). Non-blocking: returns immediately with the status of each part. Use this to warm up datasheets for a BOM before calling read_datasheet. Example: prefetch_datasheets(['TPS54302', 'ADS1115', 'LP5907']). If a part comes back 'no_source' on the first call, retry prefetch for that MPN once after 10-30s; the URL resolver is retriable and often finds a source on the second pass. If still 'no_source', use request_datasheet_upload + confirm_datasheet_upload to attach your own PDF (org-private). Part numbers must be specific MPNs (e.g. 'STM32F446RCT6', 'TPS54302DDCR') or LCSC numbers (e.g. 'C2837938'). Do NOT pass bare values ('100nF', '10K'), descriptions, BOM reference designators, test points, or board/module names; see the server instructions for the full rule set. When a BOM has values-only rows, use search_parts first to resolve each to an MPN.
    DATASHEET STATUS VALUES:
      - 'ready': extracted and indexed; call read_datasheet, search_datasheets, or analyze_image.
      - 'extracting' / 'in_progress' / 'queued' / 'pending': extraction running or scheduled. Poll check_extraction_status every 5-10s until 'ready' or 'failed'. Typical time: 30s-2min.
      - 'not_extracted': known part but datasheet hasn't been fetched yet. Trigger it via prefetch_datasheets (cheapest) or by calling read_datasheet (auto-triggers on first read).
      - 'no_source': we couldn't find a public datasheet URL for this MPN. First, retry prefetch_datasheets in 10-30s (the URL resolver re-runs and often finds a source on the second pass). If still 'no_source', the agent can upload the PDF manually via request_datasheet_upload + confirm_datasheet_upload (see those tools). Org-uploaded datasheets are private to the org.
      - 'unsupported': PDF exists but can't be extracted (scanned image-only, encrypted, or corrupted). Upload a clean text-based PDF via request_datasheet_upload to override.
      - 'failed' / 'error': extraction errored. The response includes the error reason. Retry via prefetch_datasheets or escalate to support.
      - 'rejected': input wasn't a real MPN (bare value like '100nF', description, or reference designator). Fix the input and re-call.
      - 'deduplicated': another part in the family already has this datasheet; same content is returned under the primary MPN.
    Connector
  • Upload connector code to Core and restart — WITHOUT redeploying skills. Use this to update connector source code (server.js, UI assets, plugins) quickly. Set github=true to pull files from the solution's GitHub repo, or pass files directly. Much faster than ateam_build_and_run for connector-only changes.
    Connector
  • Get contents of multiple files from a remote public git repository in a single call. Reduces round-trips when you need to read several related files. Max 10 files per batch, 5000 total lines budget across all files. Each file supports optional line ranges. Failed files return per-file errors without blocking other files.
    Connector
  • Starts a crawl job on a website and extracts content from all pages.
    **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
    **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
    **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
    **Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
    **Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
    **Usage Example:**
    ```json
    {
      "name": "firecrawl_crawl",
      "arguments": {
        "url": "https://example.com/blog/*",
        "maxDiscoveryDepth": 5,
        "limit": 20,
        "allowExternalLinks": false,
        "deduplicateSimilarURLs": true,
        "sitemap": "include"
      }
    }
    ```
    **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
    **Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.
    Connector
  • Headline findings from Wiremi's public-data report, "The State of ROSCAs in the Canadian Diaspora 2026": immigrant population from rotating-savings cultures, the credit-invisibility gap measured by Statistics Canada, why ROSCA payments are invisible to credit bureaus, and the 70-year history of ROSCAs in Canada. Every figure is sourced to public data (Statistics Canada, World Bank, peer-reviewed research). Returns the canonical report URL and PDF so callers can cite the source. No personal data.
    Connector
  • Search for electronic components by part number, description, or keyword. Start here: this is the best entry point for finding components. Queries all configured providers in parallel. Results are merged by MPN with indicative pricing and stock from each source. Each result includes datasheet_status so you know which parts have datasheets available for read_datasheet. Best with specific part numbers or keywords (e.g. 'STM32F103', 'buck converter 3A'). For spec-based discovery in natural language, use search_datasheets instead. When the calling org has a private parts library, matching org-uploaded parts are appended to the results with source='private_library' and any tags the team has applied, including private parts whose MPN, manufacturer, description, type, category, or tag matches the query.
    DATASHEET STATUS VALUES:
      - 'ready': extracted and indexed; call read_datasheet, search_datasheets, or analyze_image.
      - 'extracting' / 'in_progress' / 'queued' / 'pending': extraction running or scheduled. Poll check_extraction_status every 5-10s until 'ready' or 'failed'. Typical time: 30s-2min.
      - 'not_extracted': known part but datasheet hasn't been fetched yet. Trigger it via prefetch_datasheets (cheapest) or by calling read_datasheet (auto-triggers on first read).
      - 'no_source': we couldn't find a public datasheet URL for this MPN. First, retry prefetch_datasheets in 10-30s (the URL resolver re-runs and often finds a source on the second pass). If still 'no_source', the agent can upload the PDF manually via request_datasheet_upload + confirm_datasheet_upload (see those tools). Org-uploaded datasheets are private to the org.
      - 'unsupported': PDF exists but can't be extracted (scanned image-only, encrypted, or corrupted). Upload a clean text-based PDF via request_datasheet_upload to override.
      - 'failed' / 'error': extraction errored. The response includes the error reason. Retry via prefetch_datasheets or escalate to support.
      - 'rejected': input wasn't a real MPN (bare value like '100nF', description, or reference designator). Fix the input and re-call.
      - 'deduplicated': another part in the family already has this datasheet; same content is returned under the primary MPN.
    Connector
  • List available markdown holdings reports for Bulgarian pension funds. Reports contain detailed portfolio holdings data extracted from official PDF filings and converted to structured markdown with metadata (allocation %, exposure, top holdings). Use this tool to discover what reports are available before loading specific ones with `read_holdings_report`. Filter by manager, fund type, or date range.
    Connector
  • Get full details for a specific electronic component by manufacturer part number (MPN) or LCSC number. Returns specs, pricing, and stock from all configured providers, plus the cached datasheet summary if available. Includes datasheet_status and available_sections when ready. Set prefetch_datasheet=true to trigger background extraction at no extra charge. Use after search_parts to drill into a specific result. The part_number must be a specific manufacturer part number (e.g. 'TPS54302DDCR', 'STM32F446RCT6') or LCSC number (e.g. 'C2837938'). Do NOT pass bare component values ('100nF', '10K'), descriptions ('buck converter'), or reference designators ('R1', 'U3').
    DATASHEET STATUS VALUES:
      - 'ready': extracted and indexed; call read_datasheet, search_datasheets, or analyze_image.
      - 'extracting' / 'in_progress' / 'queued' / 'pending': extraction running or scheduled. Poll check_extraction_status every 5-10s until 'ready' or 'failed'. Typical time: 30s-2min.
      - 'not_extracted': known part but datasheet hasn't been fetched yet. Trigger it via prefetch_datasheets (cheapest) or by calling read_datasheet (auto-triggers on first read).
      - 'no_source': we couldn't find a public datasheet URL for this MPN. First, retry prefetch_datasheets in 10-30s (the URL resolver re-runs and often finds a source on the second pass). If still 'no_source', the agent can upload the PDF manually via request_datasheet_upload + confirm_datasheet_upload (see those tools). Org-uploaded datasheets are private to the org.
      - 'unsupported': PDF exists but can't be extracted (scanned image-only, encrypted, or corrupted). Upload a clean text-based PDF via request_datasheet_upload to override.
      - 'failed' / 'error': extraction errored. The response includes the error reason. Retry via prefetch_datasheets or escalate to support.
      - 'rejected': input wasn't a real MPN (bare value like '100nF', description, or reference designator). Fix the input and re-call.
      - 'deduplicated': another part in the family already has this datasheet; same content is returned under the primary MPN.
    Connector
  • Download a completed CeeVee report as PDF. Returns base64-encoded PDF content. Only call after ceevee_get_report confirms status='completed'. report_id from ceevee_generate_report response or ceevee_list_reports. Free.
    Connector
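
The datasheet status values described in the component tools above amount to a small decision table plus a polling loop. The Python sketch below shows one way a client might encode them. The action labels (`"read"`, `"poll"`, and so on), the `"timeout"` sentinel, and the function names are hypothetical. The `check` callable stands in for whatever wrapper the caller has around check_extraction_status; its signature is an assumption, not part of the listing.

```python
import time
from typing import Callable

# Statuses that mean extraction is still running or scheduled.
IN_PROGRESS = {"extracting", "in_progress", "queued", "pending"}

# Hypothetical action labels, one per status group in the tool description.
ACTIONS = {
    "ready": "read",                   # call read_datasheet / search_datasheets
    "not_extracted": "prefetch",       # trigger via prefetch_datasheets
    "no_source": "retry_then_upload",  # retry in 10-30s, then upload own PDF
    "unsupported": "upload",           # upload a clean text-based PDF
    "failed": "retry_or_escalate",
    "error": "retry_or_escalate",
    "rejected": "fix_input",           # input was not a real MPN
    "deduplicated": "read",            # content served under the primary MPN
}


def next_action(status: str) -> str:
    """Map a datasheet status to a suggested next step."""
    if status in IN_PROGRESS:
        return "poll"
    if status not in ACTIONS:
        raise ValueError(f"unknown datasheet status: {status!r}")
    return ACTIONS[status]


def wait_until_settled(
    check: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 24,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `check` until the status leaves the in-progress group.

    Defaults follow the description: poll every 5-10s, typical
    completion in 30s-2min (24 polls at 5s is about two minutes).
    Returns the final status, or the hypothetical sentinel "timeout".
    """
    for _ in range(max_polls):
        status = check()
        if status not in IN_PROGRESS:
            return status
        sleep(interval_s)
    return "timeout"
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a caller would normally leave it at the default.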