Skip to main content
Glama
131,188 tools. Last updated 2026-05-07 21:48

"A tool for uploading and analyzing documents, extracting text from PDFs, and conducting research" matching MCP tools:

  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Primary tool for reading a filing's content. Pass a `document_id` from `list_filings` / `get_financials`. MANDATORY for any substantive answer - filing metadata (dates, form codes, descriptions) alone doesn't answer the user; the numbers and text live inside the document. ── RESPONSE SHAPES ── • `kind='embedded'` (PDF up to ~20 MB; structured text up to `max_bytes`): returns `bytes_base64` with the full document, `source_url_official` (evergreen registry URL for citation, auto-resolved), and `source_url_direct` (short-TTL signed proxy URL). For PDFs the host converts bytes into a document content block - you read it natively including scans. • `kind='resource_link'` (document exceeds `max_bytes`): NO `bytes_base64`. Returns `reason`, `next_steps`, the two source URLs, plus `index_preview` for PDFs (`{page_count, text_layer, outline_present, index_status}`). Use the navigation tools below. ── WORKFLOW FOR kind='resource_link' ── 1. Read `index_preview.text_layer`. Values: `full` (every page has real text), `partial` (mixed), `none` (scanned / image-only), `oversized_skipped` (indexing skipped), `encrypted` / `failed`. 2. If `full` / `partial`: call `get_document_navigation` (outline + previews + landmarks) and/or `search_document` to locate pages. If `none` / `oversized_skipped`: skip search. 3. Call `fetch_document_pages(pages='N-M', format='pdf'|'text'|'png')` to get actual content. Prefer `pdf` for citations, `text` for skim, `png` for scanned or oversized. ── CRITICAL RULES ── • **Navigation-aids-only**: previews, snippets, landmark matches, and outline titles returned by the navigation tools are for LOCATING pages. NEVER cite them as source material - quote only from `fetch_document_pages` output or this tool's inline bytes. • **No fallback to memory**: if this tool fails (rate limit, 5xx, disconnect), do NOT fill in names / numbers / dates from training data. Tell the user what failed and offer retry or `source_url_official`. • Don't reflexively retry with a larger `max_bytes` - for big PDFs the bytes are unreadable to you anyway. Use the navigation tools instead. `source_url_official` is auto-resolved from a session-side cache populated by the most recent `list_filings` call. The optional `company_id` / `transaction_id` / `filing_type` / `filing_description` inputs are OVERRIDES for the rare case where `document_id` didn't come through `list_filings`. Per-country document availability, format, and pricing - call `list_jurisdictions({jurisdiction:"<code>"})`.
    Connector
  • Retrieves the latest real-time news headlines and article summaries from BBC News and The Guardian across nine topic categories. Returns structured articles with headline, description, source name, article URL, and publication date — sorted most recent first. No API key required. Use this tool when an agent needs current news about a specific topic, wants to summarise today's headlines, needs to research recent events, monitor a subject area for new developments, or build a news briefing. Do not use this tool to read the full content of a specific article — use web_url_reader instead, passing the article URL returned by this tool. Do not use when news from sources outside BBC News and The Guardian is required.
    Connector
  • Expand one author into a deduplicated paper list. This is the main author->paper traversal tool and supports research filters. Use `author_id` when you already know the exact author, or `author_name` plus `candidate_index` after `scholarfetch_author_candidates`. Supported comma-separated `filters`: year>=YYYY, year<=YYYY, year=YYYY, has:abstract, has:doi, has:pdf, venue:<text>, title:<text>, doi:<text>. If you pass `engines`, it must include `openalex`.
    Connector
  • Retrieves the latest real-time news headlines and article summaries from BBC News and The Guardian across nine topic categories. Returns structured articles with headline, description, source name, article URL, and publication date — sorted most recent first. No API key required. Use this tool when an agent needs current news about a specific topic, wants to summarise today's headlines, needs to research recent events, monitor a subject area for new developments, or build a news briefing. Do not use this tool to read the full content of a specific article — use web_url_reader instead, passing the article URL returned by this tool. Do not use when news from sources outside BBC News and The Guardian is required.
    Connector
  • Return specific pages of a PDF in one of three formats: • format='pdf' - pdf-lib page slice, preserves the original text layer and fonts (no re-encoding). This is the ONLY format that gives you byte-exact, citation-grade content. Use this for financial numbers, legal quotes, and any answer requiring precision. • format='text' - raw extracted text from pdfjs. Machine-readable but NOT authoritative - OCR errors on bad-quality text layers can silently garble digits. Use only for summarisation / light reading, and cross-check numbers by re-fetching with format='pdf'. • format='png' - page rasterization via Cloudflare Browser Rendering, for documents with text_layer='none' (scanned PDFs). Phase 6 - may return 'not implemented' in current deployment. The response includes at most 100 pages (Anthropic document-block hard cap). Split larger ranges into multiple calls. Requires the document's bytes to already be cached - call fetch_document on the full document first if this is a new filing.
    Connector

Matching MCP Servers

Matching MCP Connectors

  • UK property research tools - crime stats, schools, demographics, valuations for AI.

  • US visa bulletin data and CBP border wait times. 3 MCP tools for immigration and travel planning.

  • Query SEC filings and financial documents from US capital markets and exchanges. This tool searches through 10-K annual reports, 10-Q quarterly reports, 8-K current reports, proxy statements, earnings call transcripts, investor presentations, and other SEC-mandated filings from US companies. Use for questions about US company financials, executive compensation, business operations, or regulatory disclosures. Limited to official SEC filings and related documents only.
    Connector
  • List all generated reports with status and summary info. Returns an array of report objects with id, report_type, status, title, and summary. Use the report id with atlas_get_report for details or atlas_download_report to download completed PDFs. Free.
    Connector
  • Retrieves AI-generated summaries of web search results using Brave's Summarizer API. This tool processes search results to create concise, coherent summaries of information gathered from multiple sources. When to use: - When you need a concise overview of complex topics from multiple sources - For quick fact-checking or getting key points without reading full articles - When providing users with summarized information that synthesizes various perspectives - For research tasks requiring distilled information from web searches Returns a text summary that consolidates information from the search results. Optional features include inline references to source URLs and additional entity information. Requirements: Must first perform a web search using brave_web_search with summary=true parameter. Requires a Pro AI subscription to access the summarizer functionality.
    Connector
  • Add a document to a deal's data room. Creates the deal if needed. This is the primary way to get documents into Sieve for screening. Upload a pitch deck, financials, or any document -- then call sieve_screen to analyze everything in the data room. Provide company_name to create a new deal (or find existing), or deal_id to add to an existing deal. Provide exactly one content source: file_path (local file), text (raw text/markdown), or url (fetch from URL). Args: title: Document title (e.g. "Pitch Deck Q1 2026"). company_name: Company name -- creates deal if new, finds existing if not. deal_id: Add to an existing deal (from sieve_deals or previous sieve_dataroom_add). website_url: Company website URL (used when creating a new deal). document_type: Type: 'pitch_deck', 'financials', 'legal', or 'other'. file_path: Path to a local file (PDF, DOCX, XLSX). The tool reads and uploads it. text: Raw text or markdown content (alternative to file). url: URL to fetch document from (alternative to file).
    Connector
  • Delete a single item by id. `kind` MUST match the item type: 'text' for text nodes, 'line' for freehand strokes, 'image' for images — the wrong kind silently targets the wrong table and is a common mistake. Get the id + type from `get_board` (texts[], lines[], images[]). There is no bulk/erase-all tool: loop if you need to delete multiple items.
    Connector
  • Extract text from PDFs and images as clean Markdown. Uses Mistral OCR — handles complex layouts, tables, handwriting, multi-column documents, and mathematical notation. Preserves document hierarchy in structured Markdown. 10 sats/page. Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='extract_document' and quantity=pageCount for multi-page PDFs.
    Connector
  • Verify content authenticity by its SHA-256 hash. Use this when a user has a file and wants to check if it's been registered with ProofX without uploading it. The user should compute the SHA-256 hash of their file first.
    Connector
  • Create a job description from text within a hiring context. Returns a JD object with 'id' and stored content. Use JD content as jd_text in atlas_fit_match, atlas_fit_rank, atlas_start_jd_fit_batch, and atlas_start_jd_analysis. Requires context_id from atlas_create_context or atlas_list_contexts. Free.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Starts a crawl job on a website and extracts content from all pages. **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage. **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow). **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control. **Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended. **Prompt Example:** "Get all blog posts from the first two levels of example.com/blog." **Usage Example:** ```json { "name": "firecrawl_crawl", "arguments": { "url": "https://example.com/blog/*", "maxDiscoveryDepth": 5, "limit": 20, "allowExternalLinks": false, "deduplicateSimilarURLs": true, "sitemap": "include" } } ``` **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress. **Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.
    Connector
  • Search Hansard for parliamentary debates, questions, and speeches. Returns contributions from MPs and Lords including date, party, debate title, and text (capped at 3000 chars per contribution). Useful for understanding legislative intent or political context.
    Connector
  • List all positioning sessions (market analysis through lens selection to targeted edits). Returns an array of session objects with id, status, cv_version_id, and created_at. Use the session id with ceevee_get_positioning_session for full details including analysis results, edits, and PDFs. Free.
    Connector
  • Read the contents of an attached file directly. Use this when the user asks 'what is in this file?' or 'read this document'. Works for text files (.txt, .md, .json, code files, etc.) and PDFs (returns OCR-extracted text after files.ingest). For images, use files.get_base64.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector