extract_entities
Extract named entities, linked concepts, and sameAs graph nodes from page content or raw text using rule-based heuristics. Build entity maps for schema generation or audit entity coverage.
Instructions
Extract named entities, linked concepts, and sameAs graph nodes from a page's content and structured data. Combines body-text NER heuristics with JSON-LD @type / sameAs walking.
Read-only. When given url, it issues a single HTTP GET; when given text, it makes no network requests.
Deterministic, rule-based; no LLM. Output is a list of entities with type, confidence, and any sameAs URIs found in structured data.
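The JSON-LD half of this extraction can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function names `walk_jsonld` and `extract_structured_entities`, the regex used to locate script blocks, and the output keys are all assumptions made for illustration. The body-text NER heuristics and the confidence score are deliberately left out, since their rules are not documented here.

```python
import json
import re

# Hypothetical pattern for locating JSON-LD blocks; a real parser may be more robust.
JSONLD_SCRIPT = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def walk_jsonld(node, found):
    """Recursively collect every JSON-LD node that carries an @type."""
    if isinstance(node, list):
        for item in node:
            walk_jsonld(item, found)
    elif isinstance(node, dict):
        if "@type" in node:
            same_as = node.get("sameAs", [])
            if isinstance(same_as, str):
                # sameAs may be a single URI or a list of URIs
                same_as = [same_as]
            found.append({
                "name": node.get("name"),
                "type": node["@type"],
                "sameAs": list(same_as),
            })
        # Descend into nested values (e.g. founder, author, @graph)
        for value in node.values():
            walk_jsonld(value, found)
    return found


def extract_structured_entities(html):
    """Return typed entities found in all JSON-LD script blocks of an HTML string."""
    entities = []
    for match in JSONLD_SCRIPT.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        walk_jsonld(data, entities)
    return entities
```

Because the walk is a plain recursive traversal with no randomness, the same input always yields the same list in the same order, which matches the deterministic behavior described above.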
When to use: building an entity map for schema generation, or auditing whether a page's entities match its target topic. To validate the JSON-LD itself, use audit_schema.
Either url or text must be provided.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | No | Public URL to fetch and analyze. Either this OR `text` is required. | |
| text | No | Raw text/HTML to analyze directly. Either this OR `url` is required. | |
| respect_robots | No | If true, respect robots.txt when fetching `url`. Ignored when `text` is used. | true |
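
The input rules in the table can be checked client-side before a call is made. The sketch below is an assumption about how a caller might do that; `validate_arguments` is a hypothetical helper, not part of the tool. The source does not say what happens when both `url` and `text` are supplied, so the sketch does not reject that case.

```python
def validate_arguments(args):
    """Check extract_entities arguments against the documented input schema."""
    has_url = bool(args.get("url"))
    has_text = bool(args.get("text"))
    if not (has_url or has_text):
        raise ValueError("Either url or text must be provided.")
    normalized = dict(args)
    # respect_robots defaults to true; it only matters when url is used
    normalized.setdefault("respect_robots", True)
    return normalized


# Example argument sets (the URL is a placeholder)
by_url = validate_arguments({"url": "https://example.com/article"})
by_text = validate_arguments({"text": "<p>Ada Lovelace worked with Charles Babbage.</p>"})
```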