get_page_markdown
Extract clean markdown content from web pages by removing navigation, headers, footers, and sidebars while preserving the formatting of the main content and, optionally, its images and links.
Instructions
Extract clean markdown content from a URL. Returns only the main content without navigation, headers, footers, or sidebars.
Input Schema
Name | Required | Description | Default |
---|---|---|---|
includeImages | No | Whether to include image references in markdown | true |
includeLinks | No | Whether to include hyperlinks in markdown | true |
timeout | No | Navigation timeout in milliseconds | 30000 |
url | Yes | The URL to extract markdown from | |
waitForSelector | No | Optional CSS selector to wait for before extracting content | |
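To make the parameters above concrete, here is a minimal sketch of how a client might assemble a call to this tool. It assumes the tool is exposed through an MCP-style JSON-RPC `tools/call` request, which this page does not state explicitly; the helper name `build_call` and the example URL are hypothetical. Only options the caller sets are sent, on the assumption that the server applies the documented defaults itself.

```python
import json

# Optional parameters taken from the table above. Defaults are assumed to be
# applied server-side, so the client only sends what it overrides.
OPTIONAL_PARAMS = {"includeImages", "includeLinks", "timeout", "waitForSelector"}


def build_call(url, request_id=1, **options):
    """Build a hypothetical MCP-style JSON-RPC request for get_page_markdown."""
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    # `url` is the only required argument.
    arguments = {"url": url, **options}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # assumption: MCP-style transport
        "params": {"name": "get_page_markdown", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_call(
        "https://example.com/article",  # placeholder URL
        includeImages=False,
        waitForSelector="article",
    )
    print(json.dumps(request, indent=2))
```

Setting `waitForSelector` is likely most useful for pages that render their main content with JavaScript after the initial navigation; for static pages it can usually be omitted.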
Input Schema (JSON Schema)
{
  "properties": {
    "includeImages": {
      "default": true,
      "description": "Whether to include image references in markdown (default: true)",
      "type": "boolean"
    },
    "includeLinks": {
      "default": true,
      "description": "Whether to include hyperlinks in markdown (default: true)",
      "type": "boolean"
    },
    "timeout": {
      "default": 30000,
      "description": "Navigation timeout in milliseconds (default: 30000)",
      "type": "number"
    },
    "url": {
      "description": "The URL to extract markdown from",
      "type": "string"
    },
    "waitForSelector": {
      "description": "Optional CSS selector to wait for before extracting content",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
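As an illustration of what the schema above enforces, the following sketch checks a set of arguments against it and fills in the declared defaults. It is a hand-rolled subset of JSON Schema validation (only `required`, `type`, and `default` are handled), not the server's actual validation logic; the function names `validate` and `with_defaults` are hypothetical.

```python
# Trimmed copy of the schema above (descriptions omitted for brevity).
SCHEMA = {
    "properties": {
        "includeImages": {"default": True, "type": "boolean"},
        "includeLinks": {"default": True, "type": "boolean"},
        "timeout": {"default": 30000, "type": "number"},
        "url": {"type": "string"},
        "waitForSelector": {"type": "string"},
    },
    "required": ["url"],
    "type": "object",
}

# JSON Schema "number" covers ints and floats but not booleans; Python's bool
# is a subclass of int, so it has to be excluded explicitly.
TYPE_CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def validate(arguments, schema=SCHEMA):
    """Return a list of error messages; an empty list means the input passes."""
    errors = [
        f"missing required property: {name}"
        for name in schema.get("required", [])
        if name not in arguments
    ]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # the schema does not forbid additional properties
        if not TYPE_CHECKS[spec["type"]](value):
            errors.append(f"{name}: expected {spec['type']}")
    return errors


def with_defaults(arguments, schema=SCHEMA):
    """Return a copy of the arguments with the schema's declared defaults filled in."""
    filled = {
        name: spec["default"]
        for name, spec in schema["properties"].items()
        if "default" in spec
    }
    filled.update(arguments)
    return filled
```

For example, `validate({"url": "https://example.com", "timeout": "30"})` reports a type error because `timeout` must be a number, not a string. In practice a full JSON Schema validator would be the more robust choice; this sketch only shows which constraints the schema expresses.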