fetch_as_markdown
Fetch a web page from a URL and convert its HTML to Markdown, enabling structured text extraction for documentation or analysis.
Instructions
Fetch a website, convert its HTML content to Markdown, and return it.
Args:
url (str): URL of the website to fetch.
headers (Optional[dict[str, str]]): Custom headers for the request.
Returns:
FetchResponse: An object containing the Markdown content or an error message.
On success, isError is false and content contains the Markdown text.
On failure, isError is true and errorMessage contains the error details.
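A caller would typically branch on isError before reading content. The sketch below mirrors the response shape with a plain dataclass; `FetchResult` and `extract_markdown` are hypothetical names used only for illustration (they are not part of jij_mcp), and the `{"type": "text", "text": ...}` item layout is assumed to follow the MCP text content format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FetchResult:
    """Hypothetical stand-in mirroring the FetchResponse fields."""

    content: list[dict[str, str]] = field(default_factory=list)
    isError: bool = False
    errorMessage: Optional[str] = None


def extract_markdown(result: FetchResult) -> str:
    """Return the Markdown text, or raise if the fetch failed."""
    if result.isError:
        raise RuntimeError(result.errorMessage or "unknown fetch error")
    # Join every text item; a single item is the expected case.
    return "\n".join(
        item["text"] for item in result.content if item.get("type") == "text"
    )


ok = FetchResult(content=[{"type": "text", "text": "# Title"}])
failed = FetchResult(isError=True, errorMessage="connection refused")
print(extract_markdown(ok))
```

Raising on the error branch keeps the failure visible to the caller instead of silently returning an empty string.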
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the website to fetch. | |
| headers | No | Custom headers for the request. | None |
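As an illustration of how these two arguments travel over the wire, the following sketch builds an MCP `tools/call` JSON-RPC request for this tool. `build_tool_call` is a hypothetical helper, and the example URL and header values are placeholders; how the request is actually transported depends on the MCP client in use.

```python
import json
from typing import Optional


def build_tool_call(
    url: str,
    headers: Optional[dict[str, str]] = None,
    request_id: int = 1,
) -> str:
    """Serialize a tools/call request for fetch_as_markdown (illustrative helper)."""
    arguments: dict[str, object] = {"url": url}
    # headers is optional, so it is omitted entirely when not supplied.
    if headers is not None:
        arguments["headers"] = headers
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "fetch_as_markdown", "arguments": arguments},
    }
    return json.dumps(request)


payload = build_tool_call("https://example.com", {"Accept-Language": "en"})
print(payload)
```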
Implementation Reference
- jij_mcp/mcp_setting.py:235-253 (handler) The main handler function for the 'fetch_as_markdown' MCP tool, registered with @mcp.tool(). It constructs FetchRequestArgs and delegates to Fetcher.markdown.

      @mcp.tool()
      async def fetch_as_markdown(
          url: str, headers: typ.Optional[dict[str, str]] = None
      ) -> FetchResponse:
          """
          Fetch a website, convert its HTML content to Markdown, and return it.

          Args:
              url (str): URL of the website to fetch.
              headers (Optional[dict[str, str]]): Custom headers for the request.

          Returns:
              FetchResponse: An object containing the Markdown content or an error message.
                  On success, isError is false and content contains the Markdown text.
                  On failure, isError is true and errorMessage contains the error details.
          """
          args = FetchRequestArgs(url=url, headers=headers)
          return await Fetcher.markdown(args)

- jij_mcp/fetch/types.py:5-12 (schema) Pydantic schema for input arguments (url and optional headers) used by fetch_as_markdown.

      class FetchRequestArgs(BaseModel):
          """Input arguments schema for fetch tools."""

          url: HttpUrl = Field(..., description="URL of the content to fetch.")
          headers: Optional[dict[str, str]] = Field(
              default=None, description="Optional headers to include in the request."
          )

- jij_mcp/fetch/types.py:15-19 (schema) Pydantic schema for the response, including content, error flag, and message.

      class FetchResponse(BaseModel):
          content: list[dict[str, str]]  # Matches the MCP standard content format
          isError: bool = False
          errorMessage: Optional[str] = None

- jij_mcp/fetch/fetcher.py:125-144 (helper) Core helper method that fetches HTML, converts it to Markdown using markdownify.MarkdownConverter (customized to skip images), and handles encoding and errors.

      async def markdown(payload: FetchRequestArgs) -> FetchResponse:
          """Fetches content and converts it to Markdown."""
          try:
              response = await Fetcher._fetch(payload)
              html_content = await response.aread()
              # Decode carefully before passing to markdownify
              try:
                  html_text = html_content.decode("utf-8")
              except UnicodeDecodeError:
                  detected_encoding = response.encoding or "iso-8859-1"
                  html_text = html_content.decode(detected_encoding, errors="replace")

              # Use custom NoImagesConverter to ignore images
              converter = NoImagesConverter()
              md = converter.convert(html_text)
              return FetchResponse(content=[{"type": "text", "text": md}], isError=False)
          except Exception as e:
              return FetchResponse(content=[], isError=True, errorMessage=str(e))

- jij_mcp/fetch/fetcher.py:10-18 (helper) Custom MarkdownConverter subclass that skips image tags during HTML to Markdown conversion.

      class NoImagesConverter(MarkdownConverter):
          """
          Create a custom MarkdownConverter that ignores all images during conversion
          """

          def convert_img(self, el, text, parent_tags):
              # Return empty string instead of converting the image
              return ""

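The decoding fallback inside the markdown helper can be isolated as a small standalone function. This is a minimal sketch of that step only, not the project's code: `decode_html` and its `declared_encoding` parameter are hypothetical names, with `declared_encoding` standing in for the encoding reported by the HTTP response.

```python
from typing import Optional


def decode_html(raw: bytes, declared_encoding: Optional[str] = None) -> str:
    """Decode HTML bytes as UTF-8, falling back to a declared or Latin-1 encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to the declared encoding, or ISO-8859-1 when none is known.
        fallback = declared_encoding or "iso-8859-1"
        # errors="replace" substitutes undecodable bytes instead of raising.
        return raw.decode(fallback, errors="replace")


utf8_text = decode_html("café".encode("utf-8"))
latin1_text = decode_html("café".encode("iso-8859-1"))
sjis_text = decode_html("日本".encode("shift_jis"), "shift_jis")
```

Trying UTF-8 first means the fallback only runs for pages whose bytes are not valid UTF-8, and `errors="replace"` ensures the conversion step always receives a string.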