parse_pdf
Parse PDF files from local paths or URLs into JSON or Markdown for automated data extraction.
Instructions
Parses a PDF file and returns the extracted content in the specified format.
The function supports both local file paths and remote URLs as input sources. It extracts
the content from the PDF and formats it either as structured JSON or as a Markdown string.
:param source: The source of the PDF file to be parsed.
- If it is a string starting with "http://" or "https://", it will be treated as a remote URL.
- Otherwise, it will be treated as a local file path (absolute path recommended, e.g. "/Users/yourname/file.pdf").
:param format: The desired format for the parsed output. Supports:
- "json": Returns the extracted content as a dictionary.
- "markdown": Returns the extracted content as a Markdown-formatted string.
:return: The extracted content in the specified format (JSON dictionary or Markdown string).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
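| source | Yes | Remote URL (starting with "http://" or "https://") or local file path of the PDF to parse | |
| format | No | Output format: "json" or "markdown" | json |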
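The rule for interpreting source is a plain prefix check: a string starting with "http://" or "https://" is a remote URL, and anything else is a local path. This is a minimal sketch of that rule. classify_source is a hypothetical helper, not part of the server code. Resolving local paths with os.path.abspath is an illustrative choice based on the "absolute path recommended" note, not documented behavior of the tool.

```python
import os


def classify_source(source: str) -> tuple[str, str]:
    """Hypothetical helper mirroring the documented rule for `source`."""
    # Strings starting with http:// or https:// are treated as remote URLs.
    if source.startswith(("http://", "https://")):
        return ("url", source)
    # Anything else is treated as a local file path. Resolving it to an
    # absolute path is an assumption of this sketch.
    return ("path", os.path.abspath(source))


print(classify_source("https://example.com/report.pdf"))
print(classify_source("/Users/yourname/file.pdf"))
```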
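An MCP client invokes this tool with a JSON-RPC "tools/call" request, and the request's arguments follow the Input Schema above. The sketch below builds such a request by hand using the message shape from the MCP specification. build_parse_pdf_call is a hypothetical helper. In practice an MCP client library constructs these messages, and the client must complete the initialize handshake before it can call tools.

```python
import json


def build_parse_pdf_call(source: str, format: str = "json", request_id: int = 1) -> str:
    """Hypothetical helper: serialize an MCP tools/call request for parse_pdf."""
    # The schema marks source as required and limits format to two values.
    if not source:
        raise ValueError("source is required")
    if format not in ("json", "markdown"):
        raise ValueError(f"Unsupported output format: {format}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "parse_pdf",
            "arguments": {"source": source, "format": format},
        },
    }
    return json.dumps(request)


print(build_parse_pdf_call("https://example.com/report.pdf", format="markdown"))
```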
Implementation Reference
- The parse_pdf async function implements the tool logic: it validates format, calls client.parse_pro.parse(source=source, format=format), and serializes any non-string result to a JSON string before returning it.
```python
import json
from typing import Literal

# `client` and `mcp` are module-level objects defined elsewhere in server.py (not shown).
async def parse_pdf(source: str, format: Literal["json", "markdown"] = "json"):
    """
    Parses a PDF file and returns the extracted content in the specified format.
    The function supports both local file paths and remote URLs as input sources. It extracts
    the content from the PDF and formats it either as structured JSON or as a Markdown string.
    :param source: The source of the PDF file to be parsed.
        - If it is a string starting with "http://" or "https://", it will be treated as a remote URL.
        - Otherwise, it will be treated as a local file path (absolute path recommended, e.g. "/Users/yourname/file.pdf").
    :param format: The desired format for the parsed output. Supports:
        - "json": Returns the extracted content as a dictionary.
        - "markdown": Returns the extracted content as a Markdown-formatted string.
    :return: The extracted content in the specified format (JSON dictionary or Markdown string).
    """
    # Reject any output format other than the two supported ones.
    if format not in ["json", "markdown"]:
        raise ValueError(f"Unsupported output format: {format}")
    # Delegate the actual parsing to the client's parse_pro service.
    res = await client.parse_pro.parse(
        source=source,
        format=format,
    )
    # Serialize non-string results (e.g. a dict for "json") to a JSON string;
    # ensure_ascii=False keeps non-ASCII characters readable.
    if not isinstance(res, str):
        res = json.dumps(res, ensure_ascii=False)
    return res
```
- src/netmind_parse_pdf_mcp/server.py:15-15 (registration): The @mcp.tool() decorator registers parse_pdf as an MCP tool on the FastMCP server instance.
```python
@mcp.tool()
```
- The function signature and docstring define the input schema: source (str, URL or file path) and format (Literal "json" or "markdown"). The function raises ValueError for unsupported formats.
- The main() function checks for the NETMIND_API_TOKEN environment variable and runs the MCP server over the stdio transport. This is the entry point that launches the tool server.
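```python
import sys

# NETMIND_API_TOKEN and `mcp` are module-level values defined elsewhere in server.py (not shown).
def main():
    # Refuse to start without an API token.
    if not NETMIND_API_TOKEN:
        print(
            "Error: NETMIND_API_TOKEN environment variable is required",
            file=sys.stderr,
        )
        sys.exit(1)
    # Serve MCP requests over stdin/stdout.
    mcp.run(transport='stdio')


if __name__ == "__main__":
    main()
```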
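The control flow of parse_pdf (validate, delegate, serialize) can be checked without network access or an API token by running it against a stand-in client. This is a minimal sketch under stated assumptions. _StubClient and _StubParsePro are hypothetical. The stub returns a dict for "json" and a string for "markdown", following the docstring, but the real client's response structure is not documented here. The function body mirrors the implementation above, with the @mcp.tool() decorator and server omitted so it can be called directly.

```python
import asyncio
import json
from typing import Literal


class _StubParsePro:
    """Hypothetical stand-in for client.parse_pro; returns canned results."""

    async def parse(self, source: str, format: str):
        # Canned output: a dict for "json", a Markdown string otherwise.
        if format == "json":
            return {"source": source, "pages": [{"text": "café"}]}
        return "# café"


class _StubClient:
    parse_pro = _StubParsePro()


client = _StubClient()


async def parse_pdf(source: str, format: Literal["json", "markdown"] = "json") -> str:
    # Same steps as the tool: validate, delegate to the client, serialize.
    if format not in ["json", "markdown"]:
        raise ValueError(f"Unsupported output format: {format}")
    res = await client.parse_pro.parse(source=source, format=format)
    if not isinstance(res, str):
        res = json.dumps(res, ensure_ascii=False)
    return res


async def demo() -> None:
    # The stub never opens the file, so any path string works here.
    print(await parse_pdf("/tmp/example.pdf"))
    print(await parse_pdf("/tmp/example.pdf", format="markdown"))
    try:
        await parse_pdf("/tmp/example.pdf", format="html")
    except ValueError as exc:
        print(exc)


asyncio.run(demo())
```

Because json.dumps is called with ensure_ascii=False, non-ASCII text such as "café" stays readable in the returned string instead of being escaped as \u00e9.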