find_text
Locate specific text within PDF documents and retrieve its coordinates, supporting regular expressions for advanced searches.
Instructions
Find text in PDF and get coordinates. Supports regular expressions.
Ref: https://developer.pdf.co/api-reference/pdf-find/basic.md
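
As a rough illustration of a regular-expression search, the arguments below look for invoice-style numbers. The URL and the pattern are hypothetical placeholders. The local check uses Python's `re` module only to sanity-test the pattern; PDF.co's server-side regex dialect may differ, so treat agreement as an assumption.

```python
import re

# Hypothetical arguments for a find_text call; the URL and pattern are
# illustrative placeholders, not values taken from the PDF.co documentation.
arguments = {
    "url": "https://example.com/sample-invoice.pdf",
    "searchString": r"INV-\d{4,}",
    "regexSearch": True,
}

# Sanity-check the pattern locally before sending it. Python's `re` module is
# assumed (not guaranteed) to agree with the server-side engine for a simple
# pattern like this one.
pattern = re.compile(arguments["searchString"])
sample_matches = pattern.findall("Ref INV-00123, paid; see also INV-98765.")
print(sample_matches)
```

With `regexSearch` left at its default, the same `searchString` would be treated as literal text.
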
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to the source PDF file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files. | |
| searchString | Yes | Text to search. Can support regular expressions if regexSearch is set to True. | |
| httpusername | No | HTTP auth user name if required to access source url. (Optional) | `""` |
| httppassword | No | HTTP auth password if required to access source url. (Optional) | `""` |
| pages | No | Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'. The first-page index is 0. (Optional) | `""` |
| wordMatchingMode | No | One of SmartMatch, ExactMatch, or None. (Optional) | `None` |
| password | No | Password of the PDF file. (Optional) | `""` |
| regexSearch | No | Set to True to enable regular expressions in the search string. (Optional) | `False` |
| api_key | No | PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional) | `""` |
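
The `pages` syntax can be easier to read once expanded. The helper below is a minimal sketch, not part of the tool: `expand_pages` and its `page_count` argument are hypothetical names, and it assumes an open-ended range such as `7-` runs to the last page.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a spec like '0,2-5,7-' into sorted zero-based page indices."""
    if not spec.strip():
        return list(range(page_count))  # empty spec: all pages
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # Assumption: an open-ended range such as '7-' runs to the last page.
            end = int(end_text) if end_text else page_count - 1
        else:
            start = end = int(part)
        # Clamp to the document so out-of-range indices are ignored.
        pages.update(range(start, min(end, page_count - 1) + 1))
    return sorted(pages)


print(expand_pages("0,2-5,7-", 10))  # [0, 2, 3, 4, 5, 7, 8, 9]
```

For a 10-page document, the example spec from the table selects pages 0, 2 through 5, and 7 through 9.
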
Implementation Reference
- pdfco/mcp/tools/apis/search.py:8-58 (handler): The main MCP tool handler for 'find_text', registered with @mcp.tool(). It defines the input schema using Pydantic Field descriptions, prepares the params, and delegates to the find_text_in_pdf helper function.

      @mcp.tool(name="find_text")
      async def find_text(
          url: str = Field(
              description="URL to the source PDF file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files."
          ),
          searchString: str = Field(
              description="Text to search. Can support regular expressions if regexSearch is set to True."
          ),
          httpusername: str = Field(
              description="HTTP auth user name if required to access source url. (Optional)",
              default="",
          ),
          httppassword: str = Field(
              description="HTTP auth password if required to access source url. (Optional)",
              default="",
          ),
          pages: str = Field(
              description="Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'. The first-page index is 0. (Optional)",
              default="",
          ),
          wordMatchingMode: str = Field(
              description="Values can be either SmartMatch, ExactMatch, or None. (Optional)",
              default=None,
          ),
          password: str = Field(
              description="Password of the PDF file. (Optional)", default=""
          ),
          regexSearch: bool = Field(
              description="Set to True to enable regular expressions in the search string. (Optional)",
              default=False,
          ),
          api_key: str = Field(
              description="PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional)",
              default="",
          ),
      ) -> BaseResponse:
          """
          Find text in PDF and get coordinates. Supports regular expressions.
          Ref: https://developer.pdf.co/api-reference/pdf-find/basic.md
          """
          params = ConversionParams(
              url=url,
              httpusername=httpusername,
              httppassword=httppassword,
              pages=pages,
              password=password,
          )

          return await find_text_in_pdf(
              params, searchString, regexSearch, wordMatchingMode, api_key=api_key
          )

- pdfco/mcp/services/pdf.py:64-76 (helper): Supporting helper function that builds the custom payload for the text search parameters and invokes the generic request function to call PDF.co's 'pdf/find' API endpoint.

      async def find_text_in_pdf(
          params: ConversionParams,
          search_string: str,
          regex_search: bool = False,
          word_matching_mode: str | None = None,
          api_key: str | None = None,
      ) -> BaseResponse:
          custom_payload = {"searchString": search_string, "regexSearch": regex_search}
          if word_matching_mode:
              custom_payload["wordMatchingMode"] = word_matching_mode
          return await request(
              "pdf/find", params, custom_payload=custom_payload, api_key=api_key
          )