find_text
Searches a PDF for text, optionally using a regular expression, and returns the coordinates of each match. This makes it useful for finding where specific information appears in a document.
Instructions
Find text in PDF and get coordinates. Supports regular expressions.
Ref: https://developer.pdf.co/api-reference/pdf-find/basic.md
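For example, a regex search for invoice numbers might pass arguments like the ones below. The URL, pattern, and sample text are hypothetical. The local check uses Python's `re` module, and the regex dialect PDF.co applies server-side may differ in edge cases.

```python
import re

# Hypothetical find_text arguments: locate invoice numbers such as "INV-004217".
find_text_args = {
    "url": "https://example.com/invoice.pdf",  # placeholder URL
    "searchString": r"INV-\d{6}",
    "regexSearch": True,  # without this, the pattern would be searched literally
}

# Local sanity check of the pattern before sending it to the API.
# Assumption: PDF.co treats \d and {n} the same way Python's re does.
sample = "Invoice INV-004217, replaces INV-17"
matches = re.findall(find_text_args["searchString"], sample)
print(matches)
```

Only `INV-004217` matches here; `INV-17` has too few digits for `\d{6}`.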
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| api_key | No | PDF.co API key. If not provided, the X_API_KEY environment variable is used. (Optional) | |
| httppassword | No | HTTP auth password, if required to access the source URL. (Optional) | |
| httpusername | No | HTTP auth username, if required to access the source URL. (Optional) | |
| pages | No | Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'. The first page has index 0. (Optional) | |
| password | No | Password of the PDF file. (Optional) | |
| regexSearch | No | Set to True to treat the search string as a regular expression. (Optional) | False |
| searchString | Yes | Text to search for. Interpreted as a regular expression when regexSearch is True. | |
| url | Yes | URL of the source PDF file. Supports publicly accessible links, including Google Drive, Dropbox, and PDF.co Built-In Files Storage. Use the 'upload_file' tool to upload local files. | |
| wordMatchingMode | No | One of SmartMatch, ExactMatch, or None. If left unset, the parameter is not sent to the API. (Optional) | |
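To make the `pages` syntax concrete, here is a small hypothetical helper (`expand_pages` is not part of the tool or of PDF.co) that expands a spec such as `'0,2-5,7-'` into zero-based page indices. It assumes ranges are inclusive and that a trailing `-` means "through the last page"; confirm this against the PDF.co reference linked above before relying on it.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a pages spec like '0,2-5,7-' into zero-based page indices.

    Assumptions (not confirmed by the tool docs): ranges are inclusive,
    an open-ended range such as '7-' runs to the last page, and an empty
    spec selects every page.
    """
    if not spec.strip():
        return list(range(page_count))

    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An empty end means "through the last page".
            end = int(end_text) if end_text else page_count - 1
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))

    # Drop out-of-range indices and keep each page once, in first-seen order.
    result: list[int] = []
    for p in pages:
        if 0 <= p < page_count and p not in result:
            result.append(p)
    return result
```

With a 10-page document, `expand_pages("0,2-5,7-", 10)` selects pages 0, 2 through 5, and 7 through 9.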
Implementation Reference
- `pdfco/mcp/tools/apis/search.py:8-58` (handler): The full MCP tool handler for `find_text`, including the registration decorator, the input schema declared with Pydantic `Field`s, the logic that builds `ConversionParams`, and the call to the service helper.

```python
@mcp.tool(name="find_text")
async def find_text(
    url: str = Field(
        description="URL to the source PDF file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files."
    ),
    searchString: str = Field(
        description="Text to search. Can support regular expressions if regexSearch is set to True."
    ),
    httpusername: str = Field(
        description="HTTP auth user name if required to access source url. (Optional)",
        default="",
    ),
    httppassword: str = Field(
        description="HTTP auth password if required to access source url. (Optional)",
        default="",
    ),
    pages: str = Field(
        description="Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'. The first-page index is 0. (Optional)",
        default="",
    ),
    wordMatchingMode: str = Field(
        description="Values can be either SmartMatch, ExactMatch, or None. (Optional)",
        default=None,
    ),
    password: str = Field(
        description="Password of the PDF file. (Optional)", default=""
    ),
    regexSearch: bool = Field(
        description="Set to True to enable regular expressions in the search string. (Optional)",
        default=False,
    ),
    api_key: str = Field(
        description="PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional)",
        default="",
    ),
) -> BaseResponse:
    """
    Find text in PDF and get coordinates. Supports regular expressions.
    Ref: https://developer.pdf.co/api-reference/pdf-find/basic.md
    """
    params = ConversionParams(
        url=url,
        httpusername=httpusername,
        httppassword=httppassword,
        pages=pages,
        password=password,
    )
    return await find_text_in_pdf(
        params, searchString, regexSearch, wordMatchingMode, api_key=api_key
    )
```
- `pdfco/mcp/services/pdf.py:64-76` (helper): The service-level helper that performs the actual PDF.co API request. It builds a custom payload containing `searchString` and `regexSearch`, adding `wordMatchingMode` only when one is given.

```python
async def find_text_in_pdf(
    params: ConversionParams,
    search_string: str,
    regex_search: bool = False,
    word_matching_mode: str | None = None,
    api_key: str | None = None,
) -> BaseResponse:
    custom_payload = {"searchString": search_string, "regexSearch": regex_search}
    if word_matching_mode:
        custom_payload["wordMatchingMode"] = word_matching_mode
    return await request(
        "pdf/find", params, custom_payload=custom_payload, api_key=api_key
    )
```