find_text
Search for text in PDF files and retrieve its exact coordinates. Supports regular expressions for advanced pattern matching.
Instructions
Find text in PDF and get coordinates. Supports regular expressions.
Ref: https://developer.pdf.co/api-reference/pdf-find/basic.md
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to the source PDF file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files. | |
| searchString | Yes | Text to search. Can support regular expressions if regexSearch is set to True. | |
| httpusername | No | HTTP auth user name if required to access source url. (Optional) | |
| httppassword | No | HTTP auth password if required to access source url. (Optional) | |
| pages | No | Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'. The first-page index is 0. (Optional) | |
| wordMatchingMode | No | Values can be either SmartMatch, ExactMatch, or None. (Optional) | |
| password | No | Password of the PDF file. (Optional) | |
| regexSearch | No | Set to True to enable regular expressions in the search string. (Optional) | |
| api_key | No | PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional) | |
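
The `pages` format in the table ('0,2-5,7-', zero-based, with an open-ended final range) can be easier to read once expanded. The sketch below is a hypothetical client-side helper, not part of the tool; `expand_pages` and `page_count` are names introduced here for illustration. How the service treats reversed or out-of-range ranges is not stated above, so this sketch simply clamps to the last page.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a pages string such as '0,2-5,7-' into 0-based page indices.

    Hypothetical helper for illustration; the tool itself passes the
    string through unchanged.
    """
    if not spec.strip():
        # An empty string means "all pages".
        return list(range(page_count))

    selected: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range like '7-' runs to the last page.
            end = int(end_text) if end_text else page_count - 1
        else:
            start = end = int(part)
        # Assumption: indices past the end of the document are ignored.
        last = min(end, page_count - 1)
        selected.extend(range(start, last + 1))
    return selected
```

For a ten-page document, the example string from the table selects pages 0, 2 to 5, and 7 to 9.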
Implementation Reference
- pdfco/mcp/tools/apis/search.py:8-58 (handler) The async function decorated with @mcp.tool(name='find_text') that defines the find_text tool handler. It accepts the url, searchString, httpusername, httppassword, pages, wordMatchingMode, password, regexSearch, and api_key parameters, then delegates to the find_text_in_pdf service function.

      @mcp.tool(name="find_text")
      async def find_text(
          url: str = Field(
              description="URL to the source PDF file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files."
          ),
          searchString: str = Field(
              description="Text to search. Can support regular expressions if regexSearch is set to True."
          ),
          httpusername: str = Field(
              description="HTTP auth user name if required to access source url. (Optional)",
              default="",
          ),
          httppassword: str = Field(
              description="HTTP auth password if required to access source url. (Optional)",
              default="",
          ),
          pages: str = Field(
              description="Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'. The first-page index is 0. (Optional)",
              default="",
          ),
          wordMatchingMode: str = Field(
              description="Values can be either SmartMatch, ExactMatch, or None. (Optional)",
              default=None,
          ),
          password: str = Field(
              description="Password of the PDF file. (Optional)", default=""
          ),
          regexSearch: bool = Field(
              description="Set to True to enable regular expressions in the search string. (Optional)",
              default=False,
          ),
          api_key: str = Field(
              description="PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional)",
              default="",
          ),
      ) -> BaseResponse:
          """
          Find text in PDF and get coordinates. Supports regular expressions.
          Ref: https://developer.pdf.co/api-reference/pdf-find/basic.md
          """
          params = ConversionParams(
              url=url,
              httpusername=httpusername,
              httppassword=httppassword,
              pages=pages,
              password=password,
          )

          return await find_text_in_pdf(
              params, searchString, regexSearch, wordMatchingMode, api_key=api_key
          )

- pdfco/mcp/tools/apis/search.py:8-8 (registration) The tool is registered with the MCP server via the @mcp.tool(name='find_text') decorator, where `mcp` is a FastMCP('pdfco') instance from pdfco.mcp.server.

      @mcp.tool(name="find_text")

- pdfco/mcp/services/pdf.py:64-76 (helper) The find_text_in_pdf service function that builds the custom payload with searchString and regexSearch, optionally includes wordMatchingMode, and sends a request to the 'pdf/find' endpoint via the PDF.co API client.

      async def find_text_in_pdf(
          params: ConversionParams,
          search_string: str,
          regex_search: bool = False,
          word_matching_mode: str | None = None,
          api_key: str | None = None,
      ) -> BaseResponse:
          custom_payload = {"searchString": search_string, "regexSearch": regex_search}
          if word_matching_mode:
              custom_payload["wordMatchingMode"] = word_matching_mode
          return await request(
              "pdf/find", params, custom_payload=custom_payload, api_key=api_key
          )

- pdfco/mcp/models.py:106-160 (helper) The parse_payload method on ConversionParams used to build the payload dict that gets sent to the PDF.co API. It includes url, httpusername, httppassword, pages, password, and other fields from the params object.

      def parse_payload(self, async_mode: bool = True):
          payload = {
              "async": async_mode,
          }

          if self.url:
              payload["url"] = self.url
          if self.httpusername:
              payload["httpusername"] = self.httpusername
          if self.httppassword:
              payload["httppassword"] = self.httppassword
          if self.pages:
              payload["pages"] = self.pages
          if self.unwrap:
              payload["unwrap"] = self.unwrap
          if self.rect:
              payload["rect"] = self.rect
          if self.lang:
              payload["lang"] = self.lang
          if self.line_grouping:
              payload["lineGrouping"] = self.line_grouping
          if self.password:
              payload["password"] = self.password
          if self.name:
              payload["name"] = self.name
          if self.autosize:
              payload["autosize"] = self.autosize
          if self.html:
              payload["html"] = self.html
          if self.templateId:
              payload["templateId"] = self.templateId
          if self.templateData:
              payload["templateData"] = self.templateData
          if self.margins:
              payload["margins"] = self.margins
          if self.paperSize:
              payload["paperSize"] = self.paperSize
          if self.orientation:
              payload["orientation"] = self.orientation
          if self.printBackground:
              payload["printBackground"] = self.printBackground
          if self.mediaType:
              payload["mediaType"] = self.mediaType
          if self.DoNotWaitFullLoad:
              payload["DoNotWaitFullLoad"] = self.DoNotWaitFullLoad
          if self.header:
              payload["header"] = self.header
          if self.footer:
              payload["footer"] = self.footer
          if self.worksheetIndex:
              payload["worksheetIndex"] = self.worksheetIndex

          return payload

- pdfco/mcp/services/pdf.py:125-153 (helper) The request helper function that makes the actual HTTP POST call to the PDF.co API, handles authentication via PDFCoClient, and returns a BaseResponse with the result or error.

      async def request(
          endpoint: str,
          params: ConversionParams,
          custom_payload: dict | None = None,
          api_key: str | None = None,
      ) -> BaseResponse:
          payload = params.parse_payload(async_mode=True)
          if custom_payload:
              payload.update(custom_payload)

          try:
              async with PDFCoClient(api_key=api_key) as client:
                  url = f"/v1/{endpoint}"
                  print(f"Requesting {url} with payload {payload}", file=sys.stderr)
                  response = await client.post(url, json=payload)
                  print(f"response: {response}", file=sys.stderr)
                  json_data = response.json()
                  return BaseResponse(
                      status="working",
                      content=json_data,
                      credits_used=json_data.get("credits"),
                      credits_remaining=json_data.get("remainingCredits"),
                      tips=f"You **should** use the 'wait_job_completion' tool to wait for the job [{json_data.get('jobId')}] to complete if a jobId is present.",
                  )
          except Exception as e:
              return BaseResponse(
                  status="error",
                  content=f"{type(e)}: {[arg for arg in e.args if arg]}",
              )
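
Taken together, the handler, service function, and parse_payload produce a single JSON body for the 'pdf/find' endpoint. The sketch below is a simplified, self-contained re-creation of that merge for illustration only; `build_find_payload` is a hypothetical name, it covers just a few of the ConversionParams fields, and it makes no network call.

```python
from __future__ import annotations


def build_find_payload(
    url: str,
    search_string: str,
    regex_search: bool = False,
    word_matching_mode: str | None = None,
    pages: str = "",
    password: str = "",
) -> dict:
    """Approximate the payload merge described above (illustrative only)."""
    # The request helper always asks for async processing.
    payload: dict = {"async": True}

    # parse_payload copies a field only when it is truthy, so empty
    # strings such as pages="" never reach the API.
    for key, value in (("url", url), ("pages", pages), ("password", password)):
        if value:
            payload[key] = value

    # find_text_in_pdf always sends searchString and regexSearch, and
    # adds wordMatchingMode only when a value was supplied.
    payload.update({"searchString": search_string, "regexSearch": regex_search})
    if word_matching_mode:
        payload["wordMatchingMode"] = word_matching_mode
    return payload


example = build_find_payload(
    "https://example.com/invoice.pdf",  # placeholder URL
    r"Invoice\s+#\d+",
    regex_search=True,
    pages="0-2",
)
```

Because the request is always sent with "async" set to true, the response typically carries a jobId rather than the matches themselves, which is why the returned tips point to the 'wait_job_completion' tool.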