get_document_content
Extracts text content from parliamentary documents by document ID, supporting PDF and Word formats. Returns the content in paginated chunks so that long documents can be read in full for analysis, summarization, or direct reference.
Instructions
Downloads a parliamentary document by its ID and extracts its text content for use in the conversation, making it available for analysis, summarization, or direct reference. The text is extracted from PDF or Word (DOCX) files using document-parsing libraries and returned in a readable format.
IMPORTANT: Content from longer documents may be truncated. Each response includes pagination information that you can use to retrieve the rest of the document:
- isTruncated: Indicates whether there is more content available
- totalLength: The total length of the document content
- currentOffset: The starting position of the current content chunk
- nextOffset: The starting position for the next content chunk (use this as the 'offset' parameter in your next call)
- remainingLength: The amount of content remaining after the current chunk
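The fields above are enough to decide whether, and how, to make a follow-up call. The sketch below models them on the client side; the field names come from this description, but the `PaginationInfo` and `GetDocumentContentArgs` types and the `nextCallArgs` helper are hypothetical, and the field types (boolean and number) are assumptions.

```typescript
// Hypothetical client-side model of the pagination fields listed above.
// Field names come from the tool description; the types are assumptions.
interface PaginationInfo {
  isTruncated: boolean;
  totalLength: number;
  currentOffset: number;
  nextOffset: number;
  remainingLength: number;
}

// Arguments accepted by get_document_content (docId required, offset optional).
interface GetDocumentContentArgs {
  docId: string;
  offset?: number;
}

// Returns the arguments for the follow-up call, or null when the
// response reports that no more content is available.
function nextCallArgs(docId: string, info: PaginationInfo): GetDocumentContentArgs | null {
  if (!info.isTruncated) {
    return null;
  }
  // Per the description, nextOffset is passed as the next call's 'offset'.
  return { docId, offset: info.nextOffset };
}

// Illustrative values only: a first chunk of 8000 out of 20000.
const first: PaginationInfo = {
  isTruncated: true,
  totalLength: 20000,
  currentOffset: 0,
  nextOffset: 8000,
  remainingLength: 12000,
};
console.log(nextCallArgs("2025D18220", first)); // -> { docId: '2025D18220', offset: 8000 }
```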
To retrieve the complete document, make repeated calls to this tool, each time setting the offset to the nextOffset value from the previous response.
Example usage:
- First call: get_document_content({docId: '2025D18220'})
- If the response shows isTruncated=true, call again with the nextOffset value: get_document_content({docId: '2025D18220', offset: 8000})
- Continue until isTruncated=false or you've retrieved all the content you need.
This pagination approach allows you to analyze even very long documents within the conversation context.
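The call-until-done pattern can be sketched as a loop. This is a minimal sketch under stated assumptions: the `content` field name for the extracted text, the `CallTool` signature, and `fetchFullDocument` are all hypothetical stand-ins for whatever client actually invokes the tool. The `maxCalls` cap and the "nextOffset must advance" check are defensive additions, not documented tool behavior.

```typescript
// Hypothetical response shape: isTruncated and nextOffset are from the tool
// description; the 'content' field name for the extracted text is an assumption.
interface ChunkResponse {
  content: string;
  isTruncated: boolean;
  nextOffset: number;
}

// Hypothetical signature for the client function that invokes the tool.
type CallTool = (args: { docId: string; offset?: number }) => Promise<ChunkResponse>;

// Requests chunks until isTruncated is false, then joins them in order.
// maxCalls guards against an endless loop if something goes wrong.
async function fetchFullDocument(callTool: CallTool, docId: string, maxCalls = 100): Promise<string> {
  const parts: string[] = [];
  let offset = 0; // same as omitting offset, since the default is 0
  for (let i = 0; i < maxCalls; i++) {
    const res = await callTool({ docId, offset });
    parts.push(res.content);
    if (!res.isTruncated) {
      return parts.join("");
    }
    if (res.nextOffset <= offset) {
      throw new Error(`nextOffset did not advance past ${offset}`);
    }
    offset = res.nextOffset;
  }
  throw new Error(`document ${docId} not complete after ${maxCalls} calls`);
}

// Usage with a stand-in tool that serves a local string in 8000-character chunks.
const text = "x".repeat(20000);
const fakeTool: CallTool = async ({ offset = 0 }) => {
  const end = Math.min(offset + 8000, text.length);
  return { content: text.slice(offset, end), isTruncated: end < text.length, nextOffset: end };
};
fetchFullDocument(fakeTool, "2025D18220").then((full) => console.log(full.length)); // 20000
```

If you only need part of a document, you can stop the loop early instead of fetching every chunk.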
Use this tool when you need to analyze or discuss the specific content of a document rather than just its metadata.
Input Schema
Name | Required | Description | Default |
---|---|---|---|
docId | Yes | Document ID (e.g., '2024D39058') - the unique identifier for the parliamentary document you want to download and extract text from | |
offset | No | Optional starting position for text extraction (default: 0). Use this to retrieve additional content from a truncated document by setting it to the 'nextOffset' value from a previous response. | 0 |
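For client code, the schema above might be mirrored as a small type plus a helper that applies the documented default of 0. The `GetDocumentContentInput` type and `buildInput` helper are hypothetical, and the validation rules (non-empty docId, non-negative integer offset) are client-side assumptions rather than part of the published schema.

```typescript
// Hypothetical TypeScript rendering of the input schema above.
interface GetDocumentContentInput {
  docId: string;   // required, e.g. '2024D39058'
  offset?: number; // optional, defaults to 0
}

// Fills in the documented default and rejects obviously invalid values.
// These validation rules are assumptions, not part of the schema.
function buildInput(docId: string, offset?: number): Required<GetDocumentContentInput> {
  if (docId.trim() === "") {
    throw new Error("docId is required");
  }
  const off = offset ?? 0;
  if (!Number.isInteger(off) || off < 0) {
    throw new Error(`offset must be a non-negative integer, got ${off}`);
  }
  return { docId, offset: off };
}

console.log(buildInput("2024D39058")); // -> { docId: '2024D39058', offset: 0 }
```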