IBM Core Content Services MCP Server

Official
by ibm-ecm

get_document_text_extract

Extracts the text content of an IBM FileNet document, identified by document ID or path, by reading the document's text extract annotations. The result can be used for downstream processing and analysis.

Instructions

Retrieves a document's text extract content.

:param identifier: The document id or path (required). This can be either the document's ID (GUID) or its path in the repository (e.g., "/Folder1/document.pdf").

:returns: The text content of the document's text extract annotation. If multiple text extracts are found, they will be concatenated. Returns an empty string if no text extract is found.

Input Schema

Name        Required  Description  Default
identifier  Yes
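
As a rough illustration of the schema above, an MCP client passes `identifier` inside the `arguments` of a `tools/call` request. The sketch below builds such a JSON-RPC payload; the helper name `build_tool_call` and the request id are hypothetical, and how the payload is sent depends on the client and transport in use.

```python
import json


def build_tool_call(identifier: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request (hypothetical helper)."""
    if not identifier:
        # The schema marks 'identifier' as required.
        raise ValueError("identifier is required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_document_text_extract",
            # Either a document GUID or a repository path is accepted.
            "arguments": {"identifier": identifier},
        },
    }


request = build_tool_call("/Folder1/document.pdf")
print(json.dumps(request, indent=2))
```

The same helper would work with a GUID string in place of the path, since the tool accepts either form.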

Implementation Reference

  • The handler function for the 'get_document_text_extract' tool. It queries the GraphQL API for document annotations matching the TEXT_EXTRACT_ANNOTATION_CLASS, downloads the text content from each matching annotation's content elements using download_text_async, concatenates them with separators, and returns the full text.
    @mcp.tool(
        name="get_document_text_extract",
    )
    async def get_document_text_extract(identifier: str) -> str:
        """
        Retrieves a document's text extract content.

        :param identifier: The document id or path (required). This can be either the document's ID (GUID) or its path in the repository (e.g., "/Folder1/document.pdf").

        :returns: The text content of the document's text extract annotation. If multiple text extracts are found, they will be concatenated. Returns an empty string if no text extract is found.
        """
        query = """
        query getDocumentTextExtract($object_store_name: String!, $identifier: String!) {
            document(repositoryIdentifier: $object_store_name, identifier: $identifier) {
                annotations{
                    annotations{
                        id
                        name
                        className
                        annotatedContentElement
                        descriptiveText
                        contentElements{
                            ... on ContentTransfer{
                                downloadUrl
                                retrievalName
                                contentSize
                            }
                        }
                    }
                }
            }
        }
        """
        variables = {
            "identifier": identifier,
            "object_store_name": graphql_client.object_store,
        }

        # First run execute_async and wait for the result
        result = await graphql_client.execute_async(query=query, variables=variables)

        # Initialize an empty string to store all text content
        all_text_content = ""

        # Check if we have a valid result with annotations
        if (
            result
            and "data" in result
            and result["data"]
            and "document" in result["data"]
            and result["data"]["document"]
            and "annotations" in result["data"]["document"]
            and result["data"]["document"]["annotations"]
            and "annotations" in result["data"]["document"]["annotations"]
        ):
            annotations = result["data"]["document"]["annotations"]["annotations"]

            # Process each annotation
            for annotation in annotations:
                if (
                    "contentElements" in annotation
                    and annotation["className"] == TEXT_EXTRACT_ANNOTATION_CLASS
                    and annotation["annotatedContentElement"] is not None
                ):
                    # Process each content element
                    for content_element in annotation["contentElements"]:
                        if (
                            "downloadUrl" in content_element
                            and content_element["downloadUrl"]
                        ):
                            # Download the text content using the downloadUrl
                            download_url = content_element["downloadUrl"]
                            text_content = await graphql_client.download_text_async(
                                download_url
                            )

                            # Append the text content to our result string
                            if text_content:
                                if all_text_content:
                                    all_text_content += TEXT_EXTRACT_SEPARATOR
                                all_text_content += text_content

        return all_text_content
  • The register_server_tools function calls register_document_tools (which registers the get_document_text_extract tool among others) for CORE and FULL server types.
    if server_type == ServerType.CORE:
        register_document_tools(mcp, graphql_client, metadata_cache)
        register_folder_tools(mcp, graphql_client)
        register_class_tools(mcp, graphql_client, metadata_cache)
        register_search_tools(mcp, graphql_client, metadata_cache)
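
The filtering and concatenation steps described for the handler can be tried in isolation, without a FileNet server. The following is a minimal sketch, not the server's code: the function names `text_extract_urls` and `join_extracts`, the constant values, and the sample response are all hypothetical, and the real values of `TEXT_EXTRACT_ANNOTATION_CLASS` and `TEXT_EXTRACT_SEPARATOR` are defined in the server source.

```python
from typing import Any, Iterable

# Hypothetical placeholder values; the real constants live in the server code.
TEXT_EXTRACT_ANNOTATION_CLASS = "ExampleTextExtractClass"
TEXT_EXTRACT_SEPARATOR = "\n\n"


def text_extract_urls(result: Any, annotation_class: str) -> list[str]:
    """Collect download URLs from annotations of the given class."""
    # Walk the nested response, treating any missing or null level as empty.
    document = ((result or {}).get("data") or {}).get("document") or {}
    annotations = (document.get("annotations") or {}).get("annotations") or []
    urls = []
    for annotation in annotations:
        if annotation.get("className") != annotation_class:
            continue
        if annotation.get("annotatedContentElement") is None:
            continue
        for element in annotation.get("contentElements") or []:
            if element.get("downloadUrl"):
                urls.append(element["downloadUrl"])
    return urls


def join_extracts(texts: Iterable[str], separator: str) -> str:
    """Join non-empty texts; yields an empty string when nothing is found."""
    return separator.join(text for text in texts if text)


# Made-up response in the shape requested by the GraphQL query.
sample = {"data": {"document": {"annotations": {"annotations": [
    {"className": TEXT_EXTRACT_ANNOTATION_CLASS, "annotatedContentElement": 0,
     "contentElements": [{"downloadUrl": "/content?id=1"}]},
    {"className": "OtherAnnotation", "annotatedContentElement": 0,
     "contentElements": [{"downloadUrl": "/content?id=2"}]},
    {"className": TEXT_EXTRACT_ANNOTATION_CLASS, "annotatedContentElement": None,
     "contentElements": [{"downloadUrl": "/content?id=3"}]},
]}}}}

urls = text_extract_urls(sample, TEXT_EXTRACT_ANNOTATION_CLASS)
combined = join_extracts(["page one", "", "page two"], TEXT_EXTRACT_SEPARATOR)
```

Separating URL collection from downloading keeps the response handling testable on plain dictionaries; in the handler itself, each collected URL is passed to `download_text_async` before the texts are joined.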
