
IBM Core Content Services MCP Server

Official
by ibm-ecm

get_document_text_extract

Extracts the text-extract annotation content of an IBM FileNet document, identified by document ID or repository path, for downstream processing and analysis.

Instructions

Retrieves a document's text extract content.

:param identifier: The document id or path (required). This can be either the document's ID (GUID) or its path in the repository (e.g., "/Folder1/document.pdf").

:returns: The text content of the document's text extract annotation. If multiple text extracts are found, they will be concatenated. Returns an empty string if no text extract is found.

Input Schema

| Name       | Required | Description                                    | Default |
|------------|----------|------------------------------------------------|---------|
| identifier | Yes      | The document's ID (GUID) or its repository path |         |
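
For illustration, invoking this tool over MCP's JSON-RPC transport might look like the following (the identifier is the hypothetical example path from the parameter description):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_document_text_extract",
    "arguments": { "identifier": "/Folder1/document.pdf" }
  }
}
```

The result payload carries the concatenated text of all matching text-extract annotations, or an empty string if none exist.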

Implementation Reference

  • The handler function for the 'get_document_text_extract' tool. It queries the GraphQL API for document annotations matching the TEXT_EXTRACT_ANNOTATION_CLASS, downloads the text content from each matching annotation's content elements using download_text_async, concatenates them with separators, and returns the full text.
    @mcp.tool(
        name="get_document_text_extract",
    )
    async def get_document_text_extract(identifier: str) -> str:
        """
        Retrieves a document's text extract content.
    
        :param identifier: The document id or path (required). This can be either the document's ID (GUID)
                          or its path in the repository (e.g., "/Folder1/document.pdf").
    
        :returns: The text content of the document's text extract annotation.
                 If multiple text extracts are found, they will be concatenated.
                 Returns an empty string if no text extract is found.
        """
        query = """
        query getDocumentTextExtract($object_store_name: String!, $identifier: String!) {
            document(repositoryIdentifier: $object_store_name, identifier: $identifier) {
                annotations{
                    annotations{
                        id
                        name
                        className
                        annotatedContentElement
                        descriptiveText
                        contentElements{
                            ... on ContentTransfer{
                                downloadUrl
                                retrievalName
                                contentSize
                            }
                        }
                    }
                }
            }
        }
        """
    
        variables = {
            "identifier": identifier,
            "object_store_name": graphql_client.object_store,
        }
    
        # Execute the GraphQL query and await the result
        result = await graphql_client.execute_async(query=query, variables=variables)
    
        # Initialize an empty string to store all text content
        all_text_content = ""
    
        # Safely drill into the nested GraphQL response structure
        document = ((result or {}).get("data") or {}).get("document") or {}
        annotation_layer = document.get("annotations") or {}
        if annotation_layer.get("annotations"):
            annotations = annotation_layer["annotations"]
    
            # Process each annotation
            for annotation in annotations:
                if (
                    annotation.get("contentElements")
                    and annotation.get("className") == TEXT_EXTRACT_ANNOTATION_CLASS
                    and annotation.get("annotatedContentElement") is not None
                ):
                    # Process each content element
                    for content_element in annotation["contentElements"]:
                        if content_element.get("downloadUrl"):
                            # Download the text content using the downloadUrl
                            download_url = content_element["downloadUrl"]
                            text_content = await graphql_client.download_text_async(
                                download_url
                            )
    
                            # Append the text content to our result string
                            if text_content:
                                if all_text_content:
                                    all_text_content += TEXT_EXTRACT_SEPARATOR
                                all_text_content += text_content
    
        return all_text_content
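  • The helper download_text_async is referenced above but its body is not shown. A minimal sketch of what such a helper could look like, using only the standard library (the real client's HTTP stack, session reuse, and authentication headers are unknown and would differ):

    ```python
    import asyncio
    import urllib.request


    async def download_text_async(download_url: str) -> str:
        """Hypothetical stand-in for the GraphQL client's download helper.

        Fetches a content element from its downloadUrl and decodes it as
        UTF-8 text. The real implementation likely reuses the client's
        authenticated session rather than a bare urllib request.
        """
        def fetch() -> str:
            with urllib.request.urlopen(download_url) as resp:
                return resp.read().decode("utf-8")

        # Run the blocking urllib call in a worker thread so the event
        # loop stays free to serve other tool invocations.
        return await asyncio.to_thread(fetch)
    ```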
  • The register_server_tools function calls register_document_tools (which registers the get_document_text_extract tool among others) for CORE and FULL server types.
    if server_type == ServerType.CORE:
        register_document_tools(mcp, graphql_client, metadata_cache)
        register_folder_tools(mcp, graphql_client)
        register_class_tools(mcp, graphql_client, metadata_cache)
        register_search_tools(mcp, graphql_client, metadata_cache)
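  • The dispatch by server type can be sketched as follows; the tool-group names recorded here are illustrative stand-ins for the registration calls above, and the exact FULL branch is not shown in the source:

    ```python
    from enum import Enum, auto


    class ServerType(Enum):
        CORE = auto()
        FULL = auto()


    def tool_groups_for(server_type: ServerType) -> list[str]:
        """Illustrative only: list which tool groups would be registered."""
        groups: list[str] = []
        if server_type in (ServerType.CORE, ServerType.FULL):
            # register_document_tools covers get_document_text_extract
            groups += ["document", "folder", "class", "search"]
        return groups
    ```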
