Core Content Services MCP Server

Official


get_document_text_extract

Retrieve the text extract content of a document by its ID or repository path. Returns the concatenated text, or an empty string if no text extract is found.

Instructions

Retrieves a document's text extract content.

:param identifier: The document id or path (required). This can be either the document's ID (GUID) or its path in the repository (e.g., "/Folder1/document.pdf").

:returns: The text content of the document's text extract annotation. If multiple text extracts are found, they will be concatenated. Returns an empty string if no text extract is found.
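Because `identifier` accepts two different shapes, a caller may want to check which one it is about to send. The following is a minimal client-side sketch, not part of the server: the helper name `classify_identifier` is hypothetical, and the GUID pattern assumes the canonical 8-4-4-4-12 hexadecimal form with optional braces. The tool itself passes the identifier through unchanged.

```python
import re

# Assumption: GUIDs use the canonical 8-4-4-4-12 hex layout, optionally in braces.
_GUID_RE = re.compile(
    r"^\{?[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\}?$"
)


def classify_identifier(identifier: str) -> str:
    """Return "guid", "path", or "unknown" (hypothetical helper)."""
    value = identifier.strip()
    if _GUID_RE.match(value):
        return "guid"
    # Repository paths in the tool's own example start with a slash.
    if value.startswith("/"):
        return "path"
    return "unknown"
```

A value classified as "unknown" is not necessarily invalid; the check only flags inputs that match neither documented shape.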

Input Schema

Name	Required	Description	Default
`identifier`	Yes

Output Schema

Name	Required	Description	Default
`result`	Yes

Implementation Reference

src/cs_mcp_server/tools/documents.py:108-124 (registration)

The 'get_document_text_extract' tool is registered with the @mcp.tool decorator. The function get_document_text_extract(identifier) delegates to get_document_text_extract_content in cs_mcp_server/utils/utils.py.

@mcp.tool(
    name="get_document_text_extract",
)
async def get_document_text_extract(identifier: str) -> str:
    """
    Retrieves a document's text extract content.

    :param identifier: The document id or path (required). This can be either the document's ID (GUID)
                      or its path in the repository (e.g., "/Folder1/document.pdf").

    :returns: The text content of the document's text extract annotation.
             If multiple text extracts are found, they will be concatenated.
             Returns an empty string if no text extract is found.
    """
    return await get_document_text_extract_content(
        graphql_client=graphql_client, identifier=identifier
    )

src/cs_mcp_server/utils/utils.py:92-178 (handler)

The implementation lives in the helper function 'get_document_text_extract_content'. It executes a GraphQL query to fetch the document's annotations, keeps only those whose className matches TEXT_EXTRACT_ANNOTATION_CLASS and whose annotatedContentElement is not None, downloads the text from each content element's downloadUrl, and joins the non-empty results with TEXT_EXTRACT_SEPARATOR.

async def get_document_text_extract_content(
    graphql_client: GraphQLClient, identifier: str
) -> str:
    """
    Retrieves a document's text extract content.

    This utility function queries the document's annotations, filters for text extract
    annotations, and downloads the text content from each annotation's content elements.

    :param graphql_client: GraphQL client instance
    :param identifier: The document id or path (GUID or repository path)
    :returns: The concatenated text content from all text extract annotations.
             Returns empty string if no text extract is found.
    """
    query = """
    query getDocumentTextExtract($object_store_name: String!, $identifier: String!) {
        document(repositoryIdentifier: $object_store_name, identifier: $identifier) {
            annotations{
                annotations{
                    id
                    name
                    className
                    annotatedContentElement
                    descriptiveText
                    contentElements{
                        ... on ContentTransfer{
                            downloadUrl
                            retrievalName
                            contentSize
                        }
                    }
                }
            }
        }
    }
    """

    variables = {
        "identifier": identifier,
        "object_store_name": graphql_client.object_store,
    }

    # Execute query
    result = await graphql_client.execute_async(query=query, variables=variables)

    # Initialize empty string for text content
    all_text_content = ""

    # Check if we have valid result with annotations
    if (
        result
        and "data" in result
        and result["data"]
        and "document" in result["data"]
        and result["data"]["document"]
        and "annotations" in result["data"]["document"]
        and result["data"]["document"]["annotations"]
        and "annotations" in result["data"]["document"]["annotations"]
    ):
        annotations = result["data"]["document"]["annotations"]["annotations"]

        # Process each annotation
        for annotation in annotations:
            if (
                "contentElements" in annotation
                and annotation["className"] == TEXT_EXTRACT_ANNOTATION_CLASS
                and annotation["annotatedContentElement"] is not None
            ):
                # Process each content element
                for content_element in annotation["contentElements"]:
                    if (
                        "downloadUrl" in content_element
                        and content_element["downloadUrl"]
                    ):
                        # Download the text content
                        download_url = content_element["downloadUrl"]
                        text_content = await graphql_client.download_text_async(
                            download_url
                        )

                        # Append text content with separator
                        if text_content:
                            if all_text_content:
                                all_text_content += TEXT_EXTRACT_SEPARATOR
                            all_text_content += text_content

    return all_text_content
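The filter-and-concatenate behavior of the handler can be exercised without a live repository. The sketch below is an illustration under stated assumptions: `FakeGraphQLClient`, the constant values, and the `u1`/`u2`/`u3` URLs are all hypothetical stand-ins (the real constants are defined elsewhere in the server and are not shown here), and the loop is a condensed rewrite of the handler rather than the original code.

```python
import asyncio

# Assumed placeholder values; the server's real constants are not shown above.
TEXT_EXTRACT_ANNOTATION_CLASS = "ExampleTextExtractClass"
TEXT_EXTRACT_SEPARATOR = "\n\n"


class FakeGraphQLClient:
    """Hypothetical stand-in exposing only the methods the handler calls."""

    def __init__(self, response: dict, downloads: dict):
        self._response = response
        self._downloads = downloads

    async def execute_async(self, query: str, variables: dict) -> dict:
        return self._response

    async def download_text_async(self, url: str) -> str:
        return self._downloads.get(url, "")


async def collect_text_extracts(client: FakeGraphQLClient) -> str:
    """Condensed version of the handler's filter-and-concatenate loop."""
    result = await client.execute_async(query="...", variables={})
    document = ((result or {}).get("data") or {}).get("document") or {}
    annotations = (document.get("annotations") or {}).get("annotations") or []
    parts = []
    for annotation in annotations:
        # Keep only text extract annotations tied to a content element.
        if annotation.get("className") != TEXT_EXTRACT_ANNOTATION_CLASS:
            continue
        if annotation.get("annotatedContentElement") is None:
            continue
        for element in annotation.get("contentElements") or []:
            url = element.get("downloadUrl")
            if url:
                text = await client.download_text_async(url)
                if text:  # empty downloads are skipped, as in the handler
                    parts.append(text)
    return TEXT_EXTRACT_SEPARATOR.join(parts)


def demo() -> str:
    cls = TEXT_EXTRACT_ANNOTATION_CLASS
    response = {"data": {"document": {"annotations": {"annotations": [
        {"className": cls, "annotatedContentElement": 0,
         "contentElements": [{"downloadUrl": "u1"}]},
        {"className": "OtherAnnotation", "annotatedContentElement": 0,
         "contentElements": [{"downloadUrl": "ignored"}]},
        {"className": cls, "annotatedContentElement": 1,
         "contentElements": [{"downloadUrl": "u2"}, {"downloadUrl": None}]},
        {"className": cls, "annotatedContentElement": None,
         "contentElements": [{"downloadUrl": "u3"}]},
    ]}}}}
    downloads = {"u1": "page one", "u2": "page two", "u3": "skipped"}
    return asyncio.run(collect_text_extracts(FakeGraphQLClient(response, downloads)))
```

Collecting the pieces in a list and joining once gives the same string as the handler's incremental `+=`, because empty downloads never reach the list.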

src/cs_mcp_server/resources/documents.py:38-116 (helper)

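A separate helper function '_fetch_text_extract_by_identifier' in the resources module runs the same query and filtering logic but is exposed as a resource rather than a tool. Unlike the handler above, it logs any GraphQL errors and, if an exception is raised, returns an error message string instead of propagating the exception.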

async def _fetch_text_extract_by_identifier(
    graphql_client: GraphQLClient, identifier: str
) -> str:
    """Fetch text extract content for a document by ID or path."""

    query = """
    query getDocumentTextExtract($object_store_name: String!, $identifier: String!) {
        document(repositoryIdentifier: $object_store_name, identifier: $identifier) {
            annotations{
                annotations{
                    id
                    name
                    className
                    annotatedContentElement
                    descriptiveText
                    contentElements{
                        ... on ContentTransfer{
                            downloadUrl
                            retrievalName
                            contentSize
                        }
                    }
                }
            }
        }
    }
    """

    variables = {
        "identifier": identifier,
        "object_store_name": graphql_client.object_store,
    }

    try:
        result = await graphql_client.execute_async(query=query, variables=variables)

        if "errors" in result:
            logger.error("GraphQL errors in text extract query: %s", result["errors"])

        all_text_content = ""

        if (
            result
            and "data" in result
            and result["data"]
            and "document" in result["data"]
            and result["data"]["document"]
            and "annotations" in result["data"]["document"]
            and result["data"]["document"]["annotations"]
            and "annotations" in result["data"]["document"]["annotations"]
        ):
            annotations = result["data"]["document"]["annotations"]["annotations"]

            for annotation in annotations:
                if (
                    "contentElements" in annotation
                    and annotation["className"] == TEXT_EXTRACT_ANNOTATION_CLASS
                    and annotation["annotatedContentElement"] is not None
                ):
                    for content_element in annotation["contentElements"]:
                        if (
                            "downloadUrl" in content_element
                            and content_element["downloadUrl"]
                        ):
                            download_url = content_element["downloadUrl"]
                            text_content = await graphql_client.download_text_async(
                                download_url
                            )

                            if text_content:
                                if all_text_content:
                                    all_text_content += TEXT_EXTRACT_SEPARATOR
                                all_text_content += text_content

        return all_text_content

    except Exception as e:
        logger.error("Error fetching text extract for %s: %s", identifier, str(e))
        return f"Error retrieving text extract: {str(e)}"
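Both the handler and this resource helper repeat the same long chain of `in` and truthiness checks to reach the nested annotation list. One way such a chain could be condensed is a small lookup helper. The name `dig` and its behavior are assumptions made for illustration; nothing like it appears in the listings above.

```python
from typing import Any


def dig(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts; return `default` if any level is missing or falsy."""
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if not current:
            return default
    return current


# Usage: the four-level path used by both text extract functions.
example = {"data": {"document": {"annotations": {"annotations": [{"id": "a1"}]}}}}
annotations = dig(
    example, "data", "document", "annotations", "annotations", default=[]
)
```

Note one difference from the original chain: `dig` also falls back to the default when the innermost list is missing or empty, so the caller can always iterate over the result.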

src/cs_mcp_server/tools/property_extraction.py:34-87 (schema)

The property_extraction tool uses the same get_document_text_extract_content helper as part of its workflow, demonstrating reuse of the text extract logic in another tool context.

@mcp.tool(
    name="property_extraction",
)
async def property_extraction(identifier: str) -> Union[dict, ToolError]:
    """
    Use this tool for property extraction workflow when you need to extract property values from a document's content/text.

    This tool first determines the document's class, then fetches the class metadata to identify
    all available properties specific to that document class. It filters out system properties and
    hidden properties.

    It also retrieves a document's text extract content.

    :param identifier: The document id or path (required). This can be either the document's ID (GUID)
                      or its path in the repository (e.g., "/Folder1/document.pdf").

    :returns: A dictionary containing:
             - text_extract: The text content of the document's text extract annotation.
               If multiple text extracts are found, they will be concatenated.
             - properties: A list of property metadata dictionaries with symbolicName, displayName,
               descriptiveText, dataType, and cardinality fields.
    """
    text_extract = await get_document_text_extract_content(
        graphql_client=graphql_client, identifier=identifier
    )

    # First, get the class name of the document
    query = """
    query getDocument($object_store_name: String!, $identifier: String!){
        document(repositoryIdentifier: $object_store_name, identifier: $identifier){
            className
        }
    }
    """

    variables: dict[str, Any] = {
        "identifier": identifier,
        "object_store_name": graphql_client.object_store,
    }

    response = graphql_client.execute(query=query, variables=variables)

    if "errors" in response:
        return response

    class_name = response["data"]["document"]["className"]

    properties = await get_class_specific_property_names(
        graphql_client=graphql_client,
        metadata_cache=metadata_cache,
        class_name=class_name,
    )

    return {"text_extract": text_extract, "properties": properties}
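The listing above calls `graphql_client.execute` without `await` inside an `async` function, while the text extract helper uses `execute_async`. If `execute` is a blocking call (an assumption; its implementation is not shown), one option is to run it in a worker thread with `asyncio.to_thread` so the event loop stays responsive. In this sketch `BlockingClient`, `fetch_class_name`, and the class name "Invoice" are hypothetical.

```python
import asyncio
import time


class BlockingClient:
    """Hypothetical stand-in for a client whose execute() blocks."""

    def execute(self, query: str, variables: dict) -> dict:
        time.sleep(0.05)  # simulate network latency
        return {"data": {"document": {"className": "Invoice"}}}


async def fetch_class_name(client: BlockingClient, identifier: str) -> str:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker
    # thread, leaving the event loop free for other coroutines.
    response = await asyncio.to_thread(
        client.execute,
        query="...",
        variables={"identifier": identifier},
    )
    return response["data"]["document"]["className"]
```

Where an async variant such as `execute_async` is available, awaiting it directly would be the simpler choice.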

Tool Definition Quality

Grade: A (3.5/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It adds transparency by noting that multiple text extracts are concatenated and that an empty string is returned if none exist. However, it does not disclose other behavioral aspects like access rights or performance implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured, with clear :param and :returns fields for the parameter and return value. Every sentence provides useful information, with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists (not shown), the description appropriately omits detailed return format but still explains concatenation and empty string behavior. For a single-parameter tool, the description covers essential aspects adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description adds crucial meaning by explaining that 'identifier' can be a document ID (GUID) or a path (e.g., '/Folder1/document.pdf'). It also notes the parameter is required, adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Retrieves') and the resource ('document's text extract content'), making the purpose unambiguous. It is specific enough to distinguish from sibling tools like get_document_properties or get_document_versions, though it does not explicitly differentiate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There is no mention of context, prerequisites, or exclusions, which would help an agent decide appropriately.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
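As an illustration of the kind of usage guidance this dimension asks for, the docstring could name the sibling tool that overlaps with it. The wording below is a hypothetical suggestion, not the server's actual docstring; it relies only on the fact shown earlier that property_extraction returns the text extract together with property metadata.

```python
async def get_document_text_extract(identifier: str) -> str:
    """
    Retrieves a document's text extract content.

    Use this tool when only the extracted text is needed. Prefer
    property_extraction when the document class's property metadata is
    needed as well, since that tool returns the text extract together
    with the property list.

    :param identifier: The document id (GUID) or repository path.
    :returns: The concatenated text extract, or an empty string if none exists.
    """
    # Body omitted: this sketch only illustrates the docstring wording.
    raise NotImplementedError("docstring illustration only")
```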

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ibm-ecm/cs-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.