BioContextAI Knowledgebase MCP

Official

bc_get_string_similarity_scores

Calculate protein similarity scores using Smith-Waterman bit scores from the STRING database to assess homology between protein pairs in biomedical research.

Instructions

Retrieve protein homology similarity scores from the STRING database, based on Smith-Waterman bit scores. Only scores above 50 are reported.

Returns: list or dict: an array of similarity scores with the fields stringId_A, stringId_B, and bitscore, or an error message.

Input Schema

Name                       Required  Description                                    Default
protein_symbol             Yes       First protein symbol (e.g., 'TP53')
protein_symbol_comparison  Yes       Second protein symbol (e.g., 'MKI67')
species                    No        Species taxonomy ID (e.g., '9606' for human)
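
Before invoking the tool, a client can assemble and sanity-check the argument payload against the schema above. A minimal sketch, assuming a Python caller (the variable names are ours, not part of the API):

```python
# Hypothetical argument payload matching the input schema.
# "species" is optional; an empty or omitted value queries across all species.
arguments = {
    "protein_symbol": "TP53",              # required
    "protein_symbol_comparison": "MKI67",  # required
    "species": "9606",                     # optional NCBI taxonomy ID (human)
}

# Client-side check that both required fields are present.
required = {"protein_symbol", "protein_symbol_comparison"}
missing = required - arguments.keys()
assert not missing, f"missing required arguments: {missing}"
```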

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • The handler function for the tool (registered as 'get_string_similarity_scores' on the 'BC' MCP server and exposed as 'bc_get_string_similarity_scores'). It declares its schema via Pydantic Annotated fields, is registered via the @core_mcp.tool() decorator, and implements the core logic: it resolves both symbols to STRING IDs with the get_string_id tool, then queries the STRING homology API for bit scores.
    # Imports needed to run this excerpt (not shown in the original snippet):
    from typing import Annotated, Any, Dict, List, Union

    import requests
    from pydantic import Field

    # Per the package __init__, get_string_id lives in a sibling module.
    from ._get_string_id import get_string_id

    @core_mcp.tool()
    def get_string_similarity_scores(
        protein_symbol: Annotated[str, Field(description="First protein symbol (e.g., 'TP53')")],
        protein_symbol_comparison: Annotated[str, Field(description="Second protein symbol (e.g., 'MKI67')")],
        species: Annotated[str, Field(description="Species taxonomy ID (e.g., '9606' for human)")] = "",
    ) -> Union[List[Dict[str, Any]], dict]:
        """Retrieve protein homology similarity scores from STRING database based on Smith-Waterman bit scores. Only scores above 50 reported.
    
        Returns:
            list or dict: Similarity scores array with stringId_A, stringId_B, bitscore or error message.
        """
        # Resolve both protein symbols to STRING IDs
        try:
            string_id1 = get_string_id.fn(protein_symbol=protein_symbol, species=species)
            string_id2 = get_string_id.fn(protein_symbol=protein_symbol_comparison, species=species)
    
            if not all(isinstance(string_id, str) for string_id in [string_id1, string_id2]):
                return {"error": "Could not extract STRING IDs"}
    
            identifiers = f"{string_id1}%0d{string_id2}"
    
            url = f"https://string-db.org/api/json/homology?identifiers={identifiers}"
            if species:
                url += f"&species={species}"
    
            response = requests.get(url)
            response.raise_for_status()
    
            return response.json()
        except requests.exceptions.RequestException as e:
            return {"error": f"Failed to fetch similarity scores: {e!s}"}
        except Exception as e:
            return {"error": f"An error occurred: {e!s}"}
  • Import statement that loads the stringdb module, executing the @tool decorators to register get_string_similarity_scores to core_mcp.
    from .stringdb import *
  • Definition of the core_mcp FastMCP server instance (named 'BC') where all tools including get_string_similarity_scores are registered.
    core_mcp = FastMCP(  # type: ignore
        "BC",
        instructions="Provides access to biomedical knowledge bases.",
    )
  • Module __init__.py that re-exports the get_string_similarity_scores function (and others) for convenient import.
    from ._get_string_id import get_string_id
    from ._get_string_interactions import get_string_interactions
    from ._get_string_network_image import get_string_network_image
    from ._get_string_similarity_scores import get_string_similarity_scores
    
    __all__ = [
        "get_string_id",
        "get_string_interactions",
        "get_string_network_image",
        "get_string_similarity_scores",
    ]
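
The handler joins the two STRING identifiers with '%0d', the URL-encoded carriage return that STRING's REST API accepts as a record separator. That URL construction can be sketched as a standalone helper (the name build_homology_url is our invention, not part of the package):

```python
def build_homology_url(string_id_a: str, string_id_b: str, species: str = "") -> str:
    """Build the STRING homology endpoint URL, joining the two IDs with
    '%0d' (the URL-encoded carriage return STRING uses as a separator)."""
    identifiers = f"{string_id_a}%0d{string_id_b}"
    url = f"https://string-db.org/api/json/homology?identifiers={identifiers}"
    if species:  # an empty species string means "no species filter"
        url += f"&species={species}"
    return url

# Illustrative STRING-style identifiers (taxonomy prefix + Ensembl protein ID);
# real IDs would come from the get_string_id tool.
url = build_homology_url("9606.ENSP_A", "9606.ENSP_B", "9606")
```

Keeping the species filter out of the URL when the argument is empty mirrors the handler's own `if species:` branch.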
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses key behavioral traits: the data source (STRING database), the scoring method (Smith-Waterman bit scores), and a filtering threshold (scores above 50). However, it doesn't mention error handling, rate limits, authentication needs, or whether this is a read-only operation, leaving gaps for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
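
One consequence of the undisclosed error handling: the handler signals failure by returning a dict with an "error" key rather than raising, so a caller has to branch on the result's shape. A sketch of that check, assuming the field names from the Returns description:

```python
from typing import Any, Dict, List, Union

def summarize_scores(result: Union[List[Dict[str, Any]], dict]) -> str:
    """Distinguish the success shape (a list of score records) from the
    error shape (a dict carrying an 'error' message)."""
    if isinstance(result, dict) and "error" in result:
        return f"lookup failed: {result['error']}"
    # Success: a list of {stringId_A, stringId_B, bitscore} records,
    # already filtered to bit scores above 50 by the tool.
    return f"{len(result)} score pair(s) returned"

# Example shapes (the record values are invented):
ok = summarize_scores([{"stringId_A": "a", "stringId_B": "b", "bitscore": 520}])
err = summarize_scores({"error": "Could not extract STRING IDs"})
```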

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with two sentences: one for the core functionality and one for the return format. It's front-loaded with the main purpose. The return statement could be slightly more concise, but overall, there's minimal waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a bioinformatics tool with specific scoring), 100% schema coverage, and an output schema (implied by 'Returns'), the description is fairly complete. It covers the what, how, and filtering threshold. However, with no annotations, it could benefit from more behavioral context like error cases or performance notes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain relationships between parameters or provide additional examples). This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Retrieve protein homology similarity scores'), resource ('from STRING database'), and method ('based on Smith-Waterman bit scores'). It distinguishes from sibling tools like bc_get_string_interactions or bc_get_string_id by focusing specifically on similarity scores rather than interactions or ID mapping.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning 'Only scores above 50 reported,' which suggests a filtering threshold. However, it doesn't explicitly state when to use this tool versus alternatives like bc_get_string_interactions or provide clear exclusion criteria. The guidance is functional but not comprehensive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
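
Since the server already drops bit scores of 50 and below, any stricter cutoff has to be applied client-side. A sketch of post-filtering a returned score list (the helper name and sample records are ours):

```python
def filter_scores(records, min_bitscore=100):
    """Keep only pairs at or above a stricter client-side bit-score
    cutoff, sorted best-first. Records mirror the tool's output fields."""
    kept = [r for r in records if r["bitscore"] >= min_bitscore]
    return sorted(kept, key=lambda r: r["bitscore"], reverse=True)

# Invented example records in the tool's output shape:
sample = [
    {"stringId_A": "x", "stringId_B": "y", "bitscore": 80},
    {"stringId_A": "x", "stringId_B": "z", "bitscore": 240},
]
top = filter_scores(sample, min_bitscore=100)
```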

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/biocontext-ai/knowledgebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.