biocontext-ai

BioContextAI Knowledgebase MCP

Official

bc_get_string_similarity_scores

Retrieve protein homology similarity scores between two proteins, based on Smith-Waterman bit scores from the STRING database. Only scores above 50 are reported.

Instructions

Retrieve protein homology similarity scores from the STRING database, based on Smith-Waterman bit scores. Only scores above 50 are reported.

Returns: list or dict: a list of similarity records, each with stringId_A, stringId_B, and bitscore, or a dict containing an error message.

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| protein_symbol | Yes | First protein symbol (e.g., 'TP53') | |
| protein_symbol_comparison | Yes | Second protein symbol (e.g., 'MKI67') | |
| species | No | Species taxonomy ID (e.g., '9606' for human) | "" |
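As a concrete illustration, a hypothetical argument payload for this schema might look like the following (the values are the examples from the parameter descriptions, not live data):

```python
# Hypothetical arguments for bc_get_string_similarity_scores, using the
# example values given in the parameter descriptions.
args = {
    "protein_symbol": "TP53",
    "protein_symbol_comparison": "MKI67",
    "species": "9606",  # optional; the handler's default is ""
}

# Both protein symbols are required; species is optional.
required = {"protein_symbol", "protein_symbol_comparison"}
assert required <= args.keys()
```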

Output Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| result | Yes | | |
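Because the tool returns either a list (on success) or a dict with an "error" key (on failure), callers need to branch on the result's shape. A minimal caller-side sketch, using a hypothetical `unwrap_result` helper that is not part of the package:

```python
from typing import Any, Dict, List, Union

def unwrap_result(result: Union[List[Dict[str, Any]], dict]) -> List[Dict[str, Any]]:
    """Hypothetical helper: raise on the error-dict shape, pass lists through."""
    if isinstance(result, dict) and "error" in result:
        raise RuntimeError(result["error"])
    return result

# Success: a list of score records is returned unchanged.
records = unwrap_result([{"stringId_A": "A", "stringId_B": "B", "bitscore": 63.5}])

# Failure: the error dict becomes an exception.
try:
    unwrap_result({"error": "Could not extract STRING IDs"})
except RuntimeError as exc:
    print(exc)  # Could not extract STRING IDs
```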

Implementation Reference

  • The main handler implementation for the 'bc_get_string_similarity_scores' tool. The function is decorated with @core_mcp.tool() which registers it as an MCP tool. It takes two protein symbols and optional species, resolves them to STRING IDs, and queries the STRING database for homology similarity scores (Smith-Waterman bit scores).
    @core_mcp.tool()
    def get_string_similarity_scores(
        protein_symbol: Annotated[str, Field(description="First protein symbol (e.g., 'TP53')")],
        protein_symbol_comparison: Annotated[str, Field(description="Second protein symbol (e.g., 'MKI67')")],
        species: Annotated[str, Field(description="Species taxonomy ID (e.g., '9606' for human)")] = "",
    ) -> Union[List[Dict[str, Any]], dict]:
        """Retrieve protein homology similarity scores from STRING database based on Smith-Waterman bit scores. Only scores above 50 reported.
    
        Returns:
            list or dict: Similarity scores array with stringId_A, stringId_B, bitscore or error message.
        """
        # Resolve both protein symbols to STRING IDs
        try:
            string_id1 = get_string_id.fn(protein_symbol=protein_symbol, species=species)
            string_id2 = get_string_id.fn(protein_symbol=protein_symbol_comparison, species=species)
    
            if not all(isinstance(string_id, str) for string_id in [string_id1, string_id2]):
                return {"error": "Could not extract STRING IDs"}
    
            identifiers = f"{string_id1}%0d{string_id2}"
    
            url = f"https://string-db.org/api/json/homology?identifiers={identifiers}"
            if species:
                url += f"&species={species}"
    
            response = requests.get(url)
            response.raise_for_status()
    
            return response.json()
        except requests.exceptions.RequestException as e:
            return {"error": f"Failed to fetch similarity scores: {e!s}"}
        except Exception as e:
            return {"error": f"An error occurred: {e!s}"}
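The handler joins the two STRING IDs with `%0d`, a URL-encoded carriage return, which is how the STRING API expects multiple identifiers to be separated. A standalone sketch of that URL construction (the function name and the example IDs are illustrative, not part of the package):

```python
def build_homology_url(string_ids, species=""):
    """Assemble a STRING homology request URL.

    STRING expects multiple identifiers separated by a carriage return,
    URL-encoded as '%0d'. The species filter is appended only when
    provided, mirroring the handler above.
    """
    identifiers = "%0d".join(string_ids)
    url = f"https://string-db.org/api/json/homology?identifiers={identifiers}"
    if species:
        url += f"&species={species}"
    return url

# Illustrative STRING protein IDs (format: '<taxon>.<Ensembl protein ID>')
print(build_homology_url(["9606.ENSP00000269305", "9606.ENSP00000368072"], "9606"))
```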
  • The FastMCP server instance 'core_mcp' is created here. The @core_mcp.tool() decorator in the handler file registers the tool with this server instance.
    from fastmcp import FastMCP
    
    core_mcp = FastMCP(  # type: ignore
        "BC",
        instructions="Provides access to biomedical knowledge bases.",
    )
  • Pydantic Field annotations define the input schema: protein_symbol (str), protein_symbol_comparison (str), and species (str, optional). The return type is Union[List[Dict[str, Any]], dict].
    def get_string_similarity_scores(
        protein_symbol: Annotated[str, Field(description="First protein symbol (e.g., 'TP53')")],
        protein_symbol_comparison: Annotated[str, Field(description="Second protein symbol (e.g., 'MKI67')")],
        species: Annotated[str, Field(description="Species taxonomy ID (e.g., '9606' for human)")] = "",
    ) -> Union[List[Dict[str, Any]], dict]:
  • Exports get_string_similarity_scores from the stringdb package module, making it accessible via import.
    from ._get_string_id import get_string_id
    from ._get_string_interactions import get_string_interactions
    from ._get_string_network_image import get_string_network_image
    from ._get_string_similarity_scores import get_string_similarity_scores
    
    __all__ = [
        "get_string_id",
        "get_string_interactions",
        "get_string_network_image",
        "get_string_similarity_scores",
    ]
  • The get_string_id helper function that is called by get_string_similarity_scores to resolve protein symbols to STRING IDs before querying homology scores.
    from typing import Annotated, Union
    
    import requests
    from pydantic import Field
    
    from biocontext_kb.core._server import core_mcp
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the score threshold and the return types (list or dict) but omits details such as authentication, rate limits, and side effects. Adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is very concise: three sentences with no redundant information. First sentence front-loads the core purpose. Every sentence adds value without extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has three parameters (two required) and a defined output schema, the description covers the key aspects: purpose, threshold, and return structure. Minor gaps, such as missing example usage and format details, prevent a perfect score.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with a clear description for each parameter (e.g., "First protein symbol (e.g., 'TP53')"). The tool description adds context about Smith-Waterman bit scores but does not significantly enhance parameter meaning, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Retrieve' and specifies the resource: protein homology similarity scores from STRING database. It distinguishes from sibling tools like bc_get_string_interactions by focusing on Smith-Waterman bit scores with a threshold.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no explicit guidance on when to use this tool versus alternatives. The threshold constraint ("Only scores above 50 reported") is mentioned but not tied to usage contexts. The purpose is implied, but comparative directives are lacking.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
