lookup_dataset

Retrieve detailed dataset information including required parameters, schema, and example queries to ensure correct usage before querying. Essential for datasets like 'balances' that need specific inputs such as 'address'.

Instructions

Look up a specific dataset and return detailed information about it. IMPORTANT: Always use this
function before querying a new dataset to understand its required parameters and schema.

The returned information includes:
1. Required parameters for the dataset (IMPORTANT for datasets like 'balances' that need an address)
2. Schema details showing available columns and data types
3. Example queries for the dataset

When the dataset requires specific parameters like 'address' (for 'balances'),
ALWAYS use the 'contract' parameter in query_dataset() to pass these values.

Example:
For 'balances' dataset, lookup_dataset('balances') will show it requires an 'address' parameter.
You should then query it using:
query_dataset('balances', blocks='1000:1010', contract='0x1234...')

Args:
    name: The name of the dataset to look up
    sample_start_block: Optional start block for sample data (integer)
    sample_end_block: Optional end block for sample data (integer)
    use_latest_sample: If True, use the latest block for sample data
    sample_blocks_from_latest: Number of blocks before the latest to include in sample
    
Returns:
    Detailed information about the dataset including schema and available fields
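As the implementation notes, cryo interprets a `start:end` block range as inclusive of `start` and exclusive of `end`. A small, hypothetical helper (not part of the server) illustrating that convention:

```python
def parse_block_range(blocks: str) -> range:
    """Parse a cryo-style 'start:end' block range ([start, end) semantics)."""
    start_str, end_str = blocks.split(":")
    return range(int(start_str), int(end_str))

# '1000:1010' covers blocks 1000 through 1009 inclusive
covered = parse_block_range("1000:1010")
print(len(covered), covered[0], covered[-1])  # 10 1000 1009
```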

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | The name of the dataset to look up | |
| sample_blocks_from_latest | No | Number of blocks before the latest to include in the sample | |
| sample_end_block | No | Optional end block for sample data (integer) | |
| sample_start_block | No | Optional start block for sample data (integer) | |
| use_latest_sample | No | If True, use the latest block for sample data | False |
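Per the schema above, only `name` is required; the remaining fields control optional sample-data generation. A call might therefore pass arguments like the following (the dataset names and values are illustrative):

```python
# Minimal call: only the required parameter
minimal_args = {"name": "balances"}

# Sampling from recent blocks: optional fields supplied alongside name
full_args = {
    "name": "logs",
    "use_latest_sample": True,
    "sample_blocks_from_latest": 10,
}

required = {"name"}
assert required.issubset(minimal_args) and required.issubset(full_args)
print(sorted(full_args))
```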

Implementation Reference

  • The primary handler function for the 'lookup_dataset' tool. It is decorated with @mcp.tool() for registration in the MCP server. The function fetches dataset information using get_dataset_info, retrieves schema via cryo's --dry-run, and optionally generates sample data by querying a small block range.
    @mcp.tool()
    def lookup_dataset(
        name: str,
        sample_start_block: Optional[int] = None,
        sample_end_block: Optional[int] = None,
        use_latest_sample: bool = False,
        sample_blocks_from_latest: Optional[int] = None
    ) -> Dict[str, Any]:
        """
        Look up a specific dataset and return detailed information about it. IMPORTANT: Always use this
        function before querying a new dataset to understand its required parameters and schema.
        
        The returned information includes:
        1. Required parameters for the dataset (IMPORTANT for datasets like 'balances' that need an address)
        2. Schema details showing available columns and data types
        3. Example queries for the dataset
        
        When the dataset requires specific parameters like 'address' (for 'balances'),
        ALWAYS use the 'contract' parameter in query_dataset() to pass these values.
        
        Example:
        For 'balances' dataset, lookup_dataset('balances') will show it requires an 'address' parameter.
        You should then query it using:
        query_dataset('balances', blocks='1000:1010', contract='0x1234...')
        
        Args:
            name: The name of the dataset to look up
            sample_start_block: Optional start block for sample data (integer)
            sample_end_block: Optional end block for sample data (integer)
            use_latest_sample: If True, use the latest block for sample data
            sample_blocks_from_latest: Number of blocks before the latest to include in sample
            
        Returns:
            Detailed information about the dataset including schema and available fields
        """
        # Get basic dataset info
        info = get_dataset_info(name)
        
        # Ensure we have the RPC URL
        rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)
        
        # Get schema information by running the dataset with --dry-run
        schema_result = subprocess.run(
            ["cryo", name, "--dry-run", "-r", rpc_url],
            capture_output=True,
            text=True
        )
        
        if schema_result.returncode == 0:
            info["schema"] = schema_result.stdout
        else:
            info["schema_error"] = schema_result.stderr
        
        # Try to get a sample of the dataset (first 5 records)
        try:
            data_dir = Path(os.environ.get("CRYO_DATA_DIR", DEFAULT_DATA_DIR))
            
            # Determine block range for sample (priority: latest > specified blocks > default)
            if use_latest_sample or sample_blocks_from_latest is not None:
                # Get the latest block number
                latest_block = get_latest_block_number()
                
                if latest_block is None:
                    info["sample_error"] = "Failed to get the latest block number from the RPC endpoint"
                    return info
                
                if sample_blocks_from_latest is not None:
                    # Use a range of blocks from latest-n to latest
                    block_range = f"{latest_block - sample_blocks_from_latest}:{latest_block+1}"
                else:
                    # Just the latest 5 blocks
                    block_range = f"{latest_block-4}:{latest_block+1}"
                
                info["sample_block_range"] = block_range
                
                # Use the latest directory for latest block samples
                sample_dir = data_dir / "latest"
                sample_dir.mkdir(parents=True, exist_ok=True)
                
                # Clean up the latest directory before new query
                print("Cleaning latest directory for current sample")
                existing_files = list(sample_dir.glob(f"*{name}*.*"))
                for file in existing_files:
                    try:
                        file.unlink()
                        print(f"Removed existing sample file: {file}")
                    except Exception as e:
                        print(f"Warning: Could not remove sample file {file}: {e}")
            else:
                # For historical blocks, get the start block and end block
                if sample_start_block is not None:
                    if sample_end_block is not None:
                        # Note: cryo uses [start:end) range (inclusive start, exclusive end)
                        # Add 1 to end_block to include it in the range
                        block_range = f"{sample_start_block}:{sample_end_block+1}"
                    else:
                        # Use start block and get 5 blocks
                        block_range = f"{sample_start_block}:{sample_start_block+5}"
                else:
                    # Default to a known good block range
                    block_range = "1000:1005"
                
                # For historical samples, use the main data directory
                sample_dir = data_dir
                sample_dir.mkdir(parents=True, exist_ok=True)
                    
            # Use the block range for the sample
            sample_cmd = [
                "cryo", name, 
                "-b", block_range,
                "-r", rpc_url,
                "--json", 
                "-o", str(sample_dir)
            ]
            
            print(f"Running sample command: {' '.join(sample_cmd)}")
            sample_result = subprocess.run(
                sample_cmd,
                capture_output=True,
                text=True,
                timeout=30  # Add timeout to prevent hanging
            )
            
            if sample_result.returncode == 0:
                # Try to find the report file which contains info about generated files
                report_dir = sample_dir / ".cryo" / "reports"
                if report_dir.exists():
                    # Get the most recent report file
                    report_files = sorted(report_dir.glob("*.json"), key=lambda x: x.stat().st_mtime, reverse=True)
                    if report_files:
                        with open(report_files[0], 'r') as f:
                            report_data = json.load(f)
                            # Get the list of completed files from the report
                            if "results" in report_data and "completed_paths" in report_data["results"]:
                                completed_files = report_data["results"]["completed_paths"]
                                print(f"Found {len(completed_files)} files in Cryo report: {completed_files}")
                                info["sample_files"] = completed_files
                                return info
                
                # Fallback to glob search if report file not found
                output_files = list(sample_dir.glob(f"*{name}*.json"))
                print(f"Output files found via glob: {output_files}")
                
                if output_files:
                    # Convert Path objects to strings for JSON serialization
                    file_paths = [str(file_path) for file_path in output_files]
                    info["sample_files"] = file_paths
                else:
                    info["sample_error"] = "No output files generated"
            else:
                info["sample_error"] = sample_result.stderr
                info["sample_stdout"] = sample_result.stdout  # Include stdout for debugging
        except Exception as e:  # also covers subprocess.TimeoutExpired, a subclass of Exception
            info["sample_error"] = str(e)
        
        return info
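The block-range selection above (latest-block options take priority over explicit blocks, which take priority over the default) can be sketched as a standalone function. This is an illustrative refactor under the same rules, not the server's actual code:

```python
from typing import Optional

def choose_sample_range(
    latest_block: Optional[int] = None,
    use_latest_sample: bool = False,
    sample_blocks_from_latest: Optional[int] = None,
    sample_start_block: Optional[int] = None,
    sample_end_block: Optional[int] = None,
) -> str:
    """Mirror lookup_dataset's priority: latest > explicit blocks > default."""
    if use_latest_sample or sample_blocks_from_latest is not None:
        if latest_block is None:
            raise ValueError("latest block number unavailable")
        if sample_blocks_from_latest is not None:
            # Range from latest-n up to and including latest
            return f"{latest_block - sample_blocks_from_latest}:{latest_block + 1}"
        # Just the latest 5 blocks
        return f"{latest_block - 4}:{latest_block + 1}"
    if sample_start_block is not None:
        if sample_end_block is not None:
            # cryo ranges are [start:end), so add 1 to include end_block
            return f"{sample_start_block}:{sample_end_block + 1}"
        # Use start block and take 5 blocks
        return f"{sample_start_block}:{sample_start_block + 5}"
    return "1000:1005"  # known good default

print(choose_sample_range(latest_block=20000, sample_blocks_from_latest=10))  # 19990:20001
print(choose_sample_range(sample_start_block=1000, sample_end_block=1009))    # 1000:1010
```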
  • Helper resource function called by lookup_dataset to get basic dataset information, description, and example queries via cryo's help command.
    @mcp.resource("dataset://{name}")
    def get_dataset_info(name: str) -> Dict[str, Any]:
        """Get information about a specific dataset"""
        # Ensure we have the RPC URL
        rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)
        
        result = subprocess.run(
            ["cryo", "help", name, "-r", rpc_url],
            capture_output=True,
            text=True
        )
    
        # Get the latest block number for examples
        latest_block = get_latest_block_number()
        latest_example = ""
        
        if latest_block:
            latest_example = f"query_dataset('{name}', blocks_from_latest=10)  # Gets latest-10 to latest blocks"
        
        # Add a special example for datasets requiring an address parameter
        address_example = ""
        if "required parameters: address" in result.stdout.lower():
            address_example = f"query_dataset('{name}', blocks='1000:1010', contract='0x123...')  # Use contract parameter for address"
        
        example_queries = [
            f"query_dataset('{name}', blocks='1000:1010')",
            f"query_dataset('{name}', start_block=1000, end_block=1009)",
            f"query_dataset('{name}', use_latest=True)  # Gets just the latest block",
            latest_example,
        ]
        if address_example:
            example_queries.append(address_example)
        
        return {
            "name": name,
            "description": result.stdout,
            "example_queries": example_queries,
            "notes": [
                "Block ranges are inclusive for start_block and end_block when using integer parameters.",
                "Use 'use_latest=True' to query only the latest block.",
                "Use 'blocks_from_latest=N' to query the latest N blocks.",
                "IMPORTANT: For datasets requiring an 'address' parameter (like 'balances'), use the 'contract' parameter.",
                "Always check the required parameters in the dataset description and use lookup_dataset() first."
            ]
        }
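`get_latest_block_number` is referenced by both functions but not shown in this listing. A minimal sketch of what such a helper might look like, using a plain `eth_blockNumber` JSON-RPC call over the standard library (an assumption for illustration, not the server's actual implementation):

```python
import json
import urllib.request
from typing import Optional

def parse_block_number(response: dict) -> int:
    """Extract the block height from an eth_blockNumber JSON-RPC response."""
    return int(response["result"], 16)  # result is a hex string like '0x10d4f'

def get_latest_block_number(rpc_url: str) -> Optional[int]:
    """Return the latest block number, or None if the RPC call fails."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_blockNumber",
        "params": [],
        "id": 1,
    }).encode()
    req = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return parse_block_number(json.load(resp))
    except Exception:
        return None
```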
  • The @mcp.tool() decorator registers the lookup_dataset function as an MCP tool.
    @mcp.tool()
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by explaining the tool's behavior: it's a read-only lookup (implied by 'look up' and 'return information'), it provides schema details and required parameters, and it includes sample data generation capabilities. However, it doesn't mention rate limits, authentication needs, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded with the core purpose and important usage guideline. Every sentence adds value, though the example section is somewhat lengthy. The structure flows logically from purpose to usage to parameters to returns.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters, 0% schema coverage, no annotations, and no output schema, the description does an excellent job of explaining what the tool does, when to use it, and what parameters mean. The main gap is lack of information about return format details, though it describes what information will be included.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, but the description fully compensates by explaining all 5 parameters in detail. It clarifies that 'name' is the dataset identifier, and the other 4 parameters control sample data generation (with specific examples like 'sample_blocks_from_latest'). This adds substantial meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('look up', 'return detailed information') and resource ('dataset'), distinguishing it from siblings like list_datasets (which lists datasets) or query_dataset (which queries data). It explicitly explains this is for understanding dataset parameters and schema before querying.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Always use this function before querying a new dataset') and when to use alternatives (query_dataset for actual queries). It includes a concrete example showing the workflow between lookup_dataset and query_dataset, making the usage context clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
