query_dataset

Download blockchain datasets such as transactions or logs for specified block ranges and contract addresses. Returns the file paths of the downloaded data for use in SQL queries or further processing on the Cryo MCP Server.

Instructions

Download blockchain data and return the file paths where the data is stored.

IMPORTANT WORKFLOW NOTE: When running SQL queries, use this function first to download
data, then use the returned file paths with query_sql() to execute SQL on those files.

Example workflow for SQL:
1. First download data: result = query_dataset('transactions', blocks='1000:1010', output_format='parquet')
2. Get file paths: files = result.get('files', [])
3. Run SQL query: query_sql("SELECT * FROM read_parquet('/path/to/file.parquet')", files=files)
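A minimal sketch of step 2's result handling, assuming only the success and error shapes documented below ({"files": [...], "count": n, "format": ...} on success, {"error": ...} on failure):

```python
def extract_files(result: dict) -> list:
    # On success query_dataset returns {"files": [...], "count": n, "format": ...};
    # on failure it returns {"error": "..."} instead of a files list.
    if "error" in result:
        raise RuntimeError(f"query_dataset failed: {result['error']}")
    return result.get("files", [])

ok = {"files": ["/data/transactions.parquet"], "count": 1, "format": "parquet"}
print(extract_files(ok))
```

Checking for the error key before passing files to query_sql() avoids running SQL against an empty file list.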

DATASET-SPECIFIC PARAMETERS:
For datasets that require specific address parameters (like 'balances', 'erc20_transfers', etc.),
ALWAYS use the 'contract' parameter to pass ANY Ethereum address. For example:

- For 'balances' dataset: Use contract parameter for the address you want balances for
  query_dataset('balances', blocks='1000:1010', contract='0x123...')

- For 'logs' or 'erc20_transfers': Use contract parameter for contract address
  query_dataset('logs', blocks='1000:1010', contract='0x123...')

To check what parameters a dataset requires, always use lookup_dataset() first:
lookup_dataset('balances')  # Will show required parameters
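The server maps the single contract parameter onto the dataset-appropriate native cryo flag; a minimal sketch of that mapping, mirroring the implementation shown below:

```python
def address_flag(dataset: str) -> str:
    # 'balances' takes --address in native cryo; most other datasets
    # (logs, erc20_transfers, ...) take --contract.
    return "--address" if dataset == "balances" else "--contract"

print(address_flag("balances"))  # --address
print(address_flag("logs"))      # --contract
```

This is why callers can always pass contract regardless of the flag name cryo itself expects.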

Args:
    dataset: The name of the dataset to query (e.g., 'logs', 'transactions', 'balances')
    blocks: Block range specification as a string (e.g., '1000:1010')
    start_block: Start block number as integer (alternative to blocks)
    end_block: End block number as integer (alternative to blocks)
    use_latest: If True, query the latest block
    blocks_from_latest: Number of blocks before the latest to include (e.g., 10 = latest-10 to latest)
    contract: Contract address to filter by - IMPORTANT: Use this parameter for ALL address-based filtering
      regardless of the parameter name in the native cryo command (address, contract, etc.)
    output_format: Output format (json, csv, parquet) - use 'parquet' for SQL queries
    include_columns: Columns to include alongside the defaults
    exclude_columns: Columns to exclude from the defaults

Returns:
    Dictionary containing file paths where the downloaded data is stored
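The start_block/end_block pair is translated into cryo's half-open a:b range (inclusive start, exclusive end); a sketch of that conversion, matching the implementation below:

```python
def to_cryo_range(start_block: int, end_block: int) -> str:
    # cryo treats "a:b" as half-open [a, b): inclusive start, exclusive end.
    # Add 1 to end_block so the caller-supplied end block is included.
    return f"{start_block}:{end_block + 1}"

print(to_cryo_range(1000, 1009))  # covers blocks 1000 through 1009
```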

Input Schema

Name                 Required   Default
blocks               No
blocks_from_latest   No
contract             No
dataset              Yes
end_block            No
exclude_columns      No
include_columns      No
output_format        No         json
start_block          No
use_latest           No

Implementation Reference

  • The query_dataset tool handler function, registered via the @mcp.tool() decorator. It executes the cryo CLI to download the requested blockchain dataset for the specified parameters and returns the paths of the generated files.
    @mcp.tool()
    def query_dataset(
        dataset: str,
        blocks: Optional[str] = None,
        start_block: Optional[int] = None,
        end_block: Optional[int] = None,
        use_latest: bool = False,
        blocks_from_latest: Optional[int] = None,
        contract: Optional[str] = None,
        output_format: str = "json",
        include_columns: Optional[List[str]] = None,
        exclude_columns: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """
        Download blockchain data and return the file paths where the data is stored.
        
        IMPORTANT WORKFLOW NOTE: When running SQL queries, use this function first to download
        data, then use the returned file paths with query_sql() to execute SQL on those files.
        
        Example workflow for SQL:
        1. First download data: result = query_dataset('transactions', blocks='1000:1010', output_format='parquet')
        2. Get file paths: files = result.get('files', [])
        3. Run SQL query: query_sql("SELECT * FROM read_parquet('/path/to/file.parquet')", files=files)
    
        DATASET-SPECIFIC PARAMETERS:
        For datasets that require specific address parameters (like 'balances', 'erc20_transfers', etc.),
        ALWAYS use the 'contract' parameter to pass ANY Ethereum address. For example:
        
        - For 'balances' dataset: Use contract parameter for the address you want balances for
          query_dataset('balances', blocks='1000:1010', contract='0x123...')
        
        - For 'logs' or 'erc20_transfers': Use contract parameter for contract address
          query_dataset('logs', blocks='1000:1010', contract='0x123...')
        
        To check what parameters a dataset requires, always use lookup_dataset() first:
        lookup_dataset('balances')  # Will show required parameters
    
        Args:
            dataset: The name of the dataset to query (e.g., 'logs', 'transactions', 'balances')
            blocks: Block range specification as a string (e.g., '1000:1010')
            start_block: Start block number as integer (alternative to blocks)
            end_block: End block number as integer (alternative to blocks)
            use_latest: If True, query the latest block
            blocks_from_latest: Number of blocks before the latest to include (e.g., 10 = latest-10 to latest)
            contract: Contract address to filter by - IMPORTANT: Use this parameter for ALL address-based filtering
              regardless of the parameter name in the native cryo command (address, contract, etc.)
            output_format: Output format (json, csv, parquet) - use 'parquet' for SQL queries
            include_columns: Columns to include alongside the defaults
            exclude_columns: Columns to exclude from the defaults
    
        Returns:
            Dictionary containing file paths where the downloaded data is stored
        """
        # Ensure we have the RPC URL
        rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)
        
        # Build the cryo command
        cmd = ["cryo", dataset, "-r", rpc_url]
    
        # Handle block range (priority: blocks > use_latest > start/end_block > default)
        if blocks:
            # Use specified block range string directly
            cmd.extend(["-b", blocks])
        elif use_latest or blocks_from_latest is not None:
            # Get the latest block number
            latest_block = get_latest_block_number()
            
            if latest_block is None:
                return {"error": "Failed to get the latest block number from the RPC endpoint"}
            
            if blocks_from_latest is not None:
                # Use a range of blocks up to the latest
                start = latest_block - blocks_from_latest
                block_range = f"{start}:{latest_block+1}"  # +1 to make it inclusive
            else:
                # Just the latest block
                block_range = f"{latest_block}:{latest_block+1}"  # +1 to make it inclusive
            
            print(f"Using latest block range: {block_range}")
            cmd.extend(["-b", block_range])
        elif start_block is not None:
            # Convert integer block numbers to string range
            if end_block is not None:
                # Note: cryo uses [start:end) range (inclusive start, exclusive end)
                # Add 1 to end_block to include it in the range
                block_range = f"{start_block}:{end_block+1}"
            else:
                # If only start_block is provided, get 10 blocks starting from there
                block_range = f"{start_block}:{start_block+10}"
            
            print(f"Using block range: {block_range}")
            cmd.extend(["-b", block_range])
        else:
            # Default to a reasonable block range if none specified
            cmd.extend(["-b", "1000:1010"])
    
        # Handle dataset-specific address parameters
        # For all address-based filters, we use the contract parameter
        # but map it to the correct flag based on the dataset
        if contract:
            # Check if this is a dataset that requires a different parameter name
            if dataset == 'balances':
                # For balances dataset, contract parameter maps to --address
                cmd.extend(["--address", contract])
            else:
                # For other datasets like logs, transactions, etc. use --contract
                cmd.extend(["--contract", contract])
    
        if output_format == "json":
            cmd.append("--json")
        elif output_format == "csv":
            cmd.append("--csv")
    
        if include_columns:
            cmd.append("--include-columns")
            cmd.extend(include_columns)
    
        if exclude_columns:
            cmd.append("--exclude-columns")
            cmd.extend(exclude_columns)
    
        # Get the base data directory
        data_dir = Path(os.environ.get("CRYO_DATA_DIR", DEFAULT_DATA_DIR))
        
        # Choose output directory based on whether we're querying latest blocks
        if use_latest or blocks_from_latest is not None:
            output_dir = data_dir / "latest"
            output_dir.mkdir(parents=True, exist_ok=True)
            
            # Clean up the latest directory before new query
            print("Cleaning latest directory for current block query")
            existing_files = list(output_dir.glob(f"*{dataset}*.*"))
            for file in existing_files:
                try:
                    file.unlink()
                    print(f"Removed existing file: {file}")
                except Exception as e:
                    print(f"Warning: Could not remove file {file}: {e}")
        else:
            # For historical queries, use the main data directory
            output_dir = data_dir
            output_dir.mkdir(parents=True, exist_ok=True)
    
        cmd.extend(["-o", str(output_dir)])
    
        # Print the command for debugging
        print(f"Running query command: {' '.join(cmd)}")
        
        # Execute the command
        result = subprocess.run(cmd, capture_output=True, text=True)
    
        if result.returncode != 0:
            return {
                "error": result.stderr,
                "stdout": result.stdout,
                "command": " ".join(cmd)
            }
    
        # Try to find the report file which contains info about generated files
        report_dir = output_dir / ".cryo" / "reports"
        if report_dir.exists():
            # Get the most recent report file (should be the one we just created)
            report_files = sorted(report_dir.glob("*.json"), key=lambda x: x.stat().st_mtime, reverse=True)
            if report_files:
                with open(report_files[0], 'r') as f:
                    report_data = json.load(f)
                    # Get the list of completed files from the report
                    if "results" in report_data and "completed_paths" in report_data["results"]:
                        completed_files = report_data["results"]["completed_paths"]
                        print(f"Found {len(completed_files)} files in Cryo report: {completed_files}")
                        
                        # Return the list of files and their count
                        return {
                            "files": completed_files,
                            "count": len(completed_files),
                            "format": output_format
                        }
        
        # Fallback to glob search if report file not found or doesn't contain the expected data
        output_files = list(output_dir.glob(f"*{dataset}*.{output_format}"))
        print(f"Output files found via glob: {output_files}")
    
        if not output_files:
            return {"error": "No output files generated", "command": " ".join(cmd)}
    
        # Convert Path objects to strings for JSON serialization
        file_paths = [str(file_path) for file_path in output_files]
        
        return {
            "files": file_paths,
            "count": len(file_paths),
            "format": output_format
        }
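The report-lookup step above can be exercised in isolation. A self-contained sketch using a temporary directory and the report layout the implementation assumes (.cryo/reports/*.json with a results.completed_paths list):

```python
import json
import tempfile
from pathlib import Path

def files_from_latest_report(output_dir: Path):
    # Mirrors the fallback logic: read the most recently modified
    # .cryo/reports/*.json and pull results.completed_paths, or
    # return None when no usable report exists.
    report_dir = output_dir / ".cryo" / "reports"
    if not report_dir.exists():
        return None
    reports = sorted(report_dir.glob("*.json"),
                     key=lambda p: p.stat().st_mtime, reverse=True)
    if not reports:
        return None
    data = json.loads(reports[0].read_text())
    return data.get("results", {}).get("completed_paths")

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    report_dir = out / ".cryo" / "reports"
    report_dir.mkdir(parents=True)
    (report_dir / "run1.json").write_text(
        json.dumps({"results": {"completed_paths": ["/data/txs.parquet"]}}))
    print(files_from_latest_report(out))
```

Returning None (rather than raising) lets the caller fall through to the glob-based search, exactly as the handler does.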
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's behavior: it downloads data to files, returns a dictionary of file paths, and integrates with query_sql. It mentions dataset-specific requirements and workflow dependencies, though it doesn't cover potential errors, rate limits, or file storage details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (workflow note, dataset-specific parameters, Args, Returns) and uses bullet points for readability. It's appropriately sized for a complex tool but could be slightly more concise by integrating the example workflow more tightly with the parameter explanations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 10 parameters, 0% schema coverage, no annotations, and no output schema, the description provides comprehensive context. It explains the tool's role in a larger workflow, details all parameters, and describes the return value. The main gap is lack of error handling or performance considerations, but it's largely complete given the constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Given 0% schema description coverage, the description compensates fully by explaining all 10 parameters in detail. It clarifies dataset-specific usage (e.g., contract parameter for address filtering), provides examples for blocks and output_format, and explains parameter relationships (e.g., blocks vs. start_block/end_block). This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Download blockchain data and return the file paths where the data is stored.' It specifies the verb ('download'), resource ('blockchain data'), and output ('file paths'), distinguishing it from siblings like query_sql or list_datasets that don't download data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool vs. alternatives: 'When running SQL queries, use this function first to download data, then use the returned file paths with query_sql() to execute SQL on those files.' It also advises to use lookup_dataset() first to check dataset parameters, offering clear workflow instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
