lookup_dataset

Retrieve detailed dataset information including required parameters, schema, and example queries to ensure correct usage before querying. Essential for datasets like 'balances' that need specific inputs such as 'address'.

Instructions

Look up a specific dataset and return detailed information about it. IMPORTANT: Always use this
function before querying a new dataset to understand its required parameters and schema.

The returned information includes:
1. Required parameters for the dataset (IMPORTANT for datasets like 'balances' that need an address)
2. Schema details showing available columns and data types
3. Example queries for the dataset

When the dataset requires specific parameters like 'address' (for 'balances'),
ALWAYS use the 'contract' parameter in query_dataset() to pass these values.

Example:
For 'balances' dataset, lookup_dataset('balances') will show it requires an 'address' parameter.
You should then query it using:
query_dataset('balances', blocks='1000:1010', contract='0x1234...')

Args:
    name: The name of the dataset to look up
    sample_start_block: Optional start block for sample data (integer)
    sample_end_block: Optional end block for sample data (integer)
    use_latest_sample: If True, use the latest block for sample data
    sample_blocks_from_latest: Number of blocks before the latest to include in sample
    
Returns:
    Detailed information about the dataset including schema and available fields
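As the implementation notes, cryo interprets a `start:end` block range as inclusive of `start` and exclusive of `end`. A small, hypothetical helper (not part of the server) illustrating that convention:

```python
def parse_block_range(blocks: str) -> range:
    """Parse a cryo-style 'start:end' block range ([start, end) semantics)."""
    start_str, end_str = blocks.split(":")
    return range(int(start_str), int(end_str))

# '1000:1010' covers blocks 1000 through 1009 inclusive
covered = parse_block_range("1000:1010")
print(len(covered), covered[0], covered[-1])  # 10 1000 1009
```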

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | The name of the dataset to look up | |
| sample_blocks_from_latest | No | Number of blocks before the latest to include in the sample | |
| sample_end_block | No | Optional end block for sample data (integer) | |
| sample_start_block | No | Optional start block for sample data (integer) | |
| use_latest_sample | No | If True, use the latest block for sample data | False |
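Per the schema above, only `name` is required; the remaining fields control optional sample-data generation. A call might therefore pass arguments like the following (the dataset names and values are illustrative):

```python
# Minimal call: only the required parameter
minimal_args = {"name": "balances"}

# Sampling from recent blocks: optional fields supplied alongside name
full_args = {
    "name": "logs",
    "use_latest_sample": True,
    "sample_blocks_from_latest": 10,
}

required = {"name"}
assert required.issubset(minimal_args) and required.issubset(full_args)
print(sorted(full_args))
```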

Implementation Reference

  • The primary handler function for the 'lookup_dataset' tool. It is decorated with @mcp.tool() for registration in the MCP server. The function fetches dataset information using get_dataset_info, retrieves schema via cryo's --dry-run, and optionally generates sample data by querying a small block range.
    @mcp.tool()
    def lookup_dataset(
        name: str,
        sample_start_block: Optional[int] = None,
        sample_end_block: Optional[int] = None,
        use_latest_sample: bool = False,
        sample_blocks_from_latest: Optional[int] = None
    ) -> Dict[str, Any]:
        """
        Look up a specific dataset and return detailed information about it. IMPORTANT: Always use this
        function before querying a new dataset to understand its required parameters and schema.
        
        The returned information includes:
        1. Required parameters for the dataset (IMPORTANT for datasets like 'balances' that need an address)
        2. Schema details showing available columns and data types
        3. Example queries for the dataset
        
        When the dataset requires specific parameters like 'address' (for 'balances'),
        ALWAYS use the 'contract' parameter in query_dataset() to pass these values.
        
        Example:
        For 'balances' dataset, lookup_dataset('balances') will show it requires an 'address' parameter.
        You should then query it using:
        query_dataset('balances', blocks='1000:1010', contract='0x1234...')
        
        Args:
            name: The name of the dataset to look up
            sample_start_block: Optional start block for sample data (integer)
            sample_end_block: Optional end block for sample data (integer)
            use_latest_sample: If True, use the latest block for sample data
            sample_blocks_from_latest: Number of blocks before the latest to include in sample
            
        Returns:
            Detailed information about the dataset including schema and available fields
        """
        # Get basic dataset info
        info = get_dataset_info(name)
        
        # Ensure we have the RPC URL
        rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)
        
        # Get schema information by running the dataset with --dry-run
        schema_result = subprocess.run(
            ["cryo", name, "--dry-run", "-r", rpc_url],
            capture_output=True,
            text=True
        )
        
        if schema_result.returncode == 0:
            info["schema"] = schema_result.stdout
        else:
            info["schema_error"] = schema_result.stderr
        
        # Try to get a sample of the dataset (first 5 records)
        try:
            data_dir = Path(os.environ.get("CRYO_DATA_DIR", DEFAULT_DATA_DIR))
            
            # Determine block range for sample (priority: latest > specified blocks > default)
            if use_latest_sample or sample_blocks_from_latest is not None:
                # Get the latest block number
                latest_block = get_latest_block_number()
                
                if latest_block is None:
                    info["sample_error"] = "Failed to get the latest block number from the RPC endpoint"
                    return info
                
                if sample_blocks_from_latest is not None:
                    # Use a range of blocks from latest-n to latest
                    block_range = f"{latest_block - sample_blocks_from_latest}:{latest_block+1}"
                else:
                    # Just the latest 5 blocks
                    block_range = f"{latest_block-4}:{latest_block+1}"
                
                info["sample_block_range"] = block_range
                
                # Use the latest directory for latest block samples
                sample_dir = data_dir / "latest"
                sample_dir.mkdir(parents=True, exist_ok=True)
                
                # Clean up the latest directory before new query
                print("Cleaning latest directory for current sample")
                existing_files = list(sample_dir.glob(f"*{name}*.*"))
                for file in existing_files:
                    try:
                        file.unlink()
                        print(f"Removed existing sample file: {file}")
                    except Exception as e:
                        print(f"Warning: Could not remove sample file {file}: {e}")
            else:
                # For historical blocks, get the start block and end block
                if sample_start_block is not None:
                    if sample_end_block is not None:
                        # Note: cryo uses [start:end) range (inclusive start, exclusive end)
                        # Add 1 to end_block to include it in the range
                        block_range = f"{sample_start_block}:{sample_end_block+1}"
                    else:
                        # Use start block and get 5 blocks
                        block_range = f"{sample_start_block}:{sample_start_block+5}"
                else:
                    # Default to a known good block range
                    block_range = "1000:1005"
                
                # For historical samples, use the main data directory
                sample_dir = data_dir
                sample_dir.mkdir(parents=True, exist_ok=True)
                    
            # Use the block range for the sample
            sample_cmd = [
                "cryo", name, 
                "-b", block_range,
                "-r", rpc_url,
                "--json", 
                "-o", str(sample_dir)
            ]
            
            print(f"Running sample command: {' '.join(sample_cmd)}")
            sample_result = subprocess.run(
                sample_cmd,
                capture_output=True,
                text=True,
                timeout=30  # Add timeout to prevent hanging
            )
            
            if sample_result.returncode == 0:
                # Try to find the report file which contains info about generated files
                report_dir = sample_dir / ".cryo" / "reports"
                if report_dir.exists():
                    # Get the most recent report file
                    report_files = sorted(report_dir.glob("*.json"), key=lambda x: x.stat().st_mtime, reverse=True)
                    if report_files:
                        with open(report_files[0], 'r') as f:
                            report_data = json.load(f)
                            # Get the list of completed files from the report
                            if "results" in report_data and "completed_paths" in report_data["results"]:
                                completed_files = report_data["results"]["completed_paths"]
                                print(f"Found {len(completed_files)} files in Cryo report: {completed_files}")
                                info["sample_files"] = completed_files
                                return info
                
                # Fallback to glob search if report file not found
                output_files = list(sample_dir.glob(f"*{name}*.json"))
                print(f"Output files found via glob: {output_files}")
                
                if output_files:
                    # Convert Path objects to strings for JSON serialization
                    file_paths = [str(file_path) for file_path in output_files]
                    info["sample_files"] = file_paths
                else:
                    info["sample_error"] = "No output files generated"
            else:
                info["sample_error"] = sample_result.stderr
                info["sample_stdout"] = sample_result.stdout  # Include stdout for debugging
        except Exception as e:  # also covers subprocess.TimeoutExpired, a subclass of Exception
            info["sample_error"] = str(e)
        
        return info
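The block-range selection above (latest-block options take priority over explicit blocks, which take priority over the default) can be sketched as a standalone function. This is an illustrative refactor under the same rules, not the server's actual code:

```python
from typing import Optional

def choose_sample_range(
    latest_block: Optional[int] = None,
    use_latest_sample: bool = False,
    sample_blocks_from_latest: Optional[int] = None,
    sample_start_block: Optional[int] = None,
    sample_end_block: Optional[int] = None,
) -> str:
    """Mirror lookup_dataset's priority: latest > explicit blocks > default."""
    if use_latest_sample or sample_blocks_from_latest is not None:
        if latest_block is None:
            raise ValueError("latest block number unavailable")
        if sample_blocks_from_latest is not None:
            # Range from latest-n up to and including latest
            return f"{latest_block - sample_blocks_from_latest}:{latest_block + 1}"
        # Just the latest 5 blocks
        return f"{latest_block - 4}:{latest_block + 1}"
    if sample_start_block is not None:
        if sample_end_block is not None:
            # cryo ranges are [start:end), so add 1 to include end_block
            return f"{sample_start_block}:{sample_end_block + 1}"
        # Use start block and take 5 blocks
        return f"{sample_start_block}:{sample_start_block + 5}"
    return "1000:1005"  # known good default

print(choose_sample_range(latest_block=20000, sample_blocks_from_latest=10))  # 19990:20001
print(choose_sample_range(sample_start_block=1000, sample_end_block=1009))    # 1000:1010
```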
  • Helper resource function called by lookup_dataset to get basic dataset information, description, and example queries via cryo's help command.
    @mcp.resource("dataset://{name}")
    def get_dataset_info(name: str) -> Dict[str, Any]:
        """Get information about a specific dataset"""
        # Ensure we have the RPC URL
        rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)
        
        result = subprocess.run(
            ["cryo", "help", name, "-r", rpc_url],
            capture_output=True,
            text=True
        )
    
        # Get the latest block number for examples
        latest_block = get_latest_block_number()
        latest_example = ""
        
        if latest_block:
            latest_example = f"query_dataset('{name}', blocks_from_latest=10)  # Gets latest-10 to latest blocks"
        
        # Add a special example for datasets requiring an address parameter
        address_example = ""
        if "required parameters: address" in result.stdout.lower():
            address_example = f"query_dataset('{name}', blocks='1000:1010', contract='0x123...')  # Use contract parameter for address"
        
        example_queries = [
            f"query_dataset('{name}', blocks='1000:1010')",
            f"query_dataset('{name}', start_block=1000, end_block=1009)",
            f"query_dataset('{name}', use_latest=True)  # Gets just the latest block",
            latest_example,
        ]
        if address_example:
            example_queries.append(address_example)
        
        return {
            "name": name,
            "description": result.stdout,
            "example_queries": example_queries,
            "notes": [
                "Block ranges are inclusive for start_block and end_block when using integer parameters.",
                "Use 'use_latest=True' to query only the latest block.",
                "Use 'blocks_from_latest=N' to query the latest N blocks.",
                "IMPORTANT: For datasets requiring an 'address' parameter (like 'balances'), use the 'contract' parameter.",
                "Always check the required parameters in the dataset description and use lookup_dataset() first."
            ]
        }
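`get_latest_block_number` is referenced by both functions but not shown in this listing. A minimal sketch of what such a helper might look like, using a plain `eth_blockNumber` JSON-RPC call over the standard library (an assumption for illustration, not the server's actual implementation):

```python
import json
import urllib.request
from typing import Optional

def parse_block_number(response: dict) -> int:
    """Extract the block height from an eth_blockNumber JSON-RPC response."""
    return int(response["result"], 16)  # result is a hex string like '0x10d4f'

def get_latest_block_number(rpc_url: str) -> Optional[int]:
    """Return the latest block number, or None if the RPC call fails."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_blockNumber",
        "params": [],
        "id": 1,
    }).encode()
    req = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return parse_block_number(json.load(resp))
    except Exception:
        return None
```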
  • The @mcp.tool() decorator registers the lookup_dataset function as an MCP tool.
    @mcp.tool()
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by explaining the tool's behavior: it's a read-only lookup (implied by 'look up' and 'return information'), it provides schema details and required parameters, and it includes sample data generation capabilities. However, it doesn't mention rate limits, authentication needs, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded with the core purpose and important usage guideline. Every sentence adds value, though the example section is somewhat lengthy. The structure flows logically from purpose to usage to parameters to returns.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters, 0% schema coverage, no annotations, and no output schema, the description does an excellent job of explaining what the tool does, when to use it, and what parameters mean. The main gap is lack of information about return format details, though it describes what information will be included.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, but the description fully compensates by explaining all 5 parameters in detail. It clarifies that 'name' is the dataset identifier, and the other 4 parameters control sample data generation (with specific examples like 'sample_blocks_from_latest'). This adds substantial meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('look up', 'return detailed information') and resource ('dataset'), distinguishing it from siblings like list_datasets (which lists datasets) or query_dataset (which queries data). It explicitly explains this is for understanding dataset parameters and schema before querying.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Always use this function before querying a new dataset') and when to use alternatives (query_dataset for actual queries). It includes a concrete example showing the workflow between lookup_dataset and query_dataset, making the usage context clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
