Glama

lookup_dataset

Retrieve detailed dataset information including required parameters, schema, and example queries to ensure correct usage before querying. Essential for datasets like 'balances' that need specific inputs such as 'address'.

Instructions

Look up a specific dataset and return detailed information about it. IMPORTANT: Always use this
function before querying a new dataset to understand its required parameters and schema.

The returned information includes:
1. Required parameters for the dataset (IMPORTANT for datasets like 'balances' that need an address)
2. Schema details showing available columns and data types
3. Example queries for the dataset

When the dataset requires specific parameters like 'address' (for 'balances'),
ALWAYS use the 'contract' parameter in query_dataset() to pass these values.

Example:
For 'balances' dataset, lookup_dataset('balances') will show it requires an 'address' parameter.
You should then query it using:
query_dataset('balances', blocks='1000:1010', contract='0x1234...')
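The lookup-then-query flow described above can be sketched as a small helper that reads the dictionary returned by lookup_dataset and builds keyword arguments for query_dataset. This is a minimal sketch and not part of the server: build_query_kwargs is a hypothetical name, and it assumes the "description" field contains cryo's help text with the phrase "required parameters: address", which is the same check get_dataset_info performs in the Implementation Reference.

```python
from typing import Any, Dict, Optional


def build_query_kwargs(
    info: Dict[str, Any],
    blocks: str,
    address: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for query_dataset() from a lookup_dataset() result.

    Hypothetical helper: assumes info["description"] holds cryo's help text.
    """
    kwargs: Dict[str, Any] = {"blocks": blocks}
    description = str(info.get("description", "")).lower()
    if "required parameters: address" in description:
        if address is None:
            raise ValueError(f"dataset {info.get('name')!r} requires an address")
        # The address value is passed through the 'contract' parameter.
        kwargs["contract"] = address
    return kwargs


# Hand-written lookup result for illustration (not real cryo output)
example_info = {"name": "balances", "description": "Required parameters: address"}
print(build_query_kwargs(example_info, "1000:1010", address="0x1234..."))
```

Raising an error when the address is missing keeps the mistake visible before any query is sent.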

Args:
    name: The name of the dataset to look up
    sample_start_block: Optional start block for sample data (integer)
    sample_end_block: Optional end block for sample data (integer)
    use_latest_sample: If True, use the latest block for sample data
    sample_blocks_from_latest: Number of blocks before the latest to include in sample
    
Returns:
    Detailed information about the dataset including schema and available fields
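
How the sample arguments translate into a block range can be sketched as a pure function. This mirrors the branching in the handler shown in the Implementation Reference, with one assumption: the latest block number is passed in as latest_block instead of being fetched from the RPC endpoint. sample_block_range is a hypothetical name used only for illustration.

```python
from typing import Optional


def sample_block_range(
    latest_block: Optional[int] = None,
    sample_start_block: Optional[int] = None,
    sample_end_block: Optional[int] = None,
    use_latest_sample: bool = False,
    sample_blocks_from_latest: Optional[int] = None,
) -> str:
    """Return a cryo-style 'start:end' range (inclusive start, exclusive end)."""
    # Latest-relative options take priority over explicit start/end blocks.
    if use_latest_sample or sample_blocks_from_latest is not None:
        if latest_block is None:
            raise ValueError("latest block number is required for latest samples")
        if sample_blocks_from_latest is not None:
            return f"{latest_block - sample_blocks_from_latest}:{latest_block + 1}"
        # Default latest sample: the 5 most recent blocks.
        return f"{latest_block - 4}:{latest_block + 1}"
    if sample_start_block is not None:
        if sample_end_block is not None:
            # Add 1 so the requested end block is included.
            return f"{sample_start_block}:{sample_end_block + 1}"
        return f"{sample_start_block}:{sample_start_block + 5}"
    # Fallback range used when no sample arguments are given.
    return "1000:1005"
```

Because the end of the range is exclusive, sample_blocks_from_latest=N covers N+1 blocks (latest-N through latest).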

Input Schema

Name                       Required  Description  Default
name                       Yes
sample_blocks_from_latest  No
sample_end_block           No
sample_start_block         No
use_latest_sample          No
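
A client could check an argument dictionary against the names listed above before calling the tool. The following is a sketch under stated assumptions: check_arguments and the two constant sets are hypothetical names, and only the required/optional split is taken from the input schema.

```python
from typing import Any, Dict, List

# Argument names taken from the lookup_dataset input schema.
REQUIRED_ARGS = {"name"}
OPTIONAL_ARGS = {
    "sample_start_block",
    "sample_end_block",
    "use_latest_sample",
    "sample_blocks_from_latest",
}


def check_arguments(arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a lookup_dataset argument dict."""
    problems: List[str] = []
    given = set(arguments)
    for missing in sorted(REQUIRED_ARGS - given):
        problems.append(f"missing required argument: {missing}")
    for unknown in sorted(given - REQUIRED_ARGS - OPTIONAL_ARGS):
        problems.append(f"unknown argument: {unknown}")
    return problems
```

An empty list means the argument names are acceptable; value types are not checked in this sketch.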

Implementation Reference

  • The primary handler function for the 'lookup_dataset' tool. It is decorated with @mcp.tool() for registration in the MCP server. The function fetches dataset information using get_dataset_info, retrieves schema via cryo's --dry-run, and optionally generates sample data by querying a small block range.
    @mcp.tool()
    def lookup_dataset(
        name: str,
        sample_start_block: Optional[int] = None,
        sample_end_block: Optional[int] = None,
        use_latest_sample: bool = False,
        sample_blocks_from_latest: Optional[int] = None
    ) -> Dict[str, Any]:
        """
        Look up a specific dataset and return detailed information about it. IMPORTANT: Always use this
        function before querying a new dataset to understand its required parameters and schema.
        
        The returned information includes:
        1. Required parameters for the dataset (IMPORTANT for datasets like 'balances' that need an address)
        2. Schema details showing available columns and data types
        3. Example queries for the dataset
        
        When the dataset requires specific parameters like 'address' (for 'balances'),
        ALWAYS use the 'contract' parameter in query_dataset() to pass these values.
        
        Example:
        For 'balances' dataset, lookup_dataset('balances') will show it requires an 'address' parameter.
        You should then query it using:
        query_dataset('balances', blocks='1000:1010', contract='0x1234...')
        
        Args:
            name: The name of the dataset to look up
            sample_start_block: Optional start block for sample data (integer)
            sample_end_block: Optional end block for sample data (integer)
            use_latest_sample: If True, use the latest block for sample data
            sample_blocks_from_latest: Number of blocks before the latest to include in sample
            
        Returns:
            Detailed information about the dataset including schema and available fields
        """
        # Get basic dataset info
        info = get_dataset_info(name)
        
        # Ensure we have the RPC URL
        rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)
        
        # Get schema information by running the dataset with --dry-run
        schema_result = subprocess.run(
            ["cryo", name, "--dry-run", "-r", rpc_url],
            capture_output=True,
            text=True
        )
        
        if schema_result.returncode == 0:
            info["schema"] = schema_result.stdout
        else:
            info["schema_error"] = schema_result.stderr
        
        # Try to get a sample of the dataset (a small block range, 5 blocks by default)
        try:
            data_dir = Path(os.environ.get("CRYO_DATA_DIR", DEFAULT_DATA_DIR))
            
            # Determine block range for sample (priority: latest > specified blocks > default)
            if use_latest_sample or sample_blocks_from_latest is not None:
                # Get the latest block number
                latest_block = get_latest_block_number()
                
                if latest_block is None:
                    info["sample_error"] = "Failed to get the latest block number from the RPC endpoint"
                    return info
                
                if sample_blocks_from_latest is not None:
                    # Use a range of blocks from latest-n to latest
                    block_range = f"{latest_block - sample_blocks_from_latest}:{latest_block+1}"
                else:
                    # Just the latest 5 blocks
                    block_range = f"{latest_block-4}:{latest_block+1}"
                
                info["sample_block_range"] = block_range
                
                # Use the latest directory for latest block samples
                sample_dir = data_dir / "latest"
                sample_dir.mkdir(parents=True, exist_ok=True)
                
                # Clean up the latest directory before new query
                print("Cleaning latest directory for current sample")
                existing_files = list(sample_dir.glob(f"*{name}*.*"))
                for file in existing_files:
                    try:
                        file.unlink()
                        print(f"Removed existing sample file: {file}")
                    except Exception as e:
                        print(f"Warning: Could not remove sample file {file}: {e}")
            else:
                # For historical blocks, get the start block and end block
                if sample_start_block is not None:
                    if sample_end_block is not None:
                        # Note: cryo uses [start:end) range (inclusive start, exclusive end)
                        # Add 1 to end_block to include it in the range
                        block_range = f"{sample_start_block}:{sample_end_block+1}"
                    else:
                        # Use start block and get 5 blocks
                        block_range = f"{sample_start_block}:{sample_start_block+5}"
                else:
                    # Default to a known good block range
                    block_range = "1000:1005"
                
                # For historical samples, use the main data directory
                sample_dir = data_dir
                sample_dir.mkdir(parents=True, exist_ok=True)
                    
            # Use the block range for the sample
            sample_cmd = [
                "cryo", name, 
                "-b", block_range,
                "-r", rpc_url,
                "--json", 
                "-o", str(sample_dir)
            ]
            
            print(f"Running sample command: {' '.join(sample_cmd)}")
            sample_result = subprocess.run(
                sample_cmd,
                capture_output=True,
                text=True,
                timeout=30  # Add timeout to prevent hanging
            )
            
            if sample_result.returncode == 0:
                # Try to find the report file which contains info about generated files
                report_dir = sample_dir / ".cryo" / "reports"
                if report_dir.exists():
                    # Get the most recent report file
                    report_files = sorted(report_dir.glob("*.json"), key=lambda x: x.stat().st_mtime, reverse=True)
                    if report_files:
                        with open(report_files[0], 'r') as f:
                            report_data = json.load(f)
                            # Get the list of completed files from the report
                            if "results" in report_data and "completed_paths" in report_data["results"]:
                                completed_files = report_data["results"]["completed_paths"]
                                print(f"Found {len(completed_files)} files in Cryo report: {completed_files}")
                                info["sample_files"] = completed_files
                                return info
                
                # Fallback to glob search if report file not found
                output_files = list(sample_dir.glob(f"*{name}*.json"))
                print(f"Output files found via glob: {output_files}")
                
                if output_files:
                    # Convert Path objects to strings for JSON serialization
                    file_paths = [str(file_path) for file_path in output_files]
                    info["sample_files"] = file_paths
                else:
                    info["sample_error"] = "No output files generated"
            else:
                info["sample_error"] = sample_result.stderr
                info["sample_stdout"] = sample_result.stdout  # Include stdout for debugging
        except Exception as e:  # Also covers subprocess.TimeoutExpired
            info["sample_error"] = str(e)
        
        return info
  • Helper resource function called by lookup_dataset to get basic dataset information, description, and example queries via cryo's help command.
    @mcp.resource("dataset://{name}")
    def get_dataset_info(name: str) -> Dict[str, Any]:
        """Get information about a specific dataset"""
        # Ensure we have the RPC URL
        rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)
        
        result = subprocess.run(
            ["cryo", "help", name, "-r", rpc_url],
            capture_output=True,
            text=True
        )
    
        # Get the latest block number for examples
        latest_block = get_latest_block_number()

        example_queries = [
            f"query_dataset('{name}', blocks='1000:1010')",
            f"query_dataset('{name}', start_block=1000, end_block=1009)",
            f"query_dataset('{name}', use_latest=True)  # Gets just the latest block",
        ]

        # Only suggest a latest-relative query when the RPC endpoint reported a block,
        # so the list never contains an empty string
        if latest_block:
            example_queries.append(
                f"query_dataset('{name}', blocks_from_latest=10)  # Gets latest-10 to latest blocks"
            )

        # Add a special example for datasets requiring an address parameter
        if "required parameters: address" in result.stdout.lower():
            example_queries.append(
                f"query_dataset('{name}', blocks='1000:1010', contract='0x123...')  # Use contract parameter for address"
            )

        return {
            "name": name,
            "description": result.stdout,
            "example_queries": example_queries,
            "notes": [
                "Block ranges are inclusive for start_block and end_block when using integer parameters.",
                "Use 'use_latest=True' to query only the latest block.",
                "Use 'blocks_from_latest=N' to query the latest N blocks.",
                "IMPORTANT: For datasets requiring an 'address' parameter (like 'balances'), use the 'contract' parameter.",
                "Always check the required parameters in the dataset description and use lookup_dataset() first."
            ]
        }
  • The @mcp.tool() decorator registers the lookup_dataset function as an MCP tool.
    @mcp.tool()
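
The way the handler locates sample output (newest cryo report first, glob search as a fallback) can be isolated into a helper for testing. This is a minimal sketch, not the server's code: find_sample_files is a hypothetical name, and the report layout (a "results" object with a "completed_paths" list under .cryo/reports) is assumed from the handler above, not from cryo's documentation.

```python
import json
from pathlib import Path
from typing import List


def find_sample_files(sample_dir: Path, name: str) -> List[str]:
    """Return sample file paths from the newest report, else from a glob search."""
    report_dir = sample_dir / ".cryo" / "reports"
    if report_dir.exists():
        # Newest report first, ordered by modification time.
        reports = sorted(
            report_dir.glob("*.json"),
            key=lambda p: p.stat().st_mtime,
            reverse=True,
        )
        if reports:
            with open(reports[0], "r") as f:
                report = json.load(f)
            completed = report.get("results", {}).get("completed_paths")
            if completed is not None:
                return [str(p) for p in completed]
    # Fallback: any JSON file in sample_dir whose name contains the dataset name.
    return [str(p) for p in sample_dir.glob(f"*{name}*.json")]
```

Returning strings instead of Path objects keeps the result JSON-serializable, as the handler does.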

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/z80dev/cryo-mcp'
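
The same request can be made from Python with the standard library. This is a sketch under one assumption: the endpoint is taken to return a JSON body, which the page does not state. build_request and fetch_server_info are hypothetical names.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://glama.ai/api/mcp/v1/servers/z80dev/cryo-mcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the same GET request as the curl command."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> Any:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_server_info() performs the network request; build_request() alone does not.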

If you have feedback or need assistance with the MCP directory API, please join our Discord server.