lookup_dataset
Retrieve detailed dataset information including required parameters, schema, and example queries to ensure correct usage before querying. Essential for datasets like 'balances' that need specific inputs such as 'address'.
Instructions
Look up a specific dataset and return detailed information about it. IMPORTANT: Always use this
function before querying a new dataset to understand its required parameters and schema.
The returned information includes:
1. Required parameters for the dataset (IMPORTANT for datasets like 'balances' that need an address)
2. Schema details showing available columns and data types
3. Example queries for the dataset
When the dataset requires specific parameters like 'address' (for 'balances'),
ALWAYS use the 'contract' parameter in query_dataset() to pass these values.
Example:
For the 'balances' dataset, lookup_dataset('balances') will show that it requires an 'address' parameter.
You should then query it using:
query_dataset('balances', blocks='1000:1010', contract='0x1234...')
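The lookup-then-query workflow above can be sketched as a small client-side helper. This is a hypothetical illustration, not part of cryo_mcp: `plan_query` and `requires_address` are invented names. The sketch assumes the `description` field of the lookup result is the cryo help text and applies the same "required parameters: address" check that the server's get_dataset_info helper uses. It then routes the address into the `contract` keyword, as the instructions require.

```python
from typing import Any, Dict, Optional, Tuple


def requires_address(description: str) -> bool:
    """Same test the server's get_dataset_info() applies to the cryo help text."""
    return "required parameters: address" in description.lower()


def plan_query(
    info: Dict[str, Any], blocks: str, address: Optional[str] = None
) -> Tuple[str, Dict[str, Any]]:
    """Turn a lookup_dataset() result into (dataset_name, keyword_args) for query_dataset().

    Hypothetical client-side helper; it does not call any tool itself.
    """
    kwargs: Dict[str, Any] = {"blocks": blocks}
    if requires_address(info.get("description", "")):
        if address is None:
            raise ValueError(f"dataset {info['name']!r} requires an address")
        # The address travels in the 'contract' parameter, as the instructions require.
        kwargs["contract"] = address
    return info["name"], kwargs
```

With `name, kwargs = plan_query(lookup_dataset('balances'), '1000:1010', '0x1234...')`, the call `query_dataset(name, **kwargs)` is equivalent to the documented `query_dataset('balances', blocks='1000:1010', contract='0x1234...')`.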
Args:
name: The name of the dataset to look up
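sample_start_block: Optional start block for sample data (integer). Without sample_end_block, the sample covers 5 blocks starting here; if no sample argument is given at all, blocks 1000 through 1004 are sampled
sample_end_block: Optional end block for sample data (integer, inclusive)
use_latest_sample: If True, sample the latest 5 blocks instead of a historical range
sample_blocks_from_latest: Number of blocks before the latest block to include in the sample (the latest block itself is also included); like use_latest_sample, it takes precedence over sample_start_block and sample_end_block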
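The priority rules for the sample range are spread through the handler. Pulled out as a pure function, they look roughly like the sketch below. It is distilled from the handler code in the Implementation Reference, with an invented name (`sample_block_range`) and one deliberate change: it raises `ValueError` when the latest block is unknown, whereas the handler writes `sample_error` into its result instead.

```python
from typing import Optional


def sample_block_range(
    latest_block: Optional[int],
    sample_start_block: Optional[int] = None,
    sample_end_block: Optional[int] = None,
    use_latest_sample: bool = False,
    sample_blocks_from_latest: Optional[int] = None,
) -> str:
    """Return the cryo '-b' range string that lookup_dataset would sample.

    cryo ranges are half-open ("start:end" includes start, excludes end),
    so 1 is added to every inclusive end block.
    """
    # Priority 1: anything anchored at the chain head.
    if use_latest_sample or sample_blocks_from_latest is not None:
        if latest_block is None:
            raise ValueError("latest block number unavailable from the RPC endpoint")
        if sample_blocks_from_latest is not None:
            # n blocks before the head, plus the head itself.
            return f"{latest_block - sample_blocks_from_latest}:{latest_block + 1}"
        # Just the latest 5 blocks.
        return f"{latest_block - 4}:{latest_block + 1}"
    # Priority 2: an explicit historical range.
    if sample_start_block is not None:
        if sample_end_block is not None:
            return f"{sample_start_block}:{sample_end_block + 1}"
        # Start block only: take 5 blocks.
        return f"{sample_start_block}:{sample_start_block + 5}"
    # Priority 3: the hard-coded default range.
    return "1000:1005"
```

For example, with a chain head of 100, `sample_blocks_from_latest=3` yields `"97:101"`, which cryo reads as blocks 97, 98, 99 and 100.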
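Returns:
A dict containing the dataset's name, description (the cryo help output), example_queries and notes; the schema from cryo --dry-run (or schema_error); and either sample_files or sample_error for the sample run. Latest-block samples also record sample_block_range, and when the cryo sample command exits with an error, sample_stdout is included as well.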
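Besides the schema, the handler reports where its sample output was written. It looks for cryo's own run report first and falls back to a filename glob only when no usable report exists. The following standalone sketch covers just that step. `find_sample_files` is a hypothetical name, and the report layout (`results.completed_paths` in the newest JSON file under `.cryo/reports`) is taken from the handler code rather than from cryo's documentation.

```python
import json
from pathlib import Path
from typing import List


def find_sample_files(sample_dir: Path, name: str) -> List[str]:
    """Locate cryo output files the way lookup_dataset does.

    1. Prefer the newest JSON report under <sample_dir>/.cryo/reports and
       return its results.completed_paths list.
    2. Otherwise fall back to globbing for *<name>*.json in sample_dir.
    """
    report_dir = sample_dir / ".cryo" / "reports"
    if report_dir.exists():
        # Newest report first, by modification time.
        reports = sorted(report_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)
        if reports:
            report = json.loads(reports[0].read_text())
            completed = report.get("results", {}).get("completed_paths")
            if completed is not None:
                return list(completed)
    # Fallback: glob the output directory. Paths become strings so the result is
    # JSON-serializable; sorting gives a stable order (the handler does not sort).
    return sorted(str(p) for p in sample_dir.glob(f"*{name}*.json"))
```

Preferring the report avoids guessing at cryo's output file names. The glob is only a safety net and can pick up stale files from earlier runs in the same directory.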
Input Schema
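| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Name of the dataset to look up (e.g. 'balances') | |
| sample_blocks_from_latest | No | Number of blocks before the latest block to include in the sample | None |
| sample_end_block | No | End block for sample data (inclusive) | None |
| sample_start_block | No | Start block for sample data | None |
| use_latest_sample | No | If True, sample the latest 5 blocks | False |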
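Only `name` is marked required because it is the only parameter without a default in the handler signature. The standard library's `inspect` module makes that mapping visible. The sketch below uses a hypothetical stub that copies the signature from the Implementation Reference. It is not how the MCP SDK actually builds its input schema, which the source does not show.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def lookup_dataset_stub(
    name: str,
    sample_start_block: Optional[int] = None,
    sample_end_block: Optional[int] = None,
    use_latest_sample: bool = False,
    sample_blocks_from_latest: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical stub with the same signature as lookup_dataset; the body is irrelevant."""
    return {}


def describe_params(func: Callable[..., Any]) -> Dict[str, Dict[str, Any]]:
    """Map each parameter to whether it is required and what its default is."""
    rows: Dict[str, Dict[str, Any]] = {}
    for param in inspect.signature(func).parameters.values():
        # A parameter is required exactly when it has no default value.
        required = param.default is inspect.Parameter.empty
        rows[param.name] = {"required": required, "default": None if required else param.default}
    return rows


for pname, row in describe_params(lookup_dataset_stub).items():
    print(f"{pname:28} required={row['required']!s:5} default={row['default']!r}")
```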
Implementation Reference
- cryo_mcp/server.py:322-477 (handler): The primary handler function for the 'lookup_dataset' tool. It is decorated with @mcp.tool() for registration in the MCP server. It fetches basic dataset information with get_dataset_info, retrieves the schema via cryo's --dry-run, and then tries to generate sample data by querying a small block range (blocks 1000 through 1004 when no sample arguments are given).
```python
@mcp.tool()
def lookup_dataset(
    name: str,
    sample_start_block: Optional[int] = None,
    sample_end_block: Optional[int] = None,
    use_latest_sample: bool = False,
    sample_blocks_from_latest: Optional[int] = None
) -> Dict[str, Any]:
    """
    Look up a specific dataset and return detailed information about it. IMPORTANT: Always use this
    function before querying a new dataset to understand its required parameters and schema.

    The returned information includes:
    1. Required parameters for the dataset (IMPORTANT for datasets like 'balances' that need an address)
    2. Schema details showing available columns and data types
    3. Example queries for the dataset

    When the dataset requires specific parameters like 'address' (for 'balances'),
    ALWAYS use the 'contract' parameter in query_dataset() to pass these values.

    Example:
    For 'balances' dataset, lookup_dataset('balances') will show it requires an 'address' parameter.
    You should then query it using:
    query_dataset('balances', blocks='1000:1010', contract='0x1234...')

    Args:
        name: The name of the dataset to look up
        sample_start_block: Optional start block for sample data (integer)
        sample_end_block: Optional end block for sample data (integer)
        use_latest_sample: If True, use the latest block for sample data
        sample_blocks_from_latest: Number of blocks before the latest to include in sample

    Returns:
        Detailed information about the dataset including schema and available fields
    """
    # Get basic dataset info
    info = get_dataset_info(name)

    # Ensure we have the RPC URL
    rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)

    # Get schema information by running the dataset with --dry-run
    schema_result = subprocess.run(
        ["cryo", name, "--dry-run", "-r", rpc_url],
        capture_output=True,
        text=True
    )

    if schema_result.returncode == 0:
        info["schema"] = schema_result.stdout
    else:
        info["schema_error"] = schema_result.stderr

    # Try to get a sample of the dataset (first 5 records)
    try:
        data_dir = Path(os.environ.get("CRYO_DATA_DIR", DEFAULT_DATA_DIR))

        # Determine block range for sample (priority: latest > specified blocks > default)
        if use_latest_sample or sample_blocks_from_latest is not None:
            # Get the latest block number
            latest_block = get_latest_block_number()

            if latest_block is None:
                info["sample_error"] = "Failed to get the latest block number from the RPC endpoint"
                return info

            if sample_blocks_from_latest is not None:
                # Use a range of blocks from latest-n to latest
                block_range = f"{latest_block - sample_blocks_from_latest}:{latest_block+1}"
            else:
                # Just the latest 5 blocks
                block_range = f"{latest_block-4}:{latest_block+1}"

            info["sample_block_range"] = block_range

            # Use the latest directory for latest block samples
            sample_dir = data_dir / "latest"
            sample_dir.mkdir(parents=True, exist_ok=True)

            # Clean up the latest directory before new query
            print("Cleaning latest directory for current sample")
            existing_files = list(sample_dir.glob(f"*{name}*.*"))
            for file in existing_files:
                try:
                    file.unlink()
                    print(f"Removed existing sample file: {file}")
                except Exception as e:
                    print(f"Warning: Could not remove sample file {file}: {e}")
        else:
            # For historical blocks, get the start block and end block
            if sample_start_block is not None:
                if sample_end_block is not None:
                    # Note: cryo uses [start:end) range (inclusive start, exclusive end)
                    # Add 1 to end_block to include it in the range
                    block_range = f"{sample_start_block}:{sample_end_block+1}"
                else:
                    # Use start block and get 5 blocks
                    block_range = f"{sample_start_block}:{sample_start_block+5}"
            else:
                # Default to a known good block range
                block_range = "1000:1005"

            # For historical samples, use the main data directory
            sample_dir = data_dir
            sample_dir.mkdir(parents=True, exist_ok=True)

        # Use the block range for the sample
        sample_cmd = [
            "cryo", name,
            "-b", block_range,
            "-r", rpc_url,
            "--json",
            "-o", str(sample_dir)
        ]

        print(f"Running sample command: {' '.join(sample_cmd)}")
        sample_result = subprocess.run(
            sample_cmd,
            capture_output=True,
            text=True,
            timeout=30  # Add timeout to prevent hanging
        )

        if sample_result.returncode == 0:
            # Try to find the report file which contains info about generated files
            report_dir = sample_dir / ".cryo" / "reports"
            if report_dir.exists():
                # Get the most recent report file
                report_files = sorted(report_dir.glob("*.json"), key=lambda x: x.stat().st_mtime, reverse=True)
                if report_files:
                    with open(report_files[0], 'r') as f:
                        report_data = json.load(f)
                        # Get the list of completed files from the report
                        if "results" in report_data and "completed_paths" in report_data["results"]:
                            completed_files = report_data["results"]["completed_paths"]
                            print(f"Found {len(completed_files)} files in Cryo report: {completed_files}")
                            info["sample_files"] = completed_files
                            return info

            # Fallback to glob search if report file not found
            output_files = list(sample_dir.glob(f"*{name}*.json"))
            print(f"Output files found via glob: {output_files}")

            if output_files:
                # Convert Path objects to strings for JSON serialization
                file_paths = [str(file_path) for file_path in output_files]
                info["sample_files"] = file_paths
            else:
                info["sample_error"] = "No output files generated"
        else:
            info["sample_error"] = sample_result.stderr
            info["sample_stdout"] = sample_result.stdout  # Include stdout for debugging
    except (subprocess.TimeoutExpired, Exception) as e:
        info["sample_error"] = str(e)

    return info
```
- cryo_mcp/server.py:274-320 (helper): Helper resource function called by lookup_dataset to get basic dataset information, description, and example queries via cryo's help command.
```python
@mcp.resource("dataset://{name}")
def get_dataset_info(name: str) -> Dict[str, Any]:
    """Get information about a specific dataset"""
    # Ensure we have the RPC URL
    rpc_url = os.environ.get("ETH_RPC_URL", DEFAULT_RPC_URL)

    result = subprocess.run(
        ["cryo", "help", name, "-r", rpc_url],
        capture_output=True,
        text=True
    )

    # Get the latest block number for examples
    latest_block = get_latest_block_number()
    latest_example = ""
    if latest_block:
        latest_example = f"query_dataset('{name}', blocks_from_latest=10) # Gets latest-10 to latest blocks"

    # Add special examples for datasets requiring address parameters
    address_example = ""
    if "address" in result.stdout.lower() and "required parameters: address" in result.stdout.lower():
        address_example = f"query_dataset('{name}', blocks='1000:1010', contract='0x123...') # Use contract parameter for address"

    return {
        "name": name,
        "description": result.stdout,
        "example_queries": [
            f"query_dataset('{name}', blocks='1000:1010')",
            f"query_dataset('{name}', start_block=1000, end_block=1009)",
            f"query_dataset('{name}', use_latest=True) # Gets just the latest block",
            latest_example,
            address_example
        ] if address_example else [
            f"query_dataset('{name}', blocks='1000:1010')",
            f"query_dataset('{name}', start_block=1000, end_block=1009)",
            f"query_dataset('{name}', use_latest=True) # Gets just the latest block",
            latest_example
        ],
        "notes": [
            "Block ranges are inclusive for start_block and end_block when using integer parameters.",
            "Use 'use_latest=True' to query only the latest block.",
            "Use 'blocks_from_latest=N' to query the latest N blocks.",
            "IMPORTANT: For datasets requiring an 'address' parameter (like 'balances'), use the 'contract' parameter.",
            "Always check the required parameters in the dataset description and use lookup_dataset() first."
        ]
    }
```
- cryo_mcp/server.py:322-322 (registration): The @mcp.tool() decorator registers the lookup_dataset function as an MCP tool.
```python
@mcp.tool()
```
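In the get_dataset_info helper, the two near-identical list literals differ only in the trailing address example. When get_latest_block_number() fails, the list still carries an empty string where the latest-block example would be. The following is a hypothetical refactor, not the project's code; `build_example_queries` is an invented name. It appends the optional entries only when they apply. It also tests `latest_block is not None` rather than truthiness, so a (theoretical) block 0 would not be dropped, and it omits the redundant bare `"address"` check.

```python
from typing import List, Optional


def build_example_queries(name: str, help_text: str, latest_block: Optional[int]) -> List[str]:
    """Assemble example query_dataset() calls, skipping examples that do not apply."""
    examples = [
        f"query_dataset('{name}', blocks='1000:1010')",
        f"query_dataset('{name}', start_block=1000, end_block=1009)",
        f"query_dataset('{name}', use_latest=True) # Gets just the latest block",
    ]
    # Only advertise blocks_from_latest when the RPC endpoint returned a block number.
    if latest_block is not None:
        examples.append(f"query_dataset('{name}', blocks_from_latest=10) # Gets latest-10 to latest blocks")
    # Datasets such as 'balances' take their address through the contract parameter.
    if "required parameters: address" in help_text.lower():
        examples.append(
            f"query_dataset('{name}', blocks='1000:1010', contract='0x123...') # Use contract parameter for address"
        )
    return examples
```

Building one list and appending keeps the three shared examples in a single place, so they cannot drift apart between the two branches.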