
# Glossary

A glossary defining key concepts and terminology used throughout the Amp project. Organized by logical and physical architecture layers.

## Logical

### Field

A column definition consisting of a triple `(name, type, nullable)`, where the `type` is an Arrow data type ([spec](https://arrow.apache.org/docs/format/Columnar.html#data-types)).

### Schema

A list of [fields](#field) that defines the structure of data in a table or query result.

### Query

A SQL query string or a [DataFusion](#datafusion) logical plan ([spec](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html)). The query output conforms to a statically-known [schema](#schema).

### View

A named [query](#query) that is part of a [dataset](#dataset). Can be referred to in queries, as in `select * from dataset.view`.

### Table

A named collection of data with a fixed [schema](#schema) that can be queried using SQL. Tables are the primary interface for accessing data within a [dataset](#dataset).

**Key characteristics:**

- Has a defined [schema](#schema) (list of [fields](#field) with names, types, and nullability)
- Physically stored as [Parquet](#parquet) files, typically partitioned by block ranges for blockchain data
- Accessible via SQL [queries](#query) as `"namespace/dataset_name".table_name` (quoted due to forward slash)
- Can be queried through [Arrow Flight](#arrow-flight) or HTTP JSON APIs

Tables contain the actual materialized data that users query, whether extracted directly from blockchain sources or computed from SQL transformations.

### Dataset

A collection of [tables](#table) that represents a unit of ownership, publishing and versioning. Datasets are identified by a namespace, name, and version/revision, and define how data is extracted, transformed, and materialized into [Parquet](#parquet) files for querying.

### Dataset Namespace

An organizational grouping for datasets that provides logical separation and multi-tenancy support.
Namespaces help organize datasets by team, project, or organization (e.g., `my_org`, `edgeandnode`). The default namespace is represented by `_`.

### Dataset Version

A specific revision of a dataset identified by either:

- **Semantic version**: Following semver format (e.g., `1.0.0`, `2.1.3`)
- **Special tags**: System-managed tags like `latest` (highest semantic version) or `dev` (most recent registration)
- **Manifest hash**: Direct reference to a specific manifest by its content hash

Datasets are referenced using the format: `namespace/name@revision` (e.g., `my_org/eth_mainnet@1.0.0`).

### Dataset Manifest

A structured definition file that specifies a [dataset's](#dataset) configuration, including its [kind](#dataset-kind), data sources, transformations, [schema](#schema), and dependencies. Acts as the blueprint for how Amp should process and materialize the dataset.

### Dataset Kind

The implementation type that determines how a [dataset](#dataset) processes data:

- **derived**: Transforms and combines data from other datasets using SQL [queries](#query)
- **evm-rpc**: Extracts blockchain data via Ethereum-compatible JSON-RPC endpoints
- **firehose**: Streams real-time blockchain data through StreamingFast Firehose protocol

### Dataset Category

A high-level classification grouping [datasets](#dataset) by their data processing approach:

- **Raw** (a.k.a. **Extractor Datasets**): Extracts data directly from external blockchain sources (includes _evm-rpc_ and _firehose_ [kinds](#dataset-kind))
- **Derived**: Transforms and combines data from existing datasets (_derived_ [kind](#dataset-kind))

## Physical

Amp currently adopts the FDAP stack for its physical layer, see https://www.influxdata.com/glossary/fdap-stack/.

### DataFusion

The query planner and execution engine used by Amp, see https://datafusion.apache.org.

### Arrow record batch

Arrow is an in-memory and over-the-wire data format.
Query results are returned by DataFusion as a stream of Arrow record batches. See https://arrow.apache.org/docs/index.html.

### Parquet

The file format in which record batches are persisted, for example to materialize query results. See https://parquet.apache.org.

### Arrow Flight

The RPC protocol Amp uses for queries, with results returned as Arrow record batches over gRPC, see https://arrow.apache.org/docs/format/Flight.html.

## Architecture Components

### Amp Engine

The complete distributed system comprising all software components that run in a cluster: the [controller](#controller), [workers](#worker), [query server](#amp-server), and [metadata database](#metadata-database). The Amp engine provides the full data extraction, transformation, and query serving capabilities.

### Amp Cluster

Synonym for [Amp engine](#amp-engine), typically used when referring to deployments on cloud infrastructure. Emphasizes the distributed, multi-node nature of the system.

### Amp Server

The query server component of the Amp data plane that serves queries over the [Arrow Flight](#arrow-flight) protocol. Also referred to as the "Arrow Flight server" or "query server". Started via the `ampd server` command.

### Amp Daemon

A continuously running background process, following the Unix daemon concept. Refers to any of the `ampd` service processes: controller daemon, worker daemons, or query server daemon. The term emphasizes the long-running, background nature of these services.

### Controller

The component responsible for job scheduling, worker coordination, and exposing the [engine administration interface](#engine-administration-interface). Started via the `ampd controller` command.

### Worker

A process that executes data extraction jobs scheduled by the [controller](#controller). Multiple workers can run in parallel to scale extraction throughput. Started via the `ampd worker` command.
### Engine Administration Interface

The administrative API exposed by the [controller](#controller) for managing datasets, jobs, workers, providers, and storage. Accessed by the `ampctl` and `amp` CLIs. Also referred to as the "Admin API" in some contexts.

### Metadata Database

A PostgreSQL database that stores metadata about datasets, jobs, workers, files, and extraction progress. Used by the [controller](#controller) for state management and coordination across distributed components.
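
## Example: Dataset References

The two naming formats defined in this glossary, the dataset reference `namespace/name@revision` and the quoted SQL table name `"namespace/dataset_name".table_name`, can be illustrated with a minimal Python sketch. It is not part of Amp: `DatasetRef`, `parse_dataset_ref`, `table_sql_name`, the regular expression, and the table name `blocks` are all hypothetical, and treating the `@revision` suffix as optional is an assumption made here, not a rule stated by the glossary.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for `namespace/name@revision`.
# Amp's real validation rules may be stricter or different.
_REF_PATTERN = re.compile(
    r"^(?P<namespace>[^/@\s]+)/(?P<name>[^/@\s]+)(?:@(?P<revision>[^@\s]+))?$"
)


@dataclass(frozen=True)
class DatasetRef:
    namespace: str
    name: str
    revision: Optional[str]  # semver, tag such as `latest`/`dev`, or manifest hash


def parse_dataset_ref(ref: str) -> DatasetRef:
    """Split a reference such as `my_org/eth_mainnet@1.0.0` into its parts."""
    match = _REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a namespace/name@revision reference: {ref!r}")
    return DatasetRef(match["namespace"], match["name"], match["revision"])


def table_sql_name(ref: DatasetRef, table: str) -> str:
    """Build `"namespace/dataset_name".table_name`.

    The dataset part is double-quoted because it contains a forward slash.
    """
    return f'"{ref.namespace}/{ref.name}".{table}'


ref = parse_dataset_ref("my_org/eth_mainnet@1.0.0")
query = f"select * from {table_sql_name(ref, 'blocks')}"  # `blocks` is a made-up table
```

Keeping the revision separate from the SQL name mirrors the glossary's own examples, where the table path carries no `@revision` suffix.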
