## What are Apify Actors?

- Actors are serverless cloud programs that can perform anything from a simple action, like filling out a web form, to a complex operation, like crawling an entire website or removing duplicates from a large dataset.
- Actors are programs packaged as Docker images, which accept a well-defined JSON input, perform an action, and optionally produce a well-defined JSON output.

### Apify Actor directory structure

```text
.actor/
├── actor.json           # Actor config: name, version, env vars, runtime settings
├── input_schema.json    # Input validation & Console form definition
├── dataset_schema.json  # Dataset schema definition
└── output_schema.json   # Specifies where an Actor stores its output
src/
└── main.js              # Actor entry point and orchestrator
storage/                 # Local storage (mirrors Cloud during development)
├── datasets/            # Output items (JSON objects)
├── key_value_stores/    # Files, config, INPUT
└── request_queues/      # Pending crawl requests
Dockerfile               # Container image definition
AGENTS.md                # AI agent instructions (this file)
```

## Apify CLI

### Installation

- Install Apify CLI only if it is not already installed.
- If Apify CLI is not installed, install it using the following commands:
  - macOS/Linux: `curl -fsSL https://apify.com/install-cli.sh | bash`
  - Windows: `irm https://apify.com/install-cli.ps1 | iex`

### Apify CLI Commands

```bash
# Local development
apify run      # Run Actor locally

# Authentication & deployment
apify login    # Authenticate account
apify push     # Deploy to Apify platform

# Help
apify help     # List all commands
```

## Do

- use the default values for all fields in the actor.json, input_schema.json, output_schema.json, and main.js files
- use Apify CLI to run the Actor locally, and push it to the Apify platform
- accept well-defined JSON input and produce structured JSON output
- use Apify SDK (`apify`) for code running ON Apify platform
- validate input early with proper error handling and fail gracefully
- use CheerioCrawler for static HTML content (10x faster than browsers)
- use PlaywrightCrawler only for JavaScript-heavy sites and dynamic content
- use router pattern (createCheerioRouter/createPlaywrightRouter) for complex crawls
- implement retry strategies with exponential backoff for failed requests
- use proper concurrency settings (HTTP: 10-50, Browser: 1-5)
- set sensible defaults in `.actor/input_schema.json` for all optional fields
- set up output schema in `.actor/output_schema.json`
- clean and validate data before pushing to dataset
- use semantic CSS selectors and fallback strategies for missing elements
- respect robots.txt, ToS, and implement rate limiting with delays
- check which tools (cheerio/playwright/crawlee) are installed before applying guidance

## Don't

- do not run apify create command
- do not rely on `Dataset.getInfo()` for final counts on Cloud platform
- do not use browser crawlers when HTTP/Cheerio works (massive performance gains with HTTP)
- do not hard code values that should be in input schema or environment variables
- do not skip input validation or error handling
- do not overload servers - use appropriate concurrency and delays
- do not scrape prohibited content or ignore Terms of Service
- do not store personal/sensitive data unless explicitly permitted
- do not use deprecated options like `requestHandlerTimeoutMillis` on CheerioCrawler (v3.x)
- do not use `additionalHttpHeaders` - use `preNavigationHooks` instead

## Actor Input Schema

The input schema defines the input parameters for an Actor. It's a JSON object comprising various field types supported by the Apify platform.

### Structure

```json
{
  "title": "<INPUT-SCHEMA-TITLE>",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    /* define input fields here */
  },
  "required": []
}
```

## Actor Output Schema

The Actor output schema builds upon the schemas for the dataset and key-value store. It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results.

### Structure

```json
{
  "actorOutputSchemaVersion": 1,
  "title": "<OUTPUT-SCHEMA-TITLE>",
  "properties": {
    /* define your outputs here */
  }
}
```

### Output Schema Template Variables

- `links` (object) - Contains quick links to most commonly used URLs
- `links.publicRunUrl` (string) - Public run URL in format `https://console.apify.com/view/runs/:runId`
- `links.consoleRunUrl` (string) - Console run URL in format `https://console.apify.com/actors/runs/:runId`
- `links.apiRunUrl` (string) - API run URL in format `https://api.apify.com/v2/actor-runs/:runId`
- `links.apiDefaultDatasetUrl` (string) - API URL of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId`
- `links.apiDefaultKeyValueStoreUrl` (string) - API URL of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId`
- `links.containerRunUrl` (string) - URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/`
- `run` (object) - Contains information about the run same as it is returned from the `GET Run` API endpoint
- `run.defaultDatasetId` (string) - ID of the default dataset
- `run.defaultKeyValueStoreId` (string) - ID of the default key-value store

## Dataset Schema Specification

The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.

### Structure

```json
{
  "actorSpecification": 1,
  "fields": {},
  "views": {
    "<VIEW_NAME>": {
      "title": "string (required)",
      "description": "string (optional)",
      "transformation": {
        "fields": ["string (required)"],
        "unwind": ["string (optional)"],
        "flatten": ["string (optional)"],
        "omit": ["string (optional)"],
        "limit": "integer (optional)",
        "desc": "boolean (optional)"
      },
      "display": {
        "component": "table (required)",
        "properties": {
          "<FIELD_NAME>": {
            "label": "string (optional)",
            "format": "text|number|date|link|boolean|image|array|object (optional)"
          }
        }
      }
    }
  }
}
```

**Dataset Schema Properties:**

- `actorSpecification` (integer, required) - Specifies the version of dataset schema structure document (currently only version 1)
- `fields` (JSONSchema object, required) - Schema of one dataset object (use JsonSchema Draft 2020-12 or compatible)
- `views` (DatasetView object, required) - Object with API and UI views description

**DatasetView Properties:**

- `title` (string, required) - Visible in UI Output tab and API
- `description` (string, optional) - Only available in API response
- `transformation` (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
- `display` (ViewDisplay object, required) - Output tab UI visualization definition

**ViewTransformation Properties:**

- `fields` (string[], required) - Fields to present in output (order matches column order)
- `unwind` (string[], optional) - Deconstructs nested children into parent object
- `flatten` (string[], optional) - Transforms nested object into flat structure
- `omit` (string[], optional) - Removes specified fields from output
- `limit` (integer, optional) - Maximum number of results (default: all)
- `desc` (boolean, optional) - Sort order (true = newest first)

**ViewDisplay Properties:**

- `component` (string, required) - Only `table` is available
- `properties` (Object, optional) - Keys matching `transformation.fields` with ViewDisplayProperty values

**ViewDisplayProperty Properties:**

- `label` (string, optional) - Table column header
- `format` (string, optional) - One of: `text`, `number`, `date`, `link`, `boolean`, `image`, `array`, `object`

## Key-Value Store Schema Specification

The key-value store schema organizes keys into logical groups called collections for easier data management.

### Structure

```json
{
  "actorKeyValueStoreSchemaVersion": 1,
  "title": "string (required)",
  "description": "string (optional)",
  "collections": {
    "<COLLECTION_NAME>": {
      "title": "string (required)",
      "description": "string (optional)",
      "key": "string (conditional - use key OR keyPrefix)",
      "keyPrefix": "string (conditional - use key OR keyPrefix)",
      "contentTypes": ["string (optional)"],
      "jsonSchema": "object (optional)"
    }
  }
}
```

**Key-Value Store Schema Properties:**

- `actorKeyValueStoreSchemaVersion` (integer, required) - Version of key-value store schema structure document (currently only version 1)
- `title` (string, required) - Title of the schema
- `description` (string, optional) - Description of the schema
- `collections` (Object, required) - Object where each key is a collection ID and value is a Collection object

**Collection Properties:**

- `title` (string, required) - Collection title shown in UI tabs
- `description` (string, optional) - Description appearing in UI tooltips
- `key` (string, conditional) - Single specific key for this collection
- `keyPrefix` (string, conditional) - Prefix for keys included in this collection
- `contentTypes` (string[], optional) - Allowed content types for validation
- `jsonSchema` (object, optional) - JSON Schema Draft 07 format for `application/json` content type validation

Either `key` or `keyPrefix` must be specified for each collection, but not both.

## Apify MCP Tools

If MCP server is configured, use these tools for documentation:

- `search-apify-docs` - Search documentation
- `fetch-apify-docs` - Get full doc pages

Otherwise, reference: `@https://mcp.apify.com/`

## Resources

- [docs.apify.com/llms.txt](https://docs.apify.com/llms.txt) - Quick reference
- [docs.apify.com/llms-full.txt](https://docs.apify.com/llms-full.txt) - Complete docs
- [crawlee.dev](https://crawlee.dev) - Crawlee documentation
- [whitepaper.actor](https://raw.githubusercontent.com/apify/actor-whitepaper/refs/heads/master/README.md) - Complete Actor specification
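
As an illustration of the key-value store schema rules described earlier (version 1, a required `title`, and exactly one of `key` or `keyPrefix` per collection), here is a minimal sketch of a structural check in plain JavaScript. The function name `validateKeyValueStoreSchema` and the sample schema are hypothetical. They are not part of the Apify SDK or CLI, and the platform's own validation may differ.

```javascript
// Hypothetical helper (NOT an Apify SDK function): returns a list of
// human-readable problems found in a key-value store schema object.
function validateKeyValueStoreSchema(schema) {
    const errors = [];

    if (schema.actorKeyValueStoreSchemaVersion !== 1) {
        errors.push('actorKeyValueStoreSchemaVersion must be 1');
    }
    if (typeof schema.title !== 'string' || schema.title.length === 0) {
        errors.push('title is required');
    }

    const { collections } = schema;
    if (collections === null || typeof collections !== 'object' || Array.isArray(collections)) {
        errors.push('collections is required and must be an object');
        return errors;
    }

    for (const [id, collection] of Object.entries(collections)) {
        if (collection === null || typeof collection !== 'object') {
            errors.push(`collections.${id}: must be an object`);
            continue;
        }
        if (typeof collection.title !== 'string' || collection.title.length === 0) {
            errors.push(`collections.${id}: title is required`);
        }
        const hasKey = typeof collection.key === 'string';
        const hasKeyPrefix = typeof collection.keyPrefix === 'string';
        // Exactly one of `key` / `keyPrefix` may be set: both or neither is an error.
        if (hasKey === hasKeyPrefix) {
            errors.push(`collections.${id}: specify either key or keyPrefix, but not both`);
        }
    }
    return errors;
}

// Example with made-up collection names; only `broken` violates the rule.
const exampleSchema = {
    actorKeyValueStoreSchemaVersion: 1,
    title: 'Example store',
    collections: {
        input: { title: 'Input', key: 'INPUT' },
        screenshots: { title: 'Screenshots', keyPrefix: 'screenshot-', contentTypes: ['image/png'] },
        broken: { title: 'Broken', key: 'a', keyPrefix: 'b' },
    },
};
console.log(validateKeyValueStoreSchema(exampleSchema));
// prints [ 'collections.broken: specify either key or keyPrefix, but not both' ]
```

Returning a list of messages instead of throwing on the first problem lets a caller report every schema issue in one pass.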
