Skip to main content
Glama

airflow_best_practices

Provides best practices guidance for designing, optimizing, and securing Apache Airflow workflows on Amazon MWAA, covering DAG patterns, performance, resource management, error handling, and security.

Instructions

Get MWAA and Apache Airflow best practices guidance.

Returns comprehensive guidance on:

  • DAG design patterns

  • Performance optimization

  • Resource management

  • Error handling

  • Security best practices

  • MWAA-specific considerations

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault

No arguments

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault
resultYes

Implementation Reference

  • The tool handler that registers 'airflow_best_practices' and returns the guidance content.
    @mcp.tool(name="airflow_best_practices")
    async def airflow_best_practices() -> str:
        """Get MWAA and Apache Airflow best practices guidance.
    
        Returns comprehensive guidance on:
        - DAG design patterns
        - Performance optimization
        - Resource management
        - Error handling
        - Security best practices
        - MWAA-specific considerations
        """
        return AIRFLOW_BEST_PRACTICES
  • The constant containing the actual best practices text content returned by the handler.
    AIRFLOW_BEST_PRACTICES = """
    # MWAA and Apache Airflow Best Practices Guide
    
    ## Environment Setup and Configuration
    
    ### 1. MWAA Environment Sizing
    - **Start Small**: Begin with mw1.small and scale up based on actual usage
    - **Monitor Metrics**: Use CloudWatch metrics to track worker utilization
    - **Auto-scaling**: Configure min/max workers appropriately (typically 1-10 for most workloads)
    - **Scheduler Count**: Use 2 schedulers for HA, increase only for very large deployments
    
    ### 2. S3 Bucket Organization
    ```
    s3://your-mwaa-bucket/
    ├── dags/
    │   ├── main_workflow.py
    │   └── utils/
    │       └── helpers.py
    ├── plugins/
    │   └── custom_operators.zip
    ├── requirements/
    │   └── requirements.txt
    └── scripts/
        └── startup.sh
    ```
    
    ### 3. Requirements Management
    - Pin all package versions in requirements.txt
    - Test requirements locally before deploying
    - Use constraints files for Airflow providers
    - Keep requirements minimal to reduce startup time
    
    ## DAG Design Patterns
    
    ### 1. Idempotency
    - Design tasks to be re-runnable without side effects
    - Use upsert operations instead of inserts
    - Include date partitioning in data operations
    
    ### 2. Task Dependencies
    ```python
    # Good: Clear linear dependencies
    extract >> transform >> load
    
    # Better: Parallel where possible
    extract >> [transform_a, transform_b] >> load
    
    # Best: Dynamic task mapping (Airflow 2.3+)
    @task
    def process_file(filename):
        # Process individual file
        pass
    
    filenames = get_files()
    process_file.expand(filename=filenames)
    ```
    
    ### 3. Error Handling
    ```python
    # Set appropriate retries
    default_args = {
        'retries': 2,
        'retry_delay': timedelta(minutes=5),
        'retry_exponential_backoff': True,
        'max_retry_delay': timedelta(minutes=30),
    }
    
    # Use trigger rules for error paths
    error_handler = EmptyOperator(
        task_id='handle_errors',
        trigger_rule='one_failed'
    )
    ```
    
    ## Performance Optimization
    
    ### 1. DAG Loading Time
    - Keep DAG files small and focused
    - Avoid heavy imports at module level
    - Use Jinja templating instead of Python loops for static DAGs
    - Limit the number of DAGs per file
    
    ### 2. Task Execution
    - Use appropriate task concurrency limits
    - Implement connection pooling for databases
    - Batch operations where possible
    - Use XCom sparingly (max 48KB per XCom)
    
    ### 3. Sensor Optimization
    ```python
    # Use reschedule mode for long-running sensors
    s3_sensor = S3KeySensor(
        task_id='wait_for_file',
        bucket_key='s3://bucket/key',
        mode='reschedule',  # Don't occupy worker slot
        poke_interval=300,  # Check every 5 minutes
        timeout=3600,       # 1 hour timeout
    )
    ```
    
    ## Security Best Practices
    
    ### 1. IAM Roles
    - Use separate execution role with minimal permissions
    - Implement role assumption for cross-account access
    - Avoid hardcoding credentials
    
    ### 2. Secrets Management
    ```python
    # Use Airflow connections and variables
    from airflow.models import Variable
    from airflow.hooks.base import BaseHook
    
    # Get connection
    conn = BaseHook.get_connection('my_db')
    
    # Get variable (with default)
    api_key = Variable.get('api_key', default_var='')
    
    # Use AWS Secrets Manager
    from airflow.providers.amazon.aws.hooks.secrets_manager import SecretsManagerHook
    hook = SecretsManagerHook(conn_id='aws_default')
    secret = hook.get_secret('my-secret')
    ```
    
    ### 3. Network Security
    - Use private subnets for MWAA
    - Implement VPC endpoints for AWS services
    - Configure security groups with minimal access
    
    ## Monitoring and Alerting
    
    ### 1. CloudWatch Integration
    - Monitor key metrics: CPU, memory, task duration
    - Set up alarms for failed tasks and DAG failures
    - Use custom metrics for business KPIs
    
    ### 2. Logging Best Practices
    ```python
    import logging
    
    logger = logging.getLogger(__name__)
    
    @task
    def process_data():
        logger.info("Starting data processing")
        try:
            # Process data
            logger.info(f"Processed {record_count} records")
        except Exception as e:
            logger.error(f"Processing failed: {str(e)}")
            raise
    ```
    
    ### 3. SLA Management
    ```python
    # Set SLA for critical tasks
    critical_task = PythonOperator(
        task_id='critical_process',
        python_callable=process_critical_data,
        sla=timedelta(hours=2),
    )
    
    # Define SLA miss callback
    def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
        # Send notification
        pass
    
    dag.sla_miss_callback = sla_miss_callback
    ```
    
    ## Cost Optimization
    
    ### 1. Environment Management
    - Pause environments during non-business hours
    - Use smaller environments for dev/test
    - Clean up old logs and artifacts
    
    ### 2. Task Efficiency
    - Minimize task runtime
    - Use appropriate instance types
    - Batch small tasks together
    
    ### 3. Data Transfer
    - Process data in the same region
    - Use VPC endpoints to avoid NAT gateway costs
    - Compress data before transfer
    
    ## Common Pitfalls to Avoid
    
    1. **Top-level Code**: Avoid database queries or API calls at DAG parse time
    2. **Large XComs**: Don't pass large data through XCom
    3. **Dynamic DAGs**: Be careful with dynamic DAG generation performance
    4. **Missing Cleanup**: Always clean up temporary resources
    5. **Hardcoded Dates**: Use Airflow's execution_date context
    6. **Ignoring Idempotency**: Ensure all tasks can be safely re-run
    7. **Over-scheduling**: Don't schedule DAGs more frequently than needed
    8. **Resource Leaks**: Close connections and clean up resources
    
    ## MWAA-Specific Considerations
    
    ### 1. Limitations
    - No kubectl access to underlying Kubernetes
    - Limited pip packages (must be compatible with Amazon Linux)
    - Maximum environment size constraints
    - No direct database access
    
    ### 2. Migration Tips
    - Test DAGs in MWAA development environment
    - Verify all dependencies are available
    - Update connection strings and credentials
    - Plan for downtime during migration
    
    ### 3. Troubleshooting
    - Check CloudWatch logs for detailed errors
    - Verify S3 permissions and bucket policies
    - Ensure VPC configuration allows internet access (for PyPI)
    - Monitor environment health metrics
    
    Remember: Always test changes in a development environment before deploying to production!
    """
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It lists six content areas covered (DAG patterns, performance, security, etc.) providing scope transparency, but lacks operational details such as data source, caching behavior, or permission requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise with a clear first sentence stating purpose, followed by a structured bulleted list of topic areas. No extraneous information is included.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero input parameters and the existence of an output schema, the description adequately covers the tool's scope by enumerating specific best practice domains. It appropriately avoids duplicating return value documentation that belongs in the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool accepts zero parameters, which per guidelines warrants a baseline score of 4. The schema confirms this with an empty properties object and additionalProperties: false.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves 'MWAA and Apache Airflow best practices guidance' and enumerates specific content domains. However, it does not explicitly differentiate from the sibling tool 'dag_design_guidance' despite overlapping content (DAG design patterns).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this comprehensive best practices tool versus the specific 'dag_design_guidance' sibling, nor are prerequisites or exclusions mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/paschmaria/mwaa-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server