Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| AWS_PROFILE | Yes | The AWS profile to use for credentials (e.g., consumersync). | |
| CONFLUENCE_PAT | Yes | Your Confluence Personal Access Token (PAT). | |
| EMR_LOG_BUCKET | Yes | The S3 bucket where EMR Serverless Spark logs are stored. | |
| MWAA_ENVIRONMENT_NAME | Yes | The name of the MWAA environment (e.g., dev, test, or prod). | |
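A minimal sketch of how these variables might be exported before starting the server. All values below are placeholders, not real credentials or bucket names — substitute your own:

```shell
# Placeholder values — replace with your own before running the server.
export AWS_PROFILE=consumersync
export CONFLUENCE_PAT=xxxxxxxxxxxx       # keep real tokens out of version control
export EMR_LOG_BUCKET=my-emr-log-bucket
export MWAA_ENVIRONMENT_NAME=dev
```

These can equally live in an `.env` file if your launcher loads one.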
Capabilities
Features and capabilities supported by this server.
| Capability | Details |
|---|---|
| tools | { "listChanged": false } |
| prompts | { "listChanged": false } |
| resources | { "subscribe": false, "listChanged": false } |
| experimental | {} |
Tools
Functions exposed to the LLM to take actions.
| Name | Description |
|---|---|
| server_health_check | Check connectivity to all configured services. Tests AWS credentials for each environment (dev/uat/test/prod), MWAA environments, EMR Serverless API, S3 log bucket access per env, S3 general access, Confluence PAT, and Azure DevOps connectivity. Use this FIRST to verify everything is connected before running other tools. Requires VPN and valid AWS credentials. Returns a status report for each service. |
| list_dags | List all DAGs in the MWAA environment. Args: env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. limit: Maximum number of DAGs to return (default 100). only_active: If True, show only unpaused DAGs. Returns a formatted table of DAGs with schedule interval and pause status. |
| list_dag_runs | Get DAG runs for today, yesterday, last week, or a specific date. This is the go-to tool when the user asks about DAG runs, processing status, or historical failures. Returns a numbered interactive list so the user can pick a specific run for deeper investigation. Args: env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. dag_id: Optional — filter to a specific DAG. If omitted, shows ALL DAGs. date: 'today' (default), 'yesterday', 'last_week', or ISO date (e.g. '2026-02-15'). limit: Max runs to return (default 50). Returns a numbered list of runs with status, timing and duration. Present these to the user so they can pick one by number. |
| get_dag_run_details | Get full details for a specific DAG run, including every task instance. Use this to see ALL tasks in a DAG run with their pass/fail status. For any failed tasks, the output includes ready-to-use get_task_log() hints. Common DAG task flow: start → create_arguments → check_inputs → initialise (creates EMR app) → processing → finalise. Args: dag_id: The DAG identifier. dag_run_id: The run ID (e.g. 'scheduled__2026-02-16T00:00:00+00:00' or 'manual__...'). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns formatted output showing each task with state, duration and try count. |
| get_task_log | Read the Airflow log for a specific task attempt. IMPORTANT for EMR debugging: Args: dag_id: The DAG identifier. dag_run_id: The run ID. task_id: The task identifier. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. try_number: Which attempt (default 1). tail_lines: Number of lines to return from the end (default 200). Returns the raw log text, trimmed to the last N lines. |
| trigger_dag | Manually trigger a DAG run. Args: dag_id: The DAG to trigger. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. conf: Optional JSON string of DAG run configuration. Returns confirmation with the new run ID. |
| pause_dag | Pause a DAG — prevents future scheduled runs from triggering. The DAG will still be visible in the Airflow UI but won't execute on its schedule. Already-running DAG runs will continue to completion. Args: dag_id: The DAG to pause. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns confirmation of the pause action. |
| unpause_dag | Unpause a DAG — allows scheduled runs to trigger again. Args: dag_id: The DAG to unpause. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns confirmation of the unpause action. |
| clear_task_instance | Clear a task instance to retry it without re-triggering the entire DAG. Use this when a task failed due to a transient issue (e.g. network timeout, temporary S3 error) and you want to retry just that task and optionally all tasks downstream of it. Args: dag_id: The DAG identifier. dag_run_id: The run ID. task_id: The task to clear/retry. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. include_downstream: If True, also clear all downstream tasks (default: False). Returns confirmation with the list of cleared task instances. |
| get_dag_source | Get the source code / definition of a DAG. Use this to understand what a DAG does — its tasks, operators, dependencies, schedule, and configuration. Useful for debugging or understanding a pipeline's structure. Args: dag_id: The DAG identifier. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns the DAG details including file location, schedule, tags, and task list. |
| get_dags_status_dashboard | Get a complete status dashboard of ALL DAGs — the go-to tool for any overview question. USE THIS TOOL when the user asks: Shows every DAG with: Args: env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. limit: Maximum number of DAGs to return (default 100). Returns a formatted status report with every DAG and its current health. |
| dag_analytics | Get run statistics and trend analysis for a specific DAG. Use this when the user asks about DAG reliability, performance trends, statistics, or historical patterns — "how's digital taxonomy been running?", "is this DAG stable?", "show me stats for HEM processing". Unlike list_dag_runs (flat list of individual runs), this tool provides: Args: dag_id: The DAG identifier to analyse. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. days: Number of days to look back (default: 14, max: 180). Returns a formatted analytics report with trends and patterns. |
| list_emr_applications | List all EMR Serverless applications. Note: DAGs create temporary EMR apps that are deleted after each run. If an app is not found here, it was already cleaned up — but job run details and S3 logs are still available via get_job_run_details and read_spark_driver_log using the application_id from the Airflow task log. Args: states: Optional comma-separated state filter (e.g. 'STARTED,CREATED'). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns a formatted list of applications with IDs, types and states. |
| list_job_runs | List job runs for an EMR Serverless application. Args: application_id: The EMR Serverless application ID. max_results: Max runs to return (default 30). states: Optional comma-separated state filter (e.g. 'SUCCESS,FAILED'). created_after: Optional ISO date — only runs after this date (e.g. '2026-02-16'). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns a list of job runs with status, timing and duration. |
| get_job_run_details | Get detailed information about a specific EMR Serverless job run. Shows the Spark submit config (entry point script, arguments), resource usage (vCPU hours, memory), and S3 log locations. Includes ready-to-use hints for read_spark_driver_log and browse_s3_logs. Args: application_id: The EMR Serverless application ID (from Airflow 'initialise' task log). job_run_id: The job run ID (from Airflow processing task log). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns comprehensive details: state, config, resource usage, S3 log paths. |
| read_spark_driver_log | Read the Spark driver log from S3 for an EMR Serverless job run. DEFAULT: Reads stdout.gz — this is the PRIMARY log containing Python print statements, row counts, file paths, and application errors. This is what you want 90% of the time. Use log_type='stderr' only when you need Spark framework logs (executor allocation, memory warnings, shuffle errors). Use read_both=True to get BOTH logs in one call (stdout first, then stderr filtered to ERROR lines only). How to find application_id and job_run_id: Args: application_id: The EMR Serverless application ID (e.g. '00g16i3marao0c0t'). job_run_id: The job run ID (e.g. '00g16i5g2pm56o0v'). log_type: 'stdout' (default, Python app output) or 'stderr' (Spark framework logs). s3_log_uri: Optional full S3 URI to read directly (e.g. 's3://bucket/path/stdout.gz'). process_name: Optional folder name under spark-logs/ (e.g. 'stackadapt_main'). Speeds up log discovery. tail_lines: Number of lines from the end (default 300). Use -1 for all lines. search_text: Optional text to filter log lines (e.g. 'ERROR', 'Exception'). bucket: S3 bucket override (default from config). read_both: If True, read BOTH stdout and stderr in one call. stdout shown first, stderr filtered to ERROR lines. Returns the log content, optionally filtered and tailed. |
| browse_s3_logs | Browse the S3 log directory structure. Navigate into folders to find logs. Args: prefix: S3 prefix/path to browse (default: EMR_LOG_PREFIX from config, e.g. 'spark-logs/'). Use the output to navigate deeper, e.g. 'spark-logs/ttdgeo_metadata_SE/'. bucket: S3 bucket (default from config). max_items: Max items to show (default 50). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns a directory listing of the S3 prefix showing folders and files. |
| cancel_job_run | Cancel a running or pending EMR Serverless job run. Use this when a Spark job is stuck, taking too long, or was started with incorrect parameters. The cancellation is asynchronous — the job will transition to CANCELLING and then CANCELLED state. Args: application_id: The EMR Serverless application ID. job_run_id: The job run ID to cancel. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns confirmation of the cancellation request. |
| stop_emr_application | Stop an EMR Serverless application. If jobs are running, cancels them first. Smart flow: Use force=True to skip the initial stop attempt and go straight to cancelling all jobs first (useful when you know jobs are running). Args: application_id: The EMR Serverless application ID. force: If True, cancel all running jobs first without trying to stop. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns a step-by-step report of what was done. |
| delete_emr_application | Delete an EMR Serverless application permanently. The application must be in STOPPED or CREATED state to be deleted. If the application is still running: Args: application_id: The EMR Serverless application ID. force: If True, stop the app first (cancelling jobs) then delete. If False, only delete if already stopped. env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns a step-by-step report of what was done. |
| read_s3_file | Read any file from S3 by its full URI and display in chat. Supports CSV, TXT, JSON, log files, .gz compressed files, and Parquet. Files larger than 5 MB are rejected to avoid crashing the server. For Parquet files: reads the file and displays the first N rows as a formatted table (default 50 rows). Parquet files are binary so they cannot be tailed or searched — use head_rows to control output. Args: s3_uri: Full S3 URI (e.g. 's3://bucket-name/path/to/file.csv'). tail_lines: Lines from the end for text files (default 100). -1 for all. search_text: Filter matching lines (text files only). head_rows: Rows to display for Parquet files (default 50). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns the file contents, optionally filtered and tailed. |
| get_emr_cost_summary | Get a summary of EMR Serverless resource usage and estimated costs. Aggregates vCPU hours, memory GB-hours, and storage GB-hours across recent job runs. Useful for understanding compute costs. Args: application_id: Optional — filter to one application. If omitted, scans all. days: Number of days to look back (default 7). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns a cost summary with per-job and total resource usage. |
| list_s3_buckets | List all S3 buckets in the AWS account. USE THIS TOOL when the user asks about S3 buckets, storage, "what buckets do we have", or anything about S3 at account level. Args: env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns every bucket with its creation date, sorted alphabetically. |
| browse_s3 | Browse folders and files in any S3 bucket interactively. USE THIS TOOL when the user asks to see what's in an S3 bucket, list files in a folder, navigate S3, or explore any S3 path. Start with no prefix to see top-level folders, then drill into subfolders using the hints in the output. Args: bucket: S3 bucket name (required). prefix: S3 prefix/path to browse (default: root of bucket). Example: 'raw/hem_processing/' to see that folder. max_results: Max items to show (default 200). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns a directory listing showing folders and files with sizes and last modified times. |
| get_s3_object_info | Get detailed metadata for a single S3 object without downloading it. USE THIS TOOL when the user asks about a specific file's size, modification time, type, or other details. This is fast because it only reads the metadata header, not the file content. Args: s3_uri: Full S3 URI (e.g. 's3://bucket-name/path/to/file.csv'). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns file metadata: size, last modified, content type, storage class, encryption, and ETag. |
| list_s3_recursive | Recursively list ALL files under an S3 bucket/prefix in a single call. USE THIS TOOL when the user asks to see everything in a bucket or folder end-to-end, wants a full file listing, or needs to find files by name or extension across nested folders. Args: bucket: S3 bucket name (required). prefix: Starting prefix/folder (default: root of bucket). Example: 'raw/hem_processing/' to list that subtree. name_filter: Optional — case-insensitive substring match on filename. Example: 'taxonomy' shows only files with 'taxonomy' in the name. extension_filter: Optional — file extension to filter by (with or without dot). Example: '.csv' or 'csv' or '.parquet' or '.gz' max_results: Max files to return (default 500, max 2000). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. Returns a full recursive file listing with sizes, plus a summary with total count, total size, and file type breakdown. |
| search_confluence | Search Confluence pages — the primary tool for finding documentation. USE THIS TOOL when the user says 'docs', 'documentation', 'wiki', 'runbook', 'find page about X', or any documentation-related question. Searches both page titles and content, ranked by relevance (same as the web UI). Args: query: The search text (e.g. 'Audience Engine'). Also supports raw CQL queries (e.g. "type=page AND title~'Walkthrough'"). space_key: Space to search in (default: ACTIVATE from config). ancestor_page_id: Optional parent page ID to scope search within a page tree. max_results: Max results to return (default 50). start: Pagination offset — skip this many results (default 0). Use for page 2, 3, etc. Returns a list of matching pages with titles, space, last modified, and URLs. Use get_page_content(page_id='...') to read any result. |
| get_page_content | Read the full content of a Confluence page. Args: page_id: The page ID (numeric). Provide this OR title. title: Page title to look up (slower, requires space_key). space_key: Space key (default from config). Required if using title. include_metadata: Include page metadata header (default True). Returns the page content converted to clean readable text. |
| get_child_pages | List all child pages of a given parent page. Args: page_id: The parent page ID. include_content: If True, include a short content preview for each child. max_results: Max children to return (default 50). Returns a list of child pages with titles and IDs. |
| get_space_pages | List all pages in a Confluence space. Args: space_key: Space key (default: ACTIVATE from config). max_results: Max pages per request (default 50, max 100). page_type: 'page' or 'blogpost' (default 'page'). start: Pagination offset (default 0). Returns a list of pages with titles, IDs, and URLs (paginated). |
| get_page_attachments | List all file attachments on a Confluence page. Args: page_id: The page ID. max_results: Max attachments to return (default 50). Returns a list of attachments with name, size, type, and download URL. |
| get_page_labels | Get all labels (tags) on a Confluence page. Args: page_id: The page ID. Returns a list of labels. Useful for finding related pages or understanding page categorization. |
| get_page_comments | Get comments on a Confluence page. Args: page_id: The page ID. max_results: Max comments to return (default 25). Returns comments with author, date, and content. Useful for understanding discussions and context around a page. |
| create_confluence_page | Create a new Confluence page. Use this to create incident reports, runbook entries, meeting notes, or any documentation. Content should be in simple HTML or plain text (which will be wrapped in HTML paragraphs). Args: title: The page title (must be unique within the space). body: The page content. Can be HTML or plain text. For plain text, paragraphs are separated by blank lines. space_key: Space key (default from config). parent_page_id: Optional parent page ID to nest under. Returns confirmation with the new page ID and URL. |
| update_confluence_page | Update an existing Confluence page's content. Args: page_id: The page ID to update. body: New page content (HTML or plain text). title: Optional new title. If omitted, keeps the existing title. append: If True, append to existing content instead of replacing. Returns confirmation of the update. |
| list_repos | List all Git repositories in the Azure DevOps project. USE THIS TOOL when the user asks about repos, code repositories, or wants to browse source code in Azure DevOps / TFS. Args: project: Project name (default: Activate from config). Returns a list of repositories with name, default branch, size, and URL. Use browse_repo(repo_name='...') to explore a repo's files. |
| browse_repo | Browse files and folders in a Git repository — like a directory listing. USE THIS TOOL when the user wants to see what files are in a repo, explore folder structure, or find a specific file path. Args: repo_name: Repository name (from list_repos). path: Folder path to browse (default '/' for root). Use forward slashes. branch: Branch name (default: repo's default branch). project: Project name (default from config). Returns a directory listing with file/folder names and paths. Use read_repo_file(repo_name='...', path='...') to read a file's content. |
| browse_repo_recursive | List ALL files in a Git repository recursively — the full file tree. USE THIS TOOL when the user asks 'what files are in this repo?', 'show me the whole repo structure', 'list all Python files', or needs to find correct file paths before reading. Much faster than calling browse_repo folder-by-folder. Args: repo_name: Repository name (from list_repos). path: Starting folder path (default '/' for entire repo). branch: Branch name (default: repo's default branch). extension_filter: Filter by file extension, e.g. '.py', '.json', '.yaml'. Case-insensitive. Only files matching this extension are shown. project: Project name (default from config). Returns a tree of ALL files and folders with full paths. Use read_repo_file(repo_name='...', path='...') to read any file. |
| read_repo_file | Read the content of a single file from a Git repository. USE THIS TOOL when the user wants to see source code, config, or content of a specific file. Get the path from browse_repo first. Args: repo_name: Repository name. path: Full file path (e.g. '/src/main.py'). Use forward slashes. branch: Branch name (default: repo's default branch). project: Project name (default from config). Returns the raw file content with metadata header. |
| get_current_sprint | Get the current (active) sprint/iteration with dates and summary. USE THIS TOOL when the user asks 'what sprint are we in?', 'current iteration', 'sprint dates', 'when does the sprint end?', or any question about the active sprint. Args: project: Project name (default from config). team: Team name (default from config). Returns current sprint name, date range, and time remaining. Use get_sprint_work_items() to see what's in the sprint. |
| get_sprint_work_items | Get all work items (PBIs, Tasks, Bugs) in a sprint with full details. USE THIS TOOL when the user asks 'what's in the sprint?', 'sprint board', 'who's working on what?', 'sprint details', or wants to see sprint content. Shows each work item with type, title, state, assigned person, effort/story points. Groups items by type (PBI, Task, Bug) with totals. Args: iteration_path: Full iteration path (e.g. 'Activate\Sprint 23 Q4 FY26'). If omitted, uses the current sprint automatically. project: Project name (default from config). team: Team name (default from config). Returns all work items grouped by type with assignee, state, effort, and priority. Use get_work_item_details(work_item_id=...) for full details of any item. |
| get_work_item_details | Get full details of a single work item (PBI, Task, Bug, Feature, Epic). USE THIS TOOL when the user asks about a specific work item by ID, wants the description, acceptance criteria, history, or full details. Args: work_item_id: The numeric work item ID (e.g. 12345). project: Project name (default from config). Returns complete work item details including description, acceptance criteria, parent/child links, and all metadata fields. |
| get_backlog | Get backlog items (PBIs/User Stories not in the current sprint). USE THIS TOOL when the user asks about the backlog, upcoming work, 'what's next?', 'what's not in the sprint?', or items waiting to be picked up. Args: project: Project name (default from config). team: Team name (default from config). max_results: Maximum items to return (default 50). Returns backlog items ordered by priority with state and effort. |
| diagnose_dag_failure | One-shot diagnosis of a failed DAG run. This tool does everything automatically: This replaces the need to call 5-6 tools manually. Args: dag_id: The DAG to diagnose (e.g. 'ttdcustom_processing'). env: Target environment — 'dev', 'uat', 'test', or 'prod'. IMPORTANT: Do NOT guess or default. Ask the user which environment if not specified. date: Optional date to check (ISO format or 'yesterday'). Default: today. Returns a comprehensive failure report with root cause analysis. |
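These tools are invoked by an MCP client over JSON-RPC 2.0 using the standard `tools/call` method. A minimal sketch of what such a request payload looks like for `list_dag_runs` — the argument values here (env, date, limit) are illustrative assumptions, and as the description above stresses, `env` should come from the user rather than a default:

```python
import json

# Hypothetical MCP "tools/call" request for the list_dag_runs tool,
# framed as a JSON-RPC 2.0 message. Values are examples only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_dag_runs",
        "arguments": {"env": "dev", "date": "yesterday", "limit": 50},
    },
}

# The client serializes this and sends it over its transport
# (stdio or HTTP, depending on how the server is launched).
print(json.dumps(request, indent=2))
```

The server's response carries the formatted, numbered run list described above as text content, which the client surfaces to the LLM.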
Prompts
Interactive templates invoked by user choice.
| Name | Description |
|---|---|
No prompts | |
Resources
Contextual data attached and managed by the client.
| Name | Description |
|---|---|
No resources | |