Brucedh

AWS-IReveal-MCP

athena_query_events

Query AWS CloudTrail logs with Athena to search for specific events, users, IPs, or resources using granular filters for security analysis.

Instructions

Query Athena for granular searches on CloudTrail logs.

<IMPORTANT>
Before calling this tool, you must call the athena_create_cloudtrail_table tool to create the table.
If the user asks for a different region, you must call the athena_create_cloudtrail_table tool to create the table in that region.
</IMPORTANT>

Parameters:
  aws_region: The AWS region - use 'us-east-1' if not specified.
  start_time: ISO string of the start time
  end_time: ISO string of the end time
  database: Athena database name to be used - use 'default' if not specified.
  output_bucket: S3 bucket URI (e.g. 's3://my-athena-query-results/') for query results - different from cloudtrail_bucket.
  event_names: List of event names to filter on (e.g. ["GetObject", "DeleteObject"])
  principal_id: Optional principal ID to filter on. Use the percent sign (%) as a wildcard character.
  user_arn: Optional user ARN to filter on. Use the percent sign (%) as a wildcard character. This is the ARN of the user performing the action.
  user_name: Optional user name to filter on. This is the name of the user performing the action.
  role_name: Optional role name to filter on. This is the name of the role assumed by the user performing the action.
  ip_address: Optional IP address to filter on. Use the percent sign (%) as a wildcard character. This is the IP address of the user performing the action.
  bucket_name: Optional bucket name to filter on. Use the percent sign (%) as a wildcard character.
  limit: Maximum number of results to return (default is 50).

Returns:
  str: JSON-formatted result of the Athena query.
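For illustration, a complete set of arguments for this tool might look like the following sketch. The bucket URI, time window, and principal-ID pattern are hypothetical values, not defaults shipped with the server:

```python
# Hypothetical argument payload for the athena_query_events tool.
arguments = {
    "aws_region": "us-east-1",
    "start_time": "2024-05-01T00:00:00Z",
    "end_time": "2024-05-02T00:00:00Z",
    "database": "default",
    # Results bucket; must differ from the bucket holding the CloudTrail logs.
    "output_bucket": "s3://my-athena-query-results/",
    "event_names": ["GetObject", "DeleteObject"],
    # '%' acts as a wildcard in the LIKE-based filters.
    "principal_id": "AROA%",
    "limit": 50,
}
```

Optional filters that are omitted (user_arn, user_name, role_name, ip_address, bucket_name) are simply left out of the generated WHERE clause.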

Input Schema

Name           Required  Default
aws_region     Yes
start_time     Yes
end_time       Yes
database       Yes
output_bucket  Yes
event_names    Yes
principal_id   No
user_arn       No
user_name      No
role_name      No
ip_address     No
bucket_name    No
limit          No        50

Output Schema

Name    Required
result  Yes

Implementation Reference

  • The primary handler for the 'athena_query_events' tool. The @mcp.tool() decorator registers the function with FastMCP, and the function signature and docstring define the input schema. It builds a dynamic SQL query over CloudTrail logs in Athena, filtering by time range, event names, and optional fields such as principal_id and user_arn, then executes it via the run_athena_query helper and returns JSON-formatted results.
    # Module-level imports assumed by this excerpt
    import asyncio
    import datetime
    import json
    import time

    import boto3

    @mcp.tool()
    async def athena_query_events(
        aws_region: str,
        start_time: str,
        end_time: str,
        database: str,
        output_bucket: str,
        event_names: list[str],
        principal_id: str | None = None,
        user_arn: str | None = None,
        user_name: str | None = None,
        role_name: str | None = None,
        ip_address: str | None = None,
        bucket_name: str | None = None,
        limit: int = 50,
    ) -> str:
        """
        Query Athena for granular searches on CloudTrail logs.
        
        <IMPORTANT>
        Before calling this tool, you must call the athena_create_cloudtrail_table tool to create the table.
        If the user asks for a different region, you must call the athena_create_cloudtrail_table tool to create the table in that region.
        </IMPORTANT>
    
        Parameters:
          aws_region: The AWS region - use 'us-east-1' if not specified.
          start_time: ISO string of the start time
          end_time: ISO string of the end time
          database: Athena database name to be used - use 'default' if not specified.
          output_bucket: S3 bucket URI (e.g. 's3://my-athena-query-results/') for query results - different from cloudtrail_bucket.
          event_names: List of event names to filter on (e.g. ["GetObject", "DeleteObject"])
          principal_id: Optional principal ID to filter on. Use the percent sign (%) as a wildcard character.
          user_arn: Optional user ARN to filter on. Use the percent sign (%) as a wildcard character. This is the ARN of the user performing the action.
          user_name: Optional user name to filter on. This is the name of the user performing the action.
          role_name: Optional role name to filter on. This is the name of the role assumed by the user performing the action.
          ip_address: Optional IP address to filter on. Use the percent sign (%) as a wildcard character. This is the IP address of the user performing the action.
          bucket_name: Optional bucket name to filter on. Use the percent sign (%) as a wildcard character.
          limit: Maximum number of results to return (default is 50).
    
        Returns:
          str: JSON-formatted result of the Athena query.
        """
        # Construct an SQL query
        event_filter = ", ".join([f"'{name}'" for name in event_names])
        events_comment = session_comment = principal_comment = ip_comment = bucket_comment = user_comment = arn_comment = "--"
        if event_names:
            events_comment = ""
        if role_name:
            session_comment = ""
        if principal_id:
            principal_comment = ""
        if user_arn:
            arn_comment = ""
        if user_name:
            user_comment = ""
        if ip_address:
            ip_comment = ""
        if bucket_name:
            bucket_comment = ""
                
        try:
            # Convert ISO format timestamps to Athena-compatible format
            start_dt = datetime.datetime.fromisoformat(start_time.replace('T', ' ').replace('Z', ''))
            end_dt = datetime.datetime.fromisoformat(end_time.replace('T', ' ').replace('Z', ''))
            
            query = f"""
            WITH flat_logs AS (
                SELECT
                    eventTime,
                    eventName,
                    userIdentity.principalId,
                    userIdentity.arn,
                    userIdentity.userName,
                    userIdentity.sessionContext.sessionIssuer.userName as sessionUserName,
                    sourceIPAddress,
                    eventSource,
                    json_extract_scalar(requestParameters, '$.bucketName') as bucketName,
                    json_extract_scalar(requestParameters, '$.key') as object
                FROM {athena_table}
            )
            SELECT *
            FROM flat_logs
            WHERE date(from_iso8601_timestamp(eventTime)) BETWEEN timestamp '{start_dt}' AND timestamp '{end_dt}'
            {events_comment}AND eventName IN ({event_filter})
            {user_comment}AND userName = '{user_name}'
            {session_comment}AND sessionUserName = '{role_name}'
            {principal_comment}AND principalId LIKE '{principal_id}'
            {arn_comment}AND arn LIKE '{user_arn}'
            {ip_comment}AND sourceIPAddress LIKE '{ip_address}' 
            {bucket_comment}AND bucketName LIKE '{bucket_name}' 
            LIMIT {limit};
            """
        except ValueError as e:
            return f"Error processing timestamp formats: {str(e)}"
        try:
            athena_client = boto3.client('athena', region_name=aws_region)
            result = await run_athena_query(athena_client, database, output_bucket, query)
        except Exception as e:
            return f"Error querying Athena: {str(e)}"
        return result
  • Supporting helper function 'run_athena_query' called by athena_query_events to start, poll, and retrieve results from an Athena query execution, parsing results into JSON format.
    async def run_athena_query(athena_client, database: str, output_bucket: str, query: str):
        response = athena_client.start_query_execution(
            QueryString=query,
            QueryExecutionContext={'Database': database},
            ResultConfiguration={'OutputLocation': output_bucket}
        )
        query_execution_id = response['QueryExecutionId']
    
        # Wait for the query to complete
        max_attempts = 10
        attempts = 0
        while attempts < max_attempts:
            status = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
            state = status['QueryExecution']['Status']['State']
            if state in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
                break
            await asyncio.sleep(3)  # yield to the event loop instead of blocking while polling
            attempts += 1
        if state != 'SUCCEEDED':
            raise Exception(f"Athena query {query_execution_id} did not succeed; state: {state}")
        
        # Retrieve query results
        results_paginator = athena_client.get_paginator('get_query_results')
        result_rows = []
        for page in results_paginator.paginate(QueryExecutionId=query_execution_id):
            result_rows.extend(page['ResultSet']['Rows'])
        # Check if there are results
        if len(result_rows) < 2:
            return "No results found."
        # Extract the header and data
        # Process rows (skipping header row) and convert to JSON
        header = [col['VarCharValue'] for col in result_rows[0]['Data']]
        data = []
        for row in result_rows[1:]:
            row_data = {}
            for idx, col in enumerate(row['Data']):
                row_data[header[idx]] = col.get('VarCharValue', '')
            data.append(row_data)
        return json.dumps(data, indent=2)
  • Global variable 'athena_table' used by athena_query_events to reference the CloudTrail logs table name (set in athena_create_cloudtrail_table tool).
    athena_table = None
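The handler above enables optional WHERE clauses by prefixing inactive lines with SQL's '--' comment marker. A minimal standalone sketch of that pattern follows; the function names here are illustrative, not part of the server:

```python
def toggle(value) -> str:
    """Return '' to activate a clause, or '--' to comment it out."""
    return "" if value else "--"

def build_where(user_name=None, ip_address=None) -> str:
    # Each optional filter line is emitted either live or commented out.
    user_c = toggle(user_name)
    ip_c = toggle(ip_address)
    return (
        f"WHERE 1 = 1\n"
        f"{user_c}AND userName = '{user_name}'\n"
        f"{ip_c}AND sourceIPAddress LIKE '{ip_address}'"
    )

print(build_where(user_name="alice"))
# The userName clause is active; the sourceIPAddress clause is commented out.
```

Note that interpolating caller-supplied values directly into SQL, as both the sketch and the handler do, is injection-prone; hardened code would validate or parameterize these inputs.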
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions prerequisites and that results are JSON-formatted, but lacks details on error handling, rate limits, authentication needs, or query execution time. It partially compensates by noting the output format and dependencies, but more behavioral context would be helpful for a mutation/query tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear purpose statement, important prerequisites in a dedicated section, and organized parameter explanations. It's appropriately sized for a complex tool but could be slightly more concise by integrating the parameter list more seamlessly. Every sentence adds value, though minor trimming is possible.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, no annotations, but has an output schema), the description is largely complete. It covers prerequisites, parameter semantics, and notes the JSON return format. However, it lacks details on error cases, performance implications, or how results are paginated/limited, leaving some gaps for a query tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must fully compensate. It provides detailed semantics for all 13 parameters, including defaults (e.g., 'use 'us-east-1' if not specified'), examples (e.g., 's3://my-athena-query-results/'), wildcard usage (percent sign %), and clarifications (e.g., 'different from cloudtrail_bucket'). This adds significant value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Query Athena for granular searches on CloudTrail logs.' It specifies the verb ('query'), resource ('Athena'), and target data ('CloudTrail logs'), making the function evident. However, it doesn't explicitly differentiate from sibling tools like 'cloudtrail_lookup_events' or 'athena_create_cloudtrail_table', which would be needed for a score of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines in the <IMPORTANT> section, stating prerequisites (must call 'athena_create_cloudtrail_table' first, and again for different regions). This clearly defines when to use this tool versus alternatives, including specific conditions and dependencies.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
