EPA Envirofacts MCP Server

by zachegner

environmental_summary_by_location

Retrieve environmental data for any U.S. location, including nearby regulated facilities, chemical releases, water quality violations, and hazardous waste sites within a specified radius.

Instructions

Get comprehensive environmental data for a location.

Provides environmental summary including nearby regulated facilities, chemical releases, water quality violations, and hazardous waste sites within a specified radius.

Args:
    location: Address, city, or ZIP code
    radius_miles: Search radius in miles (default: 5.0)

Returns:
    Comprehensive environmental summary

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| location | Yes | | |
| radius_miles | No | | |
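
As a rough illustration of how these two inputs might be supplied, the sketch below builds an MCP `tools/call` JSON-RPC request in Python. The `build_call` helper and the request `id` are invented for illustration; only the tool name and the argument names come from this page.

```python
import json


def build_call(location: str, radius_miles: float = 5.0, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for this tool (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "environmental_summary_by_location",
            # `location` is required; `radius_miles` falls back to 5.0 when omitted
            "arguments": {"location": location, "radius_miles": radius_miles},
        },
    }


print(json.dumps(build_call("10001", 3.0), indent=2))
```

How the request is transported (stdio, HTTP) depends on the MCP client in use and is not covered here.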

Output Schema

| Name | Required | Description | Default |
|---|---|---|---|
| location | Yes | Original location query | |
| coordinates | No | Resolved coordinates | |
| data_sources | No | EPA data sources queried | |
| radius_miles | Yes | Search radius used | |
| summary_stats | No | Additional summary statistics | |
| water_systems | No | Water systems in area | |
| top_facilities | No | Top facilities within radius | |
| facility_counts | No | Count of facilities by type | |
| hazardous_sites | No | RCRA hazardous waste sites | |
| query_timestamp | No | When the query was executed | |
| total_facilities | No | Total facilities found | |
| total_violations | No | Total active violations | |
| water_violations | No | Active water violations | |
| chemical_releases | No | Chemical release summary | |
| total_hazardous_sites | No | Total hazardous waste sites | |
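
Because only `location` and `radius_miles` are required in the output, a consumer probably wants to read the remaining fields defensively. The following is a minimal sketch, assuming the result has already been decoded into a plain dict; `describe_summary` is a hypothetical helper and the sample values are placeholders, not real EPA data.

```python
from typing import Any


def describe_summary(summary: dict[str, Any]) -> str:
    """Render a one-line overview from a dict shaped like the output schema."""
    # Required fields: index directly so a malformed result fails loudly.
    location = summary["location"]
    radius = summary["radius_miles"]
    # Optional counters: treat a missing field as zero.
    facilities = summary.get("total_facilities", 0)
    violations = summary.get("total_violations", 0)
    sites = summary.get("total_hazardous_sites", 0)
    return (
        f"{location} (within {radius} mi): {facilities} facilities, "
        f"{violations} active violations, {sites} hazardous waste sites"
    )


# Placeholder values for illustration only.
example = {"location": "10001", "radius_miles": 3.0, "total_facilities": 12}
print(describe_summary(example))
```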

Implementation Reference

  • The FastMCP @mcp.tool() handler function named 'environmental_summary_by_location' that defines the tool interface and delegates to the core implementation.
    async def environmental_summary_by_location(
        location: str,
        radius_miles: float = 5.0
    ) -> EnvironmentalSummary:
        """Get comprehensive environmental data for a location.
        
        Provides environmental summary including nearby regulated facilities, chemical releases,
        water quality violations, and hazardous waste sites within a specified radius.
        
        Args:
            location: Address, city, or ZIP code
            radius_miles: Search radius in miles (default: 5.0)
            
        Returns:
            Comprehensive environmental summary
        """
        return await get_environmental_summary_by_location(location, radius_miles)
  • Core helper function implementing the full tool logic: geocoding, state-based EPA API queries, client-side distance filtering, facility aggregation, ranking, and EnvironmentalSummary formatting.
    async def get_environmental_summary_by_location(
        location: str,
        radius_miles: float = 5.0
    ) -> EnvironmentalSummary:
        """Get comprehensive environmental data for a location.
        
        This tool provides a comprehensive environmental summary for any location in the United States,
        including nearby regulated facilities, chemical releases, water quality violations, and hazardous
        waste sites within a specified radius.
        
        Args:
            location: Address, city, or ZIP code (e.g., "New York, NY", "10001", "Los Angeles, CA")
            radius_miles: Search radius in miles (default: 5.0, max: 100.0)
        
        Returns:
            EnvironmentalSummary containing:
            - Location coordinates and search parameters
            - Count of facilities by type (TRI, RCRA, SDWIS, FRS)
            - Top facilities ranked by distance
            - Water systems and active violations
            - Chemical release summary with top chemicals
            - Hazardous waste sites
            - Summary statistics
        
        Raises:
            ValueError: If location cannot be geocoded or parameters are invalid
            Exception: If EPA API queries fail
        
        Example:
            >>> summary = await get_environmental_summary_by_location("10001", 3.0)
            >>> print(f"Found {summary.total_facilities} facilities")
            >>> print(f"Active violations: {summary.total_violations}")
        """
        # Validate input parameters
        if not location or not location.strip():
            raise ValueError("Location cannot be empty")
        
        if not (0.1 <= radius_miles <= 100.0):
            raise ValueError("Radius must be between 0.1 and 100.0 miles")
        
        location = location.strip()
        
        try:
            logger.info(f"Getting environmental summary for {location} (radius: {radius_miles} miles)")
            
            # Step 1: Enhanced geocoding to get coordinates and state
            try:
                location_info = await geocode_location(location)
            except ValueError as e:
                logger.error(f"Geocoding failed for {location}: {e}")
                raise ValueError(f"Could not find location '{location}'. Please try a different address, city, or ZIP code.") from e
            
            coordinates = location_info.coordinates
            state_code = location_info.state_code
            
            logger.info(f"Geocoded {location} to {location_info}")
            
            # Checked outside the try block so this more specific message is not
            # caught and replaced by the generic "Could not find location" error
            if not state_code:
                raise ValueError(f"Could not determine state from location '{location}'. Please try a different address, city, or ZIP code.")
            
            # Step 2: State-based queries (no bounding box needed)
            logger.info(f"Querying EPA data sources for state {state_code}...")
            
            # Initialize EPA API clients
            async with FRSClient() as frs_client, \
                       TRIClient() as tri_client, \
                       SDWISClient() as sdwis_client, \
                       RCRAClient() as rcra_client:
                
                # Execute all state-based queries in parallel
                results = await asyncio.gather(
                    # FRS facilities
                    frs_client.get_facilities_by_state(state_code, limit=1000),
                    
                    # TRI facilities and releases (use 2022 - latest available)
                    tri_client.get_tri_facilities_by_state(state_code, year=2022, limit=1000),
                    tri_client.get_tri_releases_by_state(state_code, year=2022, limit=1000),
                    
                    # SDWIS water systems (violations disabled due to API issues)
                    sdwis_client.get_water_systems_by_state(state_code, limit=1000),
                    # sdwis_client.get_violations_by_state(state_code, active_only=True, limit=1000),
                    
                    # RCRA hazardous waste sites
                    rcra_client.get_rcra_sites_by_state(state_code, limit=1000),
                    
                    return_exceptions=True
                )
                
                # Unpack results
                frs_facilities = results[0] if not isinstance(results[0], Exception) else []
                tri_facilities = results[1] if not isinstance(results[1], Exception) else []
                tri_releases = results[2] if not isinstance(results[2], Exception) else []
                water_systems = results[3] if not isinstance(results[3], Exception) else []
                water_violations = []  # Disabled due to API issues
                rcra_sites = results[4] if not isinstance(results[4], Exception) else []
                
                # Log any errors
                for i, result in enumerate(results):
                    if isinstance(result, Exception):
                        logger.warning(f"EPA API query {i} failed: {result}")
            
            # Step 3: Filter facilities by distance (client-side filtering)
            logger.info("Filtering facilities by distance...")
            filtered_frs = filter_by_distance(frs_facilities, coordinates, radius_miles)
            filtered_tri = filter_by_distance(tri_facilities, coordinates, radius_miles)
            filtered_rcra = filter_by_distance(rcra_sites, coordinates, radius_miles)
            
            # For facilities without coordinates, include them if they're in the same state
            # (since we queried by state, they're already geographically relevant)
            for facility in frs_facilities:
                if facility.coordinates is None and facility.state == state_code:
                    facility.distance_miles = None  # Unknown distance
                    if facility not in filtered_frs:
                        filtered_frs.append(facility)
            
            for facility in tri_facilities:
                if facility.coordinates is None and facility.state == state_code:
                    facility.distance_miles = None  # Unknown distance
                    if facility not in filtered_tri:
                        filtered_tri.append(facility)
            
            for facility in rcra_sites:
                if facility.coordinates is None and facility.state == state_code:
                    facility.distance_miles = None  # Unknown distance
                    if facility not in filtered_rcra:
                        filtered_rcra.append(facility)
            
            # Filter water systems by distance
            from ..utils.distance import haversine_distance  # imported once, not per iteration
            
            filtered_water_systems = []
            for system in water_systems:
                if system.coordinates:
                    distance = haversine_distance(coordinates, system.coordinates)
                    if distance <= radius_miles:
                        system.distance_miles = distance
                        filtered_water_systems.append(system)
                elif system.state == state_code:
                    # Include water systems without coordinates if in same state
                    system.distance_miles = None
                    filtered_water_systems.append(system)
            
            # Step 4: Aggregate facilities
            logger.info("Aggregating facility data...")
            all_facilities = aggregate_facilities(
                filtered_frs, filtered_tri, filtered_rcra, filtered_water_systems
            )
            
            # Step 5: Rank facilities by distance (top 50)
            top_facilities = rank_facilities(all_facilities, limit=50)
            
            # Step 6: Build environmental summary
            logger.info("Building environmental summary...")
            summary = format_environmental_summary(
                location=location,
                coordinates=coordinates,
                radius_miles=radius_miles,
                facilities=top_facilities,
                water_systems=filtered_water_systems,
                water_violations=water_violations,
                chemical_releases=tri_releases,
                hazardous_sites=filtered_rcra
            )
            
            logger.info(f"Environmental summary complete: {summary.total_facilities} facilities, "
                       f"{summary.total_violations} violations, {summary.total_hazardous_sites} hazardous sites")
            
            return summary
            
        except ValueError:
            # Re-raise validation errors
            raise
        except Exception as e:
            logger.error(f"Failed to get environmental summary for {location}: {e}")
            # Chain the original exception so its traceback is preserved
            raise Exception(f"Failed to retrieve environmental data for {location}: {e}") from e
  • Pydantic BaseModel defining the output schema returned by the tool, including location info, facility counts, top facilities, water violations, chemical releases, hazardous sites, and summary stats.
    class EnvironmentalSummary(BaseModel):
        """Comprehensive environmental data summary for a location."""
        
        # Location information
        location: str = Field(..., description="Original location query")
        coordinates: Optional[Coordinates] = Field(None, description="Resolved coordinates")
        radius_miles: float = Field(..., description="Search radius used")
        
        # Facility counts by type
        facility_counts: Dict[str, int] = Field(default_factory=dict, description="Count of facilities by type")
        total_facilities: int = Field(default=0, ge=0, description="Total facilities found")
        
        # Top facilities (ranked by distance)
        top_facilities: List[FacilityInfo] = Field(default_factory=list, description="Top facilities within radius")
        
        # Water system information
        water_systems: List[WaterSystem] = Field(default_factory=list, description="Water systems in area")
        water_violations: List[WaterViolation] = Field(default_factory=list, description="Active water violations")
        total_violations: int = Field(default=0, ge=0, description="Total active violations")
        
        # Chemical release information
        chemical_releases: ReleaseSummary = Field(default_factory=lambda: ReleaseSummary(
            total_facilities=0,
            total_chemicals=0,
            total_releases=0.0,
            reporting_year=2023
        ), description="Chemical release summary")
        
        # Hazardous waste sites
        hazardous_sites: List[FacilityInfo] = Field(default_factory=list, description="RCRA hazardous waste sites")
        total_hazardous_sites: int = Field(default=0, ge=0, description="Total hazardous waste sites")
        
        # Summary statistics
        summary_stats: Dict[str, str | int | float | bool] = Field(default_factory=dict, description="Additional summary statistics")
        
        # Metadata
        query_timestamp: Optional[str] = Field(None, description="When the query was executed")
        data_sources: List[str] = Field(default_factory=list, description="EPA data sources queried")
        
        def __str__(self) -> str:
            return f"Environmental Summary for {self.location} ({self.total_facilities} facilities)"
  • Pydantic BaseModel defining input parameters matching the tool signature (location: str, radius_miles: float), with validation.
    class LocationParams(BaseModel):
        """Parameters for location-based queries."""
        
        location: str = Field(..., min_length=1, max_length=200, description="Address, city, or ZIP code")
        radius_miles: float = Field(default=5.0, ge=0.1, le=100.0, description="Search radius in miles")
        
        @field_validator('location')
        @classmethod
        def validate_location(cls, v):
            """Basic validation for location string."""
            if not v.strip():
                raise ValueError('Location cannot be empty')
            return v.strip()
  • Registration function that defines and registers the MCP tool using @mcp.tool() decorator, called from server.py.
    def register_tool(mcp: FastMCP):
        """Register the environmental summary tool with FastMCP.
        
        Args:
            mcp: FastMCP instance
        """
        @mcp.tool()
        async def environmental_summary_by_location(
            location: str,
            radius_miles: float = 5.0
        ) -> EnvironmentalSummary:
            """Get comprehensive environmental data for a location.
            
            Provides environmental summary including nearby regulated facilities, chemical releases,
            water quality violations, and hazardous waste sites within a specified radius.
            
            Args:
                location: Address, city, or ZIP code
                radius_miles: Search radius in miles (default: 5.0)
                
            Returns:
                Comprehensive environmental summary
            """
            return await get_environmental_summary_by_location(location, radius_miles)
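
The client-side distance filtering and distance ranking steps in the reference above rely on helpers (`haversine_distance`, `filter_by_distance`, `rank_facilities`) whose bodies are not shown on this page. The sketch below is one way such a step could look, not the server's actual code: `Point`, `Site`, `haversine_miles`, `filter_and_rank`, and the Earth-radius constant are all assumptions. Unlike the reference, which keeps same-state facilities that lack coordinates, this sketch simply skips them.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


@dataclass
class Point:
    latitude: float
    longitude: float


@dataclass
class Site:
    name: str
    coordinates: Optional[Point]
    distance_miles: Optional[float] = None


def haversine_miles(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lat2 = math.radians(a.latitude), math.radians(b.latitude)
    dlat = lat2 - lat1
    dlon = math.radians(b.longitude - a.longitude)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def filter_and_rank(sites: list[Site], center: Point, radius_miles: float, limit: int = 50) -> list[Site]:
    """Keep sites with known coordinates inside the radius, nearest first."""
    kept = []
    for site in sites:
        if site.coordinates is None:
            continue  # this sketch skips sites it cannot measure
        site.distance_miles = haversine_miles(center, site.coordinates)
        if site.distance_miles <= radius_miles:
            kept.append(site)
    kept.sort(key=lambda s: s.distance_miles)
    return kept[:limit]
```

Since the reference queries each EPA source by state with `limit=1000` and only then filters by distance, any facility beyond the first 1000 rows returned for a state would never reach a filter like this one.
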
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool 'Provides environmental summary' but doesn't specify whether this is a read-only operation, potential rate limits, data freshness, or error conditions. The description is functional but lacks critical behavioral context for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
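
One behavior the implementation reference reveals but the description omits: the EPA queries run through `asyncio.gather(..., return_exceptions=True)`, and any failed query is logged and replaced with an empty list, so a caller can receive a partial summary with no error. The sketch below reproduces that pattern with stand-in coroutines; `fetch_ok`, `fetch_broken`, and `query_all` are hypothetical names, not part of the server.

```python
import asyncio


async def fetch_ok(name: str) -> list[str]:
    """Stand-in for a data-source query that succeeds."""
    return [f"{name}-record"]


async def fetch_broken(name: str) -> list[str]:
    """Stand-in for a data-source query that fails."""
    raise RuntimeError(f"{name} unavailable")


async def query_all() -> dict[str, list[str]]:
    names = ["frs", "tri", "rcra"]
    results = await asyncio.gather(
        fetch_ok("frs"),
        fetch_broken("tri"),
        fetch_ok("rcra"),
        return_exceptions=True,  # failures come back as values instead of raising
    )
    # Replace each failed result with an empty list, mirroring the reference code.
    return {
        name: [] if isinstance(result, Exception) else result
        for name, result in zip(names, results)
    }


print(asyncio.run(query_all()))
```

A zero count in the returned summary may therefore mean "query failed" rather than "nothing found", which is the kind of consequence a behavioral description would ideally spell out.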

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded with the core purpose first, followed by details. The Args and Returns sections are well-structured. Minor improvement could be made by integrating the parameter explanations more seamlessly rather than as separate sections.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description doesn't need to detail return values. However, for a tool with no annotations and 2 parameters, the description provides adequate purpose and parameter context but lacks behavioral transparency about how the tool operates, which is a significant gap for a data retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates well by explaining both parameters in the Args section: 'location: Address, city, or ZIP code' clarifies the format beyond just 'string', and 'radius_miles: Search radius in miles (default: 5.0)' provides units and default value. This adds meaningful context that the bare schema lacks.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
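
The valid range for `radius_miles` is one such constraint: the implementation reference rejects values outside 0.1 to 100.0 miles and blank locations, although the tool description mentions neither. A minimal stdlib sketch of the same checks follows; `validate_params` is a hypothetical helper that mirrors the checks in the reference rather than the server's own function.

```python
def validate_params(location: str, radius_miles: float = 5.0) -> tuple[str, float]:
    """Apply the checks shown in the implementation reference and return cleaned values."""
    if not location or not location.strip():
        raise ValueError("Location cannot be empty")
    if not (0.1 <= radius_miles <= 100.0):
        raise ValueError("Radius must be between 0.1 and 100.0 miles")
    # Surrounding whitespace is stripped before the location is geocoded
    return location.strip(), radius_miles
```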

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Get comprehensive environmental data') and resource ('for a location'), distinguishing it from siblings by specifying it provides a multi-faceted environmental summary rather than focused data on chemical releases, compliance history, or facility searches.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by listing the types of environmental data included, but doesn't explicitly state when to use this tool versus alternatives like get_chemical_release_data or get_facility_compliance_history_tool. It provides some guidance through the data scope but lacks explicit comparisons or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zachegner/envirofacts-mcp'
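
For callers working in Python rather than the shell, the same request could be made with the standard library. This is a sketch under one assumption not stated on this page: that the endpoint returns a JSON body. `build_request` and `fetch_server_info` are illustrative names.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/zachegner/envirofacts-mcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Prepare the same GET request as the curl command."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = API_URL) -> dict:
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return json.load(response)
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone can be used to inspect the request without sending it.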
