ols_with_spatial_diagnostics_safe

Perform Ordinary Least Squares regression with spatial diagnostics to analyze geographic data relationships while detecting spatial autocorrelation patterns.

Instructions

Safe MCP pipeline: read the input file, build or load the spatial weights matrix (W), convert fields to numeric, check for NaN/infinite values, and run OLS with spatial diagnostics.

Parameters:

  • data_path: path to shapefile or GeoPackage

  • y_field: dependent variable column name

  • x_fields: list of independent variable column names

  • weights_path: optional path to existing weights file (.gal or .gwt)

  • weights_method: 'queen', 'rook', 'distance_band', or 'knn' (used if weights_path not provided)

  • id_field: optional attribute name to use as observation IDs

  • threshold: neighbor distance cutoff; required if weights_method='distance_band'

  • k: number of nearest neighbors; required if weights_method='knn'

  • binary: if True, use binary (0/1) weights rather than distance-decay weights (applies to DistanceBand only)
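As a sketch, a k-nearest-neighbor run might pass arguments like the following (the file path and field names are hypothetical, not from a real dataset):

```python
# Hypothetical argument payload for ols_with_spatial_diagnostics_safe.
# The path and field names below are illustrative only.
args = {
    "data_path": "counties.gpkg",                   # shapefile or GeoPackage
    "y_field": "median_income",                     # dependent variable
    "x_fields": ["pct_college", "pct_unemployed"],  # independent variables
    "weights_method": "knn",                        # W is built, since no weights_path
    "k": 6,                                         # required because weights_method='knn'
}
```

Had weights_method been 'distance_band', a threshold would be required instead of k.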

Input Schema

Name            Required   Description   Default
data_path       Yes
y_field         Yes
x_fields        Yes
weights_path    No
weights_method  No                       queen
id_field        No
threshold       No
k               No
binary          No

Output Schema

No output fields documented.
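Although the output schema is empty, the handler below constructs a response of the following shape (the field names come from the implementation; the values and variable names here are invented for illustration):

```python
# Illustrative success payload mirroring the handler's return value.
# All numeric values and regressor names are made up for the example.
example_response = {
    "status": "success",
    "message": "OLS regression with spatial diagnostics completed successfully",
    "result": {
        "n_obs": 120,
        "r2": 0.64,
        "std_error": [0.21, 0.05, 0.08],
        "betas": {"constant": 1.9, "pct_college": 0.42, "pct_unemployed": -0.31},
        "moran_residual": 0.12,
        "moran_pvalue": 0.08,
    },
}
# The handler duplicates the result under a second key for test compatibility.
example_response["regression_results"] = example_response["result"]
```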

Implementation Reference

  • Handler function implementing the OLS regression tool with spatial diagnostics using libpysal and spreg libraries. Includes data loading, weight construction, model fitting, and spatial diagnostics (Moran's I on residuals). Registered via @gis_mcp.tool() decorator.
    @gis_mcp.tool()
    def ols_with_spatial_diagnostics_safe(
        data_path: str,
        y_field: str,
        x_fields: List[str],
        weights_path: Optional[str] = None,
        weights_method: str = "queen",
        id_field: Optional[str] = None,
        threshold: Optional[float] = None,
        k: Optional[int] = None,
        binary: bool = True
    ) -> Dict[str, Any]:
        """
        Safe MCP pipeline: Read shapefile, build/load W, convert numeric, check NaNs, run OLS.
    
        Parameters:
        - data_path: path to shapefile or GeoPackage
        - y_field: dependent variable column name
        - x_fields: list of independent variable column names
        - weights_path: optional path to existing weights file (.gal or .gwt)
        - weights_method: 'queen', 'rook', 'distance_band', or 'knn' (used if weights_path not provided)
        - id_field: optional attribute name to use as observation IDs
        - threshold: required if method='distance_band'
        - k: required if method='knn'
        - binary: True for binary weights (DistanceBand only)
        """
        try:
            # --- Step 1: Read data ---
            if not os.path.exists(data_path):
                return {"status": "error", "message": f"Data file not found: {data_path}"}
    
            gdf = gpd.read_file(data_path)
            if gdf.empty:
                return {"status": "error", "message": "Input file contains no features"}
    
            # --- Step 2: Extract and convert y and X ---
            if y_field not in gdf.columns:
                return {"status": "error", "message": f"Dependent variable '{y_field}' not found in dataset"}
            if any(xf not in gdf.columns for xf in x_fields):
                return {"status": "error", "message": f"Independent variable(s) {x_fields} not found in dataset"}
    
            y = gdf[[y_field]].astype(float).values
            X = gdf[x_fields].astype(float).values
    
            # --- Step 3: Check for NaNs or infinite values ---
            if not np.all(np.isfinite(y)):
                return {"status": "error", "message": "Dependent variable contains NaN or infinite values"}
            if not np.all(np.isfinite(X)):
                return {"status": "error", "message": "Independent variables contain NaN or infinite values"}
    
            # --- Step 4: Load or build weights ---
            import libpysal
            if weights_path:
                if not os.path.exists(weights_path):
                    return {"status": "error", "message": f"Weights file not found: {weights_path}"}
                w = libpysal.io.open(weights_path).read()
            else:
                # Use centroids so non-point geometries (e.g. polygons) also work
                coords = [(geom.centroid.x, geom.centroid.y) for geom in gdf.geometry]
                wm = weights_method.lower()
                if wm == "queen":
                    w = libpysal.weights.Queen.from_dataframe(gdf, idVariable=id_field)
                elif wm == "rook":
                    w = libpysal.weights.Rook.from_dataframe(gdf, idVariable=id_field)
                elif wm == "distance_band":
                    if threshold is None:
                        return {"status": "error", "message": "Threshold is required for distance_band"}
                    ids = gdf[id_field].tolist() if id_field and id_field in gdf.columns else None
                    w = libpysal.weights.DistanceBand(coords, threshold=threshold, binary=binary, ids=ids)
                elif wm == "knn":
                    if k is None:
                        return {"status": "error", "message": "k is required for knn"}
                    ids = gdf[id_field].tolist() if id_field and id_field in gdf.columns else None
                    w = libpysal.weights.KNN(coords, k=k, ids=ids)
                else:
                    return {"status": "error", "message": f"Unsupported weights method: {weights_method}"}
    
            w.transform = "r"  # Row-standardize for regression
    
            # --- Step 5: Fit OLS with spatial diagnostics ---
            try:
                from spreg import OLS
            except ModuleNotFoundError:
                return {"status": "error", "message": "The 'spreg' package is not installed. Install with: pip install spreg"}
            
            # Ensure y is shape (n, 1) and X is (n, k) for spreg
            y = y.reshape(-1, 1) if len(y.shape) == 1 else y
            if X.shape[1] != len(x_fields):
                X = X.T if X.shape[0] == len(x_fields) else X
            
            # Run OLS with spatial diagnostics; moran=True also computes
            # Moran's I on the residuals
            ols_model = OLS(y, X, w=w, spat_diag=True, moran=True,
                            name_y=y_field, name_x=x_fields, name_ds="dataset")
    
            # --- Step 6: Collect results ---
            # spreg exposes Moran's I on residuals as the tuple (I, z-value, p-value)
            moran_residual = None
            moran_pvalue = None
            mor_res = getattr(ols_model, "moran_res", None)
            if mor_res is not None and len(mor_res) >= 3:
                moran_residual = float(mor_res[0])
                moran_pvalue = float(mor_res[2])
            
            # Extract beta names
            beta_names = []
            if hasattr(ols_model, "name_x") and ols_model.name_x:
                beta_names = list(ols_model.name_x)
            else:
                beta_names = [f"x_{i}" for i in range(len(x_fields))]
            
            # Add constant to names if present
            has_constant = hasattr(ols_model, "constant") and ols_model.constant
            if has_constant:
                beta_names = ["constant"] + beta_names
            
            # Extract betas
            betas_dict = {}
            if hasattr(ols_model, "betas") and ols_model.betas is not None:
                betas_flat = ols_model.betas.flatten()
                # Match betas to names
                for i, beta_val in enumerate(betas_flat):
                    if i < len(beta_names):
                        betas_dict[beta_names[i]] = float(beta_val)
                    else:
                        betas_dict[f"beta_{i}"] = float(beta_val)
            
            # Extract standard errors
            std_err_list = []
            if hasattr(ols_model, "std_err") and ols_model.std_err is not None:
                std_err_list = ols_model.std_err.tolist() if hasattr(ols_model.std_err, "tolist") else list(ols_model.std_err)
            
            results = {
                "n_obs": int(ols_model.n) if hasattr(ols_model, "n") else int(len(y)),
                "r2": float(ols_model.r2) if hasattr(ols_model, "r2") else None,
                "std_error": std_err_list,
                "betas": betas_dict,
                "moran_residual": moran_residual,
                "moran_pvalue": moran_pvalue,
            }
    
            return {
                "status": "success",
                "message": "OLS regression with spatial diagnostics completed successfully",
                "result": results,
                "regression_results": results  # Also include as regression_results for test compatibility
            }
    
        except Exception as e:
            logger.error(f"Error in ols_with_spatial_diagnostics_safe: {str(e)}")
            return {"status": "error", "message": f"Failed to run OLS regression: {str(e)}"}
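The two quantities the handler reports, coefficient estimates and Moran's I on the residuals, can be illustrated without libpysal or spreg. The sketch below is a minimal NumPy reconstruction of that math on synthetic data, assuming a simple chain-adjacency weights matrix; it is not the server's code.

```python
import numpy as np

# Minimal NumPy sketch of what the tool computes: OLS coefficients plus
# Moran's I on the residuals under a row-standardized weights matrix W.
# Synthetic data and a chain-adjacency W; illustrative only.
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # constant + one regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=n)

# OLS via the normal equations: beta = (X'X)^{-1} X'y
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# Chain adjacency (each observation neighbors its predecessor/successor),
# then row-standardization, which is what w.transform = "r" does in libpysal.
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)

# Moran's I on residuals: (n / S0) * (e' W e) / (e' e), with S0 = sum of all weights
e = resid - resid.mean()
moran_I = (n / W.sum()) * (e @ W @ e) / (e @ e)
```

A Moran's I far from its expected value of -1/(n-1) indicates spatially patterned residuals, which is what the tool's moran_pvalue quantifies.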
  • The @gis_mcp.tool() decorator registers this function as an MCP tool named 'ols_with_spatial_diagnostics_safe'. The FastMCP instance 'gis_mcp' is defined in src/gis_mcp/mcp.py.
    @gis_mcp.tool()
  • Helper function for loading data and creating spatial weights, used in several esda tools but not directly in this OLS tool.
    def pysal_load_data(shapefile_path: str, dependent_var: str, target_crs: str, distance_threshold: float):
        """Common loader and weight creation for esda statistics."""
        if not os.path.exists(shapefile_path):
            return None, None, None, None, f"Shapefile not found: {shapefile_path}"
    
        gdf = gpd.read_file(shapefile_path)
        if dependent_var not in gdf.columns:
            return None, None, None, None, f"Dependent variable '{dependent_var}' not found in shapefile columns"
    
        gdf = gdf.to_crs(target_crs)
    
        effective_threshold = distance_threshold
        unit = "meters"
        if target_crs.upper() == "EPSG:4326":
            effective_threshold = distance_threshold / 111000
            unit = "degrees"
    
        y = gdf[dependent_var].values.astype(np.float64)
        import libpysal
        w = libpysal.weights.DistanceBand.from_dataframe(gdf, threshold=effective_threshold, binary=False)
        w.transform = 'r'
    
        for island in w.islands:
            w.weights[island] = [0] * len(w.weights[island])
            w.cardinalities[island] = 0
    
        return gdf, y, w, (effective_threshold, unit), None
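The meters-to-degrees adjustment in the loader (dividing by roughly 111,000 meters per degree of latitude) can be mirrored in isolation. The helper below is a hypothetical standalone version for illustration:

```python
# Hypothetical standalone mirror of the loader's threshold adjustment.
# One degree of latitude spans roughly 111 km, hence the 111,000 divisor.
METERS_PER_DEGREE = 111_000

def effective_threshold(distance_m: float, crs: str) -> tuple:
    """Return (threshold, unit) following pysal_load_data's conversion logic."""
    if crs.upper() == "EPSG:4326":
        return distance_m / METERS_PER_DEGREE, "degrees"
    return distance_m, "meters"
```

Note that this is a rough heuristic: a degree of longitude shrinks toward the poles, so a projected CRS is preferable for distance-band weights.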

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions the 'safe' pipeline aspect but doesn't explain what makes it safe (e.g., error handling, validation steps). It doesn't disclose performance characteristics, memory usage, output format, or what happens with invalid inputs. The description focuses on parameter requirements rather than behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized but not optimally structured. The first sentence efficiently summarizes the pipeline, but the parameter section uses inconsistent formatting (some parameters get explanations in parentheses, others don't). The 'binary' parameter explanation is tucked at the end without clear connection to its parenthetical note about DistanceBand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, spatial statistics), no annotations, but with an output schema present, the description is moderately complete. It covers the parameter semantics adequately but lacks behavioral context about the OLS implementation, spatial diagnostics included, or what 'safe' entails. The output schema will handle return values, but the description should better explain the tool's scope and limitations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates well by explaining all 9 parameters. It clarifies conditional requirements (threshold for distance_band, k for knn), enumerates weights_method options, and explains the binary parameter's scope. However, it doesn't provide format details for data_path or weights_path files, or explain what 'observation IDs' means for id_field.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action: 'run OLS' (Ordinary Least Squares regression) with spatial diagnostics. It distinguishes from siblings by mentioning the full pipeline including reading shapefiles, building/loading spatial weights, and checking data quality, which is unique among the listed tools that focus on individual operations like clustering, buffering, or spatial joins.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, when spatial diagnostics are appropriate, or compare to non-spatial regression tools. The agent must infer usage from the tool name and parameter list alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
