Linear Regression MCP

by HeetVekariya

train_linear_regression_model

Train a linear regression model on uploaded CSV data to predict values and evaluate performance using RMSE metrics.

Instructions

This function trains a linear regression model.

Args: Takes input for the output column name.

Returns: String which contains the RMSE value.

Input Schema

Name           Required  Description  Default
output_column  Yes       —            —
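
For orientation, here is a hedged example of the arguments an MCP client would send when calling this tool; the column name "price" is a placeholder for whatever target column exists in the uploaded CSV.

    # Hypothetical tools/call payload for this tool; "price" is a placeholder.
    call = {
        "name": "train_linear_regression_model",
        "arguments": {"output_column": "price"},
    }
    # On success the tool returns a plain string such as:
    # "Model trained successfully. RMSE: 0.1234"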

Implementation Reference

  • Implements the training of a linear regression model using scikit-learn. It retrieves data from the shared context, prepares features and target, splits them into train/test sets (90/10), fits the model, predicts on the test set, and computes and returns the RMSE.
    # Imports the handler relies on (shown here for completeness):
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    def train_linear_regression_model(output_column: str) -> str:
        """
        This function trains a linear regression model.

        Args:
            Takes input for the output column name.

        Returns:
            String which contains the RMSE value.
        """
    
        try:
            data = context.get_data()
    
            # Check if the output column exists in the dataset
            if output_column not in data.columns:
                return f"Error: '{output_column}' column not found in the dataset."
    
            # Prepare the features (X) and target variable (y)
            X = data.drop(columns=[output_column])  # Drop the target column for features
            y = data[output_column]  # The target variable is the output column
    
            # Split the data into training and test sets (90% train, 10% test)
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
    
            # Initialize the Linear Regression model
            model = LinearRegression()
    
            # Train the model
            model.fit(X_train, y_train)
    
            # Predict on the test set
            y_pred = model.predict(X_test)
    
            # Calculate RMSE (Root Mean Squared Error)
            rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    
            # Return the RMSE value
            return f"Model trained successfully. RMSE: {rmse:.4f}"
    
        except Exception as e:
            return f"An error occurred while training the model: {str(e)}"
  • server.py:122 (registration)
    Registers the train_linear_regression_model function as an MCP tool using the FastMCP decorator.
    @mcp.tool()
  • The DataContext dataclass provides shared storage for the pandas DataFrame used by the tool, via a global context instance; the handler retrieves it with context.get_data(). A short usage sketch follows this list.
    # Imports for the context class (shown here for completeness):
    from dataclasses import dataclass

    import pandas as pd

    @dataclass
    class DataContext():
        """
        A class that stores the DataFrame in the context.
        """
        _data: pd.DataFrame = None
    
        def set_data(self, new_data: pd.DataFrame):
            """
            Method to set or update the data.
            """
            self._data = new_data
    
        def get_data(self) -> pd.DataFrame:
            """
            Method to get the data from the context.
            """
            return self._data
    
    # Initialize the DataContext instance globally
    context = DataContext()
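
For a concrete picture of how the shared context and the handler fit together, here is a minimal standalone sketch. It is not part of the server: it assumes the handler, its imports, and the global context shown above are in scope, uses hypothetical column names ("feature", "target"), builds a small synthetic DataFrame, stores it the way the CSV-upload tool would, and calls the handler directly rather than through MCP.

    import numpy as np
    import pandas as pd

    # Synthetic data with a known linear relationship: target = 3 * feature + noise.
    rng = np.random.default_rng(42)
    feature = rng.uniform(0, 10, size=200)
    df = pd.DataFrame({
        "feature": feature,
        "target": 3 * feature + rng.normal(scale=0.5, size=200),
    })

    # Store the DataFrame in the shared context, as the CSV-upload tool would.
    context.set_data(df)

    # Call the handler directly; it reads the data back via context.get_data().
    print(train_linear_regression_model("target"))
    # -> "Model trained successfully. RMSE: ..." (a value close to the 0.5 noise scale)

    # A column that does not exist returns the error string instead of raising:
    print(train_linear_regression_model("missing"))
    # -> "Error: 'missing' column not found in the dataset."
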
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but offers minimal behavioral context. It mentions training and returning RMSE, but doesn't disclose important traits like whether this is a destructive operation (overwrites existing models), computational requirements, convergence criteria, or what happens to the trained model after execution.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (4 sentences) but not optimally structured. The 'Args' and 'Returns' sections are helpful, but the opening sentence is redundant with the tool name. The information is somewhat front-loaded but could be more efficiently organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a model training tool with no annotations, no output schema, and 0% parameter documentation in the schema, the description is inadequate. It doesn't explain what data is used for training, how features are selected, model storage/retrieval, or error conditions. The return value description ('String which contains the RMSE value') is helpful but insufficient for proper tool understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for the single parameter, the description adds minimal value. It mentions 'output column name' but doesn't explain what this column represents (predicted values? model name?), what format it should be in, or how it relates to the training process. The schema only shows it's a required string titled 'Output Column'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
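
As an illustration of the kind of parameter documentation this dimension is asking for, here is one hedged possibility: attaching a pydantic Field description to the parameter so that a FastMCP-generated schema can carry it. This is a sketch, not the server's actual code, and whether a given FastMCP version copies the Field description into the published schema is an assumption worth verifying.

    from typing import Annotated

    from mcp.server.fastmcp import FastMCP
    from pydantic import Field

    mcp = FastMCP("Linear-Regression")  # server name here is a placeholder

    @mcp.tool()
    def train_linear_regression_model(
        output_column: Annotated[
            str,
            Field(description="Name of the column in the uploaded CSV to use as the "
                              "prediction target; all remaining columns become features."),
        ],
    ) -> str:
        """Train a linear regression model on the uploaded CSV and report RMSE."""
        ...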

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool 'trains a linear regression model', which is a clear verb+resource combination. However, it doesn't distinguish this from potential sibling tools (like other model training functions) and is somewhat vague about what exactly gets trained (on what data, with what features).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. There's no mention of prerequisites (like data preparation), when this model type is appropriate, or how it differs from other modeling approaches available in the sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
