
Berlin Open Data MCP Server

by tifa365

berlin_analyze_datasets

Read-only · Idempotent

Analyzes Berlin open datasets for relevance, freshness, and available formats to identify suitable data for your projects.

Instructions

Analyzes datasets comprehensively: relevance, freshness, and available formats.

Combines search with analysis of metadata and resource formats.
Particularly useful for finding out which data are available
and how current they are.

Note: Berlin's CKAN has no DataStore, so data must be
downloaded via the resource URLs.

Returns:
    A comprehensive analysis report covering relevance, freshness, and formats

Input Schema

Name      Required   Description   Default
params    Yes
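A hypothetical example of the single `params` argument this tool accepts, matching the `AnalyzeDatasetInput` model shown in the implementation reference below:

```python
# Example input for berlin_analyze_datasets (values are illustrative).
params = {
    "query": "Verkehr",         # required, min_length=1
    "max_datasets": 5,          # optional, 1..20, default 5
    "include_structure": True,  # optional, default True
    "include_freshness": True,  # optional, default True
}

# The same constraints the Pydantic model enforces, checked by hand:
assert len(params["query"]) >= 1
assert 1 <= params["max_datasets"] <= 20
```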

Output Schema

Name      Required   Description   Default
result    Yes

Implementation Reference

  • Handler function for the 'berlin_analyze_datasets' tool.
    @mcp.tool(
        name="berlin_analyze_datasets",
        annotations={
            "title": "Datensaetze analysieren",
            "readOnlyHint": True,
            "destructiveHint": False,
            "idempotentHint": True,
            "openWorldHint": True,
        },
    )
    async def berlin_analyze_datasets(params: AnalyzeDatasetInput) -> str:
        """Analysiert Datensaetze umfassend: Relevanz, Aktualitaet und verfuegbare Formate.
    
        Kombiniert Suche mit Analyse der Metadaten und Ressourcen-Formate.
        Besonders nuetzlich um herauszufinden, welche Daten verfuegbar sind
        und wie aktuell sie sind.
    
        Hinweis: Berlins CKAN hat keinen DataStore – Daten muessen ueber
        die Ressourcen-URLs heruntergeladen werden.
    
        Returns:
            Umfassender Analyse-Report mit Relevanz, Aktualitaet und Formaten
        """
        try:
            result = await ckan_request(
                "package_search",
                {
                    "q": params.query,
                    "rows": params.max_datasets,
                    "sort": "score desc",
                },
            )
            datasets = result["results"]
            total = result["count"]
    
            if not datasets:
                return f"Keine Datensaetze gefunden fuer '{params.query}'."
    
            lines = [
                f"## Analyse: '{params.query}'",
                f"**{total} Datensaetze gefunden**, Top {len(datasets)} analysiert:\n",
            ]
    
            for i, ds in enumerate(datasets, 1):
                name = ds.get("name", "")
                title = ds.get("title", "?")
                modified = ds.get("metadata_modified", "?")[:10]
                resources = ds.get("resources", [])
                formats = sorted(set(r.get("format", "?") for r in resources))
    
                # Berlin-specific extras
                extras = {e["key"]: e["value"] for e in ds.get("extras", [])}
                date_updated = extras.get("date_updated", "")
                geo_coverage = extras.get("geographical_coverage", "")
    
                lines.append(f"### {i}. {title}")
                lines.append(f"- **ID**: `{name}`")
                lines.append(f"- **Formate**: {', '.join(formats)}")
                lines.append(f"- **Ressourcen**: {len(resources)}")
    
                if params.include_freshness:
                    lines.append(f"- **Letzte Aenderung**: {modified}")
                    if date_updated:
                        lines.append(f"- **Daten aktualisiert**: {date_updated}")
    
                if params.include_structure:
                    for res in resources:
                        res_format = res.get("format", "?")
                        res_name = res.get("name", "Unbenannt")
                        res_url = res.get("url", "")
                        lines.append(f"  - {res_name} ({res_format}): {res_url}")
    
                if geo_coverage:
                    lines.append(f"- **Raeumliche Abdeckung**: {geo_coverage}")
    
                lines.append(f"- **URL**: {PORTAL_URL}/datensaetze/{name}\n")
    
            return "\n".join(lines)
    
        except Exception as e:
            return handle_api_error(e, "Datensatz-Analyse")
  • Input validation schema for the 'berlin_analyze_datasets' tool.
    class AnalyzeDatasetInput(BaseModel):
        """Input fuer Datensatz-Analyse."""
    
        model_config = ConfigDict(str_strip_whitespace=True, extra="forbid")
    
        query: str = Field(
            ...,
            description="Suchbegriff fuer die Analyse, z.B. 'Einwohner', 'Verkehr', 'Wohnen'",
            min_length=1,
        )
        max_datasets: int = Field(default=5, description="Maximum number of datasets to analyze", ge=1, le=20)
        include_structure: bool = Field(default=True, description="Include resource formats")
        include_freshness: bool = Field(default=True, description="Include freshness analysis")
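The handler above calls a `ckan_request` helper that is not shown on this page. A minimal sketch of what such a helper might look like, assuming the CKAN Action API over plain GET requests (the endpoint URL and error handling are assumptions, not taken from the server's source):

```python
import asyncio
import json
import urllib.parse
import urllib.request

# Assumed base URL of Berlin's CKAN Action API.
CKAN_API = "https://datenregister.berlin.de/api/3/action"

def build_action_url(action: str, params: dict) -> str:
    """Build a CKAN Action API GET URL, e.g. for package_search."""
    return f"{CKAN_API}/{action}?{urllib.parse.urlencode(params)}"

async def ckan_request(action: str, params: dict) -> dict:
    """Hypothetical sketch of the ckan_request helper used by the
    handler: call the CKAN Action API and return the 'result' payload."""
    url = build_action_url(action, params)

    def fetch() -> dict:
        # CKAN wraps every response in {"success": bool, "result": ...}.
        with urllib.request.urlopen(url, timeout=30) as resp:
            body = json.load(resp)
        if not body.get("success"):
            raise RuntimeError(f"CKAN error for {action}")
        return body["result"]

    # Run the blocking urllib call off the event loop.
    return await asyncio.to_thread(fetch)
```

For `package_search` with `{"q": "Verkehr", "rows": 5}`, the `result` dict would carry the `count` and `results` keys the handler reads.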
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide key behavioral hints: readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true. The description adds useful context beyond this: it notes that Berlin's CKAN has no DataStore, so data must be downloaded via resource URLs, and it specifies the tool returns a comprehensive analysis report. This adds value but doesn't detail aspects like rate limits or authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose and key features. It uses three concise paragraphs with no wasted sentences, each adding value: analysis scope, utility, and a technical note. However, the mix of German and English might slightly hinder clarity for non-German speakers.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (analysis with multiple parameters), rich annotations (covering safety and behavior), and the presence of an output schema (implied by 'Returns' in description), the description is complete enough. It covers purpose, usage context, key behavioral notes, and output format, without needing to repeat structured data from annotations or schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage (all parameters are documented with titles and descriptions), so the baseline score is 3. The description doesn't add parameter semantics beyond what the schema provides: it mentions analyzing relevance, freshness, and formats, which aligns with parameters like 'include_freshness' and 'include_structure', but it doesn't explain syntax or usage details for any parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyzing datasets for relevance, freshness, and available formats, combining search with metadata and resource format analysis. It distinguishes itself from siblings like 'berlin_search_datasets' by emphasizing comprehensive analysis rather than just searching. However, it doesn't explicitly contrast with all siblings (e.g., 'berlin_get_dataset' might also retrieve metadata).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: to find out what data is available and how current it is, particularly useful for understanding dataset availability and freshness. It mentions Berlin's CKAN lacks a DataStore, implying data must be downloaded via resource URLs, which guides usage. However, it doesn't explicitly state when not to use it or name alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
