Treasure Data MCP Server

td_smart_search

Search across Treasure Data projects, workflows, and tables in a single call, with results ranked by a relevance score.

Instructions

Universal search across Treasure Data - best for broad queries.

One-stop search for projects, workflows, and tables with smart ranking.
Use when unsure what resource type you're looking for or need comprehensive results.

Common scenarios:
- "Find anything related to customer analytics"
- Discovering resources around a topic/keyword
- Broad exploration of available data assets
- Finding resources when type is unknown
- Cross-resource impact analysis

Search modes:
- exact: Precise string matching only
- fuzzy: Partial matches and substrings (default)
- semantic: Word-based matching for concepts

Scopes: "all", "projects", "workflows", "tables"
Returns ranked results with relevance scores (0-1).
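
To make the three modes concrete, here is a minimal sketch that re-implements the scoring rules from the handler shown further down this page as a standalone function. The function name `score` and the sample resource name are illustrative assumptions, not part of the server.

```python
def score(text: str, query: str, mode: str = "fuzzy") -> float:
    """Standalone re-implementation of the handler's scoring rules (illustrative)."""
    text_lower, query_lower = text.lower(), query.lower()

    # Every mode scores an identical string (ignoring case) as a perfect match.
    if query_lower == text_lower:
        return 1.0
    if mode == "exact":
        return 0.0

    # fuzzy and semantic: substring match, scored by position and length ratio.
    if query_lower in text_lower:
        position_score = 1.0 - text_lower.index(query_lower) / len(text_lower)
        length_score = len(query_lower) / len(text_lower)
        return (position_score + length_score) / 2

    # semantic only: share of query words found in the text, scaled by 0.8.
    if mode == "semantic":
        query_words = set(query_lower.split())
        if query_words:
            shared = query_words & set(text_lower.split())
            return 0.8 * len(shared) / len(query_words)

    return 0.0


name = "customer analytics daily"  # hypothetical resource name
exact_score = score(name, "customer analytics", mode="exact")
fuzzy_score = score(name, "customer analytics", mode="fuzzy")
reordered_fuzzy = score(name, "analytics customer", mode="fuzzy")
reordered_semantic = score(name, "analytics customer", mode="semantic")
```

Under these rules the in-order query scores 0.0 in exact mode and 0.875 in fuzzy mode, while the reordered query scores 0.0 in fuzzy mode and 0.8 in semantic mode. Words are split on whitespace, so a name such as `customer_analytics` counts as a single word in semantic mode.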

Input Schema

Name	Required	Default
`query`	Yes
`search_scope`	No	all
`search_mode`	No	fuzzy
`active_only`	No	True
`min_relevance`	No	0.7
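
As a rough illustration of how a caller might assemble arguments for this tool, the sketch below mirrors the handler's three input checks on the client side. `build_search_args` is a hypothetical helper, not part of the server; the defaults are taken from the handler signature shown in the Implementation Reference.

```python
from typing import Any

VALID_SCOPES = {"all", "projects", "workflows", "tables"}
VALID_MODES = {"exact", "fuzzy", "semantic"}


def build_search_args(
    query: str,
    search_scope: str = "all",
    search_mode: str = "fuzzy",
    active_only: bool = True,
    min_relevance: float = 0.7,
) -> dict[str, Any]:
    """Hypothetical client-side helper: validate, then return tool arguments."""
    # Same checks, in the same order, as the handler performs server-side.
    if not query or not query.strip():
        raise ValueError("Search query cannot be empty")
    if search_scope not in VALID_SCOPES:
        raise ValueError("Invalid search scope")
    if search_mode not in VALID_MODES:
        raise ValueError("Invalid search mode")
    return {
        "query": query,
        "search_scope": search_scope,
        "search_mode": search_mode,
        "active_only": active_only,
        "min_relevance": min_relevance,
    }


args = build_search_args("customer analytics", search_scope="tables")
```

Only `query` has to be supplied; every other key falls back to the default listed in the table.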

Implementation Reference

td_mcp_server/search_tools.py:308-532 (handler)

Main execution logic for the td_smart_search tool: validates the input, creates the TD client, defines the calculate_relevance helper, searches the requested scopes (projects, workflows, tables), computes relevance scores, sorts the results, truncates them to the top 50, and handles errors.

async def td_smart_search(
    query: str,
    search_scope: str = "all",
    search_mode: str = "fuzzy",
    active_only: bool = True,
    min_relevance: float = 0.7,
) -> dict[str, Any]:
    """Universal search across Treasure Data - best for broad queries.

    One-stop search for projects, workflows, and tables with smart ranking.
    Use when unsure what resource type you're looking for or need comprehensive results.

    Common scenarios:
    - "Find anything related to customer analytics"
    - Discovering resources around a topic/keyword
    - Broad exploration of available data assets
    - Finding resources when type is unknown
    - Cross-resource impact analysis

    Search modes:
    - exact: Precise string matching only
    - fuzzy: Partial matches and substrings (default)
    - semantic: Word-based matching for concepts

    Scopes: "all", "projects", "workflows", "tables"
    Returns ranked results with relevance scores (0-1).
    """
    if not query or not query.strip():
        return _format_error_response("Search query cannot be empty")

    if search_scope not in ["projects", "workflows", "tables", "all"]:
        return _format_error_response("Invalid search scope")

    if search_mode not in ["exact", "fuzzy", "semantic"]:
        return _format_error_response("Invalid search mode")

    client = _create_client(include_workflow=True)
    if isinstance(client, dict):
        return client

    results: dict[str, Any] = {
        "query": query,
        "search_scope": search_scope,
        "search_mode": search_mode,
        "results": [],
        "total_found": 0,
    }

    try:
        # Helper function to calculate relevance score
        def calculate_relevance(text: str, query: str, exact: bool = False) -> float:
            text_lower = text.lower()
            query_lower = query.lower()

            if exact:
                return 1.0 if query_lower == text_lower else 0.0

            # Exact match gets highest score
            if query_lower == text_lower:
                return 1.0

            # Substring match
            if query_lower in text_lower:
                # Score based on position and length ratio
                position_score = 1.0 - (text_lower.index(query_lower) / len(text_lower))
                length_score = len(query_lower) / len(text_lower)
                return (position_score + length_score) / 2

            # Word-overlap matching (applies in semantic mode only)
            if search_mode == "semantic":
                # Simple word-based matching
                query_words = set(query_lower.split())
                text_words = set(text_lower.split())
                if query_words:
                    overlap = len(query_words & text_words) / len(query_words)
                    return overlap * 0.8  # Slightly lower score for word matches

            return 0.0

        # Search projects
        if search_scope in ["projects", "all"]:
            try:
                projects = client.get_projects(limit=200, all_results=True)
                for project in projects:
                    relevance = calculate_relevance(
                        project.name, query, exact=(search_mode == "exact")
                    )

                    if relevance >= min_relevance:
                        results["results"].append(
                            {
                                "type": "project",
                                "relevance": round(relevance, 3),
                                "resource": {
                                    "id": project.id,
                                    "name": project.name,
                                    "created_at": project.created_at,
                                    "updated_at": project.updated_at,
                                },
                                "match_context": f"Project name: {project.name}",
                            }
                        )
            except Exception:
                # Log error but continue with other searches
                pass

        # Search workflows
        if search_scope in ["workflows", "all"]:
            try:
                workflows = client.get_workflows(count=1000, all_results=True)
                for workflow in workflows:
                    # Check workflow name
                    workflow_relevance = calculate_relevance(
                        workflow.name, query, exact=(search_mode == "exact")
                    )

                    # Also check project name for better context
                    project_relevance = calculate_relevance(
                        workflow.project.name, query, exact=(search_mode == "exact")
                    )

                    # Take the higher relevance
                    relevance = max(workflow_relevance, project_relevance * 0.7)

                    if relevance >= min_relevance:
                        # Get latest status
                        latest_status = "no_runs"
                        if workflow.latest_sessions:
                            latest_status = workflow.latest_sessions[
                                0
                            ].last_attempt.status

                        results["results"].append(
                            {
                                "type": "workflow",
                                "relevance": round(relevance, 3),
                                "resource": {
                                    "id": workflow.id,
                                    "name": workflow.name,
                                    "project": workflow.project.name,
                                    "scheduled": workflow.schedule is not None,
                                    "latest_status": latest_status,
                                },
                                "match_context": (
                                    f"Workflow: {workflow.name} "
                                    f"in project: {workflow.project.name}"
                                ),
                            }
                        )
            except Exception:
                # Log error but continue
                pass

        # Search tables
        if search_scope in ["tables", "all"]:
            try:
                # Get all databases first
                databases = client.get_databases(all_results=True)

                for database in databases[:10]:  # Limit to avoid too many API calls
                    try:
                        tables = client.get_tables(database.name, all_results=True)
                        for table in tables:
                            # Check table name
                            table_relevance = calculate_relevance(
                                table.name, query, exact=(search_mode == "exact")
                            )

                            # Also consider database name
                            db_relevance = calculate_relevance(
                                database.name, query, exact=(search_mode == "exact")
                            )

                            relevance = max(table_relevance, db_relevance * 0.5)

                            if relevance >= min_relevance:
                                results["results"].append(
                                    {
                                        "type": "table",
                                        "relevance": round(relevance, 3),
                                        "resource": {
                                            "name": table.name,
                                            "database": database.name,
                                            "full_name": (
                                                f"{database.name}.{table.name}"
                                            ),
                                            "type": table.type,
                                            "count": table.count,
                                        },
                                        "match_context": (
                                            f"Table: {table.name} "
                                            f"in database: {database.name}"
                                        ),
                                    }
                                )
                    except Exception:
                        # Skip databases with access issues
                        continue
            except Exception:
                # Log error but continue
                pass

        # Sort results by relevance
        results["results"].sort(key=lambda x: x["relevance"], reverse=True)
        results["total_found"] = len(results["results"])

        # Add search suggestions if few results
        if results["total_found"] < 5 and search_mode == "exact":
            results[
                "suggestion"
            ] = "Try using fuzzy or semantic search mode for more results"

        # Limit results to prevent token overflow
        if len(results["results"]) > 50:
            results["results"] = results["results"][:50]
            results["truncated"] = True
            results[
                "truncated_message"
            ] = f"Showing top 50 of {results['total_found']} results"

        return results

    except Exception as e:
        return _format_error_response(f"Search failed: {str(e)}")
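
The handler combines a resource's own name score with a down-weighted score for its parent (project name for workflows, database name for tables). The sketch below isolates that step to show how the weights interact with the default `min_relevance` of 0.7. The function and constant names are illustrative assumptions; the weights 0.7 and 0.5 come from the handler above.

```python
WORKFLOW_PROJECT_WEIGHT = 0.7  # weight applied to the project-name score
TABLE_DATABASE_WEIGHT = 0.5    # weight applied to the database-name score
DEFAULT_MIN_RELEVANCE = 0.7    # default threshold from the handler signature


def combined_relevance(own: float, parent: float, parent_weight: float) -> float:
    """Take the higher of the resource's own score and the weighted parent score."""
    return max(own, parent * parent_weight)


# Best case for a parent-only match: the parent name equals the query (score 1.0)
# while the resource's own name does not match at all (score 0.0).
workflow_via_project = combined_relevance(0.0, 1.0, WORKFLOW_PROJECT_WEIGHT)
table_via_database = combined_relevance(0.0, 1.0, TABLE_DATABASE_WEIGHT)

workflow_passes = workflow_via_project >= DEFAULT_MIN_RELEVANCE
table_passes = table_via_database >= DEFAULT_MIN_RELEVANCE
```

With the default threshold, a workflow can surface through its project name only when that name matches the query exactly, and a table never surfaces through its database name alone. Passing a `min_relevance` of 0.5 or lower would let exact database-name matches through.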

td_mcp_server/search_tools.py:30-30 (registration)
Direct registration of the td_smart_search handler using the mcp.tool() decorator inside the register_mcp_tools function.
```
mcp.tool()(td_smart_search)
```
td_mcp_server/mcp_impl.py:707-709 (registration)
Invocation of search_tools.register_mcp_tools, which registers td_smart_search, among other search tools, with the MCP server instance.
```
search_tools.register_mcp_tools(
    mcp, _create_client, _format_error_response, _validate_project_id
)
```
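
The two registration snippets follow a common pattern: a module-level function receives the server object and applies its `tool()` decorator to each handler. The sketch below shows that pattern with a stand-in class so that it runs without the MCP library. `FakeMCP` and the stub handler are assumptions for illustration, and the real `register_mcp_tools` also receives the helper functions shown above.

```python
from typing import Any, Callable


class FakeMCP:
    """Stand-in for the MCP server object; it only mimics tool() registration."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            # Record the handler under its function name and return it unchanged.
            self.tools[func.__name__] = func
            return func

        return decorator


async def td_smart_search(query: str) -> dict[str, Any]:
    """Stub handler; the real one is shown in the Implementation Reference."""
    return {"query": query, "results": []}


def register_mcp_tools(mcp: FakeMCP) -> None:
    # Calling the decorator directly is equivalent to writing @mcp.tool()
    # above the function definition.
    mcp.tool()(td_smart_search)


mcp = FakeMCP()
register_mcp_tools(mcp)
registered = sorted(mcp.tools)
```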
td_mcp_server/search_tools.py:308-314 (schema)
Type annotations and default values defining the input schema for td_smart_search (inferred by MCP decorator); comprehensive docstring describes parameters, modes, scopes, and return format.
```
async def td_smart_search(
    query: str,
    search_scope: str = "all",
    search_mode: str = "fuzzy",
    active_only: bool = True,
    min_relevance: float = 0.7,
) -> dict[str, Any]:
```
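
To illustrate how a signature alone can yield the Name/Required/Default table, the sketch below inspects a stub with the same parameters using the standard `inspect` module. This is not how the MCP library builds its schema internally; `describe_parameters` is a hypothetical helper written for this example.

```python
import inspect
from typing import Any, Callable


async def td_smart_search(
    query: str,
    search_scope: str = "all",
    search_mode: str = "fuzzy",
    active_only: bool = True,
    min_relevance: float = 0.7,
) -> dict[str, Any]:
    """Stub with the same signature as the handler; the body is omitted."""
    return {}


def describe_parameters(func: Callable[..., Any]) -> list[dict[str, Any]]:
    """Return one row per parameter: name, type name, required flag, default."""
    rows = []
    for name, param in inspect.signature(func).parameters.items():
        # A parameter without a default value is treated as required.
        required = param.default is inspect.Parameter.empty
        rows.append(
            {
                "name": name,
                "type": param.annotation.__name__,
                "required": required,
                "default": None if required else param.default,
            }
        )
    return rows


rows = describe_parameters(td_smart_search)
```

Only `query` comes out as required. The allowed values for `search_scope` and `search_mode` are not visible in the annotations, since both are plain `str`, which is why they have to be documented in the docstring.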

td_mcp_server/search_tools.py:358-386 (helper)

Inner helper function used by td_smart_search for computing relevance scores based on search mode (exact, fuzzy, semantic).

def calculate_relevance(text: str, query: str, exact: bool = False) -> float:
    text_lower = text.lower()
    query_lower = query.lower()

    if exact:
        return 1.0 if query_lower == text_lower else 0.0

    # Exact match gets highest score
    if query_lower == text_lower:
        return 1.0

    # Substring match
    if query_lower in text_lower:
        # Score based on position and length ratio
        position_score = 1.0 - (text_lower.index(query_lower) / len(text_lower))
        length_score = len(query_lower) / len(text_lower)
        return (position_score + length_score) / 2

    # Word-overlap matching (applies in semantic mode only)
    if search_mode == "semantic":
        # Simple word-based matching
        query_words = set(query_lower.split())
        text_words = set(text_lower.split())
        if query_words:
            overlap = len(query_words & text_words) / len(query_words)
            return overlap * 0.8  # Slightly lower score for word matches

    return 0.0
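
A short worked example of the substring branch may help when choosing `min_relevance`. The sketch below extracts just that branch into a hypothetical `substring_score` function and applies it to two made-up table names.

```python
def substring_score(text: str, query: str) -> float:
    """Substring branch of calculate_relevance, extracted for illustration."""
    text_lower, query_lower = text.lower(), query.lower()
    if query_lower not in text_lower:
        return 0.0
    # Earlier matches score higher; a match at index 0 gets a position score of 1.0.
    position_score = 1.0 - text_lower.index(query_lower) / len(text_lower)
    # The larger the share of the name the query covers, the higher the score.
    length_score = len(query_lower) / len(text_lower)
    return (position_score + length_score) / 2


# Both names are 18 characters long; the query "customer" is 8 characters.
prefix_match = substring_score("customer_analytics", "customer")
suffix_match = substring_score("analytics_customer", "customer")
```

For the prefix match, the position score is 1.0 and the length score is 8/18, giving about 0.722. For the suffix match, the query starts at index 10, so both components are 8/18 and the result is about 0.444. With the default `min_relevance` of 0.7, only the prefix match is returned.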

Tool Definition Quality

A 4.6/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: the tool returns ranked results with relevance scores (0-1), supports multiple search modes (exact, fuzzy, semantic) and scopes (all, projects, workflows, tables), and defaults to fuzzy search and 'all' scope. It doesn't mention rate limits, authentication needs, or pagination, but covers core operational traits well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded: it starts with the core purpose, then usage guidelines, common scenarios, and technical details. Every sentence adds value—no fluff or repetition. It uses bullet points for readability without wasting space, making it efficient for an AI agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no annotations, no output schema), the description does a strong job. It covers purpose, usage, behaviors, and some parameter semantics. The gaps are the missing output format details (only relevance scores are mentioned) and incomplete parameter coverage. For a search tool of moderate complexity this is nearly complete; fuller parameter documentation would close the gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaningful context for parameters: it explains search_mode options (exact, fuzzy, semantic) and scopes (all, projects, workflows, tables), which aren't in the schema. However, it doesn't cover active_only or min_relevance parameters. The description provides substantial value beyond the bare schema but doesn't fully document all parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Universal search across Treasure Data - best for broad queries. One-stop search for projects, workflows, and tables with smart ranking.' It specifies the verb ('search'), resources ('projects, workflows, and tables'), and distinguishes it from siblings by emphasizing its broad, cross-resource nature versus more specific lookup tools like td_find_project or td_get_workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'Use when unsure what resource type you're looking for or need comprehensive results.' It lists common scenarios (e.g., 'Broad exploration of available data assets') and contrasts with more targeted tools by implication, as siblings include specific find/get tools for individual resource types.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/knishioka/td-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.