Treasure Data MCP Server

by knishioka

td_smart_search

Search across Treasure Data projects, workflows, and tables using smart ranking for comprehensive discovery of data assets and resources.

Instructions

Universal search across Treasure Data - best for broad queries.

One-stop search for projects, workflows, and tables with smart ranking.
Use when unsure what resource type you're looking for or need comprehensive results.

Common scenarios:
- "Find anything related to customer analytics"
- Discovering resources around a topic/keyword
- Broad exploration of available data assets
- Finding resources when type is unknown
- Cross-resource impact analysis

Search modes:
- exact: Precise string matching only
- fuzzy: Partial matches and substrings (default)
- semantic: Word-based matching for concepts

Scopes: "all", "projects", "workflows", "tables"
Returns ranked results with relevance scores (0-1).
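
Judging from the implementation reference further down this page, a response appears to take the shape sketched below. Field names follow that reference, but every value is invented for illustration, and `top_results` is a hypothetical helper that is not part of the server.

```python
from typing import Any, Optional

# Illustrative response only: the keys mirror the implementation reference on
# this page, while all values (names, ids, timestamps, counts) are made up.
sample_response: dict[str, Any] = {
    "query": "customer",
    "search_scope": "all",
    "search_mode": "fuzzy",
    "total_found": 2,
    "results": [
        {
            "type": "project",
            "relevance": 1.0,
            "resource": {
                "id": "123",
                "name": "customer",
                "created_at": "2024-01-01T00:00:00Z",
                "updated_at": "2024-02-01T00:00:00Z",
            },
            "match_context": "Project name: customer",
        },
        {
            "type": "table",
            "relevance": 0.722,
            "resource": {
                "name": "customer_analytics",
                "database": "sales",
                "full_name": "sales.customer_analytics",
                "type": "log",
                "count": 1000,
            },
            "match_context": "Table: customer_analytics in database: sales",
        },
    ],
}


def top_results(
    response: dict[str, Any],
    resource_type: Optional[str] = None,
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Hypothetical helper: return up to `limit` hits, optionally of one type.

    The server already sorts results by descending relevance, so slicing
    keeps the best matches.
    """
    hits = response.get("results", [])
    if resource_type is not None:
        hits = [hit for hit in hits if hit["type"] == resource_type]
    return hits[:limit]


table_names = [hit["resource"]["full_name"] for hit in top_results(sample_response, "table")]
```

On failure the tool returns whatever `_format_error_response` produces instead; its exact shape is not shown on this page, so callers should not assume a `results` key is always present.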

Input Schema

Name            Required   Description   Default
query           Yes
search_scope    No                       all
search_mode     No                       fuzzy
active_only     No
min_relevance   No
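
Because the schema carries no descriptions, it can help to check arguments client-side before calling the tool. The sketch below mirrors the three validation checks visible in the implementation reference on this page; `validate_search_args` is a hypothetical name, and the real server may perform additional checks that are not shown here.

```python
from typing import Optional

# Allowed values as listed in the tool description.
VALID_SCOPES = ("all", "projects", "workflows", "tables")
VALID_MODES = ("exact", "fuzzy", "semantic")


def validate_search_args(
    query: str,
    search_scope: str = "all",
    search_mode: str = "fuzzy",
) -> Optional[str]:
    """Hypothetical pre-flight check: return an error message, or None if valid."""
    if not query or not query.strip():
        return "Search query cannot be empty"
    if search_scope not in VALID_SCOPES:
        return "Invalid search scope"
    if search_mode not in VALID_MODES:
        return "Invalid search mode"
    return None


# Example arguments for a broad search; the values are illustrative.
example_args = {
    "query": "customer analytics",
    "search_scope": "all",
    "search_mode": "fuzzy",
}
error = validate_search_args(**example_args)
```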

Implementation Reference

  • Main execution logic for td_smart_search tool: validates input, creates TD client, implements calculate_relevance helper, searches specified scopes (projects, workflows, tables), computes relevance scores, sorts and limits results, handles errors.
    async def td_smart_search(
        query: str,
        search_scope: str = "all",
        search_mode: str = "fuzzy",
        active_only: bool = True,  # NOTE: not referenced anywhere in the body shown here
        min_relevance: float = 0.7,
    ) -> dict[str, Any]:
        """Universal search across Treasure Data - best for broad queries.
    
        One-stop search for projects, workflows, and tables with smart ranking.
        Use when unsure what resource type you're looking for or need comprehensive results.
    
        Common scenarios:
        - "Find anything related to customer analytics"
        - Discovering resources around a topic/keyword
        - Broad exploration of available data assets
        - Finding resources when type is unknown
        - Cross-resource impact analysis
    
        Search modes:
        - exact: Precise string matching only
        - fuzzy: Partial matches and substrings (default)
        - semantic: Word-based matching for concepts
    
        Scopes: "all", "projects", "workflows", "tables"
        Returns ranked results with relevance scores (0-1).
        """
        if not query or not query.strip():
            return _format_error_response("Search query cannot be empty")
    
        if search_scope not in ["projects", "workflows", "tables", "all"]:
            return _format_error_response("Invalid search scope")
    
        if search_mode not in ["exact", "fuzzy", "semantic"]:
            return _format_error_response("Invalid search mode")
    
        client = _create_client(include_workflow=True)
        if isinstance(client, dict):
            return client
    
        results: dict[str, Any] = {
            "query": query,
            "search_scope": search_scope,
            "search_mode": search_mode,
            "results": [],
            "total_found": 0,
        }
    
        try:
            # Helper function to calculate relevance score
            def calculate_relevance(text: str, query: str, exact: bool = False) -> float:
                text_lower = text.lower()
                query_lower = query.lower()
    
                if exact:
                    return 1.0 if query_lower == text_lower else 0.0
    
                # Exact match gets highest score
                if query_lower == text_lower:
                    return 1.0
    
                # Substring match
                if query_lower in text_lower:
                    # Score based on position and length ratio
                    position_score = 1.0 - (text_lower.index(query_lower) / len(text_lower))
                    length_score = len(query_lower) / len(text_lower)
                    return (position_score + length_score) / 2
    
                # Word-overlap matching (semantic mode only)
                if search_mode == "semantic":
                    # Simple word-based matching
                    query_words = set(query_lower.split())
                    text_words = set(text_lower.split())
                    if query_words:
                        overlap = len(query_words & text_words) / len(query_words)
                        return overlap * 0.8  # Slightly lower score for word matches
    
                return 0.0
    
            # Search projects
            if search_scope in ["projects", "all"]:
                try:
                    projects = client.get_projects(limit=200, all_results=True)
                    for project in projects:
                        relevance = calculate_relevance(
                            project.name, query, exact=(search_mode == "exact")
                        )
    
                        if relevance >= min_relevance:
                            results["results"].append(
                                {
                                    "type": "project",
                                    "relevance": round(relevance, 3),
                                    "resource": {
                                        "id": project.id,
                                        "name": project.name,
                                        "created_at": project.created_at,
                                        "updated_at": project.updated_at,
                                    },
                                    "match_context": f"Project name: {project.name}",
                                }
                            )
                except Exception:
                    # Errors are swallowed (nothing is logged); continue with other searches
                    pass
    
            # Search workflows
            if search_scope in ["workflows", "all"]:
                try:
                    workflows = client.get_workflows(count=1000, all_results=True)
                    for workflow in workflows:
                        # Check workflow name
                        workflow_relevance = calculate_relevance(
                            workflow.name, query, exact=(search_mode == "exact")
                        )
    
                        # Also check project name for better context
                        project_relevance = calculate_relevance(
                            workflow.project.name, query, exact=(search_mode == "exact")
                        )
    
                        # Take the higher relevance
                        relevance = max(workflow_relevance, project_relevance * 0.7)
    
                        if relevance >= min_relevance:
                            # Get latest status
                            latest_status = "no_runs"
                            if workflow.latest_sessions:
                                latest_status = workflow.latest_sessions[
                                    0
                                ].last_attempt.status
    
                            results["results"].append(
                                {
                                    "type": "workflow",
                                    "relevance": round(relevance, 3),
                                    "resource": {
                                        "id": workflow.id,
                                        "name": workflow.name,
                                        "project": workflow.project.name,
                                        "scheduled": workflow.schedule is not None,
                                        "latest_status": latest_status,
                                    },
                                    "match_context": (
                                        f"Workflow: {workflow.name} "
                                        f"in project: {workflow.project.name}"
                                    ),
                                }
                            )
                except Exception:
                    # Errors are swallowed (nothing is logged); continue
                    pass
    
            # Search tables
            if search_scope in ["tables", "all"]:
                try:
                    # Get all databases first
                    databases = client.get_databases(all_results=True)
    
                    for database in databases[:10]:  # Limit to avoid too many API calls
                        try:
                            tables = client.get_tables(database.name, all_results=True)
                            for table in tables:
                                # Check table name
                                table_relevance = calculate_relevance(
                                    table.name, query, exact=(search_mode == "exact")
                                )
    
                                # Also consider database name
                                db_relevance = calculate_relevance(
                                    database.name, query, exact=(search_mode == "exact")
                                )
    
                                relevance = max(table_relevance, db_relevance * 0.5)
    
                                if relevance >= min_relevance:
                                    results["results"].append(
                                        {
                                            "type": "table",
                                            "relevance": round(relevance, 3),
                                            "resource": {
                                                "name": table.name,
                                                "database": database.name,
                                                "full_name": (
                                                    f"{database.name}.{table.name}"
                                                ),
                                                "type": table.type,
                                                "count": table.count,
                                            },
                                            "match_context": (
                                                f"Table: {table.name} "
                                                f"in database: {database.name}"
                                            ),
                                        }
                                    )
                        except Exception:
                            # Skip databases with access issues
                            continue
                except Exception:
                    # Errors are swallowed (nothing is logged); continue
                    pass
    
            # Sort results by relevance
            results["results"].sort(key=lambda x: x["relevance"], reverse=True)
            results["total_found"] = len(results["results"])
    
            # Add search suggestions if few results
            if results["total_found"] < 5 and search_mode == "exact":
                results[
                    "suggestion"
                ] = "Try using fuzzy or semantic search mode for more results"
    
            # Limit results to prevent token overflow
            if len(results["results"]) > 50:
                results["results"] = results["results"][:50]
                results["truncated"] = True
                results[
                    "truncated_message"
                ] = f"Showing top 50 of {results['total_found']} results"
    
            return results
    
        except Exception as e:
            return _format_error_response(f"Search failed: {str(e)}")
  • Direct registration of the td_smart_search handler using mcp.tool() decorator inside register_mcp_tools function.
    mcp.tool()(td_smart_search)
  • Invocation of search_tools.register_mcp_tools which registers td_smart_search among other search tools to the MCP server instance.
    search_tools.register_mcp_tools(
        mcp, _create_client, _format_error_response, _validate_project_id
    )
  • Type annotations and default values defining the input schema for td_smart_search (inferred by the MCP decorator); the docstring describes search modes, scopes, and the relevance-scored return value, but not the active_only or min_relevance parameters.
    async def td_smart_search(
        query: str,
        search_scope: str = "all",
        search_mode: str = "fuzzy",
        active_only: bool = True,
        min_relevance: float = 0.7,
    ) -> dict[str, Any]:
  • Inner helper function used by td_smart_search for computing relevance scores based on search mode (exact, fuzzy, semantic).
    def calculate_relevance(text: str, query: str, exact: bool = False) -> float:
        text_lower = text.lower()
        query_lower = query.lower()
    
        if exact:
            return 1.0 if query_lower == text_lower else 0.0
    
        # Exact match gets highest score
        if query_lower == text_lower:
            return 1.0
    
        # Substring match
        if query_lower in text_lower:
            # Score based on position and length ratio
            position_score = 1.0 - (text_lower.index(query_lower) / len(text_lower))
            length_score = len(query_lower) / len(text_lower)
            return (position_score + length_score) / 2
    
        # Word-overlap matching (semantic mode only)
        if search_mode == "semantic":
            # Simple word-based matching
            query_words = set(query_lower.split())
            text_words = set(text_lower.split())
            if query_words:
                overlap = len(query_words & text_words) / len(query_words)
                return overlap * 0.8  # Slightly lower score for word matches
    
        return 0.0
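
To make the scoring concrete, the sketch below restates the helper above as a standalone function and applies it to a few names. Passing `mode` explicitly (instead of reading `search_mode` from the enclosing scope) is an adaptation for this example, and the sample names are invented. The weights 0.7 and 0.5 and the default threshold 0.7 are taken from the main function shown earlier.

```python
def relevance(text: str, query: str, mode: str = "fuzzy") -> float:
    """Standalone restatement of calculate_relevance with an explicit mode."""
    text_lower, query_lower = text.lower(), query.lower()

    # An exact (case-insensitive) match scores 1.0 in every mode.
    if query_lower == text_lower:
        return 1.0
    if mode == "exact":
        return 0.0

    # Substring match: average of a position score and a length ratio.
    if query_lower in text_lower:
        position_score = 1.0 - text_lower.index(query_lower) / len(text_lower)
        length_score = len(query_lower) / len(text_lower)
        return (position_score + length_score) / 2

    # Semantic mode: share of whitespace-separated query words found in text.
    if mode == "semantic":
        query_words = set(query_lower.split())
        text_words = set(text_lower.split())
        if query_words:
            return len(query_words & text_words) / len(query_words) * 0.8

    return 0.0


MIN_RELEVANCE = 0.7  # default threshold from the function signature

prefix_hit = round(relevance("customer_analytics", "customer"), 3)    # 0.722
middle_hit = round(relevance("daily_customer_report", "customer"), 3)  # 0.548

# Word matching splits on whitespace only, so underscore-separated names
# count as a single word and do not overlap with a two-word query.
underscore_name = relevance("customer_analytics", "customer analytics", "semantic")
spaced_name = relevance("analytics for customer data", "customer analytics", "semantic")

# Context weighting: a workflow matched only through its project name is
# scaled by 0.7, a table matched only through its database name by 0.5.
project_only = max(0.0, relevance("customer", "customer") * 0.7)
database_only = max(0.0, relevance("customer", "customer") * 0.5)
```

Under these rules, with the default `min_relevance` of 0.7, a fuzzy match in the middle of a longer name (0.548 here) is filtered out, a database-name-only match can never pass (at most 0.5), and a project-name-only match passes only when the project name equals the query. Callers who want broader fuzzy results may need to pass a lower `min_relevance`.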

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: the tool returns ranked results with relevance scores (0-1), supports multiple search modes (exact, fuzzy, semantic) and scopes (all, projects, workflows, tables), and defaults to fuzzy search and 'all' scope. It doesn't mention rate limits, authentication needs, or pagination, but covers core operational traits well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded: it starts with the core purpose, then usage guidelines, common scenarios, and technical details. Every sentence adds value—no fluff or repetition. It uses bullet points for readability without wasting space, making it efficient for an AI agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no annotations, no output schema), the description does a strong job. It covers purpose, usage, behaviors, and some parameter semantics. Two gaps remain: the output format is not described beyond the mention of relevance scores, and not every parameter is documented. For a search tool of moderate complexity this is nearly complete; fuller parameter documentation would close the gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaningful context for parameters: it explains search_mode options (exact, fuzzy, semantic) and scopes (all, projects, workflows, tables), which aren't in the schema. However, it doesn't cover active_only or min_relevance parameters. The description provides substantial value beyond the bare schema but doesn't fully document all parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Universal search across Treasure Data - best for broad queries. One-stop search for projects, workflows, and tables with smart ranking.' It specifies the verb ('search'), resources ('projects, workflows, and tables'), and distinguishes it from siblings by emphasizing its broad, cross-resource nature versus more specific lookup tools like td_find_project or td_get_workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'Use when unsure what resource type you're looking for or need comprehensive results.' It lists common scenarios (e.g., 'Broad exploration of available data assets') and contrasts with more targeted tools by implication, as siblings include specific find/get tools for individual resource types.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/knishioka/td-mcp-server'
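
The same request can be made from Python with the standard library. This is a minimal sketch: the URL is the one in the curl command above, `server_url` and `fetch_server` are hypothetical helper names, and decoding the body as JSON is an assumption about the response format.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server (hypothetical helper)."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """Fetch server metadata; assumes the endpoint returns a JSON object."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request, so it is left commented out):
# info = fetch_server("knishioka", "td-mcp-server")
# print(sorted(info.keys()))
```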

If you have feedback or need assistance with the MCP directory API, please join our Discord server.