Glama

Treasure Data MCP Server

by knishioka

td_smart_search

Search Treasure Data projects, workflows, and tables in a single query, with results ranked by a relevance score between 0 and 1.

Instructions

Universal search across Treasure Data, best suited to broad queries.

One-stop search for projects, workflows, and tables, with relevance ranking.
Use it when you are unsure which resource type you need, or when you want results across all types.

Common scenarios:
- "Find anything related to customer analytics"
- Discovering resources around a topic/keyword
- Broad exploration of available data assets
- Finding resources when type is unknown
- Cross-resource impact analysis

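Search modes (all comparisons are case-insensitive):
- exact: the whole name must equal the query
- fuzzy: substring matches, scored by where the query appears and how much of the name it covers (default)
- semantic: fuzzy matching plus a word-overlap score on whitespace-separated words (no embedding model is involved)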

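Scopes: "all", "projects", "workflows", "tables"
Returns results sorted by relevance score (0 to 1). Only results scoring at or above min_relevance (default 0.7) are kept, and at most 50 are returned.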
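The ranking step described above (threshold, sort, cap at 50) can be sketched in isolation. This is an illustrative reimplementation, not the tool's code; `rank_results` and the sample result dicts are hypothetical, and only the `relevance`, `total_found`, `truncated`, and `truncated_message` keys mirror the tool's response.

```python
from typing import Any


def rank_results(
    candidates: list[dict[str, Any]],
    min_relevance: float = 0.7,
    limit: int = 50,
) -> dict[str, Any]:
    """Hypothetical helper: filter, sort, and cap scored results."""
    # Drop anything below the threshold, as the tool does for each candidate.
    kept = [c for c in candidates if c["relevance"] >= min_relevance]
    # Highest relevance first.
    kept.sort(key=lambda c: c["relevance"], reverse=True)
    # total_found counts every kept result, even those cut by the cap.
    response: dict[str, Any] = {"results": kept[:limit], "total_found": len(kept)}
    if len(kept) > limit:
        response["truncated"] = True
        response["truncated_message"] = f"Showing top {limit} of {len(kept)} results"
    return response


if __name__ == "__main__":
    sample = [
        {"type": "table", "relevance": 0.767, "name": "customer_events"},
        {"type": "project", "relevance": 1.0, "name": "customer"},
        {"type": "workflow", "relevance": 0.4, "name": "daily_events"},
    ]
    print(rank_results(sample))
```

Because the threshold is applied before sorting, lowering min_relevance is the only way to see weaker matches; the 50-result cap only trims what already passed.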

Input Schema

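Name           Required  Type   Default   Description
query          Yes       str    (none)    Search text; must not be empty
search_scope   No        str    "all"     "all", "projects", "workflows", or "tables"
search_mode    No        str    "fuzzy"   "exact", "fuzzy", or "semantic"
active_only    No        bool   True      Accepted, but not used by the implementation below
min_relevance  No        float  0.7       Minimum relevance score (0 to 1) for a result to be kept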
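MCP clients invoke a tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and an `arguments` object matching the schema above. The sketch below builds such a request by hand; the helper name `build_smart_search_call` and the request id are illustrative assumptions, and in practice an MCP client library usually constructs and sends this message for you.

```python
import json
from typing import Any


def build_smart_search_call(request_id: int, query: str, **options: Any) -> str:
    """Hypothetical helper: serialize a tools/call request for td_smart_search."""
    # Omitted options fall back to the server-side defaults in the schema.
    arguments = {"query": query, **options}
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "td_smart_search", "arguments": arguments},
    }
    return json.dumps(request)


payload = build_smart_search_call(
    1,
    "customer analytics",
    search_scope="tables",
    search_mode="semantic",
    min_relevance=0.5,
)
print(payload)
```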

Implementation Reference

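  • Main execution logic for the td_smart_search tool: validates input, creates the TD client, defines the calculate_relevance helper, searches the requested scopes (projects, workflows, tables), scores each candidate, sorts by relevance, caps the output at 50 results, and turns unexpected errors into an error response. Errors inside a single scope are silently skipped so the other scopes still run, and table search covers only the first 10 databases returned.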
    async def td_smart_search(
        query: str,
        search_scope: str = "all",
        search_mode: str = "fuzzy",
        active_only: bool = True,
        min_relevance: float = 0.7,
    ) -> dict[str, Any]:
        """Universal search across Treasure Data - best for broad queries.
    
        One-stop search for projects, workflows, and tables with smart ranking.
        Use when unsure what resource type you're looking for or need comprehensive results.
    
        Common scenarios:
        - "Find anything related to customer analytics"
        - Discovering resources around a topic/keyword
        - Broad exploration of available data assets
        - Finding resources when type is unknown
        - Cross-resource impact analysis
    
        Search modes:
        - exact: Precise string matching only
        - fuzzy: Partial matches and substrings (default)
        - semantic: Word-based matching for concepts
    
        Scopes: "all", "projects", "workflows", "tables"
        Returns ranked results with relevance scores (0-1).
        """
        if not query or not query.strip():
            return _format_error_response("Search query cannot be empty")
    
        if search_scope not in ["projects", "workflows", "tables", "all"]:
            return _format_error_response("Invalid search scope")
    
        if search_mode not in ["exact", "fuzzy", "semantic"]:
            return _format_error_response("Invalid search mode")
    
        client = _create_client(include_workflow=True)
        if isinstance(client, dict):
            return client
    
        results: dict[str, Any] = {
            "query": query,
            "search_scope": search_scope,
            "search_mode": search_mode,
            "results": [],
            "total_found": 0,
        }
    
        try:
            # Helper function to calculate relevance score
            def calculate_relevance(text: str, query: str, exact: bool = False) -> float:
                text_lower = text.lower()
                query_lower = query.lower()
    
                if exact:
                    return 1.0 if query_lower == text_lower else 0.0
    
                # Exact match gets highest score
                if query_lower == text_lower:
                    return 1.0
    
                # Substring match
                if query_lower in text_lower:
                    # Score based on position and length ratio
                    position_score = 1.0 - (text_lower.index(query_lower) / len(text_lower))
                    length_score = len(query_lower) / len(text_lower)
                    return (position_score + length_score) / 2
    
                # Word-overlap matching (semantic mode only)
                if search_mode == "semantic":
                    # Simple word-based matching
                    query_words = set(query_lower.split())
                    text_words = set(text_lower.split())
                    if query_words:
                        overlap = len(query_words & text_words) / len(query_words)
                        return overlap * 0.8  # Slightly lower score for word matches
    
                return 0.0
    
            # Search projects
            if search_scope in ["projects", "all"]:
                try:
                    projects = client.get_projects(limit=200, all_results=True)
                    for project in projects:
                        relevance = calculate_relevance(
                            project.name, query, exact=(search_mode == "exact")
                        )
    
                        if relevance >= min_relevance:
                            results["results"].append(
                                {
                                    "type": "project",
                                    "relevance": round(relevance, 3),
                                    "resource": {
                                        "id": project.id,
                                        "name": project.name,
                                        "created_at": project.created_at,
                                        "updated_at": project.updated_at,
                                    },
                                    "match_context": f"Project name: {project.name}",
                                }
                            )
                except Exception:
                    # Ignore the error (nothing is logged) and continue with other searches
                    pass
    
            # Search workflows
            if search_scope in ["workflows", "all"]:
                try:
                    workflows = client.get_workflows(count=1000, all_results=True)
                    for workflow in workflows:
                        # Check workflow name
                        workflow_relevance = calculate_relevance(
                            workflow.name, query, exact=(search_mode == "exact")
                        )
    
                        # Also check project name for better context
                        project_relevance = calculate_relevance(
                            workflow.project.name, query, exact=(search_mode == "exact")
                        )
    
                        # Take the higher relevance
                        relevance = max(workflow_relevance, project_relevance * 0.7)
    
                        if relevance >= min_relevance:
                            # Get latest status
                            latest_status = "no_runs"
                            if workflow.latest_sessions:
                                latest_status = workflow.latest_sessions[
                                    0
                                ].last_attempt.status
    
                            results["results"].append(
                                {
                                    "type": "workflow",
                                    "relevance": round(relevance, 3),
                                    "resource": {
                                        "id": workflow.id,
                                        "name": workflow.name,
                                        "project": workflow.project.name,
                                        "scheduled": workflow.schedule is not None,
                                        "latest_status": latest_status,
                                    },
                                    "match_context": (
                                        f"Workflow: {workflow.name} "
                                        f"in project: {workflow.project.name}"
                                    ),
                                }
                            )
                except Exception:
                    # Ignore the error (nothing is logged) and continue
                    pass
    
            # Search tables
            if search_scope in ["tables", "all"]:
                try:
                    # Get all databases first
                    databases = client.get_databases(all_results=True)
    
                    for database in databases[:10]:  # Limit to avoid too many API calls
                        try:
                            tables = client.get_tables(database.name, all_results=True)
                            for table in tables:
                                # Check table name
                                table_relevance = calculate_relevance(
                                    table.name, query, exact=(search_mode == "exact")
                                )
    
                                # Also consider database name
                                db_relevance = calculate_relevance(
                                    database.name, query, exact=(search_mode == "exact")
                                )
    
                                relevance = max(table_relevance, db_relevance * 0.5)
    
                                if relevance >= min_relevance:
                                    results["results"].append(
                                        {
                                            "type": "table",
                                            "relevance": round(relevance, 3),
                                            "resource": {
                                                "name": table.name,
                                                "database": database.name,
                                                "full_name": (
                                                    f"{database.name}.{table.name}"
                                                ),
                                                "type": table.type,
                                                "count": table.count,
                                            },
                                            "match_context": (
                                                f"Table: {table.name} "
                                                f"in database: {database.name}"
                                            ),
                                        }
                                    )
                        except Exception:
                            # Skip databases with access issues
                            continue
                except Exception:
                    # Ignore the error (nothing is logged) and continue
                    pass
    
            # Sort results by relevance
            results["results"].sort(key=lambda x: x["relevance"], reverse=True)
            results["total_found"] = len(results["results"])
    
            # Add search suggestions if few results
            if results["total_found"] < 5 and search_mode == "exact":
                results[
                    "suggestion"
                ] = "Try using fuzzy or semantic search mode for more results"
    
            # Limit results to prevent token overflow
            if len(results["results"]) > 50:
                results["results"] = results["results"][:50]
                results["truncated"] = True
                results[
                    "truncated_message"
                ] = f"Showing top 50 of {results['total_found']} results"
    
            return results
    
        except Exception as e:
            return _format_error_response(f"Search failed: {str(e)}")
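The parent-name weighting in the code above has a consequence worth checking with arithmetic: a workflow's score is max(own, 0.7 × project score) and a table's is max(own, 0.5 × database score). The sketch below copies those two weights from the code; the function names are hypothetical. At the default min_relevance of 0.7, a workflow can surface through its project only on an exact (case-insensitive) project-name match, and a table can never surface through its database name alone.

```python
# Weights copied from td_smart_search: a workflow inherits 0.7 x its project's
# score, and a table inherits 0.5 x its database's score.
PROJECT_WEIGHT_FOR_WORKFLOWS = 0.7
DATABASE_WEIGHT_FOR_TABLES = 0.5


def combined_relevance(own_score: float, parent_score: float, parent_weight: float) -> float:
    """Mirror of `max(own, parent * weight)` used for workflows and tables."""
    return max(own_score, parent_score * parent_weight)


def can_surface_via_parent(parent_weight: float, min_relevance: float) -> bool:
    """True if a perfect parent-name match (score 1.0) alone clears the threshold."""
    return 1.0 * parent_weight >= min_relevance


print(can_surface_via_parent(PROJECT_WEIGHT_FOR_WORKFLOWS, 0.7))  # True (0.7 >= 0.7)
print(can_surface_via_parent(DATABASE_WEIGHT_FOR_TABLES, 0.7))    # False (0.5 < 0.7)
print(combined_relevance(0.0, 1.0, DATABASE_WEIGHT_FOR_TABLES))   # 0.5
```

To find tables by database name, pass a min_relevance of 0.5 or lower; keep in mind that only the first 10 databases are scanned regardless of the threshold.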
  • Registration of the td_smart_search handler by applying the mcp.tool() decorator inside the register_mcp_tools function.
    mcp.tool()(td_smart_search)
  • Call to search_tools.register_mcp_tools, which registers td_smart_search and the other search tools with the MCP server instance, passing in the shared helper functions.
    search_tools.register_mcp_tools(
        mcp, _create_client, _format_error_response, _validate_project_id
    )
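The registration pattern above injects the server's helpers as arguments, so the search module never imports them directly. The sketch below reproduces that shape with a hypothetical `FakeMCP` stand-in whose `tool()` returns a decorator, plus stub helpers; the stubs, including the `{"error": ...}` response shape, are assumptions for illustration and not the server's real implementations.

```python
import asyncio
from typing import Any, Callable


class FakeMCP:
    """Hypothetical stand-in for an MCP server object exposing tool()."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # tool() returns a decorator, so mcp.tool()(fn) registers fn by name.
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self.tools[fn.__name__] = fn
            return fn

        return decorator


def register_mcp_tools(mcp, create_client, format_error_response, validate_project_id) -> None:
    # The inner tool captures the injected helpers through its closure.
    async def td_smart_search(query: str, search_scope: str = "all") -> dict[str, Any]:
        if not query or not query.strip():
            return format_error_response("Search query cannot be empty")
        client = create_client(include_workflow=True)
        if isinstance(client, dict):  # the helper signals failure with an error dict
            return client
        return {"query": query, "search_scope": search_scope, "results": [], "total_found": 0}

    mcp.tool()(td_smart_search)


mcp = FakeMCP()
register_mcp_tools(
    mcp,
    create_client=lambda include_workflow=False: object(),
    format_error_response=lambda msg: {"error": msg},
    validate_project_id=lambda pid: pid.isdigit(),
)
print(sorted(mcp.tools))  # ['td_smart_search']
print(asyncio.run(mcp.tools["td_smart_search"]("   ")))  # {'error': 'Search query cannot be empty'}
```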
  • Input schema for td_smart_search: the MCP decorator infers it from the type annotations and default values in the function signature shown in the main execution logic above. The docstring describes search modes, scopes, and the return format, but not the individual parameters.
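To illustrate how a schema can be inferred from annotations and defaults, the sketch below walks the signature with `inspect` and maps a few Python types to JSON Schema type names. This is only the general idea under that assumption: the MCP Python SDK's decorator uses its own machinery, and `schema_from_signature` and `_JSON_TYPES` are hypothetical names. The stub copies the tool's signature; its body is irrelevant here.

```python
import inspect
from typing import Any, get_type_hints

# Rough JSON Schema type names for the annotation types td_smart_search uses.
_JSON_TYPES = {str: "string", bool: "boolean", float: "number", int: "integer"}


async def td_smart_search(
    query: str,
    search_scope: str = "all",
    search_mode: str = "fuzzy",
    active_only: bool = True,
    min_relevance: float = 0.7,
) -> dict[str, Any]:
    """Signature-only stub copied from the tool."""
    return {}


def schema_from_signature(fn) -> dict[str, Any]:
    hints = get_type_hints(fn)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        prop: dict[str, Any] = {"type": _JSON_TYPES[hints[name]]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default, so the caller must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


schema = schema_from_signature(td_smart_search)
print(schema["required"])  # ['query']
print(schema["properties"]["min_relevance"])  # {'type': 'number', 'default': 0.7}
```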
  • Inner helper calculate_relevance, defined inside td_smart_search in the main execution logic above, computes relevance scores according to the search mode (exact, fuzzy, semantic). It reads search_mode from the enclosing function rather than taking it as a parameter.
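A few worked scores make the helper's behaviour concrete. The sketch below is a standalone copy of calculate_relevance in which search_mode is an explicit parameter (in the tool it comes from the enclosing function). For a substring match the score is (position_score + length_score) / 2: "customer" in "customer_events" gives (1.0 + 8/15) / 2 ≈ 0.767, which passes the default 0.7 threshold, while "events" (found at index 9) gives (0.4 + 0.4) / 2 = 0.4, which does not. Word overlap splits only on whitespace, so semantic mode does not help with snake_case names.

```python
def calculate_relevance(text: str, query: str, search_mode: str = "fuzzy") -> float:
    """Standalone copy of the tool's scoring, with search_mode passed in."""
    text_lower, query_lower = text.lower(), query.lower()
    if search_mode == "exact":
        return 1.0 if query_lower == text_lower else 0.0
    if query_lower == text_lower:
        return 1.0
    if query_lower in text_lower:
        # Earlier and longer matches score higher.
        position_score = 1.0 - text_lower.index(query_lower) / len(text_lower)
        length_score = len(query_lower) / len(text_lower)
        return (position_score + length_score) / 2
    if search_mode == "semantic":
        # Fraction of query words present in the text, capped at 0.8.
        query_words = set(query_lower.split())
        if query_words:
            overlap = len(query_words & set(text_lower.split())) / len(query_words)
            return overlap * 0.8
    return 0.0


print(round(calculate_relevance("customer_events", "customer"), 3))  # 0.767
print(round(calculate_relevance("customer_events", "events"), 3))  # 0.4
print(calculate_relevance("customer analytics daily", "analytics customer", "semantic"))  # 0.8
print(calculate_relevance("customer_analytics_daily", "analytics customer", "semantic"))  # 0.0
print(calculate_relevance("Customer", "customer", "exact"))  # 1.0
```

Since a word-overlap score never exceeds 0.8, nearly every query word must appear in the name for a semantic-only match to clear the default threshold of 0.7.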


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/knishioka/td-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server