Skip to main content
Glama

github_search

Search GitHub repositories and commits, then cross-reference with conversation history to validate dates or find related discussions.

Instructions

    Search GitHub repos, commits, and cross-reference with conversations.

    Args:
        query: Search query (used for code semantic search or as conversation_id for validate mode)
        project: Project/repo name (used for timeline and conversations modes)
        mode: Search mode:
            - "timeline" (default): Project creation date, commits, activity windows
            - "conversations": Find conversations mentioning a project
            - "code": Semantic search across commits AND conversations
            - "validate": Check conversation date validity via GitHub evidence.
                          Pass conversation_id as query.
        limit: Max results (default 10)
    

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
queryNo
projectNo
modeNotimeline
limitNo

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault
resultYes

Implementation Reference

  • The main github_search tool handler function. Registered as an MCP tool via @mcp.tool() decorator. Dispatches to mode-specific sub-functions: _conversation_project_context (conversations), _code_to_conversation (code), _validate_date_with_github (validate). Default mode is 'timeline' which queries GitHub repos and commits via DuckDB.
    @mcp.tool()
    def github_search(
        query: str = "",
        project: str = None,
        mode: str = "timeline",
        limit: int = 10,
    ) -> str:
        """
        Search GitHub repos, commits, and cross-reference with conversations.
    
        Args:
            query: Search query (used for code semantic search or as conversation_id for validate mode)
            project: Project/repo name (used for timeline and conversations modes)
            mode: Search mode:
                - "timeline" (default): Project creation date, commits, activity windows
                - "conversations": Find conversations mentioning a project
                - "code": Semantic search across commits AND conversations
                - "validate": Check conversation date validity via GitHub evidence.
                              Pass conversation_id as query.
            limit: Max results (default 10)
        """
        cfg = get_config()
    
        if mode == "conversations":
            return _conversation_project_context(project or query, limit)
        elif mode == "code":
            return _code_to_conversation(query, limit)
        elif mode == "validate":
            return _validate_date_with_github(query)
    
        # Default: timeline mode
        project_name = project or query
        if not cfg.github_repos_parquet.exists():
            return "GitHub data not imported. Add GitHub data and run ingest first."
    
        con = get_github_db()
        if not con:
            return "GitHub database not available."
    
        pattern = f"%{project_name.lower()}%"
    
        try:
            repos = con.execute("""
                SELECT repo_name, created_at, pushed_at, description,
                       language, is_private, stars, url
                FROM github_repos
                WHERE LOWER(repo_name) LIKE ?
                ORDER BY created_at DESC
                LIMIT 5
            """, [pattern]).fetchall()
        except Exception as e:
            return f"Error querying GitHub repos: {e}"
    
        if not repos:
            return f"No GitHub project found matching '{project_name}'"
    
        output = [f"## GitHub Project Timeline: '{project_name}'\n"]
    
        for repo in repos:
            name, created, pushed, desc, lang, private, stars, url = repo
            output.append(f"### {name}")
            output.append(f"**Created**: {str(created)[:10]}")
            output.append(f"**Last pushed**: {str(pushed)[:10] if pushed else 'N/A'}")
            output.append(f"**Language**: {lang or 'N/A'}")
            output.append(f"**Private**: {'Yes' if private else 'No'}")
            output.append(f"**Stars**: {stars}")
            if desc:
                output.append(f"**Description**: {desc[:100]}")
            output.append(f"**URL**: {url}\n")
    
            # Get commits for this repo
            if cfg.github_commits_parquet.exists():
                try:
                    commits = con.execute("""
                        SELECT timestamp, message, author
                        FROM github_commits
                        WHERE repo_name = ?
                        ORDER BY timestamp DESC
                        LIMIT 10
                    """, [name]).fetchall()
    
                    if commits:
                        output.append("**Recent Commits**:")
                        for ts, msg, _ in commits:
                            msg_preview = msg.split('\n')[0][:60]
                            output.append(f"  - [{str(ts)[:10]}] {msg_preview}...")
    
                        monthly = con.execute("""
                            SELECT strftime(timestamp, '%Y-%m') as month,
                                   COUNT(*) as count
                            FROM github_commits
                            WHERE repo_name = ?
                            GROUP BY 1 ORDER BY 1
                        """, [name]).fetchall()
    
                        if monthly:
                            output.append("\n**Activity by Month**:")
                            for month, count in monthly[-6:]:
                                bar = "█" * min(count, 20)
                                output.append(f"  {month}: {bar} ({count})")
                except Exception:
                    pass
    
        return "\n".join(output)
  • Helper for 'conversations' mode: finds conversations mentioning a project name, optionally flags conversations that predate the project's GitHub creation date.
    def _conversation_project_context(project: str, limit: int = 10) -> str:
        """Find conversations mentioning a specific GitHub project."""
        cfg = get_config()
    
        try:
            con = get_conversations()
        except FileNotFoundError as e:
            return str(e)
    
        pattern = f"%{project.lower()}%"
    
        results = con.execute("""
            SELECT conversation_title,
                   substr(content, 1, 250) as preview,
                   created, role, conversation_id
            FROM conversations
            WHERE (LOWER(content) LIKE ? OR LOWER(conversation_title) LIKE ?)
              AND role = 'user'
            ORDER BY created DESC
            LIMIT ?
        """, [pattern, pattern, limit]).fetchall()
    
        if not results:
            return f"No conversations found mentioning '{project}'"
    
        # Get project creation date for validation
        project_created = None
        if cfg.github_repos_parquet.exists():
            gh_db = get_github_db()
            if gh_db:
                try:
                    repo_result = gh_db.execute("""
                        SELECT repo_name, created_at
                        FROM github_repos
                        WHERE LOWER(repo_name) LIKE ?
                        LIMIT 1
                    """, [pattern]).fetchone()
                    if repo_result:
                        project_created = repo_result[1]
                except Exception:
                    pass
    
        output = [f"## Conversations about: '{project}'\n"]
    
        if project_created:
            output.append(f"_GitHub project created: {str(project_created)[:10]}_\n")
    
        for title, preview, created, _, conv_id in results:
            date_flag = ""
            if project_created:
                try:
                    if created < project_created:
                        date_flag = " ⚠️ PREDATES PROJECT"
                except Exception:
                    pass
    
            output.append(
                f"**[{str(created)[:10]}]** "
                f"{title or 'Untitled'}{date_flag}"
            )
            output.append(f"> {preview}...")
            output.append(f"_ID: {conv_id[:20]}..._\n")
    
        return "\n".join(output)
  • Helper for 'code' mode: performs semantic search across commits (via LanceDB) and conversations, returning related results with similarity scores.
    def _code_to_conversation(query: str, limit: int = 10) -> str:
        """Semantic search across commits AND conversations."""
        cfg = get_config()
    
        embedding = get_embedding(query)
        if not embedding:
            return "Could not generate embedding for query."
    
        output = [f"## Code ↔ Conversation Search: '{query}'\n"]
    
        if cfg.lance_path.exists():
            db = get_lance_db()
            if db:
                try:
                    table_names = (
                        db.table_names()
                        if hasattr(db, "table_names")
                        else []
                    )
                    if "commit" in table_names:
                        tbl = db.open_table("commit")
                        commit_df = tbl.search(embedding).limit(
                            limit // 2
                        ).to_pandas()
                        if not commit_df.empty:
                            output.append("### Related Commits")
                            for _, row in commit_df.iterrows():
                                repo = row.get("repo_name", "unknown")
                                msg = str(row.get("message", ""))
                                ts = row.get("timestamp", "")
                                sim = 1 / (1 + row.get("_distance", 0))
                                msg_preview = msg.split("\n")[0][:80]
                                output.append(f"**[{repo}]** {msg_preview}")
                                output.append(
                                    f"  {str(ts)[:10]} | "
                                    f"Similarity: {sim:.3f}\n"
                                )
    
                    conv_results = lance_search(embedding, limit=limit // 2)
                    if conv_results:
                        output.append("### Related Conversations")
                        for title, content, year, month, sim in conv_results:
                            preview = content[:150]
                            output.append(
                                f"**[{year}-{month:02d}]** "
                                f"{title or 'Untitled'}"
                            )
                            output.append(f"> {preview}...")
                            output.append(f"Similarity: {sim:.3f}\n")
                except Exception as e:
                    output.append(f"_Search error: {e}_")
    
        if len(output) == 1:
            output.append(
                "_No embeddings found. Run the embed pipeline first._"
            )
    
        return "\n".join(output)
  • Helper for 'validate' mode: checks if a conversation's timestamp is consistent with GitHub project creation dates. Returns a verdict (likely correct, uncertain, or likely incorrect).
    def _validate_date_with_github(conversation_id: str) -> str:
        """Check conversation date validity via GitHub evidence."""
        cfg = get_config()
    
        try:
            con = get_conversations()
        except FileNotFoundError as e:
            return str(e)
    
        conv = con.execute("""
            SELECT conversation_title,
                   MIN(created) as first_msg,
                   MAX(created) as last_msg,
                   COUNT(*) as msg_count,
                   MAX(timestamp_is_fallback) as has_fallback
            FROM conversations
            WHERE conversation_id = ?
            GROUP BY conversation_title
        """, [conversation_id]).fetchone()
    
        if not conv:
            return f"Conversation not found: {conversation_id}"
    
        title, first_msg, _, msg_count, has_fallback = conv
    
        output = [f"## Date Validation: {title or 'Untitled'}\n"]
        output.append(f"**Conversation ID**: {conversation_id[:30]}...")
        output.append(f"**Recorded date**: {str(first_msg)[:10]}")
        output.append(f"**Messages**: {msg_count}")
        output.append(
            f"**Fallback timestamp**: "
            f"{'Yes ⚠️' if has_fallback else 'No ✓'}\n"
        )
    
        content = con.execute("""
            SELECT content FROM conversations
            WHERE conversation_id = ? AND role = 'user'
            LIMIT 50
        """, [conversation_id]).fetchall()
    
        all_content = " ".join([c[0] for c in content if c[0]])
    
        if not cfg.github_repos_parquet.exists():
            output.append(
                "_GitHub data not available for validation._"
            )
            return "\n".join(output)
    
        gh_db = get_github_db()
        if not gh_db:
            output.append("_GitHub database not available._")
            return "\n".join(output)
    
        try:
            repos = gh_db.execute("""
                SELECT repo_name, created_at FROM github_repos
            """).fetchall()
        except Exception:
            output.append("_Could not query GitHub repos._")
            return "\n".join(output)
    
        issues = []
        validations = []
    
        for repo_name, created_at in repos:
            if re.search(
                rf"\b{re.escape(repo_name)}\b", all_content, re.IGNORECASE
            ):
                if first_msg < created_at:
                    issues.append({
                        "project": repo_name,
                        "project_created": str(created_at)[:10],
                        "conv_date": str(first_msg)[:10],
                        "days_before": (created_at - first_msg).days,
                    })
                else:
                    validations.append({
                        "project": repo_name,
                        "project_created": str(created_at)[:10],
                    })
    
        if issues:
            output.append("### ⚠️ Date Conflicts Found")
            output.append(
                "_Conversation mentions projects that didn't exist yet_\n"
            )
            for issue in issues:
                output.append(
                    f"- **{issue['project']}** created "
                    f"{issue['project_created']}"
                )
                output.append(
                    f"  But conversation dated {issue['conv_date']} "
                    f"({issue['days_before']} days before!)"
                )
    
        if validations:
            output.append("\n### ✓ Valid Project References")
            for v in validations[:5]:
                output.append(
                    f"- {v['project']} (created {v['project_created']})"
                )
    
        if not issues and not validations:
            output.append(
                "_No GitHub project references found in this conversation._"
            )
    
        output.append("\n### Verdict")
        if issues:
            output.append(
                "🔴 **DATE LIKELY INCORRECT** — Conversation references "
                "projects that didn't exist yet."
            )
            output.append(
                f"   Earliest valid date: "
                f"{max(i['project_created'] for i in issues)}"
            )
        elif has_fallback:
            output.append(
                "🟡 **UNCERTAIN** — Uses fallback timestamp, "
                "but no conflicting evidence found."
            )
        else:
            output.append(
                "🟢 **LIKELY VALID** — No date conflicts detected."
            )
    
        return "\n".join(output)
  • Registration point: imports tools_github module and calls tools_github.register(mcp), which registers the github_search tool via @mcp.tool() decorator.
    from . import tools_github
    from . import tools_analytics
    
    tools_conversations.register(mcp)
    tools_search.register(mcp)
    tools_synthesis.register(mcp)
    tools_stats.register(mcp)
    tools_prosthetic.register(mcp)
    tools_github.register(mcp)
    tools_analytics.register(mcp)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose all behavioral traits. It explains the search functionality and modes but does not mention whether the tool is read-only, requires authentication, has rate limits, or any side effects. It is adequate but lacks safety/permission details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with the overall purpose. It then breaks down modes and parameters efficiently. Minor redundancy exists with the opening sentence being somewhat repeated in the mode list, but overall every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple modes, 4 parameters, no annotations) and the existence of an output schema, the description is quite complete. It covers all parameters and mode behaviors. It does not explain output format or edge cases, but the output schema fills that gap. Slightly more detail on scaling or error conditions would improve, but it's sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, so the description fully compensates by explaining each parameter's role. It details how 'query' is used for code search or as conversation_id, 'project' for timeline/conversations, 'mode' with four clear options, and 'limit' as max results. This adds significant meaning beyond the raw JSON schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches GitHub repos, commits, and cross-references with conversations. It lists specific modes (timeline, conversations, code, validate), which differentiates it from sibling tools like 'search_conversations' or 'unified_search' that focus only on conversation data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit guidance for each mode, including how to pass parameters (e.g., 'Pass conversation_id as query' for validate mode). However, it does not directly compare to sibling tools or state when not to use this tool, leaving some ambiguity about alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mordechaipotash/brain-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server