
build_community_evidence_pack

Collects evidence from academic literature by sampling chunks from top community mentions to build structured evidence packs for research analysis.

Instructions

Build an evidence pack for a community

Samples chunks from the mentions of the community's top entities and writes them into an evidence pack.

Args: comm_id: community ID. max_chunks: maximum number of chunks; defaults to 100. per_doc_limit: maximum chunks per document; defaults to 4.

Returns: evidence pack info, including pack_id, document count, and chunk count.
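The return contract above can be made concrete with a hedged sketch: field names are taken from the tool's output model shown further down, while the values are purely illustrative.

```python
# Illustrative success payload for build_community_evidence_pack.
# Field names come from BuildCommunityEvidencePackOut; values are made up.
success = {
    "pack_id": 42,   # ID of the newly created evidence pack
    "docs": 25,      # number of distinct documents sampled
    "chunks": 100,   # total chunks written (at most max_chunks)
    "error": None,   # populated only on failure
}

# Illustrative failure payload: counts are zeroed and error carries a
# code such as "NOT_FOUND" or "DB_CONN_ERROR".
failure = {
    "pack_id": 0,
    "docs": 0,
    "chunks": 0,
    "error": {"code": "NOT_FOUND", "message": "Community 7 not found"},
}
```

Note that failures are reported in-band through the `error` field rather than raised as exceptions, so callers should check `error` before using the pack.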

Input Schema

Name           Required  Description                      Default
comm_id        Yes       Community ID                     —
max_chunks     No        Maximum number of chunks         100
per_doc_limit  No        Maximum chunks per document      4

Output Schema

Name     Required  Description
pack_id  Yes       ID of the created evidence pack
docs     Yes       Number of documents in the pack
chunks   Yes       Number of chunks in the pack
error    No        Error info when the call fails

Implementation Reference

  • The core execution logic for the 'build_community_evidence_pack' tool. Validates community, retrieves top-weighted entity members, samples high-confidence mentions/chunks with limits, creates an evidence pack in the database, and returns pack metadata.
    # Imports assumed by this excerpt: json, defaultdict (from collections),
    # Any (from typing); query_one, query_all, get_db, and the Pydantic
    # models are project-level helpers defined elsewhere.
    @mcp.tool()
    def build_community_evidence_pack(
        comm_id: int,
        max_chunks: int = 100,
        per_doc_limit: int = 4,
    ) -> dict[str, Any]:
        """为社区构建证据包
        
        从社区 top entities 的 mentions 中采样 chunks,写入证据包。
        
        Args:
            comm_id: 社区 ID
            max_chunks: 最大 chunk 数量,默认 100
            per_doc_limit: 每篇文档最多 chunk 数,默认 4
            
        Returns:
            证据包信息,包含 pack_id、文档数和 chunk 数
        """
        try:
            # Verify the community exists
            community = query_one(
                "SELECT comm_id, level FROM communities WHERE comm_id = %s",
                (comm_id,)
            )
            
            if not community:
                return BuildCommunityEvidencePackOut(
                    pack_id=0,
                    docs=0,
                    chunks=0,
                    error=MCPErrorModel(code="NOT_FOUND", message=f"Community {comm_id} not found"),
                ).model_dump()
            
            # Fetch community members, ordered by weight
            members = query_all(
                """
                SELECT entity_id, weight
                FROM community_members
                WHERE comm_id = %s
                ORDER BY weight DESC
                """,
                (comm_id,)
            )
            
            if not members:
                return BuildCommunityEvidencePackOut(
                    pack_id=0,
                    docs=0,
                    chunks=0,
                    error=MCPErrorModel(code="NOT_FOUND", message="No members in community"),
                ).model_dump()
            
            entity_ids = [m["entity_id"] for m in members]
            
            # Fetch mentions -> chunks for these entities
            mentions = query_all(
                """
                SELECT m.doc_id, m.chunk_id, MAX(m.confidence) AS conf
                FROM mentions m
                WHERE m.entity_id = ANY(%s)
                GROUP BY m.doc_id, m.chunk_id
                ORDER BY conf DESC
                LIMIT 5000
                """,
                (entity_ids,)
            )
            
            if not mentions:
                return BuildCommunityEvidencePackOut(
                    pack_id=0,
                    docs=0,
                    chunks=0,
                    error=MCPErrorModel(code="NOT_FOUND", message="No mentions found for community entities"),
                ).model_dump()
            
            # Apply the per-document limit
            doc_counts: dict[str, int] = defaultdict(int)
            selected_chunks: list[tuple[str, int]] = []
            
            for m in mentions:
                if doc_counts[m["doc_id"]] < per_doc_limit:
                    selected_chunks.append((m["doc_id"], m["chunk_id"]))
                    doc_counts[m["doc_id"]] += 1
                    
                    if len(selected_chunks) >= max_chunks:
                        break
            
            if not selected_chunks:
                return BuildCommunityEvidencePackOut(
                    pack_id=0,
                    docs=0,
                    chunks=0,
                    error=MCPErrorModel(code="NOT_FOUND", message="No chunks selected"),
                ).model_dump()
            
            # Create the evidence pack
            with get_db() as conn:
                with conn.cursor() as cur:
                    # Create the pack record
                    cur.execute(
                        """
                        INSERT INTO evidence_packs(query, params_json)
                        VALUES (%s, %s::jsonb)
                        RETURNING pack_id
                        """,
                        (
                            f"Community {comm_id} evidence",
                            json.dumps({
                                "comm_id": comm_id,
                                "max_chunks": max_chunks,
                                "per_doc_limit": per_doc_limit,
                            })
                        )
                    )
                    result = cur.fetchone()
                    pack_id = result["pack_id"]
                    
                    # Write the pack items
                    for rank, (doc_id, chunk_id) in enumerate(selected_chunks):
                        cur.execute(
                            """
                            INSERT INTO evidence_pack_items(pack_id, doc_id, chunk_id, rank)
                            VALUES (%s, %s, %s, %s)
                            ON CONFLICT DO NOTHING
                            """,
                            (pack_id, doc_id, chunk_id, rank)
                        )
            
            return BuildCommunityEvidencePackOut(
                pack_id=pack_id,
                docs=len(doc_counts),
                chunks=len(selected_chunks),
            ).model_dump()
            
        except Exception as e:
            return BuildCommunityEvidencePackOut(
                pack_id=0,
                docs=0,
                chunks=0,
                error=MCPErrorModel(code="DB_CONN_ERROR", message=str(e)),
            ).model_dump()
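The chunk-sampling step above can be sketched as a standalone function: mentions arrive pre-sorted by confidence (the SQL's `ORDER BY conf DESC`), and a greedy pass keeps at most `per_doc_limit` chunks per document, stopping once `max_chunks` is reached. This is a minimal re-implementation for illustration, not the server's own code.

```python
from collections import defaultdict

def sample_chunks(mentions, max_chunks=100, per_doc_limit=4):
    """Greedy sampling over mentions assumed pre-sorted by confidence
    (descending). Caps chunks per document, then total chunks."""
    doc_counts: dict[str, int] = defaultdict(int)
    selected: list[tuple[str, int]] = []
    for m in mentions:
        if doc_counts[m["doc_id"]] < per_doc_limit:
            selected.append((m["doc_id"], m["chunk_id"]))
            doc_counts[m["doc_id"]] += 1
            if len(selected) >= max_chunks:
                break
    return selected

# Example: doc "a" has three high-confidence chunks, but per_doc_limit=2
# keeps only its first two, leaving room for doc "b".
mentions = [
    {"doc_id": "a", "chunk_id": 1},
    {"doc_id": "a", "chunk_id": 2},
    {"doc_id": "a", "chunk_id": 3},
    {"doc_id": "b", "chunk_id": 1},
]
print(sample_chunks(mentions, max_chunks=3, per_doc_limit=2))
# → [('a', 1), ('a', 2), ('b', 1)]
```

One consequence of this greedy pass: because selection stops at `max_chunks`, documents appearing later in the confidence ordering may be excluded entirely.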
  • Pydantic models defining the input parameters (comm_id, max_chunks, per_doc_limit) and output structure (pack_id, docs, chunks, optional error) for the tool.
    class BuildCommunityEvidencePackIn(BaseModel):
        """build_community_evidence_pack 输入"""
        comm_id: int
        max_chunks: int = 100
        per_doc_limit: int = 4
    
    
    class BuildCommunityEvidencePackOut(BaseModel):
        """build_community_evidence_pack 输出"""
        pack_id: int
        docs: int
        chunks: int
        error: Optional[MCPErrorModel] = None
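The output model's shape (defaults plus dict serialization, as done by Pydantic's `model_dump`) can be mimicked with stdlib dataclasses for a dependency-free sketch. Field names match the models above; the stand-in class names are invented for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ErrorStandIn:
    """Stdlib stand-in for MCPErrorModel."""
    code: str
    message: str

@dataclass
class EvidencePackOutStandIn:
    """Stdlib stand-in for BuildCommunityEvidencePackOut."""
    pack_id: int
    docs: int
    chunks: int
    error: Optional[ErrorStandIn] = None  # None on success

out = EvidencePackOutStandIn(pack_id=7, docs=3, chunks=12)
print(asdict(out))
# → {'pack_id': 7, 'docs': 3, 'chunks': 12, 'error': None}
```

As with the real model, `error` defaults to `None`, so a serialized success result still carries the key, which keeps the output schema stable for callers.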
  • Invocation of register_graph_community_tools on the MCP instance, which defines and registers the build_community_evidence_pack tool using @mcp.tool() decorator.
    register_graph_community_tools(mcp)
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It states that the tool writes ('写入证据包', i.e. 'writes an evidence pack'), which implies a mutation operation, but it doesn't disclose behavioral traits such as whether specific permissions are required, whether the call is idempotent, what happens if the community doesn't exist, rate limits, or error conditions. The description adds minimal behavioral context beyond the basic operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with clear structure: purpose statement, operation explanation, parameter definitions, and return information. Every sentence adds value. It could be slightly more front-loaded by moving the parameter explanations into the main description flow rather than a separate Args section.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (community-based evidence pack creation), no annotations, but with output schema present (so return values don't need explanation), the description is reasonably complete. It covers purpose, parameters, and basic operation. However, for a mutation tool with no annotations, it should ideally include more about permissions, idempotency, or error handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate, and it does by explaining all three parameters in the Args section: comm_id (community ID), max_chunks (maximum chunk count, default 100), and per_doc_limit (chunks-per-document limit, default 4). This adds crucial meaning beyond the bare schema types. However, it doesn't explain parameter interactions or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '为社区构建证据包' (build evidence pack for community) and specifies it samples chunks from community top entities' mentions. It distinguishes from obvious siblings like 'build_evidence_pack' (no community focus) and 'get_evidence_pack_info' (read-only). However, it doesn't explicitly differentiate from all potential siblings like 'collect_evidence' or 'select_high_value_chunks'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context: when you need to create an evidence pack for a specific community by sampling from its top entities. However, it provides no explicit guidance on when to use this versus alternatives like 'build_evidence_pack' (non-community) or 'collect_evidence', nor does it mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'
