
merge_entities

Merge duplicate entities in academic literature databases by transferring references from one entity to another and removing duplicates to maintain clean data.

Instructions

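Manually merge two entities.

Migrates all references from from_entity to to_entity, then deletes from_entity.

Args:
    from_entity_id: ID of the entity to be merged (it is deleted after the merge)
    to_entity_id: ID of the target entity
    reason: Reason for the merge

Returns: Operation result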

Input Schema

Name            Required  Description  Default
from_entity_id  Yes       -            -
to_entity_id    Yes       -            -
reason          Yes       -            -
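
As an illustration of how the three required arguments fit together, here is a minimal sketch that builds an MCP `tools/call` request for this tool. The helper name `build_merge_call`, the entity IDs, and the reason string are hypothetical; the self-merge check is a client-side precaution added here, not something the schema itself states.

```python
import json


def build_merge_call(from_entity_id: int, to_entity_id: int, reason: str,
                     request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request for merge_entities.

    Hypothetical helper: only the tool name and argument names come from
    the schema above.
    """
    # Client-side precaution (an assumption, not part of the schema):
    # merging an entity into itself makes no sense.
    if from_entity_id == to_entity_id:
        raise ValueError("from_entity_id and to_entity_id must differ")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "merge_entities",
            "arguments": {
                "from_entity_id": from_entity_id,
                "to_entity_id": to_entity_id,
                "reason": reason,
            },
        },
    }
    return json.dumps(payload)


# Made-up IDs for illustration only.
request = build_merge_call(42, 7, "duplicate author record")
print(request)
```

All three arguments are required, so the sketch gives none of them a default.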

Output Schema

No arguments

Implementation Reference

  • The core execution logic of the merge_entities tool. It validates the existence and type compatibility of the two entities, performs the merge using the execute_merge helper, handles errors, and returns a standardized output using MergeEntitiesOut.
    @mcp.tool()
    def merge_entities(from_entity_id: int, to_entity_id: int, reason: str) -> dict[str, Any]:
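        """Manually merge two entities.
        
        Migrates all references from from_entity to to_entity, then deletes from_entity.
        
        Args:
            from_entity_id: ID of the entity to be merged
            to_entity_id: ID of the target entity
            reason: Reason for the merge
            
        Returns:
            Operation result
        """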
        try:
            # Verify that both entities exist
            from_entity = query_one(
                "SELECT entity_id, type FROM entities WHERE entity_id = %s",
                (from_entity_id,)
            )
            to_entity = query_one(
                "SELECT entity_id, type FROM entities WHERE entity_id = %s",
                (to_entity_id,)
            )
            
            if not from_entity:
                return MergeEntitiesOut(
                    ok=False,
                    error=MCPErrorModel(code="NOT_FOUND", message=f"From entity {from_entity_id} not found"),
                ).model_dump()
            
            if not to_entity:
                return MergeEntitiesOut(
                    ok=False,
                    error=MCPErrorModel(code="NOT_FOUND", message=f"To entity {to_entity_id} not found"),
                ).model_dump()
            
            # Type check (optional: only allow merges between entities of the same type)
            if from_entity["type"] != to_entity["type"]:
                return MergeEntitiesOut(
                    ok=False,
                    error=MCPErrorModel(
                        code="VALIDATION_ERROR",
                        message=f"Cannot merge different types: {from_entity['type']} -> {to_entity['type']}"
                    ),
                ).model_dump()
            
            with get_db() as conn:
                execute_merge(conn, to_entity_id, [from_entity_id], reason)
            
            return MergeEntitiesOut(ok=True).model_dump()
            
        except Exception as e:
            return MergeEntitiesOut(
                ok=False,
                error=MCPErrorModel(code="DB_CONN_ERROR", message=str(e)),
            ).model_dump()
  • Pydantic input (MergeEntitiesIn) and output (MergeEntitiesOut) schemas for the merge_entities tool, defining parameters and response structure including error handling.
    class MergeEntitiesIn(BaseModel):
        """Input for merge_entities."""
        from_entity_id: int
        to_entity_id: int
        reason: str
    
    
    class MergeEntitiesOut(BaseModel):
        """Output for merge_entities."""
        ok: bool
        error: Optional[MCPErrorModel] = None
  • Top-level registration call in the main MCP server that invokes the function to register graph canonicalization tools, including merge_entities.
    register_graph_canonicalize_tools(mcp)
  • Supporting helper function that executes the actual database merges: updates mentions, relations, aliases to point to winner, logs the merge, and deletes loser entities.
    def execute_merge(conn, winner_id: int, loser_ids: list[int], reason: str = "auto_canonicalize"):
        """Execute the entity merge."""
        with conn.cursor() as cur:
            # 1. Remap mentions
            cur.execute(
                "UPDATE mentions SET entity_id = %s WHERE entity_id = ANY(%s)",
                (winner_id, loser_ids)
            )
            
            # 2. Remap relations (both subj and obj)
            cur.execute(
                "UPDATE relations SET subj_entity_id = %s WHERE subj_entity_id = ANY(%s)",
                (winner_id, loser_ids)
            )
            cur.execute(
                "UPDATE relations SET obj_entity_id = %s WHERE obj_entity_id = ANY(%s)",
                (winner_id, loser_ids)
            )
            
            # 3. Remap aliases
            cur.execute(
                "UPDATE entity_aliases SET entity_id = %s WHERE entity_id = ANY(%s)",
                (winner_id, loser_ids)
            )
            
            # 4. Record the merge in the log
            for loser_id in loser_ids:
                cur.execute(
                    """
                    INSERT INTO entity_merge_log(from_entity_id, to_entity_id, reason)
                    VALUES (%s, %s, %s)
                    """,
                    (loser_id, winner_id, reason)
                )
            
            # 5. Delete the merged (loser) entities
            cur.execute(
                "DELETE FROM entities WHERE entity_id = ANY(%s)",
                (loser_ids,)
            )
  • Import of the registration function for graph canonicalize tools (containing merge_entities) in the main server.
    from paperlib_mcp.tools.graph_canonicalize import register_graph_canonicalize_tools
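
The remap, log, and delete sequence in execute_merge can be tried out in isolation. The following is a minimal sketch, not the server's implementation: it swaps in Python's built-in `sqlite3` (the original SQL uses `%s` placeholders and `ANY(%s)`, which suggests PostgreSQL), handles a single loser instead of a list, and uses a made-up schema in which only the table names and ID columns are taken from the SQL above. The explicit self-merge guard is an assumption added here, because the excerpts above do not show a check for `from_entity_id == to_entity_id`.

```python
import sqlite3


def merge_sketch(conn: sqlite3.Connection, winner_id: int, loser_id: int, reason: str) -> None:
    """Remap references from loser to winner, log the merge, delete the loser."""
    # Assumed guard (not shown in the excerpts above): without it, a self-merge
    # would end with the entity deleting itself in the final step.
    if winner_id == loser_id:
        raise ValueError("cannot merge an entity into itself")
    with conn:  # sqlite3: commit on success, roll back if any statement raises
        conn.execute("UPDATE mentions SET entity_id = ? WHERE entity_id = ?",
                     (winner_id, loser_id))
        conn.execute("UPDATE relations SET subj_entity_id = ? WHERE subj_entity_id = ?",
                     (winner_id, loser_id))
        conn.execute("UPDATE relations SET obj_entity_id = ? WHERE obj_entity_id = ?",
                     (winner_id, loser_id))
        conn.execute("UPDATE entity_aliases SET entity_id = ? WHERE entity_id = ?",
                     (winner_id, loser_id))
        conn.execute(
            "INSERT INTO entity_merge_log(from_entity_id, to_entity_id, reason) VALUES (?, ?, ?)",
            (loser_id, winner_id, reason),
        )
        conn.execute("DELETE FROM entities WHERE entity_id = ?", (loser_id,))


# Hypothetical in-memory schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (entity_id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE mentions (mention_id INTEGER PRIMARY KEY, entity_id INTEGER);
CREATE TABLE relations (subj_entity_id INTEGER, obj_entity_id INTEGER);
CREATE TABLE entity_aliases (alias TEXT, entity_id INTEGER);
CREATE TABLE entity_merge_log (from_entity_id INTEGER, to_entity_id INTEGER, reason TEXT);
INSERT INTO entities VALUES (1, 'Method'), (2, 'Method');
INSERT INTO mentions VALUES (10, 1), (11, 2);
INSERT INTO relations VALUES (2, 1);
INSERT INTO entity_aliases VALUES ('DiD', 2);
""")

merge_sketch(conn, winner_id=1, loser_id=2, reason="manual duplicate")

remaining = [row[0] for row in conn.execute("SELECT entity_id FROM entities")]
mention_owners = [row[0] for row in conn.execute("SELECT entity_id FROM mentions ORDER BY mention_id")]
relations = conn.execute("SELECT subj_entity_id, obj_entity_id FROM relations").fetchall()
merge_log = conn.execute("SELECT from_entity_id, to_entity_id, reason FROM entity_merge_log").fetchall()
```

In this toy data the relation that linked entity 2 to entity 1 becomes a self-relation on entity 1 after the merge. Whether such rows should be kept or cleaned up is a design decision the excerpts above do not address.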
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavioral traits. It states the tool performs a destructive operation (deleting from_entity) and migrates references, which is helpful. However, it lacks critical details: whether this operation is reversible, what permissions are required, how it handles errors (e.g., if entities don't exist), or any rate limits. For a destructive tool with no annotations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
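
Since the description says little about failure modes, a caller may want to branch on the error codes that appear in the implementation excerpt (NOT_FOUND, VALIDATION_ERROR, DB_CONN_ERROR). The sketch below assumes the tool returns the dictionary produced by `MergeEntitiesOut.model_dump()`, with an `ok` flag and an optional `error` object carrying `code` and `message`. The function name `summarize_merge_result` and the wording of the returned strings are hypothetical.

```python
from typing import Any


def summarize_merge_result(result: dict[str, Any]) -> str:
    """Turn a merge_entities result dict into a short status string."""
    if result.get("ok"):
        return "merged"
    # `error` may be None or missing; fall back to an empty dict.
    error = result.get("error") or {}
    code = error.get("code", "UNKNOWN")
    message = error.get("message", "")
    if code == "NOT_FOUND":
        # One of the two entity IDs does not exist; nothing was changed.
        return f"skipped: {message}"
    if code == "VALIDATION_ERROR":
        # The two entities have different types; nothing was changed.
        return f"rejected: {message}"
    # In the excerpt, the catch-all `except Exception` reports every other
    # failure as DB_CONN_ERROR, so this branch is not limited to connection issues.
    return f"failed ({code}): {message}"


# Example inputs shaped like the outputs in the implementation excerpt.
success = summarize_merge_result({"ok": True, "error": None})
missing = summarize_merge_result(
    {"ok": False, "error": {"code": "NOT_FOUND", "message": "From entity 99 not found"}}
)
```

Because the handler returns errors inside the result instead of raising, a caller that only checks for exceptions would treat a failed merge as a success.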

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise and well-structured: a brief purpose statement followed by parameter and return explanations. Every sentence adds value without redundancy. However, it could be slightly more front-loaded by emphasizing the destructive nature earlier.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (destructive merge operation), no annotations, and an output schema (which covers return values), the description is moderately complete. It explains the core action and parameters but lacks behavioral context like error handling or prerequisites. The output schema reduces the need to describe returns, but more guidance on usage and risks is warranted for such a tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It provides clear semantics for all three parameters: from_entity_id (entity to be merged), to_entity_id (target entity), and reason (merge reason). This adds meaningful context beyond the basic schema types, though it doesn't specify format constraints (e.g., reason length or content). With 0% coverage, this is strong compensation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '手动合并两个实体' (manually merge two entities) with specific actions: migrate references from from_entity to to_entity and delete from_entity. It distinguishes itself from siblings by focusing on entity merging rather than other operations like canonicalization or locking. However, it doesn't explicitly differentiate from tools like 'canonicalize_entities_v1' which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, such as whether entities must be unlocked or in a specific state, or compare it to sibling tools like 'canonicalize_entities_v1' or 'lock_entity'. The lack of usage context leaves the agent to infer appropriate scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

