merge_entities

Merge duplicate entities in academic literature databases by transferring references from one entity to another and removing duplicates to maintain clean data.

Instructions

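Manually merge two entities

Migrate all references from from_entity to to_entity, then delete from_entity.

Args:
    from_entity_id: ID of the entity to be merged (it is deleted after the merge)
    to_entity_id: ID of the target entity
    reason: Reason for the merge

Returns:
    Operation result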

Input Schema

Name             Required   Description   Default
from_entity_id   Yes
to_entity_id     Yes
reason           Yes
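As a rough illustration of how a caller might prepare the three required arguments and read the result, here is a minimal client-side sketch. The helper names `build_merge_args` and `describe_result` are hypothetical and not part of paperlib-mcp; the result shape (`ok` plus an optional `error` carrying `code` and `message`) follows the MergeEntitiesOut model shown in the implementation reference. The self-merge check is a client-side precaution assumed here, not something the schema itself enforces.

```python
from typing import Any, Optional


def build_merge_args(from_entity_id: int, to_entity_id: int, reason: str) -> dict[str, Any]:
    """Assemble the three required arguments (hypothetical helper)."""
    # Precaution assumed for this sketch: merging an entity into itself
    # makes no sense, so reject it before calling the tool.
    if from_entity_id == to_entity_id:
        raise ValueError("from_entity_id and to_entity_id must differ")
    return {
        "from_entity_id": from_entity_id,
        "to_entity_id": to_entity_id,
        "reason": reason,
    }


def describe_result(result: dict[str, Any]) -> str:
    """Summarize a result dict of the form {"ok": bool, "error": {...} or None}."""
    if result.get("ok"):
        return "merged"
    error: Optional[dict[str, Any]] = result.get("error")
    if error is None:
        return "failed: unknown error"
    return f"failed: {error['code']}: {error['message']}"
```

For example, `describe_result({"ok": False, "error": {"code": "NOT_FOUND", "message": "From entity 12 not found"}})` yields a single line that names both the error code and the message.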

Implementation Reference

  • The core execution logic of the merge_entities tool. It validates the existence and type compatibility of the two entities, performs the merge using the execute_merge helper, handles errors, and returns a standardized output using MergeEntitiesOut.
    @mcp.tool()
    def merge_entities(from_entity_id: int, to_entity_id: int, reason: str) -> dict[str, Any]:
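        """Manually merge two entities.

        Migrate all references from from_entity to to_entity, then delete from_entity.

        Args:
            from_entity_id: ID of the entity to be merged (deleted afterwards)
            to_entity_id: ID of the target entity
            reason: Reason for the merge

        Returns:
            Operation result
        """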
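        try:
            # Guard against self-merge: execute_merge would remap nothing
            # and then delete the entity outright
            if from_entity_id == to_entity_id:
                return MergeEntitiesOut(
                    ok=False,
                    error=MCPErrorModel(code="VALIDATION_ERROR", message="Cannot merge an entity into itself"),
                ).model_dump()

            # Verify that both entities exist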
            from_entity = query_one(
                "SELECT entity_id, type FROM entities WHERE entity_id = %s",
                (from_entity_id,)
            )
            to_entity = query_one(
                "SELECT entity_id, type FROM entities WHERE entity_id = %s",
                (to_entity_id,)
            )
            
            if not from_entity:
                return MergeEntitiesOut(
                    ok=False,
                    error=MCPErrorModel(code="NOT_FOUND", message=f"From entity {from_entity_id} not found"),
                ).model_dump()
            
            if not to_entity:
                return MergeEntitiesOut(
                    ok=False,
                    error=MCPErrorModel(code="NOT_FOUND", message=f"To entity {to_entity_id} not found"),
                ).model_dump()
            
            # Type check (optional: only allow merging entities of the same type)
            if from_entity["type"] != to_entity["type"]:
                return MergeEntitiesOut(
                    ok=False,
                    error=MCPErrorModel(
                        code="VALIDATION_ERROR",
                        message=f"Cannot merge different types: {from_entity['type']} -> {to_entity['type']}"
                    ),
                ).model_dump()
            
            with get_db() as conn:
                execute_merge(conn, to_entity_id, [from_entity_id], reason)
            
            return MergeEntitiesOut(ok=True).model_dump()
            
        except Exception as e:
            return MergeEntitiesOut(
                ok=False,
                error=MCPErrorModel(code="DB_CONN_ERROR", message=str(e)),
            ).model_dump()
  • Pydantic input (MergeEntitiesIn) and output (MergeEntitiesOut) schemas for the merge_entities tool, defining parameters and response structure including error handling.
    class MergeEntitiesIn(BaseModel):
        """Input for merge_entities"""
        from_entity_id: int
        to_entity_id: int
        reason: str
    
    
    class MergeEntitiesOut(BaseModel):
        """Output for merge_entities"""
        ok: bool
        error: Optional[MCPErrorModel] = None
  • Top-level registration call in the main MCP server that invokes the function to register graph canonicalization tools, including merge_entities.
    register_graph_canonicalize_tools(mcp)
  • Supporting helper function that performs the actual database merge: it repoints mentions, relations, and aliases at the winner entity, logs each merge, and deletes the loser entities.
    def execute_merge(conn, winner_id: int, loser_ids: list[int], reason: str = "auto_canonicalize"):
        """Execute the entity merge"""
        with conn.cursor() as cur:
            # 1. Remap mentions
            cur.execute(
                "UPDATE mentions SET entity_id = %s WHERE entity_id = ANY(%s)",
                (winner_id, loser_ids)
            )
            
            # 2. Remap relations (both subj and obj)
            cur.execute(
                "UPDATE relations SET subj_entity_id = %s WHERE subj_entity_id = ANY(%s)",
                (winner_id, loser_ids)
            )
            cur.execute(
                "UPDATE relations SET obj_entity_id = %s WHERE obj_entity_id = ANY(%s)",
                (winner_id, loser_ids)
            )
            
            # 3. Remap aliases
            cur.execute(
                "UPDATE entity_aliases SET entity_id = %s WHERE entity_id = ANY(%s)",
                (winner_id, loser_ids)
            )
            
            # 4. Record the merge in the log
            for loser_id in loser_ids:
                cur.execute(
                    """
                    INSERT INTO entity_merge_log(from_entity_id, to_entity_id, reason)
                    VALUES (%s, %s, %s)
                    """,
                    (loser_id, winner_id, reason)
                )
            
            # 5. Delete the merged (loser) entities
            cur.execute(
                "DELETE FROM entities WHERE entity_id = ANY(%s)",
                (loser_ids,)
            )
  • Import of the registration function for graph canonicalize tools (containing merge_entities) in the main server.
    from paperlib_mcp.tools.graph_canonicalize import register_graph_canonicalize_tools
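The merge sequence described above (remap mentions, relations, and aliases, log the merge, delete the loser) can be tried out in isolation. The following is a minimal sketch against an in-memory SQLite database, not the project's PostgreSQL code: the table layouts, sample rows, and the names `merge_one` and `demo` are assumptions made for illustration, `?` placeholders replace `%s`, and a single loser ID stands in for `ANY(%s)`. It also rejects a self-merge, since running the same statements with identical IDs would end by deleting the entity.

```python
import sqlite3


def merge_one(conn: sqlite3.Connection, winner_id: int, loser_id: int, reason: str) -> None:
    """Remap references from loser to winner, log the merge, delete the loser."""
    if winner_id == loser_id:
        raise ValueError("cannot merge an entity into itself")
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("UPDATE mentions SET entity_id = ? WHERE entity_id = ?", (winner_id, loser_id))
        conn.execute("UPDATE relations SET subj_entity_id = ? WHERE subj_entity_id = ?", (winner_id, loser_id))
        conn.execute("UPDATE relations SET obj_entity_id = ? WHERE obj_entity_id = ?", (winner_id, loser_id))
        conn.execute("UPDATE entity_aliases SET entity_id = ? WHERE entity_id = ?", (winner_id, loser_id))
        conn.execute(
            "INSERT INTO entity_merge_log(from_entity_id, to_entity_id, reason) VALUES (?, ?, ?)",
            (loser_id, winner_id, reason),
        )
        conn.execute("DELETE FROM entities WHERE entity_id = ?", (loser_id,))


def demo() -> dict:
    """Build a tiny hypothetical schema, merge entity 2 into entity 1, return the final state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entities (entity_id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE mentions (mention_id INTEGER PRIMARY KEY, entity_id INTEGER);
        CREATE TABLE relations (subj_entity_id INTEGER, obj_entity_id INTEGER);
        CREATE TABLE entity_aliases (alias TEXT, entity_id INTEGER);
        CREATE TABLE entity_merge_log (from_entity_id INTEGER, to_entity_id INTEGER, reason TEXT);
        INSERT INTO entities VALUES (1, 'Method'), (2, 'Method'), (3, 'Dataset');
        INSERT INTO mentions VALUES (10, 1), (11, 2), (12, 2);
        INSERT INTO relations VALUES (2, 3), (3, 2);
        INSERT INTO entity_aliases VALUES ('DiD', 2);
        """
    )
    merge_one(conn, winner_id=1, loser_id=2, reason="duplicate")
    state = {
        "entities": conn.execute("SELECT entity_id, type FROM entities ORDER BY entity_id").fetchall(),
        "mentions": conn.execute("SELECT mention_id, entity_id FROM mentions ORDER BY mention_id").fetchall(),
        "relations": conn.execute("SELECT subj_entity_id, obj_entity_id FROM relations ORDER BY rowid").fetchall(),
        "aliases": conn.execute("SELECT alias, entity_id FROM entity_aliases").fetchall(),
        "merge_log": conn.execute("SELECT from_entity_id, to_entity_id, reason FROM entity_merge_log").fetchall(),
    }
    conn.close()
    return state
```

After `demo()` runs, every row that referenced entity 2 points at entity 1, the log holds one row for the merge, and entity 2 is gone. Whether the real schema has unique constraints that a remap could violate (for example on aliases) is not shown in the source, so this sketch does not model them.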

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.