merge_entities
Merge duplicate entities in academic literature databases by transferring references from one entity to another and removing duplicates to maintain clean data.
Instructions
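Manually merge two entities
Migrates all references from from_entity to to_entity, then deletes from_entity.
Args: from_entity_id: ID of the entity to be merged; to_entity_id: ID of the target entity; reason: reason for the merge
Returns: the operation result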
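As a rough illustration of what a caller might do with the operation result, the sketch below assumes the result arrives as a dictionary with an `ok` flag and an optional `error` holding `code` and `message`, as the output schema in the implementation reference suggests. The helper name `summarize_merge_result` is hypothetical and not part of the server.

```python
from typing import Any


def summarize_merge_result(result: dict[str, Any]) -> str:
    """Turn a merge_entities result dict into a short status string.

    Assumes the dict has an "ok" flag and an optional "error" dict
    with "code" and "message" keys (hypothetical caller-side helper).
    """
    if result.get("ok"):
        return "merged"
    error = result.get("error") or {}
    code = error.get("code", "UNKNOWN")
    message = error.get("message", "")
    if code == "NOT_FOUND":
        # One of the two entity IDs does not exist
        return f"missing entity: {message}"
    if code == "VALIDATION_ERROR":
        # The two entities have different types
        return f"rejected: {message}"
    # Anything else, including DB_CONN_ERROR
    return f"failed ({code}): {message}"
```

Branching on `code` rather than on the message text keeps the caller independent of the exact wording of the error messages.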
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
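| from_entity_id | Yes | ID of the entity to be merged (deleted after the merge) | |
| to_entity_id | Yes | ID of the target entity that receives the references | |
| reason | Yes | Reason for the merge | |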
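All three parameters are required. A minimal sketch of a client-side check on a payload before calling the tool follows, assuming the IDs are integers and the reason is a string, as in the tool's Python signature. The `validate_merge_args` helper is hypothetical. Its same-ID check is also an addition: as far as the implementation excerpt shows, the tool itself does not reject identical IDs, in which case the final delete step would remove the entity being kept.

```python
from typing import Any

# Field names and types taken from the tool signature
REQUIRED_FIELDS = {"from_entity_id": int, "to_entity_id": int, "reason": str}


def validate_merge_args(args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks usable."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in args:
            problems.append(f"missing required field: {name}")
        elif isinstance(args[name], bool) or not isinstance(args[name], expected):
            # bool is a subclass of int in Python, so reject it explicitly
            problems.append(f"{name} must be {expected.__name__}")
    # Hypothetical extra guard: merging an entity into itself is never useful
    if not problems and args["from_entity_id"] == args["to_entity_id"]:
        problems.append("from_entity_id and to_entity_id must differ")
    return problems
```

For example, `validate_merge_args({"from_entity_id": 12, "to_entity_id": 7, "reason": "duplicate"})` returns an empty list.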
Implementation Reference
- The core execution logic of the `merge_entities` tool. It checks that both entities exist and have the same type, performs the merge through the `execute_merge` helper, and returns a standardized `MergeEntitiesOut` result. Any exception is caught and reported with the `DB_CONN_ERROR` code.

  ```python
  @mcp.tool()
  def merge_entities(from_entity_id: int, to_entity_id: int, reason: str) -> dict[str, Any]:
      """Manually merge two entities.

      Migrates all references from from_entity to to_entity, then deletes from_entity.

      Args:
          from_entity_id: ID of the entity to be merged
          to_entity_id: ID of the target entity
          reason: Reason for the merge

      Returns:
          The operation result
      """
      try:
          # Verify that both entities exist
          from_entity = query_one(
              "SELECT entity_id, type FROM entities WHERE entity_id = %s",
              (from_entity_id,)
          )
          to_entity = query_one(
              "SELECT entity_id, type FROM entities WHERE entity_id = %s",
              (to_entity_id,)
          )

          if not from_entity:
              return MergeEntitiesOut(
                  ok=False,
                  error=MCPErrorModel(code="NOT_FOUND", message=f"From entity {from_entity_id} not found"),
              ).model_dump()

          if not to_entity:
              return MergeEntitiesOut(
                  ok=False,
                  error=MCPErrorModel(code="NOT_FOUND", message=f"To entity {to_entity_id} not found"),
              ).model_dump()

          # Type check (optional: only allow merging entities of the same type)
          if from_entity["type"] != to_entity["type"]:
              return MergeEntitiesOut(
                  ok=False,
                  error=MCPErrorModel(
                      code="VALIDATION_ERROR",
                      message=f"Cannot merge different types: {from_entity['type']} -> {to_entity['type']}"
                  ),
              ).model_dump()

          with get_db() as conn:
              execute_merge(conn, to_entity_id, [from_entity_id], reason)

          return MergeEntitiesOut(ok=True).model_dump()

      except Exception as e:
          return MergeEntitiesOut(
              ok=False,
              error=MCPErrorModel(code="DB_CONN_ERROR", message=str(e)),
          ).model_dump()
  ```
- Pydantic input (`MergeEntitiesIn`) and output (`MergeEntitiesOut`) schemas for the `merge_entities` tool. They define the parameters and the response structure, including the optional error.

  ```python
  class MergeEntitiesIn(BaseModel):
      """merge_entities input"""
      from_entity_id: int
      to_entity_id: int
      reason: str


  class MergeEntitiesOut(BaseModel):
      """merge_entities output"""
      ok: bool
      error: Optional[MCPErrorModel] = None
  ```
- src/paperlib_mcp/server.py:41-41 (registration): Top-level call in the main MCP server that registers the graph canonicalization tools, including `merge_entities`.

  ```python
  register_graph_canonicalize_tools(mcp)
  ```
- Supporting helper function that executes the actual database merge: it repoints mentions, relations, and aliases to the winner, logs the merge, and deletes the loser entities.

  ```python
  def execute_merge(conn, winner_id: int, loser_ids: list[int], reason: str = "auto_canonicalize"):
      """Execute the entity merge."""
      with conn.cursor() as cur:
          # 1. Remap mentions
          cur.execute(
              "UPDATE mentions SET entity_id = %s WHERE entity_id = ANY(%s)",
              (winner_id, loser_ids)
          )

          # 2. Remap relations (both subj and obj)
          cur.execute(
              "UPDATE relations SET subj_entity_id = %s WHERE subj_entity_id = ANY(%s)",
              (winner_id, loser_ids)
          )
          cur.execute(
              "UPDATE relations SET obj_entity_id = %s WHERE obj_entity_id = ANY(%s)",
              (winner_id, loser_ids)
          )

          # 3. Remap aliases
          cur.execute(
              "UPDATE entity_aliases SET entity_id = %s WHERE entity_id = ANY(%s)",
              (winner_id, loser_ids)
          )

          # 4. Record the merge in the log
          for loser_id in loser_ids:
              cur.execute(
                  """
                  INSERT INTO entity_merge_log(from_entity_id, to_entity_id, reason)
                  VALUES (%s, %s, %s)
                  """,
                  (loser_id, winner_id, reason)
              )

          # 5. Delete the merged (loser) entities
          cur.execute(
              "DELETE FROM entities WHERE entity_id = ANY(%s)",
              (loser_ids,)
          )
  ```
- src/paperlib_mcp/server.py:20-20 (registration): Import, in the main server, of the registration function for the graph canonicalization tools (which include `merge_entities`).

  ```python
  from paperlib_mcp.tools.graph_canonicalize import register_graph_canonicalize_tools
  ```
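
To see the effect of the five merge steps end to end, the following sketch replays them against an in-memory SQLite database. It is an adaptation, not the server's code: SQLite has no `ANY(...)`, so the sketch uses `IN (...)` with generated placeholders. The table layouts are assumptions reduced to the columns named in the queries above, plus hypothetical `name`, `mention_id`, and `alias` columns, and the rows are made-up demo data.

```python
import sqlite3


def execute_merge_sqlite(conn, winner_id, loser_ids, reason="auto_canonicalize"):
    """SQLite adaptation of the merge steps (IN (...) instead of PostgreSQL's ANY)."""
    marks = ",".join("?" for _ in loser_ids)
    cur = conn.cursor()
    # Steps 1-3: repoint mentions, relations (both sides), and aliases at the winner
    for table, column in [
        ("mentions", "entity_id"),
        ("relations", "subj_entity_id"),
        ("relations", "obj_entity_id"),
        ("entity_aliases", "entity_id"),
    ]:
        cur.execute(
            f"UPDATE {table} SET {column} = ? WHERE {column} IN ({marks})",
            (winner_id, *loser_ids),
        )
    # Step 4: one log row per merged entity
    cur.executemany(
        "INSERT INTO entity_merge_log(from_entity_id, to_entity_id, reason) VALUES (?, ?, ?)",
        [(loser_id, winner_id, reason) for loser_id in loser_ids],
    )
    # Step 5: delete the merged entities
    cur.execute(f"DELETE FROM entities WHERE entity_id IN ({marks})", tuple(loser_ids))
    conn.commit()


def build_demo_db():
    """Create a tiny demo database; schema and rows are illustrative assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entities(entity_id INTEGER PRIMARY KEY, type TEXT, name TEXT);
        CREATE TABLE mentions(mention_id INTEGER PRIMARY KEY, entity_id INTEGER);
        CREATE TABLE relations(subj_entity_id INTEGER, obj_entity_id INTEGER);
        CREATE TABLE entity_aliases(alias TEXT, entity_id INTEGER);
        CREATE TABLE entity_merge_log(from_entity_id INTEGER, to_entity_id INTEGER, reason TEXT);
        INSERT INTO entities VALUES (1, 'Method', 'DID'), (2, 'Method', 'diff-in-diff'), (3, 'Topic', 'wages');
        INSERT INTO mentions VALUES (10, 1), (11, 2);
        INSERT INTO relations VALUES (2, 3);
        INSERT INTO entity_aliases VALUES ('difference-in-differences', 2);
    """)
    return conn


conn = build_demo_db()
execute_merge_sqlite(conn, winner_id=1, loser_ids=[2], reason="manual duplicate")
remaining = [row[0] for row in conn.execute("SELECT entity_id FROM entities ORDER BY entity_id")]
mention_targets = [row[0] for row in conn.execute("SELECT entity_id FROM mentions ORDER BY mention_id")]
```

After the call, entity 2 is gone, both mentions point at entity 1, and the merge log holds one row recording the merge of 2 into 1. The order matters: references are repointed before the delete, so nothing is left pointing at a removed entity.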