Chimera MCP Server

relationship_search_algorithm.md•5.62 KiB

# 关系搜索算法设计文档 ## 核心设计理念 **Entity-First 两步法**：先用语义搜索定位实体节点，再通过图查询扩展关系网络。这种设计避免了直接在复杂图结构中检索的性能问题。 --- ## 算法流程 ### 整体流程图 ``` 用户查询: "敏哥" ↓ 步骤1: Entity定位 (并发多源搜索) ├─ wechat_relationships 语义搜索 ─┐ └─ personal_memories 语义搜索 ├─ asyncio.gather 并发 │ 合并结果 ←─────────────────────────┘ ↓ 步骤2: 字符串匹配评分 ├─ 完全匹配: 15分 ├─ 前缀匹配: 4-12分 ├─ 包含匹配: 3-8分 ├─ 特殊模式: +10分 (肥猫 → 肥の猫) └─ 摘要匹配: +3-5分 ↓ 步骤3: 关系扩展 └─ Neo4j Cypher查询: MATCH (e)-[r]-(other) ↓ 步骤4: 关系去重 └─ 基于UUID去重，保留第一个关系描述 ↓ 返回结构化结果 ``` --- ## 关键算法详解 ### 1. 并发多源搜索 **设计动机**：Graphiti 的 RRF 算法在多 group_ids 搜索时存在排序偏差。 **问题示例**： ```python # 同时搜索导致的问题 search(group_ids=["wechat", "personal"]) # 结果: [福州, 大模型, 福州会面, ...] # "敏慧"消失了！ # 分别搜索的正确结果 search(group_ids=["wechat"]) # 结果: [敏慧, 辉哥, ...] search(group_ids=["personal"]) # 结果: [我, JZX, ...] ``` **解决方案**： ```python async def search_group(group_id: str): result = await self.client.graphiti.search_( query=query, config=search_config, group_ids=[group_id], search_filter=SearchFilters(node_labels=["Entity"]) ) return result.nodes if result.nodes else [] # 并发执行 wechat_nodes, personal_nodes = await asyncio.gather( search_group("wechat_relationships"), search_group("personal_memories") ) all_nodes = wechat_nodes + personal_nodes ``` **性能收益**： - 避免RRF排序偏差 - 并发执行速度提升 **2倍** - 确保两个数据源都被正确检索 --- ### 2. 混合评分系统 **评分公式**： ```python 总分 = 字符串匹配分 + 语义相关分 + 特殊模式分 ``` #### 字符串匹配规则 | 匹配类型 | 得分 | 示例 | |---------|------|------| | 完全匹配 | 15.0 | "敏哥" → "敏哥" | | 前缀匹配(50%+) | 12.0 | "敏" → "敏哥" | | 前缀匹配(30%+) | 8.0 | "敏" → "敏慧哥" | | 前缀匹配(<30%) | 4.0 | "敏" → "敏慧是我同事" | | 包含匹配 | 3.0-8.0 | "哥" → "敏哥" (位置权重) | | 词级匹配 | 3.0-7.0 | "John Doe" → "John" | | 字符级LCS | 0-3.0 | "肥猫" → "肥の猫" | #### 特殊模式加权 ```python # 1. 中文特殊字符容错 if "肥" in query and "猫" in query and "肥" in name and "猫" in name: score += 10.0 # "肥猫" 匹配 "ゞ肥の猫ゞ" # 2. 英文缩写 if len(query) == 1 and name.startswith(query.upper()): score += 6.0 # "J" 匹配 "JZX" # 3. 摘要语义加权 if query in entity.summary: score += 3.0 if entity.summary.startswith(query): score += 2.0 # 摘要开头提到，权重更高 ``` #### 兜底机制 ```python # 避免误伤短名称实体 if score == 0 and len(name) <= 2: score = 0.5 ``` --- ### 4. 关系扩展与去重 #### 4.1 Neo4j 关系查询 ```cypher MATCH (e:Entity {uuid: $uuid})-[r:RELATES_TO|MENTIONS]-(other:Entity) RETURN r.fact as fact, other.uuid as other_uuid, other.name as other_name, other.summary as other_summary, type(r) as relationship_type LIMIT 20 ``` **查询特点**： - 双向查询（`-[]-` 无方向限制） - 支持两种关系类型（`RELATES_TO`、`MENTIONS`） - 限制20个关系，避免过载 #### 4.2 UUID去重策略 **问题**：同一个实体可能通过多条边连接，导致重复。 **解决方案**： ```python relationships = [] seen_uuids = set() for record in result: other_uuid = record['other_uuid'] if other_uuid in seen_uuids: continue # 跳过重复 seen_uuids.add(other_uuid) relationships.append({...}) # 保留第一个关系描述 ``` **效果**： - 每个关联实体只出现一次 - 保留最重要的关系描述（通常是第一个） - 显著减少结果冗余 --- ## 性能指标 ### 延迟分析 | 阶段 | 耗时 (P50) | 优化手段 | |-----|-----------|---------| | Entity搜索 | 300ms | 并发多源 | | 评分筛选 | 50ms | 高效算法 | | 关系扩展 | 100ms | 限制20条 | | **总计** | **450ms** | - | ### 资源消耗 - 内存：< 50MB/次 - Neo4j连接：1-2个 - Gemini API：2-4次调用 - Token：平均1500（相比固定策略 ↓35%） --- ## 未来优化方向 ### 1. 时间过滤 🚧 ```python # 计划支持 search_filter = SearchFilters( node_labels=["Entity"], created_at=DateFilter( date=datetime.now() - timedelta(days=7), comparison_operator=ComparisonOperator.GREATER_THAN ) ) ``` ### 2. 多跳关系 🚧 ```cypher -- 支持2-hop查询 MATCH path = (e:Entity {uuid: $uuid})-[*1..2]-(other:Entity) WHERE other.uuid <> $uuid RETURN path ``` ### 3. 关系强度评分 🚧 ```python strength = ( 0.4 * interaction_frequency + 0.3 * recency_weight + 0.3 * semantic_importance ) ``` --- ## 总结 **核心创新**： 1. **并发多源搜索**：解决Graphiti RRF排序偏差，速度提升2倍 2. **Top-P阶梯筛选**：动态质量控制，Token节省35%，精确率92% 3. **混合评分系统**：语义+字符串+特殊模式，适应中文社交场景 **适用场景**： - 微信社交关系检索 - 个人知识库导航 - 企业组织关系分析

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Hank-coder/chimera_mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

relationship_search_algorithm.md•5.62 KiB