find_similar_news
Find news articles similar to a given title by adjusting similarity threshold and result limit.
Instructions
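Find other news items similar to a given news title.
Args:
- reference_title: News title (full or partial)
- threshold: Similarity threshold between 0 and 1, default 0.6. Note: a higher threshold makes matching stricter and returns fewer results.
- limit: Maximum number of results, default 50, maximum 100. Note: the actual number returned depends on the similarity matches and may be lower than requested.
- include_url: Whether to include URL links, default False (saves tokens)
Returns: JSON-formatted list of similar news, including similarity scores
Important: data display strategy
- This tool returns the complete list of similar news
- Default display: show all returned news items (including similarity scores)
- Filter the list only when the user explicitly asks for a "summary" or "highlights"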
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
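| reference_title | Yes | News title to match against (full or partial) | |
| threshold | No | Similarity threshold between 0 and 1; higher values match more strictly and return fewer results | 0.6 |
| limit | No | Maximum number of results (documented maximum 100); fewer may be returned | 50 |
| include_url | No | Whether to include URL links (off by default to save tokens) | False |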
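To illustrate how these parameters fit together, here is a minimal sketch that packages and sanity-checks the arguments on the client side before calling the tool. The helper name `build_find_similar_news_args` is hypothetical and not part of the server. The 0-1 range and the cap of 100 are taken from the tool description; clamping `limit` locally is an assumption about what a client might do, not a statement about how the server treats larger values.

```python
from typing import Any, Dict


def build_find_similar_news_args(
    reference_title: str,
    threshold: float = 0.6,
    limit: int = 50,
    include_url: bool = False,
) -> Dict[str, Any]:
    """Hypothetical client-side helper: validate and package the tool arguments."""
    if not reference_title.strip():
        raise ValueError("reference_title must not be empty")
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Assumption: clamp to the documented maximum of 100 on the client side.
    limit = min(limit, 100)
    return {
        "reference_title": reference_title,
        "threshold": threshold,
        "limit": limit,
        "include_url": include_url,
    }


if __name__ == "__main__":
    print(build_find_similar_news_args("Tesla price cut", threshold=0.7, limit=10))
```

Leaving `include_url` at its default keeps responses smaller, which matches the token-saving rationale given in the tool description.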
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | JSON-formatted string containing the similar news list with similarity scores | |
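Since the tool returns its result as a JSON string, a caller typically has to decode it and check the `success` flag first. The sketch below shows one way to do that, using the field names `success`, `error`, and `similar_news` that appear in the handler listed in the Implementation Reference. The function name `extract_similar_news` is hypothetical, and the exact shape of the `error` object for non-internal errors is an assumption, so it is read defensively.

```python
import json
from typing import Any, Dict, List


def extract_similar_news(raw: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: decode the tool's JSON string and return the news list."""
    payload = json.loads(raw)
    if not payload.get("success"):
        # Assumption: the error object carries "code" and "message" keys.
        error = payload.get("error", {})
        code = error.get("code", "UNKNOWN")
        message = error.get("message", "")
        raise RuntimeError(f"{code}: {message}")
    return payload["similar_news"]


if __name__ == "__main__":
    # Illustrative payload only; the values are made up for this example.
    sample = json.dumps(
        {
            "success": True,
            "summary": {"total_found": 1, "returned_count": 1},
            "similar_news": [{"title": "Example title", "similarity": 0.8}],
        }
    )
    for item in extract_similar_news(sample):
        print(item["similarity"], item["title"])
```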
Implementation Reference
- mcp_server/tools/analytics.py:910-1028 (handler) Actual handler implementation of find_similar_news in the AnalyticsTools class. It reads titles from the data service, computes similarity with SequenceMatcher, filters by threshold, sorts by similarity, and returns the results.

    def find_similar_news(
        self,
        reference_title: str,
        threshold: float = 0.6,
        limit: int = 50,
        include_url: bool = False
    ) -> Dict:
        """
        Similar news lookup: find related news based on title similarity.

        Args:
            reference_title: Reference title
            threshold: Similarity threshold (between 0 and 1)
            limit: Maximum number of results, default 50
            include_url: Whether to include URL links, default False (saves tokens)

        Returns:
            List of similar news

        Examples:
            Example user requests:
            - "Find news similar to 'Tesla price cut'"
            - "Look for similar coverage of the iPhone launch"
            - "Check whether there are reports similar to this news item"

            Example code call:
            >>> tools = AnalyticsTools()
            >>> result = tools.find_similar_news(
            ...     reference_title="Tesla announces price cut",
            ...     threshold=0.6,
            ...     limit=10
            ... )
            >>> print(result['similar_news'])
        """
        try:
            # Validate parameters
            reference_title = validate_keyword(reference_title)

            if not 0 <= threshold <= 1:
                raise InvalidParameterError(
                    "threshold 必须在 0 到 1 之间",  # "threshold must be between 0 and 1"
                    suggestion="推荐值:0.5-0.8"  # "recommended value: 0.5-0.8"
                )

            limit = validate_limit(limit, default=50)

            # Read the data
            all_titles, id_to_name, _ = self.data_service.parser.read_all_titles_for_date()

            # Compute similarity for every title on every platform
            similar_items = []

            for platform_id, titles in all_titles.items():
                platform_name = id_to_name.get(platform_id, platform_id)

                for title, info in titles.items():
                    # Skip the reference title itself
                    if title == reference_title:
                        continue

                    # Compute similarity
                    similarity = self._calculate_similarity(reference_title, title)

                    if similarity >= threshold:
                        news_item = {
                            "title": title,
                            "platform": platform_id,
                            "platform_name": platform_name,
                            "similarity": round(similarity, 3),
                            "rank": info["ranks"][0] if info["ranks"] else 0
                        }

                        # Add the URL field only when requested
                        if include_url:
                            news_item["url"] = info.get("url", "")

                        similar_items.append(news_item)

            # Sort by similarity, highest first
            similar_items.sort(key=lambda x: x["similarity"], reverse=True)

            # Apply the result limit
            result_items = similar_items[:limit]

            if not result_items:
                raise DataNotFoundError(
                    # "no news found with similarity above {threshold}"
                    f"未找到相似度超过 {threshold} 的新闻",
                    # "lower the similarity threshold or try another title"
                    suggestion="请降低相似度阈值或尝试其他标题"
                )

            result = {
                "success": True,
                "summary": {
                    "total_found": len(similar_items),
                    "returned_count": len(result_items),
                    "requested_limit": limit,
                    "threshold": threshold,
                    "reference_title": reference_title
                },
                "similar_news": result_items
            }

            if len(similar_items) < limit:
                # "only {n} similar news items found at similarity threshold {threshold}"
                result["note"] = f"相似度阈值 {threshold} 下仅找到 {len(similar_items)} 条相似新闻"

            return result

        except MCPError as e:
            return {
                "success": False,
                "error": e.to_dict()
            }
        except Exception as e:
            return {
                "success": False,
                "error": {
                    "code": "INTERNAL_ERROR",
                    "message": str(e)
                }
            }

- mcp_server/server.py:398-431 (registration) MCP tool registration via the @mcp.tool decorator. The async function find_similar_news delegates to tools['analytics'].find_similar_news() and returns JSON.
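
    @mcp.tool
    async def find_similar_news(
        reference_title: str,
        threshold: float = 0.6,
        limit: int = 50,
        include_url: bool = False
    ) -> str:
        """
        Find other news items similar to a given news title.

        Args:
            reference_title: News title (full or partial)
            threshold: Similarity threshold between 0 and 1, default 0.6
                Note: a higher threshold makes matching stricter and returns fewer results
            limit: Maximum number of results, default 50, maximum 100
                Note: the actual number returned depends on the similarity matches
                and may be lower than requested
            include_url: Whether to include URL links, default False (saves tokens)

        Returns:
            JSON-formatted list of similar news, including similarity scores

        **Important: data display strategy**
        - This tool returns the complete list of similar news
        - **Default display**: show all returned news items (including similarity scores)
        - Filter the list only when the user explicitly asks for a "summary" or "highlights"
        """
        tools = _get_tools()
        result = tools['analytics'].find_similar_news(
            reference_title=reference_title,
            threshold=threshold,
            limit=limit,
            include_url=include_url
        )
        return json.dumps(result, ensure_ascii=False, indent=2)

- Helper method _calculate_similarity using difflib.SequenceMatcher.ratio() to compute text similarity.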

    def _calculate_similarity(self, text1: str, text2: str) -> float:
        """
        Compute the similarity of two texts.

        Args:
            text1: First text
            text2: Second text

        Returns:
            Similarity score (between 0 and 1)
        """
        # Use SequenceMatcher to compute the similarity
        return SequenceMatcher(None, text1, text2).ratio()

- mcp_server/server.py:29-37 (helper) Registration/initialization of the AnalyticsTools singleton in _get_tools(), which is called by the MCP tool handler.

    def _get_tools(project_root: Optional[str] = None):
        """Get or create the tool instances (singleton pattern)."""
        if not _tools_instances:
            _tools_instances['data'] = DataQueryTools(project_root)
            _tools_instances['analytics'] = AnalyticsTools(project_root)
            _tools_instances['search'] = SearchTools(project_root)
            _tools_instances['config'] = ConfigManagementTools(project_root)
            _tools_instances['system'] = SystemManagementTools(project_root)
        return _tools_instances

- Function signature defining input parameters: reference_title (str), threshold (float, default 0.6), limit (int, default 50), include_url (bool, default False).

    def find_similar_news(
        self,
        reference_title: str,
        threshold: float = 0.6,
        limit: int = 50,
        include_url: bool = False
    ) -> Dict:
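
The core of the handler (score each title with `SequenceMatcher.ratio()`, keep scores at or above the threshold, sort descending, then cut to `limit`) can be tried in isolation. The following is a simplified sketch under stated assumptions, not the project's code: `rank_similar_titles` is a hypothetical name, the input is a flat list of titles rather than the per-platform structure the data service returns, and the platform, rank, and URL fields are omitted.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Union


def rank_similar_titles(
    reference_title: str,
    candidates: List[str],
    threshold: float = 0.6,
    limit: int = 50,
) -> List[Dict[str, Union[str, float]]]:
    """Hypothetical standalone version of the filter-sort-limit steps."""
    matches: List[Dict[str, Union[str, float]]] = []
    for title in candidates:
        # Skip exact copies of the reference title, as the handler does.
        if title == reference_title:
            continue
        similarity = SequenceMatcher(None, reference_title, title).ratio()
        if similarity >= threshold:
            matches.append({"title": title, "similarity": round(similarity, 3)})
    # Highest similarity first, then apply the result limit.
    matches.sort(key=lambda item: item["similarity"], reverse=True)
    return matches[:limit]


if __name__ == "__main__":
    titles = [
        "Tesla announces price cut",
        "Tesla announces price cuts in China",
        "New phone model released this week",
    ]
    for match in rank_similar_titles("Tesla announces price cut", titles):
        print(match)
```

Because `ratio()` compares character sequences rather than meaning, two headlines about the same event with different wording can score low, which is one reason to lower `threshold` when few results come back.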