Glama

TrendRadar

by YangLang116

find_similar_news

Find news articles similar to a given title by adjusting similarity threshold and result limit.

Instructions

Find other news items similar to a given news title.

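Args:

  • reference_title: news title (full or partial)

  • threshold: similarity threshold between 0 and 1, default 0.6. Note: the higher the threshold, the stricter the matching and the fewer the results.

  • limit: maximum number of results, default 50, maximum 100. Note: the actual number returned depends on the similarity matches and may be less than requested.

  • include_url: whether to include URL links, default False (saves tokens)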

Returns: a JSON-formatted list of similar news, including similarity scores
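Because the tool returns a JSON string rather than a Python object, a client has to decode it and check the `success` flag before reading `similar_news`. A minimal sketch, assuming the payload shape built by the handler shown under Implementation Reference; `parse_similar_news` is a hypothetical helper, not part of TrendRadar, and the sample payload is illustrative:

```python
import json
from typing import Any


def parse_similar_news(payload: str) -> list[dict[str, Any]]:
    """Decode the tool's JSON string and return its similar_news items (hypothetical helper)."""
    data = json.loads(payload)
    if not data.get("success"):
        # The generic failure payload carries "code" and "message"; MCPError.to_dict()
        # is assumed here to use the same keys.
        error = data.get("error", {})
        raise RuntimeError(f"{error.get('code', 'UNKNOWN')}: {error.get('message', '')}")
    return data["similar_news"]


# Illustrative payload shaped like a successful handler result
sample = json.dumps({
    "success": True,
    "summary": {"total_found": 1, "returned_count": 1},
    "similar_news": [{"title": "特斯拉宣布全面降价", "platform": "weibo",
                      "platform_name": "Weibo", "similarity": 0.875, "rank": 3}],
}, ensure_ascii=False)

for item in parse_similar_news(sample):
    print(item["similarity"], item["title"])
```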

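Important: data presentation strategy

  • This tool returns the complete list of similar news

  • Default presentation: show all returned news items (including similarity scores)

  • Filter only when the user explicitly asks to "summarize" or "pick the highlights"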
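The presentation strategy above (show everything by default, cut down only on an explicit request) can be sketched as a small rendering function. This is an illustration under assumptions: `render_similar_news`, `summarize`, and `top_n` are hypothetical names, and the sample items are made up:

```python
from typing import Any


def render_similar_news(items: list[dict[str, Any]], summarize: bool = False,
                        top_n: int = 5) -> str:
    """Render items as numbered lines with scores (hypothetical helper).

    Every item is shown unless the user explicitly asked for a summary,
    in which case only the top_n highest-scoring items are kept.
    """
    # The tool already sorts by similarity, but sort defensively
    ordered = sorted(items, key=lambda x: x["similarity"], reverse=True)
    if summarize:
        ordered = ordered[:top_n]
    lines = [f"{i}. [{it['similarity']:.3f}] {it['title']} ({it['platform_name']})"
             for i, it in enumerate(ordered, start=1)]
    return "\n".join(lines)


news = [
    {"title": "Tesla cuts Model 3 prices", "platform_name": "Weibo", "similarity": 0.82},
    {"title": "Tesla announces price cut", "platform_name": "Zhihu", "similarity": 0.91},
]
print(render_similar_news(news))
```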

Input Schema

Name              Required   Description   Default
reference_title   Yes
threshold         No
limit             No
include_url       No

Output Schema

Name     Required   Description   Default
result   Yes

Implementation Reference

  • The actual handler implementation of find_similar_news in the AnalyticsTools class. It reads titles from the data service, computes similarity with difflib.SequenceMatcher, filters by threshold, sorts by similarity, and returns the results.
    def find_similar_news(
        self,
        reference_title: str,
        threshold: float = 0.6,
        limit: int = 50,
        include_url: bool = False
    ) -> Dict:
        """
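        Similar-news lookup: find related news by title similarity
    
        Args:
            reference_title: reference title
            threshold: similarity threshold (between 0 and 1)
            limit: maximum number of results, default 50
            include_url: whether to include URL links, default False (saves tokens)
    
        Returns:
            Dict with "success", a "summary" block and the "similar_news" list,
            or {"success": False, "error": {...}} on failure
    
        Examples:
            Example user queries:
            - "Find news similar to '特斯拉降价' (Tesla price cut)"
            - "Look for similar coverage of the iPhone launch"
            - "See whether there are reports similar to this one"
    
            Example code call:
            >>> tools = AnalyticsTools()
            >>> result = tools.find_similar_news(
            ...     reference_title="特斯拉宣布降价",  # "Tesla announces price cut"
            ...     threshold=0.6,
            ...     limit=10
            ... )
            >>> print(result['similar_news'])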
        """
        try:
            # Validate parameters
            reference_title = validate_keyword(reference_title)
    
            if not 0 <= threshold <= 1:
                raise InvalidParameterError(
                    "threshold 必须在 0 到 1 之间",  # "threshold must be between 0 and 1"
                    suggestion="推荐值:0.5-0.8"       # "recommended: 0.5-0.8"
                )
    
            limit = validate_limit(limit, default=50)
    
            # Load titles from the data service (no date argument is passed here)
            all_titles, id_to_name, _ = self.data_service.parser.read_all_titles_for_date()
    
            # Collect titles whose similarity meets the threshold
            similar_items = []
    
            for platform_id, titles in all_titles.items():
                platform_name = id_to_name.get(platform_id, platform_id)
    
                for title, info in titles.items():
                    if title == reference_title:
                        continue
    
                    # Compute similarity against the reference title
                    similarity = self._calculate_similarity(reference_title, title)
    
                    if similarity >= threshold:
                        news_item = {
                            "title": title,
                            "platform": platform_id,
                            "platform_name": platform_name,
                            "similarity": round(similarity, 3),
                            "rank": info["ranks"][0] if info["ranks"] else 0
                        }
    
                        # Add the URL field only when requested
                        if include_url:
                            news_item["url"] = info.get("url", "")
    
                        similar_items.append(news_item)
    
            # Sort by similarity, highest first
            similar_items.sort(key=lambda x: x["similarity"], reverse=True)
    
            # Apply the result limit
            result_items = similar_items[:limit]
    
            if not result_items:
                raise DataNotFoundError(
                    f"未找到相似度超过 {threshold} 的新闻",  # "No news found with similarity above {threshold}"
                    suggestion="请降低相似度阈值或尝试其他标题"  # "Lower the threshold or try another title"
                )
    
            result = {
                "success": True,
                "summary": {
                    "total_found": len(similar_items),
                    "returned_count": len(result_items),
                    "requested_limit": limit,
                    "threshold": threshold,
                    "reference_title": reference_title
                },
                "similar_news": result_items
            }
    
            if len(similar_items) < limit:
                # Note text: "Only {n} similar news items found at similarity threshold {threshold}"
                result["note"] = f"相似度阈值 {threshold} 下仅找到 {len(similar_items)} 条相似新闻"
    
            return result
    
        except MCPError as e:
            return {
                "success": False,
                "error": e.to_dict()
            }
        except Exception as e:
            return {
                "success": False,
                "error": {
                    "code": "INTERNAL_ERROR",
                    "message": str(e)
                }
            }
  • MCP tool registration via the @mcp.tool decorator. The async function find_similar_news delegates to tools['analytics'].find_similar_news() and returns the result as a JSON string.
    @mcp.tool
    async def find_similar_news(
        reference_title: str,
        threshold: float = 0.6,
        limit: int = 50,
        include_url: bool = False
    ) -> str:
        """
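        Find other news items similar to the given news title
    
        Args:
            reference_title: news title (full or partial)
            threshold: similarity threshold between 0 and 1, default 0.6
                       Note: the higher the threshold, the stricter the matching and the fewer the results
            limit: maximum number of results, default 50, maximum 100
                   Note: the actual number returned depends on the similarity matches and may be less than requested
            include_url: whether to include URL links, default False (saves tokens)
    
        Returns:
            JSON-formatted list of similar news, including similarity scores
    
        **Important: data presentation strategy**
        - This tool returns the complete list of similar news
        - **Default presentation**: show all returned news items (including similarity scores)
        - Filter only when the user explicitly asks to "summarize" or "pick the highlights"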
        """
        tools = _get_tools()
        result = tools['analytics'].find_similar_news(
            reference_title=reference_title,
            threshold=threshold,
            limit=limit,
            include_url=include_url
        )
        return json.dumps(result, ensure_ascii=False, indent=2)
  • Helper method _calculate_similarity using difflib.SequenceMatcher.ratio() to compute text similarity.
    def _calculate_similarity(self, text1: str, text2: str) -> float:
        """
        Compute the similarity between two texts
    
        Args:
            text1: first text
            text2: second text
    
        Returns:
            Similarity score (between 0 and 1)
        """
        # Use SequenceMatcher to compute the similarity ratio
        return SequenceMatcher(None, text1, text2).ratio()
  • Initialization of the AnalyticsTools singleton (together with the other tool classes) in _get_tools(), which the MCP tool handler calls.
    def _get_tools(project_root: Optional[str] = None):
        """Get or create the tool instances (singleton pattern)"""
        if not _tools_instances:
            _tools_instances['data'] = DataQueryTools(project_root)
            _tools_instances['analytics'] = AnalyticsTools(project_root)
            _tools_instances['search'] = SearchTools(project_root)
            _tools_instances['config'] = ConfigManagementTools(project_root)
            _tools_instances['system'] = SystemManagementTools(project_root)
        return _tools_instances
  • Function signature defining input parameters: reference_title (str), threshold (float, default 0.6), limit (int, default 50), include_url (bool, default False).
    def find_similar_news(
        self,
        reference_title: str,
        threshold: float = 0.6,
        limit: int = 50,
        include_url: bool = False
    ) -> Dict:
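The handler's core matching loop can be exercised without TrendRadar's data service by running it over an in-memory dictionary. A minimal sketch, assuming `read_all_titles_for_date()` returns titles grouped by platform ID, each with a `ranks` list and an optional `url` (inferred from how the handler reads `info`); `find_similar` and the sample titles are hypothetical, and a plain `ValueError` stands in for `InvalidParameterError`:

```python
from difflib import SequenceMatcher
from typing import Any

# Assumed shape of read_all_titles_for_date()'s first return value:
# {platform_id: {title: {"ranks": [int, ...], "url": str}}}
AllTitles = dict[str, dict[str, dict[str, Any]]]


def find_similar(all_titles: AllTitles, id_to_name: dict[str, str],
                 reference_title: str, threshold: float = 0.6,
                 limit: int = 50, include_url: bool = False) -> dict[str, Any]:
    """Hypothetical standalone version of the handler's matching loop."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")

    matches: list[dict[str, Any]] = []
    for platform_id, titles in all_titles.items():
        platform_name = id_to_name.get(platform_id, platform_id)
        for title, info in titles.items():
            if title == reference_title:
                continue  # never report the reference title itself
            score = SequenceMatcher(None, reference_title, title).ratio()
            if score < threshold:
                continue
            item = {
                "title": title,
                "platform": platform_id,
                "platform_name": platform_name,
                "similarity": round(score, 3),
                # First recorded rank, or 0 when no ranks are present
                "rank": info["ranks"][0] if info.get("ranks") else 0,
            }
            if include_url:
                item["url"] = info.get("url", "")
            matches.append(item)

    # Highest similarity first, then truncate to the requested limit
    matches.sort(key=lambda x: x["similarity"], reverse=True)
    return {"total_found": len(matches), "similar_news": matches[:limit]}


sample = {
    "weibo": {
        "特斯拉宣布降价": {"ranks": [1]},  # "Tesla announces price cut"
        "特斯拉宣布全面降价": {"ranks": [3], "url": "https://example.com/a"},
    },
    "zhihu": {"苹果发布新款iPhone": {"ranks": [2]}},  # "Apple releases new iPhone"
}
result = find_similar(sample, {"weibo": "Weibo"}, "特斯拉宣布降价", include_url=True)
print(result["total_found"], result["similar_news"][0]["similarity"])  # 1 0.875
```

Unlike the handler, this sketch returns an empty list instead of raising `DataNotFoundError` when nothing clears the threshold, and it tolerates entries without `ranks`.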
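The MCP wrapper serializes the handler's dict with `json.dumps(result, ensure_ascii=False, indent=2)`. With the default `ensure_ascii=True`, every Chinese character would be emitted as a `\uXXXX` escape; turning it off keeps titles readable and the payload shorter in characters. A quick standard-library comparison on an illustrative payload:

```python
import json

# Illustrative payload shaped like one similar_news entry
payload = {"title": "特斯拉宣布降价", "similarity": 0.875}

escaped = json.dumps(payload)                       # non-ASCII escaped as \uXXXX
readable = json.dumps(payload, ensure_ascii=False)  # characters kept as-is

print(escaped)
print(readable)
print(len(escaped), len(readable))
```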
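`SequenceMatcher.ratio()`, used by `_calculate_similarity`, returns 2·M/T, where M is the number of characters in matching blocks and T is the combined length of both strings. Because the comparison is character-level, a short partial title shares few characters relative to T and can fall below the default threshold of 0.6 even when it is a verbatim prefix. Worked through with the standard library (`ratio_by_hand` is an illustrative helper, not part of TrendRadar):

```python
from difflib import SequenceMatcher


def ratio_by_hand(a: str, b: str) -> float:
    """Recompute ratio() as 2*M/T for non-empty inputs (illustrative helper)."""
    sm = SequenceMatcher(None, a, b)
    # The final matching block is a zero-size sentinel, so summing sizes is safe
    matched = sum(block.size for block in sm.get_matching_blocks())
    return 2 * matched / (len(a) + len(b))


full = "特斯拉宣布降价"       # "Tesla announces price cut" (7 characters)
variant = "特斯拉宣布全面降价"  # adds "全面" ("across the board"), 9 characters
fragment = "特斯拉"           # "Tesla", a 3-character partial title

# Matching blocks "特斯拉宣布" (5) + "降价" (2): 2*7 / (7+9) = 0.875
print(SequenceMatcher(None, full, variant).ratio())
print(ratio_by_hand(full, variant))

# Verbatim prefix, but only 3 of 12 total characters match: 2*3 / 12 = 0.5 < 0.6
print(SequenceMatcher(None, fragment, variant).ratio())
```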
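`_get_tools()` builds all tool instances on the first call and returns the cached dict afterwards, so a `project_root` passed on any later call is silently ignored. A reduced sketch of that lazy-singleton pattern, with a hypothetical `AnalyticsToolsStub` standing in for the real tool classes and made-up paths:

```python
from typing import Optional

_instances: dict[str, object] = {}


class AnalyticsToolsStub:
    """Hypothetical stand-in for AnalyticsTools that only records project_root."""

    def __init__(self, project_root: Optional[str] = None):
        self.project_root = project_root


def get_tools(project_root: Optional[str] = None) -> dict[str, object]:
    # Instances are created only while the cache is empty; later calls reuse them
    if not _instances:
        _instances["analytics"] = AnalyticsToolsStub(project_root)
    return _instances


first = get_tools("/srv/trendradar")
second = get_tools("/somewhere/else")
print(first["analytics"] is second["analytics"])  # True
print(second["analytics"].project_root)           # /srv/trendradar
```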
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully covers behavioral traits: threshold controls match strictness, fewer than limit results may be returned, include_url saves tokens, and the display strategy clarifies the default presentation. All important traits are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is structured into Args, Returns, and a strategy note. It is clear and front-loaded; the strategy section is somewhat long for a parameter definition but adds value. A minor improvement would be to move the strategy into separate usage notes.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there are no annotations but an output schema exists, the description covers the return format, parameter behavior, and display strategy. It gives an agent complete guidance on invocation and result handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must supply the meaning. It explains each parameter: threshold with its effect and range, limit with its caveat, and include_url with its token-saving rationale. The description fully compensates for the missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The tool name 'find_similar_news' and the description '查找与指定新闻标题相似的其他新闻' ("find other news similar to the given news title") clearly state the action (find) and the resource (similar news). It distinguishes itself from sibling tools such as search_news (general search) and get_latest_news (recent news) by focusing on similarity matching.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides parameter details and a display strategy ('默认展示方式', i.e. the default presentation, versus a user request for a summary), but it does not explicitly state when to use this tool instead of alternatives such as search_news or search_related_news_history. Usage context is implied rather than directly guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/YangLang116/TrendRadar'
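The same lookup can be made from Python with only the standard library. A minimal sketch: `build_request` and `fetch_server_info` are hypothetical helper names, and the code assumes the endpoint returns a JSON object (the response format is not documented on this page):

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/YangLang116/TrendRadar"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    # Equivalent of `curl -X GET <url>`: a plain GET with no body
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0) -> dict:
    # Performs the network call; assumes the response body is a JSON object
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_server_info()` performs a live network request; `build_request()` alone only constructs it.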
