Glama

summarize_webpage

Extract and condense webpage content to a specified fraction of its original length, returning a concise summary for quick understanding. Input a URL and, optionally, a target compression ratio.

Instructions

Scrape a webpage and summarize it to a specified fraction of its length (default 20%).

Args:
    url: URL of the webpage to scrape and summarize
    target_ratio: Target compression ratio, between 0.1 and 1.0

Returns:
    A summary of the webpage content

Input Schema

Name          Required  Description                                  Default
url           Yes       URL of the webpage to scrape and summarize   —
target_ratio  No        Target compression ratio, 0.1–1.0            0.2
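As a quick illustration, a hypothetical call to this tool might carry an argument payload like the one below (the MCP client and transport are omitted, and the URL is a placeholder):

```python
# Hypothetical argument payload for the summarize_webpage tool.
arguments = {
    "url": "https://example.com/article",  # required
    "target_ratio": 0.3,                   # optional; the handler defaults to 0.2
}

# The handler rejects ratios outside [0.1, 1.0] before scraping:
valid = 0.1 <= arguments.get("target_ratio", 0.2) <= 1.0
print(valid)  # → True
```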

Implementation Reference

  • The primary handler function for the 'summarize_webpage' tool. It validates parameters, scrapes the webpage using the WebScraper instance, summarizes the title and content using the ContentSummarizer instance, and returns a formatted summary with title and URL.
    @mcp.tool()
    async def summarize_webpage(url: str, ctx: Context, target_ratio: float = 0.2) -> str:
        """
        Scrape a webpage and summarize it to a specified fraction of its length (default 20%).

        Args:
            url: URL of the webpage to scrape and summarize
            target_ratio: Target compression ratio, between 0.1 and 1.0

        Returns:
            A summary of the webpage content
        """
        try:
            # Validate parameters
            if not 0.1 <= target_ratio <= 1.0:
                return "Error: target_ratio must be between 0.1 and 1.0"

            ctx.info(f"Scraping and summarizing webpage: {url}")

            # Scrape the page first
            title, content = await scraper.scrape_url(url)

            # Then summarize
            full_content = f"Page title: {title}\n\n{content}"
            summary = await summarizer.summarize_content(full_content, target_ratio)

            ctx.info("Webpage scraped and summarized")
            return f"Webpage: {url}\nTitle: {title}\n\nSummary:\n{summary}"

        except Exception as e:
            logger.error(f"Webpage summarization failed: {e}")
            return f"Webpage summarization failed: {str(e)}"
  • The @mcp.tool() decorator registers the summarize_webpage function as an MCP tool.
    @mcp.tool()
  • Function signature and docstring defining the input parameters (url: str, ctx: Context, target_ratio: float=0.2) and output (str), serving as the tool schema.
    async def summarize_webpage(url: str, ctx: Context, target_ratio: float = 0.2) -> str:
        """
        抓取网页内容并总结为指定比例的长度(默认20%)
        
        Args:
            url: 要抓取和总结的网页URL
            target_ratio: 目标压缩比例,0.1-1.0之间
        
        Returns:
            网页内容总结
        """
  • WebScraper.scrape_url: Supporting function that fetches and parses the webpage HTML using httpx and BeautifulSoup, extracts the title, and cleans the text content.
    async def scrape_url(self, url: str) -> tuple[str, str]:
        """
        Scrape webpage content.

        Args:
            url: Target URL

        Returns:
            (title, content): The page title and the cleaned text content
        """
        try:
            response = await self.session.get(url)
            response.raise_for_status()

            # Parse the HTML with BeautifulSoup
            soup = BeautifulSoup(response.text, 'html.parser')

            # Extract the title
            title = soup.find('title')
            title = title.get_text().strip() if title else "Untitled"

            # Remove script and style tags
            for script in soup(["script", "style"]):
                script.decompose()

            # Extract the main text content
            content = soup.get_text()

            # Clean up whitespace
            lines = (line.strip() for line in content.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            content = ' '.join(chunk for chunk in chunks if chunk)

            return title, content

        except Exception as e:
            logger.error(f"Failed to scrape webpage {url}: {e}")
            raise Exception(f"Unable to scrape webpage: {str(e)}") from e
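The whitespace-cleaning pass at the end of scrape_url can be exercised on its own; the sketch below wraps it in a hypothetical clean_text helper (not part of the server) to show its effect:

```python
# Standalone sketch of scrape_url's cleaning pass: strip each line,
# split on double-space runs, and rejoin the non-empty chunks.
def clean_text(raw: str) -> str:
    lines = (line.strip() for line in raw.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    return " ".join(chunk for chunk in chunks if chunk)

raw = "  Heading  \n\n   body text   with    gaps \n"
print(clean_text(raw))  # → Heading body text with gaps
```

This collapses blank lines and multi-space gaps into single spaces, which is usually what you want before feeding page text to a summarizer.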
  • ContentSummarizer.summarize_content: Supporting function that uses an OpenAI-compatible API (MiniMax) to summarize the provided content to the target ratio, with an optional custom prompt.
    async def summarize_content(self, content: str, target_ratio: float = 0.2,
                                custom_prompt: str | None = None) -> str:
        """
        Summarize content using a large language model.

        Args:
            content: The content to summarize
            target_ratio: Target compression ratio (default 20%)
            custom_prompt: Custom summarization prompt

        Returns:
            The summarized content
        """
        try:
            # Guard against over-long input (rough token estimate)
            if len(content) > MAX_INPUT_TOKENS * 3:
                content = content[:MAX_INPUT_TOKENS * 3]
                logger.warning("Content too long; truncated")

            # Build the summarization prompt
            if custom_prompt:
                prompt = custom_prompt
            else:
                target_length = min(max(int(len(content) * target_ratio), 100), 1000)

                prompt = f"""Summarize the following content into a concise version of about {target_length} characters, preserving the core information and key points:

{content}

Summary requirements:
1. Preserve the main points and logical structure of the original
2. Remove redundant and secondary information
3. Use clear, concise language
4. Keep the information accurate and complete"""

            response = self.client.chat.completions.create(
                model=OPENAI_MODEL,
                messages=[
                    {"role": "system", "content": "You are a professional summarization expert, skilled at compressing long text into concise digests."},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=MAX_OUTPUT_TOKENS,
                temperature=0.1
            )

            return response.choices[0].message.content.strip()

        except Exception as e:
            logger.error(f"Content summarization failed: {e}")
            return f"Summarization failed: {str(e)}"
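The target-length clamp means summaries never aim below 100 or above 1000 characters, regardless of page size. A small sketch of that arithmetic (the helper name is illustrative, not part of the server):

```python
# Mirror of summarize_content's length calculation: scale the content
# length by the ratio, then clamp to the [100, 1000] character range.
def target_length(content_chars: int, ratio: float) -> int:
    return min(max(int(content_chars * ratio), 100), 1000)

print(target_length(200, 0.2))    # → 100  (floor kicks in for short pages)
print(target_length(3000, 0.2))   # → 600
print(target_length(20000, 0.2))  # → 1000 (ceiling for long pages)
```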
