aigo666

MCP Development Framework

parse_markdown

Extract structured content from Markdown files including headings, lists, and text for processing in development workflows.

Instructions

解析Markdown文件内容,提取标题结构、列表和文本内容 (Parse the contents of a Markdown file, extracting the heading structure, lists, and text content)

Input Schema

Name | Required | Description | Default
file_path | Yes | Local path to the Markdown file, for example '/path/to/document.md' | (none)
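
As a rough illustration of what this schema asks of a caller, the sketch below checks an arguments dictionary by hand. `check_arguments` is a hypothetical helper, not part of the framework; it only mirrors the two constraints visible in the schema (the key is required and its value must be a string).

```python
from typing import Any, Dict, List

# Trimmed copy of the tool's input schema (descriptions omitted).
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["file_path"],
    "properties": {
        "file_path": {"type": "string"},
    },
}


def check_arguments(arguments: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means valid)."""
    problems = []
    # Every key listed under "required" must be present.
    for key in INPUT_SCHEMA["required"]:
        if key not in arguments:
            problems.append(f"missing required parameter '{key}'")
    # Keys declared as strings must actually hold a string.
    for key, spec in INPUT_SCHEMA["properties"].items():
        if key in arguments and spec["type"] == "string" and not isinstance(arguments[key], str):
            problems.append(f"parameter '{key}' must be a string")
    return problems


print(check_arguments({"file_path": "/path/to/document.md"}))  # []
print(check_arguments({}))  # ["missing required parameter 'file_path'"]
```

The tool itself only checks that the key is present; the type check here is an extra step a client could add.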

Implementation Reference

  • Main handler method for the parse_markdown tool. It validates input, processes the file path, and delegates to the private parsing method.
    async def execute(self, arguments: Dict[str, Any]) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Parse a Markdown file.

        Args:
            arguments: Argument dictionary; must contain the 'file_path' key.

        Returns:
            A list of parse results.
        """
        # Return a "missing required parameter" error if 'file_path' is absent.
        if "file_path" not in arguments:
            return [types.TextContent(
                type="text",
                text="错误: 缺少必要参数 'file_path'"
            )]
        
        # Resolve the file path, converting mounted-directory paths where needed.
        file_path = self.process_file_path(arguments["file_path"])
        
        return await self._parse_markdown_file(file_path)
  • Input schema defining the required 'file_path' parameter for the tool.
    input_schema = {
        "type": "object",
        "required": ["file_path"],
        "properties": {
            "file_path": {
                "type": "string",
                # Description text: "Local path to the Markdown file, e.g. '/path/to/document.md'"
                "description": "Markdown文件的本地路径,例如'/path/to/document.md'",
            }
        },
    }
  • Registers the tool class with the name 'parse_markdown' using ToolRegistry.
    @ToolRegistry.register
    class MarkdownTool(BaseTool):
        """
        Tool for parsing Markdown files; extracts text content, heading structure, lists, and related information.
        """
        
        name = "parse_markdown"
  • Private helper method containing the core logic for parsing Markdown files, including file validation, content reading, structure analysis, and error handling.
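    async def _parse_markdown_file(self, file_path: str) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Parse the contents of a Markdown file.

        Args:
            file_path: Path to the Markdown file.

        Returns:
            A list of parse results for the file's content.
        """
        results = []
        
        # Check that the file exists; if not, return a "file does not exist" error asking the caller to verify the path.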
        if not os.path.exists(file_path):
            return [types.TextContent(
                type="text",
                text=f"错误: 文件不存在: {file_path}\n请检查路径是否正确,并确保文件可访问。"
            )]
        
        # Check the file extension; only .md files are accepted (case-insensitive).
        if not file_path.lower().endswith('.md'):
            return [types.TextContent(
                type="text",
                text=f"错误: 不支持的文件格式: {file_path}\n仅支持.md格式的Markdown文件。"
            )]
        
        try:
            # Add a summary header with the file size in KB.
            file_size_kb = os.path.getsize(file_path) / 1024
            results.append(types.TextContent(
                type="text",
                text=f"# Markdown文件解析\n\n文件大小: {file_size_kb:.2f} KB"
            ))
            
            # Read the file content as UTF-8.
            with open(file_path, 'r', encoding='utf-8') as file:
                content = file.read()
            
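            # Basic file information: name, path, size, and last-modified time (raw epoch seconds from os.path.getmtime).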
            file_info = f"## 文件基本信息\n\n"
            file_info += f"- 文件名: {os.path.basename(file_path)}\n"
            file_info += f"- 路径: {file_path}\n"
            file_info += f"- 大小: {file_size_kb:.2f} KB\n"
            file_info += f"- 最后修改时间: {os.path.getmtime(file_path)}\n"
            
            results.append(types.TextContent(
                type="text",
                text=file_info
            ))
            
            # Analyze the structure of the Markdown content.
            structure = self._analyze_markdown_structure(content)
            results.append(types.TextContent(
                type="text",
                text=structure
            ))
            
            # Append the raw content, wrapped in a fenced markdown block.
            results.append(types.TextContent(
                type="text",
                text=f"## 原始Markdown内容\n\n```markdown\n{content}\n```"
            ))
            
            # Append a completion message.
            results.append(types.TextContent(
                type="text",
                text="Markdown文件处理完成!"
            ))
            
            return results
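        except Exception as e:
            # Return a single error item listing likely causes (incompatible encoding, corrupted file, malformed content) followed by the full traceback.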
            error_details = traceback.format_exc()
            return [types.TextContent(
                type="text",
                text=f"错误: 解析Markdown文件失败: {str(e)}\n"
                     f"可能的原因:\n"
                     f"1. 文件编码不兼容\n"
                     f"2. 文件已损坏\n"
                     f"3. 文件内容格式异常\n\n"
                     f"详细错误信息: {error_details}"
            )]
  • Supporting utility that analyzes the structure of Markdown content: it collects headings by level and counts code blocks, list items, links, images, and table rows.
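    def _analyze_markdown_structure(self, content: str) -> str:
        """
        Analyze the structure of a Markdown document.

        Args:
            content: The Markdown file content.

        Returns:
            The structure analysis report as text.
        """
        lines = content.split('\n')
        
        # Headings, collected per level.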
        headings = {
            "h1": [],
            "h2": [],
            "h3": [],
            "h4": [],
            "h5": [],
            "h6": []
        }
        
        # Element counters.
        code_blocks = 0
        lists = 0
        links = 0
        images = 0
        tables = 0
        
        in_code_block = False
        
        for line in lines:
            line = line.strip()
            
            # Detect fenced code blocks; a block is counted when its closing fence is seen.
            if line.startswith('```'):
                in_code_block = not in_code_block
                if not in_code_block:
                    code_blocks += 1
                continue
                
            if in_code_block:
                continue
                
            # Detect headings (h1 to h6), identified by one to six '#' characters followed by a space.
            if line.startswith('# '):
                headings["h1"].append(line[2:])
            elif line.startswith('## '):
                headings["h2"].append(line[3:])
            elif line.startswith('### '):
                headings["h3"].append(line[4:])
            elif line.startswith('#### '):
                headings["h4"].append(line[5:])
            elif line.startswith('##### '):
                headings["h5"].append(line[6:])
            elif line.startswith('###### '):
                headings["h6"].append(line[7:])
                
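            # Detect list items: '-', '*' or '+' markers, or a leading digit with a '.' in the first three characters.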
            if line.startswith('- ') or line.startswith('* ') or line.startswith('+ ') or \
               (line and line[0].isdigit() and '.' in line[:3]):
                lists += 1
                
            # Detect links and images.
            if '](' in line:
                if line.count('![') > 0:
                    images += line.count('![')
                links += line.count('](') - line.count('![')
                
            # Detect table rows (lines that start and end with '|', separator rows included).
            if line.startswith('|') and line.endswith('|'):
                tables += 1
                
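        # Build the structure report (the two headings below read "Markdown structure analysis" and "Heading structure").
        structure = "## Markdown结构分析\n\n"
        
        # Heading structure: one indented bullet per heading, grouped by level rather than by document order.
        structure += "### 标题结构\n\n"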
        has_headings = False
        for level, titles in headings.items():
            if titles:
                has_headings = True
                indent = "  " * (int(level[1]) - 1)
                for title in titles:
                    structure += f"{indent}- {title}\n"
                    
        # If no headings were found, say so in the report.
        if not has_headings:
            structure += "文档中未检测到标题结构\n"
            
        # Element counts: code blocks, list items, links, images, and table rows.
        structure += "\n### 内容元素统计\n\n"
        structure += f"- 代码块: {code_blocks} 个\n"
        structure += f"- 列表项: {lists} 个\n"
        structure += f"- 链接: {links} 个\n"
        structure += f"- 图片: {images} 个\n"
        structure += f"- 表格行: {tables} 行\n"
        
        return structure 
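
To see how these line-based heuristics behave, here is a condensed, standalone sketch of the same counting rules applied to a small sample. `count_elements` is a hypothetical function written for this illustration, not the tool's own method, and it leaves out the report formatting.

```python
from typing import Dict


def count_elements(content: str) -> Dict[str, int]:
    """Condensed restatement of the counting rules, for illustration only."""
    counts = {"headings": 0, "code_blocks": 0, "lists": 0,
              "links": 0, "images": 0, "table_rows": 0}
    in_code_block = False
    for raw in content.split("\n"):
        line = raw.strip()
        if line.startswith("`" * 3):
            in_code_block = not in_code_block
            if not in_code_block:      # count a block when its closing fence is seen
                counts["code_blocks"] += 1
            continue
        if in_code_block:              # nothing inside a fence is analysed
            continue
        # A heading is one to six '#' characters followed by a space.
        hashes = len(line) - len(line.lstrip("#"))
        if 1 <= hashes <= 6 and line[hashes:hashes + 1] == " ":
            counts["headings"] += 1
        if line.startswith(("- ", "* ", "+ ")) or (line and line[0].isdigit() and "." in line[:3]):
            counts["lists"] += 1
        if "](" in line:
            counts["images"] += line.count("![")
            counts["links"] += line.count("](") - line.count("![")
        if line.startswith("|") and line.endswith("|"):
            counts["table_rows"] += 1
    return counts


FENCE = "`" * 3  # built this way so the sample holds no literal fence line
sample = "\n".join([
    "# Title",
    "## Section",
    "- item with a [link](https://example.com)",
    "1. numbered item",
    "![logo](logo.png)",
    "| a | b |",
    "|---|---|",
    "| 1 | 2 |",
    FENCE + "python",
    "# a comment, not a heading",
    FENCE,
])
print(count_elements(sample))
# {'headings': 2, 'code_blocks': 1, 'lists': 2, 'links': 1, 'images': 1, 'table_rows': 3}
```

Two consequences are visible in the output: the `|---|---|` separator counts as a table row, so a two-row table reports three rows, and the `#` line inside the fence is not treated as a heading.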
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states what the tool does (parsing and extraction) but doesn't describe behavioral traits like error handling (e.g., what happens with invalid file paths or non-Markdown files), performance characteristics, output format, or whether it modifies files. The description is functional but lacks transparency about how the tool behaves in different scenarios.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
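
The implementation listed earlier does cover the two failure cases the description leaves out: a path that does not exist and a file without a .md extension, checked in that order. A caller could mirror those checks before invoking the tool. `preflight` below is a hypothetical helper with English messages, not the tool's own code.

```python
import os
import tempfile
from typing import Optional


def preflight(file_path: str) -> Optional[str]:
    """Hypothetical helper: return an error message, or None if the path looks usable."""
    # Same order as the tool: existence first, then extension.
    if not os.path.exists(file_path):
        return f"file does not exist: {file_path}"
    if not file_path.lower().endswith(".md"):
        return f"unsupported file format: {file_path}"
    return None


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "notes.MD")   # the extension check is case-insensitive
    bad = os.path.join(tmp, "notes.txt")
    for path in (good, bad):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("# hello\n")
    print(preflight(good))                                          # None
    print(preflight(bad) is not None)                               # True
    print(preflight(os.path.join(tmp, "missing.md")) is not None)   # True
```

The tool opens the file in read mode only, so none of these paths modify it.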

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: a single sentence in Chinese that efficiently states the tool's purpose and extraction targets. It's front-loaded with the core function and wastes no words. Every element earns its place by specifying what is parsed (Markdown files) and what is extracted (headings, lists, text).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (parsing and extracting structured data from files), lack of annotations, and no output schema, the description is incomplete. It doesn't explain what the extracted data looks like (e.g., structured JSON, plain text), error cases, or limitations. For a tool that presumably returns parsed content, the absence of output information is a significant gap, making it inadequate for an agent to understand the full context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
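
Reading the implementation listed earlier, a successful call appears to return five text items in a fixed order, and a failed call a single item whose text starts with "错误" ("Error"). The sketch below models that shape with a stand-in dataclass. `TextItem` and `describe_result` are hypothetical names used only here, standing in for the `types.TextContent` objects the tool returns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextItem:
    """Stand-in for types.TextContent (assumption: only 'type' and 'text' matter here)."""
    text: str
    type: str = "text"


# Order of the items on success, as read from _parse_markdown_file.
SUCCESS_SECTIONS = [
    "summary header with file size",
    "basic file information",
    "structure analysis",
    "raw Markdown content",
    "completion message",
]


def describe_result(items: List[TextItem]) -> str:
    """Hypothetical helper: classify a result list by its length and first item."""
    if len(items) == 1 and items[0].text.startswith("错误"):
        return "error"
    if len(items) == len(SUCCESS_SECTIONS):
        return "success"
    return "unknown"


ok = [TextItem(text=name) for name in SUCCESS_SECTIONS]
failed = [TextItem(text="错误: 文件不存在: /tmp/missing.md")]
print(describe_result(ok))      # success
print(describe_result(failed))  # error
```

All items are plain text; the structure analysis is a formatted report string, not JSON.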

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'file_path' well-documented in the schema. The description adds no additional parameter information beyond what's in the schema. According to scoring rules, when schema_description_coverage is high (>80%), the baseline is 3 even with no param info in the description. The description doesn't compensate but doesn't need to given the schema's completeness.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: '解析Markdown文件内容' (parse Markdown file content) with specific extraction targets: '提取标题结构、列表和文本内容' (extract heading structure, lists, and text content). It distinguishes itself from sibling tools like parse_csv and parse_pdf by specifying Markdown format, though it doesn't explicitly differentiate from parse_file which might handle multiple formats. The purpose is specific but could be more precise about sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to choose parse_markdown over parse_file (which might handle Markdown among other formats) or other parsing tools like parse_pdf. There's no context about prerequisites, file format requirements, or error conditions. Usage is implied by the tool name and description but lacks explicit guidelines.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
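
One way a client might apply such guidance is to route by file extension. The mapping below is an assumption made for illustration: only the `.md` restriction is confirmed by the parse_markdown implementation, while the extensions handled by `parse_pdf`, `parse_csv`, and `parse_file` are guessed from their names.

```python
import os

# Assumed mapping; only the '.md' entry is confirmed by the parse_markdown source.
TOOL_BY_EXTENSION = {
    ".md": "parse_markdown",
    ".pdf": "parse_pdf",
    ".csv": "parse_csv",
}


def choose_tool(file_path: str) -> str:
    """Hypothetical router: pick a tool name from the file extension."""
    extension = os.path.splitext(file_path)[1].lower()
    # Fall back to the general-purpose tool for anything unrecognised.
    return TOOL_BY_EXTENSION.get(extension, "parse_file")


print(choose_tool("/path/to/document.md"))     # parse_markdown
print(choose_tool("/path/to/README.MD"))       # parse_markdown
print(choose_tool("/path/to/notes.markdown"))  # parse_file
```

The last case matters because parse_markdown rejects any path that does not end in `.md`, so a `.markdown` file would need a different tool or a rename.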

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/aigo666/mcp-framework'
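
The same request can be made from Python with the standard library. This is a sketch under one assumption the page does not state, namely that the endpoint returns a JSON body; `server_url` and `fetch_server_info` are names chosen here.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the server endpoint URL used in the curl example."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server_info(owner: str, name: str, timeout: float = 10.0) -> Any:
    """Send the GET request and decode the response (assumes a JSON body)."""
    request = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access; call fetch_server_info to send the request.
print(server_url("aigo666", "mcp-framework"))
# https://glama.ai/api/mcp/v1/servers/aigo666/mcp-framework
```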

If you have feedback or need assistance with the MCP directory API, please join our Discord server.