Glama
aigo666

MCP Development Framework

parse_markdown

Parses a Markdown file at a local path and extracts its heading structure, lists, and text content.

Instructions

Parses Markdown file content, extracting the heading structure, lists, and text content.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Local path to the Markdown file, e.g. '/path/to/document.md' | |
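Under the MCP specification, a client invokes this tool with a JSON-RPC `tools/call` request. The request's `params` carry the tool `name` and an `arguments` object holding `file_path`. The following is a minimal sketch that only builds that message. The request id and path are placeholders, `build_parse_markdown_call()` is a hypothetical helper, and the transport (stdio, HTTP) is out of scope.

```python
import json


def build_parse_markdown_call(request_id: int, file_path: str) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for parse_markdown (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "parse_markdown",
            # The only argument declared in the input schema.
            "arguments": {"file_path": file_path},
        },
    }


request = build_parse_markdown_call(1, "/path/to/document.md")
print(json.dumps(request, indent=2))
```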

Implementation Reference

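  • The execute() method is the main handler for the parse_markdown tool. It checks that the file_path argument is present and normalizes the path with process_file_path(), which per its comment also translates paths in mounted directories. It then delegates to _parse_markdown_file().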
    async def execute(self, arguments: Dict[str, Any]) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Parse a Markdown file.

        Args:
            arguments: Argument dictionary; must contain the 'file_path' key.

        Returns:
            List of parse results.
        """
        if "file_path" not in arguments:
            return [types.TextContent(
                type="text",
                text="Error: missing required argument 'file_path'"
            )]
        
        # Normalize the file path, translating mounted-directory paths if needed
        file_path = self.process_file_path(arguments["file_path"])
        
        return await self._parse_markdown_file(file_path)
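  • The _parse_markdown_file() helper method performs these steps:
    - checks that the file exists and has a .md extension;
    - reads it as UTF-8 and reports the file size and metadata;
    - analyzes its structure;
    - returns the results as a list of TextContent items, or a single error item on failure.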
    async def _parse_markdown_file(self, file_path: str) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Parse the contents of a Markdown file.

        Args:
            file_path: Path to the Markdown file.

        Returns:
            List of parse results for the Markdown file.
        """
        results = []
        
        # Check that the file exists
        if not os.path.exists(file_path):
            return [types.TextContent(
                type="text",
                text=f"Error: file not found: {file_path}\nCheck that the path is correct and that the file is accessible."
            )]
        
        # Check the file extension
        if not file_path.lower().endswith('.md'):
            return [types.TextContent(
                type="text",
                text=f"Error: unsupported file format: {file_path}\nOnly Markdown files with the .md extension are supported."
            )]
        
        try:
            # Add a file-size summary
            file_size_kb = os.path.getsize(file_path) / 1024
            results.append(types.TextContent(
                type="text",
                text=f"# Markdown File Parsing\n\nFile size: {file_size_kb:.2f} KB"
            ))
            
            # Read the file contents as UTF-8
            with open(file_path, 'r', encoding='utf-8') as file:
                content = file.read()
            
            # Basic file information
            file_info = "## Basic File Information\n\n"
            file_info += f"- File name: {os.path.basename(file_path)}\n"
            file_info += f"- Path: {file_path}\n"
            file_info += f"- Size: {file_size_kb:.2f} KB\n"
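            # os.path.getmtime() returns seconds since the epoch; render it as a
            # readable local time (assumes `import time` at module level)
            file_info += f"- Last modified: {time.ctime(os.path.getmtime(file_path))}\n"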
            
            results.append(types.TextContent(
                type="text",
                text=file_info
            ))
            
            # Analyze the Markdown structure
            structure = self._analyze_markdown_structure(content)
            results.append(types.TextContent(
                type="text",
                text=structure
            ))
            
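            # Append the raw content, fenced with more backticks than any run in the
            # file so embedded code fences cannot close it early (needs `import re`)
            longest_run = max((len(run) for run in re.findall(r"`+", content)), default=0)
            fence = "`" * max(3, longest_run + 1)
            results.append(types.TextContent(
                type="text",
                text=f"## Raw Markdown Content\n\n{fence}markdown\n{content}\n{fence}"
            ))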
            
            # Append a completion notice
            results.append(types.TextContent(
                type="text",
                text="Markdown file processed successfully!"
            ))
            
            return results
        except Exception as e:
            error_details = traceback.format_exc()
            return [types.TextContent(
                type="text",
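                text=f"Error: failed to parse Markdown file: {e}\n"
                     "Possible causes:\n"
                     "1. Incompatible file encoding (the file is read as UTF-8)\n"
                     "2. The file is corrupted\n"
                     "3. The file content is malformed\n\n"
                     f"Details: {error_details}"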
            )]
  • The _analyze_markdown_structure() helper method collects headings (h1-h6) outside fenced code blocks. It counts fenced code blocks, list items, inline links, images, and table rows, and returns a Markdown-formatted analysis report.
    def _analyze_markdown_structure(self, content: str) -> str:
        """
        Analyze the structure of a Markdown document.

        Args:
            content: Markdown file contents.

        Returns:
            Structure analysis report as a Markdown string.
        """
        lines = content.split('\n')
        
        # Headings, grouped by level
        headings = {
            "h1": [],
            "h2": [],
            "h3": [],
            "h4": [],
            "h5": [],
            "h6": []
        }
        
        # Element counters
        code_blocks = 0
        lists = 0
        links = 0
        images = 0
        tables = 0
        
        in_code_block = False
        
        for line in lines:
            line = line.strip()
            
            # Detect fenced code blocks (counted when the closing fence is reached)
            if line.startswith('```'):
                in_code_block = not in_code_block
                if not in_code_block:
                    code_blocks += 1
                continue
                
            if in_code_block:
                continue
                
            # Detect ATX headings
            if line.startswith('# '):
                headings["h1"].append(line[2:])
            elif line.startswith('## '):
                headings["h2"].append(line[3:])
            elif line.startswith('### '):
                headings["h3"].append(line[4:])
            elif line.startswith('#### '):
                headings["h4"].append(line[5:])
            elif line.startswith('##### '):
                headings["h5"].append(line[6:])
            elif line.startswith('###### '):
                headings["h6"].append(line[7:])
                
            # Detect list items: a bullet marker, or a leading digit with '.' in the first three characters
            if line.startswith('- ') or line.startswith('* ') or line.startswith('+ ') or \
               (line and line[0].isdigit() and '.' in line[:3]):
                lists += 1
                
            # Detect inline links and images (every image also contains '](', so images are subtracted)
            if '](' in line:
                if line.count('![') > 0:
                    images += line.count('![')
                links += line.count('](') - line.count('![')
                
            # Detect table rows: lines that start and end with '|' (separator rows included)
            if line.startswith('|') and line.endswith('|'):
                tables += 1
                
        # Build the structure report
        structure = "## Markdown Structure Analysis\n\n"

        # Heading structure
        structure += "### Heading Structure\n\n"
        has_headings = False
        for level, titles in headings.items():
            if titles:
                has_headings = True
                indent = "  " * (int(level[1]) - 1)
                for title in titles:
                    structure += f"{indent}- {title}\n"
                    
        if not has_headings:
            structure += "No headings were detected in the document.\n"
            
        # Content element statistics
        structure += "\n### Content Element Statistics\n\n"
        structure += f"- Code blocks: {code_blocks}\n"
        structure += f"- List items: {lists}\n"
        structure += f"- Links: {links}\n"
        structure += f"- Images: {images}\n"
        structure += f"- Table rows: {tables}\n"
        
        return structure 
  • The input_schema declares a single required parameter, 'file_path': a string containing the local path to the Markdown file.
    input_schema = {
        "type": "object",
        "required": ["file_path"],
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Local path to the Markdown file, e.g. '/path/to/document.md'",
            }
        },
    }
  • The @ToolRegistry.register decorator registers the MarkdownTool class under the name 'parse_markdown'. That name is set by the class attribute name (line 17 of the source file).
    @ToolRegistry.register
    class MarkdownTool(BaseTool):
        """
        Tool for parsing Markdown files and extracting text content, heading structure, lists, and related information.
        """
        
        name = "parse_markdown"
        description = "Parses Markdown file content, extracting the heading structure, lists, and text content"
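In _analyze_markdown_structure(), headings are stored in one list per level, and the report prints all h1 entries first, then all h2 entries, and so on. The indented "heading structure" therefore loses document order. For example, "# A / ## A1 / # B / ## B1" is reported as A, B, A1, B1 instead of nesting each h2 under its h1. The following is a minimal order-preserving sketch. heading_outline() is a hypothetical replacement, not the framework's code. Like the original, it only recognizes backtick fences; unlike the original, it also strips an optional closing '#' sequence.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, the text, and an optional
# closing run of '#' characters that must be preceded by whitespace.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def heading_outline(content: str) -> str:
    """Return an indented outline of ATX headings, in document order."""
    outline = []
    in_code_block = False
    for raw_line in content.split("\n"):
        line = raw_line.strip()
        # Toggle on backtick fence lines so headings inside code blocks are skipped.
        if line.startswith("```"):
            in_code_block = not in_code_block
            continue
        if in_code_block:
            continue
        match = HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            # Indent two spaces per level below h1, as the original report does.
            outline.append("  " * (level - 1) + "- " + match.group(2))
    return "\n".join(outline)


doc = "# Intro\n## Setup\n```\n# not a heading\n```\n## Usage\n### Options\n# Appendix ##"
print(heading_outline(doc))
```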

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must cover behavioral traits. It mentions content extraction but does not disclose whether the operation is read-only, whether the file must exist, or how errors are handled. This leaves significant behavioral gaps for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
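One way to close these gaps is to declare MCP tool annotations. The description can also state read-only behavior, error cases, and the return format. The following sketches such a tool definition, assuming an MCP protocol revision that supports tool annotations. The description text is a suggested rewording, not the tool's current description.

```python
import json

# A tool definition as it could appear in a tools/list response.
# The description wording is a suggestion, not the tool's current text.
PARSE_MARKDOWN_TOOL = {
    "name": "parse_markdown",
    "description": (
        "Read-only: parses a local .md file and returns text blocks with file "
        "metadata, the heading structure, element counts, and the raw content. "
        "Returns an error message if the file is missing or is not a .md file."
    ),
    "inputSchema": {
        "type": "object",
        "required": ["file_path"],
        "properties": {"file_path": {"type": "string"}},
    },
    "annotations": {
        "readOnlyHint": True,    # the tool only reads the file
        "openWorldHint": False,  # it touches only the local filesystem
    },
}

print(json.dumps(PARSE_MARKDOWN_TOOL, indent=2))
```

The specification treats `destructiveHint` and `idempotentHint` as meaningful only when `readOnlyHint` is false, so they are omitted here.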

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence covering the core action and extracted content. It is front-loaded and efficient, though a bit more structure (e.g., listing extracted elements separately) could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple (one parameter, no output schema), yet the description does not specify the return format or structure, which an agent needs in order to interpret the results. Because there is no output schema, this omission matters even more.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter (file_path) is fully described in the schema (100% coverage). The schema description already includes an example path, which is helpful. However, the tool description adds no parameter meaning beyond the schema, so the baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
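execute() checks only that file_path is present, not that it is a string. The following stdlib-only sketch enforces the schema's `required` and `type` keywords before a call is dispatched. INPUT_SCHEMA mirrors the input_schema shown earlier (descriptions omitted). validate_arguments() is a hypothetical helper, not part of the framework, and it handles only the two keywords this schema uses. A full JSON Schema validator would be the more general choice.

```python
from typing import Any

INPUT_SCHEMA = {
    "type": "object",
    "required": ["file_path"],
    "properties": {"file_path": {"type": "string"}},
}

# Python types for the JSON Schema type names this schema uses.
JSON_TYPES = {"string": str, "object": dict}


def validate_arguments(schema: dict, arguments: Any) -> list[str]:
    """Return error messages; an empty list means the arguments are valid."""
    if not isinstance(arguments, dict):
        return ["arguments must be an object"]
    # Every key listed under "required" must be present.
    errors = [f"missing required argument '{key}'"
              for key in schema.get("required", []) if key not in arguments]
    # Present keys must match their declared type.
    for key, spec in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(spec.get("type"))
        if key in arguments and expected and not isinstance(arguments[key], expected):
            errors.append(f"argument '{key}' must be of type {spec['type']}")
    return errors


print(validate_arguments(INPUT_SCHEMA, {"file_path": 42}))
```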

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool parses Markdown files and extracts specific elements (heading structure, lists, text), making its purpose explicit. Its Markdown focus naturally distinguishes it from sibling tools such as parse_csv or parse_pdf.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use or avoid this tool versus alternatives is provided. The description only states what it does, leaving the agent to infer applicability without explicit context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/aigo666/mcp-framework'
