
MCP Sheet Parser

by yuqie6

parse_sheet

Convert Excel and CSV files into structured JSON data for analysis and processing. Get file overviews with row/column counts and previews, or retrieve complete datasets with optional styling information.

Instructions

Parses Excel/CSV files into structured JSON data. By default it returns a file overview (row count, column count, data types, a preview of the first few rows) to avoid context overload. The LLM can use parameters to control whether full data, style information, etc. are returned. Suitable for data analysis and processing; modified data can be written back with apply_changes.

Input Schema

  • file_path (required) — Absolute path to the target spreadsheet file. Supported formats: .csv, .xlsx, .xls, .xlsb, .xlsm.
  • sheet_name (optional) — Name of the worksheet to parse. If omitted, the first worksheet is used.
  • range_string (optional) — Cell range such as 'A1:D10'. When a range is given, the full data for that range is returned.
  • include_full_data (optional, default false) — Whether to return the full dataset. When false, only an overview and preview are returned; when true, all rows are returned. For large files, check the overview first.
  • include_styles (optional, default false) — Whether to include style information (fonts, colors, borders, etc.). Styles significantly increase the data volume.
  • preview_rows (optional, default 5) — Number of preview rows. When include_full_data is false, the first N rows are returned as a data preview.
  • max_rows (optional) — Maximum number of rows to return. Limits the data volume for large files; rows beyond the limit are truncated with a notice.
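
To make the parameter interactions concrete, here is a sketch of two argument payloads — an overview call that relies on the defaults, and a targeted call that pulls a specific range from a named sheet. The file path and sheet name are placeholders, not values from the source:

```python
import json

# Overview call: relies on the defaults (include_full_data=False, preview_rows=5)
overview_args = {"file_path": "/data/sales.xlsx"}

# Targeted call: full data for an explicit range on a named sheet
range_args = {
    "file_path": "/data/sales.xlsx",
    "sheet_name": "Q1",
    "range_string": "A1:D10",
    "include_styles": False,
}

# Only file_path is required by the schema
print(json.dumps(range_args, indent=2))
```

Note that per the schema semantics above, range_string overrides the overview behavior: when a range is given, the full data for that range is returned regardless of include_full_data.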

Implementation Reference

  • Registers all tools including parse_sheet by calling register_tools on the MCP Server instance.
    # Register all tools
    register_tools(server)
  • Defines the input schema and description for the parse_sheet tool in the MCP tool list.
    Tool(
        name="parse_sheet",
        description="Parses Excel/CSV files into structured JSON data. By default returns a file overview (row count, column count, data types, a preview of the first few rows) to avoid context overload. The LLM can use parameters to control whether full data, style information, etc. are returned. Suitable for data analysis and processing; modified data can be written back with apply_changes.",
        inputSchema={
            "type": "object",
            "properties": {
                "file_path": {
                    "type": "string",
                    "description": "Absolute path to the target spreadsheet file. Supported formats: .csv, .xlsx, .xls, .xlsb, .xlsm."
                },
                "sheet_name": {
                    "type": "string",
                    "description": "[Optional] Name of the worksheet to parse. If omitted, the first worksheet is used."
                },
                "range_string": {
                    "type": "string",
                    "description": "[Optional] Cell range, e.g. 'A1:D10'. When a range is given, the full data for that range is returned."
                },
                "include_full_data": {
                    "type": "boolean",
                    "description": "[Optional, default false] Whether to return the full dataset. When false, only an overview and preview are returned; when true, all rows are returned. For large files, check the overview first."
                },
                "include_styles": {
                    "type": "boolean",
                    "description": "[Optional, default false] Whether to include style information (fonts, colors, borders, etc.). Styles significantly increase the data volume."
                },
                "preview_rows": {
                    "type": "integer",
                    "description": "[Optional, default 5] Number of preview rows. When include_full_data is false, the first N rows are returned as a data preview."
                },
                "max_rows": {
                    "type": "integer",
                    "description": "[Optional] Maximum number of rows to return. Limits the data volume for large files; rows beyond the limit are truncated with a notice."
                }
            },
            "required": ["file_path"]
        }
    ),
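    The required/type constraints in this schema can be exercised with a minimal, hand-rolled checker. This is a sketch for illustration only — the server actually validates inside the handler shown next, and a production setup would typically use a JSON Schema library:

```python
from typing import Any

# Python types corresponding to the parse_sheet input schema
_SCHEMA_TYPES = {
    "file_path": str,
    "sheet_name": str,
    "range_string": str,
    "include_full_data": bool,
    "include_styles": bool,
    "preview_rows": int,
    "max_rows": int,
}


def check_arguments(arguments: dict[str, Any]) -> list[str]:
    """Return a list of schema violations (empty when the payload is valid)."""
    errors = []
    if "file_path" not in arguments:
        errors.append("missing required property: file_path")
    for key, value in arguments.items():
        expected = _SCHEMA_TYPES.get(key)
        if expected is None:
            errors.append(f"unknown property: {key}")
        # bool is a subclass of int in Python, so reject booleans for integer fields
        elif expected is int and isinstance(value, bool):
            errors.append(f"{key} must be an integer, not a boolean")
        elif not isinstance(value, expected):
            errors.append(f"{key} must be of type {expected.__name__}")
    return errors
```

The explicit bool-before-int check matters because `isinstance(True, int)` is `True` in Python, which would otherwise let `preview_rows=true` slip through.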
  • MCP tool handler for parse_sheet: validates arguments, calls CoreService.parse_sheet_optimized, formats response as JSON.
    async def _handle_parse_sheet(arguments: dict[str, Any], core_service: CoreService) -> list[TextContent]:
        """Handle a parse_sheet tool call without blowing up the context window."""

        try:
            # Fetch and validate arguments
            file_path = arguments["file_path"]
            if not isinstance(file_path, str) or not file_path.strip():
                raise ValueError("file_path must be a non-empty string")

            sheet_name = arguments.get("sheet_name")
            if sheet_name is not None and not isinstance(sheet_name, str):
                raise ValueError("sheet_name must be a string")

            range_string = arguments.get("range_string")
            if range_string is not None and not isinstance(range_string, str):
                raise ValueError("range_string must be a string")

            include_full_data = arguments.get("include_full_data", False)
            if not isinstance(include_full_data, bool):
                raise ValueError("include_full_data must be a boolean")

            include_styles = arguments.get("include_styles", False)
            if not isinstance(include_styles, bool):
                raise ValueError("include_styles must be a boolean")

            preview_rows = arguments.get("preview_rows", 5)
            if not isinstance(preview_rows, int) or preview_rows <= 0:
                raise ValueError("preview_rows must be a positive integer")

            max_rows = arguments.get("max_rows")
            if max_rows is not None and (not isinstance(max_rows, int) or max_rows <= 0):
                raise ValueError("max_rows must be a positive integer or None")
    
            result = core_service.parse_sheet_optimized(
                file_path=file_path,
                sheet_name=sheet_name,
                range_string=range_string,
                include_full_data=include_full_data,
                include_styles=include_styles,
                preview_rows=preview_rows,
                max_rows=max_rows
            )
    
            # Add usage guidance for the LLM
            result["llm_guidance"] = {
                "current_mode": "overview" if not include_full_data else "full_data",
                "next_steps": _generate_next_steps_guidance(result, include_full_data, include_styles),
                "data_access": {
                    "headers": "result['headers'] - column headers",
                    "preview": "result['preview_rows'] - data preview",
                    "full_data": "set include_full_data=true to get the full dataset",
                    "styles": "set include_styles=true to get style information"
                }
            }
    
            response = {
                "success": True,
                "operation": "parse_sheet",
                "data": result
            }
    
            return [TextContent(
                type="text",
                text=json.dumps(response, ensure_ascii=False, indent=2)
            )]
    
        except FileNotFoundError as e:
            return [TextContent(
                type="text",
                text=json.dumps({
                    "success": False,
                    "error_type": "file_not_found",
                    "error_message": f"File not found: {str(e)}",
                    "suggestion": "Check that the file path is correct. Supported formats: .xlsx, .xls, .xlsb, .xlsm, .csv"
                }, ensure_ascii=False, indent=2)
            )]
        except ValueError as e:
            return [TextContent(
                type="text",
                text=json.dumps({
                    "success": False,
                    "error_type": "invalid_parameter",
                    "error_message": f"Invalid parameter: {str(e)}",
                    "suggestion": "Check that sheet_name exists and that range_string is well-formed (e.g. 'A1:D10')"
                }, ensure_ascii=False, indent=2)
            )]
        except Exception as e:
            return [TextContent(
                type="text",
                text=json.dumps({
                    "success": False,
                    "error_type": "parsing_error",
                    "error_message": f"Parsing failed: {str(e)}",
                    "suggestion": "Check whether the file is corrupted, or try specifying an explicit worksheet name and range"
                }, ensure_ascii=False, indent=2)
            )]
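    The error branches all emit the same three-field envelope. A standalone sketch of the file_not_found variant (TextContent replaced by a plain JSON string for illustration):

```python
import json


def file_not_found_response(detail: str) -> str:
    # Mirrors the envelope built in the handler's FileNotFoundError branch
    return json.dumps({
        "success": False,
        "error_type": "file_not_found",
        "error_message": f"File not found: {detail}",
        "suggestion": "Check that the file path is correct. Supported formats: .xlsx, .xls, .xlsb, .xlsm, .csv",
    }, ensure_ascii=False, indent=2)


payload = json.loads(file_not_found_response("/tmp/missing.xlsx"))
```

Because every failure mode returns success=False with an error_type and a suggestion rather than raising, the calling agent always receives machine-readable JSON it can recover from.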
  • Core optimized parsing logic invoked by the tool handler, handles file parsing, sheet selection, range extraction, preview/full data, styles.
    def parse_sheet_optimized(self, file_path: str, sheet_name: str | None = None,
                             range_string: str | None = None, include_full_data: bool = False,
                             include_styles: bool = False, preview_rows: int = 5,
                             max_rows: int | None = None) -> dict[str, Any]:
        """
        参数:
            file_path: 文件路径
            sheet_name: 工作表名称(可选)
            range_string: 单元格范围(可选)
            include_full_data: 是否返回完整数据(默认False,只返回概览)
            include_styles: 是否包含样式信息(默认False)
            preview_rows: 预览行数(默认5行)
            max_rows: 最大返回行数(可选)
    
        返回:
            优化后的JSON数据
        """
        try:
            # 验证文件输入
            validated_path, _ = validate_file_input(file_path)
    
            # 获取解析器
            parser = self.parser_factory.get_parser(str(validated_path))
    
            # 解析文件
            sheets = parser.parse(str(validated_path))
    
            # 选择目标工作表
            if sheet_name:
                target_sheet = next((s for s in sheets if s.name == sheet_name), None)
                if not target_sheet:
                    available_sheets = [s.name for s in sheets]
                    raise ValueError(f"工作表 '{sheet_name}' 不存在。可用工作表: {available_sheets}")
            else:
                if not sheets:
                    raise ValueError("文件中没有找到任何工作表。")
                target_sheet = sheets[0]
    
            # 处理范围选择
            if range_string:
                try:
                    start_row, start_col, end_row, end_col = parse_range_string(range_string)
                    return self._extract_range_data(target_sheet, start_row, start_col, end_row, end_col, include_styles)
                except ValueError as e:
                    raise ValueError(f"范围格式错误: {e}")
    
            # 根据参数返回不同级别的数据
            return self._extract_optimized_data(
                target_sheet,
                include_full_data=include_full_data,
                include_styles=include_styles,
                preview_rows=preview_rows,
                max_rows=max_rows
            )
    
        except Exception as e:
            logger.error(f"优化解析失败: {e}")
            raise
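    The parse_range_string helper is not shown on this page. A typical A1-style implementation returning 0-based (start_row, start_col, end_row, end_col) — an assumption about its contract, inferred from how the tuple is consumed above — might look like:

```python
import re


def parse_range_string(range_string: str) -> tuple[int, int, int, int]:
    """Parse an A1-style range like 'A1:D10' into 0-based (start_row, start_col, end_row, end_col)."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", range_string.upper())
    if match is None:
        raise ValueError(f"invalid range: {range_string!r}")

    def col_to_index(col: str) -> int:
        # Base-26 column letters: 'A' -> 0, 'Z' -> 25, 'AA' -> 26, ...
        index = 0
        for ch in col:
            index = index * 26 + (ord(ch) - ord("A") + 1)
        return index - 1

    c1, r1, c2, r2 = match.group(1), int(match.group(2)), match.group(3), int(match.group(4))
    return r1 - 1, col_to_index(c1), r2 - 1, col_to_index(c2)
```

Whether the real helper uses 0-based or 1-based indices, or accepts open-ended ranges like 'A:D', is not documented here; the ValueError-on-bad-format behavior is the part the caller relies on.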
  • Primary parse_sheet method with caching, streaming decision, and fallback to optimized parsing.
    def parse_sheet(self, file_path: str, sheet_name: str | None = None,
                   range_string: str | None = None,
                   enable_streaming: bool = True,
                   streaming_threshold: int | None = None) -> dict[str, Any]:
        """
        解析表格文件为标准化的JSON格式。
    
        参数:
            file_path: 文件路径
            sheet_name: 工作表名称(可选)
            range_string: 单元格范围(可选)
            enable_streaming: 是否启用自动流式读取(默认True)
            streaming_threshold: 流式读取的单元格数量阈值,None时使用配置默认值
    
        返回:
            标准化的TableModel JSON
        """
        # 获取当前配置
        current_config = get_config()
        if streaming_threshold is None:
            streaming_threshold = current_config.streaming_threshold_cells
        try:
            # 验证文件输入
            validated_path, _ = validate_file_input(file_path)
            
            # 尝试从缓存获取数据
            cache_manager = get_cache_manager()
            cached_data = cache_manager.get(file_path, range_string, sheet_name)
            if cached_data is not None:
                logger.info(f"从缓存获取数据: {file_path}")
                return cached_data['data']
            
            # Obtain a parser
            parser = self.parser_factory.get_parser(str(validated_path))

            # Check whether streaming reads should be used
            if enable_streaming and self._should_use_streaming(str(validated_path), streaming_threshold):
                json_data = self._parse_sheet_streaming(str(validated_path), sheet_name, range_string)
            else:
                # Fall back to the traditional approach
                sheets = parser.parse(str(validated_path))

                # If a worksheet name was given, select that worksheet
                if sheet_name:
                    target_sheet = next((s for s in sheets if s.name == sheet_name), None)
                    if not target_sheet:
                        raise ValueError(f"Worksheet '{sheet_name}' does not exist.")
                # Otherwise, default to the first worksheet
                else:
                    if not sheets:
                        raise ValueError("No worksheets found in the file.")
                    target_sheet = sheets[0]
    
                # Check whether the worksheet is empty
                if not target_sheet:
                    logger.warning("Target worksheet is None")
                    return {
                        "sheet_name": "Empty",
                        "headers": [],
                        "rows": [],
                        "total_rows": 0,
                        "total_columns": 0,
                        "size_info": {
                            "total_cells": 0,
                            "processing_mode": "empty",
                            "recommendation": "The worksheet is empty; no data to display"
                        }
                    }

                if not target_sheet.rows:
                    logger.warning(f"Worksheet '{target_sheet.name}' is empty")
                    return {
                        "sheet_name": target_sheet.name,
                        "headers": [],
                        "rows": [],
                        "total_rows": 0,
                        "total_columns": 0,
                        "size_info": {
                            "total_cells": 0,
                            "processing_mode": "empty",
                            "recommendation": "The worksheet is empty; no data to display"
                        }
                    }

                # Convert to the standardized JSON format
                json_data = self._sheet_to_json(target_sheet, range_string)

            # Cache the parsed result
            cache_manager.set(file_path, json_data, range_string, sheet_name)
            logger.debug(f"Data cached: {file_path}")

            return json_data

        except Exception as e:
            logger.error(f"Failed to parse sheet: {e}")
            raise
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: default returns overview info to avoid context overload, mentions performance considerations for large files, and hints at output structure (概览信息 like row count, column count, data types, preview). However, it doesn't detail error handling, file size limits, or exact JSON schema, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: it starts with the core purpose, explains default behavior and rationale, mentions parameter control, and ends with usage context. Every sentence adds value without redundancy, and it's front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, no annotations, no output schema), the description does a good job of covering purpose, behavior, and usage. It mentions the output includes overview info and previews, compensating for the lack of output schema. However, it doesn't fully detail error cases or exact return formats, leaving some ambiguity for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 7 parameters thoroughly. The description adds some context by noting that the LLM can use parameters to control whether full data, style information, etc. are returned, which reinforces parameter purposes, but it doesn't provide significant additional semantics beyond what's in the schema. A baseline of 3 is appropriate given the high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: parsing Excel/CSV files into structured JSON data. It specifies the resource (Excel/CSV files) and the action (parse into JSON), and distinguishes itself from sibling tools by noting that modified data can be written back with apply_changes, showing it is for reading/parsing rather than conversion or writing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the tool: it is suited to data analysis and processing. It mentions an alternative tool (apply_changes) for writing back modifications, but doesn't explicitly state when not to use it or compare it to convert_to_html. The guidance is helpful but not fully comprehensive regarding all sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
