parse_sheet
Convert Excel/CSV files into structured JSON data. Retrieve file overviews (rows, columns, data types) or full content for analysis. Use parameters to include styles or limit data, and apply changes back to files.
Instructions
解析Excel/CSV文件为结构化JSON数据。默认返回文件概览信息(行数、列数、数据类型、前几行预览),避免上下文过载。LLM可通过参数控制是否获取完整数据、样式信息等。适合数据分析和处理,修改后可用apply_changes写回。
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | 目标表格文件的绝对路径,支持 .csv, .xlsx, .xls, .xlsb, .xlsm 格式。 | |
| include_full_data | No | 【可选,默认false】是否返回完整数据。false时只返回概览和预览,true时返回所有行数据。大文件建议先查看概览。 | |
| include_styles | No | 【可选,默认false】是否包含样式信息(字体、颜色、边框等)。样式信息会显著增加数据量。 | |
| max_rows | No | 【可选】最大返回行数。用于限制大文件的数据量,超出部分会被截断并提示。 | |
| preview_rows | No | 【可选,默认5】预览行数。当include_full_data为false时,返回前N行作为数据预览。 | |
| range_string | No | 【可选】单元格范围,如'A1:D10'。指定范围时会返回该范围的完整数据。 | |
| sheet_name | No | 【可选】要解析的工作表名称。如果留空,使用第一个工作表。 |
Implementation Reference
- src/models/tools.py:237-333 (handler)MCP tool handler for 'parse_sheet'. Validates arguments, calls CoreService.parse_sheet_optimized with parameters like include_full_data and include_styles, adds LLM guidance, and returns structured JSON response.async def _handle_parse_sheet(arguments: dict[str, Any], core_service: CoreService) -> list[TextContent]: """处理 parse_sheet 工具调用,避免上下文爆炸。""" try: # 获取和验证参数 file_path = arguments["file_path"] if not isinstance(file_path, str) or not file_path.strip(): raise ValueError("file_path必须是非空字符串") sheet_name = arguments.get("sheet_name") if sheet_name is not None and not isinstance(sheet_name, str): raise ValueError("sheet_name必须是字符串") range_string = arguments.get("range_string") if range_string is not None and not isinstance(range_string, str): raise ValueError("range_string必须是字符串") include_full_data = arguments.get("include_full_data", False) if not isinstance(include_full_data, bool): raise ValueError("include_full_data必须是布尔值") include_styles = arguments.get("include_styles", False) if not isinstance(include_styles, bool): raise ValueError("include_styles必须是布尔值") preview_rows = arguments.get("preview_rows", 5) if not isinstance(preview_rows, int) or preview_rows <= 0: raise ValueError("preview_rows必须是正整数") max_rows = arguments.get("max_rows") if max_rows is not None and (not isinstance(max_rows, int) or max_rows <= 0): raise ValueError("max_rows必须是正整数或None") result = core_service.parse_sheet_optimized( file_path=file_path, sheet_name=sheet_name, range_string=range_string, include_full_data=include_full_data, include_styles=include_styles, preview_rows=preview_rows, max_rows=max_rows ) # 为LLM添加使用指导 result["llm_guidance"] = { "current_mode": "overview" if not include_full_data else "full_data", "next_steps": _generate_next_steps_guidance(result, include_full_data, include_styles), "data_access": { "headers": "result['headers'] - 列标题", "preview": "result['preview_rows'] - 数据预览", "full_data": "设置 include_full_data=true 获取完整数据", "styles": "设置 include_styles=true 获取样式信息" } } response = { "success": True, "operation": "parse_sheet", "data": result } return [TextContent( type="text", text=json.dumps(response, ensure_ascii=False, indent=2) )] except FileNotFoundError as e: return [TextContent( type="text", text=json.dumps({ "success": False, "error_type": "file_not_found", "error_message": f"文件未找到: {str(e)}", "suggestion": "请检查文件路径是否正确。支持的格式: .xlsx, .xls, .xlsb, .xlsm, .csv" }, ensure_ascii=False, indent=2) )] except ValueError as e: return [TextContent( type="text", text=json.dumps({ "success": False, "error_type": "invalid_parameter", "error_message": f"参数错误: {str(e)}", "suggestion": "请检查sheet_name是否存在,range_string格式是否正确(如'A1:D10')" }, ensure_ascii=False, indent=2) )] except Exception as e: return [TextContent( type="text", text=json.dumps({ "success": False, "error_type": "parsing_error", "error_message": f"解析失败: {str(e)}", "suggestion": "请检查文件是否损坏,或尝试指定具体的工作表名称和范围" }, ensure_ascii=False, indent=2) )]
- src/models/tools.py:67-100 (schema)Input schema for parse_sheet tool defining parameters: file_path (required), sheet_name, range_string, include_full_data, include_styles, preview_rows, max_rows.inputSchema={ "type": "object", "properties": { "file_path": { "type": "string", "description": "目标表格文件的绝对路径,支持 .csv, .xlsx, .xls, .xlsb, .xlsm 格式。" }, "sheet_name": { "type": "string", "description": "【可选】要解析的工作表名称。如果留空,使用第一个工作表。" }, "range_string": { "type": "string", "description": "【可选】单元格范围,如'A1:D10'。指定范围时会返回该范围的完整数据。" }, "include_full_data": { "type": "boolean", "description": "【可选,默认false】是否返回完整数据。false时只返回概览和预览,true时返回所有行数据。大文件建议先查看概览。" }, "include_styles": { "type": "boolean", "description": "【可选,默认false】是否包含样式信息(字体、颜色、边框等)。样式信息会显著增加数据量。" }, "preview_rows": { "type": "integer", "description": "【可选,默认5】预览行数。当include_full_data为false时,返回前N行作为数据预览。" }, "max_rows": { "type": "integer", "description": "【可选】最大返回行数。用于限制大文件的数据量,超出部分会被截断并提示。" } }, "required": ["file_path"] }
- src/models/tools.py:64-101 (registration)Registration of the 'parse_sheet' tool in server.list_tools(), including name, description, and input schema.Tool( name="parse_sheet", description="解析Excel/CSV文件为结构化JSON数据。默认返回文件概览信息(行数、列数、数据类型、前几行预览),避免上下文过载。LLM可通过参数控制是否获取完整数据、样式信息等。适合数据分析和处理,修改后可用apply_changes写回。", inputSchema={ "type": "object", "properties": { "file_path": { "type": "string", "description": "目标表格文件的绝对路径,支持 .csv, .xlsx, .xls, .xlsb, .xlsm 格式。" }, "sheet_name": { "type": "string", "description": "【可选】要解析的工作表名称。如果留空,使用第一个工作表。" }, "range_string": { "type": "string", "description": "【可选】单元格范围,如'A1:D10'。指定范围时会返回该范围的完整数据。" }, "include_full_data": { "type": "boolean", "description": "【可选,默认false】是否返回完整数据。false时只返回概览和预览,true时返回所有行数据。大文件建议先查看概览。" }, "include_styles": { "type": "boolean", "description": "【可选,默认false】是否包含样式信息(字体、颜色、边框等)。样式信息会显著增加数据量。" }, "preview_rows": { "type": "integer", "description": "【可选,默认5】预览行数。当include_full_data为false时,返回前N行作为数据预览。" }, "max_rows": { "type": "integer", "description": "【可选】最大返回行数。用于限制大文件的数据量,超出部分会被截断并提示。" } }, "required": ["file_path"] } ),
- src/core_service.py:133-191 (helper)Core parsing logic delegated by tool handler. Parses file using ParserFactory, selects sheet, handles range_string, and extracts optimized data (full, preview, styles) via _extract_optimized_data.def parse_sheet_optimized(self, file_path: str, sheet_name: str | None = None, range_string: str | None = None, include_full_data: bool = False, include_styles: bool = False, preview_rows: int = 5, max_rows: int | None = None) -> dict[str, Any]: """ 参数: file_path: 文件路径 sheet_name: 工作表名称(可选) range_string: 单元格范围(可选) include_full_data: 是否返回完整数据(默认False,只返回概览) include_styles: 是否包含样式信息(默认False) preview_rows: 预览行数(默认5行) max_rows: 最大返回行数(可选) 返回: 优化后的JSON数据 """ try: # 验证文件输入 validated_path, _ = validate_file_input(file_path) # 获取解析器 parser = self.parser_factory.get_parser(str(validated_path)) # 解析文件 sheets = parser.parse(str(validated_path)) # 选择目标工作表 if sheet_name: target_sheet = next((s for s in sheets if s.name == sheet_name), None) if not target_sheet: available_sheets = [s.name for s in sheets] raise ValueError(f"工作表 '{sheet_name}' 不存在。可用工作表: {available_sheets}") else: if not sheets: raise ValueError("文件中没有找到任何工作表。") target_sheet = sheets[0] # 处理范围选择 if range_string: try: start_row, start_col, end_row, end_col = parse_range_string(range_string) return self._extract_range_data(target_sheet, start_row, start_col, end_row, end_col, include_styles) except ValueError as e: raise ValueError(f"范围格式错误: {e}") # 根据参数返回不同级别的数据 return self._extract_optimized_data( target_sheet, include_full_data=include_full_data, include_styles=include_styles, preview_rows=preview_rows, max_rows=max_rows ) except Exception as e: logger.error(f"优化解析失败: {e}") raise
- src/core_service.py:598-683 (helper)Helper method that generates the optimized JSON output: metadata, headers, preview/full rows, styles, data types analysis.def _extract_optimized_data(self, sheet: Sheet, include_full_data: bool = False, include_styles: bool = False, preview_rows: int = 5, max_rows: int | None = None) -> dict[str, Any]: """ 提取优化后的数据,避免上下文爆炸。 """ # 基础元数据 total_rows = len(sheet.rows) total_cols = len(sheet.rows[0].cells) if sheet.rows else 0 total_cells = total_rows * total_cols # 提取表头 headers = [] if sheet.rows: headers = [cell.value if cell is not None and cell.value is not None else f"Column_{i}" for i, cell in enumerate(sheet.rows[0].cells)] # 分析数据类型 data_types = self._analyze_data_types(sheet, headers) # 基础响应结构 response = { "sheet_name": sheet.name, "metadata": { "total_rows": total_rows, "total_cols": total_cols, "total_cells": total_cells, "data_rows": max(0, total_rows - 1), # 减去表头行 "has_styles": any(any(cell.style for cell in row.cells if cell is not None) for row in sheet.rows), "has_merged_cells": len(sheet.merged_cells) > 0, "merged_cells_count": len(sheet.merged_cells), "preview_rows": min(preview_rows, max(0, total_rows - 1)) }, "headers": headers, "data_types": data_types, "merged_cells": sheet.merged_cells if len(sheet.merged_cells) <= 10 else sheet.merged_cells[:10] } # 根据参数决定返回的数据量 if include_full_data: # 返回完整数据 data_rows = [] start_row = 1 if total_rows > 1 else 0 end_row = total_rows if max_rows and (end_row - start_row) > max_rows: end_row = start_row + max_rows response["metadata"]["truncated"] = True response["metadata"]["truncated_at"] = max_rows for row in sheet.rows[start_row:end_row]: row_data = [] for cell in row.cells: if cell is not None: cell_data = {"value": self._value_to_json_serializable(cell.value)} if include_styles and cell.style: cell_data["style"] = style_to_dict(cell.style) else: cell_data = {"value": None} if include_styles: cell_data["style"] = None row_data.append(cell_data) data_rows.append(row_data) response["rows"] = data_rows response["metadata"]["returned_rows"] = len(data_rows) else: # 只返回预览数据 preview_data = [] start_row = 1 if total_rows > 1 else 0 end_row = min(start_row + preview_rows, total_rows) for row in sheet.rows[start_row:end_row]: row_data = [] for cell in row.cells: # 预览模式下不包含样式,减少数据量 if cell is not None: row_data.append(self._value_to_json_serializable(cell.value)) else: row_data.append(None) preview_data.append(row_data) response["preview_rows"] = preview_data return response