Skip to main content
Glama
guangxiangdebizi

XLSX Reader MCP

analyze_xlsx

Analyze Excel files to extract structure details, data types, statistics, and sheet contents for comprehensive document understanding.

Instructions

Analyze Excel files to get detailed information about structure, data types, statistics, and sheet contents.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
filePathYesPath to the Excel file to analyze
includePreviewNoWhether to include a preview of the data (default: true)
previewRowsNoNumber of rows to preview (default: 5, max: 20)
analyzeDataTypesNoWhether to analyze data types in each column (default: true)

Implementation Reference

  • The async run() function implementing the core tool logic: validates input, reads XLSX file using XLSX library, analyzes each sheet for dimensions, data types, statistics, filters empty rows, generates data preview, and compiles a comprehensive Markdown report.
    async run(args: { 
      filePath: string; 
      includePreview?: boolean;
      previewRows?: number;
      analyzeDataTypes?: boolean;
    }) {
      try {
        // Parameter validation
        if (!args.filePath) {
          throw new Error("File path is required");
        }
    
        // Check if file exists
        if (!fs.existsSync(args.filePath)) {
          throw new Error(`File not found: ${args.filePath}`);
        }
    
        // Check file extension
        const ext = path.extname(args.filePath).toLowerCase();
        if (!['.xlsx', '.xls', '.xlsm'].includes(ext)) {
          throw new Error("File must be an Excel file (.xlsx, .xls, or .xlsm)");
        }
    
        // Get file stats
        const stats = fs.statSync(args.filePath);
        const fileSize = (stats.size / 1024).toFixed(2); // KB
    
        // Read the workbook
        const workbook = XLSX.readFile(args.filePath);
        const sheetNames = workbook.SheetNames;
        
        const includePreview = args.includePreview !== false;
        const previewRows = Math.min(args.previewRows || 5, 20); // Default 5, max 20
        const analyzeDataTypes = args.analyzeDataTypes !== false;
    
        // Analyze each sheet
        const sheetAnalysis: any[] = [];
        
        for (const sheetName of sheetNames) {
          const worksheet = workbook.Sheets[sheetName];
          const range = XLSX.utils.decode_range(worksheet['!ref'] || 'A1');
          
          // Get sheet dimensions
          const totalRows = range.e.r + 1;
          const totalCols = range.e.c + 1;
          
          // Convert to array for analysis
          let rawData = XLSX.utils.sheet_to_json(worksheet, { 
            header: 1,
            defval: ''
          }) as any[][];
          
          // Helper function to check if a row is empty
          const isEmptyRow = (row: any[]): boolean => {
            return row.every(cell => cell === '' || cell == null || cell === undefined);
          };
          
          // Get original statistics
          const originalTotalRows = rawData.length;
          
          // Filter out completely empty rows for analysis
          const data = rawData.filter((row, index) => {
            // Always keep the first row (header)
            if (index === 0) return true;
            // Filter out empty rows
            return !isEmptyRow(row);
          });
          
          const dataRows = data.length > 0 ? data.length - 1 : 0; // Exclude header
          const skippedEmptyRows = originalTotalRows - data.length;
          
          // Analyze data types if requested
          let columnAnalysis: any[] = [];
          if (analyzeDataTypes && data.length > 0) {
            const headers = data[0] || [];
            const dataRows = data.slice(1);
            
            columnAnalysis = headers.map((header: any, colIndex: number) => {
              const columnData = dataRows.map(row => row[colIndex]).filter(cell => cell !== '' && cell != null);
              
              if (columnData.length === 0) {
                return {
                  column: colIndex + 1,
                  header: header || `Column ${colIndex + 1}`,
                  dataType: 'empty',
                  nonEmptyCount: 0,
                  emptyCount: dataRows.length,
                  uniqueValues: 0
                };
              }
              
              // Analyze data types
              const types = {
                number: 0,
                string: 0,
                date: 0,
                boolean: 0
              };
              
              columnData.forEach(cell => {
                if (typeof cell === 'number') {
                  types.number++;
                } else if (typeof cell === 'boolean') {
                  types.boolean++;
                } else if (cell instanceof Date || (typeof cell === 'string' && !isNaN(Date.parse(cell)))) {
                  types.date++;
                } else {
                  types.string++;
                }
              });
              
              // Determine primary type
              const primaryType = Object.entries(types).reduce((a, b) => types[a[0] as keyof typeof types] > types[b[0] as keyof typeof types] ? a : b)[0];
              
              // Get unique values count
              const uniqueValues = new Set(columnData).size;
              
              return {
                column: colIndex + 1,
                header: header || `Column ${colIndex + 1}`,
                dataType: primaryType,
                nonEmptyCount: columnData.length,
                emptyCount: dataRows.length - columnData.length,
                uniqueValues,
                typeDistribution: types
              };
            });
          }
          
          // Get preview data
          let preview = '';
          if (includePreview && data.length > 0) {
            // Limit preview data and ensure we don't exceed available data
            const maxPreviewRows = Math.min(previewRows + 1, data.length, 21); // +1 for header, max 21 total
            const previewData = data.slice(0, maxPreviewRows);
            const maxCols = Math.max(...previewData.map(row => row.length));
            const colWidths = Array(maxCols).fill(0);
            
            // Calculate column widths
            previewData.forEach(row => {
              row.forEach((cell: any, index: number) => {
                const cellStr = String(cell || '');
                colWidths[index] = Math.max(colWidths[index], Math.min(cellStr.length, 20)); // Max width 20
              });
            });
            
            // Format preview table
            const formatRow = (row: any[]) => {
              return '| ' + row.map((cell: any, index: number) => {
                const cellStr = String(cell || '');
                const truncated = cellStr.length > 20 ? cellStr.substring(0, 17) + '...' : cellStr;
                return truncated.padEnd(colWidths[index]);
              }).join(' | ') + ' |';
            };
            
            const separator = '|' + colWidths.map(width => '-'.repeat(width + 2)).join('|') + '|';
            
            preview = previewData.map((row: any[], index: number) => {
              const formattedRow = formatRow(row);
              if (index === 0) {
                return formattedRow + '\n' + separator;
              }
              return formattedRow;
            }).join('\n');
          }
          
          sheetAnalysis.push({
            name: sheetName,
            dimensions: {
              rows: totalRows,
              columns: totalCols
            },
            originalTotalRows,
            dataRows,
            skippedEmptyRows,
            hasData: data.length > 0,
            columnAnalysis,
            preview
          });
        }
        
        // Generate comprehensive report
        const report = [
          `# 📊 Excel File Analysis Report`,
          ``,
          `## 📁 File Information`,
          `- **File:** ${path.basename(args.filePath)}`,
          `- **Full Path:** ${args.filePath}`,
          `- **Size:** ${fileSize} KB`,
          `- **Format:** ${ext.toUpperCase()}`,
          `- **Last Modified:** ${stats.mtime.toLocaleString()}`,
          ``,
          `## 📋 Workbook Overview`,
          `- **Total Sheets:** ${sheetNames.length}`,
          `- **Sheet Names:** ${sheetNames.join(', ')}`,
          ``
        ];
        
        // Add detailed sheet analysis
        sheetAnalysis.forEach((sheet, index) => {
          report.push(`## 📄 Sheet ${index + 1}: "${sheet.name}"`);
          report.push(``);
          report.push(`### Basic Information`);
          report.push(`- **Total Rows:** ${sheet.originalTotalRows} (including header)`);
          report.push(`- **Data Rows:** ${sheet.dataRows} (excluding header)`);
          if (sheet.skippedEmptyRows > 0) {
            report.push(`- **Empty Rows Filtered:** ${sheet.skippedEmptyRows}`);
          }
          report.push(`- **Columns:** ${sheet.dimensions.columns}`);
          report.push(`- **Dimensions:** ${sheet.dimensions.rows} rows × ${sheet.dimensions.columns} columns (after filtering)`);
          report.push(`- **Has Data:** ${sheet.hasData ? 'Yes' : 'No'}`);
          report.push(``);
          
          if (analyzeDataTypes && sheet.columnAnalysis.length > 0) {
            report.push(`### Column Analysis`);
            report.push(``);
            sheet.columnAnalysis.forEach((col: any) => {
              report.push(`**${col.header}** (Column ${col.column})`);
              report.push(`- Primary Type: ${col.dataType}`);
              report.push(`- Non-empty: ${col.nonEmptyCount}, Empty: ${col.emptyCount}`);
              report.push(`- Unique Values: ${col.uniqueValues}`);
              if (col.typeDistribution) {
                const dist = Object.entries(col.typeDistribution)
                  .filter(([_, count]) => (count as number) > 0)
                  .map(([type, count]) => `${type}: ${count}`)
                  .join(', ');
                if (dist) report.push(`- Type Distribution: ${dist}`);
              }
              report.push(``);
            });
          }
          
          if (includePreview && sheet.preview) {
            report.push(`### Data Preview (First ${previewRows} rows)`);
            report.push(``);
            report.push('```');
            report.push(sheet.preview);
            report.push('```');
            report.push(``);
          }
          
          if (index < sheetAnalysis.length - 1) {
            report.push(`---`);
            report.push(``);
          }
        });
        
        return {
          content: [{
            type: "text",
            text: report.join('\n')
          }]
        };
        
      } catch (error: any) {
        return {
          content: [{
            type: "text",
            text: `❌ **Error analyzing Excel file:** ${error.message}`
          }],
          isError: true
        };
      }
    }
  • Input schema defining the parameters for the analyze_xlsx tool.
    parameters: {
      type: "object",
      properties: {
        filePath: {
          type: "string",
          description: "Path to the Excel file to analyze"
        },
        includePreview: {
          type: "boolean",
          description: "Whether to include a preview of the data (default: true)"
        },
        previewRows: {
          type: "number",
          description: "Number of rows to preview (default: 5, max: 20)"
        },
        analyzeDataTypes: {
          type: "boolean",
          description: "Whether to analyze data types in each column (default: true)"
        }
      },
      required: ["filePath"]
    },
  • src/index.ts:26-30 (registration)
    Tool registration in the ListToolsRequestSchema handler.
    {
      name: xlsxAnalyzer.name,
      description: xlsxAnalyzer.description,
      inputSchema: xlsxAnalyzer.parameters
    }
  • src/index.ts:40-41 (registration)
    Tool dispatch in the CallToolRequestSchema switch statement.
    case "analyze_xlsx":
      return await xlsxAnalyzer.run(request.params.arguments as any);
  • src/index.ts:7-7 (registration)
    Import of the xlsxAnalyzer tool module.
    import { xlsxAnalyzer } from "./tools/xlsx-analyzer.js";
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. It mentions what information is returned (structure, data types, etc.) but lacks details on performance, error handling, file size limits, or output format. For a tool with 4 parameters and no output schema, this is a significant gap in transparency about how the tool behaves.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. It avoids unnecessary words and directly states the tool's function. However, it could be slightly more structured by separating key aspects (e.g., listing analysis components) for better readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (analyzing Excel files with 4 parameters), lack of annotations, and no output schema, the description is incomplete. It doesn't address what the output looks like, potential limitations, or how results are formatted. The description alone is insufficient for an agent to fully understand the tool's behavior and results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema fully documents all parameters. The description doesn't add any parameter-specific semantics beyond what's in the schema (e.g., it doesn't explain interactions between parameters like how 'includePreview' affects output). With high schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate but doesn't need to.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Analyze Excel files to get detailed information about structure, data types, statistics, and sheet contents.' It specifies the verb 'analyze' and resource 'Excel files' with concrete output categories. However, it doesn't explicitly differentiate from the sibling tool 'read_xlsx' (which likely reads data rather than analyzes metadata), so it doesn't reach the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention the sibling tool 'read_xlsx' or explain scenarios where analysis is preferred over reading data. There's no context about prerequisites or exclusions, leaving usage entirely implied from the purpose statement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/guangxiangdebizi/xlsx-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server