get_file_info
Analyzes the size of an Excel or CSV file and returns chunking recommendations to optimize processing of large files for data analysis.
Instructions
Analyze file size and get chunking recommendations
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filePath | Yes | Path to the CSV or Excel file | |
| sheet | No | Sheet name for Excel files (optional) | |
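A minimal invocation sketch based on the schema above. The file path and sheet name are hypothetical; only `filePath` is required:

```json
{
  "name": "get_file_info",
  "arguments": {
    "filePath": "./data/sales.xlsx",
    "sheet": "Q1"
  }
}
```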
Implementation Reference
- src/handlers/data-operations.ts:660-733 (handler): Main handler function for the 'get_file_info' tool. It validates input, calls the getFileInfo utility, adds recommendations for large files and chunking, and formats the response as a ToolResponse.
```typescript
async getFileInfo(args: ToolArgs): Promise<ToolResponse> {
  try {
    if (!args.filePath) {
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify({
              success: false,
              error: 'Missing required parameter: filePath',
            }, null, 2),
          },
        ],
      };
    }

    const { filePath, sheet } = args;
    const fileInfo = await getFileInfo(filePath, sheet);

    // Add additional recommendations based on file size
    let recommendations = [];
    if (fileInfo.estimatedTokens > 50000) {
      recommendations.push('Large file detected. Strongly recommend using chunked reading to avoid token limits.');
      recommendations.push(`Use read_file_chunked or read_file with offset/limit parameters.`);
    } else if (fileInfo.estimatedTokens > 20000) {
      recommendations.push('Medium-sized file. Consider chunked reading for better performance.');
    } else {
      recommendations.push('File size is manageable for direct reading.');
    }
    recommendations.push(`Recommended chunk size: ${fileInfo.recommendedChunkSize} rows per chunk.`);

    if (fileInfo.sheets && fileInfo.sheets.length > 1) {
      recommendations.push(`Excel file contains ${fileInfo.sheets.length} sheets. Specify 'sheet' parameter to read specific sheets.`);
    }

    const response = {
      success: true,
      fileInfo: {
        ...fileInfo,
        fileSizeMB: Math.round((fileInfo.fileSize / 1024 / 1024) * 100) / 100,
      },
      recommendations,
      chunkingAdvice: {
        useChunking: fileInfo.estimatedTokens > 20000,
        optimalChunkSize: fileInfo.recommendedChunkSize,
        estimatedChunks: Math.ceil(fileInfo.totalRows / fileInfo.recommendedChunkSize),
        maxTokensPerChunk: Math.ceil((fileInfo.recommendedChunkSize * fileInfo.totalColumns * 10) / 4),
      },
    };

    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(response, null, 2),
        },
      ],
    };
  } catch (error) {
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify({
            success: false,
            error: error instanceof Error ? error.message : 'Unknown error occurred',
          }, null, 2),
        },
      ],
    };
  }
}
```

- src/index.ts:277-293 (schema): Input schema definition for the 'get_file_info' tool, defining the required filePath parameter and the optional sheet parameter.
```typescript
{
  name: 'get_file_info',
  description: 'Analyze file size and get chunking recommendations',
  inputSchema: {
    type: 'object',
    properties: {
      filePath: {
        type: 'string',
        description: 'Path to the CSV or Excel file',
      },
      sheet: {
        type: 'string',
        description: 'Sheet name for Excel files (optional)',
      },
    },
    required: ['filePath'],
  },
},
```

- src/index.ts:1217-1218 (registration): Tool registration in the MCP server request handler switch statement, dispatching 'get_file_info' calls to the DataOperationsHandler.getFileInfo method.
```typescript
case 'get_file_info':
  return await this.dataOpsHandler.getFileInfo(toolArgs);
```

- src/utils/file-utils.ts:145-196 (helper): Core utility function that computes FileInfo for CSV and Excel files: file stats, row and column counts, a token estimate, a recommended chunk size, and sheet names.
```typescript
export async function getFileInfo(filePath: string, sheet?: string): Promise<FileInfo> {
  const absolutePath = path.resolve(filePath);
  const stats = await fs.stat(absolutePath);
  const ext = path.extname(filePath).toLowerCase();

  // Get basic file info
  let totalRows = 0;
  let totalColumns = 0;
  let sheets: string[] = [];

  if (ext === '.csv') {
    // For CSV, we need to read to count rows (but efficiently)
    const content = await fs.readFile(absolutePath, 'utf-8');
    const lines = content.split('\n').filter(line => line.trim() !== '');
    totalRows = lines.length;

    // Estimate columns from first line
    if (lines.length > 0) {
      const firstLine = csv.parse(lines[0])[0];
      totalColumns = firstLine.length;
    }
  } else if (ext === '.xlsx' || ext === '.xls') {
    const workbook = new ExcelJS.Workbook();
    await workbook.xlsx.readFile(absolutePath);
    sheets = workbook.worksheets.map(ws => ws.name);

    const worksheet = workbook.getWorksheet(sheet || sheets[0]);
    if (worksheet) {
      totalRows = worksheet.rowCount;
      totalColumns = worksheet.columnCount;
    }
  }

  // Estimate token count (rough approximation)
  const avgCellLength = 10; // characters
  const estimatedTokens = Math.ceil((totalRows * totalColumns * avgCellLength) / 4); // ~4 chars per token

  // Calculate recommended chunk size (target ~8000 tokens per chunk)
  const targetTokens = 8000;
  const recommendedChunkSize = Math.max(100, Math.floor(targetTokens / (totalColumns * avgCellLength / 4)));

  return {
    filePath: absolutePath,
    fileSize: stats.size,
    totalRows,
    totalColumns,
    estimatedTokens,
    recommendedChunkSize: Math.min(recommendedChunkSize, 5000), // Cap at 5000 rows
    sheets: sheets.length > 0 ? sheets : undefined,
  };
}
```
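The token and chunk-size heuristics in the helper can be distilled into a standalone sketch. The function and constant names here are illustrative, not part of the codebase; the arithmetic mirrors getFileInfo (about 10 characters per cell, about 4 characters per token, an 8000-token chunk target, a 100-row floor, and a 5000-row cap):

```typescript
// Illustrative constants matching the values used in src/utils/file-utils.ts.
const AVG_CELL_LENGTH = 10;  // assumed characters per cell
const CHARS_PER_TOKEN = 4;   // rough characters-per-token ratio
const TARGET_TOKENS = 8000;  // desired tokens per chunk

// Estimate total tokens for a rows x cols grid of cells.
function estimateTokens(rows: number, cols: number): number {
  return Math.ceil((rows * cols * AVG_CELL_LENGTH) / CHARS_PER_TOKEN);
}

// Recommend a rows-per-chunk size that keeps each chunk near TARGET_TOKENS,
// with a floor of 100 rows and a cap of 5000 rows.
function recommendChunkSize(cols: number): number {
  const raw = Math.floor(TARGET_TOKENS / ((cols * AVG_CELL_LENGTH) / CHARS_PER_TOKEN));
  return Math.min(Math.max(100, raw), 5000);
}

console.log(estimateTokens(100000, 8)); // 2000000 -> well past the 50000-token threshold
console.log(recommendChunkSize(8));     // 400 rows per chunk
console.log(recommendChunkSize(1000));  // 100 (very wide table hits the floor)
```

A 100000-row, 8-column file therefore triggers the "strongly recommend chunked reading" branch in the handler, and the advised chunk size keeps each chunk near the 8000-token target.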