Large File MCP Server

structure.md•8.71 KiB

# get_file_structure

Analyze file structure and retrieve comprehensive metadata and statistics.

## Overview

The `get_file_structure` tool reports a file's line count, size, encoding, detected type, recommended chunking, and line-length statistics. Use it to understand a file's characteristics before processing it.

## Usage

```json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/data/large-dataset.csv"
  }
}
```

## Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `filePath` | string | Yes | - | Absolute or relative path to the file |

## Response Format

```typescript
{
  filePath: string;      // Absolute path to file
  fileName: string;      // File name without path
  fileSize: number;      // Size in bytes
  totalLines: number;    // Total line count
  encoding: string;      // Detected encoding (e.g., 'utf8')
  detectedType: string;  // File type (e.g., 'log', 'csv', 'json')
  chunkSize: number;     // Recommended lines per chunk
  totalChunks: number;   // Total number of chunks
  created: Date;         // File creation date
  modified: Date;        // Last modified date
  statistics: {
    avgLineLength: number;  // Average characters per line
    maxLineLength: number;  // Longest line length
    minLineLength: number;  // Shortest line length
    emptyLines: number;     // Count of empty lines
  };
}
```

## Examples

### Basic File Analysis

Analyze a log file:

```json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/var/log/system.log"
  }
}
```

Response:

```json
{
  "filePath": "/var/log/system.log",
  "fileName": "system.log",
  "fileSize": 10485760,
  "totalLines": 25000,
  "encoding": "utf8",
  "detectedType": "log",
  "chunkSize": 500,
  "totalChunks": 50,
  "created": "2024-01-01T00:00:00.000Z",
  "modified": "2024-01-10T15:30:00.000Z",
  "statistics": {
    "avgLineLength": 120,
    "maxLineLength": 512,
    "minLineLength": 45,
    "emptyLines": 150
  }
}
```

### CSV File Structure

Analyze a CSV dataset:

```json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/data/transactions.csv"
  }
}
```

Response:

```json
{
  "filePath": "/data/transactions.csv",
  "fileName": "transactions.csv",
  "fileSize": 52428800,
  "totalLines": 500000,
  "encoding": "utf8",
  "detectedType": "csv",
  "chunkSize": 1000,
  "totalChunks": 500,
  "created": "2024-01-05T08:00:00.000Z",
  "modified": "2024-01-10T14:00:00.000Z",
  "statistics": {
    "avgLineLength": 105,
    "maxLineLength": 250,
    "minLineLength": 95,
    "emptyLines": 0
  }
}
```

### Code File Analysis

Analyze a TypeScript file:

```json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/code/app.ts"
  }
}
```

Response:

```json
{
  "filePath": "/code/app.ts",
  "fileName": "app.ts",
  "fileSize": 65536,
  "totalLines": 1250,
  "encoding": "utf8",
  "detectedType": "code",
  "chunkSize": 300,
  "totalChunks": 5,
  "created": "2024-01-01T10:00:00.000Z",
  "modified": "2024-01-10T16:45:00.000Z",
  "statistics": {
    "avgLineLength": 52,
    "maxLineLength": 180,
    "minLineLength": 0,
    "emptyLines": 85
  }
}
```

## File Type Detection

The tool automatically detects the file type from the file extension:

| Extension | Detected Type | Chunk Size (lines) | Typical Use Case |
|-----------|---------------|--------------------|------------------|
| .txt | text | 500 | Plain text files |
| .log | log | 500 | Application logs |
| .csv | csv | 1000 | Data exports |
| .json | json | 100 | Configuration, API data |
| .xml | xml | 200 | Structured data |
| .md | markdown | 500 | Documentation |
| .ts, .js, .py, .java | code | 300 | Source code |
| .yml, .yaml | config | 300 | Configuration |
| .sql | sql | 300 | Database scripts |
| .sh, .bash | shell | 300 | Shell scripts |

## Use Cases

### 1. Pre-Processing Assessment

Determine the optimal processing strategy:

```typescript
const structure = await get_file_structure({
  filePath: "/data/large.csv"
});

if (structure.fileSize > 100_000_000) {
  // Use streaming approach
  console.log("Large file detected, using streaming");
} else {
  // Can load into memory
  console.log("Small file, loading directly");
}

console.log(`Will process in ${structure.totalChunks} chunks`);
```

### 2. Resource Planning

Estimate processing time and memory requirements:

```typescript
const structure = await get_file_structure({
  filePath: "/logs/app.log"
});

// avgLineLength is in characters, so this is a rough estimate of bytes per chunk
const estimatedMemory = structure.statistics.avgLineLength * structure.chunkSize;
const estimatedTime = structure.totalChunks * 100; // assumes 100ms per chunk

console.log(`Memory per chunk: ${estimatedMemory} bytes`);
console.log(`Estimated processing time: ${estimatedTime}ms`);
```

### 3. Data Quality Check

Identify potential issues:

```typescript
const structure = await get_file_structure({
  filePath: "/data/import.csv"
});

// Check for unusual line lengths
if (structure.statistics.maxLineLength > 10000) {
  console.warn("Unusually long lines detected");
}

// Check for empty lines (guard against division by zero for empty files)
const emptyLinePercent = structure.totalLines > 0
  ? (structure.statistics.emptyLines / structure.totalLines) * 100
  : 0;
if (emptyLinePercent > 10) {
  console.warn(`${emptyLinePercent.toFixed(1)}% empty lines`);
}
```

### 4. File Comparison

Compare multiple files:

```typescript
const file1 = await get_file_structure({ filePath: "/data/old.csv" });
const file2 = await get_file_structure({ filePath: "/data/new.csv" });

console.log(`Line difference: ${file2.totalLines - file1.totalLines}`);
console.log(`Size difference: ${file2.fileSize - file1.fileSize} bytes`);
```

### 5. Archive Decision

Determine whether a file should be archived:

```typescript
const structure = await get_file_structure({
  filePath: "/logs/old.log"
});

// The JSON responses above carry `modified` as an ISO 8601 string;
// new Date(...) accepts either a string or a Date.
const modifiedMs = new Date(structure.modified).getTime();
const daysSinceModified = (Date.now() - modifiedMs) / (1000 * 60 * 60 * 24);

if (daysSinceModified > 30 && structure.fileSize > 10_000_000) {
  console.log("Consider archiving this file");
}
```

## Statistics Interpretation

### Average Line Length

Indicates file structure:

- **< 50 chars**: Likely structured data or code
- **50-200 chars**: Normal text/logs
- **> 200 chars**: Verbose logs or JSON

### Max Line Length

Warns about potential issues:

- **> 1000 chars**: May cause performance issues
- **> 10000 chars**: Consider pre-processing

### Empty Lines

Indicates formatting:

- **0%**: Dense data files (CSV, JSON)
- **5-10%**: Normal code/text
- **> 20%**: Sparse formatting or issues

## Performance

| File Size | Analysis Time | Memory Usage | Notes |
|-----------|---------------|--------------|-------|
| < 1MB | < 50ms | Minimal | Full scan |
| 1-10MB | 50-200ms | < 10MB | Streaming |
| 10-100MB | 200-1000ms | < 50MB | Line counting |
| 100MB-1GB | 1-5s | < 100MB | Optimized scan |
| > 1GB | 5-30s | < 200MB | Progressive |

## Error Handling

### File Not Found

```json
{
  "error": "File not found: /path/to/file.csv",
  "code": "ENOENT"
}
```

### Permission Denied

```json
{
  "error": "Permission denied: /root/protected.log",
  "code": "EACCES"
}
```

### Unsupported File Type

```json
{
  "error": "Binary file not supported: /data/image.png",
  "code": "UNSUPPORTED_TYPE"
}
```

## Best Practices

### 1. Always Analyze Before Processing

Check file structure before heavy operations:

```typescript
// Good: Analyze first
const structure = await get_file_structure({ filePath });
console.log(`Processing ${structure.totalChunks} chunks`);

// Then process
for (let i = 0; i < structure.totalChunks; i++) {
  await process_chunk(filePath, i);
}
```

### 2. Cache Structure Information

File structure rarely changes, so cache it:

```typescript
// Note: entries are never invalidated; clear the cache if a file is modified.
const structureCache = new Map();

async function getStructureCached(filePath) {
  if (!structureCache.has(filePath)) {
    const structure = await get_file_structure({ filePath });
    structureCache.set(filePath, structure);
  }
  return structureCache.get(filePath);
}
```

### 3. Validate File Size

Check size before processing:

```typescript
const MAX_FILE_SIZE = 100_000_000; // example limit in bytes; choose your own

const structure = await get_file_structure({ filePath });

if (structure.fileSize > MAX_FILE_SIZE) {
  throw new Error(`File too large: ${structure.fileSize} bytes`);
}
```

### 4. Use Statistics for Optimization

Adapt chunk size based on line length:

```typescript
const structure = await get_file_structure({ filePath });

const optimalChunkSize = structure.statistics.avgLineLength < 100
  ? 1000  // Small lines, larger chunks
  : 300;  // Large lines, smaller chunks
```

## See Also

- [Tools Overview](/api/reference) - All available tools
- [read_large_file_chunk](/api/read-chunk) - Read file chunks
- [search_in_large_file](/api/search) - Search within files
- [get_file_summary](/api/summary) - Quick file overview
- [Performance Guide](/guide/performance) - Optimization tips
- [Best Practices](/guide/best-practices) - Usage recommendations
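
As a supplement to the File Type Detection table and the sample responses above, the following is a minimal sketch of how the extension lookup and chunk count might fit together. The names `EXTENSION_TABLE`, `lookupType`, and `estimateTotalChunks` are hypothetical and not part of the server's API. The ceiling-division formula is an assumption inferred from the three sample responses (25000/500 = 50, 500000/1000 = 500, 1250/300 rounded up to 5), and the behavior for unlisted extensions is not documented, so the sketch simply returns `undefined` for them.

```typescript
// Hypothetical sketch: mirrors the File Type Detection table, not the server's code.
interface TypeInfo {
  detectedType: string;
  chunkSize: number; // recommended lines per chunk
}

const code: TypeInfo = { detectedType: "code", chunkSize: 300 };
const config: TypeInfo = { detectedType: "config", chunkSize: 300 };
const shell: TypeInfo = { detectedType: "shell", chunkSize: 300 };

const EXTENSION_TABLE: Record<string, TypeInfo> = {
  ".txt": { detectedType: "text", chunkSize: 500 },
  ".log": { detectedType: "log", chunkSize: 500 },
  ".csv": { detectedType: "csv", chunkSize: 1000 },
  ".json": { detectedType: "json", chunkSize: 100 },
  ".xml": { detectedType: "xml", chunkSize: 200 },
  ".md": { detectedType: "markdown", chunkSize: 500 },
  ".ts": code,
  ".js": code,
  ".py": code,
  ".java": code,
  ".yml": config,
  ".yaml": config,
  ".sql": { detectedType: "sql", chunkSize: 300 },
  ".sh": shell,
  ".bash": shell,
};

// Looks up the table entry for a path. Only "/" separators are handled here.
// Returns undefined for extensions the table does not list (the documented
// tool may behave differently; that case is not specified).
function lookupType(filePath: string): TypeInfo | undefined {
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  // dot > 0 skips dotfiles such as ".bashrc", which have no extension
  const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  return EXTENSION_TABLE[ext];
}

// Assumption: totalChunks is totalLines divided by chunkSize, rounded up.
function estimateTotalChunks(totalLines: number, chunkSize: number): number {
  return Math.ceil(totalLines / chunkSize);
}

const info = lookupType("/code/app.ts");
if (info) {
  console.log(info.detectedType, estimateTotalChunks(1250, info.chunkSize));
}
```

A lookup like this can help sanity-check a response: if `totalChunks` differs from the estimate, the server may be using a different chunk size than the table suggests for that file.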
