detect_file_encoding
Identify file encoding to prevent character display issues, especially for C++ and PowerShell files in Windows environments.
Instructions
Accurately detect file encoding using professional libraries.
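The implementation reference further down names charset-normalizer and chardet as the libraries involved, but the wrappers around them are not shown. The following is a minimal sketch of what such a wrapper might look like, not the project's actual code. The function name `detect_with_library`, the `method` strings, and the mapping of charset-normalizer's `chaos` score onto a 0-100 confidence are all assumptions made for illustration. Both imports are guarded, so the function returns `None` when neither library is installed.

```python
from typing import Any, Dict, Optional

# Optional dependencies: record which detection libraries are importable.
try:
    from charset_normalizer import from_bytes
    HAS_CHARSET_NORMALIZER = True
except ImportError:
    HAS_CHARSET_NORMALIZER = False

try:
    import chardet
    HAS_CHARDET = True
except ImportError:
    HAS_CHARDET = False


def detect_with_library(raw_data: bytes) -> Optional[Dict[str, Any]]:
    """Return encoding info from the first available library, or None."""
    if HAS_CHARSET_NORMALIZER:
        best = from_bytes(raw_data).best()
        if best is None:
            return None
        return {
            "encoding": best.encoding,
            # Assumption: treat (1 - chaos) as confidence on a 0-100 scale.
            "confidence": round((1.0 - best.chaos) * 100),
            "method": "charset-normalizer",  # hypothetical label
        }
    if HAS_CHARDET:
        guess = chardet.detect(raw_data)
        if not guess.get("encoding"):
            return None
        return {
            "encoding": guess["encoding"],
            # chardet reports confidence as a float between 0 and 1.
            "confidence": round((guess.get("confidence") or 0.0) * 100),
            "method": "chardet",  # hypothetical label
        }
    return None


# Example call; the output depends on which library is installed.
print(detect_with_library("int main() { return 0; }\n".encode("utf-8")))
```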
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_name | Yes | File name to check (e.g., hello.cpp, test.h) | |
| directory_path | Yes | Absolute path of directory containing the file | |
| max_bytes | No | Maximum bytes to analyze (default: 8192) | 8192 |
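As a rough illustration of how a caller might check arguments before invoking the tool, the sketch below applies the required fields and default from the table, plus the `max_bytes` bounds (512 to 65536) that appear in the schema definition under Implementation Reference. The helper name `validate_arguments` is hypothetical and is not part of the project. The non-empty check on the two strings mirrors the dispatch code rather than the schema itself.

```python
from typing import Any, Dict

MIN_BYTES, MAX_BYTES, DEFAULT_BYTES = 512, 65536, 8192


def validate_arguments(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Check arguments against the input schema and fill in the max_bytes default."""
    for key in ("file_name", "directory_path"):
        value = arguments.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"'{key}' is required and must be a non-empty string")

    max_bytes = arguments.get("max_bytes", DEFAULT_BYTES)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_bytes, bool) or not isinstance(max_bytes, int):
        raise ValueError("'max_bytes' must be an integer")
    if not MIN_BYTES <= max_bytes <= MAX_BYTES:
        raise ValueError(f"'max_bytes' must be between {MIN_BYTES} and {MAX_BYTES}")

    return {
        "file_name": arguments["file_name"],
        "directory_path": arguments["directory_path"],
        "max_bytes": max_bytes,
    }


# The directory below is a made-up example path.
print(validate_arguments({"file_name": "hello.cpp", "directory_path": "/home/user/src"}))
```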
Implementation Reference
- Core implementation of the detect_file_encoding tool. Detects encoding by first checking for a BOM, then using the charset-normalizer or chardet libraries, falling back to heuristics if necessary.

  ```python
  def detect_file_encoding(file_path: str, max_bytes: int = 8192) -> Dict[str, Any]:
      """
      Detect file encoding.

      Args:
          file_path: File path
          max_bytes: Maximum bytes to analyze (default 8KB)

      Returns:
          dict: Encoding information
      """
      try:
          if not os.path.exists(file_path):
              return {
                  "error": f"File not found: {file_path}",
                  "encoding": None,
                  "has_bom": False,
                  "confidence": 0
              }

          # Check file size
          file_size = os.path.getsize(file_path)
          if file_size == 0:
              return {
                  "encoding": "empty",
                  "has_bom": False,
                  "confidence": 100,
                  "file_size": 0,
                  "first_bytes": "",
                  "method": "empty-file"
              }

          # Read file
          with open(file_path, 'rb') as f:
              raw_data = f.read(min(max_bytes, file_size))

          # Check BOM (highest priority)
          bom_encoding, bom_type = detect_bom(raw_data)
          if bom_encoding:
              return {
                  "encoding": bom_encoding,
                  "has_bom": True,
                  "bom_type": bom_type,
                  "confidence": 100,
                  "file_size": file_size,
                  "first_bytes": ' '.join(f'{b:02x}' for b in raw_data[:16]),
                  "method": "bom-detection"
              }

          # Library-based detection (priority: charset-normalizer > chardet > fallback)
          detection_result = None

          if HAS_CHARSET_NORMALIZER:
              detection_result = detect_encoding_with_charset_normalizer(raw_data)
          elif HAS_CHARDET:
              detection_result = detect_encoding_with_chardet(raw_data)

          # Use fallback if library result is unavailable or confidence is low
          if not detection_result or detection_result["confidence"] < 60:
              fallback_result = fallback_encoding_detection(raw_data)
              if not detection_result or fallback_result["confidence"] > detection_result["confidence"]:
                  detection_result = fallback_result

          # Build final result
          result = {
              "encoding": detection_result["encoding"],
              "has_bom": False,
              "bom_type": None,
              "confidence": detection_result["confidence"],
              "file_size": file_size,
              "first_bytes": ' '.join(f'{b:02x}' for b in raw_data[:16]),
              "method": detection_result["method"]
          }

          # Include additional information if available
          if "language" in detection_result:
              result["language"] = detection_result["language"]

          return result

      except Exception as e:
          return {
              "error": f"Error detecting file encoding: {str(e)}",
              "encoding": None,
              "has_bom": False,
              "confidence": 0
          }
  ```
- encoding_mcp/server.py:155-178 (schema): Input schema definition for the detect_file_encoding tool, registered in the MCP server's list_tools handler.

  ```python
      name="detect_file_encoding",
      description="Accurately detect file encoding using professional libraries.",
      inputSchema={
          "type": "object",
          "properties": {
              "file_name": {
                  "type": "string",
                  "description": "File name to check (e.g., hello.cpp, test.h)"
              },
              "directory_path": {
                  "type": "string",
                  "description": "Absolute path of directory containing the file"
              },
              "max_bytes": {
                  "type": "integer",
                  "description": "Maximum bytes to analyze (default: 8192)",
                  "default": 8192,
                  "minimum": 512,
                  "maximum": 65536
              }
          },
          "required": ["file_name", "directory_path"]
      }
  ),
  ```
- encoding_mcp/server.py:248-272 (registration): MCP server tool dispatch/registration logic for detect_file_encoding, which constructs the file path and calls the core detect_file_encoding function.

  ```python
  elif name == "detect_file_encoding":
      file_name = arguments.get("file_name", "")
      directory_path = arguments.get("directory_path", "")
      max_bytes = arguments.get("max_bytes", 8192)

      if not file_name or not directory_path:
          return [
              types.TextContent(
                  type="text",
                  text="❌ Both file name and directory path are required."
              )
          ]

      # Combine file name and directory path
      file_path = os.path.join(directory_path, file_name)

      result = detect_file_encoding(file_path, max_bytes)
      formatted_result = format_encoding_result(result, file_path)

      return [
          types.TextContent(
              type="text",
              text=formatted_result
          )
      ]
  ```
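The core implementation calls a `detect_bom` helper that returns an `(encoding, bom_type)` pair, but the helper itself is not shown in this reference. The sketch below is one plausible way to write it using the BOM constants from Python's standard `codecs` module. It is an illustration under assumptions, not the project's code: the encoding names and `bom_type` labels it returns are guesses, and the real helper may report different strings.

```python
import codecs
from typing import Optional, Tuple

# Longer BOMs come first: the UTF-32 LE BOM (FF FE 00 00) starts with the
# UTF-16 LE BOM (FF FE), so checking UTF-16 first would misreport UTF-32 files.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le", "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "utf-32-be", "UTF-32 BE"),
    (codecs.BOM_UTF8, "utf-8-sig", "UTF-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le", "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "utf-16-be", "UTF-16 BE"),
]


def detect_bom(raw_data: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (encoding, bom_type) if raw_data starts with a known BOM, else (None, None)."""
    for bom, encoding, bom_type in _BOMS:
        if raw_data.startswith(bom):
            return encoding, bom_type
    return None, None


# A UTF-8 BOM followed by some C++ source text.
print(detect_bom(b"\xef\xbb\xbfint main() {}"))  # ('utf-8-sig', 'UTF-8')
```

Because a BOM is an unambiguous byte signature, a match can reasonably be reported with full confidence, which is consistent with the `"confidence": 100` the core function returns on the BOM path.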