whyjp

Encoding MCP Server

detect_file_encoding

Identify file encoding to prevent character display issues, especially for C++ and PowerShell files in Windows environments.

Instructions

Detect a file's encoding by checking for a BOM, then using the charset-normalizer or chardet libraries, with a heuristic fallback.

Input Schema

Name            Required  Description                                      Default
file_name       Yes       File name to check (e.g., hello.cpp, test.h)
directory_path  Yes       Absolute path of directory containing the file
max_bytes       No        Maximum bytes to analyze (default: 8192)         8192
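
As an illustration of the inputs above, the following is a minimal sketch of a client-side helper that assembles the tool arguments. The helper name `build_detect_arguments` is hypothetical and not part of the server. The 512 to 65536 bounds on `max_bytes` are taken from the JSON schema shown in the Implementation Reference.

```python
from typing import Any, Dict

# Values copied from the tool's JSON schema; the helper itself is hypothetical.
DEFAULT_MAX_BYTES = 8192
MIN_MAX_BYTES = 512
MAX_MAX_BYTES = 65536


def build_detect_arguments(
    file_name: str,
    directory_path: str,
    max_bytes: int = DEFAULT_MAX_BYTES,
) -> Dict[str, Any]:
    """Validate the inputs and return an arguments dict for detect_file_encoding."""
    # Both fields are marked as required in the input schema.
    if not file_name or not directory_path:
        raise ValueError("file_name and directory_path are both required")
    # Reject values outside the schema's minimum/maximum instead of clamping silently.
    if not MIN_MAX_BYTES <= max_bytes <= MAX_MAX_BYTES:
        raise ValueError(
            f"max_bytes must be between {MIN_MAX_BYTES} and {MAX_MAX_BYTES}"
        )
    return {
        "file_name": file_name,
        "directory_path": directory_path,
        "max_bytes": max_bytes,
    }
```

Raising on out-of-range values, rather than clamping, is a design choice of this sketch; the server code shown on this page does not itself enforce the bounds.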

Implementation Reference

  • Core implementation of the detect_file_encoding tool. It checks for a BOM first, then runs charset-normalizer or chardet, and falls back to heuristics when no library result is available or the library's confidence is below 60.
    import os
    from typing import Any, Dict

    def detect_file_encoding(file_path: str, max_bytes: int = 8192) -> Dict[str, Any]:
        """
        Detect file encoding.
        
        Args:
            file_path: File path
            max_bytes: Maximum bytes to analyze (default 8KB)
            
        Returns:
            dict: Encoding information
        """
        try:
            if not os.path.exists(file_path):
                return {
                    "error": f"File not found: {file_path}",
                    "encoding": None,
                    "has_bom": False,
                    "confidence": 0
                }
            
            # Check file size
            file_size = os.path.getsize(file_path)
            if file_size == 0:
                return {
                    "encoding": "empty",
                    "has_bom": False,
                    "confidence": 100,
                    "file_size": 0,
                    "first_bytes": "",
                    "method": "empty-file"
                }
            
            # Read file
            with open(file_path, 'rb') as f:
                raw_data = f.read(min(max_bytes, file_size))
            
            # Check BOM (highest priority)
            bom_encoding, bom_type = detect_bom(raw_data)
            if bom_encoding:
                return {
                    "encoding": bom_encoding,
                    "has_bom": True,
                    "bom_type": bom_type,
                    "confidence": 100,
                    "file_size": file_size,
                    "first_bytes": ' '.join(f'{b:02x}' for b in raw_data[:16]),
                    "method": "bom-detection"
                }
            
            # Library-based detection (priority: charset-normalizer > chardet > fallback)
            detection_result = None
            
            if HAS_CHARSET_NORMALIZER:
                detection_result = detect_encoding_with_charset_normalizer(raw_data)
            # Try chardet when charset-normalizer is unavailable or returned nothing
            if not detection_result and HAS_CHARDET:
                detection_result = detect_encoding_with_chardet(raw_data)
            
            # Use fallback if library result is unavailable or confidence is low
            if not detection_result or detection_result["confidence"] < 60:
                fallback_result = fallback_encoding_detection(raw_data)
                if not detection_result or fallback_result["confidence"] > detection_result["confidence"]:
                    detection_result = fallback_result
            
            # Build final result
            result = {
                "encoding": detection_result["encoding"],
                "has_bom": False,
                "bom_type": None,
                "confidence": detection_result["confidence"],
                "file_size": file_size,
                "first_bytes": ' '.join(f'{b:02x}' for b in raw_data[:16]),
                "method": detection_result["method"]
            }
            
            # Include additional information if available
            if "language" in detection_result:
                result["language"] = detection_result["language"]
            
            return result
            
        except Exception as e:
            return {
                "error": f"Error detecting file encoding: {str(e)}",
                "encoding": None,
                "has_bom": False,
                "confidence": 0
            }
  • Input schema definition for the detect_file_encoding tool, registered in the MCP server's list_tools handler.
    types.Tool(
        name="detect_file_encoding",
        description="Accurately detect file encoding using professional libraries.",
        inputSchema={
            "type": "object",
            "properties": {
                "file_name": {
                    "type": "string",
                    "description": "File name to check (e.g., hello.cpp, test.h)"
                },
                "directory_path": {
                    "type": "string",
                    "description": "Absolute path of directory containing the file"
                },
                "max_bytes": {
                    "type": "integer",
                    "description": "Maximum bytes to analyze (default: 8192)",
                    "default": 8192,
                    "minimum": 512,
                    "maximum": 65536
                }
            },
            "required": ["file_name", "directory_path"]
        }
    ),
  • Dispatch branch for detect_file_encoding in the MCP server's tool-call handling. It validates the required arguments, joins directory_path and file_name into a full path, and calls the core detect_file_encoding function.
    elif name == "detect_file_encoding":
        file_name = arguments.get("file_name", "")
        directory_path = arguments.get("directory_path", "")
        max_bytes = arguments.get("max_bytes", 8192)
        
        if not file_name or not directory_path:
            return [
                types.TextContent(
                    type="text",
                    text="❌ Both file name and directory path are required."
                )
            ]
        
        # Combine file name and directory path
        file_path = os.path.join(directory_path, file_name)
        
        result = detect_file_encoding(file_path, max_bytes)
        formatted_result = format_encoding_result(result, file_path)
        
        return [
            types.TextContent(
                type="text",
                text=formatted_result
            )
        ]
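
The core implementation calls a `detect_bom` helper that is not shown on this page. The following is a minimal sketch of how such a check could work, assuming it returns an `(encoding, bom_type)` pair as the call site suggests. The function name `detect_bom_sketch` and the returned labels are illustrative assumptions, not the server's actual values.

```python
import codecs
from typing import Optional, Tuple

# Longer signatures come first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the same two bytes as the UTF-16 LE BOM (FF FE).
_BOM_TABLE = (
    (codecs.BOM_UTF32_LE, "utf-32-le", "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "utf-32-be", "UTF-32 BE"),
    (codecs.BOM_UTF8, "utf-8-sig", "UTF-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le", "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "utf-16-be", "UTF-16 BE"),
)


def detect_bom_sketch(raw_data: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (encoding, bom_type) if raw_data starts with a known BOM, else (None, None)."""
    for signature, encoding, label in _BOM_TABLE:
        if raw_data.startswith(signature):
            return encoding, label
    return None, None
```

One caveat of this ordering: a UTF-16 LE file whose first character is U+0000 would be reported as UTF-32 LE, which is rare in source files but worth knowing.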

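The heuristic fallback (`fallback_encoding_detection`) is likewise not shown. As a rough sketch of what a standard-library-only fallback might look like, the function below tries ASCII, then strict UTF-8. The function name, the confidence numbers, and the `method` strings are assumptions made for illustration; only the result keys (`encoding`, `confidence`, `method`) follow the core implementation above.

```python
import codecs
from typing import Any, Dict


def fallback_detection_sketch(raw_data: bytes) -> Dict[str, Any]:
    """Guess an encoding without third-party libraries (illustrative heuristic)."""
    # Pure ASCII bytes decode identically in ASCII, UTF-8, and most legacy code pages.
    if raw_data.isascii():
        return {"encoding": "ascii", "confidence": 90, "method": "heuristic-ascii"}

    # Strict UTF-8 check. final=False tolerates a multi-byte sequence that was
    # cut off at the end because only the first max_bytes of the file were read.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(raw_data, final=False)
    except UnicodeDecodeError:
        # Not valid UTF-8: leave the decision to the caller rather than guess a code page.
        return {"encoding": "unknown", "confidence": 0, "method": "heuristic-none"}
    return {"encoding": "utf-8", "confidence": 80, "method": "heuristic-utf8"}
```

Using an incremental decoder matters here because the tool analyzes a prefix of the file, and a plain `bytes.decode("utf-8")` would reject valid UTF-8 whose last character was split by the read limit.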

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/whyjp/encoding_mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.