Skip to main content
Glama

content-core

context.md2.28 kB
# OSS-216: Remove libmagic Dependency - Context ## Why This Is Being Built - **Deployment Friction**: libmagic requires OS-level installation which creates deployment problems - **Cross-Platform Issues**: Binary dependency causes installation problems across Windows, macOS, and Linux - **Simplification**: Removing system dependencies makes the package easier to install and use - **Maintain Functionality**: Need to keep intelligent file detection without external dependencies ## Expected Outcome Replace libmagic with a pure Python implementation that: - Detects file types using magic bytes/file signatures (first 512 bytes) - Maintains the current routing behavior to appropriate content processors - Works across all platforms without OS-level dependencies - Keeps the same error handling (UnsupportedTypeException for unsupported types) ## Implementation Approach 1. **File Signature Detection System**: - Build comprehensive mapping of file signatures to MIME types - Read first 512 bytes to identify format by magic bytes - Special handling for Office formats (DOCX, XLSX, PPTX) which are ZIP-based - Content structure analysis for text formats (HTML, JSON, XML) 2. **Detection Priority** (as discussed): - Primary: File signature/magic bytes detection - Secondary: Content analysis for text formats - Tertiary: File extension as final fallback - If file extension and content disagree, prioritize content analysis 3. **Replace Current Usage**: - Remove imports of `magic` library - Replace `magic.from_file()` calls in: - `/src/content_core/content/identification/__init__.py` - `/src/content_core/content/extraction/graph.py` - Remove dependencies from `pyproject.toml` ## Testing Approach - Comprehensive testing will be handled later - Focus on maintaining existing functionality - Ensure all currently supported file types continue to work ## Dependencies No new dependencies - implementation should be pure Python using only standard library. ## Constraints - 512 bytes buffer is sufficient (no need for deep ZIP inspection) - Performance is not a concern (load is small) - Maintain current error behavior (raise UnsupportedTypeException) - MIME type strings can be adjusted as long as routing works correctly

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lfnovo/content-core'

If you have feedback or need assistance with the MCP directory API, please join our Discord server