content-core

CHANGELOG.md•5.16 KiB

# Changelog

All notable changes to Content Core will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.14.1] - 2026-01-29

### Fixed

- **YouTube Transcript Extraction** - Updated to youtube-transcript-api v1.0+ API
  - The library removed deprecated static methods (`list_transcripts`, `get_transcript`) in v1.0
  - Now uses instance-based API: `YouTubeTranscriptApi().list()` and `.fetch()`
  - Restored youtube-transcript-api as primary engine with pytubefix as fallback
- **Video Processor Error Handling** - Fixed LangGraph compatibility issue
  - Video extraction now returns proper dict on error instead of `False`
  - Prevents `InvalidUpdateError: Expected dict, got False` when ffprobe is missing

## [1.14.0] - 2026-01-29

### Changed

- **Simplified Proxy Configuration** - Removed custom proxy infrastructure in favor of standard environment variables
  - Now uses standard `HTTP_PROXY` / `HTTPS_PROXY` environment variables (same as most HTTP clients)
  - Removed custom `CCORE_HTTP_PROXY` environment variable
  - Removed `proxy` field from `ProcessSourceInput` and `ProcessSourceState`
  - Removed programmatic API: `set_proxy()`, `clear_proxy()`, `get_proxy()`, `get_no_proxy()`
  - Removed proxy section from YAML configuration
  - All HTTP clients (aiohttp) now use `trust_env=True` to automatically read proxy settings
  - Crawl4AI bridges `HTTP_PROXY` to its `ProxyConfig` for consistent behavior
  - Aligns with Esperanto library's proxy handling approach

### Removed

- `proxy` parameter from extraction API
- Custom proxy configuration functions from `content_core.config`
- Proxy-related unit and integration tests (proxy now handled by underlying HTTP clients)

## [1.13.0] - 2026-01-25

### Added

- **HTML to Markdown Conversion** - Auto-detect and convert HTML content to markdown
  - Detects HTML structure in text content (headings, paragraphs, lists, links, code, etc.)
  - Uses `markdownify` library for deterministic conversion
  - Useful for processing "rendered markdown" copied from preview panes (VS Code, Obsidian reading mode, browsers)
  - Plain text without HTML passes through unchanged
  - New exports in `processors/text.py`: `process_text_content`, `detect_html`

## [1.12.0] - 2026-01-25

### Changed

- **LangGraph v1 Migration** - Updated to LangGraph v1.0+ (from v0.3.x)
  - Minimum requirement now `langgraph>=1.0.0`
  - Updated StateGraph API: `input` → `input_schema`, `output` → `output_schema`
  - No breaking changes for users - same API surface maintained

## [1.11.0] - 2026-01-25

### Added

- **Self-Hosted Firecrawl Support** - Configure a custom Firecrawl API URL for self-hosted instances
  - Environment variable: `FIRECRAWL_API_BASE_URL`
  - YAML config: `extraction.firecrawl.api_url`
  - Programmatic API: `set_firecrawl_api_url()`, `get_firecrawl_api_url()`
  - Debug logging when using a custom base URL
  - Documentation with link to [Firecrawl self-hosting guide](https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md)

## [1.10.0] - 2026-01-16

### Added

- **HTTP/HTTPS Proxy Support** - Route all network requests through a configured proxy
  - 4-level configuration priority: Per-request > Programmatic > Environment variable > YAML config
  - Environment variables: `CCORE_HTTP_PROXY`, `HTTP_PROXY`, `HTTPS_PROXY`
  - Programmatic API: `set_proxy()`, `clear_proxy()`, `get_proxy()`
  - Per-request override via `proxy` parameter in `ProcessSourceState`
  - Bypass list support via `NO_PROXY` environment variable
  - Full proxy support for: aiohttp requests, Esperanto LLM/STT models, Crawl4AI, pytubefix, youtube-transcript-api
  - Warning logged when using Firecrawl (no client-side proxy support)
- Pure Python file type detection via the new `FileDetector` class
- Comprehensive file signature detection for 25+ file formats
- Smart detection for ZIP-based formats (DOCX, XLSX, PPTX, EPUB)
- Custom audio model configuration - override speech-to-text provider and model at runtime
  - Pass `audio_provider` and `audio_model` parameters through `extract_content()` API
  - Supports any provider/model combination available through Esperanto library
  - Maintains full backward compatibility - existing code works unchanged
  - Includes validation with helpful warnings and error messages

### Changed

- File type detection now uses pure Python implementation instead of libmagic
- Improved cross-platform compatibility - no system dependencies required

### Removed

- Dependency on `python-magic` and `python-magic-bin`
- System requirement for libmagic library

### Technical Details

- New proxy configuration module in `content_core/config.py`
- Proxy support integrated into all network-making components
- Replaced libmagic dependency with custom `FileDetector` implementation
- File detection based on binary signatures and content analysis
- Maintains same API surface - no breaking changes for users
- Significantly simplified installation process across all platforms

## Previous Releases

For releases prior to this changelog, please see the [GitHub releases page](https://github.com/lfnovo/content-core/releases).
