# Link Scan MCP Server
**Link Scan MCP Server** - A comprehensive Model Context Protocol (MCP) server for scanning and summarizing links. Automatically detects and analyzes video links (YouTube, Instagram Reels) and text links (blogs, articles) to provide concise 3-sentence summaries. All features work without requiring API keys!
**Python 3.11+** | **MCP Compatible** | **License: MIT**
## Features
### Video Link Analysis
- **YouTube Support**
- Comprehensive metadata extraction (title, description)
- Subtitle extraction for first 7 seconds (yt-dlp)
- Audio transcription using OpenAI Whisper
- Integrated summarization combining all text sources
- **Instagram Reels Support**
- Audio download and transcription (first 7 seconds)
- Automatic content summarization
- **Smart Link Detection**
- Automatic video/text link type detection
- Error handling for unsupported URLs
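The repository names `src/utils/link_detector.py` for this detection step. A minimal sketch of how such classification might work (the function name, return values, and host table here are illustrative assumptions, not the project's actual API):

```python
from urllib.parse import urlparse

# Hosts treated as video platforms (illustrative, not exhaustive)
VIDEO_HOSTS = {
    "www.youtube.com": "youtube",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "www.instagram.com": "instagram",
    "instagram.com": "instagram",
}

def detect_link_type(url: str) -> str:
    """Classify a URL as 'video' or 'text', raising on malformed input."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Unsupported URL: {url}")
    host = parsed.netloc.lower()
    if host in VIDEO_HOSTS:
        # Instagram counts as a video link only when it points at a Reel
        if VIDEO_HOSTS[host] == "instagram" and not parsed.path.startswith("/reel"):
            return "text"
        return "video"
    return "text"
```

Anything not matched as a known video host falls through to the text pipeline, which mirrors the error-handling bullet above.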
### Text Link Analysis
- **Web Content Extraction**
- BeautifulSoup-based HTML parsing
- Main content area detection
- Automatic navigation/ad removal
- **Intelligent Summarization**
- Llama3-powered text summarization
- 3-sentence limit enforcement
- Natural Korean output
### AI-Powered Summarization
- **Llama3 Integration**
- Local LLM via Ollama (no API keys required)
- Separate prompts for video and text content
- Fallback to original text on errors
- **Whisper Transcription**
- High-quality speech-to-text conversion
- Optimized for speed and accuracy
- Supports multiple languages
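As a sketch of the Llama3 integration, a non-streaming call to Ollama's `/api/generate` endpoint could look like the following. The function names are hypothetical; the endpoint, payload fields, and `response` key follow Ollama's documented HTTP API, and the fallback-to-original-text behavior mirrors the bullet above:

```python
import json
import urllib.request

OLLAMA_API_URL = "http://localhost:11434"  # default from .env
OLLAMA_MODEL = "llama3:latest"

def build_payload(text: str, system_prompt: str) -> dict:
    """Assemble an Ollama /api/generate request body (non-streaming)."""
    return {
        "model": OLLAMA_MODEL,
        "system": system_prompt,
        "prompt": f"Summarize in at most 3 sentences:\n\n{text}",
        "stream": False,
    }

def summarize(text: str, system_prompt: str = "You are a concise summarizer.") -> str:
    """Ask the local Ollama server for a summary; fall back to the input on error."""
    payload = build_payload(text, system_prompt)
    req = urllib.request.Request(
        f"{OLLAMA_API_URL}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            return json.loads(resp.read())["response"].strip()
    except OSError:
        return text  # fallback to original text, as described above
```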
### Docker Support
- **One-Command Setup**
- Docker Compose configuration
- Automatic Ollama service setup
- Llama3 model auto-download
- Development mode with hot reload
### Developer-Friendly
- **Type-safe with Pydantic models**
- **Async/await support** for better performance
- **Comprehensive error handling**
- **Extensible architecture**
- **Hot reload** in development mode
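For illustration, a scan result could be modeled with Pydantic roughly like this. The model and field names are hypothetical, not the project's actual schema:

```python
from typing import Literal, Optional

from pydantic import BaseModel

class LinkSummary(BaseModel):
    """Hypothetical result shape for a completed link scan."""
    url: str
    link_type: Literal["video", "text"]
    summary: str                  # at most 3 sentences
    title: Optional[str] = None   # video title, when available
    error: Optional[str] = None   # populated when scanning fails
```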
## Quick Start
### Installation
```bash
# Clone the repository
git clone https://github.com/your-username/mcp-link-scan.git
cd mcp-link-scan
# Install dependencies
pip install -r requirements.txt
```
### System Dependencies
**ffmpeg** (required for audio processing):
- macOS: `brew install ffmpeg`
- Ubuntu/Debian: `sudo apt-get install ffmpeg`
- Windows: Download from https://ffmpeg.org/download.html
**Ollama** (required for summarization):
- macOS: `brew install ollama` or download from https://ollama.com/download
- Linux: `curl -fsSL https://ollama.com/install.sh | sh`
- Windows: Download from https://ollama.com/download
- After installation: `ollama pull llama3:latest`
### Configuration
Create a `.env` file:
```bash
# Server settings
PORT=8000                              # Server port (default: 8000)
HOST=0.0.0.0                           # Server host (default: 0.0.0.0)
DEBUG=False                            # Debug mode (default: False)

# API path prefix (optional)
# Used when hosting multiple MCP servers on the same machine
# Default: /link-scan
API_PREFIX=/link-scan

# Ollama settings (optional)
# Set automatically when using Docker Compose
OLLAMA_API_URL=http://localhost:11434  # Ollama API URL (default: http://localhost:11434)
OLLAMA_MODEL=llama3:latest             # Ollama model to use (default: llama3)
```
#### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `PORT` | No | `8000` | Port number the server listens on |
| `HOST` | No | `0.0.0.0` | Host address the server binds to |
| `DEBUG` | No | `False` | Enable debug mode (`True`/`False`) |
| `API_PREFIX` | No | `/link-scan` | Path prefix for API endpoints |
| `OLLAMA_API_URL` | No | `http://localhost:11434` | Ollama API server URL |
| `OLLAMA_MODEL` | No | `llama3` | Name of the Ollama model to use |
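A loader for these variables might look like the following sketch (the function name is an assumption; the defaults match the table above):

```python
import os

def load_settings() -> dict:
    """Read server settings from the environment, falling back to the documented defaults."""
    return {
        "port": int(os.getenv("PORT", "8000")),
        "host": os.getenv("HOST", "0.0.0.0"),
        "debug": os.getenv("DEBUG", "False").lower() == "true",
        "api_prefix": os.getenv("API_PREFIX", "/link-scan"),
        "ollama_api_url": os.getenv("OLLAMA_API_URL", "http://localhost:11434"),
        "ollama_model": os.getenv("OLLAMA_MODEL", "llama3"),
    }
```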
### Running as MCP Server
**Local Mode (stdio):**
```bash
python -m src.server
```
**Remote Mode (HTTP):**
```bash
python run_server.py
```
Or with uvicorn directly:
```bash
uvicorn src.server_http:app --host 0.0.0.0 --port 8000
```
### Docker Setup (Recommended)
**Using Docker Compose:**
```bash
# Start all services (link-scan + Ollama)
docker-compose up -d
# Check logs
docker-compose logs -f
# Stop services
docker-compose down
```
Docker Compose automatically:
- Sets up Ollama service with 8GB memory
- Downloads Llama3 model
- Configures link-scan service
- Enables development mode with hot reload
**Development Mode:**
The `docker-compose.yml` is configured for development with:
- Source code volume mounting
- Hot reload enabled (`DEBUG=True`)
- Automatic code changes detection
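For reference, a minimal sketch of what such a development-oriented compose file might contain. The service names, image tag, and memory limit shown here are assumptions based on the bullets above; the repository's actual `docker-compose.yml` is authoritative:

```yaml
# Hypothetical sketch -- see the repository's docker-compose.yml for the real configuration.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        limits:
          memory: 8g          # matches the 8GB memory setup described above
  link-scan:
    build: .
    environment:
      - DEBUG=True            # enables hot reload in development mode
      - OLLAMA_API_URL=http://ollama:11434
    volumes:
      - ./src:/app/src        # source mounted so code changes are picked up
    ports:
      - "8000:8000"
    depends_on:
      - ollama
```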
### Testing with MCP Inspector
You can test the server using the MCP Inspector tool:
```bash
# Test with Python
npx @modelcontextprotocol/inspector python run_server.py
# Or test stdio mode
npx @modelcontextprotocol/inspector python -m src.server
```
The MCP Inspector provides a web interface to:
- View available tools and their schemas
- Test tool execution with sample inputs
- Debug server responses and error handling
- Validate MCP protocol compliance
## Available Tools
### 1. `scan_video_link`
Scan and summarize video links (YouTube, Instagram Reels, etc.).
**Parameters:**
- `url` (string, required): Video URL to scan
**Example:**
```json
{
"name": "scan_video_link",
"arguments": {
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
}
```
**Process:**
1. Detects link type (YouTube, Instagram, etc.)
2. For YouTube: Extracts title, description, subtitles (first 7s)
3. Downloads audio (first 7 seconds)
4. Transcribes audio with Whisper
5. Combines all text sources
6. Summarizes with Llama3 (3 sentences max)
### 2. `scan_text_link`
Scan and summarize text links (blogs, articles, etc.).
**Parameters:**
- `url` (string, required): Text URL to scan
**Example:**
```json
{
"name": "scan_text_link",
"arguments": {
"url": "https://example.com/blog/article"
}
}
```
**Process:**
1. Fetches HTML content
2. Extracts main text content
3. Removes navigation, ads, and noise
4. Summarizes with Llama3 (3 sentences max)
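Steps 2 and 3 above can be sketched with BeautifulSoup like this (the function name and the exact set of stripped tags are assumptions; see `src/tools/text_handler.py` for the real implementation):

```python
from bs4 import BeautifulSoup

# Tags that usually carry navigation, ads, or other noise rather than content
NOISE_TAGS = ["script", "style", "nav", "header", "footer", "aside", "form"]

def extract_main_text(html: str) -> str:
    """Strip boilerplate tags and return the page's visible text, whitespace-normalized."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(NOISE_TAGS):
        tag.decompose()
    # Prefer an explicit <article>/<main> region when the page provides one
    main = soup.find("article") or soup.find("main") or soup.body or soup
    return " ".join(main.get_text(separator=" ").split())
```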
## Example Outputs
### Video Link Summary
**Input:** YouTube video URL
**Output:**
```
This video introduces the basic concepts of the Python programming language.
It explains core syntax such as variables, functions, and classes with hands-on examples.
It is structured step by step so that beginners can follow along easily.
```
### Text Link Summary
**Input:** Blog article URL
**Output:**
```
This article analyzes the advantages and disadvantages of Docker container technology.
Compared with virtualization, it highlights resource efficiency and ease of deployment as strengths.
However, it advises caution regarding security and complexity.
```
## Architecture
```
mcp-link-scan/
βββ src/
β βββ server.py # Local server (stdio)
β βββ server_http.py # Remote server (HTTP)
β βββ tools/ # MCP tools
β β βββ link_scanner.py # Main tool definitions
β β βββ media_handler.py # Video processing (Whisper)
β β βββ text_handler.py # Text extraction
β βββ utils/ # Utilities
β β βββ link_detector.py # Link type detection
β β βββ youtube_extractor.py # YouTube metadata/subtitles
β β βββ llm_summarizer.py # Llama3 integration
β βββ prompts/ # LLM prompts
β βββ __init__.py # Video/text prompt templates
βββ docker/
β βββ init-ollama.sh # Ollama initialization script
βββ docker-compose.yml # Docker services
βββ Dockerfile # Container build config
βββ requirements.txt # Python dependencies
βββ run_server.py # Server entry point
```
## Development
### Setting up Development Environment
```bash
# Clone and install
git clone https://github.com/your-username/mcp-link-scan.git
cd mcp-link-scan
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env with your settings
# Start Ollama (if not using Docker)
ollama serve
ollama pull llama3:latest
```
### Development Mode with Docker
```bash
# Start in development mode (hot reload enabled)
docker-compose up -d
# View logs
docker-compose logs -f link-scan
# Code changes are automatically reloaded
```
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/test_link_scanner.py
```
### Customizing Prompts
Edit `src/prompts/__init__.py` to customize LLM prompts:
```python
# Video summarization prompt
VIDEO_SUMMARIZE_SYSTEM = """
Your custom system prompt here...
"""
# Text summarization prompt
TEXT_SUMMARIZE_SYSTEM = """
Your custom system prompt here...
"""
```
### Configuring Whisper Model
Edit `src/tools/media_handler.py`:
```python
# Change model size (tiny, base, small, medium, large)
_whisper_model = whisper.load_model("base") # Default: "base"
```
## Requirements
- **Python 3.11+**
- **ffmpeg** - Audio processing
- **Ollama** - LLM runtime (for summarization)
- **yt-dlp** - Video/audio download
- **openai-whisper** - Speech-to-text
- **torch** - PyTorch (for Whisper)
- **aiohttp** - Async HTTP client
- **beautifulsoup4** - HTML parsing
- **fastapi** - HTTP server framework
- **uvicorn** - ASGI server
- **mcp** - Model Context Protocol SDK
## Deployment
### PlayMCP Registration
1. **Deploy Server**: Deploy to cloud hosting (Render, Railway, Fly.io, AWS, GCP, etc.)
2. **Get Server URL**: Example: `https://your-server.railway.app`
3. **Register in PlayMCP**: Use URL `https://your-server.railway.app/messages`
**Important:** Server URL must be publicly accessible and support HTTPS for production use.
### Using with MCP Clients
**Amazon Q CLI:**
```json
{
"mcpServers": {
"link-scan": {
"command": "python",
"args": ["run_server.py"],
"cwd": "/path/to/mcp-link-scan"
}
}
}
```
**Other MCP Clients:**
```json
{
"mcpServers": {
"link-scan": {
"url": "https://your-server.com/messages"
}
}
}
```
## Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass (`pytest`)
6. Commit your changes (`git commit -m 'Add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
### Development Workflow
```bash
# Install in development mode
pip install -e .
# Run tests
pytest
# Format code (if using formatters)
black src/ tests/
isort src/ tests/
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- **yt-dlp** team for the excellent YouTube extraction library
- **OpenAI Whisper** team for the speech-to-text model
- **Ollama** team for the local LLM runtime
- **MCP** team for the Model Context Protocol specification
- **Pydantic** team for the data validation library
## Support
- **Issues**: [GitHub Issues](https://github.com/your-username/mcp-link-scan/issues)
- **Discussions**: [GitHub Discussions](https://github.com/your-username/mcp-link-scan/discussions)
## Roadmap
- [ ] Batch processing for multiple links
- [ ] Caching layer for improved performance
- [ ] Export functionality (JSON, CSV, etc.)
- [ ] Advanced analytics (sentiment analysis, topic extraction)
- [ ] Support for more video platforms (TikTok, Vimeo, etc.)
- [ ] WebSocket support for real-time updates
- [ ] Integration examples with popular MCP clients
- [ ] Custom prompt templates via API
- [ ] Multi-language support for summaries
- [ ] Video thumbnail extraction
## Notes
- Audio downloads are temporarily stored and automatically cleaned up
- Whisper model is loaded once and reused for better performance
- Processing time depends on video length and Whisper model size
- **YouTube videos are processed for first 7 seconds only** to reduce processing time
- **All text sources (title, description, subtitles, transcription) are combined for YouTube videos**
- **Summaries are limited to 3 sentences maximum**
- For production, consider using a GPU for faster Whisper transcription
- Ollama timeout is set to 5 minutes for tool calls