# LearnMCP Server
A standalone MCP server that enhances Forest with learning content extraction and summarization capabilities.
## Overview
LearnMCP extracts and summarizes learning content from various sources (YouTube videos, PDFs, web articles) and makes those summaries available to Forest's HTA builder for more informed task generation.
## Features
- **Content Extraction**: YouTube videos (with transcripts), PDF documents, web articles
- **Background Processing**: Async content processing with queue management
- **Smart Summarization**: Content chunking and summarization with relevance scoring
- **Forest Integration**: Optional integration with Forest's HTA tree builder
- **Standalone Operation**: Can be enabled/disabled independently of Forest
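As a rough illustration of the chunking and relevance scoring listed above, the sketch below splits text into fixed-size word chunks and scores each chunk by keyword overlap. The function names, the chunk size, and the scoring rule are assumptions made for illustration; LearnMCP's actual summarizer may work differently.

```javascript
// Hypothetical helpers: LearnMCP's real chunking and scoring may differ.

// Split text into chunks of at most `maxWords` words.
function chunkText(text, maxWords = 200) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// Score a chunk as the fraction of keywords it mentions (case-insensitive).
function relevanceScore(chunk, keywords) {
  if (keywords.length === 0) return 0;
  const lower = chunk.toLowerCase();
  const hits = keywords.filter((k) => lower.includes(k.toLowerCase()));
  return hits.length / keywords.length;
}

const exampleChunks = chunkText("Python lists are mutable. Tuples are not.", 4);
const ranked = exampleChunks
  .map((chunk) => ({ chunk, score: relevanceScore(chunk, ["python", "tuples"]) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked);
```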
## Architecture
```
User → LearnMCP Tools → LearnService → BackgroundProcessor ⇄ Extractors ⇄ Summarizer → DataPersistence
                                                                                              ↓
                                                                              <FOREST_DATA_DIR>/learn-content/
                                                                                              ↓
                                                                                Forest HTA Builder (optional)
```
## Installation
1. **Install Dependencies**:
```bash
cd learn-mcp-server
npm install
```
2. **Configure MCP**: Add to your `mcp-config.json`:
```json
{
  "mcpServers": {
    "learn-mcp": {
      "command": "node",
      "args": ["server.js"],
      "cwd": "learn-mcp-server",
      "env": {
        "FOREST_DATA_DIR": "<same as Forest>"
      }
    }
  }
}
```
3. **Start Server**: The server starts automatically when Claude Desktop loads the MCP config.
## Available Tools
### `add_learning_sources`
Add learning sources (URLs) to a project for content extraction.
**Parameters:**
- `project_id` (string): Project ID to add sources to
- `urls` (array): Array of URLs (YouTube, PDF, articles)
**Example:**
```json
{
  "project_id": "my_project",
  "urls": [
    "https://youtube.com/watch?v=example",
    "https://example.com/document.pdf",
    "https://blog.example.com/article"
  ]
}
```
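A client may want to check a payload like the one above before sending it. The following is a minimal sketch of such a check; `validateAddSourcesArgs` is a hypothetical helper, and the rules it enforces (non-empty values, HTTP or HTTPS URLs only) are assumptions, since the server's own validation is not documented here.

```javascript
// Hypothetical validator; the server's real checks are not documented here.
// Returns a list of problems; an empty list means the payload looks usable.
function validateAddSourcesArgs(args) {
  const errors = [];
  if (typeof args.project_id !== "string" || args.project_id.length === 0) {
    errors.push("project_id must be a non-empty string");
  }
  if (!Array.isArray(args.urls) || args.urls.length === 0) {
    errors.push("urls must be a non-empty array");
  } else {
    for (const url of args.urls) {
      try {
        // The WHATWG URL constructor throws on malformed input.
        const { protocol } = new URL(url);
        if (protocol !== "http:" && protocol !== "https:") {
          errors.push(`unsupported protocol: ${url}`);
        }
      } catch {
        errors.push(`invalid URL: ${url}`);
      }
    }
  }
  return errors;
}

console.log(
  validateAddSourcesArgs({
    project_id: "my_project",
    urls: ["https://example.com/document.pdf", "not a url"],
  })
);
```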
### `process_learning_sources`
Start background processing of pending learning sources.
**Parameters:**
- `project_id` (string): Project ID to process sources for
### `list_learning_sources`
List learning sources for a project, optionally filtered by status.
**Parameters:**
- `project_id` (string): Project ID
- `status` (string, optional): Filter by status (pending, processing, completed, failed)
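The optional status filter can be pictured as follows. This is only a sketch: the shape of a source record (`id`, `url`, `status`) and the `filterSources` helper are assumptions, not the server's actual code.

```javascript
// The four status values come from the parameter description above.
const STATUSES = ["pending", "processing", "completed", "failed"];

// Hypothetical helper: return all sources, or only those matching `status`.
function filterSources(sources, status) {
  if (status === undefined) return sources;
  if (!STATUSES.includes(status)) {
    throw new Error(`unknown status: ${status}`);
  }
  return sources.filter((source) => source.status === status);
}

const exampleSources = [
  { id: "a", url: "https://example.com/document.pdf", status: "completed" },
  { id: "b", url: "https://blog.example.com/article", status: "pending" },
];
console.log(filterSources(exampleSources, "pending"));
```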
### `get_learning_summary`
Get learning content summary for a project or specific source.
**Parameters:**
- `project_id` (string): Project ID
- `source_id` (string, optional): Specific source ID (if omitted, an aggregated summary is returned)
- `token_limit` (number, optional): Maximum tokens for aggregated summary (default: 2000)
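One way to picture how `token_limit` could bound an aggregated summary is sketched below. The four-characters-per-token estimate, the record shape (`title`, `summary`), and the stop-at-first-overflow rule are all assumptions; the server may count tokens and select content differently.

```javascript
// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Hypothetical aggregation: append per-source sections until the budget is spent.
function aggregateSummaries(summaries, tokenLimit = 2000) {
  const parts = [];
  let used = 0;
  for (const { title, summary } of summaries) {
    const section = `## ${title}\n${summary}`;
    const cost = estimateTokens(section);
    if (used + cost > tokenLimit) break; // stop before exceeding the limit
    parts.push(section);
    used += cost;
  }
  return parts.join("\n\n");
}

const exampleSummaries = [
  { title: "A", summary: "short" },
  { title: "B", summary: "x".repeat(100) },
];
console.log(aggregateSummaries(exampleSummaries, 10));
```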
### `delete_learning_sources`
Delete learning sources and their summaries.
**Parameters:**
- `project_id` (string): Project ID
- `source_ids` (array): Array of source IDs to delete
### `get_processing_status`
Get current processing status for learning sources.
**Parameters:**
- `project_id` (string): Project ID
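The response format of `get_processing_status` is not specified here. As an illustration only, a status report could be derived from the source registry like this; `summarizeStatus` and the returned fields are hypothetical.

```javascript
// Hypothetical status report: counts per status plus a "finished" flag.
function summarizeStatus(sources) {
  const counts = { pending: 0, processing: 0, completed: 0, failed: 0 };
  for (const { status } of sources) {
    if (Object.prototype.hasOwnProperty.call(counts, status)) {
      counts[status] += 1;
    }
  }
  const total = sources.length;
  const settled = counts.completed + counts.failed;
  // Finished means every source has either completed or failed.
  return { ...counts, total, finished: total > 0 && settled === total };
}

console.log(
  summarizeStatus([{ status: "completed" }, { status: "processing" }, { status: "failed" }])
);
```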
## Supported Content Types
### YouTube Videos
- Extracts video metadata (title, author, duration, etc.)
- Downloads transcripts when available
- Falls back to the video description if no transcript is available
### PDF Documents
- Extracts text content from remote PDF URLs
- Preserves document metadata
- Handles various PDF formats
### Web Articles
- Uses Mozilla Readability for clean content extraction
- Extracts metadata (title, author, publish date, etc.)
- Estimates reading time
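Routing a URL to one of the three extractors above could look like the sketch below, together with a simple reading-time estimate. The host list, the `.pdf` suffix check, and the 200 words-per-minute rate are assumptions for illustration; a more robust approach would also inspect the response's `Content-Type`.

```javascript
// Hypothetical host list; the real extractor may recognize other YouTube domains.
const YOUTUBE_HOSTS = new Set(["youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"]);

// Classify a URL as "youtube", "pdf", or "article".
function detectSourceType(rawUrl) {
  const url = new URL(rawUrl); // throws on malformed input
  if (YOUTUBE_HOSTS.has(url.hostname)) return "youtube";
  if (url.pathname.toLowerCase().endsWith(".pdf")) return "pdf";
  return "article";
}

// Estimate reading time in whole minutes (assumed rate: 200 words per minute).
function estimateReadingMinutes(text, wordsPerMinute = 200) {
  const words = text.split(/\s+/).filter(Boolean).length;
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}

console.log(detectSourceType("https://youtube.com/watch?v=example"));
console.log(detectSourceType("https://example.com/document.pdf"));
console.log(detectSourceType("https://blog.example.com/article"));
```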
## Data Storage
LearnMCP stores data in `<FOREST_DATA_DIR>/learn-content/`:
```
learn-content/
├── <project_id>/
│   ├── sources.json          # Source registry
│   └── summaries/
│       ├── <source_id>.json  # Individual summaries
│       └── ...
```
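The layout above maps to file paths in a straightforward way. The helper below is a hypothetical sketch that builds those paths with plain string joining to stay dependency-free; real code would more likely use Node's `path.join`.

```javascript
// Hypothetical helper mirroring the directory layout shown above.
function projectPaths(dataDir, projectId) {
  // Drop trailing slashes so the joined path has no doubled separators.
  const root = `${dataDir.replace(/\/+$/, "")}/learn-content/${projectId}`;
  return {
    sourcesFile: `${root}/sources.json`,
    summariesDir: `${root}/summaries`,
    summaryFile: (sourceId) => `${root}/summaries/${sourceId}.json`,
  };
}

// "/data/forest" stands in for the value of FOREST_DATA_DIR.
const examplePaths = projectPaths("/data/forest/", "learn_python");
console.log(examplePaths.sourcesFile);
console.log(examplePaths.summaryFile("abc123"));
```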
## Forest Integration
When both LearnMCP and Forest are active, Forest's HTA builder can optionally include learning content summaries in its task generation prompts. No extra step is required; the sequence is:
1. LearnMCP processes learning sources for a project
2. Forest builds an HTA tree for the same project
3. Forest injects the learning content summaries into the HTA generation prompt
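Conceptually, the injection step amounts to appending the summaries to the prompt when any exist. The sketch below shows that idea only; `buildHtaPrompt`, the record shape, and the prompt wording are hypothetical and do not reflect Forest's actual prompt format.

```javascript
// Hypothetical illustration of adding learning summaries to a prompt.
function buildHtaPrompt(basePrompt, summaries) {
  // With LearnMCP disabled or nothing processed, the prompt is left untouched.
  if (summaries.length === 0) return basePrompt;
  const context = summaries.map((s) => `- ${s.title}: ${s.summary}`).join("\n");
  return `${basePrompt}\n\nLearning content context:\n${context}`;
}

console.log(
  buildHtaPrompt("Build a task tree for learning Python.", [
    { title: "Python tutorial", summary: "Covers variables, loops, and functions." },
  ])
);
```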
## Workflow Examples
### Basic Learning Content Workflow
1. **Add Sources**:
```
add_learning_sources(project_id="learn_python", urls=["https://youtube.com/watch?v=python_tutorial"])
```
2. **Process Content**:
```
process_learning_sources(project_id="learn_python")
```
3. **Check Status**:
```
get_processing_status(project_id="learn_python")
```
4. **Get Summary**:
```
get_learning_summary(project_id="learn_python")
```
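The four steps above can be chained programmatically. The sketch below assumes a generic async `callTool(name, args)` function supplied by whatever MCP client is in use, and it assumes `get_processing_status` returns `pending` and `processing` counts; both are assumptions, so adapt them to the real client and response shape.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `callTool` is a hypothetical async function provided by the MCP client.
async function runLearningWorkflow(callTool, projectId, urls, pollMs = 3000, maxPolls = 100) {
  await callTool("add_learning_sources", { project_id: projectId, urls });
  await callTool("process_learning_sources", { project_id: projectId });

  // Poll until nothing is pending or in progress (assumed response shape).
  for (let poll = 0; poll < maxPolls; poll += 1) {
    const status = await callTool("get_processing_status", { project_id: projectId });
    if (status.pending === 0 && status.processing === 0) {
      return callTool("get_learning_summary", { project_id: projectId });
    }
    await sleep(pollMs);
  }
  throw new Error("processing did not finish within the polling budget");
}
```

Passing `callTool` in as a parameter keeps the sketch independent of any particular client library.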
### Integrated with Forest
1. Add and process learning sources in LearnMCP
2. Build the HTA tree in Forest; it automatically includes the learning content context
3. Generated tasks will be informed by the processed learning materials
## Configuration
### Environment Variables
- `FOREST_DATA_DIR`: Data directory shared with Forest (required)
- `LOG_LEVEL`: Logging level (debug, info, warn, error)
- `NODE_ENV`: Environment (development, production)
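Reading these variables at startup might look like the following sketch. `loadConfig` is a hypothetical helper, and the defaults it applies (`info` and `development`) are assumptions, since no defaults are documented here.

```javascript
// Allowed levels come from the list above.
const LOG_LEVELS = ["debug", "info", "warn", "error"];

// Hypothetical loader; pass `process.env` in real use.
function loadConfig(env) {
  if (!env.FOREST_DATA_DIR) {
    throw new Error("FOREST_DATA_DIR is required");
  }
  const logLevel = env.LOG_LEVEL ?? "info"; // default is an assumption
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`invalid LOG_LEVEL: ${logLevel}`);
  }
  return {
    dataDir: env.FOREST_DATA_DIR,
    logLevel,
    nodeEnv: env.NODE_ENV ?? "development", // default is an assumption
  };
}

console.log(loadConfig({ FOREST_DATA_DIR: "/data/forest", LOG_LEVEL: "debug" }));
```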
### Background Processor Settings
- **Max Queue Size**: 50 tasks
- **Max Concurrent**: 2 simultaneous extractions
- **Processing Interval**: 3 seconds
- **Retry Attempts**: 3 per source
- **Timeout**: 5 minutes per extraction
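To make the queue size and concurrency limits concrete, here is a minimal bounded queue. It is a sketch under assumptions: the `BoundedQueue` class is hypothetical, it starts tasks as soon as a slot frees up rather than on a 3-second interval, and it leaves out the per-extraction timeout.

```javascript
// Values copied from the settings above; the class itself is hypothetical.
const SETTINGS = { maxQueueSize: 50, maxConcurrent: 2 };

class BoundedQueue {
  constructor({ maxQueueSize, maxConcurrent }) {
    this.maxQueueSize = maxQueueSize;
    this.maxConcurrent = maxConcurrent;
    this.pending = []; // tasks waiting for a free slot
    this.active = 0; // tasks currently running
  }

  // Resolves with the task's result; rejects at once if the queue is full.
  enqueue(task) {
    if (this.pending.length >= this.maxQueueSize) {
      return Promise.reject(new Error("queue full"));
    }
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this.drain();
    });
  }

  // Start waiting tasks while concurrency slots are available.
  drain() {
    while (this.active < this.maxConcurrent && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.active += 1;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.active -= 1;
          this.drain();
        });
    }
  }
}

const queue = new BoundedQueue(SETTINGS);
queue.enqueue(async () => "extracted").then((result) => console.log(result));
```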
## Error Handling
- **Graceful Degradation**: Failed extractions don't block other sources
- **Retry Logic**: Automatic retries with exponential backoff
- **Comprehensive Logging**: Detailed logs for debugging
- **Status Tracking**: Clear status indicators for each source
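Retrying with exponential backoff can be sketched as below. The 1-second base delay and the doubling factor are assumptions, and "3 attempts" is interpreted here as three tries in total; the actual processor may use different values.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the next try: baseMs, 2*baseMs, 4*baseMs, ...
function backoffDelay(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Hypothetical wrapper: call `fn` up to `attempts` times, waiting between failures.
async function withRetry(fn, { attempts = 3, baseMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // No wait after the final failed attempt.
      if (attempt < attempts - 1) {
        await sleep(backoffDelay(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

Running each source through `withRetry` independently, for example with `Promise.allSettled`, is one way to keep a single failed extraction from blocking the others.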
## Development
### Running Tests
```bash
npm test
```
### Linting
```bash
npm run lint
npm run lint:fix
```
### Debugging
Set `LOG_LEVEL=debug` for detailed logging.
## Troubleshooting
### Common Issues
1. **YouTube extraction fails**: Check whether the video has transcripts enabled
2. **PDF extraction fails**: Ensure the PDF is publicly accessible
3. **Article extraction fails**: Some sites block automated access
### Logs
Check logs in `<FOREST_DATA_DIR>/logs/`:
- `learn-mcp.log`: General operations
- `learn-mcp-errors.log`: Error details
## License
MIT License - Same as Forest MCP Server