# MCP RSS
MCP RSS is a Model Context Protocol (MCP) server for intelligent RSS feed management with advanced search capabilities, semantic search using AI embeddings, and a comprehensive reading workflow.
## Features
- 📰 **RSS Feed Management** - Parse OPML files and automatically fetch articles from RSS feeds
- 🔍 **Advanced Search** - Keyword search with date range, category, and status filtering
- 🤖 **Semantic Search** - AI-powered natural language search using OpenAI embeddings (optional)
- 📊 **Smart Organization** - Four-status workflow (unread/read/favorite/archived)
- 📅 **Daily Digest** - Get today's unread articles grouped by category
- 🚀 **High Performance** - PostgreSQL with pgvector for efficient vector similarity search
- 🔄 **Auto-Deduplication** - Prevents duplicate articles and wasted API calls
- ⚡ **Token-Efficient** - Browse titles/excerpts first, fetch full content only when needed
- 📑 **Pagination Support** - Handle large feed collections (500+) with efficient pagination
## Installation
### Prerequisites
- Node.js (v18 or higher)
- Docker & Docker Compose (for PostgreSQL with pgvector)
- OpenAI API Key (optional, only for semantic search)
### Quick Start with Docker Compose
1. **Clone or install the package:**
```bash
npm install -g mcp_rss
# OR for local development
git clone <repository-url>
cd mcp_rss
npm install
```
2. **Start PostgreSQL with pgvector:**
```bash
docker-compose up -d
```
3. **Configure environment variables:**
```bash
cp .env.example .env
# Edit .env with your settings
```
4. **Build the project:**
```bash
npm run build
```
### Database Setup
The project uses PostgreSQL 17 with the pgvector extension for vector similarity search.
**Using Docker Compose (Recommended):**
```bash
docker-compose up -d # Start PostgreSQL
docker-compose down # Stop PostgreSQL
docker-compose down -v # Stop and remove volumes (fresh start)
docker-compose logs -f postgres # View PostgreSQL logs
```
**Manual PostgreSQL Setup:**
```bash
docker run -d \
--name mcp-rss-postgres \
-p 5433:5432 \
-e POSTGRES_USER=mcp_user \
-e POSTGRES_PASSWORD=123456 \
-e POSTGRES_DB=mcp_rss \
pgvector/pgvector:pg17
```
## Configuration
### Environment Variables
Create a `.env` file with the following configuration:
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| **Database Configuration** |
| `DB_HOST` | PostgreSQL host | `localhost` | No |
| `DB_PORT` | PostgreSQL port | `5433` | No |
| `DB_USER` / `DB_USERNAME` | Database username | `mcp_user` | No |
| `DB_PASSWORD` | Database password | `123456` | No |
| `DB_NAME` / `DB_DATABASE` | Database name | `mcp_rss` | No |
| **RSS Configuration** |
| `OPML_FILE_PATH` | Path to OPML file with RSS feeds | `./feeds.opml` | Yes |
| `RSS_UPDATE_INTERVAL` | Feed update interval (minutes) | `1` | No |
| **OpenAI Configuration** |
| `OPENAI_API_KEY` | OpenAI API key for embeddings | - | No* |
\* *Only required for semantic search feature. All other features work without it.*
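
As a minimal sketch of how the database defaults in the table could be resolved, the function below reads an environment map and falls back to the documented values. The `DbConfig` shape, the `loadDbConfig` name, and the alias precedence (`DB_USER` before `DB_USERNAME`, `DB_NAME` before `DB_DATABASE`) are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface DbConfig {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

type Env = Record<string, string | undefined>;

// Resolve database settings from an environment map, falling back to the
// defaults listed in the table. Alias precedence is an assumption.
function loadDbConfig(env: Env): DbConfig {
  const rawPort = env["DB_PORT"] ?? "5433";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`DB_PORT must be a positive integer, got "${rawPort}"`);
  }
  return {
    host: env["DB_HOST"] ?? "localhost",
    port,
    user: env["DB_USER"] ?? env["DB_USERNAME"] ?? "mcp_user",
    password: env["DB_PASSWORD"] ?? "123456",
    database: env["DB_NAME"] ?? env["DB_DATABASE"] ?? "mcp_rss",
  };
}

// With an empty environment, every field takes its documented default.
const defaults = loadDbConfig({});
// Individual variables override the defaults one at a time.
const custom = loadDbConfig({ DB_HOST: "db.internal", DB_PORT: "5432" });
```

In a real process you would pass `process.env` instead of a literal object.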
### Claude Desktop Configuration
For local development, use the built dist folder:
```json
{
"mcpServers": {
"rss": {
"command": "node",
"args": ["/absolute/path/to/mcp_rss/dist/index.js"],
"env": {
"OPML_FILE_PATH": "/path/to/your/feeds.opml",
"DB_HOST": "localhost",
"DB_PORT": "5433",
"DB_USER": "mcp_user",
"DB_PASSWORD": "123456",
"DB_NAME": "mcp_rss",
"RSS_UPDATE_INTERVAL": "60",
"OPENAI_API_KEY": "sk-your-key-here"
}
}
}
}
```
For global installation via npm:
```json
{
"mcpServers": {
"rss": {
"command": "npx",
"args": ["mcp_rss"],
"env": {
"OPML_FILE_PATH": "/path/to/your/feeds.opml",
"OPENAI_API_KEY": "sk-your-key-here"
}
}
}
}
```
## MCP Tools Reference
The server exposes eight tools for RSS feed management:
### Token Efficiency Guide
List and search tools keep token usage low by default:
- **Default behavior**: Returns ONLY titles and metadata (no excerpts, no content)
- **Optional excerpts**: Set `includeExcerpt: true` for content previews (moderate token usage)
- **Full content**: Set `includeContent: true` for complete article text (high token usage)
- **On-demand content**: Use `get_article_full` to fetch specific articles by ID (most efficient)
**Recommended workflow (90%+ token savings):**
1. Browse titles only with `get_content` or `search_articles` (default settings)
2. Identify interesting articles from titles alone
3. Optionally fetch excerpts for borderline cases with `includeExcerpt: true`
4. Fetch full content with `get_article_full` for selected articles only
**Token Usage Comparison:**
- Titles only: ~50-100 tokens per article
- Titles + excerpts: ~150-300 tokens per article
- Titles + full content: ~1,000-5,000 tokens per article
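
The comparison above can be turned into a rough back-of-the-envelope estimate. The sketch below takes the midpoint of each per-article range, which is an assumption made only to get a single number; actual usage depends on the feeds. The function names are hypothetical.

```typescript
type DetailLevel = "titles" | "excerpts" | "full";

// Midpoints of the approximate per-article ranges (assumed for illustration).
const TOKENS_PER_ARTICLE: Record<DetailLevel, number> = {
  titles: 75, // ~50-100
  excerpts: 225, // ~150-300
  full: 3000, // ~1,000-5,000
};

function estimateTokens(articleCount: number, level: DetailLevel): number {
  return articleCount * TOKENS_PER_ARTICLE[level];
}

// Browse `browsed` titles, then read `read` of them in full.
function workflowTokens(browsed: number, read: number): number {
  return estimateTokens(browsed, "titles") + estimateTokens(read, "full");
}

// Fraction of tokens saved versus fetching every browsed article in full.
function savings(browsed: number, read: number): number {
  return 1 - workflowTokens(browsed, read) / estimateTokens(browsed, "full");
}

// Browsing 20 titles and reading 1 article: 1 - 4500 / 60000, about 0.925.
const saved = savings(20, 1);
```

Under these assumed midpoints, the savings drop as more articles are read in full, so the figure is a guide and not a guarantee.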
### 1. get_content
Get articles with basic filtering and pagination. **Returns latest articles first** (sorted by pubDate DESC).
**Use this for:**
- Browsing recent articles
- Checking unread articles
- Simple filtering by status or source
- Date range filtering for specific time periods
**Token Efficiency:**
- By default, returns ONLY titles and metadata (most token-efficient)
- Set `includeExcerpt: true` to add content previews
- Set `includeContent: true` to get full article text
- For best efficiency: browse titles only, then use `get_article_full` for specific articles
**Parameters:**
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `statuses` | `string[]` | Filter by statuses: `"unread"`, `"read"`, `"favorite"`, `"archived"` | All statuses |
| `source` | `string` | Filter by feed source title | All sources |
| `limit` | `number` | Number of articles to return | `10` |
| `offset` | `number` | Offset for pagination | `0` |
| `favoriteBlogsOnly` | `boolean` | Only show articles from favorite blogs | `false` |
| `prioritizeFavoriteBlogs` | `boolean` | Show favorite blog articles first | `false` |
| `includeContent` | `boolean` | Include full article content (uses more tokens) | `false` |
| `includeExcerpt` | `boolean` | Include article excerpt/preview | `false` |
| `startDate` | `string` | Start date (ISO: YYYY-MM-DD or YYYY-MM-DDTHH:mm:ssZ) | - |
| `endDate` | `string` | End date (ISO format) | - |
**Example (titles only - most efficient):**
```json
{
"statuses": ["unread"],
"limit": 20
}
```
**Example (with date range and excerpts):**
```json
{
"startDate": "2025-10-01",
"endDate": "2025-10-25",
"includeExcerpt": true,
"limit": 15
}
```
**Example (favorite blogs with full content):**
```json
{
"favoriteBlogsOnly": true,
"limit": 5,
"includeContent": true
}
```
**Response (default - titles only, no excerpt/content):**
```json
{
"articles": [
{
"id": 123,
"title": "Article Title",
"link": "https://example.com/article",
"pubDate": "2024-01-15T10:30:00Z",
"fetchDate": "2024-01-15T11:00:00Z",
"status": "unread",
"feedTitle": "Engineering Blog",
"feedCategory": "Technology"
}
],
"total": 150,
"success": true
}
```
---
### 2. search_articles
Advanced search with keyword matching, date ranges, categories, and status filters. **Searches both title and content**.
**Use this for:**
- Finding articles on specific topics
- Date-based filtering
- Complex multi-criteria searches
- Category-specific searches
**Parameters:**
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `keyword` | `string` | Search term (case-insensitive, searches title + content) | - |
| `category` | `string` | Filter by feed category | - |
| `statuses` | `string[]` | Filter by article statuses | All |
| `startDate` | `string` | Start date (ISO format: `YYYY-MM-DD` or `YYYY-MM-DDTHH:mm:ssZ`) | - |
| `endDate` | `string` | End date (ISO format) | - |
| `limit` | `number` | Number of results | `20` |
| `offset` | `number` | Offset for pagination | `0` |
| `includeContent` | `boolean` | Include full article content (uses more tokens) | `false` |
**Example:**
```json
{
"keyword": "kubernetes",
"category": "Engineering",
"startDate": "2024-01-01",
"endDate": "2024-12-31",
"statuses": ["unread"],
"limit": 10
}
```
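
To make the filter semantics concrete, here is a minimal in-memory sketch of how the parameters above might combine. It is not the server's query logic: exact category matching, inclusive date bounds, and the `ArticleLike` and `matchesSearch` names are all assumptions.

```typescript
// Minimal article shape for illustration; the real entity has more fields.
interface ArticleLike {
  title: string;
  content: string;
  pubDate: string; // ISO 8601
  status: "unread" | "read" | "favorite" | "archived";
  feedCategory: string;
}

interface SearchParams {
  keyword?: string;
  category?: string;
  statuses?: ArticleLike["status"][];
  startDate?: string;
  endDate?: string;
}

// Every supplied filter must pass; omitted filters match everything.
function matchesSearch(article: ArticleLike, params: SearchParams): boolean {
  const { keyword, category, statuses, startDate, endDate } = params;
  if (keyword) {
    // Case-insensitive match against title and content.
    const haystack = `${article.title}\n${article.content}`.toLowerCase();
    if (!haystack.includes(keyword.toLowerCase())) return false;
  }
  if (category && article.feedCategory !== category) return false;
  if (statuses && statuses.length > 0 && !statuses.includes(article.status)) return false;
  const published = Date.parse(article.pubDate);
  // Assumed: both bounds are inclusive and compared as timestamps.
  if (startDate && published < Date.parse(startDate)) return false;
  if (endDate && published > Date.parse(endDate)) return false;
  return true;
}

const sample: ArticleLike = {
  title: "Scaling Kubernetes clusters",
  content: "Notes on autoscaling.",
  pubDate: "2024-06-01T08:00:00Z",
  status: "unread",
  feedCategory: "Engineering",
};
const hit = matchesSearch(sample, { keyword: "KUBERNETES", category: "Engineering" });
```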
---
### 3. semantic_search
**AI-powered semantic search** using OpenAI embeddings. Finds conceptually similar articles even without exact keyword matches.
**Use this for:**
- Natural language queries
- Finding related concepts
- Research and discovery
- Topic exploration
**Requirements:**
- `OPENAI_API_KEY` must be set
- Only works for articles from 2020 onwards
- Automatically disabled if API key is missing (fails gracefully)
**Parameters:**
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `query` | `string` | Natural language search query (required) | - |
| `includeContent` | `boolean` | Include full article content (uses more tokens) | `false` |
| `limit` | `number` | Number of results | `10` |
| `statuses` | `string[]` | Filter by article statuses | All |
| `category` | `string` | Filter by feed category | - |
**Example:**
```json
{
"query": "how to optimize database performance and reduce query latency",
"limit": 5,
"statuses": ["unread"]
}
```
**How it works:**
1. Converts your query into a 1536-dimensional vector using OpenAI
2. Compares against article embeddings using pgvector cosine similarity
3. Returns semantically similar articles ranked by relevance
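
The ranking in steps 2 and 3 can be sketched in plain TypeScript. Cosine distance is what pgvector's `<=>` operator computes; the in-memory ranking below is only an illustration with toy vectors, since the server delegates this work to PostgreSQL. The `Embedded` type and function names are hypothetical.

```typescript
// Cosine distance: 1 - (a · b) / (|a| |b|). Smaller means more similar.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("cosine distance is undefined for a zero vector");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Embedded {
  id: number;
  embedding: number[];
}

// Return the ids of the `limit` closest documents, nearest first.
function rankBySimilarity(query: number[], docs: Embedded[], limit: number): number[] {
  return docs
    .map((d) => ({ id: d.id, distance: cosineDistance(query, d.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit)
    .map((d) => d.id);
}

// Toy 3-dimensional vectors stand in for real 1536-dimensional embeddings.
const ranked = rankBySimilarity(
  [1, 0, 0],
  [
    { id: 1, embedding: [0, 1, 0] }, // orthogonal: distance 1
    { id: 2, embedding: [2, 0, 0] }, // same direction: distance 0
    { id: 3, embedding: [1, 1, 0] }, // 45 degrees apart: distance about 0.293
  ],
  2,
);
// ranked is [2, 3]
```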
---
### 4. get_daily_digest
Get **today's unread articles** grouped by category. Perfect for daily reading workflows. Filters by **publication date** (pubDate), not fetch date.
**Use this for:**
- Morning briefings
- Daily catch-up
- Category-organized reading
- Articles published today (based on pubDate)
**Parameters:**
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `limit` | `number` | Max articles per category | `5` |
| `includeContent` | `boolean` | Include full article content (uses more tokens) | `false` |
**Example:**
```json
{
"limit": 5
}
```
**Response:**
Articles grouped by category, with up to `limit` articles per category published today.
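
A minimal sketch of the grouping this tool describes, operating on an in-memory list. It assumes "today" means the same UTC calendar day as the supplied clock, which may differ from how the server draws day boundaries; `DigestArticle` and `buildDailyDigest` are hypothetical names.

```typescript
interface DigestArticle {
  id: number;
  title: string;
  pubDate: string; // ISO 8601
  status: string;
  feedCategory: string;
}

// Group unread articles published on the same UTC day as `now`, keeping at
// most `limit` per category. The UTC day boundary is an assumption.
function buildDailyDigest(
  articles: DigestArticle[],
  now: Date,
  limit = 5,
): Map<string, DigestArticle[]> {
  const today = now.toISOString().slice(0, 10); // YYYY-MM-DD
  const digest = new Map<string, DigestArticle[]>();
  for (const article of articles) {
    if (article.status !== "unread") continue;
    if (new Date(article.pubDate).toISOString().slice(0, 10) !== today) continue;
    const bucket = digest.get(article.feedCategory) ?? [];
    if (bucket.length < limit) bucket.push(article);
    digest.set(article.feedCategory, bucket);
  }
  return digest;
}

const digest = buildDailyDigest(
  [
    { id: 1, title: "A", pubDate: "2025-10-25T08:00:00Z", status: "unread", feedCategory: "Tech" },
    { id: 2, title: "B", pubDate: "2025-10-25T09:00:00Z", status: "unread", feedCategory: "Tech" },
    { id: 3, title: "C", pubDate: "2025-10-25T10:00:00Z", status: "read", feedCategory: "Tech" },
    { id: 4, title: "D", pubDate: "2025-10-24T10:00:00Z", status: "unread", feedCategory: "Database" },
  ],
  new Date("2025-10-25T12:00:00Z"),
  1,
);
// Only article 1 survives: article 2 exceeds the per-category limit,
// article 3 is already read, and article 4 was published the day before.
```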
---
### 5. get_weekly_favorites
**NEW:** Get favorite articles from the last 7 days (titles only). Perfect for weekly review of bookmarked content.
**Use this for:**
- Weekly reading lists
- Reviewing saved articles from the past week
- Tracking important bookmarked content
- Quick overview of what you found valuable recently
**Parameters:** None
**Example:**
No parameters needed - simply call the tool.
**Response:**
```json
{
"articles": [
{
"id": 789,
"title": "Optimizing PostgreSQL for High Write Throughput",
"link": "https://engineering.example.com/postgres-optimization",
"pubDate": "2025-10-22T14:30:00Z",
"fetchDate": "2025-10-22T15:00:00Z",
"status": "favorite",
"feedTitle": "Engineering at Example",
"feedCategory": "Database"
},
{
"id": 654,
"title": "Building Resilient Microservices with Circuit Breakers",
"link": "https://blog.example.com/circuit-breakers",
"pubDate": "2025-10-20T09:15:00Z",
"fetchDate": "2025-10-20T10:00:00Z",
"status": "favorite",
"feedTitle": "Tech Blog",
"feedCategory": "Architecture"
}
],
"total": 2,
"success": true
}
```
**Features:**
- Returns articles marked as "favorite" that were published in the last 7 days
- Sorted by publication date (newest first)
- Token-efficient: returns titles and metadata only, with no excerpts or content
- Use `get_article_full` to read full content of any article
---
### 6. get_article_full
Get full article content by ID. Use this for token-efficient reading: browse titles first, then fetch complete content only for articles you want to read.
**Use this for:**
- Reading full articles after browsing titles
- Getting complete content for specific interesting articles
- Token-efficient workflow (browse → select → read)
**Parameters:**
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `articleId` | `number` | Article ID from get_content/search_articles | Yes |
**Example:**
```json
{
"articleId": 123
}
```
**Response:**
```json
{
"articles": [
{
"id": 123,
"title": "Complete Article Title",
"content": "Full article content with all HTML and formatting...",
"link": "https://example.com/article",
"pubDate": "2024-01-15T10:30:00Z",
"fetchDate": "2024-01-15T11:00:00Z",
"status": "unread",
"feedTitle": "Engineering Blog",
"feedCategory": "Technology",
"excerpt": "First 200 characters..."
}
],
"success": true
}
```
**Token-Efficient Workflow:**
```
1. get_content(limit=20) → Browse 20 titles (no excerpts or content by default)
2. Find interesting article with id=456
3. get_article_full(articleId=456) → Read full content
4. set_tag(articleId=456, status="favorite") → Save for later
```
---
### 7. get_sources
Get RSS feed sources with pagination and filtering. With hundreds of feeds, pagination is essential to avoid token limits.
**Use this for:**
- Discovering available sources
- Finding valid source names for filtering
- Exploring feed categories
- Browsing favorite blogs
**Parameters:**
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `limit` | `number` | Number of sources to return (max recommended: 100) | `50` |
| `offset` | `number` | Offset for pagination (e.g., 50 for page 2) | `0` |
| `favoritesOnly` | `boolean` | Only show favorite blogs | `false` |
| `category` | `string` | Filter by category (case-insensitive, partial match) | All categories |
**Example (first page):**
```json
{
"limit": 50,
"offset": 0
}
```
**Example (favorites only):**
```json
{
"favoritesOnly": true,
"limit": 20
}
```
**Example (filter by category):**
```json
{
"category": "Engineering",
"limit": 30
}
```
**Response:**
```json
{
"sources": [
{
"id": 1,
"title": "Engineering at Meta",
"category": "Engineering Blogs",
"url": "https://engineering.fb.com/feed/",
"isFavorite": true
},
{
"id": 2,
"title": "Netflix Tech Blog",
"category": "Engineering Blogs",
"url": "https://netflixtechblog.com/feed",
"isFavorite": false
}
],
"total": 518,
"success": true
}
```
**Pagination Example:**
```
Page 1: offset=0, limit=50 → Sources 1-50 of 518
Page 2: offset=50, limit=50 → Sources 51-100 of 518
Page 3: offset=100, limit=50 → Sources 101-150 of 518
```
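
The offset arithmetic in the pagination example can be captured in a small helper. This is a client-side sketch with hypothetical names (`pageFor`, `pageCount`); the server itself only takes `limit` and `offset`.

```typescript
interface Page {
  offset: number;
  limit: number;
  first: number; // 1-based index of the first item on the page
  last: number; // 1-based index of the last item on the page
}

// Compute the offset and item range for a 1-based page number.
function pageFor(pageNumber: number, limit: number, total: number): Page {
  if (pageNumber < 1 || limit < 1) throw new Error("pageNumber and limit must be >= 1");
  const offset = (pageNumber - 1) * limit;
  if (offset >= total) throw new Error(`page ${pageNumber} is past the end (${total} items)`);
  return { offset, limit, first: offset + 1, last: Math.min(offset + limit, total) };
}

function pageCount(total: number, limit: number): number {
  return Math.ceil(total / limit);
}

const page2 = pageFor(2, 50, 518); // offset 50, sources 51-100
const lastPage = pageFor(pageCount(518, 50), 50, 518); // offset 500, sources 501-518
```

The `total` field in the response is what lets a client know when to stop requesting further pages.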
---
### 8. set_tag
Update article status to manage your reading workflow.
**Use this for:**
- Marking articles as read
- Saving favorites
- Archiving old articles
- Managing reading queue
**Parameters:**
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `articleId` | `number` | Article ID to update | Yes |
| `status` | `string` | New status: `"unread"`, `"read"`, `"favorite"`, `"archived"` | Yes |
**Example:**
```json
{
"articleId": 123,
"status": "favorite"
}
```
## Article Status Workflow
The server supports a comprehensive 4-status workflow:
```
┌─────────┐
│ unread │ ← New articles start here
└────┬────┘
│
├──→ read (marked as read)
├──→ favorite (important/bookmarked)
└──→ archived (old/irrelevant)
```
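
A client can guard against typos in status values before calling `set_tag`. The sketch below models the four statuses as a union type with a runtime check; `ARTICLE_STATUSES`, `isArticleStatus`, and `setTagArgs` are hypothetical helpers, not part of the server's API.

```typescript
const ARTICLE_STATUSES = ["unread", "read", "favorite", "archived"] as const;
type ArticleStatus = (typeof ARTICLE_STATUSES)[number];

// Runtime guard that narrows an arbitrary value to ArticleStatus.
function isArticleStatus(value: unknown): value is ArticleStatus {
  return typeof value === "string" && (ARTICLE_STATUSES as readonly string[]).includes(value);
}

// Build validated arguments for a set_tag call.
function setTagArgs(
  articleId: number,
  status: string,
): { articleId: number; status: ArticleStatus } {
  if (!Number.isInteger(articleId)) throw new Error("articleId must be an integer");
  if (!isArticleStatus(status)) {
    throw new Error(`unknown status "${status}"; expected one of ${ARTICLE_STATUSES.join(", ")}`);
  }
  return { articleId, status };
}

const favoriteArgs = setTagArgs(123, "favorite"); // { articleId: 123, status: "favorite" }
```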
## Vector Search & Embeddings
### How Embeddings Work
1. **Automatic Generation**: When fetching RSS articles, the server automatically generates embeddings for articles from **2020 onwards**
2. **OpenAI Integration**: Uses `text-embedding-3-small` model (1536 dimensions)
3. **Deduplication**: Embeddings are only generated once per article (checked by URL)
4. **Graceful Degradation**: If `OPENAI_API_KEY` is missing or invalid, the server continues to work normally (embeddings skipped)
### Storage
- Embeddings stored as `vector(1536)` in PostgreSQL using pgvector extension
- Enables fast cosine similarity search: `ORDER BY embedding <=> query_vector`
### Cost Optimization
- Only articles from 2020+ get embeddings (configurable in `RssService.shouldGenerateEmbedding()`)
- Duplicate articles are skipped (no redundant API calls)
- Embedding generation failures don't block article saving
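
The rules above (2020 cutoff, URL-based deduplication, skipping when no API key is set) could be combined into a single gate like the one sketched here. This is an illustration only and not the body of `RssService.shouldGenerateEmbedding()`; the `shouldEmbed` name, its parameters, and the exact cutoff timestamp are assumptions.

```typescript
interface EmbeddingCandidate {
  link: string;
  pubDate: string; // ISO 8601
}

// Assumed cutoff: start of 2020 in UTC.
const EMBEDDING_CUTOFF = Date.parse("2020-01-01T00:00:00Z");

// Decide whether an embedding request is worth making for this article.
function shouldEmbed(
  article: EmbeddingCandidate,
  alreadyEmbeddedLinks: Set<string>,
  apiKey: string | undefined,
): boolean {
  if (!apiKey) return false; // no key: skip embeddings, keep saving articles
  if (alreadyEmbeddedLinks.has(article.link)) return false; // dedupe by URL
  const published = Date.parse(article.pubDate);
  // Unparseable dates are skipped, the same as pre-2020 articles.
  return !Number.isNaN(published) && published >= EMBEDDING_CUTOFF;
}

const seen = new Set(["https://example.com/old-post"]);
const fresh = shouldEmbed(
  { link: "https://example.com/new-post", pubDate: "2024-03-01T00:00:00Z" },
  seen,
  "sk-your-key-here",
);
// fresh is true: key present, link unseen, published after the cutoff
```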
## Development
### Project Structure
```
mcp_rss/
├── src/
│ ├── entities/ # TypeORM entities
│ │ ├── Article.ts # Article entity with vector embeddings
│ │ └── Feed.ts # RSS feed source entity
│ ├── services/
│ │ ├── OpmlService.ts # OPML parsing
│ │ ├── RssService.ts # RSS fetching + embedding generation
│ │ ├── McpService.ts # MCP tool implementations
│ │ └── EmbeddingService.ts # OpenAI embedding wrapper
│ ├── config/
│ │ └── database.ts # TypeORM + pgvector setup
│ └── index.ts # MCP server entry point
├── docker-compose.yml # PostgreSQL with pgvector
├── .env.example # Environment template
└── package.json
```
### Building
```bash
npm run build # Compile TypeScript
npm run watch # Watch mode for development
```
### Testing
```bash
# Check that the PostgreSQL container is running
docker-compose ps
# Test MCP server locally
node dist/index.js
# Debug with MCP inspector
npx @modelcontextprotocol/inspector node dist/index.js
```
## Troubleshooting
### Database Connection Issues
**Error: `connect ETIMEDOUT`**
- Ensure PostgreSQL is running: `docker-compose ps`
- Check port 5433 is available: `lsof -i :5433`
- Verify environment variables match docker-compose settings
### OpenAI API Errors
**Error: `401 Incorrect API key`**
- Verify your API key at https://platform.openai.com/api-keys
- Ensure you have available credits
- Check the key isn't expired
**Embeddings not being generated:**
- Server works fine without API key (embeddings skipped)
- Check article dates (only 2020+ articles get embeddings)
- Look for errors in console logs
### MCP Server Issues
**Server not appearing in Claude Desktop:**
1. Check Claude Desktop config path is correct
2. Verify `dist/index.js` exists (run `npm run build`)
3. Restart Claude Desktop after config changes
4. Check Claude Desktop logs for errors
## Performance Tips
1. **Adjust Update Interval**: Set `RSS_UPDATE_INTERVAL` to 60+ minutes for production
2. **Limit Embedding Generation**: Embeddings are only for articles from 2020+
3. **Use Pagination**: Always use `offset` and `limit` for large result sets
4. **Database Indexing**: PostgreSQL does not index vector columns on its own; confirm that the schema defines a pgvector index (HNSW or IVFFlat) on the embedding column
## License
MIT
## Contributing
Contributions welcome! Please ensure:
- TypeScript compiles without errors (`npm run build`)
- Environment variables are documented
- New features include appropriate error handling