# V2.ai Insights Scraper MCP
A Model Context Protocol (MCP) server that scrapes blog posts from V2.ai Insights, extracts content, and provides AI-powered summaries using OpenAI's GPT-4. Currently supports Contentful CMS integration with search capabilities.
> **Strategic Vision**: This project is evolving into a comprehensive AI intelligence platform. See [STRATEGIC_VISION.md](STRATEGIC_VISION.md) for the complete roadmap from content API to strategic intelligence platform.
## Features
- **Multi-Source Content**: Fetches posts from the Contentful CMS and via V2.ai web scraping
- **Content Extraction**: Extracts title, date, author, and content with intelligent fallbacks
- **Full-Text Search**: Searches all blog content using Contentful's search API
- **AI Summarization**: Generates summaries using OpenAI GPT-4
- **MCP Integration**: Exposes tools for Claude Desktop integration
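The summarization feature wraps OpenAI's chat API. Below is a minimal sketch of how such a summarizer might be structured; the function name, prompt wording, and the 4,000-character cap are illustrative assumptions, not the project's actual implementation, and the live API call is shown only as a comment because it requires a key:

```python
def build_prompt(content: str, max_chars: int = 4000) -> str:
    """Truncate long posts and wrap them in a summarization instruction.

    The 4000-character cap is an illustrative guard against oversized prompts.
    """
    snippet = content[:max_chars]
    return f"Summarize the following blog post in 3-5 sentences:\n\n{snippet}"


# The actual call (requires OPENAI_API_KEY; shown for shape only):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": build_prompt(post_content)}],
# )
# summary = resp.choices[0].message.content
```

Keeping prompt construction separate from the API call makes the prompt logic unit-testable without network access.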
## Tools Available
- `get_latest_posts()` - Retrieves blog posts with metadata (Contentful + V2.ai fallback)
- `get_contentful_posts(limit)` - Fetches posts directly from Contentful CMS
- `search_blogs(query, limit)` - **NEW**: Searches across all blog content
- `summarize_post(index)` - Returns an AI-generated summary of a specific post
- `get_post_content(index)` - Returns the full content of a specific post
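The index-based tools share a simple contract: posts are cached in a list and addressed by 0-based position. A pure-Python sketch of that contract (the sample data and function bodies are illustrative, not the server's actual code; the FastMCP wiring and OpenAI call are omitted):

```python
# Hypothetical in-memory cache illustrating the tools' index semantics.
POSTS = [
    {"title": "Adopting AI Assistants while Balancing Risks",
     "author": "Ashley Rodan",
     "content": "AI assistants can boost productivity when risks are managed."},
    {"title": "Scaling Data Platforms",
     "author": "Jane Doe",
     "content": "Modern data platforms rely on automation and governance."},
]

def get_latest_posts() -> list:
    """Return metadata for every cached post."""
    return [{"title": p["title"], "author": p["author"]} for p in POSTS]

def search_blogs(query: str, limit: int = 5) -> list:
    """Case-insensitive substring search across titles and content."""
    q = query.lower()
    hits = [p for p in POSTS
            if q in p["title"].lower() or q in p["content"].lower()]
    return hits[:limit]

def get_post_content(index: int) -> str:
    """Return the full content of the post at a 0-based index."""
    if not 0 <= index < len(POSTS):
        return f"Invalid index: {index}. Available: 0-{len(POSTS) - 1}"
    return POSTS[index]["content"]
```

Returning an error string for an out-of-range index (rather than raising) keeps the tool's response usable in a chat context.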
## Setup

### Prerequisites

- Python 3.12+
- `uv` package manager
- OpenAI API key
- Contentful CMS credentials (optional, for enhanced functionality)
### Installation
1. Clone and navigate to the project:

   ```bash
   cd v2-ai-mcp
   ```

2. Install dependencies:

   ```bash
   uv add fastmcp beautifulsoup4 requests openai
   ```

3. Set up environment variables. Create a `.env` file based on `.env.example`:

   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your credentials:

   ```bash
   # Required
   OPENAI_API_KEY=your-openai-api-key-here

   # Optional (for Contentful integration)
   CONTENTFUL_SPACE_ID=your-contentful-space-id
   CONTENTFUL_ACCESS_TOKEN=your-contentful-access-token
   CONTENTFUL_CONTENT_TYPE=pageBlogPost
   ```
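At startup the server can enforce this required/optional split: fail fast when the OpenAI key is missing, and quietly disable Contentful when its credentials are absent. A sketch of that pattern (the `load_config` helper is hypothetical, not part of the project):

```python
import os

def load_config() -> dict:
    """Read settings from the environment; fail fast only on the required key."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required (see .env.example)")

    # Contentful is optional: enabled only when both credentials are present.
    space_id = os.environ.get("CONTENTFUL_SPACE_ID")
    token = os.environ.get("CONTENTFUL_ACCESS_TOKEN")
    return {
        "openai_api_key": api_key,
        "contentful_enabled": bool(space_id and token),
        "contentful_space_id": space_id,
        "contentful_access_token": token,
        "contentful_content_type": os.environ.get(
            "CONTENTFUL_CONTENT_TYPE", "pageBlogPost"
        ),
    }
```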
### Running the Server

```bash
uv run python -m src.v2_ai_mcp.main
```

The server will start and be available for MCP connections.
### Testing the Scraper

Test individual components:

```bash
# Test scraper
uv run python -c "from src.v2_ai_mcp.scraper import fetch_blog_posts; print(fetch_blog_posts()[0]['title'])"

# Test with summarizer (requires OpenAI API key)
uv run python -c "from src.v2_ai_mcp.scraper import fetch_blog_posts; from src.v2_ai_mcp.summarizer import summarize; post = fetch_blog_posts()[0]; print(summarize(post['content'][:1000]))"

# Run unit tests
uv run pytest tests/ -v --cov=src
```

## Claude Desktop Integration
### Configuration

1. Install Claude Desktop (if not already installed)
2. Configure MCP in Claude Desktop by adding this to your MCP configuration:

   ```json
   {
     "mcpServers": {
       "v2-insights-scraper": {
         "command": "/path/to/uv",
         "args": ["run", "--directory", "/path/to/your/v2-ai-mcp", "python", "-m", "src.v2_ai_mcp.main"],
         "env": {
           "OPENAI_API_KEY": "your-api-key-here",
           "CONTENTFUL_SPACE_ID": "your-contentful-space-id",
           "CONTENTFUL_ACCESS_TOKEN": "your-contentful-access-token",
           "CONTENTFUL_CONTENT_TYPE": "pageBlogPost"
         }
       }
     }
   }
   ```

3. Restart Claude Desktop to load the MCP server
### Using the Tools

Once configured, you can use these tools in Claude Desktop:

- **Get latest posts**: `get_latest_posts()` (intelligent Contentful + V2.ai fallback)
- **Get Contentful posts**: `get_contentful_posts(10)` (direct CMS access)
- **Search blogs**: `search_blogs("AI automation", 5)` (**NEW**: full-text search)
- **Summarize post**: `summarize_post(0)` (index 0 for the first post)
- **Get full content**: `get_post_content(0)`
## Example Usage

Search for AI-related content:

```
search_blogs("artificial intelligence", 3)
```

Get latest posts with automatic source selection:

```
get_latest_posts()
```

Get an AI summary of a specific post:

```
summarize_post(0)
```

## Project Structure
```
v2-ai-mcp/
├── src/
│   └── v2_ai_mcp/
│       ├── __init__.py          # Package initialization
│       ├── main.py              # FastMCP server with tool definitions
│       ├── scraper.py           # Web scraping logic
│       └── summarizer.py        # OpenAI GPT-4 integration
├── tests/
│   ├── __init__.py              # Test package initialization
│   ├── test_scraper.py          # Unit tests for scraper
│   └── test_summarizer.py       # Unit tests for summarizer
├── .github/
│   └── workflows/
│       └── ci.yml               # GitHub Actions CI/CD pipeline
├── pyproject.toml               # Project dependencies and config
├── .env.example                 # Environment variables template
├── .gitignore                   # Git ignore patterns
└── README.md                    # This file
```

## Current Implementation
The scraper currently targets this specific blog post:

**URL**: https://www.v2.ai/insights/adopting-AI-assistants-while-balancing-risks
### Extracted Data

- **Title**: "Adopting AI Assistants while Balancing Risks"
- **Author**: "Ashley Rodan"
- **Date**: "July 3, 2025"
- **Content**: ~12,785 characters of main content
## Development

### Adding More Blog Posts

To scrape multiple posts or different URLs, modify the `fetch_blog_posts()` function in `scraper.py`:
```python
def fetch_blog_posts() -> list:
    urls = [
        "https://www.v2.ai/insights/post1",
        "https://www.v2.ai/insights/post2",
        # Add more URLs
    ]
    return [fetch_blog_post(url) for url in urls]
```

### Improving Content Extraction
The scraper uses multiple fallback strategies for extracting content. You can enhance it by:

- Inspecting V2.ai's HTML structure
- Adding more specific CSS selectors
- Improving date/author extraction patterns
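A fallback strategy like this boils down to an ordered selector chain: try each CSS selector until one matches, else return a default. A parser-agnostic sketch (`select` stands in for something like a BeautifulSoup `soup.select` wrapper; the selectors and stub page data are hypothetical):

```python
def extract_first(select, selectors, default="Unknown"):
    """Return text from the first selector that matches, else the default."""
    for sel in selectors:
        matches = select(sel)
        if matches:
            return matches[0].strip()
    return default

# Stubbed `select` standing in for soup.select on a parsed page.
def fake_select(selector):
    page = {"h1.post-title": ["  Adopting AI Assistants  "],
            "span.byline": []}  # the byline selector misses on this page
    return page.get(selector, [])

title = extract_first(fake_select, ["h1.post-title", "h1", "title"])
author = extract_first(fake_select, ["span.byline", "a.author"], default="V2.ai")
```

Ordering selectors from most to least specific means a site redesign degrades extraction gracefully instead of breaking it outright.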
## Troubleshooting

### Common Issues

- **OpenAI API Key Error**: Ensure your API key is set in the environment variables
- **Import Errors**: Run `uv sync` to ensure all dependencies are installed
- **Scraping Issues**: Check that the target URL is accessible and that the HTML structure hasn't changed
### Testing Components

```bash
# Test scraper only
uv run python -c "from src.v2_ai_mcp.scraper import fetch_blog_posts; posts = fetch_blog_posts(); print(f'Found {len(posts)} posts')"

# Run full test suite
uv run pytest tests/ -v --cov=src

# Test MCP server startup
uv run python -m src.v2_ai_mcp.main
```

## Development
### Running Tests

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src --cov-report=html

# Run specific test file
uv run pytest tests/test_scraper.py -v
```

### Code Quality
```bash
# Format code
uv run ruff format src tests

# Lint code
uv run ruff check src tests

# Fix auto-fixable issues
uv run ruff check --fix src tests
```

## License
This project is for educational and development purposes.