Utilizes OpenAI GPT-4 to generate AI-powered summaries of blog posts scraped from V2.ai Insights
V2.ai Insights Scraper MCP
A Model Context Protocol (MCP) server that scrapes blog posts from V2.ai Insights, extracts content, and provides AI-powered summaries using OpenAI's GPT-4. Currently supports Contentful CMS integration with search capabilities.
Strategic Vision: This project is evolving into a comprehensive AI intelligence platform. See STRATEGIC_VISION.md for the complete roadmap from content API to strategic intelligence platform.
Features
Multi-Source Content: Fetches from Contentful CMS and V2.ai web scraping
Content Extraction: Extracts title, date, author, and content with intelligent fallbacks
Full-Text Search: Search across all blog content with Contentful's search API
AI Summarization: Generates summaries using OpenAI GPT-4
MCP Integration: Exposes tools for Claude Desktop integration
Tools Available
get_latest_posts() - Retrieves blog posts with metadata (Contentful + V2.ai fallback)
get_contentful_posts(limit) - Fetch posts directly from Contentful CMS
search_blogs(query, limit) - NEW - Search across all blog content
summarize_post(index) - Returns AI-generated summary of a specific post
get_post_content(index) - Returns full content of a specific post
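The Contentful + V2.ai fallback behind get_latest_posts(), and the index lookup used by summarize_post(index) and get_post_content(index), could look roughly like the sketch below. This is an illustration under assumptions, not the project's code: fetch_from_contentful and fetch_from_scraper are hypothetical callables standing in for the real data sources, and posts are assumed to be plain dictionaries.

```python
from typing import Callable, Dict, List

Post = Dict[str, str]


def get_latest_posts_with_fallback(
    fetch_from_contentful: Callable[[], List[Post]],  # hypothetical CMS fetcher
    fetch_from_scraper: Callable[[], List[Post]],     # hypothetical web scraper
) -> List[Post]:
    """Return Contentful posts when available, otherwise scraped posts."""
    try:
        posts = fetch_from_contentful()
        if posts:  # an empty result also triggers the fallback
            return posts
    except Exception as exc:  # assumption: any CMS failure falls through
        print(f"Contentful unavailable, falling back to scraping: {exc}")
    return fetch_from_scraper()


def get_post_by_index(posts: List[Post], index: int) -> Post:
    """Look up a post by zero-based index and reject out-of-range values."""
    if not 0 <= index < len(posts):
        raise IndexError(f"index {index} out of range for {len(posts)} posts")
    return posts[index]
```

Passing the fetchers in as arguments keeps the fallback logic testable without network access or credentials.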
Setup
Prerequisites
Python 3.12+
uv package manager
OpenAI API key
Contentful CMS credentials (optional, for enhanced functionality)
Installation
Clone and navigate to project:
cd v2-ai-mcp
Install dependencies:
uv add fastmcp beautifulsoup4 requests openai
Set up environment variables:
Create a .env file based on .env.example:
cp .env.example .env
Edit .env with your credentials:
# Required
OPENAI_API_KEY=your-openai-api-key-here
# Optional (for Contentful integration)
CONTENTFUL_SPACE_ID=your-contentful-space-id
CONTENTFUL_ACCESS_TOKEN=your-contentful-access-token
CONTENTFUL_CONTENT_TYPE=pageBlogPost
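To make the required/optional split concrete, here is a minimal sketch of how these variables might be validated at startup. The Settings class and load_settings function are hypothetical names, the pageBlogPost default is assumed from the example values, and the sketch reads the process environment only (loading the .env file itself is assumed to happen elsewhere).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env."""
    openai_api_key: str
    contentful_space_id: Optional[str]
    contentful_access_token: Optional[str]
    contentful_content_type: str

    @property
    def contentful_enabled(self) -> bool:
        # Contentful is usable only when both credentials are present.
        return bool(self.contentful_space_id and self.contentful_access_token)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast on the required key; treat Contentful values as optional."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required but not set")
    return Settings(
        openai_api_key=api_key,
        contentful_space_id=env.get("CONTENTFUL_SPACE_ID") or None,
        contentful_access_token=env.get("CONTENTFUL_ACCESS_TOKEN") or None,
        # Assumed default, taken from the example value shown in .env
        contentful_content_type=env.get("CONTENTFUL_CONTENT_TYPE", "pageBlogPost"),
    )
```

With only OPENAI_API_KEY set, contentful_enabled is False and a caller could skip straight to the web-scraping path.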
Running the Server
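Launch the server from the project directory with the same entry point used in the Claude Desktop configuration, for example: uv run python -m src.v2_ai_mcp.main
The server will start and be available for MCP connections.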
Testing the Scraper
Test individual components:
Claude Desktop Integration
Configuration
Install Claude Desktop (if not already installed)
Configure MCP in Claude Desktop:
Add to your Claude Desktop MCP configuration:
{
  "mcpServers": {
    "v2-insights-scraper": {
      "command": "/path/to/uv",
      "args": ["run", "--directory", "/path/to/your/v2-ai-mcp", "python", "-m", "src.v2_ai_mcp.main"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here",
        "CONTENTFUL_SPACE_ID": "your-contentful-space-id",
        "CONTENTFUL_ACCESS_TOKEN": "your-contentful-access-token",
        "CONTENTFUL_CONTENT_TYPE": "pageBlogPost"
      }
    }
  }
}
Restart Claude Desktop to load the MCP server
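If you already have other MCP servers configured, the entry above has to be merged into the existing mcpServers object rather than replacing it. A small sketch of that merge, assuming the configuration has already been loaded into a dictionary (add_server_entry is a hypothetical helper, not part of this project):

```python
import json
from typing import Any, Dict


def add_server_entry(
    config: Dict[str, Any],
    uv_path: str,
    project_dir: str,
    env: Dict[str, str],
) -> Dict[str, Any]:
    """Return a copy of config with the v2-insights-scraper entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["v2-insights-scraper"] = {
        "command": uv_path,
        "args": ["run", "--directory", project_dir,
                 "python", "-m", "src.v2_ai_mcp.main"],
        "env": env,
    }
    merged["mcpServers"] = servers
    return merged


def to_json(config: Dict[str, Any]) -> str:
    """Serialize the configuration in the indented form shown above."""
    return json.dumps(config, indent=2)
```

Calling to_json(add_server_entry(existing, "/path/to/uv", "/path/to/your/v2-ai-mcp", {"OPENAI_API_KEY": "your-api-key-here"})) yields text that can be pasted into the configuration file.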
Using the Tools
Once configured, you can use these tools in Claude Desktop:
Get latest posts: get_latest_posts() (intelligent Contentful + V2.ai fallback)
Get Contentful posts: get_contentful_posts(10) (direct CMS access)
Search blogs: search_blogs("AI automation", 5) (NEW - full-text search)
Summarize post: summarize_post(0) (index 0 for first post)
Get full content: get_post_content(0)
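For context on what a call like search_blogs("AI automation", 5) may translate to, the sketch below builds a Contentful Content Delivery API request URL using its full-text query parameter. Whether this project issues the request exactly this way is an assumption. build_search_url is a hypothetical helper, and the "master" environment and pageBlogPost content type are assumed defaults.

```python
from urllib.parse import urlencode


def build_search_url(
    space_id: str,
    access_token: str,
    query: str,
    limit: int = 5,
    content_type: str = "pageBlogPost",  # assumed from the .env example
    environment: str = "master",         # assumed default environment
) -> str:
    """Build a Content Delivery API URL for a full-text entry search."""
    params = {
        "access_token": access_token,
        "content_type": content_type,
        "query": query,   # full-text search across text fields
        "limit": limit,
    }
    base = (
        f"https://cdn.contentful.com/spaces/{space_id}"
        f"/environments/{environment}/entries"
    )
    return f"{base}?{urlencode(params)}"
```

The token can alternatively be sent as an Authorization: Bearer header, which keeps it out of logged URLs.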
Example Usage
Project Structure
Current Implementation
The scraper currently targets this specific blog post:
URL: https://www.v2.ai/insights/adopting-AI-assistants-while-balancing-risks
Extracted Data
Title: "Adopting AI Assistants while Balancing Risks"
Author: "Ashley Rodan"
Date: "July 3, 2025"
Content: ~12,785 characters of main content
Development
Adding More Blog Posts
To scrape multiple posts or different URLs, modify the fetch_blog_posts() function in scraper.py:
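A minimal sketch of what a multi-URL version might look like is shown below. The real signature of fetch_blog_posts() in scraper.py is not shown in this document, so the parameters here are assumptions. scrape_post is a hypothetical per-URL scraping function passed in by the caller.

```python
from typing import Callable, Dict, Iterable, List

Post = Dict[str, str]

# The first URL is the post currently targeted; add further URLs below it.
BLOG_URLS = [
    "https://www.v2.ai/insights/adopting-AI-assistants-while-balancing-risks",
]


def fetch_blog_posts(
    scrape_post: Callable[[str], Post],  # hypothetical single-page scraper
    urls: Iterable[str] = BLOG_URLS,
) -> List[Post]:
    """Scrape each URL in order, skipping pages that fail."""
    posts: List[Post] = []
    for url in urls:
        try:
            post = scrape_post(url)
        except Exception as exc:  # one bad page should not abort the batch
            print(f"Skipping {url}: {exc}")
            continue
        posts.append({**post, "url": url})
    return posts
```

Because failed pages are skipped, the indices passed to summarize_post(index) refer to positions in the returned list, not in BLOG_URLS.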
Improving Content Extraction
The scraper uses multiple fallback strategies for extracting content. You can enhance it by:
Inspecting V2.ai's HTML structure
Adding more specific CSS selectors
Improving date/author extraction patterns
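As one example of a date/author pattern, the sketch below uses only the standard library to pull a date such as "July 3, 2025" and a "By Firstname Lastname" byline out of page text. The byline format is an assumption about V2.ai's markup, and extract_date and extract_author are hypothetical helper names. Each returns None when nothing matches so a caller can move on to the next fallback.

```python
import re
from datetime import date
from typing import Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b")
# Assumed byline shape: "By" followed by two or more capitalized words.
AUTHOR_RE = re.compile(r"\b[Bb]y\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)")


def extract_date(text: str) -> Optional[str]:
    """Return the first valid 'Month D, YYYY' date as ISO 8601, else None."""
    for match in DATE_RE.finditer(text):
        month = MONTHS.index(match.group(1)) + 1
        try:
            return date(int(match.group(3)), month, int(match.group(2))).isoformat()
        except ValueError:  # e.g. "February 31, 2025"
            continue
    return None


def extract_author(text: str) -> Optional[str]:
    """Return the name following a 'By' byline, else None."""
    match = AUTHOR_RE.search(text)
    return match.group(1) if match else None
```

Normalizing dates to ISO 8601 makes posts sortable regardless of how the page formats them.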
Troubleshooting
Common Issues
OpenAI API Key Error: Ensure your API key is set in environment variables
Import Errors: Run uv sync to ensure all dependencies are installed
Scraping Issues: Check if the target URL is accessible and the HTML structure hasn't changed
Testing Components
Development
Running Tests
Code Quality
License
This project is for educational and development purposes.