web_content_qna
Extracts web page content and answers specific questions about it using RAG (Retrieval Augmented Generation), returning AI-generated responses based on the most relevant page sections.
Instructions
Answer questions about web page content using RAG.
This tool combines web scraping with RAG (Retrieval Augmented Generation) to answer specific questions about web page content. It extracts relevant content sections and uses AI to provide accurate answers.
Args:
- `url`: The web page URL to analyze
- `question`: The question to answer based on the page content

Returns:
- `str`: AI-generated answer based on the web page content
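The tool always returns a plain `str`, so failures arrive as ordinary text rather than as exceptions. The sketch below is a hypothetical client-side helper (`ResultKind` and `classify_result` are illustrative names, not part of the server). It sorts a result using the message prefixes hard-coded in `process_web_qna` under Implementation Reference. Those prefixes are an implementation detail rather than a documented contract, and a genuine answer could in principle start with the same words, so treat this as a heuristic.

```python
from enum import Enum


class ResultKind(Enum):
    ANSWER = "answer"
    NO_CONTENT = "no_content"
    ERROR = "error"


# Prefixes copied from the return statements of process_web_qna; they are an
# implementation detail of the server, not a documented contract.
ERROR_PREFIXES = ("Could not process the URL:", "Error processing question about ")
NO_CONTENT_PREFIX = "No content could be extracted from the URL"
NO_MATCH_MARKER = "doesn't seem to contain information relevant to your question"


def classify_result(result: str) -> ResultKind:
    """Heuristically classify the string returned by web_content_qna."""
    if result.startswith(ERROR_PREFIXES):
        return ResultKind.ERROR
    if result.startswith(NO_CONTENT_PREFIX) or (
        result.startswith("The content from ") and NO_MATCH_MARKER in result
    ):
        return ResultKind.NO_CONTENT
    # Anything else is assumed to be a model-generated answer.
    return ResultKind.ANSWER
```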
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| question | Yes | The question to answer based on the page content | |
| url | Yes | The web page URL to analyze | |
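Both parameters are required because the handler's signature gives neither a default value. The toy sketch below shows how a schema like this one can be derived from type hints. It is an illustration, not the MCP SDK's actual code, and the real generated schema may carry extra keys such as titles. `schema_from_signature` and the `_JSON_TYPES` mapping are hypothetical, and the `web_content_qna` defined here is only a stand-in with the handler's signature.

```python
import inspect
import typing

# Hypothetical annotation -> JSON Schema type mapping (illustrative only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_signature(func) -> dict:
    """Build a minimal JSON Schema object describing func's parameters."""
    hints = typing.get_type_hints(func)
    properties: dict = {}
    required: list = []
    for name, param in inspect.signature(func).parameters.items():
        prop = {}
        json_type = _JSON_TYPES.get(hints.get(name))
        if json_type is not None:
            prop["type"] = json_type
        if param.default is inspect.Parameter.empty:
            # No default value, so the caller must supply this argument.
            required.append(name)
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


# Stand-in with the same signature as the web_content_qna handler.
def web_content_qna(url: str, question: str) -> str:
    raise NotImplementedError
```

Running `schema_from_signature(web_content_qna)` lists both `url` and `question` under `required`, which matches the Required column above.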
Implementation Reference
- web_analyzer_mcp/server.py:38-55 (handler): The tool handler `web_content_qna`, registered with `@mcp.tool()`. It defines the input schema via type hints and the docstring, and executes by calling `RAGProcessor.process_web_qna` through the `rag_processor` instance (`mcp` and `rag_processor` are defined or imported elsewhere in server.py).

```python
@mcp.tool()
def web_content_qna(url: str, question: str) -> str:
    """
    Answer questions about web page content using RAG.

    This tool combines web scraping with RAG (Retrieval Augmented Generation)
    to answer specific questions about web page content. It extracts relevant
    content sections and uses AI to provide accurate answers.

    Args:
        url: The web page URL to analyze
        question: The question to answer based on the page content

    Returns:
        str: AI-generated answer based on the web page content
    """

    return rag_processor.process_web_qna(url, question)
```
- Core RAG processing logic for `web_content_qna`, a method of `RAGProcessor` (class definition not shown): extracts the page as markdown, chunks it, retrieves relevant sections using keyword scoring, and generates an answer with OpenAI. `chunk_content`, `select_relevant_chunks`, and `generate_answer` are called on `self`; their bodies are not shown.

```python
def process_web_qna(self, url: str, question: str) -> str:
    """
    Process a URL and answer a question about its content.

    This is the main RAG function that combines web extraction and QA.

    Args:
        url: The URL to analyze
        question: The question to answer based on the URL content

    Returns:
        str: The answer to the question based on the web content
    """
    try:
        # Extract content from URL
        markdown_content = url_to_markdown(url)
        # url_to_markdown reports failures as a string starting with "Error"
        if markdown_content.startswith("Error"):
            return f"Could not process the URL: {markdown_content}"

        # Chunk the content
        chunks = self.chunk_content(markdown_content)
        if not chunks:
            return "No content could be extracted from the URL to answer your question."

        # Select relevant chunks
        relevant_chunks = self.select_relevant_chunks(question, chunks)
        if not relevant_chunks:
            return f"The content from {url} doesn't seem to contain information relevant to your question: '{question}'"

        # Generate answer
        answer = self.generate_answer(question, relevant_chunks)
        return answer

    except Exception as e:
        return f"Error processing question about {url}: {e}"
```
- Web content extraction helper used by the RAG processor: scrapes the URL with Selenium, parses and cleans the HTML, ranks content by importance, extracts special elements (tables and images), and converts the result to markdown. The helper functions it calls (`ensure_url_scheme`, `extract_html_content`, `parse_special_elements`, `clean_html_content`, `rank_content_by_importance`, `convert_to_markdown`) are defined elsewhere in the project and not shown.

```python
from bs4 import BeautifulSoup


def url_to_markdown(url: str) -> str:
    """
    Convert a URL to markdown format using advanced content extraction.

    This is the main function that replaces the original build_output function.
    It extracts HTML, analyzes content importance, and converts to markdown.

    Args:
        url: The URL to analyze and convert

    Returns:
        str: Markdown formatted content
    """
    try:
        # Ensure valid URL
        clean_url = ensure_url_scheme(url)

        # Extract HTML content (the Selenium-based scraping step)
        html_content = extract_html_content(clean_url)

        # Parse HTML
        soup = BeautifulSoup(html_content, 'html.parser')

        # Extract special elements (tables, images) before cleaning
        special_elements = parse_special_elements(soup)

        # Clean HTML content
        cleaned_soup = clean_html_content(soup)

        # Rank content by importance
        main_content = rank_content_by_importance(cleaned_soup)

        # Convert to markdown
        markdown_result = convert_to_markdown(special_elements, main_content)

        return markdown_result

    except Exception as e:
        # Callers detect failure by the leading "Error"
        return f"Error processing URL {url}: {e}"
```
- web_analyzer_mcp/server.py:38 (registration): The `@mcp.tool()` decorator registers the `web_content_qna` function as an MCP tool.
- web_analyzer_mcp/server.py:40-53 (schema): The docstring provides the input/output schema description for the tool.

```python
    """
    Answer questions about web page content using RAG.

    This tool combines web scraping with RAG (Retrieval Augmented Generation)
    to answer specific questions about web page content. It extracts relevant
    content sections and uses AI to provide accurate answers.

    Args:
        url: The web page URL to analyze
        question: The question to answer based on the page content

    Returns:
        str: AI-generated answer based on the web page content
    """
```
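The bodies of `chunk_content`, `select_relevant_chunks`, and `generate_answer` are not shown in this reference. The following is therefore only a minimal sketch of the kind of keyword-scored retrieval the description names. Every name and number here (`split_into_chunks`, `keywords`, `score_chunk`, `top_chunks`, `build_prompt`, the stopword list, the 1000-character chunk size, `k=3`) is a hypothetical choice, not taken from web_analyzer_mcp. The OpenAI call that would consume the prompt is omitted.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would likely use a larger one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "on",
             "and", "or", "what", "how", "does", "do", "for", "with", "about"}


def split_into_chunks(markdown: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most
    max_chars characters (an oversized paragraph becomes its own chunk)."""
    chunks, current = [], ""
    for para in (p.strip() for p in markdown.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def keywords(text: str) -> list[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def score_chunk(question_terms: set[str], chunk: str) -> int:
    """Total occurrences of the question's keywords in the chunk."""
    counts = Counter(keywords(chunk))
    return sum(counts[t] for t in question_terms)


def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Up to k chunks with a positive score, best first (ties keep page order)."""
    terms = set(keywords(question))
    scored = [(score_chunk(terms, c), i, c) for i, c in enumerate(chunks)]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [c for _, _, c in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a grounded prompt for the answer-generation model."""
    context = "\n\n---\n\n".join(context_chunks)
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

An empty list from `top_chunks` corresponds to the "doesn't seem to contain information relevant" branch of `process_web_qna`. Keyword overlap is cheap and needs no embedding model, but it misses paraphrases: a question about "cost" will not match a section that only says "price".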
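The bodies of `ensure_url_scheme` and `clean_html_content` are also not shown. The sketch below assumes the former prepends `https://` when a scheme is missing. It approximates the latter by discarding script, style, and navigation markup. To stay self-contained, it uses the standard library's `html.parser` instead of BeautifulSoup and takes an HTML string instead of fetching with Selenium. `ensure_https_scheme`, `VisibleTextExtractor`, `visible_text`, and `SKIP_TAGS` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical set of elements whose text is treated as non-content.
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer", "header"}


def ensure_https_scheme(url: str) -> str:
    """Assumed behavior: add https:// when the URL has no scheme."""
    url = url.strip()
    return url if "://" in url else f"https://{url}"


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes that are not nested inside a SKIP_TAGS element."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def visible_text(html: str) -> str:
    """Return the page's visible text, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```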