Amazon Q Web Documentation Reader
MCP Server for Intelligent Web Content Extraction
Features • Installation • Setup • Usage • Tools
Features
Intelligent Navigation - Amazon Q (Claude 4.5) decides which documentation pages to visit
Clean Content Extraction - Removes navigation, ads, scripts, and other non-content elements
Multiple Output Formats - Supports both Markdown and plain-text output
Code Block Extraction - Extracts code examples from documentation pages
Page Structure Analysis - Extracts the heading hierarchy and table of contents
Link Discovery - Finds and filters documentation links
Batch Processing - Reads multiple documentation pages in one request
How It Works
Installation
Prerequisites
Python 3.12 or higher
uv (recommended) or pip
Step 1: Clone the Repository
Step 2: Install Dependencies
Using uv (Recommended):
Using pip:
Setup with Amazon Q
Step 1: Locate Your MCP Configuration File
Amazon Q looks for MCP server configuration in:
Linux/WSL:
~/.aws/amazonq/mcp.json
macOS:
~/.aws/amazonq/mcp.json
Windows:
%USERPROFILE%\.aws\amazonq\mcp.json
Step 2: Create/Edit the Configuration File
Create the directory if it doesn't exist:
Edit or create ~/.aws/amazonq/mcp.json:
For Linux/WSL:
For macOS:
For Windows:
Tip: Replace /full/path/to/ with the absolute path where you cloned the repository.
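As a rough illustration of what such a configuration file contains, the sketch below builds an entry for a server named doc_reader and writes it as JSON. It assumes the common `mcpServers` layout with `command` and `args` keys; the helper names `build_mcp_config` and `write_mcp_config` and the `.venv` interpreter path are hypothetical and not part of this project.

```python
import json
from pathlib import Path


def build_mcp_config(python_exe: str, main_py: str) -> dict:
    """Return an mcp.json payload that registers one server named doc_reader."""
    return {
        "mcpServers": {
            "doc_reader": {
                "command": python_exe,  # absolute path to the Python interpreter
                "args": [main_py],      # absolute path to the server entry point
            }
        }
    }


def write_mcp_config(config: dict, target: Path) -> None:
    """Create the parent directory if needed, then write the config as JSON."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical paths: replace /full/path/to/ with your clone location.
    config = build_mcp_config(
        "/full/path/to/.venv/bin/python",
        "/full/path/to/main.py",
    )
    # Only prints the JSON; call write_mcp_config() yourself to save it.
    print(json.dumps(config, indent=2))
```

The demo only prints the JSON so that it cannot overwrite an existing configuration. To save it, pass `Path.home() / ".aws" / "amazonq" / "mcp.json"` to `write_mcp_config` after backing up any existing file.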
Step 3: Verify Installation
Start Amazon Q CLI:
q chat
Check if MCP server is loaded:
/mcp
You should see:
doc_reader
- read_web_documentation
- get_documentation_links
- get_page_structure
- extract_code_examples
- read_multiple_docs
If not loaded:
Check that the file path in mcp.json is correct
Restart Amazon Q CLI
Check logs:
q chat logdump
Usage
Basic Example
In Amazon Q CLI, simply ask about documentation:
Amazon Q will:
Read the main documentation page
Extract all available links
Intelligently identify the "Routes" link
Navigate to the Routes documentation
Provide you with accurate information
More Examples
Python Documentation:
FastAPI Tutorial:
AWS Lambda:
Available Tools
Amazon Q intelligently chains these tools to navigate documentation:
1. read_web_documentation
Fetches and extracts clean documentation content from a web page.
Parameters:
url (required): The URL of the documentation page
output_format (optional): "markdown" (default) or "text"
Returns: Extracted documentation content with title and metadata
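To make the idea of clean extraction concrete, here is a minimal sketch that uses only the standard library rather than the project's own dependencies. It drops text inside a set of non-content tags and returns the title plus the remaining text. The tag set and the names `CleanTextExtractor` and `extract_clean_text` are assumptions for illustration; the real list of removed tags lives in src/config.py.

```python
from html.parser import HTMLParser

# Assumed set of non-content tags; the project's actual list may differ.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class CleanTextExtractor(HTMLParser):
    """Collect the page title and visible text, skipping non-content tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._skip_depth = 0    # greater than 0 while inside a skipped element
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = text
        elif self._skip_depth == 0:
            self.chunks.append(text)


def extract_clean_text(html: str) -> dict:
    """Return {'title': ..., 'content': ...} for an HTML document string."""
    parser = CleanTextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": "\n".join(parser.chunks)}
```

This corresponds roughly to the "text" output format; producing Markdown would need an additional conversion step that this sketch does not attempt.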
2. get_documentation_links
Extracts all links from a documentation page with optional filtering.
Parameters:
url (required): The URL of the documentation page
filter_pattern (optional): Pattern to filter links (e.g., "api", "guide")
Returns: List of links found on the page
3. get_page_structure
Extracts the heading structure and table of contents from a documentation page.
Parameters:
url (required): The URL of the documentation page
Returns: Hierarchical structure of headings on the page
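One way to picture the heading hierarchy is as an indented outline built from the h1 to h6 tags. The sketch below does this with the standard library only; `HeadingCollector` and `page_outline` are hypothetical names, and the tool's real output format may differ.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every heading in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level = None       # level of the heading being read, if any
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Join text from nested inline tags and normalise whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headings.append((self._level, text))
            self._level = None


def page_outline(html: str) -> str:
    """Render headings as an outline indented by heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(
        "  " * (level - 1) + "- " + text for level, text in collector.headings
    )
```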
4. extract_code_examples
Extracts all code blocks from a documentation page.
Parameters:
url (required): The URL of the documentation page
Returns: All code blocks found with their detected languages
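Language detection for code blocks is often based on class names such as `language-python` on the `pre` or `code` element. The sketch below assumes that convention and falls back to "unknown"; it only looks at `pre` blocks and ignores inline `code`. The names `CodeBlockCollector` and `extract_code_blocks` are illustrative and not the project's API.

```python
from html.parser import HTMLParser


class CodeBlockCollector(HTMLParser):
    """Collect the text of each <pre> block plus a language guess."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[dict] = []
        self._pre_depth = 0
        self._language = None
        self._buffer: list[str] = []

    @staticmethod
    def _language_from(attrs):
        """Read a language from class names like 'language-python' or 'lang-js'."""
        for name in (dict(attrs).get("class") or "").split():
            for prefix in ("language-", "lang-"):
                if name.startswith(prefix):
                    return name[len(prefix):]
        return None

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._pre_depth += 1
            if self._pre_depth == 1:
                self._buffer = []
                self._language = self._language_from(attrs)
        elif tag == "code" and self._pre_depth and self._language is None:
            self._language = self._language_from(attrs)

    def handle_data(self, data):
        if self._pre_depth:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._pre_depth:
            self._pre_depth -= 1
            if self._pre_depth == 0:
                code = "".join(self._buffer).strip("\n")
                if code:
                    self.blocks.append(
                        {"language": self._language or "unknown", "code": code}
                    )


def extract_code_blocks(html: str) -> list[dict]:
    """Return a list of {'language': ..., 'code': ...} dicts."""
    collector = CodeBlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks
```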
5. read_multiple_docs
Reads multiple documentation pages and combines their content.
Parameters:
urls (required): List of documentation URLs (max 10)
Returns: Combined content from all pages
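The batch behaviour can be sketched as a loop that enforces the 10-URL limit and joins the per-page results. The fetch step is passed in as a function so the example runs without network access. Continuing past a failed page and the separator format are assumptions, and `read_multiple` is a hypothetical name.

```python
from typing import Callable

MAX_URLS = 10  # batch limit stated in this README


def read_multiple(urls: list[str], fetch: Callable[[str], str]) -> str:
    """Fetch each URL with `fetch` and combine the results into one document."""
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per batch, got {len(urls)}")

    sections = []
    for url in urls:
        try:
            body = fetch(url)
        except Exception as exc:
            # Assumed policy: report the failure inline and keep going.
            body = f"[failed to read: {exc}]"
        sections.append(f"## {url}\n\n{body}")
    return "\n\n---\n\n".join(sections)


if __name__ == "__main__":
    # Stand-in for a real HTTP fetch, keyed by example URLs.
    fake_pages = {"https://a.example/": "Page A", "https://b.example/": "Page B"}
    print(read_multiple(list(fake_pages), fake_pages.__getitem__))
```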
Project Structure
Configuration
Edit src/config.py to customize behavior:
Setting | Default | Description |
| 30.0s | Request timeout in seconds |
| 10MB | Maximum content size in bytes |
| Custom | HTTP User-Agent string |
| Various | HTML tags to remove during extraction |
| Various | Selectors for finding main content |
Troubleshooting
MCP Server Not Loading
Check configuration:
Verify paths are correct:
Use absolute paths, not relative
Check that Python executable exists
Check that main.py exists
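The path checks above can be automated with a small script. The sketch below assumes the configuration uses the `mcpServers` layout with `command` and `args` keys, and treats any argument ending in `.py` as the server script; `check_mcp_config` is a hypothetical helper, not part of this project.

```python
from pathlib import Path


def check_mcp_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in an mcp.json payload."""
    problems = []
    servers = config.get("mcpServers", {})
    if not servers:
        problems.append("no servers defined under 'mcpServers'")

    for name, server in servers.items():
        command = Path(server.get("command", ""))
        if not command.is_absolute():
            problems.append(f"{name}: command is not an absolute path: {command}")
        elif not command.exists():
            problems.append(f"{name}: command does not exist: {command}")

        # Assumption: script arguments are the ones that end in .py.
        for arg in server.get("args", []):
            if not arg.endswith(".py"):
                continue
            script = Path(arg)
            if not script.is_absolute():
                problems.append(f"{name}: script is not an absolute path: {script}")
            elif not script.exists():
                problems.append(f"{name}: script does not exist: {script}")
    return problems
```

To use it, load the file with `json.loads(Path.home().joinpath(".aws", "amazonq", "mcp.json").read_text())` and print each returned problem; an empty list means all three checks passed.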
Test server manually:
Check Amazon Q logs:
Server Starts But Tools Don't Work
Verify dependencies are installed:
Reinstall dependencies:
Connection Timeout
Increase timeout in settings:
Dependencies
Package | Purpose |
| Async HTTP client for fetching web pages |
| HTML parsing and navigation |
| Fast XML/HTML parser |
| HTML to Markdown conversion |
| Model Context Protocol SDK |
Limitations
Limit | Value |
Maximum content size | 10MB per page |
Maximum URLs per batch | 10 |
Request timeout | 30 seconds |
Content type | HTML only |
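As a sketch of how the size and content-type limits in the table could be enforced before parsing a response, consider the check below. It assumes that 10MB means 10 * 1024 * 1024 bytes and that XHTML counts as HTML; both are guesses, and `check_response` is a hypothetical helper.

```python
MAX_CONTENT_BYTES = 10 * 1024 * 1024  # assumed interpretation of "10MB"
ALLOWED_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}  # assumed


def check_response(content_type: str, body: bytes) -> str:
    """Validate a response against the documented limits.

    Returns the normalised media type, or raises ValueError if the
    response is not HTML or is larger than the size limit.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unsupported content type: {media_type or 'missing'}")
    if len(body) > MAX_CONTENT_BYTES:
        raise ValueError(
            f"content is {len(body)} bytes, limit is {MAX_CONTENT_BYTES}"
        )
    return media_type
```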
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
Open an Issue for bug reports or feature requests
Star this repo if you find it useful!