
FastMCP Document Analyzer

by Tathagat017
# 🔍 FastMCP Document Analyzer

> A comprehensive document analysis server built with the modern FastMCP framework

[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://python.org)
[![FastMCP](https://img.shields.io/badge/FastMCP-2.3+-green.svg)](https://gofastmcp.com)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

## 📋 Table of Contents

- [🌟 Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [📦 Installation](#-installation)
- [🔧 Usage](#-usage)
- [🛠️ Available Tools](#️-available-tools)
- [📊 Sample Data](#-sample-data)
- [🏗️ Project Structure](#️-project-structure)
- [🔄 API Reference](#-api-reference)
- [🧪 Testing](#-testing)
- [📚 Documentation](#-documentation)
- [🤝 Contributing](#-contributing)

## 🌟 Features

### 📖 **Document Analysis**

- **🎭 Sentiment Analysis**: VADER + TextBlob dual-engine sentiment classification
- **🔑 Keyword Extraction**: TF-IDF and frequency-based keyword identification
- **📚 Readability Scoring**: Multiple metrics (Flesch, Flesch-Kincaid, ARI)
- **📊 Text Statistics**: Word, sentence, and paragraph counts, and more

### 🗂️ **Document Management**

- **💾 Persistent Storage**: JSON-based document collection with metadata
- **🔍 Smart Search**: TF-IDF cosine-similarity search
- **🏷️ Tag System**: Category and tag-based organization
- **📈 Collection Insights**: Comprehensive statistics and analytics

### 🚀 **FastMCP Advantages**

- **⚡ Simple Setup**: 90% less boilerplate than standard MCP
- **🔒 Type Safety**: Full type validation with Pydantic
- **🎯 Modern API**: Decorator-based tool definitions
- **🌐 Multi-Transport**: STDIO, HTTP, and SSE support

## 🚀 Quick Start

### 1. **Clone and Setup**

```bash
git clone <repository-url>
cd document-analyzer
python -m venv venv
source venv/Scripts/activate  # Windows (Git Bash)
# source venv/bin/activate    # macOS/Linux
```

### 2. **Install Dependencies**

```bash
pip install -r requirements.txt
```

### 3. **Initialize NLTK Data**

```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('vader_lexicon'); nltk.download('stopwords'); nltk.download('punkt_tab')"
```

### 4. **Run the Server**

```bash
python fastmcp_document_analyzer.py
```

### 5. **Test Everything**

```bash
python test_fastmcp_analyzer.py
```

## 📦 Installation

### **System Requirements**

- Python 3.10 or higher (FastMCP 2.x does not support older versions)
- 500MB free disk space
- Internet connection (for the initial NLTK data download)

### **Dependencies**

```txt
fastmcp>=2.3.0          # Modern MCP framework
textblob>=0.17.1        # Sentiment analysis
nltk>=3.8.1             # Natural language processing
textstat>=0.7.3         # Readability metrics
scikit-learn>=1.3.0     # Machine learning utilities
numpy>=1.24.0           # Numerical computing
pandas>=2.0.0           # Data manipulation
python-dateutil>=2.8.2  # Date handling
```

### **Optional: Virtual Environment**

```bash
# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

## 🔧 Usage

### **Starting the Server**

#### Default (STDIO Transport)

```bash
python fastmcp_document_analyzer.py
```

#### HTTP Transport (for web services)

```bash
python fastmcp_document_analyzer.py --transport http --port 9000
```

#### With Custom Host

```bash
python fastmcp_document_analyzer.py --transport http --host 0.0.0.0 --port 8080
```

### **Basic Usage Examples**

```python
# Assumes the tool functions can be imported from the server module and called
# directly; whether a decorated tool stays a plain callable depends on the
# FastMCP version. Through an MCP client, the same tools are invoked by name.
from fastmcp_document_analyzer import (
    analyze_document,
    extract_keywords,
    get_collection_stats,
    search_documents,
)

# Analyze a document
result = analyze_document("doc_001")
print(f"Sentiment: {result['sentiment_analysis']['overall_sentiment']}")

# Extract keywords
keywords = extract_keywords("Artificial intelligence is transforming healthcare", 5)
print([kw["keyword"] for kw in keywords])

# Search documents
results = search_documents("machine learning", 3)
print(f"Found {len(results)} relevant documents")

# Get collection statistics
stats = get_collection_stats()
print(f"Total documents: {stats['total_documents']}")
```

## 🛠️ Available Tools

### **Core Analysis Tools**

| Tool                    | Description                   | Example                         |
| ----------------------- | ----------------------------- | ------------------------------- |
| `analyze_document`      | 🔍 Complete document analysis | `analyze_document("doc_001")`   |
| `get_sentiment`         | 😊 Sentiment analysis         | `get_sentiment("I love this!")` |
| `extract_keywords`      | 🔑 Keyword extraction         | `extract_keywords(text, 10)`    |
| `calculate_readability` | 📖 Readability metrics        | `calculate_readability(text)`   |

### **Document Management Tools**

| Tool              | Description           | Example                                  |
| ----------------- | --------------------- | ---------------------------------------- |
| `add_document`    | 📝 Add new document   | `add_document("id", "title", "content")` |
| `get_document`    | 📄 Retrieve document  | `get_document("doc_001")`                |
| `delete_document` | 🗑️ Delete document    | `delete_document("old_doc")`             |
| `list_documents`  | 📋 List all documents | `list_documents("Technology")`           |

### **Search and Discovery Tools**

| Tool                   | Description              | Example                          |
| ---------------------- | ------------------------ | -------------------------------- |
| `search_documents`     | 🔍 Similarity search     | `search_documents("AI", 5)`      |
| `search_by_tags`       | 🏷️ Tag-based search      | `search_by_tags(["AI", "tech"])` |
| `get_collection_stats` | 📊 Collection statistics | `get_collection_stats()`         |

## 📊 Sample Data

The server comes pre-loaded with **16 diverse documents** covering:

| Category        | Documents | Topics                                            |
| --------------- | --------- | ------------------------------------------------- |
| **Technology**  | 4         | AI, Quantum Computing, Privacy, Blockchain        |
| **Science**     | 3         | Space Exploration, Healthcare, Ocean Conservation |
| **Environment** | 2         | Climate Change, Sustainable Agriculture           |
| **Society**     | 3         | Remote Work, Mental Health, Transportation        |
| **Business**    | 2         | Economics, Digital Privacy                        |
| **Culture**     | 2         | Art History, Wellness                             |

### **Sample Document Structure**

```json
{
  "id": "doc_001",
  "title": "The Future of Artificial Intelligence",
  "content": "Artificial intelligence is rapidly transforming...",
  "author": "Dr. Sarah Chen",
  "category": "Technology",
  "tags": ["AI", "technology", "future", "ethics"],
  "language": "en",
  "created_at": "2024-01-15T10:30:00"
}
```

## 🏗️ Project Structure

```
document-analyzer/
├── 📁 analyzer/                     # Core analysis engine
│   ├── __init__.py
│   └── document_analyzer.py         # Sentiment, keywords, readability
├── 📁 storage/                      # Document storage system
│   ├── __init__.py
│   └── document_storage.py          # JSON storage, search, management
├── 📁 data/                         # Sample data
│   ├── __init__.py
│   └── sample_documents.py          # 16 sample documents
├── 📄 fastmcp_document_analyzer.py  # 🌟 Main FastMCP server
├── 📄 test_fastmcp_analyzer.py      # Comprehensive test suite
├── 📄 requirements.txt              # Python dependencies
├── 📄 documents.json                # Persistent document storage
├── 📄 README.md                     # This documentation
├── 📄 FASTMCP_COMPARISON.md         # FastMCP vs Standard MCP
├── 📄 .gitignore                    # Git ignore patterns
└── 📁 venv/                         # Virtual environment (optional)
```

## 🔄 API Reference

### **Document Analysis**

#### `analyze_document(document_id: str) -> Dict[str, Any]`

Runs sentiment, keyword, readability, and basic-statistics analysis on a stored document.
**Parameters:**

- `document_id` (str): Unique document identifier

**Returns:**

```json
{
  "document_id": "doc_001",
  "title": "Document Title",
  "sentiment_analysis": {
    "overall_sentiment": "positive",
    "confidence": 0.85,
    "vader_scores": {...},
    "textblob_scores": {...}
  },
  "keywords": [
    {"keyword": "artificial", "frequency": 5, "relevance_score": 2.3}
  ],
  "readability": {
    "flesch_reading_ease": 45.2,
    "reading_level": "Difficult",
    "grade_level": "Grade 12"
  },
  "basic_statistics": {
    "word_count": 119,
    "sentence_count": 8,
    "paragraph_count": 1
  }
}
```

#### `get_sentiment(text: str) -> Dict[str, Any]`

Analyzes the sentiment of arbitrary text.

**Parameters:**

- `text` (str): Text to analyze

**Returns:**

```json
{
  "overall_sentiment": "positive",
  "confidence": 0.85,
  "vader_scores": {
    "compound": 0.7269,
    "positive": 0.294,
    "negative": 0.0,
    "neutral": 0.706
  },
  "textblob_scores": {
    "polarity": 0.5,
    "subjectivity": 0.6
  }
}
```

### **Document Management**

#### `add_document(...) -> Dict[str, Any]`

Adds a new document to the collection.

**Parameters:**

- `id` (str): Unique document ID
- `title` (str): Document title
- `content` (str): Document content
- `author` (str, optional): Author name
- `category` (str, optional): Document category
- `tags` (List[str], optional): List of tags
- `language` (str, optional): Language code

**Returns:**

```json
{
  "status": "success",
  "message": "Document 'my_doc' added successfully",
  "document_count": 17
}
```

### **Search and Discovery**

#### `search_documents(query: str, limit: int = 10) -> List[Dict[str, Any]]`

Ranks documents by the TF-IDF cosine similarity between each document and the query.
**Parameters:**

- `query` (str): Search query
- `limit` (int): Maximum number of results to return (default 10)

**Returns:**

```json
[
  {
    "id": "doc_001",
    "title": "AI Document",
    "similarity_score": 0.8542,
    "content_preview": "First 200 characters...",
    "tags": ["AI", "technology"]
  }
]
```

## 🧪 Testing

### **Run All Tests**

```bash
python test_fastmcp_analyzer.py
```

### **Test Categories**

- ✅ **Server Initialization**: FastMCP server setup
- ✅ **Sentiment Analysis**: VADER and TextBlob integration
- ✅ **Keyword Extraction**: TF-IDF and frequency analysis
- ✅ **Readability Calculation**: Multiple readability metrics
- ✅ **Document Analysis**: Full document processing
- ✅ **Document Search**: TF-IDF similarity search
- ✅ **Collection Statistics**: Analytics and insights
- ✅ **Document Management**: Adding, listing, and deleting documents
- ✅ **Tag Search**: Tag-based filtering

### **Expected Test Output**

```
=== Testing FastMCP Document Analyzer ===
✓ FastMCP server module imported successfully
✓ Server initialized successfully
✓ Sentiment analysis working
✓ Keyword extraction working
✓ Readability calculation working
✓ Document analysis working
✓ Document search working
✓ Collection statistics working
✓ Document listing working
✓ Document addition and deletion working
✓ Tag search working
=== All FastMCP tests completed successfully! ===
```

## 📚 Documentation

### **Additional Resources**

- 📖 [FastMCP Documentation](https://gofastmcp.com)
- 📖 [MCP Protocol Specification](https://modelcontextprotocol.io)
- 📖 [FASTMCP_COMPARISON.md](FASTMCP_COMPARISON.md) - FastMCP vs Standard MCP

### **Key Concepts**

#### **Sentiment Analysis**

Uses a dual-engine approach:

- **VADER**: Rule- and lexicon-based, well suited to social media text
- **TextBlob**: Lexicon-based polarity and subjectivity scores (its default analyzer), good for general text

#### **Keyword Extraction**

Combines multiple approaches:

- **TF-IDF**: Term frequency-inverse document frequency
- **Frequency Analysis**: Simple word frequency counting
- **Relevance Scoring**: Weighted combination of both methods

#### **Readability Metrics**

Provides multiple readability scores:

- **Flesch Reading Ease**: Roughly a 0-100 scale (higher = easier)
- **Flesch-Kincaid Grade**: US school grade level
- **ARI**: Automated Readability Index, also an approximate US grade level

#### **Document Search**

Uses TF-IDF vectorization with cosine similarity:

- Converts the documents and the query to TF-IDF vectors
- Calculates the cosine similarity between the query and each document
- Returns ranked results with similarity scores

## 🤝 Contributing

### **Development Setup**

```bash
# Clone repository
git clone <repository-url>
cd document-analyzer

# Create development environment
python -m venv venv
source venv/Scripts/activate  # Windows (Git Bash); use venv/bin/activate on macOS/Linux
pip install -r requirements.txt

# Run tests
python test_fastmcp_analyzer.py
```

### **Adding New Tools**

FastMCP makes it easy to add new tools:

```python
from typing import Any, Dict

from fastmcp import FastMCP

# The server file already creates this instance; it is repeated here only so the
# snippet runs on its own (the server name is a placeholder).
mcp = FastMCP("Document Analyzer")


@mcp.tool()  # the parenthesized form also works on FastMCP 2.3, the minimum version
def my_new_tool(param: str) -> Dict[str, Any]:
    """
    🔧 Description of what this tool does.

    Args:
        param: Parameter description

    Returns:
        Return value description
    """
    # Implementation here
    return {"result": "success"}
```

### **Code Style**

- Use type hints for all functions
- Add comprehensive docstrings
- Include error handling
- Follow PEP 8 style guidelines
- Add emoji icons for better readability

### **Testing New Features**

1. Add your tool to the main server file
2. Create test cases in the test file
3. Run the test suite to ensure everything works
4. Update documentation as needed

## 📄 License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **FastMCP Team** for the excellent framework
- **NLTK Team** for natural language processing tools
- **TextBlob Team** for sentiment analysis capabilities
- **Scikit-learn Team** for machine learning utilities

---

**Made with ❤️ using FastMCP**

> 🚀 Ready to analyze documents? Start with `python fastmcp_document_analyzer.py`
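The **Document Search** concept described under Key Concepts (TF-IDF vectors compared by cosine similarity) can be illustrated without any dependencies. The following is a minimal sketch, not the project's implementation. That implementation presumably lives in `storage/document_storage.py` and may rely on scikit-learn, which is listed in the requirements. The helper names `tokenize`, `tfidf_vectors`, `cosine`, and `search` are hypothetical. The IDF term uses the same smoothed formula as scikit-learn's `TfidfVectorizer` default, and tokenization is a simplifying assumption (lowercased letter runs, no stemming or stopword removal).

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> weight


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters (no stemming or stopwords)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Tuple[Dict[str, Vector], Vector]:
    """Build a sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    # Smoothed IDF, same formula as scikit-learn's TfidfVectorizer(smooth_idf=True).
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(docs: Dict[str, str], query: str, limit: int = 10) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity between their TF-IDF vectors and the query's."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection have no IDF and are ignored.
    query_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in vectors.items()]
    # Keep only documents that share at least one term with the query, best first.
    ranked = sorted((item for item in scored if item[1] > 0.0), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    collection = {
        "doc_a": "Machine learning models learn patterns from data.",
        "doc_b": "Ocean conservation protects marine life.",
        "doc_c": "Deep learning is a branch of machine learning.",
    }
    for doc_id, score in search(collection, "machine learning", limit=3):
        print(doc_id, round(score, 4))
```

Because every vector is compared by angle rather than length, a long document does not outrank a short one simply by repeating terms. Rare terms still dominate through the IDF weight.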
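The three **Readability Metrics** listed under Key Concepts are closed-form formulas over word, sentence, syllable, and character counts. The sketch below applies the standard Flesch Reading Ease, Flesch-Kincaid Grade, and ARI coefficients. The helpers `count_syllables` and `readability` are hypothetical, and the vowel-group syllable counter and punctuation-based sentence splitter are simplifying assumptions. Scores will therefore differ somewhat from those produced by `textstat`, which the project lists as a dependency.

```python
import re
from typing import Dict


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, then adjust for a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep the "-le" ending in words such as "table" -> 2.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> Dict[str, float]:
    """Flesch Reading Ease, Flesch-Kincaid Grade, and ARI from simple text counts."""
    # Treat each run of ., ! or ? as one sentence end; assume at least one sentence.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    letters = sum(len(w) for w in words)  # ARI's character count (letters only here)

    words_per_sentence = n_words / sentences
    syllables_per_word = syllables / n_words
    return {
        "flesch_reading_ease": round(
            206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word, 2
        ),
        "flesch_kincaid_grade": round(
            0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59, 2
        ),
        "automated_readability_index": round(
            4.71 * (letters / n_words) + 0.5 * words_per_sentence - 21.43, 2
        ),
    }


if __name__ == "__main__":
    print(readability("The cat sat on the mat."))
```

Very short, monosyllabic text such as "The cat sat on the mat." scores above 100 on Flesch Reading Ease (about 116 here). For the same reason, the grade-level formulas can go negative, so the 0-100 range is only typical, not guaranteed.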
