# LitSynth MCP Server
<p align="center">
<picture style="display:block; margin: 0 auto; width: 400px;">
<source srcset="public/logo-dark.png" media="(prefers-color-scheme: dark)">
<img src="public/logo-light.png" alt="AI Research Assistant Logo" style="display:block; margin: 0 auto">
</picture>
</p>
A Model Context Protocol (MCP) server for intelligent academic paper discovery and semantic search using ArXiv. This server provides tools for searching academic papers and performing semantic similarity analysis using state-of-the-art sentence transformers.
## Features
- **ArXiv Search**: Query the ArXiv database with automatic URL encoding for complex search terms
- **Semantic Search**: Find papers most relevant to your research using AI-powered semantic similarity
- **Robust Error Handling**: Graceful handling of network issues and malformed data
- **Flexible Input**: Support for various query formats including spaces and special characters
## Tools Available
### 1. `greet(name: str)`
Simple greeting function for testing server connectivity.
**Parameters:**
- `name`: String - Name to greet
**Returns:** Greeting message
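**Example** (the exact greeting text depends on the server implementation):
```python
greet("Ada")
```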
### 2. `search_query_arxiv(query: str, max_results: int = 5)`
Search the ArXiv database for academic papers matching your query.
**Parameters:**
- `query`: String - Search terms (automatically URL encoded)
- `max_results`: Integer - Maximum number of results to return (default: 5)
**Returns:** Structured response with papers including:
- Title
- Authors
- Summary/Abstract
- ArXiv link
- Status message
**Example:**
```python
search_query_arxiv("multimodal agents", 3)
```
### 3. `search_semantic_arxiv(query: str, papers: list, top_k: int = 5)`
Perform semantic search on a list of papers to find the most relevant ones.
**Parameters:**
- `query`: String - Research query for semantic matching
- `papers`: List - Papers to search through (from `search_query_arxiv` or manual list)
- `top_k`: Integer - Number of most relevant papers to return (default: 5)
**Returns:** Ranked papers with similarity scores including:
- Title
- Summary
- Authors
- ArXiv link
- Similarity score (0-1)
**Example:**
```python
papers = search_query_arxiv("machine learning")
relevant = search_semantic_arxiv("deep reinforcement learning", papers, 3)
```
### 4. `search_hf_datasets(query: str, limit: int = 5)`
Search Hugging Face for the datasets most relevant to a topic.
**Parameters:**
- `query`: String - Search query for matching datasets
- `limit`: Integer - Maximum number of datasets to return (default: 5)
**Returns:** Matching dataset info including:
- Id
- Description
- Url
**Example:**
```python
search_hf_datasets("biology", 5)
```
### 5. `get_dataset_details(dataset_id: str)`
Get detailed information about a specific Hugging Face dataset, including its structure, usage examples, and research applications.
**Parameters:**
- `dataset_id`: String - The dataset ID from Hugging Face (e.g., "microsoft/COCO")
**Returns:** Detailed dataset information including metadata, structure, and usage examples
- Dataset Id
- Url
- Basic Info
- Structure
- Files
- Usage Information
- Research Applications
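**Example:**
```python
get_dataset_details("microsoft/COCO")
```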
### 6. `explore_dataset_files(dataset_id: str)`
Explore the structure of a Hugging Face dataset without downloading it entirely.
Shows available files/splits and provides a sample preview of the data.
**Parameters:**
- `dataset_id`: String - The dataset ID from Hugging Face (e.g., "microsoft/COCO")
**Returns:** Dataset structure information including files, splits, and sample data
- Dataset Id
- Url
- Structure
- Available Files
- Sample Preview
- Summary
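**Example:**
```python
explore_dataset_files("microsoft/COCO")
```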
### 7. `explore_dataset_structure(dataset_id: str, split_name: str = None)`
Explore the structure of a Hugging Face dataset with detailed split information.
**Parameters:**
- `dataset_id`: String - The dataset ID from Hugging Face (e.g., "microsoft/COCO")
- `split_name`: Optional specific split to explore (e.g., "train", "test", "validation")
**Returns:** Dataset structure overview with splits, file types, and sample data
- Dataset Id
- Url
- Overview
- Available Splits
- File Types
- Sample Data
- Description
- Next Steps
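**Example:**
```python
explore_dataset_structure("microsoft/COCO")           # overview of all splits
explore_dataset_structure("microsoft/COCO", "train")  # inspect a single split
```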
## Installation
### Prerequisites
- Python 3.8+
- pip
### Setup
1. **Clone or download the project files**
```bash
git clone https://github.com/RayaneChCh-dev/LitSynth-MCP-Server.git
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the MCP server:**
```bash
fastmcp run my_server.py:mcp --transport http --port <your_port>
```
## Dependencies
The project requires the following packages (see `requirements.txt`):
- `fastmcp>=0.1.0` - MCP framework
- `feedparser>=6.0.10` - RSS/Atom feed parsing for ArXiv API
- `requests>=2.31.0` - HTTP requests
- `sentence-transformers>=2.2.2` - Semantic search and embeddings
- `torch>=2.0.0` - PyTorch for neural networks
- `transformers>=4.21.0` - Hugging Face transformers
- `numpy>=1.21.0` - Numerical computing
## Project Structure
```bash
ai-research-assistant/
├── my_server.py # Main MCP server implementation
├── requirements.txt # Python dependencies
└── README.md # This file
```
## Usage Examples
### Basic ArXiv Search
Search for papers on a specific topic:
```python
# Search for papers about transformers
results = search_query_arxiv("attention mechanisms transformers", 5)
```
### Semantic Paper Discovery
Find the most relevant papers from a search result:
```python
# First, get papers on a broad topic
papers = search_query_arxiv("artificial intelligence", 20)
# Then find the most relevant ones for your specific research
relevant_papers = search_semantic_arxiv("graph neural networks", papers, 5)
```
### Handling Complex Queries
The server automatically handles special characters and spaces:
```python
# These work automatically without manual encoding
search_query_arxiv("machine learning & deep learning: survey")
search_query_arxiv("reinforcement learning (RL) applications")
```
## Technical Details
### Semantic Search Model
The server uses the `sentence-transformers/all-MiniLM-L6-v2` model for semantic embeddings. This model:
- Provides 384-dimensional sentence embeddings
- Balances speed and accuracy
- Works well for academic text similarity
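For illustration, a minimal sketch of the kind of similarity ranking this model enables (the actual pipeline in `my_server.py` may differ; the abstracts below are placeholders):
```python
from sentence_transformers import SentenceTransformer, util

# Load the same model the server uses (~90MB download on first run)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query = "graph neural networks"
summaries = ["Paper abstract one...", "Paper abstract two..."]

# Encode the query and abstracts into 384-dimensional embeddings
query_emb = model.encode(query, convert_to_tensor=True)
paper_embs = model.encode(summaries, convert_to_tensor=True)

# Cosine similarity scores; roughly 0-1 for typical academic text
scores = util.cos_sim(query_emb, paper_embs)[0]
ranked = sorted(zip(summaries, scores.tolist()), key=lambda p: p[1], reverse=True)
```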
### Error Handling
The server includes comprehensive error handling:
- **URL Encoding**: Automatic handling of spaces and special characters
- **Network Errors**: Graceful degradation when ArXiv is unavailable
- **Data Validation**: Safe handling of missing or malformed paper data
- **Empty Results**: Informative messages when no papers are found
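As an illustration of the URL-encoding step, a minimal sketch using Python's standard library (the server's actual request-building code may differ):
```python
from urllib.parse import quote_plus

query = "machine learning & deep learning: survey"
# quote_plus escapes spaces, '&', ':' and other special characters
encoded = quote_plus(query)
url = f"http://export.arxiv.org/api/query?search_query=all:{encoded}&max_results=5"
```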
### Response Format
All functions return structured responses:
```json
{
  "message": "Status or info message",
  "results": [
    {
      "title": "Paper Title",
      "author": ["Author 1", "Author 2"],
      "summary": "Abstract text...",
      "link": "https://arxiv.org/abs/...",
      "similarity_score": 0.85
    }
  ]
}
```
The `similarity_score` field appears only in semantic search results.
## Troubleshooting
### Common Issues
**"URL can't contain control characters" error:**
- This is fixed in the current version with automatic URL encoding
- Make sure you're using the latest version of the server
**"No papers found" result:**
- Check your query spelling
- Try broader search terms
- Verify ArXiv service availability
**Slow semantic search:**
- First run downloads the transformer model (~90MB)
- Subsequent runs are much faster
- Consider reducing `top_k` for faster results
**Memory issues:**
- The sentence transformer model requires ~500MB RAM
- Reduce batch sizes if experiencing memory problems
## Contributing
Feel free to submit issues, feature requests, or pull requests to improve the AI Research Assistant.
## License
This project is open source. Please check individual dependency licenses for commercial use.
## Acknowledgments
- **ArXiv** for providing free access to academic papers
- **Sentence Transformers** for semantic search capabilities
- **FastMCP** for the MCP server framework