🔬 Google Scholar MCP Server
A Model Context Protocol (MCP) server that provides access to Google Scholar for academic research through web scraping. This server enables you to search for papers, find author publications, discover recent research, and identify highly cited works.
✨ Features
🔍 Paper Search: Search Google Scholar for academic papers with flexible filtering
👨‍🔬 Author Research: Find papers by specific authors
📅 Recent Papers: Discover recent publications in any field
🏆 Highly Cited Papers: Find influential papers with citation filtering
⏱️ Rate Limiting: Respectful scraping with built-in delays
🛡️ Error Handling: Robust error handling and logging
🌐 Local Web Interface: Optional Flask web interface for testing
🧠 Smart Query Processing: Natural language query processing with AI integration
🚀 Quick Start
Prerequisites
Python 3.8 or higher
pip (Python package manager)
Installation
Clone the repository:
    git clone https://github.com/yourusername/google-scholar-mcp.git
    cd google-scholar-mcp
Install dependencies:
    pip install -r requirements.txt
Optional: set up environment variables:
    cp env.example .env
    # Edit .env with your preferred settings
Running the MCP Server
Run the MCP server for use with MCP clients:
Testing with Local Web Interface
For testing and development, you can run the local web interface:
Then open your browser to http://localhost:5000
🔧 Configuration
The server can be configured through environment variables. Copy env.example to .env and modify as needed:
Available Tools
1. search_papers
Search for academic papers on Google Scholar.
Parameters:
query (required): Search query for papers
num_results (optional): Number of results to return (1-20, default: 10)
start_year (optional): Earliest publication year to include
end_year (optional): Latest publication year to include
Example:
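As a rough illustration of how these parameters could map onto a Google Scholar request, here is a minimal sketch. The function name `build_search_url` is hypothetical and not part of this server. It assumes the search is issued against the public `scholar.google.com/scholar` endpoint using the `q`, `num`, `as_ylo`, and `as_yhi` query parameters. The server's actual request logic may differ.

```python
from typing import Optional
from urllib.parse import urlencode

SCHOLAR_SEARCH_URL = "https://scholar.google.com/scholar"


def build_search_url(
    query: str,
    num_results: int = 10,
    start_year: Optional[int] = None,
    end_year: Optional[int] = None,
) -> str:
    """Build a Scholar search URL from search_papers-style arguments (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query is required")
    if not 1 <= num_results <= 20:
        raise ValueError("num_results must be between 1 and 20")
    if start_year is not None and end_year is not None and start_year > end_year:
        raise ValueError("start_year must not be later than end_year")

    params = {"q": query, "num": num_results}
    if start_year is not None:
        params["as_ylo"] = start_year  # lower bound on publication year
    if end_year is not None:
        params["as_yhi"] = end_year  # upper bound on publication year
    return f"{SCHOLAR_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("graph neural networks", num_results=5, start_year=2020))
# prints https://scholar.google.com/scholar?q=graph+neural+networks&num=5&as_ylo=2020
```

Validating the documented ranges before any request is sent keeps malformed calls from consuming a rate-limited request.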
2. get_author_papers
Search for papers by a specific author.
Parameters:
author_name (required): Name of the author to search for
num_results (optional): Number of results to return (default: 10)
Example:
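One plausible way to turn `author_name` into a search is Google Scholar's `author:` search operator. The sketch below assumes that approach, and the helper name `build_author_query` is hypothetical. The server may construct its author queries differently.

```python
def build_author_query(author_name: str) -> str:
    """Return a Scholar query string restricted to one author (hypothetical helper)."""
    # Drop embedded double quotes and collapse runs of whitespace.
    cleaned = " ".join(author_name.replace('"', " ").split())
    if not cleaned:
        raise ValueError("author_name is required")
    return f'author:"{cleaned}"'


print(build_author_query("Yoshua  Bengio"))
# prints author:"Yoshua Bengio"
```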
3. search_recent_papers
Search for recent papers in a specific field.
Parameters:
field (required): Research field or topic
years_back (optional): How many years back to search (1-10, default: 2)
num_results (optional): Number of results to return (default: 10)
Example:
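To make the `years_back` parameter concrete, the sketch below shows one way it could be converted into a start and end year. It assumes the window ends at the current calendar year. The helper name `recent_year_range` and the `current_year` override (useful for testing) are hypothetical.

```python
import datetime
from typing import Optional, Tuple


def recent_year_range(
    years_back: int = 2, current_year: Optional[int] = None
) -> Tuple[int, int]:
    """Translate years_back into an inclusive (start_year, end_year) pair."""
    if not 1 <= years_back <= 10:
        raise ValueError("years_back must be between 1 and 10")
    if current_year is None:
        current_year = datetime.date.today().year
    return current_year - years_back, current_year


print(recent_year_range(2, current_year=2024))
# prints (2022, 2024)
```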
4. get_highly_cited_papers
Search for highly cited papers in a topic.
Parameters:
topic (required): Research topic or field
min_citations (optional): Minimum number of citations (default: 100)
num_results (optional): Number of results to return (default: 10)
Example:
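The citation filter can be pictured as a post-processing step over parsed results. The sketch below assumes each result is a dictionary with an optional `cited_by` count, as in the response fields this README documents. The helper name `filter_highly_cited` and the sample records are made up for illustration.

```python
from typing import Any, Dict, List


def filter_highly_cited(
    papers: List[Dict[str, Any]], min_citations: int = 100
) -> List[Dict[str, Any]]:
    """Keep papers with at least min_citations citations, most cited first."""

    def citations(paper: Dict[str, Any]) -> int:
        # A missing or None cited_by value is treated as zero citations.
        return paper.get("cited_by") or 0

    kept = [p for p in papers if citations(p) >= min_citations]
    return sorted(kept, key=citations, reverse=True)


# Placeholder records, not real search results.
sample = [
    {"title": "Paper A", "cited_by": 250},
    {"title": "Paper B", "cited_by": None},
    {"title": "Paper C", "cited_by": 1200},
    {"title": "Paper D", "cited_by": 40},
]
print([p["title"] for p in filter_highly_cited(sample)])
# prints ['Paper C', 'Paper A']
```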
Response Format
Each tool returns a JSON response with paper information including:
title: Paper title
authors: Author names
url: Link to the paper
year: Publication year
snippet: Paper abstract/description snippet
cited_by: Number of citations (when available)
pdf_url: Direct PDF link (when available)
publication_info: Journal/conference information
Rate Limiting and Ethics
This server implements respectful scraping practices:
2-second delays between requests
Proper User-Agent headers
Error handling for rate limits
Designed for research and educational purposes
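A fixed delay between requests can be enforced with a small helper like the one below. This is a sketch of the general technique, not the server's actual implementation. The class name `RateLimiter` and its injectable `clock` and `sleep` arguments are assumptions, added so the behaviour can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Ensure at least min_interval seconds pass between consecutive requests."""

    def __init__(
        self,
        min_interval: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return the seconds slept."""
        delay = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        # Record the time at which the next request is allowed to start.
        self._last_request = self._clock()
        return delay
```

A caller would create one `RateLimiter()` and call `wait()` immediately before each HTTP request, so bursts of tool calls are spread out automatically.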
🔍 MCP Client Integration
Claude Desktop
Add this to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
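Claude Desktop reads server definitions from an `mcpServers` object in that file, where each entry gives a `command` and its `args`. The sketch below generates such an entry with Python. The entry name `google-scholar` and the script path are placeholders, since the server's actual entry-point filename is not stated here. Substitute the real absolute path on your machine.

```python
import json
from typing import Any, Dict


def claude_desktop_entry(server_script: str, python_cmd: str = "python") -> Dict[str, Any]:
    """Build an mcpServers config fragment that launches the server over stdio."""
    return {
        "mcpServers": {
            # "google-scholar" is an arbitrary label chosen for this sketch.
            "google-scholar": {
                "command": python_cmd,
                "args": [server_script],
            }
        }
    }


# Placeholder path: replace with the absolute path to the server script.
config = claude_desktop_entry("/path/to/google-scholar-mcp/your_server_script.py")
print(json.dumps(config, indent=2))
```

If the configuration file already defines other servers, merge the new entry into the existing `mcpServers` object rather than overwriting the file.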
Other MCP Clients
The server follows the standard MCP protocol and should work with any MCP-compatible client.
🧠 Smart Query Processing
The server includes intelligent query processing that can understand natural language requests:
📊 Response Format
All tools return structured JSON with paper information:
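For illustration, a single paper record with the documented fields might be modelled and serialized as follows. The field names come from this README. The Python types, the defaults, and every value in the example record are assumptions and placeholders, not real output from the server.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Paper:
    """One search result, using the field names documented in this README."""

    title: str
    authors: str
    url: str
    year: Optional[int] = None
    snippet: str = ""
    cited_by: Optional[int] = None  # only present when Scholar shows a count
    pdf_url: Optional[str] = None  # only present when a direct PDF link exists
    publication_info: str = ""


# Placeholder values for illustration only.
example = Paper(
    title="Example Paper Title",
    authors="A. Author, B. Author",
    url="https://example.org/paper",
    year=2021,
    snippet="A short excerpt of the abstract...",
    cited_by=42,
    publication_info="Example Journal, 2021",
)
print(json.dumps(asdict(example), indent=2))
```

Fields that are only sometimes available, such as `cited_by` and `pdf_url`, serialize as `null` in this sketch when they are missing.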
⚖️ Legal and Ethical Considerations
🎓 Educational Use: This tool is intended for research and educational purposes
📜 Terms of Service: Respect Google Scholar's terms of service
🤝 Responsible Use: Use responsibly and avoid excessive requests
🔌 Official APIs: Consider using official APIs when available
📚 Copyright: Be mindful of copyright and fair use policies
🔧 Troubleshooting
Common Issues
Rate Limiting: If you get blocked, wait and reduce request frequency
Network Errors: Check your internet connection
Parsing Errors: Google Scholar may change its HTML structure, which can break result parsing
Import Errors: Make sure all dependencies are installed
Debug Mode
Enable debug logging by setting DEBUG=true in your .env file.
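A flag like this is typically read from the process environment at startup. The sketch below shows one way to do that, assuming the values in `.env` have already been loaded into the environment (for example by a dotenv loader or the shell). The helper names `debug_enabled` and `configure_logging` are hypothetical, and the set of accepted truthy strings is an assumption.

```python
import logging
import os
from typing import Mapping, Optional


def debug_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when DEBUG is set to a truthy value such as 'true' or '1'."""
    env = os.environ if env is None else env
    return str(env.get("DEBUG", "")).strip().lower() in {"1", "true", "yes", "on"}


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Configure root logging at DEBUG or INFO level and return the chosen level."""
    level = logging.DEBUG if debug_enabled(env) else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return level


level = configure_logging({"DEBUG": "true"})
print(logging.getLevelName(level))
# prints DEBUG
```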
Logging
The server includes detailed logging. Check the console output for error messages and debugging information.
📦 Dependencies
mcp: Model Context Protocol library
requests: HTTP library for web scraping
beautifulsoup4: HTML parsing
lxml: XML/HTML parser
urllib3: HTTP client
flask: Web interface (optional)
🤝 Contributing
Contributions are welcome! Please ensure:
Respectful scraping practices
Error handling for edge cases
Clear documentation
Testing with various queries
Consistency with the existing code style
Development Setup
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
Built on the Model Context Protocol by Anthropic
Inspired by the need for accessible academic research tools
Thanks to the open-source community for the excellent libraries used
⚠️ Disclaimer
This tool is for educational and research purposes. Please respect Google Scholar's terms of service and use responsibly. The authors are not responsible for any misuse of this tool.