A server for searching research papers, Kaggle datasets, and websites for ML/AI model training data


Why this server?
This server integrates directly with the Kaggle API, letting you search for the ML/AI datasets, competitions, kernels, and pre-trained models requested in your query.
Kaggle-MCP
Search Research & Data Developer Tools
realbytecode
license: A, quality: -, maintenance: D
Connects Claude AI to the Kaggle API through the Model Context Protocol, enabling users to browse competitions, search and download datasets, analyze kernels, and access pre-trained models through natural language interactions.
Last updated 2025-09-26
MIT
Why this server?
This server is designed to search the extensive arXiv research repository, providing direct access to academic papers, which is highly relevant to ML/AI model training research.
ArXiv MCP Server
Research & Data Search RAG Systems
blazickjp
license: A, quality: A, maintenance: A
The ArXiv MCP Server bridges the gap between AI models and academic research by providing a sophisticated interface to arXiv's extensive research repository. This server enables AI assistants to perform precise paper searches and access full paper content, enhancing their ability to engage with scientific literature.
Last updated 2026-07-26
14
2,985
Apache 2.0
Why this server?
A specialized server for academic papers, offering advanced features like semantic search and analysis of research from arXiv, directly fitting your need for research paper data.
arXiv MCP Server
Research & Data Search Knowledge & Memory
1Dark134
license: A, quality: -, maintenance: C
Enables AI-powered academic paper discovery, search, and analysis from arXiv with advanced features like semantic search, citation network analysis, and multi-format exports (BibTeX, RIS, JSON, CSV). Provides intelligent research assistance through specialized AI prompts for summarization, trend tracking, and literature review automation.
Last updated 2026-06-28
17
MIT
Why this server?
This server lets you search and retrieve ML models, datasets, and their metadata directly from the Hugging Face Hub, a crucial resource for ML/AI training.
Hugging Face Hub MCP Server
Search Databases RAG Systems
michaelwaves
license: F, quality: A, maintenance: D
Enables access to the Hugging Face Hub API to search and retrieve information about machine learning models, datasets, and their metadata. Provides comprehensive tools for exploring the Hugging Face ecosystem including model details, dataset information, and parquet file access.
Last updated 2025-08-24
8
Why this server?
Well suited to gathering training data from arbitrary websites: it scrapes and extracts content from virtually any website globally, bypassing anti-bot systems.
Thordata MCP Server
Web Scraping Browser Automation
xja1023789-collab
license: -, quality: -, maintenance: -
Enables AI models to scrape and extract data from any website globally using Thordata's 195+ country proxy network. Bypasses anti-bot systems and renders JavaScript content, outputting structured data in Markdown, HTML, or Links format.
Last updated 2025-09-23
Why this server?
Enables comprehensive crawling of web and local documents with data extraction, well suited to gathering large volumes of varied data for ML/AI model training.
AnyCrawl MCP Server
Web Scraping Browser Automation
any4ai
license: A, quality: -, maintenance: C
Enables web scraping and crawling for LLM clients, supporting single-page scraping, multi-page website crawling, and web search, with multiple scraping engines (Playwright, Cheerio, Puppeteer) and flexible output formats including Markdown, HTML, text, and screenshots.
Last updated 2026-03-19
5
6
MIT
Why this server?
This server combines web search, content extraction, web crawling, and scraping capabilities using the Firecrawl API, making it a robust tool for general data collection from websites.
WebSearch
Web Scraping Browser Automation Search
josemartinrodriguezmortaloni
license: F, quality: C, maintenance: C
A Model Context Protocol (MCP) server that provides advanced web search, content extraction, web crawling, and scraping capabilities using the Firecrawl API.
Last updated 2026-06-17
4
1
Why this server?
Designed to ingest, index, and retrieve structured knowledge from diverse sources (including web, documents, GitHub), making it useful for building a knowledge base for ML/AI context.
Graphlit MCP Server (official)
RAG Systems Knowledge & Memory
graphlit
license: A, quality: -, maintenance: F
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project, and then retrieve relevant content from the MCP client.
Last updated 2026-01-12
333
376
MIT
Why this server?
If your research papers are stored locally as PDFs, this tool allows for semantic search and retrieval within that document collection using vector embeddings.
PDF Knowledgebase MCP Server
RAG Systems Vector Databases Knowledge & Memory
juanqui
license: A, quality: -, maintenance: D
A Model Context Protocol server that enables intelligent document search and retrieval from PDF collections, providing semantic search capabilities powered by OpenAI embeddings and ChromaDB vector storage.
Last updated 2025-09-15
12
MIT
