Deep Research Agent MCP Server

by melisasvr

An AI-powered deep research agent built with Python, FastMCP, and Streamlit.
Search → Fetch → Cluster → Report. Fully automated. Fully open source.


Overview

Deep Research Agent is a Python MCP server that automates the full research pipeline, from web search to a structured executive report, typically in 6–12 seconds. It chains four tools, using the Tavily Search API for retrieval and a pure-Python TF-IDF + K-Means engine for semantic clustering. The frontend is a Streamlit chat interface that shows every step live.

Built as a portfolio project demonstrating: FastMCP tool design, async web scraping, NLP clustering without heavy ML dependencies, and full-stack Python app architecture.


Features

  • Multi-angle web search: 3 query variations per topic for broader coverage

  • Async URL fetching: parallel page retrieval with smart HTML cleaning

  • Text denoising: strips HTML entities, SVG labels, nav boilerplate, repeated patterns

  • Semantic clustering: pure-Python TF-IDF + K-Means, zero ML framework required

  • Auto cluster labeling: 13 topic categories (Quantum, Biotech, Climate, AI, Policy...)

  • Structured reports: markdown or JSON output with sources, keywords, evidence

  • One-click download: export .md report directly from the UI

  • Fast: full pipeline typically completes in 6–12 seconds
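
The HTML cleaning and denoising features above can be approximated with the standard library alone. The sketch below is an illustration under assumptions, not the project's actual code: `VisibleTextExtractor`, `clean_html`, and the list of skipped tags are hypothetical names and choices.

```python
import re
from html.parser import HTMLParser

# Tags whose contents are treated as noise (an assumed list, not the project's).
SKIPPED_TAGS = {"script", "style", "svg", "nav", "header", "footer", "noscript"}


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not nested inside any skipped tag."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(raw: str) -> str:
    """Return the visible text of an HTML page with whitespace collapsed."""
    parser = VisibleTextExtractor()
    parser.feed(raw)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
```

Using a parser rather than regular expressions alone keeps nested noise (for example an SVG inside a nav bar) out of the result; the real server may well use a different approach.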


Architecture

┌─────────────────────────────────────────────────┐
│              Streamlit Frontend                 │
│              app.py  -  Chat UI                 │
│         Live step cards · Download button       │
└───────────────────┬─────────────────────────────┘
                    │  FastMCP Client (protocol-aware)
┌───────────────────▼─────────────────────────────┐
│           FastMCP Server  (server.py)           │
│                                                 │
│  search_web           fetch_and_chunk           │
│     Tavily API           httpx + HTML parser    │
│                                                 │
│  cluster_findings     generate_report           │
│     TF-IDF + K-Means     Markdown / JSON        │
└─────────────────────────────────────────────────┘

Tech Stack

| Layer         | Technology                   |
|---------------|------------------------------|
| Language      | Python 3.12+                 |
| MCP Framework | FastMCP 3.x                  |
| Frontend      | Streamlit                    |
| Search        | Tavily Search API            |
| HTTP Client   | httpx (async)                |
| Clustering    | Pure-Python TF-IDF + K-Means |
| Config        | python-dotenv                |


Quick Start

1. Clone the repository

git clone https://github.com/yourusername/deep-research-agent.git
cd deep-research-agent

2. Install dependencies

pip install -r requirements.txt

3. Configure environment

cp .env.example .env

Edit .env and add your Tavily API key:

TAVILY_API_KEY=tvly-your-key-here
MCP_HOST=localhost
MCP_PORT=8000
TAVILY_SEARCH_DEPTH=advanced

Get a free Tavily API key at app.tavily.com

4. Start the MCP server

# Terminal 1
python server.py

You should see:

🔍 Deep Research Agent MCP Server
   Host : localhost
   Port : 8000
   Tools: search_web, fetch_and_chunk, cluster_findings, generate_report
   Tavily key: ✓ set

5. Launch the Streamlit UI

# Terminal 2
streamlit run app.py

Open http://localhost:8501 and start researching!


MCP Tools Reference

search_web(query, max_results)

  • Searches the web via Tavily API and returns ranked results with scores.

search_web(
    query="AI energy crisis 2026",
    max_results=8,              # 1–15
    include_domains=None,       # e.g. [".edu", ".gov"]
    exclude_domains=None
)

fetch_and_chunk(urls, chunk_size)

  • Fetches pages asynchronously and splits content into overlapping text chunks.

fetch_and_chunk(
    urls=["https://example.com/article"],
    chunk_size=400,             # words per chunk
    chunk_overlap=50,           # overlap between chunks
    max_chunks_per_url=6
)
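
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of word-based chunking with overlap. It assumes chunks are measured in whitespace-separated words, as the comment above suggests; the function name `chunk_words` and its exact stopping rule are hypothetical and may differ from the server's implementation.

```python
def chunk_words(
    text: str,
    chunk_size: int = 400,
    chunk_overlap: int = 50,
    max_chunks: int = 6,
) -> list[str]:
    """Split text into word windows that share chunk_overlap words with the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop at the per-URL cap or once the window has reached the last word.
        if len(chunks) == max_chunks or start + chunk_size >= len(words):
            break
    return chunks
```

For example, `chunk_words("a b c d e f g", chunk_size=3, chunk_overlap=1)` yields `["a b c", "c d e", "e f g"]`: each chunk repeats the last word of the previous one, so a sentence that straddles a boundary is not lost.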

cluster_findings(chunks, n_clusters)

  • Groups chunks into semantic themes using TF-IDF vectorization + K-Means clustering.

cluster_findings(
    chunks=[...],               # from fetch_and_chunk
    n_clusters=4,               # 2–6 themes
    top_terms_per_cluster=8
)
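
The following sketch shows how TF-IDF vectorization and K-Means can be written without any ML framework. It is an illustration only: the tokenizer, the smoothed IDF formula, and the deterministic farthest-point seeding are assumptions chosen to keep the example reproducible, and the helper names (`tfidf_vectors`, `kmeans`, `top_terms`) are hypothetical rather than taken from `server.py`.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document."""
    tokenised = [re.findall(r"[a-z]{3,}", doc.lower()) for doc in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vocab = sorted(doc_freq)
    n_docs = len(docs)
    # Smoothed IDF: terms found in every document still get a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = max(len(tokens), 1)
        vec = [counts[t] / total * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm with deterministic farthest-point seeding; returns one label per vector."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Seed the next centroid with the vector farthest from all chosen centroids.
        farthest = max(vectors, key=lambda v: min(_sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: _sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_terms(vocab, vectors, labels, cluster, n=8):
    """Return up to n terms with the highest summed TF-IDF weight in one cluster."""
    members = [v for v, lab in zip(vectors, labels) if lab == cluster]
    if not members:
        return []
    weights = [sum(col) for col in zip(*members)]
    ranked = sorted(zip(weights, vocab), reverse=True)
    return [term for weight, term in ranked[:n] if weight > 0]
```

Because TF-IDF works on word overlap, chunks land in the same cluster only when they share vocabulary; that is the trade-off for avoiding embedding models.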

generate_report(topic, clusters)

  • Synthesizes all clusters into a structured research report.

generate_report(
    topic="AI energy crisis 2026",
    clusters=[...],             # from cluster_findings
    format="markdown",          # or "json"
    include_sources=True
)
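
As a rough illustration of the report step, the sketch below turns clusters into markdown or JSON. The cluster shape used here (a dict with `label`, `keywords`, and `chunks`, where each chunk has `text` and `url`) is an assumption made for the example, as are the function name `render_report` and the report layout; the real tool's data model may differ.

```python
import json


def render_report(topic, clusters, fmt="markdown", include_sources=True):
    """Render clusters as a markdown document or a JSON string."""
    if fmt == "json":
        return json.dumps({"topic": topic, "clusters": clusters}, indent=2)
    if fmt != "markdown":
        raise ValueError("fmt must be 'markdown' or 'json'")

    lines = [f"# Research Report: {topic}", ""]
    sources = []  # unique URLs in order of first appearance
    for number, cluster in enumerate(clusters, start=1):
        lines.append(f"## {number}. {cluster['label']}")
        lines.append("Keywords: " + ", ".join(cluster["keywords"]))
        for chunk in cluster["chunks"]:
            lines.append(f"- {chunk['text']}")
            if chunk["url"] not in sources:
                sources.append(chunk["url"])
        lines.append("")
    if include_sources:
        lines.append("## Sources")
        lines.extend(f"{n}. {url}" for n, url in enumerate(sources, start=1))
    return "\n".join(lines)
```

Collecting sources once, in order of first appearance, keeps the reference list short when several chunks come from the same page.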

Research Pipeline

User enters topic
        │
        ▼
search_web() × 3 queries ──────────► up to 24 ranked sources
        │
        ▼
fetch_and_chunk() on top 5 URLs ───► 20–30 text chunks
        │
        ▼
cluster_findings() ────────────────► 4 semantic themes
        │
        ▼
generate_report() ─────────────────► Structured .md report
        │
        ▼
Download / Display in UI
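
One step the diagram leaves implicit is how results from the three queries are reduced to the top 5 URLs. A plausible approach, sketched below, is to merge the batches, keep the best score per URL, and take the highest-scoring entries. The result shape (dicts with `url` and `score` keys) and the function name `select_top_urls` are assumptions for illustration; the server's actual selection logic is not documented here.

```python
def select_top_urls(result_batches, top_n=5):
    """Merge per-query result lists, de-duplicate by URL, and return the best-scoring URLs."""
    best_score = {}
    for batch in result_batches:  # one list of results per query variation
        for result in batch:
            url, score = result["url"], result["score"]
            # The same page can match several queries; keep its highest score.
            if url not in best_score or score > best_score[url]:
                best_score[url] = score
    ranked = sorted(best_score, key=best_score.get, reverse=True)
    return ranked[:top_n]
```

With three queries at `max_results=8`, the merged list holds at most 24 entries, and fewer whenever the queries return overlapping pages.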

Project Structure

deep-research-agent/
│
├── server.py          # FastMCP server: all 4 tools
├── app.py             # Streamlit frontend
├── requirements.txt   # Python dependencies
├── .env.example       # Environment variable template
└── README.md          # This file

Configuration

| Variable            | Default    | Description                           |
|---------------------|------------|---------------------------------------|
| TAVILY_API_KEY      | (required) | Your Tavily search API key            |
| MCP_HOST            | localhost  | MCP server host                       |
| MCP_PORT            | 8000       | MCP server port                       |
| TAVILY_SEARCH_DEPTH | advanced   | basic (faster) or advanced (thorough) |
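
The variables and defaults above could be read and validated along these lines. This is a sketch under assumptions: the `Settings` dataclass and `load_settings` function are hypothetical, and the example takes a plain mapping so it runs without third-party packages. In the real project the values would presumably be loaded from `.env` by python-dotenv before being read from the environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    tavily_api_key: str
    mcp_host: str = "localhost"
    mcp_port: int = 8000
    tavily_search_depth: str = "advanced"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("TAVILY_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("TAVILY_API_KEY is required")
    depth = env.get("TAVILY_SEARCH_DEPTH", "advanced")
    if depth not in ("basic", "advanced"):
        raise ValueError("TAVILY_SEARCH_DEPTH must be 'basic' or 'advanced'")
    return Settings(
        tavily_api_key=api_key,
        mcp_host=env.get("MCP_HOST", "localhost"),
        mcp_port=int(env.get("MCP_PORT", "8000")),
        tavily_search_depth=depth,
    )
```

Failing fast on a missing key or an unknown search depth surfaces configuration mistakes at startup rather than on the first search.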


Contributing

  • Contributions are welcome and appreciated! Here's how to get involved:

Reporting Bugs

  1. Check the Issues page to see if it's already reported

  2. Open a new issue with:

    • A clear title and description

    • Steps to reproduce

    • Expected vs actual behaviour

    • Your Python version and OS

Suggesting Features

Open an issue with the enhancement label and describe:

  • The problem you're trying to solve

  • Your proposed solution

  • Why it would benefit other users

Submitting Pull Requests

  1. Fork the repository

  2. Create a feature branch

    git checkout -b feature/your-feature-name

  3. Make your changes with clear, descriptive commits

    git commit -m "feat: add BM25 ranking to cluster_findings"

  4. Test your changes thoroughly

  5. Push to your fork

    git push origin feature/your-feature-name

  6. Open a Pull Request with a clear description of what you changed and why

Code Style

  • Follow PEP 8 for Python code

  • Use type hints wherever possible

  • Add docstrings to all new functions

  • Keep functions focused: one responsibility per function

Good First Issues

  • Looking for a place to start? Check issues tagged good first issue:

  • Adding more cluster label categories to _infer_cluster_label()

  • Improving the HTML cleaning regex patterns

  • Adding a progress bar to the Streamlit UI

  • Supporting additional output formats (PDF, DOCX)

  • Writing unit tests for the clustering functions


License

MIT License

Copyright (c) 2026 Deep Research Agent Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including, without limitation, the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Acknowledgements

  • FastMCP: Python MCP server framework

  • Tavily: AI-optimised search API

  • Streamlit: Python web app framework

  • httpx: Async HTTP client


  • Made with ❤️ and Python · Star ⭐ this repo if you found it useful!
