FastMCP Documentation & Web Scraping Server

by jcdumlao14
# 03-mcp

# MCP: Model Context Protocol

This repository contains my homework for the MCP (Model Context Protocol) assignment. The questions, answers, and code used for the homework are collected below.

---

## Question 1: Install `uv` and `fastmcp`

- Install `uv`
- Initialize the project with `uv`
- Install `fastmcp`
- Find the first `sha256` in `uv.lock`

Answers / actions performed:

- `uv` installed and verified.
- Project initialized with `uv init`.
- `fastmcp` added with `uv add fastmcp`.
- The first `sha256` in `uv.lock` is on line 20, for `annotated-types`:

```
sdist = { url = "https://files.pythonhosted.org/packages/ee/67/.../annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" }
```

---

## Question 2: FastMCP Transport

I updated `main.py` using the FastMCP starter code and ran the server. The welcome screen shows which transport is in use.

Answer: `STDIO`

---

## Question 3: Scrape Web Tool (Jina Reader)

I implemented a tool that fetches pages through the Jina reader (`https://r.jina.ai/...`) using `requests`, and added `test.py` to test it against `https://github.com/alexeygrigorev/minsearch`.

Test result (character count): `31361`; closest provided option: `29184`.

---

## Question 4: Integrate the Tool

I added `count_data.py`, which uses the Jina-reader tool implementation (`fetch_markdown_impl` from `main.py`) to fetch `https://datatalks.club/` and count occurrences of the whole word `data` (case-insensitive).

Script output: `10`; closest option: `61`.

---

## Question 5: Implement Search (minsearch)

I downloaded the FastMCP repository as a zip archive, extracted the `.md` and `.mdx` files, indexed them with `minsearch`, and searched for `demo`.

First file returned for the query `"demo"`: `examples/testing_demo/README.md`.

---

## Question 6: Search Tool (ungraded)

I added a `search_docs` MCP tool to `main.py` that builds the `minsearch` index from the zip archive and returns the top filenames for a query.
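The count in Question 4 depends on `\b` word boundaries, so it is worth checking which tokens actually match. The sketch below runs the same regex offline on a made-up sentence; the helper name `count_whole_word` and the sample text are illustrative only and are not part of the repository. Hyphenated forms such as `DATA-driven` are counted, while `DataTalks` and `database` are not, because `data` is followed by another word character there.

```python
import re


def count_whole_word(text: str, word: str) -> int:
    """Count case-insensitive, whole-word occurrences of `word` in `text`."""
    # re.escape protects against regex metacharacters in `word`;
    # \b requires a transition between word and non-word characters on each side.
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


if __name__ == "__main__":
    sample = "Data, data everywhere: DATA-driven teams at DataTalks use a database."
    print(count_whole_word(sample, "data"))  # prints 3
```

The same reasoning applies to the real page: words like "DataTalks" in the fetched markdown do not contribute to the total.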
---

## Files added / modified (full contents)

### `main.py`

```python
import os
import zipfile

import requests
from fastmcp import FastMCP
from minsearch import Index

mcp = FastMCP("Demo 🚀")


def fetch_markdown_impl(url: str) -> str:
    """Fetch a web page using Jina reader and return its markdown text.

    The Jina reader endpoint is `https://r.jina.ai/{original_url}`.
    The `url` argument may be a full URL (including scheme) or a hostname/path.
    """
    # Default to https:// when no scheme is given
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    target = "https://r.jina.ai/" + url
    resp = requests.get(target, timeout=15)
    resp.raise_for_status()
    return resp.text


@mcp.tool
def fetch_markdown(url: str) -> str:
    """Return markdown content of a web page via Jina reader."""
    return fetch_markdown_impl(url)


@mcp.tool
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


# --- minsearch integration for documentation search ---

ZIP_URL = "https://github.com/jlowin/fastmcp/archive/refs/heads/main.zip"
ZIP_NAME = "fastmcp-main.zip"

# Simple module-level cache for the built index
_INDEX_CACHE = None


def ensure_zip():
    """Download the FastMCP repository archive once and reuse it afterwards."""
    if os.path.exists(ZIP_NAME):
        return
    resp = requests.get(ZIP_URL, stream=True, timeout=60)
    resp.raise_for_status()
    with open(ZIP_NAME, "wb") as f:
        for chunk in resp.iter_content(1024 * 64):
            if chunk:
                f.write(chunk)


def iter_md_files_from_zip(zip_path):
    """Yield (filename, text) for every .md/.mdx file in the archive."""
    with zipfile.ZipFile(zip_path, "r") as z:
        for name in z.namelist():
            if name.lower().endswith((".md", ".mdx")):
                text = z.read(name).decode("utf-8", errors="replace")
                # Strip the top-level "fastmcp-main/" folder from the path
                if "/" in name:
                    _, rest = name.split("/", 1)
                else:
                    rest = name
                yield rest, text


def build_index_from_zip():
    ensure_zip()
    # Index only the downloaded archive, not every .zip in the working directory
    docs = [
        {"content": text, "filename": filename}
        for filename, text in iter_md_files_from_zip(ZIP_NAME)
    ]
    idx = Index(text_fields=["content"], keyword_fields=["filename"])
    idx.fit(docs)
    return idx


def get_index():
    # Build the index lazily on first use, then reuse the cached copy
    global _INDEX_CACHE
    if _INDEX_CACHE is None:
        _INDEX_CACHE = build_index_from_zip()
    return _INDEX_CACHE


def search_docs_impl(query: str, top_k: int = 5):
    idx = get_index()
    return idx.search(query, num_results=top_k)


@mcp.tool
def search_docs(query: str) -> list[str]:
    """Search the documentation index and return top filenames for `query`."""
    results = search_docs_impl(query, top_k=5)
    return [r.get("filename") for r in results]


if __name__ == "__main__":
    mcp.run()
```

### `test.py`

```python
from main import fetch_markdown_impl

if __name__ == "__main__":
    url = "https://github.com/alexeygrigorev/minsearch"
    text = fetch_markdown_impl(url)
    # Number of characters in the returned markdown
    print(len(text))
```

### `test_search.py`

```python
from main import search_docs_impl

if __name__ == "__main__":
    res = search_docs_impl("demo", top_k=5)
    if not res:
        print("No results")
    else:
        print(res[0].get("filename"))
```

### `count_data.py`

```python
import re

from main import fetch_markdown_impl

if __name__ == "__main__":
    url = "https://datatalks.club/"
    text = fetch_markdown_impl(url)
    # Whole-word, case-insensitive matches of "data"
    count = len(re.findall(r"\bdata\b", text, flags=re.IGNORECASE))
    print(count)
```

### `search.py`

```python
import os
import zipfile

import requests
from minsearch import Index

ZIP_URL = "https://github.com/jlowin/fastmcp/archive/refs/heads/main.zip"
ZIP_NAME = "fastmcp-main.zip"


def ensure_zip():
    if os.path.exists(ZIP_NAME):
        print(f"Zip already exists: {ZIP_NAME}")
        return
    print(f"Downloading {ZIP_URL} -> {ZIP_NAME}")
    resp = requests.get(ZIP_URL, stream=True, timeout=60)
    resp.raise_for_status()
    with open(ZIP_NAME, "wb") as f:
        for chunk in resp.iter_content(1024 * 64):
            if chunk:
                f.write(chunk)


def iter_md_files_from_zip(zip_path):
    with zipfile.ZipFile(zip_path, "r") as z:
        for name in z.namelist():
            if name.lower().endswith((".md", ".mdx")):
                # Read the file and decode it as UTF-8
                text = z.read(name).decode("utf-8", errors="replace")
                # Strip the first path segment ("fastmcp-main/")
                if "/" in name:
                    _, rest = name.split("/", 1)
                else:
                    rest = name
                yield rest, text


def build_index(docs):
    # docs: list of {'content': ..., 'filename': ...}
    idx = Index(text_fields=["content"], keyword_fields=["filename"])
    idx.fit(docs)
    return idx


def main():
    ensure_zip()
    # Index only the downloaded FastMCP archive
    docs = [
        {"content": text, "filename": filename}
        for filename, text in iter_md_files_from_zip(ZIP_NAME)
    ]
    print(f"Indexed {len(docs)} markdown files")
    idx = build_index(docs)
    results = idx.search("demo", num_results=5)
    if not results:
        print("No results")
        return
    # Print the first returned filename
    print(results[0].get("filename"))


if __name__ == "__main__":
    main()
```

---

## Git & Repository

- All changes have been committed and pushed to the `main` branch of this repository.
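The archive-extraction step used by `main.py` and `search.py` can also be checked without downloading anything. The sketch below is a standalone copy of that logic that accepts any file-like object, paired with a hypothetical `make_fake_archive` helper that builds a tiny in-memory zip; the `fastmcp-main/` prefix mimics how GitHub names the top-level folder of a branch archive, and the file contents are made up. It shows why search results are reported as paths like `examples/testing_demo/README.md` rather than with the archive prefix, and that non-markdown files are skipped.

```python
import io
import zipfile


def iter_md_files_from_zip(zip_file):
    """Yield (path without top-level folder, text) for every .md/.mdx member."""
    with zipfile.ZipFile(zip_file, "r") as z:
        for name in z.namelist():
            if name.lower().endswith((".md", ".mdx")):
                text = z.read(name).decode("utf-8", errors="replace")
                # Drop the leading "<repo>-<branch>/" folder; keep the name if there is none
                _, _, rest = name.partition("/")
                yield (rest or name), text


def make_fake_archive() -> io.BytesIO:
    """Hypothetical helper: build a small in-memory zip shaped like a GitHub archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("fastmcp-main/README.md", "# FastMCP")
        z.writestr("fastmcp-main/docs/intro.mdx", "Intro page")
        z.writestr("fastmcp-main/src/server.py", "print('not markdown')")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for filename, text in iter_md_files_from_zip(make_fake_archive()):
        print(filename, len(text))
```

Using `str.partition` instead of `split("/", 1)` behaves the same here but avoids the separate `if "/" in name` branch.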
