2025-autumn-mcp
Project Background
This project is about learning how to turn basic data science skills into real, usable services. Rather than running code in isolation, you’ll package text analysis tools into a Model Context Protocol (MCP) server that can be consumed by any MCP-aware client, including modern AI assistants. Along the way you’ll learn how to design structured inputs and outputs (schemas), containerize and run services with Docker, and expose your work in a way that others — whether researchers, policymakers, or fellow students — could immediately integrate into their own workflows. The goal is not to build the most advanced NLP system, but to see how small, well-defined analytics can be made reusable, composable, and sharable across disciplines.
Goals
This sprint focuses on learning the Model Context Protocol by building a text-analysis MCP server.
What You'll Build:
An MCP server with baseline text-analysis tools (group work)
Your own custom MCP tool relevant to your field (individual work on a feature branch)
Using Python, Pydantic schemas, and FastMCP, you'll gain experience with natural language processing techniques (TF-IDF, sentiment analysis, readability metrics), structured data exchange, and service-oriented design.
Deliverables:
Working baseline MCP server with `corpus_answer` and `text_profile` tools
Your custom tool on a feature branch with tests
Demo showing your tool in action
Documentation explaining your tool's domain application
Project Structure
Introduction & Setup
Getting Started:
Review the demonstration notebook: `notebooks/MCP_Introduction.ipynb`
Read about MCP
Skim this page on Pydantic
Complete the Quick Start below to set up your environment
Phase 1: Group Work - Baseline MCP Server
Part 1: Schemas & Text Analysis Foundations
Objectives (Complete together as a group):
Understand Pydantic schemas and data validation
Learn TF-IDF basics for document search
Set up a shared corpus
Understand MCP tool design patterns
Tasks:
Complete the notebook `notebooks/MCP_Introduction.ipynb`
Build your first MCP tool
Work with TF-IDF for document search
Define Pydantic schemas
Register tools with FastMCP
Create a shared corpus
Add 3-5 `.txt` files to `data/corpus/`
Sample documents provided: climate policy, urban planning, AI ethics, public health
Choose documents that demonstrate the tools' capabilities
Review the provided code structure in `src/mcp_server/`
`schemas.py` - Pydantic models for tool inputs/outputs
`tools/corpus_answer.py` - Document search skeleton
`tools/text_profile.py` - Text analytics skeleton
`server.py` - Main MCP server application
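To make the "define schemas, register tools" pattern concrete before you open the skeleton, here is a minimal sketch of a Pydantic schema pair wired into FastMCP. The names (`EchoInput`, `echo`) are illustrative only and do not come from the project code:

```python
from fastmcp import FastMCP
from pydantic import BaseModel, Field

mcp = FastMCP("text-analysis")

class EchoInput(BaseModel):
    """Input schema; FastMCP publishes this to clients for validation."""
    text: str = Field(..., description="Text to echo back")

class EchoOutput(BaseModel):
    """Output schema; gives clients a structured, typed result."""
    echoed: str

@mcp.tool
def echo(input: EchoInput) -> EchoOutput:
    """A trivial first tool to confirm registration and schemas work."""
    return EchoOutput(echoed=input.text)
```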
Deliverable: Completed notebook and shared corpus
Part 2: Baseline Tool Implementation
Objectives (Implement together as a group):
Implement the `corpus_answer` tool with TF-IDF search
Implement the `text_profile` tool with text analytics
Test the baseline implementation
Tasks:
Implement `src/mcp_server/tools/corpus_answer.py`
Complete the TODOs in:
`_load_corpus()` - Load .txt files from the corpus directory
`_ensure_index()` - Build TF-IDF index from documents
`_synthesize_answer()` - Create concise answer snippets
`corpus_answer()` - Main search and ranking logic
Key steps:
Load all `.txt` files from `data/corpus/`
Build a TF-IDF vectorizer with appropriate parameters
Transform query and compute cosine similarity
Return top 3-5 results with snippets and scores
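Put together, those steps might look like the following sketch, assuming scikit-learn; the helper names mirror the skeleton, but the real signatures and return types may differ:

```python
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

CORPUS_DIR = Path("data/corpus")

def _load_corpus() -> dict[str, str]:
    """Map document IDs (file stems) to their raw text."""
    return {p.stem: p.read_text(encoding="utf-8")
            for p in sorted(CORPUS_DIR.glob("*.txt"))}

def search(query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Rank corpus documents against a query by TF-IDF cosine similarity."""
    docs = _load_corpus()
    ids = list(docs)
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    matrix = vectorizer.fit_transform(docs.values())  # one row per document
    scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
    return sorted(zip(ids, scores), key=lambda x: x[1], reverse=True)[:top_k]
```

From the ranked document IDs you can then extract matching sentences to build the snippets that `_synthesize_answer()` is expected to return.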
Implement `src/mcp_server/tools/text_profile.py`
Complete the TODOs in:
`_read_doc()` - Read document by ID from corpus
`_tokenize()` - Extract words from text
`_flesch_reading_ease()` - Calculate readability score
`_top_terms()` - Extract keywords using TF-IDF
`text_profile()` - Compute all text features
Features to calculate:
Character and word counts
Type-token ratio (lexical diversity)
Flesch Reading Ease score
VADER sentiment analysis
Top n-grams and keywords
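A hedged sketch of a few of these features follows. The syllable counter is a crude heuristic, and the sentiment library assumed here (`vaderSentiment`) may differ from whatever the skeleton actually imports:

```python
import re

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def _tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def _count_syllables(word: str) -> int:
    # Vowel-group heuristic: rough, but adequate for a readability estimate.
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def _flesch_reading_ease(text: str) -> float:
    words = _tokenize(text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(_count_syllables(w) for w in words)
    # Standard Flesch formula: higher scores indicate easier text.
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

def profile(text: str) -> dict:
    words = _tokenize(text)
    if not words:
        raise ValueError("empty document")
    return {
        "chars": len(text),
        "words": len(words),
        "type_token_ratio": len(set(words)) / len(words),  # lexical diversity
        "flesch_reading_ease": _flesch_reading_ease(text),
        "sentiment": SentimentIntensityAnalyzer().polarity_scores(text),
    }
```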
Test your tools
```bash
# Run tests
make test

# Test specific tool
uv run pytest tests/mcp_server/test_corpus_answer.py -v
```
Debug and refine
Use logging to debug
Test with different queries and documents
Ensure all tests pass
Deliverable: Working baseline server with `corpus_answer` and `text_profile` tools
Phase 2: Individual Work - Custom Tool Development
Creating Your Own MCP Tool
Now that you understand MCP fundamentals, each student will create their own custom tool on a feature branch.
Objectives (Individual work):
Apply MCP concepts to your own field or interests
Design and implement a non-trivial tool
Write tests for your tool
Demonstrate domain-specific application
Tasks:
Create your feature branch
```bash
git checkout -b student/my-custom-tool
```
Design your tool
Choose a tool relevant to your field or interests. Examples:
Policy analysis: Extract policy recommendations from documents
Data science: Statistical analysis or data transformation tool
Research: Literature review summarization or citation extraction
Education: Readability adaptation or concept explanation
Healthcare: Medical terminology extraction or symptom checking
Environmental: Climate data analysis or carbon footprint calculation
Your tool should:
Be non-trivial (more complex than a simple calculation)
Have a clear use case in your domain
Use Pydantic schemas for inputs/outputs
Return structured, useful data
Implement your tool
Create `src/mcp_server/tools/my_tool_name.py`:
```python
from pydantic import BaseModel, Field

class MyToolInput(BaseModel):
    """Input schema for my tool."""
    # Define your inputs

class MyToolOutput(BaseModel):
    """Output schema for my tool."""
    # Define your outputs

def my_tool(input: MyToolInput) -> MyToolOutput:
    """Your tool implementation."""
    # Your logic here
```
Register your tool in `src/mcp_server/server.py`:
```python
from mcp_server.tools.my_tool_name import my_tool, MyToolInput, MyToolOutput

@mcp.tool
def my_tool_tool(input: MyToolInput) -> MyToolOutput:
    """My custom tool description."""
    return my_tool(input)
```
Write tests in `tests/mcp_server/test_my_tool.py`:
```python
def test_my_tool():
    result = my_tool(MyToolInput(...))
    assert result.some_field == expected_value
```
Test and document
Run `make test` to verify tests pass
Run `uv run python tests/manual_server_test.py` to test end-to-end
Document your tool's purpose and usage in comments
Deliverable: Working custom tool with tests on your feature branch
Demo & Presentation
Objectives:
Demonstrate your custom tool in action
Show how it applies MCP concepts to your domain
Present test results
Reflect on real-world applications
Tasks:
Test your server
Option A: Quick test (validate tools work)
```bash
make run-interactive
uv run pytest tests/manual_server_test.py -v
```
Option B: MCP Inspector (full protocol test)
```bash
# Terminal 1: Start server
make run-interactive
uv run python -m mcp_server.server

# Terminal 2: Run Inspector on HOST (not in container)
npx @modelcontextprotocol/inspector
# Choose: STDIO transport, command: ./run_mcp_server.sh
```
See `notebooks/MCP_Introduction.ipynb` for complete Inspector setup instructions.
Prepare your demo presentation
Your demo should show:
All three tools: baseline tools (`corpus_answer`, `text_profile`) plus your custom tool
Your custom tool in depth:
What problem it solves in your domain
Example inputs and outputs
How the Pydantic schemas are designed
Test results proving it works
Real-world application: How someone in your field would actually use this tool
Write documentation for your custom tool. In your tool file or a separate doc, explain:
What problem your tool solves
How to use it (with examples)
Design decisions (why this schema? why this approach?)
Potential applications in your field
Limitations and future improvements
Reflection questions (for your documentation)
How does your tool address a real need in your domain?
What challenges did you face in implementing it?
How could it be extended or improved?
How might it integrate with other tools or systems?
Final Deliverable:
Feature branch with your custom tool
Passing test suite
Documentation explaining your tool and its domain application
Quick Start
Note: The corpus files are included in the repository at `data/corpus/`. You can modify or add to them for your project.
Option A: Using VS Code/Cursor (Recommended)
If you're using VS Code or Cursor, you can use the devcontainer:
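The exact steps live in the devcontainer configuration, but based on the make targets documented under Docker & Make below, the flow is roughly:

```bash
make devcontainer   # build and prepare the devcontainer image
# then open the repository in VS Code/Cursor and choose "Reopen in Container"
```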
Option B: Using Make Commands
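Without the devcontainer, the same make targets drive everything from a terminal, for example:

```bash
make build-only       # build the Docker image
make run-interactive  # start a bash session inside the container
make test             # run the test suite
```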
Technical Expectations
Prerequisites
We use Docker, Make, uv, and Node.js as part of our curriculum. If you are unfamiliar with them, it is strongly recommended you read over the following:
Required on your HOST machine:
Docker: An introduction to Docker
Make: Usually pre-installed on macOS/Linux. Windows users: install via Chocolatey or use WSL
Node.js: Required for MCP Inspector testing tool
Install from nodejs.org (LTS version)
Or use a package manager: `brew install node` (macOS), `apt install nodejs npm` (Ubuntu)
Verify: `node --version` should show v18.x or higher
Inside the Docker container:
uv: An introduction to uv - for Python package management
Container-Based Development
All code must be run inside the Docker container. This ensures consistent environments across different machines and eliminates "works on my machine" issues.
Environment Management with uv
We use uv for Python environment and package management inside the container. uv handles:
Virtual environment creation and management (replaces venv/pyenv)
Package installation and dependency resolution (replaces pip)
Project dependency management via `pyproject.toml`
Important: When running Python code, prefix commands with uv run to maintain the proper environment:
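For example, using commands that appear elsewhere in this README:

```bash
uv run python -m mcp_server.server   # run the server in the uv-managed environment
uv run pytest tests/ -v              # run tests the same way
```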
Usage & Testing
Running Tests
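The test commands used throughout this project (see also "Test your tools" in Phase 1):

```bash
make test                                                # full suite, inside the container
uv run pytest tests/mcp_server/test_corpus_answer.py -v  # a single test module
```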
Docker & Make
We use docker and make to run our code. Common make commands:
`make build-only`: Build the Docker image only
`make run-interactive`: Start an interactive bash session in the container
`make test`: Run all tests with pytest
`make devcontainer`: Build and prepare devcontainer for VS Code/Cursor
`make clean`: Clean up Docker images and containers
The file `Makefile` contains details about the specific commands that are run when calling each make target.
Additional Resources
MCP and FastMCP
Text Analysis Libraries
Reference Implementation
Review `notebooks/MCP_Introduction.ipynb` for interactive examples
Style
We use ruff to enforce style standards and grade code quality. It is an automated checker that flags specific issues that must be fixed to keep the code readable and consistent with common standards. ruff runs before each commit via pre-commit; if it fails, the commit is blocked and you are shown what needs to change.
To run pre-commit inside the container:
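A typical invocation (assuming pre-commit is installed in the project environment):

```bash
uv run pre-commit run --all-files
```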
You can also run ruff directly:
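For example:

```bash
uv run ruff check .    # lint
uv run ruff format .   # auto-format
```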