# Testing Guide - MCP Wikipedia Server
Comprehensive testing documentation for the MCP Wikipedia Server project.
## Testing Overview
This project uses a multi-layered testing approach to ensure reliability and functionality:
1. **Unit Tests**: Test individual functions and components
2. **Integration Tests**: Test MCP protocol compliance and Wikipedia API integration
3. **End-to-End Tests**: Test complete workflows with real client interactions
4. **Performance Tests**: Validate response times and resource usage
5. **Manual Tests**: Human verification of complex scenarios
## Quick Start Testing
### Run All Tests
```bash
# Activate environment
source .venv311/bin/activate
# Run the test suite
python -m pytest tests/ -v
# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html
```
### Test Individual Components
```bash
# Test server functionality
python tests/test_server.py
# Test with example client
python example_client.py
# Manual server test
cd src/mcp_server && python mcp_server.py --test
```
## Test Categories
### 1. Unit Tests
Located in `tests/test_server.py`, these tests verify individual tool functionality.
#### Running Unit Tests
```bash
python -m pytest tests/test_server.py -v
```
#### Test Structure
```python
import pytest
import asyncio
from src.mcp_server.mcp_server import WikipediaServer


class TestWikipediaTools:
    @pytest.fixture
    def server(self):
        """Create a server instance for testing."""
        return WikipediaServer()

    @pytest.mark.asyncio
    async def test_fetch_wikipedia_info_success(self, server):
        """Test successful Wikipedia article retrieval."""
        result = await server.fetch_wikipedia_info("Python programming")

        assert result["success"] is True
        assert "data" in result
        assert "summary" in result["data"]
        assert len(result["data"]["summary"]) > 0
        assert "url" in result["data"]

    @pytest.mark.asyncio
    async def test_list_sections_success(self, server):
        """Test section listing functionality."""
        result = await server.list_wikipedia_sections("Machine Learning")

        assert result["success"] is True
        assert "sections" in result["data"]
        assert len(result["data"]["sections"]) > 0
        assert all("title" in section for section in result["data"]["sections"])
```
#### Test Cases Covered
| Test Case | Purpose | Expected Result |
|-----------|---------|-----------------|
| `test_fetch_info_success` | Valid article search | Returns article data |
| `test_fetch_info_not_found` | Invalid article name | Returns error with suggestions |
| `test_fetch_info_disambiguation` | Ambiguous search | Returns disambiguation options |
| `test_list_sections_success` | Valid article sections | Returns section list |
| `test_list_sections_not_found` | Invalid article | Returns appropriate error |
| `test_section_content_success` | Valid section retrieval | Returns section text |
| `test_section_content_not_found` | Invalid section name | Returns error message |
| `test_input_validation` | Invalid parameters | Returns validation errors |
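Negative-path cases follow the same pattern as the success tests above. A minimal sketch of `test_fetch_info_not_found` might look like the following; the exact keys of the error response (such as `error` and `suggestions`) are assumptions based on the response shapes shown in this guide:

```python
import pytest

from src.mcp_server.mcp_server import WikipediaServer


@pytest.mark.asyncio
async def test_fetch_info_not_found():
    """A nonsense title should fail gracefully rather than raise."""
    server = WikipediaServer()
    result = await server.fetch_wikipedia_info("NonExistentArticle12345")

    # Assumed error shape: mirrors the success responses above, with
    # "success" set to False and an "error" message (possibly plus "suggestions").
    assert result["success"] is False
    assert "error" in result or "suggestions" in result
```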
### 2. Integration Tests
These tests verify MCP protocol compliance and Wikipedia API integration.
#### MCP Protocol Tests
```python
import pytest
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


@pytest.mark.asyncio
async def test_mcp_protocol_compliance():
    """Test that server properly implements MCP protocol."""
    server_params = StdioServerParameters(
        command="python",
        args=["src/mcp_server/mcp_server.py"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Test initialization
            await session.initialize()

            # Test tool listing
            tools = await session.list_tools()
            assert len(tools.tools) == 3

            tool_names = [tool.name for tool in tools.tools]
            assert "fetch_wikipedia_info" in tool_names
            assert "list_wikipedia_sections" in tool_names
            assert "get_section_content" in tool_names
```
#### Wikipedia API Integration Tests
```python
import pytest

from src.mcp_server.mcp_server import WikipediaServer


@pytest.mark.asyncio
async def test_wikipedia_api_integration():
    """Test integration with Wikipedia API."""
    server = WikipediaServer()

    # Test various search scenarios
    test_cases = [
        ("Python programming", True),        # Should find article
        ("Artificial Intelligence", True),   # Should find article
        ("NonExistentArticle12345", False),  # Should not find
        ("Python", True),                    # Disambiguation page
    ]

    for query, should_succeed in test_cases:
        result = await server.fetch_wikipedia_info(query)
        if should_succeed:
            assert result["success"] is True or "suggestions" in result
        else:
            assert result["success"] is False
```
### 3. End-to-End Tests
Complete workflow tests using the example client.
#### Example Client Test
```bash
# Run the example client as a test
python example_client.py --test-mode
# Expected output:
# ✅ Server connection successful
# ✅ Tool listing successful
# ✅ Wikipedia search successful
# ✅ Section listing successful
# ✅ Section content retrieval successful
```
#### Custom E2E Test Script
```python
# tests/test_e2e.py
import asyncio
import subprocess
import time

from mcp_client import WikipediaClient


async def test_complete_workflow():
    """Test complete Wikipedia research workflow."""
    # Start server
    server_process = subprocess.Popen([
        "python", "src/mcp_server/mcp_server.py"
    ])

    try:
        # Wait for server to start
        time.sleep(2)

        # Create client
        client = WikipediaClient()

        # Test workflow: Research Python programming
        topic = "Python (programming language)"

        # Step 1: Get article summary
        summary = await client.search_wikipedia(topic)
        assert summary["success"] is True

        # Step 2: List article sections
        sections = await client.list_sections(topic)
        assert sections["success"] is True
        assert len(sections["data"]["sections"]) > 5

        # Step 3: Get specific section content
        history_content = await client.get_section_content(topic, "History")
        assert history_content["success"] is True
        assert "Guido van Rossum" in history_content["data"]["content"]

        print("✅ Complete workflow test passed")

    finally:
        server_process.terminate()
        server_process.wait()


if __name__ == "__main__":
    asyncio.run(test_complete_workflow())
```
### 4. Performance Tests
Validate response times and resource usage.
#### Response Time Tests
```python
import time
from statistics import mean

from src.mcp_server.mcp_server import WikipediaServer


async def test_response_times():
    """Test that response times are within acceptable limits."""
    server = WikipediaServer()

    # Test multiple requests for each tool
    search_times = []
    for i in range(10):
        start = time.time()
        result = await server.fetch_wikipedia_info("Machine Learning")
        end = time.time()

        if result["success"]:
            search_times.append(end - start)

    avg_search_time = mean(search_times)
    assert avg_search_time < 2.0, f"Average search time {avg_search_time}s exceeds 2s limit"
    print(f"✅ Average search time: {avg_search_time:.2f}s")
```
#### Concurrent Request Tests
```python
import asyncio
import time

from src.mcp_server.mcp_server import WikipediaServer


async def test_concurrent_requests():
    """Test server handling of concurrent requests."""
    server = WikipediaServer()

    # Create multiple concurrent requests
    queries = [
        "Artificial Intelligence",
        "Machine Learning",
        "Python programming",
        "Data Science",
        "Neural Networks"
    ]

    start_time = time.time()

    # Execute all requests concurrently
    tasks = [server.fetch_wikipedia_info(query) for query in queries]
    results = await asyncio.gather(*tasks)

    end_time = time.time()
    total_time = end_time - start_time

    # Verify all requests succeeded
    successful_requests = sum(1 for result in results if result["success"])
    assert successful_requests >= 4, "Too many concurrent requests failed"

    # Verify total time is reasonable (should be much less than sequential)
    assert total_time < 10.0, f"Concurrent requests took {total_time}s (too slow)"

    print(f"✅ {successful_requests}/{len(queries)} concurrent requests succeeded in {total_time:.2f}s")
```
### 5. Error Handling Tests
Verify proper error handling and recovery.
#### Network Error Simulation
```python
from unittest.mock import patch

from src.mcp_server.mcp_server import WikipediaServer


async def test_network_error_handling():
    """Test handling of network errors."""
    server = WikipediaServer()

    # Mock network failure
    with patch('wikipedia.search', side_effect=Exception("Network error")):
        result = await server.fetch_wikipedia_info("Test query")

        assert result["success"] is False
        assert "error" in result
        assert "Network error" in result["error"]
```
#### Invalid Input Tests
```python
from src.mcp_server.mcp_server import WikipediaServer


async def test_invalid_input_handling():
    """Test handling of invalid inputs."""
    server = WikipediaServer()

    # Test empty query
    result = await server.fetch_wikipedia_info("")
    assert result["success"] is False

    # Test extremely long query
    long_query = "x" * 10000
    result = await server.fetch_wikipedia_info(long_query)
    assert result["success"] is False

    # Test special characters
    special_query = "!@#$%^&*()"
    result = await server.fetch_wikipedia_info(special_query)
    # Should handle gracefully (may succeed or fail, but shouldn't crash)
    assert "error" in result or "success" in result
```
### 6. Manual Tests
Human verification of complex scenarios. A smoke-test script covering the first two groups follows the checklist.
#### Manual Test Checklist
**Server Startup**
- [ ] Server starts without errors
- [ ] All dependencies are loaded
- [ ] MCP protocol initialization successful
- [ ] Server responds to test queries
**Tool Functionality**
- [ ] Search finds relevant articles
- [ ] Section listing is complete and accurate
- [ ] Section content is properly formatted
- [ ] Error messages are helpful and actionable
**Edge Cases**
- [ ] Disambiguation pages handled correctly
- [ ] Non-existent articles return appropriate errors
- [ ] Special characters in queries work properly
- [ ] Very long articles load completely
**Performance**
- [ ] Response times under 3 seconds for typical queries
- [ ] Multiple concurrent requests handled properly
- [ ] Memory usage remains stable over time
- [ ] No memory leaks during extended usage
**Integration**
- [ ] Works with example client
- [ ] Compatible with Claude Desktop
- [ ] MCP protocol compliance verified
- [ ] Error messages display properly in clients
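A small shell script can walk through the first two checklist groups in one pass. This is only a convenience sketch built from commands already shown earlier in this guide (`--test` and `--test-mode`), not a script the project ships:

```bash
#!/usr/bin/env bash
# Manual smoke test: run the server self-test, then the example client.
set -e

source .venv311/bin/activate

# Server startup check (same command as in "Test Individual Components")
(cd src/mcp_server && python mcp_server.py --test)

# Tool functionality check via the example client
python example_client.py --test-mode

echo "Smoke test finished - compare the output against the checklist above."
```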
## Test Configuration
### pytest Configuration (`pytest.ini`)
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts =
    -v
    --tb=short
    --strict-markers
    --disable-warnings
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
    integration: marks tests as integration tests
    e2e: marks tests as end-to-end tests
asyncio_mode = auto
```
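The markers declared above are selected or deselected at the command line with pytest's `-m` option, assuming your tests are tagged with them:

```bash
# Run only the integration tests
python -m pytest -m integration

# Skip anything marked slow
python -m pytest -m "not slow"
```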
### Test Dependencies
```bash
# Install test dependencies
pip install pytest pytest-asyncio pytest-cov
# unittest.mock (used to mock the Wikipedia API) ships with the standard library
```
### Environment Variables for Testing
```bash
# Set test environment variables
export TESTING=true
export WIKIPEDIA_TIMEOUT=5
export LOG_LEVEL=DEBUG
```
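If you prefer to set these per test rather than in the shell, a pytest fixture can apply them with `monkeypatch`. How the server consumes the variables is up to its implementation, so treat this as an illustrative sketch only:

```python
import os

import pytest


@pytest.fixture
def test_environment(monkeypatch):
    """Set the testing environment variables for a single test."""
    monkeypatch.setenv("TESTING", "true")
    monkeypatch.setenv("WIKIPEDIA_TIMEOUT", "5")
    monkeypatch.setenv("LOG_LEVEL", "DEBUG")


def test_reads_environment(test_environment):
    # Sketch only: assumes the code under test reads these via os.environ/os.getenv.
    assert os.getenv("WIKIPEDIA_TIMEOUT") == "5"
```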
## Test Coverage
### Coverage Requirements
- **Unit Tests**: 90%+ code coverage
- **Integration Tests**: All MCP protocol endpoints
- **Error Handling**: All error paths tested
- **Performance**: Response time benchmarks
### Running Coverage Reports
```bash
# Generate HTML coverage report
python -m pytest tests/ --cov=src --cov-report=html
# View report
open htmlcov/index.html
# Generate terminal coverage report
python -m pytest tests/ --cov=src --cov-report=term-missing
```
### Coverage Targets
| Component | Target Coverage | Current Status |
|-----------|----------------|----------------|
| `mcp_server.py` | 95% | ✅ Achieved |
| `mcp_client.py` | 90% | ✅ Achieved |
| Error handling | 100% | ✅ Achieved |
| Tool functions | 95% | ✅ Achieved |
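These targets can also be enforced automatically: pytest-cov's `--cov-fail-under` option makes the run fail when overall coverage drops below a chosen threshold, which is a reasonable guard for CI.

```bash
# Fail the suite if overall coverage drops below 90%
python -m pytest tests/ --cov=src --cov-fail-under=90
```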
## Continuous Integration
### GitHub Actions Workflow (`.github/workflows/test.yml`)
```yaml
name: Test Suite

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.11, 3.12]

    steps:
    - uses: actions/checkout@v3

    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v3
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest pytest-asyncio pytest-cov
        pip install -r requirements.txt

    - name: Run tests
      run: |
        python -m pytest tests/ -v --cov=src --cov-report=xml

    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml
```
## Debugging Tests
### Common Test Failures
**Import Errors**
```bash
# Fix Python path issues
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
python -m pytest tests/
```
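An alternative to exporting `PYTHONPATH` in every shell is a small `tests/conftest.py` that puts the project root on `sys.path`. This is a common pattern rather than something the project necessarily ships, so adjust the inserted path to match how your tests import the server:

```python
# tests/conftest.py -- hypothetical helper, not required by the project
import sys
from pathlib import Path

# Put the repository root on sys.path so `from src.mcp_server.mcp_server import ...`
# resolves without PYTHONPATH tweaks. Use `... / "src"` instead if your tests
# import `mcp_server` directly.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
```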
**Async Test Issues**
```python
# Ensure proper async test decoration
@pytest.mark.asyncio
async def test_async_function():
    result = await some_async_function()
    assert result is not None
```
**Wikipedia API Timeouts**
```python
# Increase timeout for slow connections
@pytest.mark.slow
async def test_with_longer_timeout():
    # Test implementation
    pass

# Deselect slow tests from the command line with:
#   python -m pytest -m "not slow"
```
### Debug Mode Testing
```bash
# Run tests in debug mode
python -m pytest tests/ -v -s --tb=long --log-cli-level=DEBUG
```
## Test Metrics and Reporting
### Performance Benchmarks
```python
# tests/benchmarks.py
import time
from statistics import mean, stdev

from src.mcp_server.mcp_server import WikipediaServer


async def benchmark_search_performance():
    """Benchmark search performance across different query types."""
    server = WikipediaServer()

    # Different query complexity levels
    queries = {
        "simple": ["Python", "Java", "C++"],
        "medium": ["Machine Learning", "Data Science", "Web Development"],
        "complex": ["Artificial Intelligence in Healthcare", "Quantum Computing Applications"]
    }

    results = {}
    for complexity, query_list in queries.items():
        times = []
        for query in query_list:
            start = time.time()
            result = await server.fetch_wikipedia_info(query)
            end = time.time()

            if result["success"]:
                times.append(end - start)

        if times:
            results[complexity] = {
                "mean": mean(times),
                "stdev": stdev(times) if len(times) > 1 else 0,
                "min": min(times),
                "max": max(times)
            }

    return results
```
### Test Reports
Generate comprehensive test reports:
```bash
# Generate JUnit XML report
python -m pytest tests/ --junitxml=report.xml
# Generate HTML report
python -m pytest tests/ --html=report.html --self-contained-html
```
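Note that the `--html` option comes from the pytest-html plugin, which is not installed with core pytest:

```bash
pip install pytest-html
```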
---
## Testing Support
### Running Into Issues?
1. **Check test dependencies**: `pip list | grep pytest`
2. **Verify Python version**: `python --version` (should be 3.11+)
3. **Check environment**: `source .venv311/bin/activate`
4. **Clear cache**: `python -m pytest --cache-clear`
5. **Verbose output**: `python -m pytest -v -s`
### Contributing Tests
When contributing new features:
1. **Write tests first** (TDD approach recommended)
2. **Include both positive and negative test cases**
3. **Test edge cases and error conditions**
4. **Ensure tests are deterministic and repeatable**
5. **Add performance tests for new tools**
For detailed testing guidelines, see [DEVELOPMENT.md](DEVELOPMENT.md).
---
*This testing guide ensures the MCP Wikipedia Server maintains high quality and reliability. Regular testing helps catch issues early and maintains user confidence in the system.*