# Code quality improvements: Constants, docstrings, and comprehensive unit tests
## Summary
Professional code quality improvements addressing all issues identified in the code review. This PR extracts constants, adds comprehensive documentation, implements security improvements, and includes a full unit test suite.
## π§ Constants & Configuration
### Extracted Constants
- **HTTP Headers**: `COMMON_HEADERS` (eliminates 4 duplicate copies)
- **Timeouts**: `DEFAULT_TIMEOUT=15`, `INSTANT_API_TIMEOUT=10`, `AUTOCOMPLETE_TIMEOUT=10`, `PAGE_FETCH_TIMEOUT=15`
- **Content Limits**: `CONTENT_PREVIEW_LENGTH=500`, `MAX_PREVIEW_PARAGRAPHS=5`
- **Selectors**: `RESULT_SELECTORS`, `TITLE_SELECTORS`, `SNIPPET_SELECTORS`, `CONTENT_SELECTORS`
- **Security**: `ALLOWED_URL_SCHEMES = {"http", "https"}`
**Total**: 14 constants extracted, eliminating all magic numbers β
## π‘οΈ Security Improvements
### URL Validation
- New `validate_url()` function checks URL schemes
- Whitelist-based: Only allows `http` and `https`
- Blocks dangerous schemes: `file://`, `javascript:`, `data:`, `ftp://`
- Applied to `get_page_content` before fetching
### Tests Coverage
- 8 tests for URL validation covering all edge cases
- Tests for malicious URLs, malformed URLs, empty strings
## π Code Refactoring
### Helper Functions (DRY Principle)
1. **`get_http_client_from_context(ctx)`**
- Standardizes HTTP client retrieval across all 3 tools
- Returns tuple: `(client, should_close)`
- Eliminates 30+ lines of duplicate code
2. **`validate_url(url)`**
- Security validation for all URLs
- Prevents SSRF and other injection attacks
### Removed Redundancies
- Removed `"status": "success"/"error"` fields (MCP has error handling)
- Consolidated error handling patterns
- Standardized response structures
## π Documentation
### Comprehensive Docstrings
- **11 functions** now have Google-style docstrings
- Includes: Args, Returns, descriptions
- Examples: `validate_url`, `get_http_client_from_context`, `get_autocomplete_suggestions`
- All search functions documented: `search_duckduckgo_instant`, `search_duckduckgo_html`, `search_web`
- All tool functions documented: `web_search`, `get_page_content`, `suggest_related_searches`
### Coverage: 100% β
## π§ͺ Unit Tests
### Test Statistics
- **32 unit tests** across 2 test files
- **100% pass rate** β
- **Fast execution**: 0.10s
### test_tools.py (17 tests)
```text
TestValidateUrl (8 tests)
ββ Valid URLs: http, https
ββ Invalid URLs: file, javascript, data, ftp
ββ Edge cases: malformed, empty
TestGetHttpClientFromContext (3 tests)
ββ Retrieves from lifespan context
ββ Creates new when missing
ββ Handles missing lifespan_context attribute
TestGetAutocompleteSuggestions (6 tests)
ββ Successful API call
ββ Empty suggestions
ββ Malformed response
ββ HTTP error handling
ββ Request error handling
ββ JSON decode error handling
```
### test_search.py (15 tests)
```text
TestExtractDomain (6 tests)
ββ Simple domain extraction
ββ Subdomains
ββ Ports
ββ Lowercase conversion
ββ Malformed URLs
ββ Empty strings
TestSearchResult (2 tests)
ββ Full creation
ββ Default domain
TestSearchDuckduckgoInstant (3 tests)
ββ Successful API call
ββ No results
ββ HTTP error
TestSearchWeb (4 tests)
ββ Combines instant + HTML results
ββ Deduplicates by URL
ββ Filters invalid URLs
ββ Respects count limit
```
### Test Quality
- Uses `pytest` with async support (`pytest-asyncio`)
- Proper mocking with `unittest.mock`
- Clear test names and docstrings
- Edge case coverage (network failures, malformed data, empty responses)
## π¨ Code Style
### Logging Standardization
- Converted all f-strings to %-style formatting
- **Before**: `logger.info(f"Found {len(results)} results")`
- **After**: `logger.info("Found %d results", len(results))`
- **Total**: 7 conversions for consistency
### All Linters Pass β
- black β
- isort β
- flake8 β
- mypy β
## π Changes Summary
### Code Metrics
- **2 commits**: Refactoring + Tests
- **4 files changed**: tools.py, search.py, test_tools.py, test_search.py
- **+659 lines**: Constants, docstrings, tests
- **-192 lines**: Removed duplicates, redundant fields
### Key Improvements
1. Constants extracted: 14
2. Helper functions added: 2
3. Docstrings added: 11
4. Unit tests created: 32
5. Security validations: URL scheme whitelist
6. Code duplicates removed: ~50 lines
7. Logging standardized: 100%
## β
Verification
### Tests
```bash
uv run pytest tests/ -v
# 32 passed in 0.10s β
```
### Linters
```bash
pre-commit run --all-files
# All hooks passed β
```
### Installation
```bash
uv tool install --no-cache .
# Installs successfully β
```
## π Related
Addresses all code quality suggestions from initial review:
- β
Extract constants (headers, selectors, timeouts)
- β
Standardize HTTP client access
- β
Add URL validation
- β
Remove redundant status fields
- β
Standardize logging format
- β
Add comprehensive docstrings
- β
Create unit test suite
- β
Clean up build/ directory
---
π€ Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>