# Development Tasks - SearXNG MCP Server

**Last Updated:** 2025-01-15
**Update this file whenever you start/complete/block on a task!**

---

## Task Status Legend

- 🔴 **Not Started** - No work begun
- 🟡 **In Progress** - Active development
- 🟢 **Complete** - Done and tested
- ⚫ **Blocked** - Waiting on dependency or decision
- 🔵 **Testing** - Implementation done, needs verification

---

## Phase 1: Core Infrastructure ✅

### Task #0: Project Setup

**Status:** 🟢 Complete
**Priority:** P0 (Critical)
**Complexity:** Low
**Estimated Time:** 2-3 hours
**Actual Time:** ~3 hours

**Description:** Set up minimal MCP server with SearXNG integration and crawl4ai page fetching.

**Implementation Checklist:**
- [x] Create modular package structure (`searxng_mcp/`)
- [x] Implement `config.py` with env-based settings
- [x] Implement `search.py` with SearXNG JSON API client
- [x] Implement `crawler.py` with crawl4ai wrapper
- [x] Implement `server.py` with FastMCP and two tools
- [x] Add `pyproject.toml` with dependencies and console script
- [x] Write `README.md` with setup/usage instructions
- [x] Register in OpenCode global config (`~/.config/opencode/opencode.json`)
- [x] Test with `uv run searxng-mcp`

**Completion Notes:**
- Successfully tested with OpenCode
- Response limits work correctly (8KB default)
- Both tools (`web_search`, `crawl_url`) operational
- Environment variables properly override defaults

**Files Modified:**
- Created: `searxng_mcp/` package
- Created: `pyproject.toml`, `requirements.txt`, `README.md`
- Modified: `~/.config/opencode/opencode.json`

---

## Phase 2: High-Impact Tools 🚧

### Task #1: Package Registry Search Tool

**Status:** 🟢 Complete
**Priority:** P1 (High)
**Complexity:** Medium
**Estimated Time:** 4-6 hours
**Actual Time:** ~2 hours
**Completed:** 2025-01-15

**Description:** Add tool to search npm, PyPI, crates.io, and Go packages. Return structured info: downloads, last update, dependencies, security advisories.

**User Story:** As a developer, I want to quickly check package details without leaving my coding environment, so I can evaluate libraries faster.

**Implementation Plan:**

1. **Create module:** `searxng_mcp/registry.py`
   - Class: `PackageRegistryClient`
   - Methods:
     - `async search_npm(name: str) -> PackageInfo`
     - `async search_pypi(name: str) -> PackageInfo`
     - `async search_crates(name: str) -> PackageInfo`
     - `async search_go(name: str) -> PackageInfo`
   - Use official registry APIs (not scraping)

2. **Add constants to config.py:**
   ```python
   NPM_REGISTRY_URL = "https://registry.npmjs.org"
   PYPI_API_URL = "https://pypi.org/pypi"
   CRATES_API_URL = "https://crates.io/api/v1"
   GO_PROXY_URL = "https://proxy.golang.org"
   ```

3. **Add tool to server.py:**
   ```python
   @mcp.tool()
   async def package_info(
       name: str,
       registry: Literal["npm", "pypi", "crates", "go"] = "npm"
   ) -> str:
       """Fetch package information from registries."""
   ```

4. **Response format:**
   ```
   Package: express (npm)
   Version: 4.18.2 (latest)
   Downloads: 25M/week
   Last Updated: 2023-12-15
   License: MIT
   Dependencies: 30
   Security: ✅ No known vulnerabilities
   Repository: https://github.com/expressjs/express
   Description: Fast, unopinionated, minimalist web framework
   ```
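As a rough illustration of step 1, the npm branch could look something like the sketch below, using the existing httpx dependency. The helper name, the return shape, and the `api.npmjs.org` downloads endpoint are assumptions for illustration; the shipped `registry.py` may differ.

```python
# Sketch only: one possible shape for the npm lookup (names are illustrative).
import httpx

NPM_REGISTRY_URL = "https://registry.npmjs.org"
NPM_DOWNLOADS_URL = "https://api.npmjs.org/downloads/point/last-week"  # public npm downloads API

async def search_npm(name: str) -> dict:
    """Return basic metadata for an npm package; raises for unknown packages."""
    async with httpx.AsyncClient(timeout=10) as client:
        meta = await client.get(f"{NPM_REGISTRY_URL}/{name}")
        meta.raise_for_status()  # 404 -> caller reports "package not found"
        doc = meta.json()

        downloads = await client.get(f"{NPM_DOWNLOADS_URL}/{name}")
        weekly = downloads.json().get("downloads") if downloads.status_code == 200 else None

    latest = doc.get("dist-tags", {}).get("latest", "")
    version_info = doc.get("versions", {}).get(latest, {})
    repo = version_info.get("repository")
    repo_url = repo.get("url", "") if isinstance(repo, dict) else (repo or "")

    return {
        "name": name,
        "version": latest,
        "weekly_downloads": weekly,
        "license": version_info.get("license"),
        "description": doc.get("description", ""),
        "repository": repo_url,
    }
```

The PyPI, crates.io, and Go lookups would follow the same pattern against the JSON endpoints already listed in `config.py`.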
**Testing Checklist:**
- [x] npm package lookup works
- [x] PyPI package lookup works
- [x] crates.io package lookup works
- [x] Go module lookup works
- [x] Handles non-existent packages gracefully
- [x] Response fits within 8KB limit
- [x] Error handling for network failures
- [ ] Caching for repeated lookups (deferred)

**Completion Notes:**
- All four registries (npm, PyPI, crates.io, Go) working perfectly
- Added weekly download stats for npm (51M/week for express)
- Proper error handling for 404s with helpful messages
- Response formatting clean and readable
- Time-ago formatting (e.g., "7 months ago") improves UX
- No new dependencies required (used existing httpx)

**Files Created:**
- `searxng_mcp/registry.py` (237 lines) - Registry client with all 4 APIs
- Updated: `searxng_mcp/server.py` - Added `package_info` tool

**Performance:**
- npm lookups: ~1-2s
- PyPI lookups: ~1s
- crates.io lookups: ~1s
- Go lookups: ~1-2s

**Related Files:**
- Spec: `docs/new-features/package-registry/SPEC.md`
- Tests: Manual testing completed, unit tests optional

---

### Task #2: Error Translator Tool

**Status:** 🔴 Not Started
**Priority:** P1 (High)
**Complexity:** High
**Estimated Time:** 8-10 hours
**Assigned To:** Unassigned

**Description:** Tool that takes error messages/stack traces and returns ranked solutions from Stack Overflow, GitHub issues, and documentation.

**User Story:** As a developer hitting an error, I want instant ranked solutions instead of manually searching Stack Overflow and clicking through multiple answers.

**Implementation Plan:**

1. **Create module:** `searxng_mcp/error_solver.py`
   - Class: `ErrorSolver`
   - Methods:
     - `async analyze_error(error_text: str, context: dict) -> List[Solution]`
     - `async search_stackoverflow(error_sig: str) -> List[SOAnswer]`
     - `async search_github_issues(error_sig: str) -> List[GitHubIssue]`

2. **Error signature extraction:**
   - Strip file paths, line numbers (too specific)
   - Keep error type, core message
   - Identify language/framework from stack trace

3. **Ranking algorithm:**
   - Stack Overflow: accepted answer > votes > recency
   - GitHub: closed issues > issue reactions > recency
   - Filter: solutions must be <2 years old (default)

4. **Add tool to server.py:**
   ```python
   @mcp.tool()
   async def solve_error(
       error: str,
       language: str = "auto",
       max_solutions: int = 3
   ) -> str:
       """Find solutions for error messages/stack traces."""
   ```

5. **Response format:**
   ```
   Error Analysis:
   Type: TypeError
   Language: Python
   Likely Cause: NoneType attribute access

   Solution #1 (Stack Overflow - Accepted, 234 votes)
   Check for None before accessing attributes:
     if obj is not None:
         obj.method()
   Source: https://stackoverflow.com/...

   Solution #2 (GitHub Issue - Closed)
   [Solution details...]
   ```
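Since this task is not started, here is a minimal sketch of the signature-extraction step in item 2, assuming a simple regex approach. `extract_error_signature` is a hypothetical helper, not existing code.

```python
import re

def extract_error_signature(error_text: str) -> str:
    """Reduce an error message/stack trace to a search-friendly signature (sketch).

    Per item 2 above: drop file paths and line numbers, keep the error type
    and the core message.
    """
    lines = [line.strip() for line in error_text.splitlines() if line.strip()]
    # Python tracebacks end with "SomeError: message"; many JS runtimes start
    # with it. Prefer a line that looks like "<ErrorType>: <message>".
    candidates = [line for line in lines if re.match(r"^\w*(Error|Exception)\b", line)]
    signature = candidates[-1] if candidates else (lines[-1] if lines else "")

    # Strip "(file:line:col)" fragments, bare paths, and "line NN" markers.
    signature = re.sub(r"\([^()]*:\d+(:\d+)?\)", "", signature)
    signature = re.sub(r"(?:[A-Za-z]:)?[\\/][\w.\-\\/]+", "", signature)
    signature = re.sub(r"\bline \d+\b", "", signature)

    return " ".join(signature.split())

# Example:
# extract_error_signature(
#     "Traceback (most recent call last):\n"
#     '  File "app.py", line 3\n'
#     "AttributeError: 'NoneType' object has no attribute 'method'"
# )
# -> "AttributeError: 'NoneType' object has no attribute 'method'"
```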
**Testing Checklist:**
- [ ] Extracts error signatures correctly
- [ ] Python errors work
- [ ] JavaScript errors work
- [ ] Type errors, syntax errors, runtime errors all handled
- [ ] Solutions ranked appropriately
- [ ] No duplicate solutions
- [ ] Handles malformed error text gracefully
- [ ] Response fits size limits

**Dependencies:**
- May need: `beautifulsoup4` (already in crawl4ai deps)
- May need: Stack Overflow API token (optional, higher rate limits)

**Blockers:**
- Need to decide: Use Stack Overflow API or scrape? (API has rate limits but cleaner)

**Related Files:**
- Spec: `docs/new-features/error-translator/SPEC.md`
- Tests: `tests/test_error_solver.py`

---

### Task #3: GitHub Repository Info Tool

**Status:** 🟢 Complete
**Priority:** P2 (Medium)
**Complexity:** Low
**Estimated Time:** 3-4 hours
**Actual Time:** ~1.5 hours
**Completed:** 2025-01-15

**Description:** Tool to fetch GitHub repo metadata: stars, issues, recent commits, README, contributors.

**User Story:** As a developer evaluating a library, I want quick repo health metrics without opening GitHub in a browser.

**Implementation Plan:**

1. **Create module:** `searxng_mcp/github.py`
   - Class: `GitHubClient`
   - Use GitHub REST API (no auth needed for public repos, but support token for rate limits)
   - Methods:
     - `async get_repo_info(owner: str, repo: str) -> RepoInfo`
     - `async get_recent_commits(owner: str, repo: str, count: int) -> List[Commit]`

2. **Add constants to config.py:**
   ```python
   GITHUB_API_URL = "https://api.github.com"
   GITHUB_TOKEN = _env_str("GITHUB_TOKEN", "")  # Optional, for higher rate limits
   ```

3. **Add tool to server.py:**
   ```python
   @mcp.tool()
   async def github_repo(
       repo: str,  # Format: "owner/repo" or full URL
       include_commits: bool = True
   ) -> str:
       """Fetch GitHub repository information and health metrics."""
   ```

4. **Response format:**
   ```
   Repository: unclecode/crawl4ai
   Stars: 15.2k | Forks: 1.3k | Watchers: 234
   License: Apache-2.0
   Last Updated: 2 hours ago
   Open Issues: 45 | Open PRs: 12

   Recent Activity (Last 5 Commits):
   - [2h ago] feat: add webhook support (unclecode)
   - [1d ago] fix: memory leak in crawler (contributor)
   ...

   Description: Open-source LLM Friendly Web Crawler & Scraper
   Homepage: https://crawl4ai.com
   ```
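A minimal sketch of the lookup behind `github_repo`, reflecting the plan above; the `GET /repos/{owner}/{repo}` endpoint and optional token header are standard GitHub REST API usage, but the helper names and return shape are assumptions, and the actual code in `searxng_mcp/github.py` may differ.

```python
import httpx

GITHUB_API_URL = "https://api.github.com"

def parse_repo(repo: str) -> tuple[str, str]:
    """Accept "owner/repo" or a full GitHub URL and return (owner, repo)."""
    cleaned = repo.removeprefix("https://github.com/").removesuffix(".git").strip("/")
    owner, _, rest = cleaned.partition("/")
    return owner, rest.split("/")[0]

async def get_repo_info(repo: str, token: str = "") -> dict:
    """Fetch core health metrics for a public repository (sketch)."""
    owner, name = parse_repo(repo)
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # optional token raises the rate limit from 60 to 5000 req/hr
        headers["Authorization"] = f"Bearer {token}"

    async with httpx.AsyncClient(timeout=10, headers=headers) as client:
        resp = await client.get(f"{GITHUB_API_URL}/repos/{owner}/{name}")
        resp.raise_for_status()  # 404 -> non-existent or not accessible (private)
        data = resp.json()

    return {
        "full_name": data["full_name"],
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        "open_issues": data["open_issues_count"],
        "license": (data.get("license") or {}).get("spdx_id"),
        "pushed_at": data["pushed_at"],
        "description": data.get("description") or "",
    }
```

Recent commits and open PR counts would come from the `/commits` and `/pulls` endpoints of the same API, requested the same way.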
**Testing Checklist:**
- [x] Parses "owner/repo" format
- [x] Parses full GitHub URLs
- [x] Works without GitHub token
- [x] Respects rate limits (60/hr without token, 5000/hr with)
- [x] Handles non-existent repos
- [x] Handles private repos (returns "not accessible")
- [x] Response fits size limits

**Completion Notes:**
- Full GitHub REST API integration working perfectly
- Supports both `owner/repo` and full URL formats
- Returns comprehensive repo health metrics: stars, forks, issues, PRs, language, license
- Includes recent commit history (optional)
- Proper error handling for 404s and rate limits
- Optional GitHub token support via `GITHUB_TOKEN` env var
- Response time: ~2-3s per request

**Files Created:**
- `searxng_mcp/github.py` (209 lines) - Complete GitHub API client
- Updated: `searxng_mcp/server.py` - Added `github_repo` tool

**Performance:**
- Repo info: ~1-2s
- With commits: ~2-3s
- Rate limits: 60 req/hr (no token), 5000 req/hr (with token)

**Related Files:**
- Spec: `docs/new-features/github-info/SPEC.md`
- Tests: Manual testing completed

---

### Task #4: Structured Data Extraction Enhancement

**Status:** 🔴 Not Started
**Priority:** P2 (Medium)
**Complexity:** Medium
**Estimated Time:** 5-7 hours
**Assigned To:** Unassigned

**Description:** Enhance existing `crawl_url` or add new tool to extract structured data (tables, prices, specs) using CSS selectors or LLM extraction.

**User Story:** As a developer, I want to extract specific fields from web pages (e.g., "get all prices from this e-commerce page") without processing full page markdown.

**Implementation Plan:**

1. **Enhance crawler.py:**
   - Add method: `async extract_structured(url: str, schema: dict) -> dict`
   - Use crawl4ai's built-in extraction strategies:
     - `JsonCssExtractionStrategy` for CSS selector-based extraction
     - `LLMExtractionStrategy` for AI-powered extraction (optional)

2. **Add tool to server.py:**
   ```python
   @mcp.tool()
   async def extract_data(
       url: str,
       fields: str,  # JSON string describing what to extract
       method: Literal["css", "llm"] = "css"
   ) -> str:
       """Extract structured data from a web page."""
   ```

3. **Example usage:**
   ```
   extract_data(
       url="https://example.com/products",
       fields='{"name": "h2.product-title", "price": "span.price"}',
       method="css"
   )

   Returns:
   [
     {"name": "Product A", "price": "$29.99"},
     {"name": "Product B", "price": "$39.99"}
   ]
   ```
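For the CSS path, a minimal sketch of what `extract_structured` could look like. The schema shape follows crawl4ai's documented JSON-CSS format, but the exact imports and `arun()` signature depend on the installed crawl4ai version, and the `base_selector` parameter is an assumption added for illustration.

```python
import json

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

async def extract_structured(
    url: str,
    fields: dict[str, str],
    base_selector: str = "body",  # e.g. "div.product" for repeated items
) -> list[dict]:
    """Extract one record per base element using simple CSS selectors (sketch)."""
    schema = {
        "name": "items",
        "baseSelector": base_selector,
        "fields": [
            {"name": name, "selector": selector, "type": "text"}
            for name, selector in fields.items()
        ],
    }
    strategy = JsonCssExtractionStrategy(schema)

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            config=CrawlerRunConfig(extraction_strategy=strategy),
        )

    # extracted_content is a JSON string (or None if nothing matched)
    return json.loads(result.extracted_content) if result.extracted_content else []
```

The multi-item output in the example above assumes the caller supplies a repeated container selector (such as `div.product`) rather than the whole page body.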
**Testing Checklist:**
- [ ] CSS selector extraction works
- [ ] Handles missing elements gracefully
- [ ] Works with tables
- [ ] Works with lists
- [ ] LLM extraction works (if implemented)
- [ ] Response is valid JSON
- [ ] Handles extraction failures

**Dependencies:**
- Already have crawl4ai with extraction support
- Optional: LLM provider API key for LLM extraction mode

**Blockers:**
- Need to decide: CSS-only first, or include LLM extraction from the start?

**Related Files:**
- Spec: `docs/new-features/structured-extract/SPEC.md`
- Tests: `tests/test_extraction.py`

---

### Task #5: API Explorer Tool

**Status:** 🔴 Not Started
**Priority:** P2 (Medium)
**Complexity:** High
**Estimated Time:** 8-10 hours
**Assigned To:** Unassigned

**Description:** Tool to fetch OpenAPI/Swagger specs, generate example requests, test endpoints.

**User Story:** As a developer integrating with an API, I want to explore endpoints and see example requests without leaving my environment.

**Implementation Plan:**

1. **Create module:** `searxng_mcp/api_explorer.py`
   - Class: `APIExplorer`
   - Methods:
     - `async fetch_openapi_spec(url: str) -> dict`
     - `async list_endpoints(spec: dict) -> List[Endpoint]`
     - `async generate_example(endpoint: str, method: str) -> str`
     - `async test_endpoint(url: str, method: str, headers: dict) -> Response`

2. **Add tool to server.py:**
   ```python
   @mcp.tool()
   async def explore_api(
       spec_url: str,  # URL to OpenAPI/Swagger JSON
       endpoint: str = "",  # Optional: specific endpoint to explore
   ) -> str:
       """Explore API using OpenAPI/Swagger specification."""
   ```

3. **Response format:**
   ```
   API: Stripe API v1
   Base URL: https://api.stripe.com/v1
   Authentication: Bearer token

   Available Endpoints (showing 5 of 234):

   1. POST /v1/charges
      Description: Create a charge
      Required: amount, currency
      Example:
        curl -X POST https://api.stripe.com/v1/charges \
          -H "Authorization: Bearer sk_test_..." \
          -d amount=2000 \
          -d currency=usd

   2. GET /v1/customers/{id}
   ...
   ```

**Testing Checklist:**
- [ ] Fetches OpenAPI 3.0 specs
- [ ] Fetches Swagger 2.0 specs
- [ ] Lists all endpoints
- [ ] Generates valid curl examples
- [ ] Handles authentication schemes
- [ ] Identifies required vs optional params
- [ ] Response fits size limits
- [ ] Handles invalid/malformed specs

**Dependencies:**
- May need: `pydantic` for spec validation (already a transitive dep of crawl4ai)
- May need: `openapi-spec-validator` (optional)

**Blockers:**
- None

**Related Files:**
- Spec: `docs/new-features/api-explorer/SPEC.md`
- Tests: `tests/test_api_explorer.py`

---

## Phase 3: Advanced Features 📅

### Task #6: Multi-Source Aggregation

**Status:** 🔴 Not Started
**Priority:** P3 (Low)
**Complexity:** Medium
**Estimated Time:** 6-8 hours

**Description:** Tool that searches multiple sources, crawls top results, and summarizes findings.

**Deferred:** Will design after Phase 2 completion.

---

### Task #7: Site-Specific Search Filters

**Status:** 🔴 Not Started
**Priority:** P3 (Low)
**Complexity:** Low
**Estimated Time:** 2-3 hours

**Description:** Add `site:` parameter to `web_search` tool for domain-scoped queries.

**Deferred:** Will implement after Phase 2.

---

### Task #8: Security/CVE Monitoring

**Status:** 🔴 Not Started
**Priority:** P3 (Low)
**Complexity:** High
**Estimated Time:** 10-12 hours

**Description:** Tool to check CVE databases and security advisories for specific tech stacks.

**Deferred:** Requires more research into CVE APIs.

---

## Notes & Decisions

### 2025-01-15: Initial Planning
- Prioritized based on daily usage frequency analysis
- Package registry ranked #1 for quick win (high frequency, low complexity)
- Error translator ranked high for impact but higher complexity
- GitHub info easy to implement, good ROI
- Structured extraction builds on existing crawler
- API explorer more complex but very useful for integration work

### Future Considerations
- Add caching layer for frequently accessed data (npm packages, GitHub repos) - see the sketch at the end of this file
- Consider rate limiting to respect external API quotas
- May want to add telemetry to track which tools are most used
- Should add comprehensive error logging for debugging

---

**Remember:** Update this file every time you start, progress, or complete a task!
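For the caching item under Future Considerations, a minimal sketch of an in-memory TTL cache that could wrap the registry and GitHub lookups. The decorator name and TTL values are assumptions for illustration, not part of the current codebase.

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds: float = 300.0):
    """Cache coroutine results in memory for ttl_seconds (illustrative sketch)."""
    def decorator(func):
        cache: dict[tuple, tuple[float, object]] = {}

        @wraps(func)
        async def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            hit = cache.get(key)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh, skip the network call
            result = await func(*args, **kwargs)
            cache[key] = (now, result)
            return result
        return wrapper
    return decorator

# Hypothetical usage inside registry.py or github.py:
# @ttl_cache(ttl_seconds=600)
# async def search_npm(name: str) -> dict: ...
```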
