Job URL Analyzer MCP Server
A comprehensive FastAPI-based microservice for analyzing job URLs and extracting detailed company information. Built with modern async Python, this service crawls job postings and company websites to build rich company profiles with data enrichment from external providers.
✨ Features
- 🕷️ Intelligent Web Crawling: Respectful crawling with robots.txt compliance and rate limiting
- 🧠 Content Extraction: Advanced HTML parsing using Selectolax for fast, accurate data extraction
- 🔗 Data Enrichment: Pluggable enrichment providers (Crunchbase, LinkedIn, custom APIs)
- 📊 Quality Scoring: Completeness and confidence metrics for extracted data
- 📝 Markdown Reports: Beautiful, comprehensive company analysis reports
- 🔍 Observability: OpenTelemetry tracing, Prometheus metrics, structured logging
- 🚀 Production Ready: Docker, Kubernetes, health checks, graceful shutdown
- 🧪 Well Tested: Comprehensive test suite with 80%+ coverage
🏗️ Architecture
🚀 Quick Start
Prerequisites
- Python 3.11+
- Poetry (for dependency management)
- Docker & Docker Compose (optional)
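A quick way to confirm the prerequisites are available before starting (these version checks are generic, not specific to this project):

```bash
# Verify required tooling is on the PATH
python3 --version        # should report 3.11 or newer
poetry --version         # dependency management
docker --version         # optional, only needed for containerized runs
docker compose version   # optional, only needed for Docker Compose workflows
```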
Local Development
- Clone and Setup
- Environment Configuration (Optional)
- Database Setup
- Run Development Server
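A minimal sketch of those four steps. The repository URL, `.env.example` file, migration command, and application module path are assumptions and may differ in the actual project:

```bash
# 1. Clone and setup (hypothetical repository URL)
git clone https://github.com/example/job-url-analyzer.git
cd job-url-analyzer
poetry install

# 2. Environment configuration (optional -- the service runs with defaults)
cp .env.example .env || true   # assumes an example env file is provided

# 3. Database setup (assumes Alembic migrations; the SQLite default needs a data dir)
mkdir -p data
poetry run alembic upgrade head

# 4. Run the development server (module path is an assumption)
poetry run uvicorn job_analyzer.main:app --reload --host 0.0.0.0 --port 8000
```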
Docker Deployment
- Development
- Production
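A sketch of both flows, assuming conventional `docker-compose.yml` / `docker-compose.prod.yml` files and image names; adjust to match the files actually shipped with the repository:

```bash
# Development: build and run with live logs
docker compose up --build

# Production: hypothetical override file, detached mode
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --build

# Or build and run the image directly (image name/tag are assumptions)
docker build -t job-url-analyzer:latest .
docker run -d -p 8000:8000 --env-file .env job-url-analyzer:latest
```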
📡 API Usage
Analyze Job URL
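The analysis endpoint is not spelled out in this README; the request below is an illustrative sketch that assumes a `POST /analyze` route accepting a JSON body with the job URL:

```bash
# Hypothetical request -- the route and payload field names may differ
curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/jobs/senior-backend-engineer"}'
```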
Response Example
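The exact response schema is not documented here; as a FastAPI service it presumably returns JSON, so the quickest way to inspect it is to pretty-print whatever comes back:

```bash
# Pretty-print the JSON response (expect the extracted company profile,
# enrichment data, and completeness/confidence scores described above)
curl -s -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/jobs/senior-backend-engineer"}' | python3 -m json.tool
```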
⚙️ Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `DEBUG` | Enable debug mode | `false` |
| `HOST` | Server host | `0.0.0.0` |
| `PORT` | Server port | `8000` |
| `DATABASE_URL` | Database connection string | `sqlite+aiosqlite:///./data/job_analyzer.db` |
| `MAX_CONCURRENT_REQUESTS` | Max concurrent HTTP requests | `10` |
| `REQUEST_TIMEOUT` | HTTP request timeout (seconds) | `30` |
| `CRAWL_DELAY` | Delay between requests (seconds) | `1.0` |
| `RESPECT_ROBOTS_TXT` | Respect robots.txt | `true` |
| `ENABLE_CRUNCHBASE` | Enable Crunchbase enrichment | `false` |
| `CRUNCHBASE_API_KEY` | Crunchbase API key | `""` |
| `DATA_RETENTION_DAYS` | Data retention period (days) | `90` |
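These can be set in a `.env` file or exported in the shell. A sketch of a typical override using the variables from the table above (the values shown are examples, not recommendations):

```bash
# Example overrides -- only set what you need to change from the defaults
export PORT=8080
export MAX_CONCURRENT_REQUESTS=5
export CRAWL_DELAY=2.0
export ENABLE_CRUNCHBASE=true
export CRUNCHBASE_API_KEY="your-api-key-here"
```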
📊 Monitoring
Metrics Endpoints
- Health Check: `GET /health`
- Prometheus Metrics: `GET /metrics`
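Both endpoints can be probed with plain curl once the service is running on the default port:

```bash
# Liveness/readiness check -- expects an HTTP 200
curl -i http://localhost:8000/health

# Prometheus exposition format, suitable as a scrape target
curl -s http://localhost:8000/metrics | head -n 20
```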
Key Metrics
- `job_analyzer_requests_total` - Total API requests
- `job_analyzer_analysis_success_total` - Successful analyses
- `job_analyzer_completeness_score` - Data completeness distribution
- `job_analyzer_crawl_requests_total` - Crawl requests by status
- `job_analyzer_enrichment_success_total` - Enrichment success by provider
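To spot-check an individual counter without a full Prometheus setup, the exposition output can simply be filtered:

```bash
# Filter the exposition output for one metric family
curl -s http://localhost:8000/metrics | grep '^job_analyzer_requests_total'
```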
🧪 Testing
Run Tests
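A sketch of the usual Poetry-based invocation; the coverage flags assume `pytest-cov` is configured, which the 80%+ coverage claim above implies but this README does not state explicitly:

```bash
# Run the full suite
poetry run pytest

# With coverage reporting (assumes pytest-cov is a dev dependency)
poetry run pytest --cov --cov-report=term-missing
```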
🚀 Deployment
Kubernetes
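Assuming the manifests live in a `k8s/` directory (not confirmed by this README), a typical rollout looks like the following; the namespace, deployment, and service names are placeholders:

```bash
# Apply the manifests (directory and namespace are assumptions)
kubectl apply -f k8s/ -n job-analyzer

# Watch the rollout and verify the health endpoint through a port-forward
kubectl rollout status deployment/job-url-analyzer -n job-analyzer
kubectl port-forward svc/job-url-analyzer 8000:8000 -n job-analyzer &
curl -i http://localhost:8000/health
```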
Production Checklist
- Environment variables configured
- Database migrations applied
- SSL certificates configured
- Monitoring dashboards set up
- Log aggregation configured
- Backup strategy implemented
- Rate limiting configured
- Resource limits set
🔧 Development
Project Structure
Code Quality
The project uses:
- Black for code formatting
- Ruff for linting
- MyPy for type checking
- Pre-commit hooks for quality gates
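The typical local invocations for those tools, assuming they are installed as Poetry dev dependencies and pre-commit is configured in the repository:

```bash
# Formatting, linting, and type checking (targets may differ per project config)
poetry run black .
poetry run ruff check .
poetry run mypy .

# Run all configured quality gates at once
poetry run pre-commit run --all-files
```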
📝 Recent Changes
Dependency Updates
- Fixed: Replaced the non-existent `aiohttp-robotparser` dependency with `robotexclusionrulesparser` for robots.txt parsing
- Improved: Setup process now works out of the box without requiring `.env` file configuration
🤝 Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Ensure all tests pass (`poetry run pytest`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🆘 Support
- Documentation: This README and inline code comments
- Issues: GitHub Issues for bug reports and feature requests
- Discussions: GitHub Discussions for questions and community
Built with ❤️ using FastAPI, SQLAlchemy, and modern Python tooling.