Job URL Analyzer MCP Server

by subslink326

A comprehensive FastAPI-based microservice for analyzing job URLs and extracting detailed company information. Built with modern async Python, this service crawls job postings and company websites to build rich company profiles with data enrichment from external providers.

✨ Features

  • 🕷️ Intelligent Web Crawling: Respectful crawling with robots.txt compliance and rate limiting
  • 🧠 Content Extraction: Advanced HTML parsing using Selectolax for fast, accurate data extraction
  • 🔗 Data Enrichment: Pluggable enrichment providers (Crunchbase, LinkedIn, custom APIs)
  • 📊 Quality Scoring: Completeness and confidence metrics for extracted data
  • 📝 Markdown Reports: Beautiful, comprehensive company analysis reports
  • 🔍 Observability: OpenTelemetry tracing, Prometheus metrics, structured logging
  • 🚀 Production Ready: Docker, Kubernetes, health checks, graceful shutdown
  • 🧪 Well Tested: Comprehensive test suite with 80%+ coverage
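
The crawling behavior described above — robots.txt compliance plus a per-host delay — can be sketched with the standard library. This is an illustrative sketch, not the project's actual crawler API; the class and method names here are hypothetical:

```python
import time
import urllib.robotparser
from urllib.parse import urlparse

class PoliteFetcher:
    """Check robots.txt before fetching and enforce a per-host crawl delay."""

    def __init__(self, user_agent: str = "job-analyzer-bot", crawl_delay: float = 1.0):
        self.user_agent = user_agent
        self.crawl_delay = crawl_delay
        self._robots: dict[str, urllib.robotparser.RobotFileParser] = {}
        self._last_fetch: dict[str, float] = {}

    def allowed(self, url: str) -> bool:
        """Fetch and cache robots.txt per host, then ask it about this URL."""
        host = urlparse(url).netloc
        if host not in self._robots:
            rp = urllib.robotparser.RobotFileParser()
            rp.set_url(f"https://{host}/robots.txt")
            try:
                rp.read()
            except OSError:
                # Unreachable robots.txt: fail open, as most crawlers do
                rp.allow_all = True
            self._robots[host] = rp
        return self._robots[host].can_fetch(self.user_agent, url)

    def wait_turn(self, url: str) -> None:
        """Sleep so at least `crawl_delay` seconds pass between hits on one host."""
        host = urlparse(url).netloc
        elapsed = time.monotonic() - self._last_fetch.get(host, 0.0)
        if elapsed < self.crawl_delay:
            time.sleep(self.crawl_delay - elapsed)
        self._last_fetch[host] = time.monotonic()
```

A real crawler would call `allowed()` and `wait_turn()` before each HTTP request.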

🏗️ Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   FastAPI App   │───▶│  Orchestrator   │───▶│   Web Crawler   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                               │                        │
                               ▼                        ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │    Database     │     │ Content Extract │
                        │  (SQLAlchemy)   │     └─────────────────┘
                        └─────────────────┘             │
                                                        ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │    Providers    │◀───│   Enrichment    │
                        │ (Crunchbase,etc)│     │     Manager     │
                        └─────────────────┘     └─────────────────┘
                                                        │
                                                        ▼
                                                ┌─────────────────┐
                                                │ Report Generator│
                                                └─────────────────┘
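
The pluggable enrichment layer in the diagram above can be sketched as a small provider interface plus a manager that fans out and merges results. The names below are hypothetical — the real interface lives in src/job_url_analyzer/enricher/ and may differ:

```python
import asyncio
from abc import ABC, abstractmethod

class EnrichmentProvider(ABC):
    """Interface each enrichment source (Crunchbase, LinkedIn, ...) implements."""

    name: str

    @abstractmethod
    async def enrich(self, company_name: str) -> dict:
        """Return extra profile fields for the company, or {} if nothing found."""

class EnrichmentManager:
    """Query each enabled provider in turn and merge results, first-writer-wins."""

    def __init__(self, providers: list[EnrichmentProvider]):
        self.providers = providers

    async def enrich(self, company_name: str) -> tuple[dict, list[str]]:
        merged: dict = {}
        sources: list[str] = []
        for provider in self.providers:
            data = await provider.enrich(company_name)
            if data:
                sources.append(provider.name)
                for key, value in data.items():
                    # Keep the first non-empty value a provider supplied
                    merged.setdefault(key, value)
        return merged, sources
```

First-writer-wins merging is one reasonable policy; a confidence-weighted merge would also fit the quality-scoring design.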

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Poetry (for dependency management)
  • Docker & Docker Compose (optional)

Local Development

  1. Clone and Setup
    git clone https://github.com/subslink326/job-url-analyzer-mcp.git
    cd job-url-analyzer-mcp
    poetry install
  2. Environment Configuration (Optional)
    # The application has sensible defaults and can run without environment configuration.
    # To customize settings, create a .env file with your configuration.
    # See src/job_url_analyzer/config.py for available settings.
  3. Database Setup
    poetry run alembic upgrade head
  4. Run Development Server
    poetry run python -m job_url_analyzer.main
    # Server starts at http://localhost:8000

Docker Deployment

  1. Development
    docker-compose up --build
  2. Production
    docker-compose -f docker-compose.prod.yml up -d

📡 API Usage

Analyze Job URL

curl -X POST "http://localhost:8000/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://company.com/jobs/software-engineer",
    "include_enrichment": true,
    "force_refresh": false
  }'
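
The same request can be made from Python with only the standard library. The helper names below are illustrative; the endpoint and payload fields mirror the curl call, and the service must be running locally:

```python
import json
import urllib.request

def build_payload(url: str, include_enrichment: bool = True,
                  force_refresh: bool = False) -> dict:
    """Request body for POST /analyze (same fields as the curl example)."""
    return {
        "url": url,
        "include_enrichment": include_enrichment,
        "force_refresh": force_refresh,
    }

def analyze(url: str, base_url: str = "http://localhost:8000", **options) -> dict:
    """Send the analysis request and decode the JSON response."""
    body = json.dumps(build_payload(url, **options)).encode()
    req = urllib.request.Request(
        f"{base_url}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```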

Response Example

{
  "profile_id": "123e4567-e89b-12d3-a456-426614174000",
  "source_url": "https://company.com/jobs/software-engineer",
  "company_profile": {
    "name": "TechCorp",
    "description": "Leading AI company...",
    "industry": "Technology",
    "employee_count": 150,
    "funding_stage": "Series B",
    "total_funding": 25.0,
    "headquarters": "San Francisco, CA",
    "tech_stack": ["Python", "React", "AWS"],
    "benefits": ["Health insurance", "Remote work"]
  },
  "completeness_score": 0.85,
  "confidence_score": 0.90,
  "processing_time_ms": 3450,
  "enrichment_sources": ["crunchbase"],
  "markdown_report": "# TechCorp - Company Analysis Report\n..."
}
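
One plausible way to produce the completeness_score shown above is a weighted share of populated profile fields. The field weights below are hypothetical, chosen only to illustrate the idea; the project's actual scoring logic may differ:

```python
# Hypothetical weights: core identity fields count more than nice-to-haves.
FIELD_WEIGHTS = {
    "name": 3.0, "description": 2.0, "industry": 1.0, "employee_count": 1.0,
    "funding_stage": 1.0, "total_funding": 1.0, "headquarters": 1.0,
    "tech_stack": 1.5, "benefits": 1.0,
}

def completeness_score(profile: dict) -> float:
    """Weighted fraction of populated fields, rounded to two decimals."""
    total = sum(FIELD_WEIGHTS.values())
    filled = sum(w for f, w in FIELD_WEIGHTS.items()
                 if profile.get(f) not in (None, "", [], {}))
    return round(filled / total, 2)
```

A fully populated profile scores 1.0; a profile with only a name and tech stack scores 4.5 / 12.5 = 0.36 under these weights.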

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| DEBUG | Enable debug mode | false |
| HOST | Server host | 0.0.0.0 |
| PORT | Server port | 8000 |
| DATABASE_URL | Database connection string | sqlite+aiosqlite:///./data/job_analyzer.db |
| MAX_CONCURRENT_REQUESTS | Max concurrent HTTP requests | 10 |
| REQUEST_TIMEOUT | HTTP request timeout (seconds) | 30 |
| CRAWL_DELAY | Delay between requests (seconds) | 1.0 |
| RESPECT_ROBOTS_TXT | Respect robots.txt | true |
| ENABLE_CRUNCHBASE | Enable Crunchbase enrichment | false |
| CRUNCHBASE_API_KEY | Crunchbase API key | "" |
| DATA_RETENTION_DAYS | Data retention period | 90 |
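
Reading these variables with typed defaults can be sketched as below. This stdlib sketch only mirrors a subset of the table; the project's real settings object is defined in src/job_url_analyzer/config.py and may use a different mechanism:

```python
import os
from dataclasses import dataclass, field

def _env_bool(name: str, default: bool) -> bool:
    """Treat "1", "true", "yes" (any case) as True; everything else as False."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes"}

@dataclass(frozen=True)
class Settings:
    debug: bool = field(default_factory=lambda: _env_bool("DEBUG", False))
    host: str = field(default_factory=lambda: os.getenv("HOST", "0.0.0.0"))
    port: int = field(default_factory=lambda: int(os.getenv("PORT", "8000")))
    database_url: str = field(default_factory=lambda: os.getenv(
        "DATABASE_URL", "sqlite+aiosqlite:///./data/job_analyzer.db"))
    crawl_delay: float = field(default_factory=lambda: float(os.getenv("CRAWL_DELAY", "1.0")))
    respect_robots_txt: bool = field(default_factory=lambda: _env_bool("RESPECT_ROBOTS_TXT", True))
```

Because each field reads the environment at instantiation time, every default in the table holds when no variable is set.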

📊 Monitoring

Metrics Endpoints

  • Health Check: GET /health
  • Prometheus Metrics: GET /metrics

Key Metrics

  • job_analyzer_requests_total - Total API requests
  • job_analyzer_analysis_success_total - Successful analyses
  • job_analyzer_completeness_score - Data completeness distribution
  • job_analyzer_crawl_requests_total - Crawl requests by status
  • job_analyzer_enrichment_success_total - Enrichment success by provider
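
Metrics like these are typically declared once with prometheus_client and updated from request handlers. A minimal sketch, assuming the prometheus-client package is installed; the label sets and buckets shown here are illustrative, not necessarily the project's:

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A dedicated registry avoids clashes with any globally registered metrics.
registry = CollectorRegistry()

REQUESTS_TOTAL = Counter(
    "job_analyzer_requests_total", "Total API requests",
    ["endpoint", "status"], registry=registry)
CRAWL_REQUESTS = Counter(
    "job_analyzer_crawl_requests_total", "Crawl requests by HTTP status",
    ["status"], registry=registry)
COMPLETENESS = Histogram(
    "job_analyzer_completeness_score", "Distribution of data completeness scores",
    buckets=[0.2, 0.4, 0.6, 0.8, 1.0], registry=registry)

# Typical updates from a request handler:
REQUESTS_TOTAL.labels(endpoint="/analyze", status="200").inc()
COMPLETENESS.observe(0.85)

# GET /metrics would serve this exposition text:
exposition = generate_latest(registry).decode()
```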

🧪 Testing

Run Tests

# Unit tests
poetry run pytest

# With coverage
poetry run pytest --cov=job_url_analyzer --cov-report=html

# Integration tests only
poetry run pytest -m integration

# Skip slow tests
poetry run pytest -m "not slow"

🚀 Deployment

Kubernetes

# Apply manifests
kubectl apply -f kubernetes/

# Check deployment
kubectl get pods -l app=job-analyzer
kubectl logs -f deployment/job-analyzer

Production Checklist

  • Environment variables configured
  • Database migrations applied
  • SSL certificates configured
  • Monitoring dashboards set up
  • Log aggregation configured
  • Backup strategy implemented
  • Rate limiting configured
  • Resource limits set

🔧 Development

Project Structure

job-url-analyzer/
├── src/job_url_analyzer/       # Main application code
│   ├── enricher/               # Enrichment providers
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration
│   ├── models.py               # Pydantic models
│   ├── database.py             # Database models
│   ├── crawler.py              # Web crawler
│   ├── extractor.py            # Content extraction
│   ├── orchestrator.py         # Main orchestrator
│   └── report_generator.py     # Report generation
├── tests/                      # Test suite
├── alembic/                    # Database migrations
├── kubernetes/                 # K8s manifests
├── monitoring/                 # Monitoring configs
├── docker-compose.yml          # Development setup
├── docker-compose.prod.yml     # Production setup
└── Dockerfile                  # Container definition

Code Quality

The project uses:

  • Black for code formatting
  • Ruff for linting
  • MyPy for type checking
  • Pre-commit hooks for quality gates
# Set up pre-commit
poetry run pre-commit install

# Run quality checks
poetry run black .
poetry run ruff check .
poetry run mypy src/

📝 Recent Changes

Dependency Updates

  • Fixed: Replaced non-existent aiohttp-robotparser dependency with robotexclusionrulesparser for robots.txt parsing
  • Improved: Setup process now works out-of-the-box without requiring .env file configuration

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass (poetry run pytest)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

  • Documentation: This README and inline code comments
  • Issues: GitHub Issues for bug reports and feature requests
  • Discussions: GitHub Discussions for questions and community

Built with ❤️ using FastAPI, SQLAlchemy, and modern Python tooling.
