Job URL Analyzer MCP Server

by subslink326

A comprehensive FastAPI-based microservice for analyzing job URLs and extracting detailed company information. Built with modern async Python, this service crawls job postings and company websites to build rich company profiles with data enrichment from external providers.

✨ Features

  • 🕷️ Intelligent Web Crawling: Respectful crawling with robots.txt compliance and rate limiting
  • 🧠 Content Extraction: Advanced HTML parsing using Selectolax for fast, accurate data extraction
  • 🔗 Data Enrichment: Pluggable enrichment providers (Crunchbase, LinkedIn, custom APIs)
  • 📊 Quality Scoring: Completeness and confidence metrics for extracted data
  • 📝 Markdown Reports: Beautiful, comprehensive company analysis reports
  • 🔍 Observability: OpenTelemetry tracing, Prometheus metrics, structured logging
  • 🚀 Production Ready: Docker, Kubernetes, health checks, graceful shutdown
  • 🧪 Well Tested: Comprehensive test suite with 80%+ coverage
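
The crawling behavior described above — robots.txt compliance plus a per-host delay — can be sketched with the standard library. This is an illustrative sketch, not the project's actual crawler API; the class and method names here are hypothetical:

```python
import time
import urllib.robotparser
from urllib.parse import urlparse

class PoliteFetcher:
    """Check robots.txt before fetching and enforce a per-host crawl delay."""

    def __init__(self, user_agent: str = "job-analyzer-bot", crawl_delay: float = 1.0):
        self.user_agent = user_agent
        self.crawl_delay = crawl_delay
        self._robots: dict[str, urllib.robotparser.RobotFileParser] = {}
        self._last_fetch: dict[str, float] = {}

    def allowed(self, url: str) -> bool:
        """Fetch and cache robots.txt per host, then ask it about this URL."""
        host = urlparse(url).netloc
        if host not in self._robots:
            rp = urllib.robotparser.RobotFileParser()
            rp.set_url(f"https://{host}/robots.txt")
            try:
                rp.read()
            except OSError:
                # Unreachable robots.txt: fail open, as most crawlers do
                rp.allow_all = True
            self._robots[host] = rp
        return self._robots[host].can_fetch(self.user_agent, url)

    def wait_turn(self, url: str) -> None:
        """Sleep so at least `crawl_delay` seconds pass between hits on one host."""
        host = urlparse(url).netloc
        elapsed = time.monotonic() - self._last_fetch.get(host, 0.0)
        if elapsed < self.crawl_delay:
            time.sleep(self.crawl_delay - elapsed)
        self._last_fetch[host] = time.monotonic()
```

A real crawler would call `allowed()` and `wait_turn()` before each HTTP request.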

🏗️ Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   FastAPI App   │───▶│  Orchestrator   │───▶│   Web Crawler   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                               │                        │
                               ▼                        ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │    Database     │     │ Content Extract │
                        │  (SQLAlchemy)   │     └─────────────────┘
                        └─────────────────┘             │
                                                        ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │    Providers    │◀───│   Enrichment    │
                        │ (Crunchbase,etc)│     │     Manager     │
                        └─────────────────┘     └─────────────────┘
                                                        │
                                                        ▼
                                                ┌─────────────────┐
                                                │ Report Generator│
                                                └─────────────────┘
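
The pluggable enrichment layer in the diagram above can be sketched as a small provider interface plus a manager that fans out and merges results. The names below are hypothetical — the real interface lives in src/job_url_analyzer/enricher/ and may differ:

```python
import asyncio
from abc import ABC, abstractmethod

class EnrichmentProvider(ABC):
    """Interface each enrichment source (Crunchbase, LinkedIn, ...) implements."""

    name: str

    @abstractmethod
    async def enrich(self, company_name: str) -> dict:
        """Return extra profile fields for the company, or {} if nothing found."""

class EnrichmentManager:
    """Query each enabled provider in turn and merge results, first-writer-wins."""

    def __init__(self, providers: list[EnrichmentProvider]):
        self.providers = providers

    async def enrich(self, company_name: str) -> tuple[dict, list[str]]:
        merged: dict = {}
        sources: list[str] = []
        for provider in self.providers:
            data = await provider.enrich(company_name)
            if data:
                sources.append(provider.name)
                for key, value in data.items():
                    # Keep the first non-empty value a provider supplied
                    merged.setdefault(key, value)
        return merged, sources
```

First-writer-wins merging is one reasonable policy; a confidence-weighted merge would also fit the quality-scoring design.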

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Poetry (for dependency management)
  • Docker & Docker Compose (optional)

Local Development

  1. Clone and Setup
    git clone https://github.com/subslink326/job-url-analyzer-mcp.git
    cd job-url-analyzer-mcp
    poetry install
  2. Environment Configuration (Optional)
    # The application has sensible defaults and can run without environment configuration.
    # To customize settings, create a .env file with your configuration.
    # See src/job_url_analyzer/config.py for available settings.
  3. Database Setup
    poetry run alembic upgrade head
  4. Run Development Server
    poetry run python -m job_url_analyzer.main
    # Server starts at http://localhost:8000

Docker Deployment

  1. Development
    docker-compose up --build
  2. Production
    docker-compose -f docker-compose.prod.yml up -d

📡 API Usage

Analyze Job URL

curl -X POST "http://localhost:8000/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://company.com/jobs/software-engineer",
    "include_enrichment": true,
    "force_refresh": false
  }'
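
The same request can be made from Python with only the standard library. The helper names below are illustrative; the endpoint and payload fields mirror the curl call, and the service must be running locally:

```python
import json
import urllib.request

def build_payload(url: str, include_enrichment: bool = True,
                  force_refresh: bool = False) -> dict:
    """Request body for POST /analyze (same fields as the curl example)."""
    return {
        "url": url,
        "include_enrichment": include_enrichment,
        "force_refresh": force_refresh,
    }

def analyze(url: str, base_url: str = "http://localhost:8000", **options) -> dict:
    """Send the analysis request and decode the JSON response."""
    body = json.dumps(build_payload(url, **options)).encode()
    req = urllib.request.Request(
        f"{base_url}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```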

Response Example

{
  "profile_id": "123e4567-e89b-12d3-a456-426614174000",
  "source_url": "https://company.com/jobs/software-engineer",
  "company_profile": {
    "name": "TechCorp",
    "description": "Leading AI company...",
    "industry": "Technology",
    "employee_count": 150,
    "funding_stage": "Series B",
    "total_funding": 25.0,
    "headquarters": "San Francisco, CA",
    "tech_stack": ["Python", "React", "AWS"],
    "benefits": ["Health insurance", "Remote work"]
  },
  "completeness_score": 0.85,
  "confidence_score": 0.90,
  "processing_time_ms": 3450,
  "enrichment_sources": ["crunchbase"],
  "markdown_report": "# TechCorp - Company Analysis Report\n..."
}
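
One plausible way to produce the completeness_score shown above is a weighted share of populated profile fields. The field weights below are hypothetical, chosen only to illustrate the idea; the project's actual scoring logic may differ:

```python
# Hypothetical weights: core identity fields count more than nice-to-haves.
FIELD_WEIGHTS = {
    "name": 3.0, "description": 2.0, "industry": 1.0, "employee_count": 1.0,
    "funding_stage": 1.0, "total_funding": 1.0, "headquarters": 1.0,
    "tech_stack": 1.5, "benefits": 1.0,
}

def completeness_score(profile: dict) -> float:
    """Weighted fraction of populated fields, rounded to two decimals."""
    total = sum(FIELD_WEIGHTS.values())
    filled = sum(w for f, w in FIELD_WEIGHTS.items()
                 if profile.get(f) not in (None, "", [], {}))
    return round(filled / total, 2)
```

A fully populated profile scores 1.0; a profile with only a name and tech stack scores 4.5 / 12.5 = 0.36 under these weights.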

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| DEBUG | Enable debug mode | false |
| HOST | Server host | 0.0.0.0 |
| PORT | Server port | 8000 |
| DATABASE_URL | Database connection string | sqlite+aiosqlite:///./data/job_analyzer.db |
| MAX_CONCURRENT_REQUESTS | Max concurrent HTTP requests | 10 |
| REQUEST_TIMEOUT | HTTP request timeout (seconds) | 30 |
| CRAWL_DELAY | Delay between requests (seconds) | 1.0 |
| RESPECT_ROBOTS_TXT | Respect robots.txt | true |
| ENABLE_CRUNCHBASE | Enable Crunchbase enrichment | false |
| CRUNCHBASE_API_KEY | Crunchbase API key | "" |
| DATA_RETENTION_DAYS | Data retention period | 90 |
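
Reading these variables with typed defaults can be sketched as below. This stdlib sketch only mirrors a subset of the table; the project's real settings object is defined in src/job_url_analyzer/config.py and may use a different mechanism:

```python
import os
from dataclasses import dataclass, field

def _env_bool(name: str, default: bool) -> bool:
    """Treat "1", "true", "yes" (any case) as True; everything else as False."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes"}

@dataclass(frozen=True)
class Settings:
    debug: bool = field(default_factory=lambda: _env_bool("DEBUG", False))
    host: str = field(default_factory=lambda: os.getenv("HOST", "0.0.0.0"))
    port: int = field(default_factory=lambda: int(os.getenv("PORT", "8000")))
    database_url: str = field(default_factory=lambda: os.getenv(
        "DATABASE_URL", "sqlite+aiosqlite:///./data/job_analyzer.db"))
    crawl_delay: float = field(default_factory=lambda: float(os.getenv("CRAWL_DELAY", "1.0")))
    respect_robots_txt: bool = field(default_factory=lambda: _env_bool("RESPECT_ROBOTS_TXT", True))
```

Because each field reads the environment at instantiation time, every default in the table holds when no variable is set.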

📊 Monitoring

Metrics Endpoints

  • Health Check: GET /health
  • Prometheus Metrics: GET /metrics

Key Metrics

  • job_analyzer_requests_total - Total API requests
  • job_analyzer_analysis_success_total - Successful analyses
  • job_analyzer_completeness_score - Data completeness distribution
  • job_analyzer_crawl_requests_total - Crawl requests by status
  • job_analyzer_enrichment_success_total - Enrichment success by provider
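
Metrics like these are typically declared once with prometheus_client and updated from request handlers. A minimal sketch, assuming the prometheus-client package is installed; the label sets and buckets shown here are illustrative, not necessarily the project's:

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A dedicated registry avoids clashes with any globally registered metrics.
registry = CollectorRegistry()

REQUESTS_TOTAL = Counter(
    "job_analyzer_requests_total", "Total API requests",
    ["endpoint", "status"], registry=registry)
CRAWL_REQUESTS = Counter(
    "job_analyzer_crawl_requests_total", "Crawl requests by HTTP status",
    ["status"], registry=registry)
COMPLETENESS = Histogram(
    "job_analyzer_completeness_score", "Distribution of data completeness scores",
    buckets=[0.2, 0.4, 0.6, 0.8, 1.0], registry=registry)

# Typical updates from a request handler:
REQUESTS_TOTAL.labels(endpoint="/analyze", status="200").inc()
COMPLETENESS.observe(0.85)

# GET /metrics would serve this exposition text:
exposition = generate_latest(registry).decode()
```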

🧪 Testing

Run Tests

# Unit tests
poetry run pytest

# With coverage
poetry run pytest --cov=job_url_analyzer --cov-report=html

# Integration tests only
poetry run pytest -m integration

# Skip slow tests
poetry run pytest -m "not slow"

🚀 Deployment

Kubernetes

# Apply manifests
kubectl apply -f kubernetes/

# Check deployment
kubectl get pods -l app=job-analyzer
kubectl logs -f deployment/job-analyzer

Production Checklist

  • Environment variables configured
  • Database migrations applied
  • SSL certificates configured
  • Monitoring dashboards set up
  • Log aggregation configured
  • Backup strategy implemented
  • Rate limiting configured
  • Resource limits set

🔧 Development

Project Structure

job-url-analyzer/
├── src/job_url_analyzer/       # Main application code
│   ├── enricher/               # Enrichment providers
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration
│   ├── models.py               # Pydantic models
│   ├── database.py             # Database models
│   ├── crawler.py              # Web crawler
│   ├── extractor.py            # Content extraction
│   ├── orchestrator.py         # Main orchestrator
│   └── report_generator.py     # Report generation
├── tests/                      # Test suite
├── alembic/                    # Database migrations
├── kubernetes/                 # K8s manifests
├── monitoring/                 # Monitoring configs
├── docker-compose.yml          # Development setup
├── docker-compose.prod.yml     # Production setup
└── Dockerfile                  # Container definition

Code Quality

The project uses:

  • Black for code formatting
  • Ruff for linting
  • MyPy for type checking
  • Pre-commit hooks for quality gates
# Set up pre-commit
poetry run pre-commit install

# Run quality checks
poetry run black .
poetry run ruff check .
poetry run mypy src/

📝 Recent Changes

Dependency Updates

  • Fixed: Replaced non-existent aiohttp-robotparser dependency with robotexclusionrulesparser for robots.txt parsing
  • Improved: Setup process now works out-of-the-box without requiring .env file configuration

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass (poetry run pytest)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

  • Documentation: This README and inline code comments
  • Issues: GitHub Issues for bug reports and feature requests
  • Discussions: GitHub Discussions for questions and community

Built with ❤️ using FastAPI, SQLAlchemy, and modern Python tooling.
