MCP Job Search

by adamd9

This project implements a LinkedIn job scraper with persistent job indexing, deep scanning, and filtering capabilities using Cloudflare Workers. It scrapes LinkedIn job listings, performs detailed analysis of each job against a candidate profile using OpenAI, stores matches in Cloudflare KV, and exposes MCP-compatible HTTP endpoints.

Architecture

This implementation uses:

  • Cloudflare Workers for serverless execution
  • Cloudflare's Playwright fork for web scraping
  • Cloudflare KV for persistent data storage
  • OpenAI API for job analysis and matching
  • MCP (Model Context Protocol) for tool integration

Current Status

The worker is fully functional and can be run locally. It includes a complete MCP server with authentication, CORS handling, a health check endpoint, and SSE endpoints for real-time updates.

Some tools are currently implemented as stubs that return mock data, which allows the end-to-end flow to be tested. The plan management tools (get_plan, create_plan, update_plan) and email digest functionality (send_digest) are fully implemented.

Setup

Prerequisites

  • Node.js (v18 or later)
  • npm or yarn
  • Cloudflare account (for deployment)

Environment Variables

Create a .env file in the project root with the following variables:

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key
OPENAI_MODEL=gpt-4o

# LinkedIn Credentials (for scraping)
LINKEDIN_EMAIL=your-linkedin-email@example.com
LINKEDIN_PASSWORD=your-linkedin-password

# Email Configuration (for digests)
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=your-smtp-username
SMTP_PASS=your-smtp-password
DIGEST_FROM=jobs@example.com
DIGEST_TO=you@example.com

# Application Settings
TIMEZONE=Australia/Sydney
ACCESS_TOKEN=your-secure-random-token
DEEP_SCAN_CONCURRENCY=2

The ACCESS_TOKEN is used for API authentication and should be a secure random string.
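
One quick way to generate a suitable token (a minimal Node.js sketch; any cryptographically random string works):

// Sketch: print a 64-character hex token suitable for ACCESS_TOKEN.
import { randomBytes } from "node:crypto";
console.log(randomBytes(32).toString("hex"));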

Installation

  1. Install dependencies:
    npm install
  2. Set up your environment variables in .env
  3. Create a job search plan (see Plan Management section below)

Running the Worker

Start the development server:

npm run dev

The worker will be available at http://localhost:8787.
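
To confirm it is up, you can hit the unauthenticated health check (a minimal sketch; the route is documented under HTTP Endpoints below):

// Sketch: verify the local worker responds on its /health endpoint.
const health = await fetch("http://localhost:8787/health");
console.log(health.status, await health.text());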

Core Features

Plan-Driven Search

Define your profile, search terms, and scan criteria in a job search plan. The system uses this plan to:

  • Target relevant job searches on LinkedIn
  • Analyze job matches against your profile using OpenAI
  • Score jobs based on fit and requirements

Persistent Job Index

All scraped jobs are stored persistently in Cloudflare KV with:

  • Job details and metadata
  • Match scores and analysis
  • Scan history and timestamps
  • Deduplication to avoid processing the same job twice
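
A minimal sketch of the deduplication step mentioned above, assuming a Workers KV binding and a job-keyed scheme (the names JobRecord, upsertJob, and the key format are assumptions, not the project's actual code):

// Hypothetical sketch: skip jobs already present in the KV-backed index.
// KVNamespace is the Cloudflare Workers KV binding type.
interface JobRecord {
  id: string;
  url: string;
  matchScore?: number;
  scannedAt?: string;
}

async function upsertJob(kv: KVNamespace, job: JobRecord): Promise<boolean> {
  const key = `job:${job.id}`;
  if ((await kv.get(key)) !== null) {
    return false; // already indexed: skip re-processing
  }
  await kv.put(key, JSON.stringify(job));
  return true; // new job: queue it for deep scanning
}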

Deep Scanning

The deep scanner visits each job posting to extract comprehensive details:

  • Full job description and requirements
  • Company information and culture
  • Salary and benefits information
  • AI-powered analysis against your profile
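
As a rough illustration of the AI analysis step, a hedged sketch using the OpenAI chat completions API; the prompt shape, scoring scale, and response format are assumptions, and the project may structure this differently:

// Hypothetical sketch: score a scraped job description against the
// candidate profile via OpenAI's chat completions endpoint.
async function scoreJob(description: string, profile: string, apiKey: string) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [
        {
          role: "system",
          content:
            'Score this job from 0 to 100 against the candidate profile. ' +
            'Reply with JSON like {"score": 85, "reasons": ["..."]}.',
        },
        { role: "user", content: `Profile:\n${profile}\n\nJob:\n${description}` },
      ],
    }),
  });
  const data: any = await res.json();
  // Assumes the model honors the JSON-only instruction.
  return JSON.parse(data.choices[0].message.content);
}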

Email Digests

Automated email summaries of your best job matches:

  • Configurable match score thresholds
  • Rich HTML formatting with job details
  • Direct links to job postings
  • Scheduled delivery options
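
The threshold filter might look roughly like this (the score scale, field names, and function name are assumptions for illustration):

// Hypothetical sketch: pick unsent jobs above a match-score threshold.
interface DigestJob {
  url: string;
  matchScore: number;
  sentInDigest: boolean;
}

function selectDigestJobs(jobs: DigestJob[], threshold = 70): DigestJob[] {
  return jobs
    .filter((j) => j.matchScore >= threshold && !j.sentInDigest)
    .sort((a, b) => b.matchScore - a.matchScore);
}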

API Reference

MCP Tools

The following tools are available via the MCP server:

Plan Management
  • get_plan: Get the current job search plan
  • create_plan: Create a new job search plan from a description
  • update_plan: Update the existing job search plan
Job Scanning & Analysis
  • scan: Scan LinkedIn job pages using Playwright. If a URL is provided, scans that page; otherwise uses the URLs from the plan
  • rescan: Rescan existing jobs using URLs from the last scan or current plan
  • deep_scan_job: Manually deep scan a specific LinkedIn job URL for testing and debugging
  • failed_jobs: Get a report of jobs that failed during deep scanning with error analysis
Job Index Management
  • get_job_index: Get the current raw job index data for inspection (with filtering options)
  • reset_job_index: Reset the job index to start fresh - removes all stored jobs
System Operations
  • status: Check the status of background jobs (scan progress, errors, etc.)
  • send_digest: Send a digest email with job matches to a specified email address

HTTP Endpoints

The worker exposes HTTP endpoints for direct API access:

Core Endpoints
  • GET /health - Health check endpoint (no authentication required)
  • POST /mcp - MCP server endpoint (handles all tool calls with authentication)

Note: All MCP tools are accessed via the /mcp endpoint using the MCP protocol. The worker uses token-based authentication for the MCP endpoint.
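
For example, a tool call against the local worker might look like this (a hedged sketch assuming the standard MCP JSON-RPC shape; the worker's exact request handling may differ):

// Sketch: invoke the status tool via the /mcp endpoint using MCP's
// standard JSON-RPC "tools/call" method.
const res = await fetch("http://localhost:8787/mcp", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.ACCESS_TOKEN}`,
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "tools/call",
    params: { name: "status", arguments: {} },
  }),
});
console.log(await res.json());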

Plan Management

The job search plan is the core configuration that drives the entire system. It defines your candidate profile, the searches to run, and the prompt used for job analysis.

Plan Structure

{ "profile": { "name": "Your Name", "experience": "Senior Software Engineer with 8+ years...", "skills": ["JavaScript", "React", "Node.js", "AWS"], "preferences": { "remote": true, "location": "Sydney, Australia", "salary_min": 120000 } }, "searches": [ { "keywords": "Senior Software Engineer React", "location": "Sydney", "filters": { "experience_level": "mid_senior", "job_type": "full_time" } } ], "scan_prompt": "Analyze this job posting for a senior software engineer..." }

Creating a Plan

You can create a plan in several ways:

  1. Via MCP Tool: Use the create_plan tool with a natural language description
  2. Via HTTP API: POST to /plan with either a JSON plan or a natural language description
  3. Direct File: Create a plan.json file in the project root

Plan Examples

Natural Language Description:

I'm a senior full-stack developer with 8 years experience in React, Node.js, and AWS. I'm looking for remote senior engineer roles in fintech or healthcare, preferably $120k+ with equity options.

Structured JSON:

{ "profile": { "name": "Senior Developer", "experience": "8+ years full-stack development", "skills": ["React", "Node.js", "AWS", "TypeScript"] }, "searches": [ { "keywords": "Senior Full Stack Engineer", "location": "Remote" } ] }

Deployment

Local Development

For local development, the worker runs using Wrangler:

npm run dev

This starts a local development server at http://localhost:8787.

Production Deployment

To deploy to Cloudflare Workers:

  1. Configure Wrangler: Ensure you have a wrangler.toml file configured
  2. Set Environment Variables: Configure secrets in Cloudflare Workers dashboard
  3. Deploy: Run the deployment command
npm run deploy

Environment Variables in Production

Set these as secrets in your Cloudflare Workers environment:

wrangler secret put OPENAI_API_KEY
wrangler secret put LINKEDIN_EMAIL
wrangler secret put LINKEDIN_PASSWORD
wrangler secret put SMTP_HOST
wrangler secret put SMTP_USER
wrangler secret put SMTP_PASS
wrangler secret put ACCESS_TOKEN

Implementation Status

✅ Fully Implemented Features

Core Infrastructure
  • MCP server with complete tool integration
  • Cloudflare Workers runtime environment
  • Token-based authentication and CORS handling
  • Background job processing with status tracking
Plan Management
  • Plan Creation & Updates: AI-powered plan generation from natural language descriptions
  • Plan Storage: Persistent storage in Cloudflare KV
  • Search URL Generation: Automatic LinkedIn search URL creation
  • Plan Feedback: AI analysis and recommendations for plan improvement
Job Scanning & Analysis
  • LinkedIn Scraping: Full Playwright-based job page scraping
  • Deep Scanning: Individual job analysis with OpenAI integration
  • Background Processing: Non-blocking scan operations with status tracking
  • Error Handling: Comprehensive error reporting and failed job analysis
  • Fallback Matching: Keyword-based matching when AI is unavailable
Job Index Management
  • Persistent Storage: Cloudflare KV-based job index with deduplication
  • Job Tracking: Scan status, match scores, and metadata storage
  • Index Inspection: Detailed job index viewing with filtering options
  • Index Reset: Complete job index cleanup functionality
Email Digest System
  • SMTP Integration: Nodemailer-based email sending
  • HTML Email Generation: Rich formatting with job details and links
  • Auto-digest: Automatic email sending after scan completion
  • Job Tracking: Mark jobs as sent to avoid duplicates
Debugging & Monitoring
  • Manual Deep Scan: Test individual job URLs for debugging
  • Failed Jobs Report: Detailed analysis of scan failures with error categorization
  • Status Monitoring: Real-time background job status tracking

Authentication

API endpoints are protected with token-based authentication. Include your ACCESS_TOKEN in requests:

curl -H "Authorization: Bearer your-access-token" http://localhost:8787/status
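
Inside the worker, the check presumably resembles the following (a sketch under assumed names, not the project's actual code):

// Hypothetical sketch of the bearer-token check performed by the worker.
function isAuthorized(request: Request, env: { ACCESS_TOKEN: string }): boolean {
  const header = request.headers.get("Authorization") ?? "";
  const token = header.replace(/^Bearer\s+/i, "");
  return token.length > 0 && token === env.ACCESS_TOKEN;
}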

Troubleshooting

Common Issues

  1. LinkedIn Authentication: Ensure your LinkedIn credentials are correct and the account isn't locked
  2. OpenAI API: Verify your API key has sufficient credits and proper permissions
  3. Email Delivery: Check SMTP settings and ensure the sender email is authorized
  4. Environment Variables: Verify all required variables are set in your .env file

Known Warnings

When testing email functionality, you may see network-related warnings in the Cloudflare Workers environment:

  • "Failed to resolve IPv4 addresses with current network"
  • "Possible EventEmitter memory leak detected"

These warnings are environmental and do not prevent the email functionality from working correctly.

Development Tips

  • Use the /health endpoint to verify the worker is running
  • Check the browser console for detailed error messages
  • Use the mock data endpoints for testing without external dependencies
  • Test plan creation with natural language descriptions before implementing complex JSON structures

Architecture Notes

Data Storage

The worker uses Cloudflare KV for persistent storage of job indexes, search plans, and scan history.

CORS Handling

Comprehensive CORS support is included for cross-origin requests from web applications.
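
A minimal sketch of what that can look like in a Worker (the exact header values are assumptions):

// Hypothetical sketch: attach permissive CORS headers to every response.
const CORS_HEADERS: Record<string, string> = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Authorization, Content-Type",
};

function withCors(response: Response): Response {
  const headers = new Headers(response.headers);
  for (const [k, v] of Object.entries(CORS_HEADERS)) headers.set(k, v);
  return new Response(response.body, {
    status: response.status,
    headers,
  });
}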

SSE Support

Server-Sent Events are supported for real-time updates during long-running operations like job scanning.
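
In a Worker, an SSE response is typically built from a ReadableStream; a minimal sketch, with the event payload format assumed:

// Hypothetical sketch: stream scan progress updates as Server-Sent Events.
function sseResponse(events: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const data of events) {
        controller.enqueue(encoder.encode(`data: ${data}\n\n`));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}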

Limitations

The Worker implementation has some limitations compared to the Node.js version:

  1. No Raw File Storage: The Worker cannot store raw HTML or job extraction files due to lack of filesystem access.
  2. No Screenshots: Screenshot capture is not supported in the Worker environment.
  3. Limited Storage: Job data is stored in Cloudflare KV, which has size limitations.