MCP Job Search
This project implements a LinkedIn job scraper with persistent job indexing, deep scanning, and filtering capabilities using Cloudflare Workers. It scrapes LinkedIn job listings, performs detailed analysis of each job against a candidate profile using OpenAI, stores matches in Cloudflare KV, and exposes MCP-compatible HTTP endpoints.
Architecture
This implementation uses:
- Cloudflare Workers for serverless execution
- Cloudflare's Playwright fork for web scraping
- Cloudflare KV for persistent data storage
- OpenAI API for job analysis and matching
- MCP (Model Context Protocol) for tool integration
Current Status
The worker is fully functional and can be run locally. It includes a complete MCP server with authentication, CORS handling, a health check endpoint, and SSE endpoints for real-time updates.
Some tools are currently implemented as stubs returning mock data, allowing the end-to-end flow to be tested. The plan management tools (`get_plan`, `create_plan`, `update_plan`) and the email digest functionality (`send_digest`) are fully implemented.
Setup
Prerequisites
- Node.js (v18 or later)
- npm or yarn
- Cloudflare account (for deployment)
Environment Variables
Create a `.env` file in the project root with the following variables:
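The exact variable names depend on the project's code; the set below is illustrative, based on the services this README describes (only `ACCESS_TOKEN` is referenced by name elsewhere in this document, the other names are assumptions):

```env
# API authentication token for the MCP endpoint (use a secure random string)
ACCESS_TOKEN=your-secure-random-string

# OpenAI API key used for job analysis and matching (name is an assumption)
OPENAI_API_KEY=sk-your-key

# LinkedIn credentials used by the Playwright scraper (names are assumptions)
LINKEDIN_EMAIL=you@example.com
LINKEDIN_PASSWORD=your-password

# SMTP settings for the email digest (names are assumptions)
SMTP_HOST=smtp.example.com
SMTP_USER=digest@example.com
SMTP_PASS=your-smtp-password
EMAIL_TO=you@example.com
```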
The `ACCESS_TOKEN` is used for API authentication and should be a secure random string.
Installation
- Install dependencies (see the command below)
- Set up your environment variables in `.env`
- Create a job search plan (see the Plan Management section below)
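Assuming standard npm-based tooling (the project manages its dependencies with npm), installation is the usual:

```bash
npm install
```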
Running the Worker
Start the development server:
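Cloudflare Workers projects are typically run locally through Wrangler; the exact npm script name here is an assumption:

```bash
npm run dev
# or invoke Wrangler directly
npx wrangler dev
```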
The worker will be available at `http://localhost:8787`.
Core Features
Plan-Driven Search
Define your profile, search terms, and scan criteria in a job search plan. The system uses this plan to:
- Target relevant job searches on LinkedIn
- Analyze job matches against your profile using OpenAI
- Score jobs based on fit and requirements
Persistent Job Index
All scraped jobs are stored persistently in Cloudflare KV with:
- Job details and metadata
- Match scores and analysis
- Scan history and timestamps
- Deduplication to avoid processing the same job twice
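As a rough sketch of what a stored job record could look like (field names are illustrative, not taken from the project's actual schema):

```typescript
// Illustrative shape of a job record persisted in Cloudflare KV.
// Field names are assumptions; the real index schema may differ.
interface JobRecord {
  id: string;              // deduplication key, e.g. the LinkedIn job ID
  url: string;             // link to the job posting
  title: string;
  company: string;
  description?: string;    // populated by a deep scan
  matchScore?: number;     // score from the OpenAI analysis
  matchReason?: string;    // explanation generated by the model
  scannedAt?: string;      // ISO timestamp of the last deep scan
  sentInDigest?: boolean;  // marked once emailed, to avoid duplicates
}
```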
Deep Scanning
Visits each job posting to extract comprehensive details:
- Full job description and requirements
- Company information and culture
- Salary and benefits information
- AI-powered analysis against your profile
Email Digests
Automated email summaries of your best job matches:
- Configurable match score thresholds
- Rich HTML formatting with job details
- Direct links to job postings
- Scheduled delivery options
API Reference
MCP Tools
The following tools are available via the MCP server:
Plan Management
- `get_plan`: Get the current job search plan
- `create_plan`: Create a new job search plan from a description
- `update_plan`: Update the existing job search plan
Job Scanning & Analysis
- `scan`: Scan LinkedIn job pages using Playwright; if a URL is provided, scans that page, otherwise uses the plan's URLs
- `rescan`: Rescan existing jobs using URLs from the last scan or the current plan
- `deep_scan_job`: Manually deep scan a specific LinkedIn job URL for testing and debugging
- `failed_jobs`: Get a report of jobs that failed during deep scanning, with error analysis
Job Index Management
- `get_job_index`: Get the current raw job index data for inspection (with filtering options)
- `reset_job_index`: Reset the job index to start fresh; removes all stored jobs
System Operations
- `status`: Check the status of background jobs (scan progress, errors, etc.)
- `send_digest`: Send a digest email with job matches to the specified email address
HTTP Endpoints
The worker exposes HTTP endpoints for direct API access:
Core Endpoints
- `GET /health` - Health check endpoint (no authentication required)
- `POST /mcp` - MCP server endpoint (handles all tool calls with authentication)

Note: All MCP tools are accessed via the `/mcp` endpoint using the MCP protocol. The worker uses token-based authentication for the MCP endpoint.
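As a rough illustration only (the exact request shape depends on the MCP transport and session handling, and the Bearer scheme is an assumption), a `get_plan` tool call sent as an MCP JSON-RPC message could look like:

```bash
curl -X POST http://localhost:8787/mcp \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": { "name": "get_plan", "arguments": {} }
      }'
```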
Plan Management
The job search plan is the core configuration that drives the entire system. It defines your candidate profile, the LinkedIn searches to run, and the criteria used to score and filter matches.
Plan Structure
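The concrete schema lives in the project code; a hypothetical shape, inferred from the features described above, might be:

```typescript
// Hypothetical plan shape; field names are assumptions, not the actual schema.
interface JobSearchPlan {
  profile: string;        // candidate summary used for AI matching
  searchTerms: string[];  // keywords/titles to search on LinkedIn
  locations?: string[];   // preferred locations, e.g. "Remote"
  searchUrls?: string[];  // generated LinkedIn search URLs
  minMatchScore?: number; // threshold for including a job in the digest
}
```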
Creating a Plan
You can create a plan in several ways:
- Via MCP Tool: Use the `create_plan` tool with a natural language description
- Via HTTP API: POST to `/plan` with either JSON or a description
- Direct File: Create a `plan.json` file in the project root
Plan Examples
Natural Language Description:
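For example, a description passed to `create_plan` could be as simple as:

```
Senior backend engineer with 8 years of Python and AWS experience,
looking for remote roles at product companies. Prioritize jobs that
mention distributed systems, and only email me matches scoring 80+.
```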
Structured JSON:
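A corresponding JSON plan, using the hypothetical field names from the Plan Structure sketch above, might look like:

```json
{
  "profile": "Senior backend engineer, 8 years of Python and AWS, prefers remote product companies",
  "searchTerms": ["senior backend engineer", "python engineer"],
  "locations": ["Remote"],
  "minMatchScore": 80
}
```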
Deployment
Local Development
For local development, the worker runs using Wrangler:
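With Wrangler available (for example as a dev dependency), this is typically:

```bash
npx wrangler dev
```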
This starts a local development server at `http://localhost:8787`.
Production Deployment
To deploy to Cloudflare Workers:
- Configure Wrangler: Ensure you have a `wrangler.toml` file configured
- Set Environment Variables: Configure secrets in the Cloudflare Workers dashboard
- Deploy: Run the deployment command (see below)
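With recent Wrangler versions, deployment is:

```bash
npx wrangler deploy
```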
Environment Variables in Production
Set these as secrets in your Cloudflare Workers environment:
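Secrets can be set in the Workers dashboard or via Wrangler. For example (variable names follow the illustrative `.env` example above, most of which are assumptions):

```bash
npx wrangler secret put ACCESS_TOKEN
npx wrangler secret put OPENAI_API_KEY
# repeat for the LinkedIn and SMTP variables used by your configuration
```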
Implementation Status
✅ Fully Implemented Features
Core Infrastructure
- MCP server with complete tool integration
- Cloudflare Workers runtime environment
- Token-based authentication and CORS handling
- Background job processing with status tracking
Plan Management
- Plan Creation & Updates: AI-powered plan generation from natural language descriptions
- Plan Storage: Persistent storage in Cloudflare KV
- Search URL Generation: Automatic LinkedIn search URL creation
- Plan Feedback: AI analysis and recommendations for plan improvement
Job Scanning & Analysis
- LinkedIn Scraping: Full Playwright-based job page scraping
- Deep Scanning: Individual job analysis with OpenAI integration
- Background Processing: Non-blocking scan operations with status tracking
- Error Handling: Comprehensive error reporting and failed job analysis
- Fallback Matching: Keyword-based matching when AI is unavailable
Job Index Management
- Persistent Storage: Cloudflare KV-based job index with deduplication
- Job Tracking: Scan status, match scores, and metadata storage
- Index Inspection: Detailed job index viewing with filtering options
- Index Reset: Complete job index cleanup functionality
Email Digest System
- SMTP Integration: Nodemailer-based email sending
- HTML Email Generation: Rich formatting with job details and links
- Auto-digest: Automatic email sending after scan completion
- Job Tracking: Mark jobs as sent to avoid duplicates
Debugging & Monitoring
- Manual Deep Scan: Test individual job URLs for debugging
- Failed Jobs Report: Detailed analysis of scan failures with error categorization
- Status Monitoring: Real-time background job status tracking
Authentication
API endpoints are protected with token-based authentication. Include your `ACCESS_TOKEN` in requests:
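For example, assuming a standard Bearer scheme (the exact header format is an assumption based on the token-based setup described above):

```bash
curl http://localhost:8787/mcp \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```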
Troubleshooting
Common Issues
- LinkedIn Authentication: Ensure your LinkedIn credentials are correct and the account isn't locked
- OpenAI API: Verify your API key has sufficient credits and proper permissions
- Email Delivery: Check SMTP settings and ensure the sender email is authorized
- Environment Variables: Verify all required variables are set in your `.env` file
Known Warnings
When testing email functionality, you may see network-related warnings in the Cloudflare Workers environment:
- "Failed to resolve IPv4 addresses with current network"
- "Possible EventEmitter memory leak detected"
These are environmental warnings and do not affect functionality.
Development Tips
- Use the `/health` endpoint to verify the worker is running
- Check the browser console for detailed error messages
- Use the mock data endpoints for testing without external dependencies
- Test plan creation with natural language descriptions before implementing complex JSON structures
Architecture Notes
Data Storage
The worker uses Cloudflare KV for persistent storage of job indexes, search plans, and scan history.
CORS Handling
Comprehensive CORS support is included for cross-origin requests from web applications.
SSE Support
Server-Sent Events are supported for real-time updates during long-running operations like job scanning.
Limitations
The Worker implementation has some limitations compared to the Node.js version:
- No Raw File Storage: The Worker cannot store raw HTML or job extraction files due to lack of filesystem access.
- No Screenshots: Screenshot capture is not supported in the Worker environment.
- Limited Storage: Job data is stored in Cloudflare KV, which has size limitations.