MCP Job Search
This project implements a LinkedIn job scraper with persistent job indexing, deep scanning, and filtering capabilities using Cloudflare Workers. It scrapes LinkedIn job listings, performs detailed analysis of each job against a candidate profile using OpenAI, stores matches in Cloudflare KV, and exposes MCP-compatible HTTP endpoints.
Architecture
This implementation uses:
Cloudflare Workers for serverless execution
Cloudflare's Playwright fork for web scraping
Cloudflare KV for persistent data storage
OpenAI API for job analysis and matching
MCP (Model Context Protocol) for tool integration
Current Status
The worker is fully functional and can be run locally. It includes a complete MCP server with authentication, CORS handling, a health check endpoint, and SSE endpoints for real-time updates.
Some tools are currently implemented as stubs returning mock data, allowing the end-to-end flow to be tested. The plan management tools (get_plan, create_plan, update_plan) and the email digest functionality (send_digest) are fully implemented.
Setup
Prerequisites
Node.js (v18 or later)
npm or yarn
Cloudflare account (for deployment)
Environment Variables
Create a .env file in the project root. The ACCESS_TOKEN is used for API authentication and should be a secure random string.
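An illustrative .env is shown below. ACCESS_TOKEN is the only name confirmed by this README; the remaining variable names are assumptions based on the features described (OpenAI analysis, LinkedIn scraping, SMTP digests):

```
# API authentication token for the MCP endpoint (required)
ACCESS_TOKEN=your-secure-random-string

# OpenAI API key for job analysis (assumed name)
OPENAI_API_KEY=sk-your-openai-key

# LinkedIn credentials for scraping (assumed names)
LINKEDIN_EMAIL=you@example.com
LINKEDIN_PASSWORD=your-linkedin-password

# SMTP settings for email digests (assumed names)
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=digest-sender@example.com
SMTP_PASS=your-smtp-password
```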
Installation
Install dependencies: npm install
Set up your environment variables in .env
Create a job search plan (see the Plan Management section below)
Running the Worker
Start the development server:
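If the project defines a dev script in package.json (an assumption), this is typically:

```
npm run dev
```

Otherwise, invoke Wrangler directly with npx wrangler dev.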
The worker will be available at http://localhost:8787.
Core Features
Plan-Driven Search
Define your profile, search terms, and scan criteria in a job search plan. The system uses this plan to:
Target relevant job searches on LinkedIn
Analyze job matches against your profile using OpenAI
Score jobs based on fit and requirements
Persistent Job Index
All scraped jobs are stored persistently in Cloudflare KV with:
Job details and metadata
Match scores and analysis
Scan history and timestamps
Deduplication to avoid processing the same job twice
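As a rough sketch, a stored record might have a shape like the following TypeScript interface (field names here are illustrative, not the project's actual schema):

```typescript
// Illustrative shape of one job record in the KV index (field names assumed).
interface IndexedJob {
  id: string;                // LinkedIn job ID, used as the deduplication key
  url: string;               // link to the original job posting
  title: string;
  company: string;
  description?: string;      // populated by a deep scan
  matchScore?: number;       // fit score from the OpenAI analysis
  matchExplanation?: string; // why the job does or doesn't match the profile
  scannedAt?: string;        // ISO timestamp of the last deep scan
  sentInDigest?: boolean;    // set once the job has appeared in an email digest
}
```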
Deep Scanning
Visits each job posting to extract comprehensive details:
Full job description and requirements
Company information and culture
Salary and benefits information
AI-powered analysis against your profile
Email Digests
Automated email summaries of your best job matches:
Configurable match score thresholds
Rich HTML formatting with job details
Direct links to job postings
Scheduled delivery options
API Reference
MCP Tools
The following tools are available via the MCP server:
Plan Management
get_plan: Get the current job search plan
create_plan: Create a new job search plan from a description
update_plan: Update the existing job search plan
Job Scanning & Analysis
scan: Scan LinkedIn job pages using Playwright; if a URL is provided, scans that page, otherwise uses the plan's URLs
rescan: Rescan existing jobs using URLs from the last scan or the current plan
deep_scan_job: Manually deep scan a specific LinkedIn job URL for testing and debugging
failed_jobs: Get a report of jobs that failed during deep scanning, with error analysis
Job Index Management
get_job_index: Get the current raw job index data for inspection (with filtering options)
reset_job_index: Reset the job index to start fresh; removes all stored jobs
System Operations
status: Check the status of background jobs (scan progress, errors, etc.)
send_digest: Send a digest email with job matches to a specified email address
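For example, invoking the status tool over the MCP protocol is a standard JSON-RPC tools/call request (a sketch of the protocol message, not output captured from this server):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "status",
    "arguments": {}
  }
}
```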
HTTP Endpoints
The worker exposes HTTP endpoints for direct API access:
Core Endpoints
GET /health - Health check endpoint (no authentication required)
POST /mcp - MCP server endpoint (handles all tool calls with authentication)
Note: All MCP tools are accessed via the /mcp endpoint using the MCP protocol. The worker uses token-based authentication for the MCP endpoint.
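For example, a quick local check against the unauthenticated health endpoint:

```
curl http://localhost:8787/health
```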
Plan Management
The job search plan is the core configuration that drives the entire system. It defines your candidate profile, the search terms used to find jobs on LinkedIn, and the criteria used to scan and score matches.
Plan Structure
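The exact schema lives in the code; the sketch below is inferred from the features this README describes, and every field name in it is an assumption:

```typescript
// Hypothetical plan shape inferred from this README; not the actual schema.
interface JobSearchPlan {
  profile: string;        // candidate summary used for AI matching
  searchTerms: string[];  // keywords used to build LinkedIn search URLs
  searchUrls?: string[];  // generated LinkedIn search URLs
  minMatchScore?: number; // threshold for inclusion in email digests
  digestEmail?: string;   // recipient address for digest emails
}
```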
Creating a Plan
You can create a plan in several ways:
Via MCP Tool: Use the create_plan tool with a natural language description
Via HTTP API: POST to /plan with either JSON or a description
Direct File: Create a plan.json file in the project root
Plan Examples
Natural Language Description:
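For instance (an illustrative description, not taken from the project):

```
I'm a senior TypeScript engineer with 8 years of backend experience,
looking for remote roles at product companies. Only send me jobs that
score above 70 and email the digest to you@example.com.
```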
Structured JSON:
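A corresponding JSON plan, using the same assumed field names as the sketch in Plan Structure above:

```json
{
  "profile": "Senior TypeScript engineer with 8 years of backend experience",
  "searchTerms": ["typescript", "backend engineer", "remote"],
  "minMatchScore": 70,
  "digestEmail": "you@example.com"
}
```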
Deployment
Local Development
For local development, the worker runs using Wrangler:
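The standard Wrangler command for this is:

```
npx wrangler dev
```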
This starts a local development server at http://localhost:8787.
Production Deployment
To deploy to Cloudflare Workers:
Configure Wrangler: Ensure you have a wrangler.toml file configured
Set Environment Variables: Configure secrets in the Cloudflare Workers dashboard
Deploy: Run the deployment command
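With recent versions of Wrangler, the deployment command is:

```
npx wrangler deploy
```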
Environment Variables in Production
Set these as secrets in your Cloudflare Workers environment:
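With Wrangler, each secret can be set from the command line. ACCESS_TOKEN is confirmed above; the other variable names are the assumed ones from the Environment Variables section:

```
npx wrangler secret put ACCESS_TOKEN
npx wrangler secret put OPENAI_API_KEY
npx wrangler secret put LINKEDIN_EMAIL
npx wrangler secret put LINKEDIN_PASSWORD
```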
Implementation Status
✅ Fully Implemented Features
Core Infrastructure
MCP server with complete tool integration
Cloudflare Workers runtime environment
Token-based authentication and CORS handling
Background job processing with status tracking
Plan Management
Plan Creation & Updates: AI-powered plan generation from natural language descriptions
Plan Storage: Persistent storage in Cloudflare KV
Search URL Generation: Automatic LinkedIn search URL creation
Plan Feedback: AI analysis and recommendations for plan improvement
Job Scanning & Analysis
LinkedIn Scraping: Full Playwright-based job page scraping
Deep Scanning: Individual job analysis with OpenAI integration
Background Processing: Non-blocking scan operations with status tracking
Error Handling: Comprehensive error reporting and failed job analysis
Fallback Matching: Keyword-based matching when AI is unavailable
Job Index Management
Persistent Storage: Cloudflare KV-based job index with deduplication
Job Tracking: Scan status, match scores, and metadata storage
Index Inspection: Detailed job index viewing with filtering options
Index Reset: Complete job index cleanup functionality
Email Digest System
SMTP Integration: Nodemailer-based email sending
HTML Email Generation: Rich formatting with job details and links
Auto-digest: Automatic email sending after scan completion
Job Tracking: Mark jobs as sent to avoid duplicates
Debugging & Monitoring
Manual Deep Scan: Test individual job URLs for debugging
Failed Jobs Report: Detailed analysis of scan failures with error categorization
Status Monitoring: Real-time background job status tracking
Authentication
API endpoints are protected with token-based authentication. Include your ACCESS_TOKEN in requests:
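For example, assuming a Bearer scheme (the exact header format depends on the server code), listing the available tools over MCP:

```
curl -X POST http://localhost:8787/mcp \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```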
Troubleshooting
Common Issues
LinkedIn Authentication: Ensure your LinkedIn credentials are correct and the account isn't locked
OpenAI API: Verify your API key has sufficient credits and proper permissions
Email Delivery: Check SMTP settings and ensure the sender email is authorized
Environment Variables: Verify all required variables are set in your .env file
Known Warnings
When testing email functionality, you may see network-related warnings in the Cloudflare Workers environment:
"Failed to resolve IPv4 addresses with current network"
"Possible EventEmitter memory leak detected"
These are environmental warnings and don't prevent functionality from working correctly.
Development Tips
Use the /health endpoint to verify the worker is running
Check the browser console for detailed error messages
Use the mock data endpoints for testing without external dependencies
Test plan creation with natural language descriptions before implementing complex JSON structures
Architecture Notes
Data Storage
The worker uses Cloudflare KV for persistent storage of job indexes, search plans, and scan history.
CORS Handling
Comprehensive CORS support is included for cross-origin requests from web applications.
SSE Support
Server-Sent Events are supported for real-time updates during long-running operations like job scanning.
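A minimal browser-side sketch, assuming an SSE endpoint at /sse (the path and event payloads are assumptions, not confirmed by this README):

```typescript
// Minimal SSE client sketch; the /sse path and payload shape are assumptions.
const source = new EventSource("http://localhost:8787/sse");
source.onmessage = (event) => {
  console.log("scan update:", event.data); // e.g. progress of a running scan
};
source.onerror = () => source.close();
```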
Limitations
The Worker implementation has some limitations compared to the Node.js version:
No Raw File Storage: The Worker cannot store raw HTML or job extraction files due to lack of filesystem access.
No Screenshots: Screenshot capture is not supported in the Worker environment.
Limited Storage: Job data is stored in Cloudflare KV, which has size limitations.