# MCP Job Search
[![Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/adamd9/mcp-jobsearch)
This project implements a LinkedIn job scraper on Cloudflare Workers with persistent job indexing, deep scanning, and filtering. It scrapes LinkedIn job listings, analyzes each job in detail against a candidate profile using OpenAI, stores matches in Cloudflare KV, and exposes MCP-compatible HTTP endpoints.
## Architecture
This implementation uses:
- **Cloudflare Workers** for serverless execution
- **Cloudflare's Playwright fork** for web scraping
- **Cloudflare KV** for persistent data storage
- **OpenAI API** for job analysis and matching
- **MCP (Model Context Protocol)** for tool integration
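To make this concrete, below is a minimal sketch of how such a Worker might route the unauthenticated health check and the token-protected MCP endpoint (all names here are illustrative assumptions, not the repository's actual code):
```typescript
// Illustrative sketch only; handler and binding names are assumptions.
export interface Env {
  ACCESS_TOKEN: string;
}

// Stub MCP dispatcher: the real worker parses JSON-RPC tool calls here.
async function handleMcp(request: Request, env: Env): Promise<Response> {
  return Response.json({ jsonrpc: "2.0", id: null, error: { code: -32601, message: "sketch only" } });
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { pathname } = new URL(request.url);

    // The health check requires no authentication.
    if (pathname === "/health") {
      return Response.json({ status: "ok" });
    }

    // All MCP tool calls go through /mcp behind a bearer token.
    if (pathname === "/mcp") {
      const auth = request.headers.get("Authorization");
      if (auth !== `Bearer ${env.ACCESS_TOKEN}`) {
        return new Response("Unauthorized", { status: 401 });
      }
      return handleMcp(request, env);
    }

    return new Response("Not found", { status: 404 });
  },
};
```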
## Current Status
The worker is fully functional and can be run locally. It includes a complete MCP server with authentication, CORS handling, a health check endpoint, and SSE endpoints for real-time updates.
Some tools are currently implemented as stubs that return mock data, which allows the end-to-end flow to be tested. The plan management tools (`get_plan`, `create_plan`, `update_plan`) and the email digest functionality (`send_digest`) are fully implemented.
## Setup
### Prerequisites
- Node.js (v18 or later)
- npm or yarn
- Cloudflare account (for deployment)
### Environment Variables
Create a `.env` file in the project root with the following variables:
```env
# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key
OPENAI_MODEL=gpt-4o
# LinkedIn Credentials (for scraping)
LINKEDIN_EMAIL=your-linkedin-email@example.com
LINKEDIN_PASSWORD=your-linkedin-password
# Email Configuration (for digests)
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=your-smtp-username
SMTP_PASS=your-smtp-password
DIGEST_FROM=jobs@example.com
DIGEST_TO=you@example.com
# Application Settings
TIMEZONE=Australia/Sydney
ACCESS_TOKEN=your-secure-random-token
DEEP_SCAN_CONCURRENCY=2
```
The `ACCESS_TOKEN` is used for API authentication and should be a long, random string (for example, the output of `openssl rand -hex 32`).
### Installation
1. Install dependencies:
```bash
npm install
```
2. Set up your environment variables in `.env`
3. Create a job search plan (see Plan Management section below)
### Running the Worker
Start the development server:
```bash
npm run dev
```
The worker will be available at `http://localhost:8787`. You can confirm it is running by hitting the unauthenticated `/health` endpoint.
## Core Features
### Plan-Driven Search
Define your profile, search terms, and scan criteria in a job search plan. The system uses this plan to:
- Target relevant job searches on LinkedIn
- Analyze job matches against your profile using OpenAI
- Score jobs based on fit and requirements
### Persistent Job Index
All scraped jobs are stored persistently in Cloudflare KV with:
- Job details and metadata
- Match scores and analysis
- Scan history and timestamps
- Deduplication to avoid processing the same job twice
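As a rough illustration, deduplication against the KV index might look like the following sketch (the binding name and `Job` shape are assumptions):
```typescript
// Hedged sketch: deduplicating scraped jobs against a KV-backed index.
interface Job {
  id: string; // e.g. the LinkedIn job ID parsed from the URL
  url: string;
  title: string;
  scannedAt?: string;
}

// Returns true if the job was new and stored, false if it was already indexed.
async function indexJob(kv: KVNamespace, job: Job): Promise<boolean> {
  const key = `job:${job.id}`;
  const existing = await kv.get(key);
  if (existing !== null) return false; // already seen; skip re-processing
  await kv.put(key, JSON.stringify(job));
  return true;
}
```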
### Deep Scanning
Visits each job posting to extract comprehensive details:
- Full job description and requirements
- Company information and culture
- Salary and benefits information
- AI-powered analysis against your profile
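A deep scan ends with an AI comparison of the posting against the plan's profile. Here is a minimal sketch of that OpenAI call, assuming the plan's `scan_prompt` is used as the system message (the worker's actual prompts and response handling may differ):
```typescript
// Hedged sketch: scoring a scraped job description against the plan's profile.
async function scoreJob(apiKey: string, scanPrompt: string, description: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o", // matches the OPENAI_MODEL default above
      messages: [
        { role: "system", content: scanPrompt },
        { role: "user", content: description },
      ],
    }),
  });
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content; // e.g. a score plus rationale
}
```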
### Email Digests
Automated email summaries of your best job matches:
- Configurable match score thresholds
- Rich HTML formatting with job details
- Direct links to job postings
- Scheduled delivery options
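Digest delivery is Nodemailer-based (see Implementation Status below). A minimal sketch, assuming the SMTP settings from `.env` and Node compatibility enabled in the Worker:
```typescript
import nodemailer from "nodemailer";

// Hedged sketch of a digest send; variable names mirror the .env section above.
async function sendDigest(env: Record<string, string>, html: string): Promise<void> {
  const transporter = nodemailer.createTransport({
    host: env.SMTP_HOST,
    port: Number(env.SMTP_PORT),
    auth: { user: env.SMTP_USER, pass: env.SMTP_PASS },
  });
  await transporter.sendMail({
    from: env.DIGEST_FROM,
    to: env.DIGEST_TO,
    subject: "Your job match digest",
    html, // rich HTML body with job details and direct links
  });
}
```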
## API Reference
### MCP Tools
The following tools are available via the MCP server:
#### Plan Management
- **`get_plan`**: Get the current job search plan
- **`create_plan`**: Create a new job search plan from a description
- **`update_plan`**: Update the existing job search plan
#### Job Scanning & Analysis
- **`scan`**: Scan LinkedIn job pages using Playwright; if a URL is provided, scans that page, otherwise uses the search URLs from the plan
- **`rescan`**: Rescan existing jobs using URLs from the last scan or current plan
- **`deep_scan_job`**: Manually deep scan a specific LinkedIn job URL for testing and debugging
- **`failed_jobs`**: Get a report of jobs that failed during deep scanning with error analysis
#### Job Index Management
- **`get_job_index`**: Get the current raw job index data for inspection (with filtering options)
- **`reset_job_index`**: Reset the job index to start fresh; removes all stored jobs
#### System Operations
- **`status`**: Check the status of background jobs (scan progress, errors, etc.)
- **`send_digest`**: Send digest email with job matches to specified email address
### HTTP Endpoints
The worker exposes HTTP endpoints for direct API access:
#### Core Endpoints
- `GET /health` - Health check endpoint (no authentication required)
- `POST /mcp` - MCP server endpoint (handles all tool calls with authentication)
**Note**: All MCP tools are accessed via the `/mcp` endpoint using the MCP protocol. The worker uses token-based authentication for the MCP endpoint.
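For example, invoking the `status` tool is a JSON-RPC `tools/call` request POSTed to `/mcp`. A sketch in TypeScript (the request shape follows the standard MCP convention; the token is whatever you set as `ACCESS_TOKEN`):
```typescript
// Hedged sketch: calling the `status` tool over the MCP endpoint.
const res = await fetch("http://localhost:8787/mcp", {
  method: "POST",
  headers: {
    Authorization: "Bearer your-access-token",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "tools/call",
    params: { name: "status", arguments: {} },
  }),
});
console.log(await res.json());
```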
## Plan Management
The job search plan is the core configuration that drives the entire system. It defines:
### Plan Structure
```json
{
"profile": {
"name": "Your Name",
"experience": "Senior Software Engineer with 8+ years...",
"skills": ["JavaScript", "React", "Node.js", "AWS"],
"preferences": {
"remote": true,
"location": "Sydney, Australia",
"salary_min": 120000
}
},
"searches": [
{
"keywords": "Senior Software Engineer React",
"location": "Sydney",
"filters": {
"experience_level": "mid_senior",
"job_type": "full_time"
}
}
],
"scan_prompt": "Analyze this job posting for a senior software engineer..."
}
```
### Creating a Plan
You can create a plan in several ways:
1. **Via MCP Tool**: Use the `create_plan` tool with a natural language description
2. **Via HTTP API**: call the `create_plan` tool through the authenticated `/mcp` endpoint, passing either structured JSON or a description
3. **Direct File**: Create a `plan.json` file in the project root
### Plan Examples
**Natural Language Description**:
```
I'm a senior full-stack developer with 8 years experience in React, Node.js, and AWS.
I'm looking for remote senior engineer roles in fintech or healthcare,
preferably $120k+ with equity options.
```
**Structured JSON**:
```json
{
"profile": {
"name": "Senior Developer",
"experience": "8+ years full-stack development",
"skills": ["React", "Node.js", "AWS", "TypeScript"]
},
"searches": [
{
"keywords": "Senior Full Stack Engineer",
"location": "Remote"
}
]
}
```
## Deployment
### Local Development
For local development, the worker runs using Wrangler:
```bash
npm run dev
```
This starts a local development server at `http://localhost:8787`.
### Production Deployment
To deploy to Cloudflare Workers:
1. **Configure Wrangler**: Ensure you have a `wrangler.toml` file configured with your account details and KV namespace binding
2. **Set Environment Variables**: Configure secrets in Cloudflare Workers dashboard
3. **Deploy**: Run the deployment command
```bash
npm run deploy
```
### Environment Variables in Production
Set these as secrets in your Cloudflare Workers environment. Non-secret settings (such as `TIMEZONE`, `OPENAI_MODEL`, and `DEEP_SCAN_CONCURRENCY`) can instead go in the `[vars]` section of `wrangler.toml`:
```bash
wrangler secret put OPENAI_API_KEY
wrangler secret put LINKEDIN_EMAIL
wrangler secret put LINKEDIN_PASSWORD
wrangler secret put SMTP_HOST
wrangler secret put SMTP_USER
wrangler secret put SMTP_PASS
wrangler secret put ACCESS_TOKEN
```
## Implementation Status
### ✅ Fully Implemented Features
#### Core Infrastructure
- MCP server with complete tool integration
- Cloudflare Workers runtime environment
- Token-based authentication and CORS handling
- Background job processing with status tracking
#### Plan Management
- **Plan Creation & Updates**: AI-powered plan generation from natural language descriptions
- **Plan Storage**: Persistent storage in Cloudflare KV
- **Search URL Generation**: Automatic LinkedIn search URL creation
- **Plan Feedback**: AI analysis and recommendations for plan improvement
#### Job Scanning & Analysis
- **LinkedIn Scraping**: Full Playwright-based job page scraping
- **Deep Scanning**: Individual job analysis with OpenAI integration
- **Background Processing**: Non-blocking scan operations with status tracking
- **Error Handling**: Comprehensive error reporting and failed job analysis
- **Fallback Matching**: Keyword-based matching when AI is unavailable
#### Job Index Management
- **Persistent Storage**: Cloudflare KV-based job index with deduplication
- **Job Tracking**: Scan status, match scores, and metadata storage
- **Index Inspection**: Detailed job index viewing with filtering options
- **Index Reset**: Complete job index cleanup functionality
#### Email Digest System
- **SMTP Integration**: Nodemailer-based email sending
- **HTML Email Generation**: Rich formatting with job details and links
- **Auto-digest**: Automatic email sending after scan completion
- **Job Tracking**: Mark jobs as sent to avoid duplicates
#### Debugging & Monitoring
- **Manual Deep Scan**: Test individual job URLs for debugging
- **Failed Jobs Report**: Detailed analysis of scan failures with error categorization
- **Status Monitoring**: Real-time background job status tracking
### Authentication
API endpoints are protected with token-based authentication. Include your `ACCESS_TOKEN` in requests:
```bash
curl -X POST http://localhost:8787/mcp \
  -H "Authorization: Bearer your-access-token" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```
## Troubleshooting
### Common Issues
1. **LinkedIn Authentication**: Ensure your LinkedIn credentials are correct and the account isn't locked
2. **OpenAI API**: Verify your API key has sufficient credits and proper permissions
3. **Email Delivery**: Check SMTP settings and ensure the sender email is authorized
4. **Environment Variables**: Verify all required variables are set in your `.env` file
### Known Warnings
When testing email functionality, you may see network-related warnings in the Cloudflare Workers environment:
- "Failed to resolve IPv4 addresses with current network"
- "Possible EventEmitter memory leak detected"
These warnings come from the Workers environment and do not affect functionality.
### Development Tips
- Use the `/health` endpoint to verify the worker is running
- Check the browser console for detailed error messages
- Use the mock data endpoints for testing without external dependencies
- Test plan creation with natural language descriptions before implementing complex JSON structures
## Architecture Notes
### Data Storage
The worker uses Cloudflare KV for persistent storage of job indexes, search plans, and scan history.
### CORS Handling
Comprehensive CORS support is included for cross-origin requests from web applications.
### SSE Support
Server-Sent Events are supported for real-time updates during long-running operations like job scanning.
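As a hedged sketch, a client could read those events with a streaming `fetch`; the exact route and event format are assumptions, so adjust to the worker's actual SSE endpoints:
```typescript
// Hedged sketch: reading scan progress as Server-Sent Events.
const res = await fetch("http://localhost:8787/mcp", {
  headers: {
    Authorization: "Bearer your-access-token",
    Accept: "text/event-stream",
  },
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // Raw SSE frames, e.g. "event: progress\ndata: {...}"
  console.log(decoder.decode(value));
}
```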
## Limitations
The Worker implementation has some limitations compared to the Node.js version:
1. **No Raw File Storage**: The Worker cannot store raw HTML or job extraction files due to lack of filesystem access.
2. **No Screenshots**: Screenshot capture is not supported in the Worker environment.
3. **Limited Storage**: Job data is stored in Cloudflare KV, which has size limitations.