get_new_job_ids
Retrieves new job IDs from a specified LinkedIn search URL by exploring a defined number of result pages. Filters out previously scraped IDs and returns the remainder as a comma-separated list for further analysis or tracking.
Instructions
Gets the new job IDs retrieved from the LinkedIn URL passed as a parameter, exploring the number of pages specified.

Args:
- `url`: The URL to search for jobs on LinkedIn
- `num_pages`: The number of pages to retrieve IDs from

Returns:
- A comma-separated list of the new job IDs retrieved from the explored pages of the URL
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| num_pages | No | The number of pages to retrieve job IDs from | 1 |
| url | Yes | The URL to search for jobs on LinkedIn | |
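
For orientation, here is a minimal sketch of calling the tool from an MCP client. It assumes the standalone `fastmcp` package's `Client` API; the server script path and search URL are placeholders, so adapt both to your setup.

```python
import asyncio

from fastmcp import Client  # assumed client API; adjust for your MCP client library

async def main() -> None:
    # Point the client at the server script; path and URL are placeholders.
    async with Client("src/linkedin_mcp_server/main.py") as client:
        result = await client.call_tool(
            "get_new_job_ids",
            {"url": "https://www.linkedin.com/jobs/search?keywords=python", "num_pages": 2},
        )
        print(result)  # comma-separated job IDs, or a "No new job IDs found..." message

asyncio.run(main())
```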
Implementation Reference
- `src/linkedin_mcp_server/main.py:109-136` (handler)

  MCP tool handler that scrapes job listings from the given LinkedIn URL across multiple pages, filters out previously scraped job IDs using the cache, and returns the new job IDs as a comma-separated string.

  ```python
  @mcp.tool()
  def get_new_job_ids(url: str, num_pages: int = 1) -> str:
      """
      Gets the new job ids retrieved from the LinkedIn url passed as a parameter, exploring
      the number of pages specified.

      Args:
          url: The URL to search for jobs in LinkedIn (required)
          num_pages: The number of pages to retrieve ids from (1-5 recommended)

      Returns:
          str: Comma-separated list of new job IDs retrieved from the explored pages
      """
      if not isinstance(num_pages, int) or num_pages < 1 or num_pages > 10:
          logger.warning(f"Invalid num_pages {num_pages}, using default 1")
          num_pages = 1

      logger.info(f"Fetching job listings from LinkedIn URL: {url[:100]}...")
      all_job_ids = extractor.retrieve_job_ids_from_linkedin(base_url=url, max_pages=num_pages)
      new_job_ids = extractor.get_new_job_ids(all_job_ids)
      logger.info(f"Found {len(new_job_ids)} new jobs to process")

      if not new_job_ids:
          return "No new job IDs found. All jobs may have been previously processed."

      return ",".join(new_job_ids)
  ```
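
  Because the handler returns a single string rather than a JSON array, callers usually split it back into IDs. A short, hypothetical post-processing helper (the sentinel check mirrors the handler's "no new jobs" message):

  ```python
  from typing import List

  def parse_job_ids(result: str) -> List[str]:
      """Hypothetical helper: turn the tool's string result back into a list."""
      # The handler returns a human-readable sentinel when nothing new was found.
      if result.startswith("No new job IDs found"):
          return []
      return result.split(",")

  assert parse_job_ids("123,456") == ["123", "456"]
  assert parse_job_ids("No new job IDs found. All jobs may have been previously processed.") == []
  ```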
- Helper method in `JobPostingExtractor` that identifies new job IDs by excluding those already present in the scrape cache.

  ```python
  def get_new_job_ids(self, job_ids: List[str]) -> List[str]:
      """
      Filter out job IDs that have already been scraped.

      Args:
          job_ids: List of job IDs to check

      Returns:
          List of job IDs that haven't been scraped yet
      """
      scraped_ids = set(self.get_scraped_job_ids())
      logger.info(f"Found {len(scraped_ids)} scraped job IDs")
      logger.debug(f"Scraped job IDs: {scraped_ids}")
      new_job_ids = [job_id for job_id in job_ids if job_id not in scraped_ids]
      logger.info(f"Found {len(new_job_ids)} new jobs out of {len(job_ids)} total")
      return new_job_ids
  ```
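
  The reference does not show how `get_scraped_job_ids` is backed. A minimal sketch, assuming a JSON file on disk (the class, method, and file names here are hypothetical):

  ```python
  import json
  from pathlib import Path
  from typing import List

  class ScrapeCache:
      """Hypothetical JSON-file store for already-scraped job IDs."""

      def __init__(self, path: str = "scraped_jobs.json") -> None:
          self.path = Path(path)

      def get_scraped_job_ids(self) -> List[str]:
          # A missing file simply means nothing has been scraped yet.
          if not self.path.exists():
              return []
          return json.loads(self.path.read_text())

      def mark_scraped(self, job_ids: List[str]) -> None:
          # Union the new IDs with the existing ones and persist them sorted.
          ids = set(self.get_scraped_job_ids()) | set(job_ids)
          self.path.write_text(json.dumps(sorted(ids)))
  ```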
- Helper method in `JobPostingExtractor` that scrapes job listing pages from LinkedIn to extract job IDs using requests and BeautifulSoup parsing.

  ```python
  def retrieve_job_ids_from_linkedin(self, base_url: str = JOB_RETRIEVAL_URL, max_pages: int = 5) -> List[str]:
      """
      Retrieve job IDs from LinkedIn using requests and BeautifulSoup.

      Args:
          base_url: The LinkedIn job search URL to paginate through
          max_pages: Maximum number of pages to scrape

      Returns:
          List of job IDs found
      """
      logger.info(f"Starting job retrieval from LinkedIn\n({base_url})")
      start_time = time.time()
      all_job_ids: Set[str] = set()
      jobs_per_page = 10
      url_with_pagination = base_url + "&start={}"

      for page in range(max_pages):
          try:
              start_idx = page * jobs_per_page
              url = url_with_pagination.format(start_idx)
              logger.info(f"Scraping job listings page {page + 1}: {url}")

              # Add random delay between requests
              time.sleep(random.uniform(1, 3))

              res = requests.get(url)
              soup = BeautifulSoup(res.text, 'html.parser')

              for element in soup.find_all(attrs={"data-entity-urn": True}):
                  if not isinstance(element, bs4.element.Tag):
                      continue
                  entity_urn = element.attrs.get("data-entity-urn")
                  if isinstance(entity_urn, str) and entity_urn.startswith("urn:li:jobPosting:"):
                      job_id = entity_urn.split(":")[-1]
                      if job_id.isdigit():
                          all_job_ids.add(job_id)
                          logger.info(f"Found job ID: {job_id}")
          except Exception as e:
              logger.error(f"Error scraping job listings page {page + 1}: {e}")
              continue

      duration = time.time() - start_time
      logger.info(f"Found {len(all_job_ids)} unique job IDs in {duration:.2f} seconds")
      return list(all_job_ids)
  ```
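
  The ID extraction hinges on LinkedIn's `data-entity-urn` attribute. This self-contained snippet reproduces that parsing against inline HTML with no network access (the sample markup is fabricated for illustration):

  ```python
  from bs4 import BeautifulSoup

  # Fabricated markup mimicking the job-card attributes the scraper targets.
  sample_html = """
  <li><div data-entity-urn="urn:li:jobPosting:4011223344">Job A</div></li>
  <li><div data-entity-urn="urn:li:jobPosting:4055667788">Job B</div></li>
  <li><div data-entity-urn="urn:li:company:999">Not a job posting</div></li>
  """

  soup = BeautifulSoup(sample_html, "html.parser")
  job_ids = [
      el["data-entity-urn"].split(":")[-1]
      for el in soup.find_all(attrs={"data-entity-urn": True})
      if el["data-entity-urn"].startswith("urn:li:jobPosting:")
  ]
  print(job_ids)  # ['4011223344', '4055667788']
  ```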
- `src/linkedin_mcp_server/main.py:109` (registration)

  Registration of the `get_new_job_ids` tool using the FastMCP `@mcp.tool()` decorator.

  ```python
  @mcp.tool()
  ```
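
  For context, a registration like this typically lives in a FastMCP server module. A minimal wiring sketch, assuming the official MCP Python SDK and a placeholder server name:

  ```python
  from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

  mcp = FastMCP("linkedin-jobs")  # server name is a placeholder

  @mcp.tool()
  def get_new_job_ids(url: str, num_pages: int = 1) -> str:
      """Stub shown for wiring only; see the full handler above."""
      return ""

  if __name__ == "__main__":
      mcp.run()  # serves over stdio by default
  ```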