get_new_job_ids

Retrieves new job IDs from a specified LinkedIn URL by exploring a defined number of pages, and returns the list of extracted IDs for further analysis or tracking.

Instructions

Gets the new job IDs retrieved from the LinkedIn URL passed as a parameter, exploring the number of pages specified.

Args:
    url: The URL to search for jobs on LinkedIn
    num_pages: The number of pages to retrieve IDs from

Returns:
    A comma-separated list of the new job IDs retrieved from the explored pages of the URL
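As a usage illustration, here is a minimal sketch of invoking the tool from a Python MCP client over stdio. The launch command and the search URL are placeholder assumptions, not values documented by this server.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Placeholder launch command; adjust to however the server is actually started.
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_new_job_ids",
                arguments={
                    "url": "https://www.linkedin.com/jobs/search/?keywords=python",
                    "num_pages": 2,
                },
            )
            # The tool's comma-separated IDs arrive as a text content block.
            print(result.content)


asyncio.run(main())
```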

Input Schema

| Name      | Required | Description                              | Default |
|-----------|----------|------------------------------------------|---------|
| num_pages | No       | The number of pages to retrieve IDs from | 1       |
| url       | Yes      | The URL to search for jobs on LinkedIn   |         |
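FastMCP derives the tool's input schema from the Python signature get_new_job_ids(url: str, num_pages: int = 1). A sketch of roughly what that generated schema looks like, expressed as a Python dict; field titles and exact layout may differ from the server's actual output:

```python
# Approximate input schema derived from the tool signature; a sketch,
# not the server's verbatim schema.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "num_pages": {"type": "integer", "default": 1},
    },
    "required": ["url"],
}
```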

Implementation Reference

  • MCP tool handler that scrapes job listings from the given LinkedIn URL across multiple pages, filters out previously scraped job IDs using a cache, and returns the new job IDs as a comma-separated string.
```python
@mcp.tool()
def get_new_job_ids(url: str, num_pages: int = 1) -> str:
    """
    Gets the new job ids retrieved from the LinkedIn url passed as a parameter,
    exploring the number of pages specified.

    Args:
        url: The URL to search for jobs in LinkedIn (required)
        num_pages: The number of pages to retrieve ids from (1-5 recommended)

    Returns:
        str: Comma-separated list of new job IDs retrieved from the explored pages
    """
    if not isinstance(num_pages, int) or num_pages < 1 or num_pages > 10:
        logger.warning(f"Invalid num_pages {num_pages}, using default 1")
        num_pages = 1
    logger.info(f"Fetching job listings from LinkedIn URL: {url[:100]}...")
    all_job_ids = extractor.retrieve_job_ids_from_linkedin(base_url=url, max_pages=num_pages)
    new_job_ids = extractor.get_new_job_ids(all_job_ids)
    logger.info(f"Found {len(new_job_ids)} new jobs to process")
    if not new_job_ids:
        return "No new job IDs found. All jobs may have been previously processed."
    return ",".join(new_job_ids)
```
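Because MCP tool results travel as text, the handler flattens the ID list into one comma-separated string, or a human-readable sentence when nothing is new. A client-side sketch for turning that text back into a list; parse_job_ids is a hypothetical helper, not part of the server:

```python
def parse_job_ids(result: str) -> list[str]:
    """Convert the tool's text result back into a list of job IDs.

    The tool returns a sentence ("No new job IDs found. ...") when there
    is nothing new, so keep only the purely numeric fragments.
    """
    parts = [part.strip() for part in result.split(",")]
    return [part for part in parts if part.isdigit()]


assert parse_job_ids("4012345678,4098765432") == ["4012345678", "4098765432"]
assert parse_job_ids("No new job IDs found. All jobs may have been previously processed.") == []
```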
  • Helper method in JobPostingExtractor that identifies new job IDs by excluding those already present in the scrape cache.
```python
def get_new_job_ids(self, job_ids: List[str]) -> List[str]:
    """
    Filter out job IDs that have already been scraped.

    Args:
        job_ids: List of job IDs to check

    Returns:
        List of job IDs that haven't been scraped yet
    """
    scraped_ids = set(self.get_scraped_job_ids())
    logger.info(f"Found {len(scraped_ids)} scraped job IDs")
    logger.debug(f"Scraped job IDs: {scraped_ids}")
    new_job_ids = [job_id for job_id in job_ids if job_id not in scraped_ids]
    logger.info(f"Found {len(new_job_ids)} new jobs out of {len(job_ids)} total")
    return new_job_ids
```
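The scrape cache behind self.get_scraped_job_ids() is not shown on this page. As a hedged illustration, a JSON-file-backed cache could look roughly like this; all names here are hypothetical:

```python
import json
from pathlib import Path
from typing import List


class ScrapeCache:
    """Hypothetical persistent cache of already-scraped job IDs."""

    def __init__(self, path: str = "scraped_jobs.json"):
        self.path = Path(path)

    def get_scraped_job_ids(self) -> List[str]:
        # Return the stored IDs, or an empty list on first run.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def add_scraped_job_ids(self, job_ids: List[str]) -> None:
        # Merge new IDs with the stored ones and persist.
        merged = set(self.get_scraped_job_ids()) | set(job_ids)
        self.path.write_text(json.dumps(sorted(merged)))
```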
  • Helper method in JobPostingExtractor that scrapes job listing pages from LinkedIn to extract job IDs using requests and BeautifulSoup parsing.
```python
def retrieve_job_ids_from_linkedin(self, base_url: str = JOB_RETRIEVAL_URL, max_pages: int = 5) -> List[str]:
    """
    Retrieve job IDs from LinkedIn using requests and BeautifulSoup.

    Args:
        base_url: The LinkedIn job search URL to paginate through
        max_pages: Maximum number of pages to scrape

    Returns:
        List of job IDs found
    """
    logger.info(f"Starting job retrieval from LinkedIn\n({base_url})")
    start_time = time.time()
    all_job_ids: Set[str] = set()
    jobs_per_page = 10
    url_with_pagination = base_url + "&start={}"
    for page in range(max_pages):
        try:
            start_idx = page * jobs_per_page
            url = url_with_pagination.format(start_idx)
            logger.info(f"Scraping job listings page {page + 1}: {url}")
            # Add a random delay between requests to avoid rate limiting
            time.sleep(random.uniform(1, 3))
            res = requests.get(url)
            soup = BeautifulSoup(res.text, 'html.parser')
            for element in soup.find_all(attrs={"data-entity-urn": True}):
                if not isinstance(element, bs4.element.Tag):
                    continue
                entity_urn = element.attrs.get("data-entity-urn")
                if isinstance(entity_urn, str) and entity_urn.startswith("urn:li:jobPosting:"):
                    job_id = entity_urn.split(":")[-1]
                    if job_id.isdigit():
                        all_job_ids.add(job_id)
                        logger.info(f"Found job ID: {job_id}")
        except Exception as e:
            logger.error(f"Error scraping job listings page {page + 1}: {e}")
            continue
    duration = time.time() - start_time
    logger.info(f"Found {len(all_job_ids)} unique job IDs in {duration:.2f} seconds")
    return list(all_job_ids)
```
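To see the ID-extraction step in isolation, the following self-contained sketch runs the same data-entity-urn parsing against a hard-coded HTML fragment; the markup is a simplified stand-in for LinkedIn's real listing pages.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for a LinkedIn job listings page.
html = """
<ul>
  <li data-entity-urn="urn:li:jobPosting:4012345678">Backend Engineer</li>
  <li data-entity-urn="urn:li:jobPosting:4098765432">Data Scientist</li>
  <li data-entity-urn="urn:li:something:else">Not a job posting</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
job_ids = set()
for element in soup.find_all(attrs={"data-entity-urn": True}):
    urn = element.attrs.get("data-entity-urn")
    if isinstance(urn, str) and urn.startswith("urn:li:jobPosting:"):
        job_id = urn.split(":")[-1]  # the numeric ID is the last URN segment
        if job_id.isdigit():
            job_ids.add(job_id)

print(sorted(job_ids))  # ['4012345678', '4098765432']
```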
  • Registration of the 'get_new_job_ids' tool using the FastMCP @mcp.tool() decorator.

```python
@mcp.tool()
```
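For context, tool registration with FastMCP boils down to decorating a plain function on a server instance. A minimal sketch; the server name is an assumption, and the real project may import FastMCP from the standalone fastmcp package instead of the official SDK:

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical server name; the actual project may differ.
mcp = FastMCP("linkedin-mcp")


@mcp.tool()
def get_new_job_ids(url: str, num_pages: int = 1) -> str:
    """Return new job IDs from the given LinkedIn search URL."""
    ...  # body omitted; see the handler above


if __name__ == "__main__":
    mcp.run()
```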


MCP directory API

We provide all the information about MCP servers via our MCP API.

```bash
curl -X GET 'https://glama.ai/api/mcp/v1/servers/francisco-perez-sorrosal/linkedin-mcp'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.