Glama
francisco-perez-sorrosal

LinkedIn MCP Server

get_new_job_ids

Retrieve new job IDs from a specified LinkedIn URL by exploring a defined number of pages. Extracts and returns a list of job IDs for further analysis or tracking.

Instructions

Gets the new job IDs retrieved from the LinkedIn URL passed as a parameter, exploring
the number of pages specified.

Args:
    url: The URL to search for jobs on LinkedIn
    num_pages: The number of pages to retrieve IDs from

Returns:
    A comma-separated string of the new job IDs retrieved from the explored pages of the URL
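Because the tool returns either a comma-separated string of IDs or a sentinel sentence when nothing new was found, a client needs a small parsing step before using the result. A minimal sketch (the `parse_job_ids` helper is hypothetical, not part of the server):

```python
def parse_job_ids(result: str) -> list[str]:
    """Split the tool's comma-separated result into a list of job IDs.

    The server returns a plain-English sentinel message when no new jobs
    were found, so any non-numeric result is treated as an empty list.
    """
    ids = [part.strip() for part in result.split(",")]
    return ids if all(part.isdigit() for part in ids) else []
```

This keeps the sentinel handling in one place instead of scattering string checks through client code.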

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to search for jobs on LinkedIn | |
| num_pages | No | The number of pages to retrieve IDs from | 1 |
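For reference, the table above corresponds to a JSON Schema along these lines — a sketch inferred from the parameter types and the `num_pages` default in the implementation, not the server's exact generated schema:

```json
{
  "type": "object",
  "properties": {
    "url": {
      "type": "string",
      "description": "The URL to search for jobs on LinkedIn"
    },
    "num_pages": {
      "type": "integer",
      "description": "The number of pages to retrieve IDs from",
      "default": 1
    }
  },
  "required": ["url"]
}
```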

Implementation Reference

  • MCP tool handler that scrapes job listings from a given LinkedIn URL across multiple pages, filters out previously scraped job IDs using a cache, and returns the new job IDs as a comma-separated string.
    @mcp.tool()
    def get_new_job_ids(url: str, num_pages: int = 1) -> str:
        """
        Gets the new job ids retrieved from the LinkedIn url passed as a parameter, exploring
        the number of pages specified.
        
        Args:
            url: The URL to search for jobs in LinkedIn (required)
            num_pages: The number of pages to retrieve ids from (1-5 recommended)
            
        Returns:
            str: Comma-separated list of new job IDs retrieved from the explored pages
        """    
        if not isinstance(num_pages, int) or num_pages < 1 or num_pages > 10:
            logger.warning(f"Invalid num_pages {num_pages}, using default 1")
            num_pages = 1
            
        logger.info(f"Fetching job listings from LinkedIn URL: {url[:100]}...")
        
        all_job_ids = extractor.retrieve_job_ids_from_linkedin(base_url=url, max_pages=num_pages)
        new_job_ids = extractor.get_new_job_ids(all_job_ids)
        
        logger.info(f"Found {len(new_job_ids)} new jobs to process")
        
        if not new_job_ids:
            return "No new job IDs found. All jobs may have been previously processed."
        
        return ",".join(new_job_ids)
  • Helper method in JobPostingExtractor that identifies new job IDs by excluding those already present in the scrape cache.
    def get_new_job_ids(self, job_ids: List[str]) -> List[str]:
        """
        Filter out job IDs that have already been scraped.
        
        Args:
            job_ids: List of job IDs to check
            
        Returns:
            List of job IDs that haven't been scraped yet
        """            
        scraped_ids = set(self.get_scraped_job_ids())
        logger.info(f"Found {len(scraped_ids)} scraped job IDs")
        logger.debug(f"Scraped job IDs: {scraped_ids}")
        new_job_ids = [job_id for job_id in job_ids if job_id not in scraped_ids]
        
        logger.info(f"Found {len(new_job_ids)} new jobs out of {len(job_ids)} total")
        return new_job_ids
  • Helper method in JobPostingExtractor that scrapes job listing pages from LinkedIn to extract job IDs using requests and BeautifulSoup parsing.
    def retrieve_job_ids_from_linkedin(self, base_url: str = JOB_RETRIEVAL_URL, max_pages: int = 5) -> List[str]:
        """
        Retrieve job IDs from LinkedIn using requests and BeautifulSoup.
        
        Args:
            base_url: The LinkedIn job search URL to paginate over
            max_pages: Maximum number of pages to scrape
        Returns:
            List of job IDs found
        """
        logger.info(f"Starting job retrieval from LinkedIn\n({base_url})")
        start_time = time.time()
        
        all_job_ids: Set[str] = set()
        jobs_per_page = 10
        
        url_with_pagination = base_url + "&start={}"
        for page in range(max_pages):
            try:
                start_idx = page * jobs_per_page                
                url = url_with_pagination.format(start_idx)
                logger.info(f"Scraping job listings page {page + 1}: {url}")
                
                # Add random delay between requests
                time.sleep(random.uniform(1, 3))
                
                res = requests.get(url, timeout=10)
                soup = BeautifulSoup(res.text, 'html.parser')
                
                for element in soup.find_all(attrs={"data-entity-urn": True}):
                    if not isinstance(element, bs4.element.Tag):
                        continue
                    entity_urn = element.attrs.get("data-entity-urn")
                    if isinstance(entity_urn, str) and entity_urn.startswith("urn:li:jobPosting:"):
                        job_id = entity_urn.split(":")[-1]
                        if job_id.isdigit():
                            all_job_ids.add(job_id)
                            logger.info(f"Found job ID: {job_id}")
                            
            except Exception as e:
                logger.error(f"Error scraping job listings page {page + 1}: {e}")
                continue
        
        duration = time.time() - start_time
        logger.info(f"Found {len(all_job_ids)} unique job IDs in {duration:.2f} seconds")
        return list(all_job_ids)
  • Registration of the 'get_new_job_ids' tool using FastMCP @mcp.tool() decorator.
    @mcp.tool()



MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/francisco-perez-sorrosal/linkedin-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server