Skip to main content
Glama
gscfwid

NCCN Guidelines MCP Server

by gscfwid

download_pdf

Download PDF files from URLs with optional NCCN authentication for accessing clinical guidelines.

Instructions

Download a PDF file from the specified URL, with optional NCCN login credentials.

Args:
    url: The URL of the PDF file to download
    filename: Optional custom filename for the downloaded file
    username: Optional NCCN username/email for authentication (defaults to NCCN_USERNAME env var)
    password: Optional NCCN password for authentication (defaults to NCCN_PASSWORD env var)

Returns:
    String indicating success/failure and the path to the downloaded file

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYes

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault
resultYes

Implementation Reference

  • MCP tool handler for 'download_pdf': orchestrates authentication, directory setup, and delegates download to NCCNDownloader instance.
    @mcp.tool()
    async def download_pdf(url: str) -> str:
        """
        Download a PDF file from the specified URL, with optional NCCN login credentials.
        
        Args:
            url: The URL of the PDF file to download
            filename: Optional custom filename for the downloaded file
            username: Optional NCCN username/email for authentication (defaults to NCCN_USERNAME env var)
            password: Optional NCCN password for authentication (defaults to NCCN_PASSWORD env var)
        
        Returns:
            String indicating success/failure and the path to the downloaded file
        """
        try:
            # Ensure download directory exists
            download_path = current_dir / DOWNLOAD_DIR
            download_path.mkdir(exist_ok=True)
            
            # Use provided credentials or fall back to global configuration
            auth_username = NCCN_USERNAME
            auth_password = NCCN_PASSWORD
            
            # Create downloader instance with credentials if available
            if auth_username and auth_password:
                downloader_instance = NCCNDownloader(auth_username, auth_password)
                logger.info(f"Using NCCN authentication for user: {auth_username}")
            else:
                downloader_instance = downloader
                logger.info("No NCCN authentication configured - attempting anonymous download")
            
            # Download the PDF
            success, actual_filename = await downloader_instance.download_pdf(
                pdf_url=url,
                download_dir=str(download_path),
                username=auth_username,
                password=auth_password,
                skip_if_exists=True
            )
            
            # Update the full path with the actual filename used
            actual_full_path = download_path / actual_filename
            
            if success:
                logger.info(f"PDF downloaded successfully: {actual_full_path}")
                return f"PDF downloaded successfully: {actual_full_path} (filename: {actual_filename})"
            else:
                error_msg = f"Failed to download PDF from {url} (attempted filename: {actual_filename})."
                if not (auth_username and auth_password):
                    error_msg += " You may need to provide NCCN login credentials via environment variables (NCCN_USERNAME, NCCN_PASSWORD) or function parameters."
                logger.error(error_msg)
                return error_msg
        
        except Exception as e:
            logger.error(f"Error downloading PDF: {str(e)}")
            return f"Error downloading PDF: {str(e)}"
  • Core implementation of PDF downloading in NCCNDownloader class, including caching check, HTTP requests, automatic login detection and handling, and recursive re-download after login.
    async def download_pdf(self, pdf_url, download_dir=None, username=None, password=None, skip_if_exists=True, max_cache_age_days=PDF_CACHE_MAX_AGE_DAYS):
        """
        Downloads a PDF file, automatically logging in if required.
        
        Args:
            pdf_url (str): URL of the PDF file.
            download_dir (str, optional): Directory to save the PDF. Defaults to current directory.
            username (str, optional): Username (email address), required if not already logged in.
            password (str, optional): Password, required if not already logged in.
            skip_if_exists (bool): Whether to skip download if the file already exists. Defaults to True.
            max_cache_age_days (int): Maximum cache file validity period (days). Defaults to PDF_CACHE_MAX_AGE_DAYS.
        
        Returns:
            tuple: (success (bool), saved_filename (str))
        """
        try:
            # Automatically extract filename from URL
            filename = os.path.basename(pdf_url)
            if not filename or not filename.endswith('.pdf'):
                filename = 'nccn_guideline.pdf'
            
            if download_dir:
                os.makedirs(download_dir, exist_ok=True)
            else:
                download_dir = os.getcwd() # Use current working directory if not specified
            
            save_path = os.path.join(download_dir, filename)
            
            # Check if file already exists and is still valid (not too old)
            if skip_if_exists:
                cache_info = check_pdf_cache_age(save_path, max_cache_age_days)
                if cache_info['exists']:
                    if cache_info['is_valid']:
                        logger.info(f"Using valid cached PDF: {save_path}")
                        logger.info(f"File size: {cache_info['size']} bytes, age: {cache_info['age_days']} days")
                        return True, filename
                    else:
                        logger.info(f"PDF cache expired ({cache_info['age_days']} days > {max_cache_age_days} days) or corrupted, re-downloading...")
                else:
                    logger.info(f"PDF not found in cache, downloading: {save_path}")
            
            logger.info(f"Downloading PDF: {pdf_url}")
            
            # Set request headers for PDF download
            pdf_headers = {
                'Accept': 'application/pdf,*/*',
                'Referer': 'https://www.nccn.org/',
            }
            
            # First, make a regular GET request to check the response
            response = await self.session.get(pdf_url, headers=pdf_headers, follow_redirects=True)
            
            logger.info(f"Response status: {response.status_code}")
            logger.info(f"Final URL: {response.url}")
            
            # Check if we were redirected to a login page
            if response.status_code == 200:
                content_type = response.headers.get('Content-Type', '')
                logger.info(f"Content-Type: {content_type}")
                
                # Check if this is actually a PDF
                if 'application/pdf' in content_type:
                    # This is a PDF, save it directly
                    with open(save_path, 'wb') as f:
                        f.write(response.content)
                    
                    file_size = os.path.getsize(save_path)
                    logger.info(f"PDF file saved to: {save_path}")
                    logger.info(f"File size: {file_size} bytes")
                    return True, filename
                
                elif 'text/html' in content_type:
                    # This is HTML, likely a login page
                    response_text = response.text
                    
                    if 'login' in response_text.lower() or 'log in' in response_text.lower():
                        logger.info("Login required detected, attempting automatic login...")
                        
                        # If login credentials are provided, attempt to log in
                        login_username = username or self.username
                        login_password = password or self.password
                        
                        if login_username and login_password:
                            if await self.login(login_username, login_password, pdf_url):
                                logger.info("Login successful, re-downloading PDF...")
                                time.sleep(1)  # Wait for login state to stabilize
                                # Recursive call, but do not pass login credentials to avoid infinite loop
                                return await self.download_pdf(pdf_url, download_dir=download_dir, skip_if_exists=skip_if_exists, max_cache_age_days=max_cache_age_days)
                            else:
                                logger.error("Automatic login failed.")
                                return False, filename
                        else:
                            logger.error("Login required but username and password not provided.")
                            return False, filename
                    else:
                        logger.warning("Received HTML response but no login form detected.")
                        logger.debug(f"Response preview: {response_text[:500]}...")
                        return False, filename
                else:
                    logger.warning(f"Unexpected content type: {content_type}")
                    return False, filename
            
            elif response.status_code == 302:
                # Handle redirect manually if needed
                redirect_url = response.headers.get('Location')
                logger.info(f"Received redirect to: {redirect_url}")
                
                # Check if redirect is to login page
                if redirect_url and 'login' in redirect_url.lower():
                    logger.info("Redirected to login page, attempting automatic login...")
                    
                    login_username = username or self.username
                    login_password = password or self.password
                    
                    if login_username and login_password:
                        if await self.login(login_username, login_password, pdf_url):
                            logger.info("Login successful, re-downloading PDF...")
                            time.sleep(1)
                            return await self.download_pdf(pdf_url, download_dir=download_dir, skip_if_exists=skip_if_exists, max_cache_age_days=max_cache_age_days)
                        else:
                            logger.error("Automatic login failed.")
                            return False, filename
                    else:
                        logger.error("Login required but username and password not provided.")
                        return False, filename
                else:
                    logger.error(f"Unexpected redirect to: {redirect_url}")
                    return False, filename
            
            else:
                logger.error(f"Download failed, status code: {response.status_code}")
                return False, filename
                
        except Exception as e:
            logger.error(f"An error occurred during download: {str(e)}")
            return False, filename
  • server.py:167-167 (registration)
    The @mcp.tool() decorator registers the download_pdf function as an MCP tool.
    @mcp.tool()
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions authentication requirements (NCCN credentials with environment variable fallbacks) and the return value format, which adds useful context beyond basic functionality. However, it lacks details on error handling, network timeouts, file size limits, or whether the operation is idempotent, which are important for a download tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear opening sentence followed by organized 'Args' and 'Returns' sections. Each sentence adds value, though the parameter explanations could be more concise. The front-loaded purpose statement is effective, but the bullet-point style might be slightly verbose for a tool with schema documentation gaps.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (download with optional auth), no annotations, low schema coverage (0%), but presence of an output schema, the description is partially complete. It covers authentication needs and return format, but lacks details on error cases, performance characteristics, or how it interacts with sibling tools. The output schema existence reduces the need to fully document returns, but other behavioral aspects remain underspecified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage with only one parameter ('url') documented structurally. The description adds semantic meaning for 'url' and introduces three additional parameters ('filename', 'username', 'password') not present in the schema, creating inconsistency. While it explains these parameters' purposes and defaults, the mismatch between schema and description reduces reliability, and it doesn't fully compensate for the schema's lack of documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Download a PDF file') and resource ('from the specified URL'), with additional context about NCCN login credentials. It distinguishes from sibling tools like 'extract_content' and 'get_index' by focusing on file retrieval rather than content processing or indexing. However, it doesn't explicitly differentiate from potential similar download tools beyond the PDF-specific mention.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for downloading PDFs from URLs, particularly mentioning NCCN authentication contexts, but doesn't provide explicit guidance on when to use this versus alternatives like 'extract_content' (which might handle PDF content extraction) or general file download tools. No explicit when-not-to-use scenarios or prerequisite conditions are stated beyond the optional authentication parameters.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gscfwid/NCCN_guidelines_MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server