download_pdf

Download a PDF from a URL and save it to the PDF agent directory. Returns the full path of the saved file.

Instructions

Download a PDF from a URL and save it to the PDF agent home directory. Downloads to a specified subfolder (default: 'downloads') and returns the full path of the downloaded PDF.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL of the PDF to download. Must be a valid HTTP/HTTPS URL. | |
| subfolder | No | Subfolder within ~/pdf-agent/ to save the PDF (default: 'downloads'). Will be created if it doesn't exist. | downloads |
| filename | No | Optional filename for the downloaded PDF. If not provided, will be derived from URL. Extension .pdf will be added if missing. | |
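
As a rough illustration of how a client might supply these inputs, the sketch below wraps them in an MCP `tools/call` JSON-RPC request. The tool name and argument keys come from the schema above; the `buildDownloadPdfCall` helper, the request id, and the example URL are assumptions made for illustration only.

```typescript
// Argument shape taken from the input schema above.
interface DownloadPdfArgs {
  url: string;        // required
  subfolder?: string; // the server defaults this to "downloads"
  filename?: string;  // the server derives it from the URL when omitted
}

// Hypothetical helper: wraps the arguments in a "tools/call" JSON-RPC request.
function buildDownloadPdfCall(id: number, args: DownloadPdfArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "download_pdf" as const, arguments: args },
  };
}

// Placeholder values, not a real endpoint.
const request = buildDownloadPdfCall(1, {
  url: "https://example.com/papers/report.pdf",
  subfolder: "papers",
});

console.log(JSON.stringify(request, null, 2));
```

Leaving out `filename` here relies on the server-side derivation described in the table.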

Implementation Reference

  • Zod schema defining the input validation for download_pdf tool. Accepts 'url' (required, valid HTTP/HTTPS URL), 'subfolder' (default 'downloads'), and 'filename' (optional).
    const DownloadPdfSchema = z.object({
      url: z.string().url(),
      subfolder: z.string().default("downloads"),
      filename: z.string().optional(),
    });
  • src/index.ts:1667-1690 (registration)
    Registration of the 'download_pdf' tool in the MCP ListTools handler. Defines tool name, description, and input schema (url, subfolder, filename).
    {
      name: "download_pdf",
      description: "Download a PDF from a URL and save it to the PDF agent home directory. Downloads to a specified subfolder (default: 'downloads') and returns the full path of the downloaded PDF.",
      inputSchema: {
        type: "object",
        properties: {
          url: {
            type: "string",
            format: "uri",
            description: "The URL of the PDF to download. Must be a valid HTTP/HTTPS URL.",
          },
          subfolder: {
            type: "string",
            description: "Subfolder within ~/pdf-agent/ to save the PDF (default: 'downloads'). Will be created if it doesn't exist.",
            default: "downloads",
          },
          filename: {
            type: "string",
            description: "Optional filename for the downloaded PDF. If not provided, will be derived from URL. Extension .pdf will be added if missing.",
          },
        },
        required: ["url"],
      },
    },
  • Handler for the 'download_pdf' tool in the CallToolRequestSchema switch statement. Parses args with DownloadPdfSchema, calls downloadPdfFromUrl(), and returns success/error response.
    case "download_pdf": {
      const { url, subfolder, filename } = DownloadPdfSchema.parse(args);
      
      try {
        const result = await downloadPdfFromUrl(url, subfolder, filename);
        
        if (result.success && result.filePath) {
          return {
            content: [
              {
                type: "text",
                text: JSON.stringify({
                  success: true,
                  file_path: result.filePath,
                  metadata: result.metadata
                }, null, 2),
              },
            ],
          };
        } else {
          return {
            content: [
              {
                type: "text",
                text: JSON.stringify({
                  success: false,
                  error: result.error
                }),
              },
            ],
          };
        }
      } catch (e) {
        return {
          content: [
            {
              type: "text",
              text: JSON.stringify({
                success: false,
                error: `Download failed: ${e instanceof Error ? e.message : 'Unknown error'}`
              }),
            },
          ],
        };
      }
    }
  • Core implementation function downloadPdfFromUrl() that downloads a PDF from a URL. Handles directory creation, filename generation, HTTP fetch with timeout, streaming to file, size validation, PDF header verification, and returns metadata on success.
    async function downloadPdfFromUrl(
      url: string, 
      subfolder: string = "downloads", 
      filename?: string
    ): Promise<{ success: boolean; filePath?: string; error?: string; metadata?: any }> {
      try {
        log('info', `Starting PDF download from URL: ${url}`);
        
        // Ensure PDF agent home directory exists
        const pdfAgentHome = await ensurePdfAgentHome();
        const downloadDir = join(pdfAgentHome, subfolder);
        
        // Create download directory if it doesn't exist
        await mkdir(downloadDir, { recursive: true });
        
        // Generate filename if not provided
        let finalFilename = filename;
        if (!finalFilename) {
          try {
            const urlObj = new URL(url);
            finalFilename = basename(urlObj.pathname) || `download_${Date.now()}.pdf`;
            
            // Ensure .pdf extension
            if (!finalFilename.toLowerCase().endsWith('.pdf')) {
              finalFilename += '.pdf';
            }
          } catch {
            finalFilename = `download_${Date.now()}.pdf`;
          }
        } else {
          // Ensure .pdf extension for provided filename
          if (!finalFilename.toLowerCase().endsWith('.pdf')) {
            finalFilename += '.pdf';
          }
        }
        
        const filePath = join(downloadDir, finalFilename);
        
        // Check if file already exists
        if (await fileExists(filePath)) {
          return {
            success: false,
            error: `File already exists at ${filePath}. Please provide a different filename or delete the existing file.`
          };
        }
        
        log('info', `Downloading PDF to: ${filePath}`);
        
        // Download the file with timeout
        const controller = new AbortController();
        const timeoutId = setTimeout(() => controller.abort(), OPERATION_TIMEOUT);
        
        try {
          const response = await fetch(url, { 
            signal: controller.signal,
            headers: {
              'User-Agent': 'PDF-Agent-MCP/1.0.0'
            }
          });
          
          clearTimeout(timeoutId);
          
          if (!response.ok) {
            return {
              success: false,
              error: `HTTP ${response.status}: ${response.statusText}`
            };
          }
          
          // Check content type
          const contentType = response.headers.get('content-type') || '';
          if (!contentType.includes('application/pdf') && !contentType.includes('application/octet-stream')) {
            log('warn', `Content-Type is not PDF: ${contentType}`);
          }
          
          // Get content length for size check
          const contentLength = response.headers.get('content-length');
          if (contentLength && parseInt(contentLength) > MAX_FILE_SIZE) {
            return {
              success: false,
              error: `File too large: ${(parseInt(contentLength) / 1024 / 1024).toFixed(1)}MB (max ${MAX_FILE_SIZE / 1024 / 1024}MB)`
            };
          }
          
          // Fail before opening the output file so an empty body leaves no stray file behind
          if (!response.body) {
            return {
              success: false,
              error: 'Empty response body'
            };
          }
          
          // Stream the response to file
          const fileStream = createWriteStream(filePath);
          
          await pipeline(response.body as any, fileStream);
          
          // Verify the downloaded file
          const stats = await stat(filePath);
          if (stats.size === 0) {
            return {
              success: false,
              error: 'Downloaded file is empty'
            };
          }
          
          if (stats.size > MAX_FILE_SIZE) {
            // Clean up oversized file
            try {
              await stat(filePath);
              await import('fs').then(fs => fs.promises.unlink(filePath));
            } catch {}
            return {
              success: false,
              error: `Downloaded file too large: ${(stats.size / 1024 / 1024).toFixed(1)}MB (max ${MAX_FILE_SIZE / 1024 / 1024}MB)`
            };
          }
          
          // Try to validate it's a PDF by reading the header
          try {
            const buffer = await readFile(filePath, { encoding: null });
            if (!buffer.subarray(0, 4).toString('ascii').startsWith('%PDF')) {
              log('warn', 'Downloaded file does not appear to be a valid PDF (missing PDF header)');
            }
          } catch (error) {
            log('warn', 'Could not validate PDF header', { error });
          }
          
          log('info', `PDF downloaded successfully: ${stats.size} bytes`);
          
          return {
            success: true,
            filePath: filePath,
            metadata: {
              filename: finalFilename,
              subfolder: subfolder,
              size_bytes: stats.size,
              size_mb: Number((stats.size / (1024 * 1024)).toFixed(2)),
              url: url,
              content_type: contentType,
              downloaded_at: new Date().toISOString()
            }
          };
          
        } catch (error) {
          clearTimeout(timeoutId);
          
          // Clean up partial file on error
          try {
            if (await fileExists(filePath)) {
              await import('fs').then(fs => fs.promises.unlink(filePath));
            }
          } catch {}
          
          if (error instanceof Error && error.name === 'AbortError') {
            return {
              success: false,
              error: `Download timeout after ${OPERATION_TIMEOUT / 1000} seconds`
            };
          }
          
          return {
            success: false,
            error: `Download failed: ${error instanceof Error ? error.message : 'Unknown error'}`
          };
        }
        
      } catch (error) {
        log('error', 'PDF download failed', { error });
        return {
          success: false,
          error: `Download failed: ${error instanceof Error ? error.message : 'Unknown error'}`
        };
      }
    }
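
The filename handling in downloadPdfFromUrl() above can be sketched as a small pure function, which makes the rules easier to see and to test in isolation. This is a minimal sketch, not the server's code: `deriveFilename`, `lastSegment`, and `ensurePdfExtension` are hypothetical names, and `lastSegment` stands in for the `basename` call in the listing.

```typescript
// Last non-empty path segment; a stand-in for basename() on a URL path.
function lastSegment(pathname: string): string {
  const parts = pathname.split("/").filter((p) => p.length > 0);
  return parts.pop() ?? "";
}

// Append ".pdf" unless the name already ends with it (case-insensitive).
function ensurePdfExtension(name: string): string {
  return name.toLowerCase().endsWith(".pdf") ? name : `${name}.pdf`;
}

// Hypothetical helper mirroring the rules in the listing:
// 1. a caller-supplied filename wins, 2. otherwise use the URL path,
// 3. otherwise fall back to a timestamped name.
function deriveFilename(url: string, filename?: string, now: number = Date.now()): string {
  if (filename) return ensurePdfExtension(filename);
  const fallback = `download_${now}.pdf`;
  try {
    const name = lastSegment(new URL(url).pathname);
    return name ? ensurePdfExtension(name) : fallback;
  } catch {
    return fallback; // the URL could not be parsed
  }
}

console.log(deriveFilename("https://example.com/docs/report")); // prints "report.pdf"
console.log(deriveFilename("https://example.com/", undefined, 123)); // prints "download_123.pdf"
```

Note that the URL path is used as-is, so percent-encoded characters in the path stay encoded in the resulting filename.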
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. It mentions default subfolder and filename extension handling but omits critical behaviors such as overwrite policy, error handling for invalid URLs, or required permissions. For a download tool, more detail is needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
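
One way to close this gap is to state the behaviors visible in the implementation reference directly in the description. The sketch below assembles such a description; `describeDownloadPdf` and `DownloadLimits` are hypothetical names, and the limits are parameters because the values of MAX_FILE_SIZE and OPERATION_TIMEOUT are not shown in the listing.

```typescript
// Limits are passed in; their real values are not shown in the source excerpt.
interface DownloadLimits {
  maxFileSizeMb: number;
  timeoutSeconds: number;
}

// Hypothetical helper that builds a description disclosing side effects and failure modes.
function describeDownloadPdf(limits: DownloadLimits): string {
  return [
    "Download a PDF from an HTTP/HTTPS URL into a subfolder of the PDF agent home directory (default: 'downloads'), creating the subfolder if needed.",
    "Never overwrites: fails if the target file already exists.",
    `Fails if the file exceeds ${limits.maxFileSizeMb} MB or the server does not respond within ${limits.timeoutSeconds} seconds.`,
    "A non-PDF Content-Type or a missing %PDF header is only logged as a warning.",
    "Returns JSON with success, file_path, and metadata, or success: false and error on failure.",
  ].join(" ");
}

// Placeholder limits for illustration only.
console.log(describeDownloadPdf({ maxFileSizeMb: 50, timeoutSeconds: 30 }));
```

A longer description like this would trade some of the conciseness score for behavioral disclosure.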

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no redundancy. Front-loaded with the primary action. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description appropriately states the return value (full path). It also covers home directory and default subfolder. Missing details on size limits, supported protocols, and overwrite policy, but these are secondary for a tool whose siblings are text/metadata extractors.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
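
Since there is no output schema, a client has to infer the result shape from the handler shown in the implementation reference. The types below are a sketch of that shape based on the fields the handler and downloadPdfFromUrl() return; `DownloadPdfResult`, `DownloadMetadata`, and `parseDownloadPdfResult` are hypothetical names, not part of the server.

```typescript
// Metadata fields as they appear in the success branch of downloadPdfFromUrl().
interface DownloadMetadata {
  filename: string;
  subfolder: string;
  size_bytes: number;
  size_mb: number;
  url: string;
  content_type: string;
  downloaded_at: string;
}

type DownloadPdfResult =
  | { success: true; file_path: string; metadata: DownloadMetadata }
  | { success: false; error: string };

// Hypothetical client-side helper: parses the JSON text returned by the tool.
function parseDownloadPdfResult(text: string): DownloadPdfResult {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) {
    throw new Error("Unexpected payload");
  }
  const obj = data as Record<string, unknown>;
  const filePath = obj["file_path"];
  if (obj["success"] === true && typeof filePath === "string") {
    // The metadata is trusted here rather than validated field by field.
    return { success: true, file_path: filePath, metadata: obj["metadata"] as DownloadMetadata };
  }
  const error = obj["error"];
  return { success: false, error: typeof error === "string" ? error : "Unknown error" };
}

const failure = parseDownloadPdfResult('{"success":false,"error":"HTTP 404: Not Found"}');
console.log(failure.success ? failure.file_path : failure.error); // prints "HTTP 404: Not Found"
```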

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds context about the home directory (~/pdf-agent/) not in the schema. However, it largely restates schema descriptions (e.g., default subfolder, filename extension). No additional semantic depth beyond this.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
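
As an example of a constraint worth spelling out: the description asks for an HTTP/HTTPS URL, while the schema shown in the implementation reference applies a general URL check. Assuming that general check does not restrict the scheme, a stricter validation could look like the dependency-free sketch below. `parseDownloadPdfArgs` is a hypothetical helper, not the server's validator.

```typescript
interface ParsedDownloadArgs {
  url: string;
  subfolder: string;
  filename?: string;
}

// Hypothetical validator: enforces the http/https constraint stated in the description.
function parseDownloadPdfArgs(args: {
  url?: unknown;
  subfolder?: unknown;
  filename?: unknown;
}): ParsedDownloadArgs {
  const { url, subfolder = "downloads", filename } = args;
  if (typeof url !== "string") throw new Error("url is required and must be a string");

  let protocol: string;
  try {
    protocol = new URL(url).protocol;
  } catch {
    throw new Error("url is not a valid URL");
  }
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${protocol}`);
  }

  if (typeof subfolder !== "string") throw new Error("subfolder must be a string");
  if (filename !== undefined && typeof filename !== "string") {
    throw new Error("filename must be a string");
  }
  return filename === undefined ? { url, subfolder } : { url, subfolder, filename };
}

console.log(parseDownloadPdfArgs({ url: "https://example.com/a.pdf" }).subfolder); // prints "downloads"
```

Rejecting other schemes up front gives the caller a clearer error than a failed fetch would.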

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (download a PDF from a URL) and the resource (save to PDF agent home directory), with specific details about subfolder and return path. It effectively distinguishes from sibling tools that focus on extracting content from already-downloaded PDFs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage context is implied: this tool is a prerequisite for sibling tools. However, there is no explicit guidance on when to use it vs. alternatives, nor any mention of when not to use it (e.g., for non-PDF URLs).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/vlad-ds/pdf-agent-mcp'