fetch_and_save
Fetches web content from a URL and saves it as a file, using Jina Reader for markdown conversion with fallback to standard fetch. Automatically generates a filename when none is specified.
Instructions
Fetches a URL from the internet using Jina Reader API (with fallback to standard fetch) and saves the content to a file.
This tool first tries to fetch content using Jina Reader API for better markdown conversion, and falls back to the standard fetch method if Jina fails. Files are saved in the configured working directory. If no file path is specified, an automatic filename will be generated based on the URL.
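
The try-then-fall-back behaviour described above can be illustrated with a small, network-free sketch. The names `fetch_with_fallback`, `failing_reader`, and `plain_fetch` are hypothetical stand-ins invented for this example; they are not part of the server, and the real tool is asynchronous and talks to actual endpoints.

```python
from typing import Callable, Tuple

# A fetcher takes a URL and returns the page content as text.
Fetcher = Callable[[str], str]


def fetch_with_fallback(url: str, primary: Fetcher, fallback: Fetcher) -> Tuple[str, str]:
    """Return (content, source), using the fallback if the primary raises."""
    try:
        return primary(url), "primary"
    except Exception:
        # Any failure in the primary fetcher triggers the fallback.
        return fallback(url), "fallback"


def failing_reader(url: str) -> str:
    # Hypothetical stand-in for a reader service that is unavailable.
    raise RuntimeError("reader unavailable")


def plain_fetch(url: str) -> str:
    # Hypothetical stand-in for a standard HTTP fetch.
    return f"<html>stub for {url}</html>"


content, source = fetch_with_fallback("https://example.com", failing_reader, plain_fetch)
print(source)  # fallback
```

Returning the source alongside the content is one way to let the caller report which path produced the result, similar in spirit to the prefix string the tool includes in its success message.
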
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to fetch | |
| file_path | No | File path to save the content (optional, will auto-generate if not provided) | |
| raw | No | Get the actual HTML content of the requested page, without simplification. | False |
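
As a rough illustration of the schema above, the following sketch builds the arguments object a client might send when calling the tool. The helper `build_arguments` is hypothetical and exists only for this example; how the arguments are actually transmitted depends on the MCP client in use.

```python
import json
from typing import Any, Dict, Optional


def build_arguments(url: str, file_path: Optional[str] = None, raw: bool = False) -> Dict[str, Any]:
    """Build a fetch_and_save arguments dict, omitting fields left at their defaults."""
    args: Dict[str, Any] = {"url": url}
    if file_path is not None:
        args["file_path"] = file_path
    if raw:
        args["raw"] = True
    return args


# Only the required field: the server auto-generates the filename.
print(json.dumps(build_arguments("https://example.com/article")))
# {"url": "https://example.com/article"}

# Explicit relative path plus the raw flag.
print(json.dumps(build_arguments("https://example.com/article", file_path="notes/article.md", raw=True)))
# {"url": "https://example.com/article", "file_path": "notes/article.md", "raw": true}
```
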
Implementation Reference
- Pydantic schema defining input parameters for the fetch_and_save tool: url (required), optional file_path, and raw flag.

      class FetchAndSave(BaseModel):
          """Parameters for fetching a URL and saving to file."""

          url: Annotated[AnyUrl, Field(description="URL to fetch")]
          file_path: Annotated[
              Optional[str],
              Field(
                  default=None,
                  description="File path to save the content (optional, will auto-generate if not provided)",
              ),
          ]
          raw: Annotated[
              bool,
              Field(
                  default=False,
                  description="Get the actual HTML content of the requested page, without simplification.",
              ),
          ]

- src/context_mcp_server/server.py:232-238 (registration) Registration of the 'fetch_and_save' tool in the server's list_tools() method, including name, description, and input schema reference.

      Tool(
          name="fetch_and_save",
          description="""Fetches a URL from the internet using Jina Reader API (with fallback to standard fetch) and saves the content to a file.

      This tool first tries to fetch content using Jina Reader API for better markdown conversion, and falls back to the standard fetch method if Jina fails. Files are saved in the configured working directory. If no file path is specified, an automatic filename will be generated based on the URL.""",
          inputSchema=FetchAndSave.model_json_schema(),
      )

- src/context_mcp_server/server.py:287-334 (handler) Core handler logic within call_tool: validates input, determines the output file path (auto-generating it if needed), fetches content via the Jina fallback helper, saves it to the file, and returns a success message with a preview.

      elif name == "fetch_and_save":
          try:
              args = FetchAndSave(**arguments)
          except ValueError as e:
              raise McpError(ErrorData(code=INVALID_PARAMS, message=str(e)))

          url = str(args.url)
          if not url:
              raise McpError(ErrorData(code=INVALID_PARAMS, message="URL is required"))

          # Debug: Log received arguments
          debug_info = f"Received arguments: url={url}, file_path={args.file_path}, raw={args.raw}"

          # Generate file path if not provided
          if args.file_path and args.file_path.strip():
              # Use provided file path, but ensure it's within work_dir
              provided_path = args.file_path.strip()
              if os.path.isabs(provided_path):
                  # If absolute path, use as-is (user responsibility)
                  file_path = provided_path
                  debug_info += f"\nUsing absolute path: {file_path}"
              else:
                  # If relative path, make it relative to work_dir
                  file_path = os.path.join(work_dir, provided_path)
                  debug_info += f"\nUsing relative path in work_dir: {file_path}"
          else:
              # Auto-generate filename
              filename = generate_filename_from_url(url)
              file_path = os.path.join(work_dir, filename)
              debug_info += f"\nAuto-generated filename: {file_path}"

          try:
              content, prefix = await fetch_with_jina_fallback(
                  url, user_agent_autonomous, force_raw=args.raw, proxy_url=proxy_url
              )

              # Create directory if it doesn't exist
              os.makedirs(os.path.dirname(file_path), exist_ok=True)

              # Save content to file
              with open(file_path, 'w', encoding='utf-8') as f:
                  f.write(content)

              return [TextContent(type="text", text=f"Successfully fetched content from {url} and saved to {file_path}\n\nDebug info: {debug_info}\n\n{prefix}Content preview (first 500 chars):\n{content[:500]}{'...' if len(content) > 500 else ''}")]

          except Exception as e:
              raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Failed to fetch and save: {str(e)}"))

- Helper function to generate a safe, unique filename from the URL when file_path is not provided.

      def generate_filename_from_url(url: str) -> str:
          """Generate a safe filename from URL."""
          # Extract domain and path
          parsed = urlparse(url)
          domain = parsed.netloc.replace('www.', '')
          path = parsed.path.strip('/')

          # Create base filename
          if path:
              # Use last part of path
              filename_base = path.split('/')[-1]
              # Remove file extension if present
              if '.' in filename_base:
                  filename_base = filename_base.rsplit('.', 1)[0]
          else:
              filename_base = domain

          # Clean filename - remove invalid characters
          filename_base = re.sub(r'[^\w\-_.]', '_', filename_base)

          # Add timestamp to ensure uniqueness
          timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

          return f"{filename_base}_{timestamp}.md"

- Helper function for fetching content, preferring Jina Reader API for better markdown extraction with fallback to standard fetch_url.

      async def fetch_with_jina_fallback(
          url: str, user_agent: str, force_raw: bool = False, proxy_url: str | None = None
      ) -> Tuple[str, str]:
          """
          Fetch URL using Jina Reader API first, fallback to original fetch logic if failed.
          """
          from httpx import AsyncClient, HTTPError

          # Try Jina Reader API first
          jina_url = f"https://r.jina.ai/{url}"

          async with AsyncClient(proxies=proxy_url) as client:
              try:
                  response = await client.get(
                      jina_url,
                      follow_redirects=True,
                      headers={"User-Agent": user_agent},
                      timeout=30,
                  )

                  if response.status_code == 200:
                      content = response.text

                      # Check if it's a Jina error response
                      try:
                          error_data = json.loads(content)
                          if "code" in error_data and error_data.get("data") is None:
                              # This is an error response, fallback to original logic
                              raise Exception(f"Jina API error: {error_data.get('message', 'Unknown error')}")
                      except json.JSONDecodeError:
                          # Not JSON, assume it's valid content
                          pass

                      # Jina Reader already returns markdown content
                      return content, "Content fetched via Jina Reader API:\n"

              except Exception:
                  # Jina failed, fallback to original logic
                  pass

          # Fallback to original fetch logic
          return await fetch_url(url, user_agent, force_raw, proxy_url)
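
To make the handler's path rules concrete, here is a self-contained sketch of where a file would end up for a few inputs. It restates the naming and path logic from the excerpts above with two assumptions made purely for illustration: the clock is passed in as `now` so the output is reproducible, and `posixpath` replaces `os.path` so the results look the same on every platform. The names `filename_from_url` and `resolve_output_path` are hypothetical and do not exist in the server.

```python
import posixpath  # assumption: the handler uses os.path; posixpath keeps output platform-independent
import re
from datetime import datetime
from typing import Optional
from urllib.parse import urlparse


def filename_from_url(url: str, now: datetime) -> str:
    """Apply the same naming rules as generate_filename_from_url, with an injected clock."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    if path:
        base = path.split("/")[-1]          # last path segment
        if "." in base:
            base = base.rsplit(".", 1)[0]   # drop the extension
    else:
        base = parsed.netloc.replace("www.", "")
    base = re.sub(r"[^\w\-_.]", "_", base)  # replace unsafe characters
    return f"{base}_{now.strftime('%Y%m%d_%H%M%S')}.md"


def resolve_output_path(work_dir: str, file_path: Optional[str], url: str, now: datetime) -> str:
    """Absolute paths are used as-is, relative paths join work_dir, blank means auto-generate."""
    if file_path and file_path.strip():
        provided = file_path.strip()
        if posixpath.isabs(provided):
            return provided
        return posixpath.join(work_dir, provided)
    return posixpath.join(work_dir, filename_from_url(url, now))


now = datetime(2024, 1, 2, 3, 4, 5)
print(resolve_output_path("/data", None, "https://www.example.com/docs/guide.html", now))
# /data/guide_20240102_030405.md
print(resolve_output_path("/data", "notes/page.md", "https://example.com/", now))
# /data/notes/page.md
print(resolve_output_path("/data", "/tmp/page.md", "https://example.com/", now))
# /tmp/page.md
print(resolve_output_path("/data", "  ", "https://www.example.com/", now))
# /data/example.com_20240102_030405.md
```

Note that an absolute `file_path` bypasses the working directory entirely, which matches the "user responsibility" comment in the handler excerpt.
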