fetch_website
Convert website content to Markdown format for AI processing, with automatic cleanup and ad removal.
Instructions
Fetch the specified website and convert its content to Markdown format.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Website URL to fetch | |
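To make the schema concrete, here is a minimal sketch of the JSON-RPC `tools/call` request an MCP client might send for this tool. The helper name `buildFetchWebsiteCall`, the request id, and the example URL are hypothetical; only the tool name `fetch_website` and the required `url` argument come from the schema above.

```typescript
// Arguments accepted by fetch_website, per the input schema (only `url`).
interface FetchWebsiteArgs {
  url: string;
}

// Hypothetical helper: wraps the arguments in an MCP `tools/call` request.
function buildFetchWebsiteCall(id: number, args: FetchWebsiteArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: {
      name: "fetch_website" as const,
      arguments: args,
    },
  };
}

// Example URL chosen for illustration only.
const request = buildFetchWebsiteCall(1, { url: "https://example.com" });
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally build this envelope for you; the sketch only shows where `url` ends up in the payload.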
Implementation Reference
- src/index.ts:443-526 (handler) Core handler function that fetches the website content, detects whether it is an OpenAPI spec, converts HTML to Markdown using enhancedFetcher, and returns structured data including the title, Markdown, summary, and related metadata.

  ```typescript
  async function fetchWebsiteContent(url: string): Promise<{
    title: string;
    content: string;
    markdown: string;
    wordCount?: number;
    readingTime?: number;
    summary?: string;
    language?: string;
    isOpenAPI?: boolean;
    openAPIData?: {
      spec: OpenAPISpec;
      formatted: string;
      summary: string;
      isValid: boolean;
      errors?: string[];
    };
  }> {
    try {
      // Log: "Fetching website: <url>"
      console.error(`正在獲取網站: ${url}`);

      // First, check whether the response is OpenAPI content
      const response = await axios.get(url, {
        timeout: 30000,
        headers: {
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
          'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8,application/json,application/yaml',
          'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8',
          'Accept-Encoding': 'gzip, deflate, br',
          'Cache-Control': 'no-cache',
          'Pragma': 'no-cache'
        }
      });

      // Check if it's a direct OpenAPI/JSON/YAML file
      const contentType = response.headers['content-type'] || '';
      const rawContent = response.data;

      if (contentType.includes('application/json') ||
          contentType.includes('application/yaml') ||
          contentType.includes('text/yaml') ||
          url.endsWith('.json') ||
          url.endsWith('.yaml') ||
          url.endsWith('.yml') ||
          (typeof rawContent === 'string' && isOpenAPIContent(rawContent))) {
        // Log: "Detected OpenAPI specification file: <url>"
        console.error(`檢測到 OpenAPI 規範文件: ${url}`);

        const content = typeof rawContent === 'string'
          ? rawContent
          : JSON.stringify(rawContent, null, 2);
        const openAPIData = await parseOpenAPISpec(content);

        return {
          title: openAPIData.spec.info?.title || 'Unknown API',
          content: openAPIData.summary,
          markdown: openAPIData.formatted,
          isOpenAPI: true,
          openAPIData
        };
      }

      // Use the enhanced fetcher to process regular web page content
      const processedContent = await enhancedFetcher.fetchAndProcess(url, {
        removeAds: true,
        removeNavigation: true,
        extractMainContent: true,
        timeout: 30000
      });

      // Log: "Successfully fetched website: <title> (<n> words, <m> min read)"
      console.error(`成功獲取網站: ${processedContent.title} (${processedContent.wordCount} 字, ${processedContent.readingTime} 分鐘閱讀)`);

      return {
        title: processedContent.title,
        content: processedContent.content.slice(0, 1000), // Limit the plain-text content length
        markdown: processedContent.markdown,
        wordCount: processedContent.wordCount,
        readingTime: processedContent.readingTime,
        summary: processedContent.summary,
        language: processedContent.language
      };
    } catch (error) {
      // Log: "Failed to fetch website <url>"
      console.error(`獲取網站 ${url} 失敗:`, error);
      // Error message: "Unable to fetch website <url>: <reason>"
      throw new McpError(ErrorCode.InternalError, `無法獲取網站 ${url}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  ```
- src/index.ts:546-559 (registration) Tool registration in the ListTools response: defines the name 'fetch_website', the description, and an input schema requiring 'url'.

  ```typescript
  {
    name: 'fetch_website',
    description: 'Fetch specified website and convert to markdown format',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'Website URL to fetch'
        }
      },
      required: ['url']
    }
  },
  ```
- src/index.ts:669-713 (handler) Execution handler within CallToolRequestSchema: checks for 'fetch_website', validates the URL argument, invokes fetchWebsiteContent, formats the response as Markdown with a metadata header, and returns it as text content.

  ```typescript
  if (name === 'fetch_website') {
    const url = args?.url as string;
    if (!url) {
      throw new McpError(ErrorCode.InvalidParams, 'Missing required parameter: url');
    }

    const result = await fetchWebsiteContent(url);

    // Metadata header; labels are emitted in Chinese (English glosses in comments)
    let output = `# ${result.title}\n\n**來源**: ${url}\n`; // "Source"

    if (result.summary) {
      output += `**摘要**: ${result.summary}\n`; // "Summary"
    }
    if (result.readingTime) {
      output += `**預估閱讀時間**: ${result.readingTime} 分鐘\n`; // "Estimated reading time: <n> minutes"
    }
    if (result.wordCount) {
      output += `**字數統計**: ${result.wordCount}\n`; // "Word count"
    }
    if (result.language) {
      output += `**語言**: ${result.language === 'zh' ? '中文' : '英文'}\n`; // "Language: Chinese / English"
    }
    if (result.isOpenAPI) {
      output += `**類型**: OpenAPI/Swagger 規範\n`; // "Type: OpenAPI/Swagger specification"
      if (result.openAPIData?.isValid === false) {
        output += `**驗證警告**: 規範可能有問題\n`; // "Validation warning: the specification may have problems"
      }
    }

    // Horizontal rule separates the metadata header from the Markdown body
    output += '\n---\n\n';
    output += result.markdown;

    return {
      content: [
        {
          type: 'text',
          text: output
        }
      ]
    };
  }
  ```
- src/index.ts:549-558 (schema) Input schema definition for the fetch_website tool: an object with a required 'url' string property.

  ```typescript
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'Website URL to fetch'
      }
    },
    required: ['url']
  }
  ```
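The execution handler above returns a single text item whose content is a metadata header, a `---` rule, and then the Markdown body. A client that only wants the body could split on that separator. The following is a minimal sketch under that assumption; the helper name `splitFetchWebsiteOutput`, the simplified `ToolResult` type, and the sample payload are all hypothetical and not part of the server code.

```typescript
// Simplified shape of the tool response (assumption: text items only).
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextContent[];
}

// Hypothetical helper: separates the metadata header from the Markdown body
// using the "\n---\n\n" separator that the execution handler appends.
function splitFetchWebsiteOutput(result: ToolResult): { header: string; body: string } {
  const text = result.content
    .filter((item) => item.type === "text")
    .map((item) => item.text)
    .join("\n");

  const separator = "\n---\n\n";
  const index = text.indexOf(separator);
  if (index === -1) {
    // No separator found: treat everything as body.
    return { header: "", body: text };
  }
  return {
    header: text.slice(0, index),
    body: text.slice(index + separator.length),
  };
}

// Sample payload made up for illustration, shaped like the handler's output.
const sample: ToolResult = {
  content: [
    {
      type: "text",
      text: "# Example Domain\n\n**來源**: https://example.com\n\n---\n\nThis domain is for use in examples.",
    },
  ],
};

const { header, body } = splitFetchWebsiteOutput(sample);
console.log(header);
console.log(body);
```

Splitting on the first occurrence of the separator keeps any later horizontal rules inside the Markdown body intact, although a title or summary that itself contained the separator would confuse this approach.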