fetch_website
Fetches a URL, strips ads and navigation, and converts the main content to Markdown so that AI assistants can process it easily.
Instructions
Fetch specified website and convert to markdown format
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Website URL to fetch | |
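
As a rough illustration of the schema above, the sketch below builds the `tools/call` request an MCP client might send for this tool and checks the arguments the way a server could before fetching. `buildFetchWebsiteRequest`, `parseFetchWebsiteArgs`, and `tryParseUrl` are hypothetical helpers, not part of this server, and the http/https restriction is an added assumption that the schema itself does not state.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical client-side helper: wrap a URL in a fetch_website call.
function buildFetchWebsiteRequest(id: number, url: string): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'fetch_website', arguments: { url } },
  };
}

// Parse a string as a URL, returning null instead of throwing.
function tryParseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch (_e) {
    return null;
  }
}

// Hypothetical server-side check: `url` must be a non-empty string.
// Assumption: only http(s) URLs are accepted; the schema does not say so.
function parseFetchWebsiteArgs(args: unknown): { url: string } {
  if (typeof args !== 'object' || args === null) {
    throw new Error('Arguments must be an object');
  }
  const url = (args as Record<string, unknown>)['url'];
  if (typeof url !== 'string' || url.length === 0) {
    throw new Error('Missing required parameter: url');
  }
  const parsed = tryParseUrl(url);
  if (parsed === null) {
    throw new Error(`Invalid URL: ${url}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return { url };
}

const exampleRequest = buildFetchWebsiteRequest(1, 'https://example.com/docs');
const exampleArgs = parseFetchWebsiteArgs(exampleRequest.params.arguments);
console.log(JSON.stringify(exampleRequest, null, 2));
console.log(exampleArgs.url);
```

The server shown in the Implementation Reference only checks that `url` is truthy; the stricter validation here is one possible hardening, not a description of its behavior.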
Implementation Reference
- src/index.ts:443-526 (handler) Core handler function. It fetches the URL with axios, returns a parsed specification when the response looks like an OpenAPI/JSON/YAML document, and otherwise passes the URL to enhancedFetcher for Markdown conversion, returning structured data that includes the title, Markdown, summary, word count, reading time, and language.

  ```typescript
  async function fetchWebsiteContent(url: string): Promise<{
    title: string;
    content: string;
    markdown: string;
    wordCount?: number;
    readingTime?: number;
    summary?: string;
    language?: string;
    isOpenAPI?: boolean;
    openAPIData?: {
      spec: OpenAPISpec;
      formatted: string;
      summary: string;
      isValid: boolean;
      errors?: string[];
    };
  }> {
    try {
      console.error(`正在獲取網站: ${url}`); // "Fetching website: <url>"

      // First, check whether the response is OpenAPI content
      const response = await axios.get(url, {
        timeout: 30000,
        headers: {
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
          'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8,application/json,application/yaml',
          'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8',
          'Accept-Encoding': 'gzip, deflate, br',
          'Cache-Control': 'no-cache',
          'Pragma': 'no-cache'
        }
      });

      // Check if it's a direct OpenAPI/JSON/YAML file
      const contentType = response.headers['content-type'] || '';
      const rawContent = response.data;

      if (contentType.includes('application/json') ||
          contentType.includes('application/yaml') ||
          contentType.includes('text/yaml') ||
          url.endsWith('.json') ||
          url.endsWith('.yaml') ||
          url.endsWith('.yml') ||
          (typeof rawContent === 'string' && isOpenAPIContent(rawContent))) {

        console.error(`檢測到 OpenAPI 規範文件: ${url}`); // "Detected OpenAPI specification file: <url>"
        const content = typeof rawContent === 'string' ? rawContent : JSON.stringify(rawContent, null, 2);
        const openAPIData = await parseOpenAPISpec(content);

        return {
          title: openAPIData.spec.info?.title || 'Unknown API',
          content: openAPIData.summary,
          markdown: openAPIData.formatted,
          isOpenAPI: true,
          openAPIData
        };
      }

      // Use the enhanced fetcher for regular web page content
      const processedContent = await enhancedFetcher.fetchAndProcess(url, {
        removeAds: true,
        removeNavigation: true,
        extractMainContent: true,
        timeout: 30000
      });

      // "Fetched website: <title> (<n> words, <m> min read)"
      console.error(`成功獲取網站: ${processedContent.title} (${processedContent.wordCount} 字, ${processedContent.readingTime} 分鐘閱讀)`);

      return {
        title: processedContent.title,
        content: processedContent.content.slice(0, 1000), // Limit plain-text content length
        markdown: processedContent.markdown,
        wordCount: processedContent.wordCount,
        readingTime: processedContent.readingTime,
        summary: processedContent.summary,
        language: processedContent.language
      };
    } catch (error) {
      console.error(`獲取網站 ${url} 失敗:`, error); // "Failed to fetch website <url>:"
      // "Unable to fetch website <url>: <message>"
      throw new McpError(ErrorCode.InternalError, `無法獲取網站 ${url}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  ```
- src/index.ts:546-559 (registration) Registration of the 'fetch_website' tool in the ListToolsRequestSchema handler, including name, description, and input schema.

  ```typescript
  {
    name: 'fetch_website',
    description: 'Fetch specified website and convert to markdown format',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'Website URL to fetch'
        }
      },
      required: ['url']
    }
  },
  ```
- src/index.ts:549-558 (schema) Input schema definition for the 'fetch_website' tool, requiring a 'url' string parameter.

  ```typescript
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'Website URL to fetch'
      }
    },
    required: ['url']
  }
  ```
- src/index.ts:669-713 (handler) Tool dispatch handler in CallToolRequestSchema. It executes 'fetch_website' by calling fetchWebsiteContent, prepends a metadata header to the Markdown, and returns the result in MCP content format.

  ```typescript
  if (name === 'fetch_website') {
    const url = args?.url as string;
    if (!url) {
      throw new McpError(ErrorCode.InvalidParams, 'Missing required parameter: url');
    }

    const result = await fetchWebsiteContent(url);

    let output = `# ${result.title}\n\n**來源**: ${url}\n`; // "Source"

    if (result.summary) {
      output += `**摘要**: ${result.summary}\n`; // "Summary"
    }

    if (result.readingTime) {
      output += `**預估閱讀時間**: ${result.readingTime} 分鐘\n`; // "Estimated reading time: <n> minutes"
    }

    if (result.wordCount) {
      output += `**字數統計**: ${result.wordCount}\n`; // "Word count"
    }

    if (result.language) {
      output += `**語言**: ${result.language === 'zh' ? '中文' : '英文'}\n`; // "Language": "Chinese" or "English"
    }

    if (result.isOpenAPI) {
      output += `**類型**: OpenAPI/Swagger 規範\n`; // "Type: OpenAPI/Swagger specification"
      if (result.openAPIData?.isValid === false) {
        output += `**驗證警告**: 規範可能有問題\n`; // "Validation warning: the specification may have problems"
      }
    }

    output += '\n---\n\n';
    output += result.markdown;

    return {
      content: [
        {
          type: 'text',
          text: output
        }
      ]
    };
  }
  ```
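
The content-type and file-extension test inside `fetchWebsiteContent` can be pulled out into a pure function, which makes the routing decision easy to exercise without a network call. The following is a sketch rather than code from the repository: `isDirectSpecResponse` is a hypothetical name, and the body sniffing performed by `isOpenAPIContent` is left out because its implementation is not shown on this page.

```typescript
// Media types and URL suffixes that the core handler treats as a
// directly served OpenAPI/JSON/YAML document.
const SPEC_CONTENT_TYPES = ['application/json', 'application/yaml', 'text/yaml'];
const SPEC_EXTENSIONS = ['.json', '.yaml', '.yml'];

// Hypothetical helper mirroring the header and extension checks only.
function isDirectSpecResponse(contentType: string, url: string): boolean {
  const typeMatches = SPEC_CONTENT_TYPES.some((t) => contentType.includes(t));
  const extensionMatches = SPEC_EXTENSIONS.some((ext) => url.endsWith(ext));
  return typeMatches || extensionMatches;
}

// A JSON API response is routed to the spec path by its content type.
console.log(isDirectSpecResponse('application/json; charset=utf-8', 'https://example.com/api'));
// A YAML file served as text/html is still caught by its extension.
console.log(isDirectSpecResponse('text/html', 'https://example.com/openapi.yaml'));
// An ordinary HTML page falls through to the Markdown conversion path.
console.log(isDirectSpecResponse('text/html', 'https://example.com/'));
```

Two consequences of this check appear to follow from the handler as written: any JSON or YAML response takes the specification path whether or not it is actually an OpenAPI document, and a URL with a query string after the extension (for example `spec.json?v=2`) does not match `endsWith`.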
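
The header that the dispatch handler prepends to the Markdown can likewise be expressed as a standalone formatter. This is a minimal sketch under stated assumptions: `formatFetchResult` and the `FetchResult` interface are hypothetical, the labels are rendered in English for readability whereas the server emits Traditional Chinese labels, and the language and OpenAPI lines are omitted for brevity.

```typescript
// Subset of the fields returned by fetchWebsiteContent that the
// metadata header uses. Hypothetical type for this sketch only.
interface FetchResult {
  title: string;
  markdown: string;
  summary?: string;
  readingTime?: number;
  wordCount?: number;
}

// Build the output text: title, source, optional metadata lines,
// a horizontal rule, then the converted Markdown body.
function formatFetchResult(url: string, result: FetchResult): string {
  const lines: string[] = [`# ${result.title}`, '', `**Source**: ${url}`];
  if (result.summary) {
    lines.push(`**Summary**: ${result.summary}`);
  }
  if (result.readingTime) {
    lines.push(`**Estimated reading time**: ${result.readingTime} min`);
  }
  if (result.wordCount) {
    lines.push(`**Word count**: ${result.wordCount}`);
  }
  lines.push('', '---', '', result.markdown);
  return lines.join('\n');
}

const formatted = formatFetchResult('https://example.com', {
  title: 'Example',
  markdown: 'Body',
  summary: 'Short',
  wordCount: 120,
});
console.log(formatted);
```

As in the handler, optional fields are skipped when they are missing or falsy, so a `wordCount` of 0 produces no line.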