fetch_test_site
Convert website content from https://test.com into clean Markdown format for AI processing, removing ads and unnecessary elements.
Instructions
Fetch test-site (https://test.com) and convert to markdown format
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
No arguments | |||
Implementation Reference
- src/index.ts:571-583 (registration)Dynamically registers the 'fetch_test_site' tool (and others) in the ListTools response based on configured websites. The tool name is generated as 'fetch_' + sanitized site name.config.websites.forEach(site => { tools.push({ name: `fetch_${site.name.replace(/[^a-zA-Z0-9]/g, '_')}`, description: `Fetch ${site.name} (${site.url}) and convert to markdown format${site.description ? ` - ${site.description}` : ''}`, inputSchema: { type: 'object', properties: {} } }); }); return { tools }; });
- src/index.ts:731-778 (handler)Handler logic for 'fetch_test_site' and similar dedicated tools. Matches tool name to website config, fetches content, formats with metadata and markdown, returns as MCP content.const websiteMatch = config.websites.find(site => name === `fetch_${site.name.replace(/[^a-zA-Z0-9]/g, '_')}` ); if (websiteMatch) { const result = await fetchWebsiteContent(websiteMatch.url); let output = `# ${result.title}\n\n**來源**: ${websiteMatch.url}\n**網站**: ${websiteMatch.name}\n`; if (websiteMatch.description) { output += `**描述**: ${websiteMatch.description}\n`; } if (result.summary) { output += `**摘要**: ${result.summary}\n`; } if (result.readingTime) { output += `**預估閱讀時間**: ${result.readingTime} 分鐘\n`; } if (result.wordCount) { output += `**字數統計**: ${result.wordCount}\n`; } if (result.language) { output += `**語言**: ${result.language === 'zh' ? '中文' : '英文'}\n`; } if (result.isOpenAPI) { output += `**類型**: OpenAPI/Swagger 規範\n`; if (result.openAPIData?.isValid === false) { output += `**驗證警告**: 規範可能有問題\n`; } } output += '\n---\n\n'; output += result.markdown; return { content: [ { type: 'text', text: output } ] }; }
- src/index.ts:576-579 (schema)Input schema for dedicated fetch tools like 'fetch_test_site': empty object since no parameters required (uses hardcoded site from config).type: 'object', properties: {} } });
- src/index.ts:443-526 (helper)Helper function called by tool handlers to fetch and process website content into markdown, with OpenAPI detection and stats.async function fetchWebsiteContent(url: string): Promise<{ title: string; content: string; markdown: string; wordCount?: number; readingTime?: number; summary?: string; language?: string; isOpenAPI?: boolean; openAPIData?: { spec: OpenAPISpec; formatted: string; summary: string; isValid: boolean; errors?: string[]; }; }> { try { console.error(`正在獲取網站: ${url}`); // 首先檢查是否是 OpenAPI 內容 const response = await axios.get(url, { timeout: 30000, headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8,application/json,application/yaml', 'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8', 'Accept-Encoding': 'gzip, deflate, br', 'Cache-Control': 'no-cache', 'Pragma': 'no-cache' } }); // Check if it's a direct OpenAPI/JSON/YAML file const contentType = response.headers['content-type'] || ''; const rawContent = response.data; if (contentType.includes('application/json') || contentType.includes('application/yaml') || contentType.includes('text/yaml') || url.endsWith('.json') || url.endsWith('.yaml') || url.endsWith('.yml') || (typeof rawContent === 'string' && isOpenAPIContent(rawContent))) { console.error(`檢測到 OpenAPI 規範文件: ${url}`); const content = typeof rawContent === 'string' ? rawContent : JSON.stringify(rawContent, null, 2); const openAPIData = await parseOpenAPISpec(content); return { title: openAPIData.spec.info?.title || 'Unknown API', content: openAPIData.summary, markdown: openAPIData.formatted, isOpenAPI: true, openAPIData }; } // 使用增強的獲取器處理一般網頁內容 const processedContent = await enhancedFetcher.fetchAndProcess(url, { removeAds: true, removeNavigation: true, extractMainContent: true, timeout: 30000 }); console.error(`成功獲取網站: ${processedContent.title} (${processedContent.wordCount} 字, ${processedContent.readingTime} 分鐘閱讀)`); return { title: processedContent.title, content: processedContent.content.slice(0, 1000), // 限制純文本內容長度 markdown: processedContent.markdown, wordCount: processedContent.wordCount, readingTime: processedContent.readingTime, summary: processedContent.summary, language: processedContent.language }; } catch (error) { console.error(`獲取網站 ${url} 失敗:`, error); throw new McpError(ErrorCode.InternalError, `無法獲取網站 ${url}: ${error instanceof Error ? error.message : String(error)}`); } }
- src/enhanced-fetcher.ts:305-350 (helper)Core fetching and content processing logic used by fetchWebsiteContent. Handles retries, ad removal, main content extraction, markdown conversion, and statistics.async fetchAndProcess(url: string, options: Partial<ContentOptions> = {}): Promise<ProcessedContent> { const mergedOptions = { ...this.defaultOptions, ...options }; const retryOptions: RetryOptions = { maxRetries: 3, delay: 1000, backoffFactor: 2 }; try { const startTime = Date.now(); // 獲取原始內容 const html = await this.fetchWithRetry(url, mergedOptions, retryOptions); // 解析和清理 const $ = cheerio.load(html); const { content, title } = this.cleanAndExtractContent($, mergedOptions); // 轉換為 Markdown const rawMarkdown = this.turndownService.turndown(content); const markdown = this.postProcessMarkdown(rawMarkdown); // 計算統計信息 const wordCount = this.countWords(markdown); const readingTime = this.calculateReadingTime(wordCount); const summary = this.generateSummary(markdown); const language = this.detectLanguage(markdown); const duration = Date.now() - startTime; console.error(`成功處理 ${url},用時 ${duration}ms,字數 ${wordCount}`); return { title, content, markdown, wordCount, readingTime, summary, language }; } catch (error) { console.error(`處理 ${url} 時發生錯誤:`, error); throw error; } }