crawl_docs

Crawl and index the enabled documentation sources so they can be searched from within the MCP server. Set the force parameter to re-crawl all documents, ignoring previous crawl records, when the source documentation has been updated.

Instructions

Start crawling enabled docs

Input Schema

Name    Required   Description                                                            Default
force   No         Whether to force re-crawl all docs, ignoring previous crawl records    (none)
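
As an illustration, a client built on the official TypeScript MCP SDK could invoke the tool roughly as follows. This is a sketch, not part of the server's own code; the command used to launch the server over stdio is an assumption and will depend on how open-docs-mcp is installed.

    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

    // Assumed launch command; substitute however you normally start the server.
    const transport = new StdioClientTransport({
      command: "npx",
      args: ["-y", "open-docs-mcp"]
    });

    const client = new Client({ name: "example-client", version: "1.0.0" });
    await client.connect(transport);

    // Force a full re-crawl, ignoring previous crawl records.
    const result = await client.callTool({
      name: "crawl_docs",
      arguments: { force: true }
    });
    console.log(result); // expected text content: "Crawling completed"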

Implementation Reference

  • The handler logic for the 'crawl_docs' tool call. It parses the optional 'force' boolean from arguments and invokes the core crawlAndSaveDocs function, returning a success message.
    case "crawl_docs": { const force = Boolean(request.params.arguments?.force); await crawlAndSaveDocs(force); return { content: [{ type: "text", text: "Crawling completed" }] }; }
  • src/index.ts:435-447 (registration)
    Registration of the 'crawl_docs' tool in the ListToolsRequestSchema response. Includes name, description, and input schema.
    { name: "crawl_docs", description: "Start crawling enabled docs", inputSchema: { type: "object", properties: { force: { type: "boolean", description: "Whether to force re-crawl all docs, ignoring previous crawl records" } } } },
  • Input schema definition for the 'crawl_docs' tool.
    inputSchema: {
      type: "object",
      properties: {
        force: {
          type: "boolean",
          description: "Whether to force re-crawl all docs, ignoring previous crawl records"
        }
      }
    }
  • Core helper function that performs the document crawling. For each enabled doc it launches Puppeteer, collects links matching the configured prefix from the start page, scrapes each linked page to Markdown, saves the files locally, and records crawl status. Docs that were already crawled are skipped unless the 'force' flag is set.
    async function crawlAndSaveDocs(force: boolean = false): Promise<void> {
      await fs.ensureDir(docDir);
      console.error('========== START CRAWLING ==========');
      for (const doc of docs) {
        if (!docConfig[doc.name]) {
          console.error(`Skipping doc ${doc.name} - not enabled`);
          continue;
        }
        // Skip if already crawled and not forcing re-crawl
        if (!force && await fs.pathExists(configPath)) {
          const config = await fs.readJson(configPath);
          if (config.crawledDocs && config.crawledDocs[doc.name]) {
            console.error(`Skipping doc ${doc.name} - already crawled at ${config.crawledDocs[doc.name]}`);
            continue;
          }
        }
        try {
          // Create doc directory - FIX: use the correct path from docDir parameter
          const docDirPath = path.join(docDir, doc.name);
          await fs.ensureDir(docDirPath);
          // Launch browser and open new page
          const browser = await puppeteer.launch({
            // WSL-friendly options to avoid GPU issues
            args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-gpu'],
            headless: true
          });
          try {
            const page = await browser.newPage();
            // Navigate to start page
            console.error(`Processing doc: ${doc.name}`);
            console.error(`Crawler start: ${doc.crawlerStart}, Crawler prefix: ${doc.crawlerPrefix}`);
            await page.goto(doc.crawlerStart, { waitUntil: 'networkidle2' });
            // Extract all links
            const links = Array.from(new Set(
              await page.evaluate((prefix) => {
                const anchors = Array.from(document.querySelectorAll('a[href]'));
                return anchors
                  .map(a => {
                    const href = a.getAttribute('href');
                    if (!href) return null;
                    try {
                      const url = new URL(href, window.location.origin);
                      return url.toString();
                    } catch (error) {
                      console.error(`Failed to parse href ${href}:`, error);
                      return null;
                    }
                  })
                  .filter(link => link && link.startsWith(prefix));
              }, doc.crawlerPrefix)
            ));
            if (links.length > 0) {
              console.error(`Found ${links.length} valid links to process`);
              for (const link of links) {
                if (!link) continue;
                try {
                  console.log(`Processing link: ${link}`);
                  const newPage = await browser.newPage();
                  await newPage.goto(link, { waitUntil: 'networkidle2' });
                  // Extract content as Markdown
                  const content = await newPage.evaluate(() => {
                    // Get page title
                    const title = document.title;
                    // Find main content element
                    const main = document.querySelector('main') ||
                      document.querySelector('article') ||
                      document.querySelector('.main-content') ||
                      document.body;
                    // Convert content to Markdown
                    let markdown = `# ${title}\n\n`;
                    // Convert headings
                    main.querySelectorAll('h1, h2, h3, h4, h5, h6').forEach(heading => {
                      const level = parseInt(heading.tagName[1]);
                      const text = heading.textContent?.trim();
                      if (text) {
                        markdown += '#'.repeat(level) + ' ' + text + '\n\n';
                      }
                    });
                    // Convert paragraphs
                    main.querySelectorAll('p').forEach(p => {
                      const text = p.textContent?.trim();
                      if (text) {
                        markdown += text + '\n\n';
                      }
                    });
                    // Convert code blocks
                    main.querySelectorAll('pre').forEach(pre => {
                      const text = pre.textContent?.trim();
                      if (text) {
                        markdown += '```\n' + text + '\n```\n\n';
                      }
                    });
                    // Convert lists
                    main.querySelectorAll('ul, ol').forEach(list => {
                      const isOrdered = list.tagName === 'OL';
                      list.querySelectorAll('li').forEach((li, index) => {
                        const text = li.textContent?.trim();
                        if (text) {
                          markdown += isOrdered ? `${index + 1}. ` : '- ';
                          markdown += text + '\n';
                        }
                      });
                      markdown += '\n';
                    });
                    return markdown.trim();
                  });
                  await newPage.close();
                  // Save Markdown file
                  // Create safe file name from URL path
                  const url = new URL(link);
                  const pathParts = url.pathname.split('/').filter(part => part.length > 0);
                  let fileName = pathParts.join('_');
                  // Add extension if not present
                  if (!fileName.endsWith('.md')) {
                    fileName += '.md';
                  }
                  // FIX: Use docDirPath instead of docDir
                  const filePath = path.join(docDirPath, fileName);
                  await fs.writeFile(filePath, content);
                  console.log(`Successfully saved ${filePath}`);
                  await updateCrawledDoc(doc.name);
                } catch (error) {
                  console.error(`Failed to process page ${link}:`, error);
                }
              }
            } else {
              console.error('No valid links found');
            }
          } finally {
            await browser.close();
          }
        } catch (error) {
          console.error(`Failed to process doc ${doc.name}:`, error);
        }
      }
    }
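
    The helper updateCrawledDoc referenced above is not shown on this page. Based on how the crawl records are read back (config.crawledDocs[doc.name] holding a crawl timestamp), a minimal sketch could look like the following; the field names and config shape are assumptions, not the actual implementation.

    // Hypothetical sketch: record the crawl time for a doc in the shared config file.
    // configPath, fs (fs-extra), and the crawledDocs field are assumed from the reads
    // performed in crawlAndSaveDocs; the real implementation may differ.
    async function updateCrawledDoc(docName: string): Promise<void> {
      const config = (await fs.pathExists(configPath))
        ? await fs.readJson(configPath)
        : {};
      config.crawledDocs = config.crawledDocs || {};
      config.crawledDocs[docName] = new Date().toISOString();
      await fs.writeJson(configPath, config, { spaces: 2 });
    }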
