
rag

Search the web and extract content using intelligent RAG techniques to retrieve, process, and summarize information from web pages for research and analysis.

Instructions

Web search with intelligent content extraction (similar to the Apify RAG Web Browser)

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | Search term for the web search | — |
| maxResults | No | Maximum number of pages to process | 5 |
| outputFormat | No | Output format of the content | markdown |
| contentMode | No | Content mode: preview = truncated summary (~300 chars), full = full content, summary = intelligent summarization via LLM | full |
| summaryModel | No | Ollama model for summarization (e.g. mistral:7b, qwen2.5:0.5b) | llama3.2:1b |
| useJavascript | No | Render JavaScript on pages | false |
| timeout | No | Scraping timeout in milliseconds | 30000 |
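A minimal sketch of how the optional parameters resolve to the defaults listed above. The `withDefaults` helper is hypothetical (the actual server applies these defaults through its Zod schema and parameter destructuring):

```typescript
// Parameter shape of the 'rag' tool, mirroring the schema above.
interface RagParams {
  query: string;
  maxResults?: number;
  outputFormat?: "markdown" | "text" | "html";
  contentMode?: "preview" | "full" | "summary";
  summaryModel?: string;
  useJavascript?: boolean;
  timeout?: number;
}

// Hypothetical helper: fills in the documented defaults for omitted fields.
function withDefaults(
  params: RagParams
): Required<Omit<RagParams, "summaryModel">> & Pick<RagParams, "summaryModel"> {
  return {
    query: params.query,
    maxResults: params.maxResults ?? 5,
    outputFormat: params.outputFormat ?? "markdown",
    contentMode: params.contentMode ?? "full",
    summaryModel: params.summaryModel, // resolved later (llama3.2:1b) if needed
    useJavascript: params.useJavascript ?? false,
    timeout: params.timeout ?? 30000,
  };
}

console.log(withDefaults({ query: "model context protocol" }).contentMode); // "full"
```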

Implementation Reference

  • The core implementation of the 'rag' tool handler. Searches the web, scrapes and extracts content from top results, applies caching, processes content based on modes (preview/full/summary), and returns formatted results.
```typescript
export async function rag(params: RagParams): Promise<RagResult> {
  const {
    query,
    maxResults = 5,
    outputFormat = "markdown",
    contentMode = "full",
    summaryModel,
    useJavascript = false,
    timeout = 30000,
  } = params;

  let searchResults: Array<{
    url: string;
    title: string;
    description: string;
  }>;
  try {
    searchResults = await searchWeb(query, maxResults);
  } catch (error) {
    return {
      query,
      error: `Busca falhou: ${(error as Error).message}. Tente novamente em alguns minutos.`,
      results: [],
      totalResults: 0,
      searchedAt: new Date().toISOString(),
    };
  }

  const pages = await Promise.all(
    searchResults.map(async (result) => {
      const cached = getFromCache(result.url);
      if (cached) {
        return {
          url: result.url,
          title: cached.title,
          markdown: cached.markdown,
          text: cached.content,
          html: cached.content,
          excerpt: "",
          fromCache: true,
        };
      }
      try {
        const { html } = await scrapePage(result.url, {
          javascript: useJavascript,
          timeout,
        });
        const extracted = await extractContent(html, result.url);
        if (extracted) {
          saveToCache(result.url, {
            content: extracted.textContent,
            markdown: extracted.markdown,
            title: extracted.title,
          });
          return {
            url: result.url,
            title: extracted.title,
            markdown: extracted.markdown,
            text: extracted.textContent,
            html: extracted.content,
            excerpt: extracted.excerpt,
            fromCache: false,
          };
        }
      } catch (e) {
        console.error(`Failed to scrape ${result.url}:`, e);
      }
      return null;
    })
  );

  const validPages = pages.filter(Boolean) as PageResult[];

  const formattedResults = await Promise.all(
    validPages.map(async (p) => {
      const result: any = {
        url: p.url,
        title: p.title,
        fromCache: p.fromCache,
      };
      if (contentMode === "preview") {
        result.contentHandle = generateContentHandle(p.url);
      }
      if (outputFormat === "markdown") {
        result.markdown = await applyContentMode(p.markdown, contentMode, summaryModel);
      } else if (outputFormat === "text") {
        result.text = await applyContentMode(p.text, contentMode, summaryModel);
      } else if (outputFormat === "html") {
        result.html = await applyContentMode(p.html, contentMode, summaryModel);
      }
      if (p.excerpt && contentMode === "full") {
        result.excerpt = p.excerpt;
      }
      return result;
    })
  );

  return {
    query,
    results: formattedResults,
    totalResults: validPages.length,
    searchedAt: new Date().toISOString(),
  };
}
```
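The handler above relies on `getFromCache` and `saveToCache`, whose implementations are not shown here. A minimal in-memory sketch with the shape the handler expects might look like this (the `Map`-based store and the 15-minute TTL are assumptions, not the actual caching strategy):

```typescript
// Hypothetical cache entry matching the fields the handler reads back
// (cached.content, cached.markdown, cached.title).
interface CacheEntry {
  content: string;
  markdown: string;
  title: string;
  cachedAt: number;
}

const cache = new Map<string, CacheEntry>();
const TTL_MS = 15 * 60 * 1000; // assumed 15-minute time-to-live

function getFromCache(url: string): CacheEntry | undefined {
  const entry = cache.get(url);
  if (!entry) return undefined;
  if (Date.now() - entry.cachedAt > TTL_MS) {
    cache.delete(url); // expired: evict and report a miss
    return undefined;
  }
  return entry;
}

function saveToCache(url: string, data: Omit<CacheEntry, "cachedAt">): void {
  cache.set(url, { ...data, cachedAt: Date.now() });
}

saveToCache("https://example.com", { content: "text", markdown: "# text", title: "Example" });
console.log(getFromCache("https://example.com")?.title); // "Example"
```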
  • TypeScript interfaces defining the input parameters (RagParams), intermediate PageResult, and output RagResult for the rag tool.
```typescript
interface RagParams {
  query: string;
  maxResults?: number;
  outputFormat?: "markdown" | "text" | "html";
  contentMode?: "preview" | "full" | "summary";
  summaryModel?: string;
  useJavascript?: boolean;
  timeout?: number;
}

interface PageResult {
  url: string;
  title: string;
  markdown?: string;
  text?: string;
  html?: string;
  excerpt?: string;
  contentHandle?: string;
  fromCache: boolean;
}

interface RagResult {
  query: string;
  results: PageResult[];
  totalResults: number;
  searchedAt: string;
  error?: string;
}
```
  • src/index.ts:17-59 (registration)
    MCP server registration of the 'rag' tool, including name, description, Zod input schema for validation, and a thin wrapper handler that invokes the rag function and returns the result as MCP-formatted content.
```typescript
server.tool(
  "rag",
  "Busca web com extração inteligente de conteúdo (igual Apify RAG Web Browser)",
  {
    query: z
      .string()
      .describe("Termo de busca para pesquisar na web"),
    maxResults: z
      .number()
      .int()
      .min(1)
      .max(10)
      .default(5)
      .describe("Máximo de páginas a processar"),
    outputFormat: z
      .enum(["markdown", "text", "html"])
      .default("markdown")
      .describe("Formato de saída do conteúdo"),
    contentMode: z
      .enum(["preview", "full", "summary"])
      .default("full")
      .describe("Modo de conteúdo: preview=resumo truncado (~300 chars), full=conteúdo completo, summary=sumarização inteligente via LLM"),
    summaryModel: z
      .string()
      .optional()
      .describe("Modelo Ollama para sumarização (default: llama3.2:1b). Ex: mistral:7b, qwen2.5:0.5b"),
    useJavascript: z
      .boolean()
      .default(false)
      .describe("Renderizar JavaScript nas páginas"),
    timeout: z
      .number()
      .int()
      .default(30000)
      .describe("Timeout para scraping em milissegundos"),
  },
  async (params) => {
    const result = await rag(params);
    return {
      content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
    };
  }
);
```
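On the wire, invoking the registered tool is a standard MCP `tools/call` request. A sketch of the JSON-RPC message a client would send (the `id` and the argument values are illustrative, not from the source):

```typescript
// Illustrative MCP tools/call request for the "rag" tool.
// Field names follow JSON-RPC 2.0 / the MCP tools/call shape.
const callToolRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "rag",
    arguments: {
      query: "retrieval augmented generation",
      maxResults: 3,
      contentMode: "summary",
    },
  },
};

console.log(callToolRequest.method); // "tools/call"
```

The thin wrapper handler above then serializes the `RagResult` as a single `text` content item in the response.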

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/alucardeht/isis-mcp'
```
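Assuming the path segments after `/servers/` are the owner and server name (as in `alucardeht/isis-mcp` above — an inference from the example, not from official API docs), the endpoint URL can be built and fetched from Node 18+ with the global `fetch`:

```typescript
// Build the directory endpoint URL; the owner/server path structure is
// inferred from the curl example above.
function serverInfoUrl(owner: string, server: string): string {
  return `https://glama.ai/api/mcp/v1/servers/${encodeURIComponent(owner)}/${encodeURIComponent(server)}`;
}

console.log(serverInfoUrl("alucardeht", "isis-mcp"));
// then e.g.: const info = await (await fetch(serverInfoUrl("alucardeht", "isis-mcp"))).json();
```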

If you have feedback or need assistance with the MCP directory API, please join our Discord server.