mcp-server-webcrawl

<!DOCTYPE html> <html class="writer-html5" lang="en" data-content_root="./"> <head> <meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>Installation &mdash; mcp-server-webcrawl documentation</title> <link rel="stylesheet" type="text/css" href="_static/pygments.css?v=80d5e7a1" /> <link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=e59714d7" /> <script src="_static/jquery.js?v=5d32c60e"></script> <script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script> <script src="_static/documentation_options.js?v=5929fcd5"></script> <script src="_static/doctools.js?v=888ff710"></script> <script src="_static/sphinx_highlight.js?v=dc90522c"></script> <script src="_static/js/theme.js"></script> <link rel="index" title="Index" href="genindex.html" /> <link rel="search" title="Search" href="search.html" /> <link rel="next" title="Setup Guides" href="guides.html" /> <link rel="prev" title="mcp-server-webcrawl" href="index.html" /> </head> <body class="wy-body-for-nav"> <div class="wy-grid-for-nav"> <nav data-toggle="wy-nav-shift" class="wy-nav-side"> <div class="wy-side-scroll"> <div class="wy-side-nav-search" > <a href="index.html" class="icon icon-home"> mcp-server-webcrawl </a> <div role="search"> <form id="rtd-search-form" class="wy-form" action="search.html" method="get"> <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> </div> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <ul class="current"> <li class="toctree-l1 current"><a class="current reference internal" href="#">Installation</a><ul> <li class="toctree-l2"><a 
class="reference internal" href="#requirements">Requirements</a></li> <li class="toctree-l2"><a class="reference internal" href="#mcp-configuration">MCP Configuration</a></li> <li class="toctree-l2"><a class="reference internal" href="#multiple-configurations">Multiple Configurations</a></li> <li class="toctree-l2"><a class="reference internal" href="#references">References</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="guides.html">Setup Guides</a></li> <li class="toctree-l1"><a class="reference internal" href="usage.html">Usage</a></li> <li class="toctree-l1"><a class="reference internal" href="prompts.html">Prompt Routines</a></li> <li class="toctree-l1"><a class="reference internal" href="interactive.html">Interactive Mode</a></li> <li class="toctree-l1"><a class="reference internal" href="modules.html">mcp_server_webcrawl</a></li> </ul> </div> </div> </nav> <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <i data-toggle="wy-nav-top" class="fa fa-bars"></i> <a href="index.html">mcp-server-webcrawl</a> </nav> <div class="wy-nav-content"> <div class="rst-content"> <div role="navigation" aria-label="Page navigation"> <ul class="wy-breadcrumbs"> <li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li> <li class="breadcrumb-item active">Installation</li> <li class="wy-breadcrumbs-aside"> <a href="_sources/installation.rst.txt" rel="nofollow"> View page source</a> </li> </ul> <hr/> </div> <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article"> <div itemprop="articleBody"> <section id="installation"> <h1>Installation<a class="headerlink" href="#installation" title="Link to this heading"></a></h1> <p>Install the package via pip:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>mcp-server-webcrawl </pre></div> 
</div> <section id="requirements"> <h2>Requirements<a class="headerlink" href="#requirements" title="Link to this heading"></a></h2> <p>To use mcp-server-webcrawl, you need:</p> <ul class="simple"> <li><p>An MCP-capable LLM host such as Claude Desktop [1]</p></li> <li><p>Python [2] installed and available from your command line</p></li> <li><p>Basic familiarity with running Python packages</p></li> </ul> <p>Once these prerequisites are met, run the pip install command above to add the package to your Python environment.</p> </section> <section id="mcp-configuration"> <h2>MCP Configuration<a class="headerlink" href="#mcp-configuration" title="Link to this heading"></a></h2> <p>To give your LLM host access to your web crawl data, add an MCP server configuration. In Claude’s developer settings, locate the MCP configuration section and add the configuration that matches your crawler type.</p> <p>Setup guides and videos are available for each supported crawler:</p> <ul class="simple"> <li><p><a class="reference internal" href="guides/archivebox.html"><span class="doc">ArchiveBox</span></a></p></li> <li><p><a class="reference internal" href="guides/httrack.html"><span class="doc">HTTrack</span></a></p></li> <li><p><a class="reference internal" href="guides/interrobot.html"><span class="doc">InterroBot</span></a></p></li> <li><p><a class="reference internal" href="guides/katana.html"><span class="doc">Katana</span></a></p></li> <li><p><a class="reference internal" href="guides/siteone.html"><span class="doc">SiteOne</span></a></p></li> <li><p><a class="reference internal" href="guides/warc.html"><span class="doc">WARC</span></a></p></li> <li><p><a class="reference internal" href="guides/wget.html"><span class="doc">Wget</span></a></p></li> </ul> </section> <section id="multiple-configurations"> <h2>Multiple Configurations<a class="headerlink" href="#multiple-configurations" title="Link to this heading"></a></h2> <p>You can set up multiple
<strong>mcp-server-webcrawl</strong> connections under the <code class="docutils literal notranslate"><span class="pre">mcpServers</span></code> section if you want to access different crawler data sources simultaneously.</p> <div class="highlight-json notranslate"><div class="highlight"><pre><span></span><span class="p">{</span> <span class="w"> </span><span class="nt">&quot;mcpServers&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nt">&quot;webcrawl_warc&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nt">&quot;command&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;/path/to/mcp-server-webcrawl&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nt">&quot;args&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;--crawler&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;warc&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;--datasrc&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;/path/to/warc/archives/&quot;</span><span class="p">]</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="nt">&quot;webcrawl_wget&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nt">&quot;command&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;/path/to/mcp-server-webcrawl&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nt">&quot;args&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;--crawler&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;wget&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="s2">&quot;--datasrc&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;/path/to/wget/archives/&quot;</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> </div> <p>After adding the configuration, save the file and restart your LLM host to apply the changes.</p> </section> <section id="references"> <h2>References<a class="headerlink" href="#references" title="Link to this heading"></a></h2> <p>[1] Claude Desktop: <a class="reference external" href="https://claude.ai">https://claude.ai</a> [2] Python: <a class="reference external" href="https://python.org">https://python.org</a></p> </section> </section> </div> </div> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <a href="index.html" class="btn btn-neutral float-left" title="mcp-server-webcrawl" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="guides.html" class="btn btn-neutral float-right" title="Setup Guides" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> </div> <hr/> <div role="contentinfo"> <p>&#169; Copyright 2025, pragmar.</p> </div> Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. </footer> </div> </div> </section> </div> <script> jQuery(function () { SphinxRtdTheme.Navigation.enable(true); }); </script> </body> </html>
