Glama
crawl4ai_mcp_server.cpython-313.pyc (22.4 kB)
This file is compiled CPython 3.13 bytecode, so the raw dump is not human-readable. The embedded docstrings and constants recover the following structure.

Module docstring:

Crawl4AI MCP Server

A FastMCP server that provides web scraping and crawling capabilities using Crawl4AI. This server exposes three core tools:

1. get_page_structure - The "eyes" for analyzing webpage structure
2. crawl_with_schema - The "hands" for executing precise extraction schemas
3. take_screenshot - Media capture for visual representation

This server is designed to work with client-side AI that acts as the "brain" to analyze and command the scraping operations.

Architecture:
- FastMCP handles MCP protocol and tool registration
- AsyncWebCrawler provides web scraping capabilities
- Proper logging to stderr prevents MCP stdio corruption
- All tools use async/await patterns for non-blocking operation

Module setup: the imports cover logging, sys, json, base64, typing (Dict, Any), pathlib.Path, fastmcp (FastMCP, Context), pydantic.Field, typing_extensions.Annotated, and crawl4ai.AsyncWebCrawler. Logging is configured at INFO level with stream=sys.stderr and force=True, and a safe_print() helper routes ad-hoc output to stderr so stdout stays reserved for MCP protocol traffic.

The server is registered as FastMCP(name="Crawl4AI-MCP-Server", version="1.0.0") with instructions for the client: "This server provides web scraping capabilities using Crawl4AI. The server acts as the 'hands and eyes' while the client AI acts as the 'brain'. Available tools: get_page_structure (analyze webpage structure and content), crawl_with_schema (execute precise data extraction using schemas), take_screenshot (capture visual representation of webpages). All tools support proper error handling and async operation."
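The stderr-only logging setup and the safe_print helper are visible almost verbatim in the bytecode's constants; a minimal reconstruction, with minor details possibly differing from the original source:

```python
import logging
import sys

# An MCP stdio server must keep stdout reserved for JSON-RPC frames;
# every diagnostic therefore goes to stderr.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    stream=sys.stderr,
    force=True,  # replace any handlers a library installed earlier
)
logger = logging.getLogger(__name__)


def safe_print(*args, **kwargs):
    """Safe printing that goes to stderr instead of stdout."""
    print(*args, file=sys.stderr, **kwargs)
```

The force=True flag matters here: without it, a library that already called basicConfig could leave a stdout handler in place and corrupt the MCP stream.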
Tool: server_status(ctx: Context) -> Dict[str, Any]

Reports the server's health and capabilities so a client can verify connectivity before issuing crawl commands. The response contains: server_name, version, status ("operational" or "error"), transport ("stdio"), working_directory (str(Path.cwd())), capabilities ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"], dependencies ({"fastmcp": "installed", "crawl4ai": "ready", "playwright": "configured"}), and the message "Server is ready to accept crawling requests". On failure the error is logged, reported to the MCP context, and returned as a dict with status "error".
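A minimal sketch of the status payload recovered from the docstring and constants; build_server_status is a hypothetical helper name, and the dependency states appear to be fixed strings rather than live probes:

```python
from pathlib import Path


def build_server_status() -> dict:
    # Field values mirror the constants visible in the bytecode.
    return {
        "server_name": "Crawl4AI-MCP-Server",
        "version": "1.0.0",
        "status": "operational",
        "transport": "stdio",
        "working_directory": str(Path.cwd()),
        "capabilities": [
            "web_crawling",
            "content_extraction",
            "screenshot_capture",
            "schema_based_extraction",
        ],
        "dependencies": {
            "fastmcp": "installed",
            "crawl4ai": "ready",
            "playwright": "configured",
        },
        "message": "Server is ready to accept crawling requests",
    }
```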
Tool: get_page_structure(url, format="html", ctx=None) -> str

The "eyes" tool: fetches a page and returns its structure for client-side analysis, without running any extraction schema. Parameters are Annotated with pydantic Field metadata: url ("The URL of the webpage to analyze") and format ("Output format: 'html' for cleaned HTML or 'markdown' for raw markdown", validated against the pattern ^(html|markdown)$). The tool rejects URLs that do not start with http:// or https://, crawls with AsyncWebCrawler(verbose=False) while reporting progress to the MCP context, and on success returns result.markdown.raw_markdown or result.cleaned_html prefixed with HTML comment metadata (URL, format, content length, success flag). Failures return an "ERROR: ..." string describing the cause.
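The URL check and the HTML comment header used by get_page_structure can be isolated as small helpers; validate_url and analysis_header are hypothetical names for logic that is inlined in the tool body:

```python
def validate_url(url: str) -> None:
    # Only explicit http(s) URLs are accepted, matching the check in the tool.
    if not url.startswith(("http://", "https://")):
        raise ValueError(
            f"Invalid URL format: {url}. URL must start with http:// or https://"
        )


def analysis_header(url: str, fmt: str, content: str, success: bool) -> str:
    # get_page_structure prefixes its payload with HTML comments so the
    # client AI sees provenance without parsing a separate envelope.
    return (
        "<!-- Webpage Analysis Results -->\n"
        f"<!-- URL: {url} -->\n"
        f"<!-- Format: {fmt} -->\n"
        f"<!-- Content Length: {len(content)} characters -->\n"
        f"<!-- Success: {success} -->\n"
    )
```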
Tool: crawl_with_schema(url, extraction_schema, ctx=None) -> str

The "hands" tool: executes precision data extraction using an AI-generated schema with JsonCssExtractionStrategy (imported from crawl4ai.extraction_strategy). Parameters: url ("The URL of the webpage to crawl and extract data from") and extraction_schema ("JSON string containing the extraction schema with field names and CSS selectors. Example: '{"title": "h1", "price": ".price", "description": ".desc"}'").
crawl_with_schema parses the schema string with json.loads and requires a non-empty JSON object; violations come back as JSON error payloads ("Invalid extraction schema: ...", "Schema must be a JSON object", "Schema cannot be empty"). It then crawls the page with the extraction strategy attached, decodes result.extracted_content, and returns a JSON document with url, extraction_schema, extracted_data, success, and timestamp fields, serialized with indent=2 and ensure_ascii=False. If nothing matched, extracted_data is empty and the message "No data matched the extraction schema" is included.
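The schema validation step can be sketched standalone; parse_extraction_schema is a hypothetical helper name for checks that appear inline in the tool body:

```python
import json


def parse_extraction_schema(raw: str) -> dict:
    # The tool accepts a flat mapping of field names to CSS selectors,
    # e.g. '{"title": "h1", "price": ".price", "description": ".desc"}'.
    try:
        schema = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid extraction schema: {exc}") from exc
    if not isinstance(schema, dict):
        raise ValueError("Schema must be a JSON object")
    if not schema:
        raise ValueError("Schema cannot be empty")
    return schema
```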
Tool: take_screenshot(url, ctx=None) -> str

Captures a visual screenshot of a webpage for the client AI to analyze. The url parameter is described as "The URL of the webpage to capture as a screenshot". The tool validates the URL scheme, crawls with CrawlerRunConfig(screenshot=True), and normalizes result.screenshot to a base64-encoded PNG string. The response is JSON containing url, screenshot_data, a format marker ("base64_png"), success, timestamp, and metadata (data_size, image_format "PNG"). If no screenshot data was captured, an error payload is returned instead ("No screenshot data captured - screenshot may have failed").

main() logs the configuration (server name, stdio transport, stderr logging, and the four tools: server_status, get_page_structure, crawl_with_schema, take_screenshot), then starts the server with mcp.run(transport="stdio"). KeyboardInterrupt is treated as a clean shutdown ("Server shutdown requested by user"); any other startup failure is logged and re-raised. The module ends with the standard if __name__ == "__main__": main() entry point.
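The screenshot payload handling can be sketched as below; normalize_screenshot is a hypothetical name, and the str-vs-bytes branch matches the isinstance check visible in the bytecode (crawl4ai may return the screenshot already base64-encoded):

```python
import base64


def normalize_screenshot(data) -> str:
    # The crawler may hand back the screenshot either as a base64 string
    # or as raw bytes; both are normalized to a base64 PNG string.
    if isinstance(data, str):
        return data
    return base64.b64encode(data).decode("utf-8")
```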

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Nexus-Digital-Automations/crawl4ai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.