Glama
crawl4ai_mcp_server.cpython-313.pyc (22.4 kB)
This file is compiled CPython 3.13 bytecode, so the raw dump is not human-readable. The embedded docstrings and constants recover the following structure.

Module docstring:

Crawl4AI MCP Server

A FastMCP server that provides web scraping and crawling capabilities using Crawl4AI. This server exposes three core tools:

1. get_page_structure - The "eyes" for analyzing webpage structure
2. crawl_with_schema - The "hands" for executing precise extraction schemas
3. take_screenshot - Media capture for visual representation

This server is designed to work with client-side AI that acts as the "brain" to analyze and command the scraping operations.

Architecture:
- FastMCP handles MCP protocol and tool registration
- AsyncWebCrawler provides web scraping capabilities
- Proper logging to stderr prevents MCP stdio corruption
- All tools use async/await patterns for non-blocking operation

Module setup: the imports cover logging, sys, json, base64, typing (Dict, Any), pathlib.Path, fastmcp (FastMCP, Context), pydantic.Field, typing_extensions.Annotated, and crawl4ai.AsyncWebCrawler. Logging is configured at INFO level with stream=sys.stderr and force=True, and a safe_print() helper routes ad-hoc output to stderr so stdout stays reserved for MCP protocol traffic.

The server is registered as FastMCP(name="Crawl4AI-MCP-Server", version="1.0.0") with instructions for the client: "This server provides web scraping capabilities using Crawl4AI. The server acts as the 'hands and eyes' while the client AI acts as the 'brain'. Available tools: get_page_structure (analyze webpage structure and content), crawl_with_schema (execute precise data extraction using schemas), take_screenshot (capture visual representation of webpages). All tools support proper error handling and async operation."
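The stderr-only logging setup and the safe_print helper are visible almost verbatim in the bytecode's constants; a minimal reconstruction, with minor details possibly differing from the original source:

```python
import logging
import sys

# An MCP stdio server must keep stdout reserved for JSON-RPC frames;
# every diagnostic therefore goes to stderr.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    stream=sys.stderr,
    force=True,  # replace any handlers a library installed earlier
)
logger = logging.getLogger(__name__)


def safe_print(*args, **kwargs):
    """Safe printing that goes to stderr instead of stdout."""
    print(*args, file=sys.stderr, **kwargs)
```

The force=True flag matters here: without it, a library that already called basicConfig could leave a stdout handler in place and corrupt the MCP stream.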
Tool: server_status(ctx: Context) -> Dict[str, Any]

Reports the server's health and capabilities so a client can verify connectivity before issuing crawl commands. The response contains: server_name, version, status ("operational" or "error"), transport ("stdio"), working_directory (str(Path.cwd())), capabilities ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"], dependencies ({"fastmcp": "installed", "crawl4ai": "ready", "playwright": "configured"}), and the message "Server is ready to accept crawling requests". On failure the error is logged, reported to the MCP context, and returned as a dict with status "error".
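A minimal sketch of the status payload recovered from the docstring and constants; build_server_status is a hypothetical helper name, and the dependency states appear to be fixed strings rather than live probes:

```python
from pathlib import Path


def build_server_status() -> dict:
    # Field values mirror the constants visible in the bytecode.
    return {
        "server_name": "Crawl4AI-MCP-Server",
        "version": "1.0.0",
        "status": "operational",
        "transport": "stdio",
        "working_directory": str(Path.cwd()),
        "capabilities": [
            "web_crawling",
            "content_extraction",
            "screenshot_capture",
            "schema_based_extraction",
        ],
        "dependencies": {
            "fastmcp": "installed",
            "crawl4ai": "ready",
            "playwright": "configured",
        },
        "message": "Server is ready to accept crawling requests",
    }
```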
Tool: get_page_structure(url, format="html", ctx=None) -> str

The "eyes" tool: fetches a page and returns its structure for client-side analysis, without running any extraction schema. Parameters are Annotated with pydantic Field metadata: url ("The URL of the webpage to analyze") and format ("Output format: 'html' for cleaned HTML or 'markdown' for raw markdown", validated against the pattern ^(html|markdown)$). The tool rejects URLs that do not start with http:// or https://, crawls with AsyncWebCrawler(verbose=False) while reporting progress to the MCP context, and on success returns result.markdown.raw_markdown or result.cleaned_html prefixed with HTML comment metadata (URL, format, content length, success flag). Failures return an "ERROR: ..." string describing the cause.
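The URL check and the HTML comment header used by get_page_structure can be isolated as small helpers; validate_url and analysis_header are hypothetical names for logic that is inlined in the tool body:

```python
def validate_url(url: str) -> None:
    # Only explicit http(s) URLs are accepted, matching the check in the tool.
    if not url.startswith(("http://", "https://")):
        raise ValueError(
            f"Invalid URL format: {url}. URL must start with http:// or https://"
        )


def analysis_header(url: str, fmt: str, content: str, success: bool) -> str:
    # get_page_structure prefixes its payload with HTML comments so the
    # client AI sees provenance without parsing a separate envelope.
    return (
        "<!-- Webpage Analysis Results -->\n"
        f"<!-- URL: {url} -->\n"
        f"<!-- Format: {fmt} -->\n"
        f"<!-- Content Length: {len(content)} characters -->\n"
        f"<!-- Success: {success} -->\n"
    )
```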
Tool: crawl_with_schema(url, extraction_schema, ctx=None) -> str

The "hands" tool: executes precision data extraction using an AI-generated schema with JsonCssExtractionStrategy (imported from crawl4ai.extraction_strategy). Parameters: url ("The URL of the webpage to crawl and extract data from") and extraction_schema ("JSON string containing the extraction schema with field names and CSS selectors. Example: '{"title": "h1", "price": ".price", "description": ".desc"}'").
crawl_with_schema parses the schema string with json.loads and requires a non-empty JSON object; violations come back as JSON error payloads ("Invalid extraction schema: ...", "Schema must be a JSON object", "Schema cannot be empty"). It then crawls the page with the extraction strategy attached, decodes result.extracted_content, and returns a JSON document with url, extraction_schema, extracted_data, success, and timestamp fields, serialized with indent=2 and ensure_ascii=False. If nothing matched, extracted_data is empty and the message "No data matched the extraction schema" is included.
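The schema validation step can be sketched standalone; parse_extraction_schema is a hypothetical helper name for checks that appear inline in the tool body:

```python
import json


def parse_extraction_schema(raw: str) -> dict:
    # The tool accepts a flat mapping of field names to CSS selectors,
    # e.g. '{"title": "h1", "price": ".price", "description": ".desc"}'.
    try:
        schema = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid extraction schema: {exc}") from exc
    if not isinstance(schema, dict):
        raise ValueError("Schema must be a JSON object")
    if not schema:
        raise ValueError("Schema cannot be empty")
    return schema
```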
Tool: take_screenshot(url, ctx=None) -> str

Captures a visual screenshot of a webpage for the client AI to analyze. The url parameter is described as "The URL of the webpage to capture as a screenshot". The tool validates the URL scheme, crawls with CrawlerRunConfig(screenshot=True), and normalizes result.screenshot to a base64-encoded PNG string. The response is JSON containing url, screenshot_data, a format marker ("base64_png"), success, timestamp, and metadata (data_size, image_format "PNG"). If no screenshot data was captured, an error payload is returned instead ("No screenshot data captured - screenshot may have failed").

main() logs the configuration (server name, stdio transport, stderr logging, and the four tools: server_status, get_page_structure, crawl_with_schema, take_screenshot), then starts the server with mcp.run(transport="stdio"). KeyboardInterrupt is treated as a clean shutdown ("Server shutdown requested by user"); any other startup failure is logged and re-raised. The module ends with the standard if __name__ == "__main__": main() entry point.
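The screenshot payload handling can be sketched as below; normalize_screenshot is a hypothetical name, and the str-vs-bytes branch matches the isinstance check visible in the bytecode (crawl4ai may return the screenshot already base64-encoded):

```python
import base64


def normalize_screenshot(data) -> str:
    # The crawler may hand back the screenshot either as a base64 string
    # or as raw bytes; both are normalized to a base64 PNG string.
    if isinstance(data, str):
        return data
    return base64.b64encode(data).decode("utf-8")
```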

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Nexus-Digital-Automations/crawl4ai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.