crawl4ai_mcp_server.cpython-311.pyc (24.3 kB)

[Compiled CPython 3.11 bytecode; the binary content is not directly readable. The module and tool documentation recovered from its embedded strings follows.]

Crawl4AI MCP Server

A FastMCP server that provides web scraping and crawling capabilities using Crawl4AI.

This server exposes three core tools:
1. get_page_structure - The "eyes" for analyzing webpage structure
2. crawl_with_schema - The "hands" for executing precise extraction schemas
3. take_screenshot - Media capture for visual representation

This server is designed to work with client-side AI that acts as the "brain" to analyze and command the scraping operations.

Architecture:
- FastMCP handles MCP protocol and tool registration
- AsyncWebCrawler provides web scraping capabilities
- Proper logging to stderr prevents MCP stdio corruption
- All tools use async/await patterns for non-blocking operation

The server is registered as FastMCP(name="Crawl4AI-MCP-Server", version="1.0.0"), with instructions stating that the server acts as the "hands and eyes" while the client AI acts as the "brain". Logging is configured to stderr with force=True so MCP stdio is never corrupted, and a safe_print helper likewise routes all printing to stderr.

Recovered tool signatures and behavior:

- server_status(ctx) -> Dict[str, Any]: reports server name, version, operational status, stdio transport, working directory, capabilities (web_crawling, content_extraction, screenshot_capture, schema_based_extraction), and dependency status (fastmcp installed, crawl4ai ready, playwright configured).

- get_page_structure(url, format="html"|"markdown", ctx) -> str: the "eyes" tool. Validates that the URL starts with http:// or https://, crawls the page with AsyncWebCrawler while reporting progress, and returns either the cleaned HTML or the raw markdown, prefixed with an HTML comment header recording the URL, format, and content length.

- crawl_with_schema(url, extraction_schema, ctx) -> str: the "hands" tool. Parses a JSON schema mapping field names to CSS selectors (e.g. '{"title": "h1", "price": ".price", "description": ".desc"}'), crawls the page using JsonCssExtractionStrategy, and returns the extracted data as a JSON string.

- take_screenshot(url, ctx) -> str: captures a screenshot via CrawlerRunConfig(screenshot=True) and returns a JSON payload containing base64-encoded PNG data plus metadata (data size, image format).

- main(): entry point; logs the server configuration and runs the server with mcp.run(transport="stdio"), handling KeyboardInterrupt for clean shutdown.
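The strings recovered from the bytecode show that crawl_with_schema validates its schema argument before crawling ("Schema must be a JSON object", "Schema cannot be empty", "Invalid extraction schema"). A minimal stdlib sketch of that validation step, with the crawling call itself omitted and the helper name parse_extraction_schema chosen for illustration:

```python
import json

def parse_extraction_schema(schema_json: str) -> dict:
    """Validate an extraction schema mapping field names to CSS selectors.

    Mirrors the checks visible in the compiled crawl_with_schema tool:
    the schema must parse as a JSON object and must not be empty.
    """
    schema = json.loads(schema_json)  # raises json.JSONDecodeError on bad JSON
    if not isinstance(schema, dict):
        raise ValueError("Schema must be a JSON object")
    if not schema:
        raise ValueError("Schema cannot be empty")
    return schema

# Example schema, as shown in the tool's parameter description:
schema = parse_extraction_schema(
    '{"title": "h1", "price": ".price", "description": ".desc"}'
)
print(sorted(schema))  # → ['description', 'price', 'title']
```

Validating up front lets the tool return a structured error payload to the client AI instead of failing mid-crawl.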
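The recovered error text ("Invalid URL format: ... URL must start with http:// or https://") and the HTML comment header fragments ("<!-- Webpage Analysis Results -->", "<!-- URL: ... -->") suggest how get_page_structure checks its input and annotates its result. A hedged sketch; validate_url and annotate_result are hypothetical helper names, and the exact header layout is inferred from the recovered strings:

```python
def validate_url(url: str) -> None:
    # Matches the error message recovered from the bytecode.
    if not url.startswith(("http://", "https://")):
        raise ValueError(
            f"Invalid URL format: {url}. URL must start with http:// or https://"
        )

def annotate_result(url: str, fmt: str, content: str) -> str:
    # get_page_structure prepends an HTML comment block recording the
    # analysis metadata before returning the page content.
    header = (
        "<!-- Webpage Analysis Results -->\n"
        f"<!-- URL: {url} -->\n"
        f"<!-- Format: {fmt} -->\n"
        f"<!-- Content Length: {len(content)} characters -->\n"
        "<!-- Success: True -->\n"
    )
    return header + content

validate_url("https://example.com")  # passes silently
result = annotate_result("https://example.com", "markdown", "# Hello")
print(result.splitlines()[0])  # → <!-- Webpage Analysis Results -->
```

The comment header survives in both output formats, so the client AI can always recover the source URL and size of what it is analyzing.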
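take_screenshot's recovered strings ("base64_png", "data_size", "image_format", "PNG") indicate that it base64-encodes the captured PNG and wraps it in a JSON envelope. A sketch under that assumption; the exact key names are inferred from the bytecode and may differ from the original source:

```python
import base64
import json

def build_screenshot_response(url: str, png_bytes: bytes) -> str:
    # Encode the raw PNG bytes as base64 text so the payload can travel
    # over the JSON-based MCP transport.
    data = base64.b64encode(png_bytes).decode("utf-8")
    return json.dumps(
        {
            "url": url,
            "screenshot_data": data,
            "format": "base64_png",
            "success": True,
            "metadata": {"data_size": len(data), "image_format": "PNG"},
        },
        indent=2,
    )

resp = json.loads(build_screenshot_response("https://example.com", b"\x89PNG..."))
print(resp["format"], resp["metadata"]["image_format"])  # → base64_png PNG
```

Returning a JSON string rather than raw bytes keeps the tool's output uniform with the other tools, which also report errors as JSON objects.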

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Nexus-Digital-Automations/crawl4ai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.