Glama

MCP RAG Server

github_extractor.cpython-313.pyc 17.4 kB

This file is compiled CPython 3.13 bytecode, so the payload itself is not human-readable. The strings and docstrings embedded in it identify the module as mcp_server/utils/github_extractor.py: it loads environment variables via dotenv, configures logging, and imports requests, BeautifulSoup, PyGithub (Github, RateLimitExceededException, ContentFile), and tqdm. The recoverable API surface, taken from the embedded docstrings:

GitHubExtractor(token=None, output_dir="docs/move_files")
    Initialize the GitHub extractor. token is a GitHub personal access token (read from the GITHUB_TOKEN env var if not provided; a warning that "Rate limits will be restricted" is logged when no token is available). output_dir is the directory to save extracted files.

search_code_with_api(query, language=None, extension=None, max_results=100)
    Search GitHub for code using the GitHub API. Appends language:/extension: qualifiers plus "in:readme+in:description" to the query, searches repositories sorted by stars (descending), scans each repository for Move files, and sleeps 60 seconds when RateLimitExceededException is raised. Returns a list of code search results with metadata.

_find_move_files_in_repo(repo, extension)
    Find all Move files in a repository by traversing its contents; each hit is recorded as a dict with name, path, repo, url, and decoded content.

search_code_with_scraping(query, path_pattern=None, max_pages=...)
    Search GitHub for code by scraping the GitHub search results pages (a browser User-Agent is sent and the HTML is parsed with BeautifulSoup), building raw.githubusercontent.com URLs for each hit. This is a fallback when the API doesn't work or its limits are exceeded. Returns a list of code search results with metadata.

_fetch_contents_for_scraped_results(results)
    Fetch file contents (via each result's raw URL) for results obtained through scraping, pausing briefly between requests.

extract_move_files(query, use_scraping=True, max_results=100)
    Extract .move files from GitHub based on a search query: tries the API first, falls back to scraping with the path pattern "*.move", and filters out results without content.

download_move_files(files)
    Download .move files to the output directory (one subdirectory per repository, with "/" replaced by "_" in the repo name). Returns a list of paths to the downloaded files.

_check_rate_limit()
    Check the remaining search rate limit and, when it runs low, sleep until the limit resets.

extract_and_index_move_files(query, output_dir, github_token=None, use_scraping=True, max_results=100)
    Module-level helper: constructs a GitHubExtractor, extracts and downloads Move files, and returns a tuple of (number of files, list of file paths) for indexing.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ProbonoBonobo/sui-mcp-server'
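The same lookup can be sketched in Python with only the standard library; the Accept header is an assumption on our part, and the JSON shape of the response is not documented here:

```python
import json
from urllib.request import Request, urlopen

SERVER_SLUG = "ProbonoBonobo/sui-mcp-server"

# Build the same GET request as the curl command above.
req = Request(
    f"https://glama.ai/api/mcp/v1/servers/{SERVER_SLUG}",
    headers={"Accept": "application/json"},  # assumption: the API serves JSON
    method="GET",
)


def fetch_server_info(request=req):
    # Performs the actual network call and decodes the JSON body.
    with urlopen(request) as resp:
        return json.load(resp)
```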

If you have feedback or need assistance with the MCP directory API, please join our Discord server.