
MCP RAG Server

document_processor.cpython-313.pyc (8.66 kB)
import os
import re
import glob
import numpy as np
from typing import List, Dict, Any, Optional
from sentence_transformers import SentenceTransformer
from tqdm import tqdm


class DocumentProcessor:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        """Initialize the document processor.

        Args:
            model_name: Name of the sentence-transformers model to use
        """
        self.model = SentenceTransformer(model_name)

    def get_embedding(self, text: str) -> np.ndarray:
        """Get embedding for a text string.

        Args:
            text: The text to embed

        Returns:
            Embedding vector as numpy array
        """
        return self.model.encode(text, show_progress_bar=False)

    def process_documents(self, directory: str) -> List[Dict[str, Any]]:
        """Process all documents in a directory.

        Args:
            directory: Directory containing text documents

        Returns:
            List of document dictionaries with embeddings and metadata
        """
        documents = []
        files = glob.glob(os.path.join(directory, "**/*.txt"), recursive=True)
        files.extend(glob.glob(os.path.join(directory, "**/*.md"), recursive=True))
        files.extend(glob.glob(os.path.join(directory, "**/*.move"), recursive=True))

        if not files:
            print(f"No text files found in {directory}")
            return documents

        print(f"Processing {len(files)} documents...")
        for file_path in tqdm(files):
            try:
                with open(file_path, "r", encoding="utf-8") as f:
                    content = f.read().strip()

                if not content:
                    print(f"Skipping empty file: {file_path}")
                    continue

                ext = os.path.splitext(file_path)[1].lower()
                if ext == ".move":
                    chunks = self._process_move_file(content, file_path)
                else:
                    chunks = self._chunk_text(content)

                for i, chunk in enumerate(chunks):
                    if not chunk.strip():
                        continue
                    doc = {
                        "id": f"{os.path.basename(file_path)}_{i}",
                        "path": file_path,
                        "chunk_index": i,
                        "content": chunk,
                        "file_type": ext[1:] if ext else "txt",
                        "embedding": self.get_embedding(chunk),
                    }
                    documents.append(doc)
            except Exception as e:
                print(f"Error processing {file_path}: {str(e)}")

        print(f"Processed {len(documents)} document chunks")
        return documents

    def _process_move_file(self, content: str, file_path: str) -> List[str]:
        """Process a Move language file, extracting modules, structs, and functions.

        Args:
            content: The file content
            file_path: Path to the file

        Returns:
            List of content chunks with semantic meaning
        """
        chunks = []
        repo_info = file_path.split("docs/move_files/")[-1] if "docs/move_files/" in file_path else file_path
        file_context = f"File: {repo_info}\n\n"

        module_match = re.search(r"module\s+([a-zA-Z0-9_:]+)\s*{", content)
        module_name = module_match.group(1) if module_match else "unknown_module"

        # Comments that precede the module declaration
        header_pattern = r"(\/\/.*?|\s*\/\*[\s\S]*?\*\/\s*)*module"
        header_match = re.search(header_pattern, content)
        if header_match:
            header = header_match.group(0).replace("module", "")
            if header.strip():
                header_chunk = f"{file_context}Module: {module_name}\nHeader Comments: {header.strip()}"
                chunks.append(header_chunk)

        chunks.append(f"{file_context}Move Module: {module_name}")

        use_statements = re.findall(r"use\s+([^;]+);", content)
        if use_statements:
            use_chunk = f"{file_context}Module: {module_name}\nDependencies:\n"
            for stmt in use_statements:
                use_chunk += f"use {stmt};\n"
            chunks.append(use_chunk)

        struct_matches = re.finditer(r"struct\s+([a-zA-Z0-9_]+)(?:\s*<[^>]+>)?\s*{([^}]+)}", content)
        for struct_match in struct_matches:
            struct_name = struct_match.group(1)
            struct_body = struct_match.group(2)
            struct_chunk = f"{file_context}Module: {module_name}\nStruct: {struct_name}\n{struct_body.strip()}"
            chunks.append(struct_chunk)

        function_matches = re.finditer(
            r"(public\s+)?(inline\s+)?(fun\s+([a-zA-Z0-9_]+)(?:\s*<[^>]+>)?\s*\([^)]*\)(?:\s*:[^{]+)?\s*{([^}]+)})",
            content,
        )
        for func_match in function_matches:
            # group(3) is the full declaration, group(4) the name, group(5) the body
            func_full = func_match.group(3)
            func_name = func_match.group(4)
            func_body = func_match.group(5)
            func_chunk = f"{file_context}Module: {module_name}\nFunction: {func_name}\n{func_full.strip()}"
            chunks.append(func_chunk)

        if not chunks:
            text_chunks = self._chunk_text(content)
            for chunk in text_chunks:
                chunks.append(f"{file_context}{chunk}")

        return chunks

    def _chunk_text(self, text: str, max_chunk_size: int = 1000) -> List[str]:  # default size assumed; not legible in the dump
        """Split text into chunks based on paragraphs and size.

        Args:
            text: Text to split
            max_chunk_size: Maximum size of each chunk

        Returns:
            List of text chunks
        """
        paragraphs = re.split(r"\n\s*\n", text)
        chunks = []
        current_chunk = ""

        for para in paragraphs:
            if len(para) > max_chunk_size:
                # Oversized paragraph: flush the current chunk, then pack sentences
                if current_chunk:
                    chunks.append(current_chunk)
                    current_chunk = ""
                sentences = re.split(r"(?<=[.!?])\s+", para)
                temp_chunk = ""
                for sentence in sentences:
                    if len(temp_chunk) + len(sentence) <= max_chunk_size:
                        temp_chunk += (" " if temp_chunk else "") + sentence
                    else:
                        if temp_chunk:
                            chunks.append(temp_chunk)
                        temp_chunk = sentence
                if temp_chunk:
                    current_chunk = temp_chunk
            elif len(current_chunk) + len(para) <= max_chunk_size:
                current_chunk += ("\n\n" if current_chunk else "") + para
            else:
                chunks.append(current_chunk)
                current_chunk = para

        if current_chunk:
            chunks.append(current_chunk)

        return chunks
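The document dictionaries produced above pair each chunk with its embedding; a retrieval step (not part of this file) would rank them by cosine similarity against a query embedding. A minimal sketch, assuming plain numpy vectors — `retrieve` is a hypothetical helper, not part of the server:

```python
import numpy as np

def retrieve(query_emb: np.ndarray, documents: list, top_k: int = 3) -> list:
    """Return the top_k documents whose embeddings are most similar to the query."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Sort all documents by similarity to the query, highest first
    scored = sorted(documents, key=lambda d: cosine(query_emb, d["embedding"]), reverse=True)
    return scored[:top_k]

# Tiny demonstration with hand-made 2-D "embeddings" in place of model output
docs = [
    {"id": "a_0", "content": "about move modules", "embedding": np.array([1.0, 0.0])},
    {"id": "b_0", "content": "about chunking", "embedding": np.array([0.0, 1.0])},
]
best = retrieve(np.array([0.9, 0.1]), docs, top_k=1)
print(best[0]["id"])  # -> a_0
```

In the real flow the query vector would come from the same model via `get_embedding`, so query and document embeddings share one vector space.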

MCP directory API

We provide all of the information about listed MCP servers via our MCP directory API. For example:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ProbonoBonobo/sui-mcp-server'
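The same request can be made from Python; a minimal sketch using only the standard library (`server_api_url` is a hypothetical helper, and the response body is assumed to be JSON):

```python
import json
import urllib.request

def server_api_url(owner: str, name: str) -> str:
    """Build the directory API URL for a given server."""
    return f"https://glama.ai/api/mcp/v1/servers/{owner}/{name}"

def fetch_server_info(owner: str, name: str) -> dict:
    """GET the server entry and decode the JSON body."""
    with urllib.request.urlopen(server_api_url(owner, name)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# No network call here; just show the URL that would be requested.
print(server_api_url("ProbonoBonobo", "sui-mcp-server"))
# -> https://glama.ai/api/mcp/v1/servers/ProbonoBonobo/sui-mcp-server
```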

If you have feedback or need assistance with the MCP directory API, please join our Discord server.