
MCP RAG Server

document_processor.cpython-313.pyc (8.66 kB)
import os
import re
import glob
import numpy as np
from typing import List, Dict, Any, Optional
from sentence_transformers import SentenceTransformer
from tqdm import tqdm


class DocumentProcessor:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        """Initialize the document processor.

        Args:
            model_name: Name of the sentence-transformers model to use
        """
        self.model = SentenceTransformer(model_name)

    def get_embedding(self, text: str) -> np.ndarray:
        """Get embedding for a text string.

        Args:
            text: The text to embed

        Returns:
            Embedding vector as numpy array
        """
        return self.model.encode(text, show_progress_bar=False)

    def process_documents(self, directory: str) -> List[Dict[str, Any]]:
        """Process all documents in a directory.

        Args:
            directory: Directory containing text documents

        Returns:
            List of document dictionaries with embeddings and metadata
        """
        documents = []
        files = glob.glob(os.path.join(directory, "**/*.txt"), recursive=True)
        files.extend(glob.glob(os.path.join(directory, "**/*.md"), recursive=True))
        files.extend(glob.glob(os.path.join(directory, "**/*.move"), recursive=True))

        if not files:
            print(f"No text files found in {directory}")
            return documents

        print(f"Processing {len(files)} documents...")
        for file_path in tqdm(files):
            try:
                with open(file_path, "r", encoding="utf-8") as f:
                    content = f.read().strip()

                if not content:
                    print(f"Skipping empty file: {file_path}")
                    continue

                ext = os.path.splitext(file_path)[1].lower()
                if ext == ".move":
                    chunks = self._process_move_file(content, file_path)
                else:
                    chunks = self._chunk_text(content)

                for i, chunk in enumerate(chunks):
                    if not chunk.strip():
                        continue
                    doc = {
                        "id": f"{os.path.basename(file_path)}_{i}",
                        "path": file_path,
                        "chunk_index": i,
                        "content": chunk,
                        "file_type": ext[1:] if ext else "txt",
                        "embedding": self.get_embedding(chunk),
                    }
                    documents.append(doc)
            except Exception as e:
                print(f"Error processing {file_path}: {str(e)}")

        print(f"Processed {len(documents)} document chunks")
        return documents

    def _process_move_file(self, content: str, file_path: str) -> List[str]:
        """Process a Move language file, extracting modules, structs, and functions.

        Args:
            content: The file content
            file_path: Path to the file

        Returns:
            List of content chunks with semantic meaning
        """
        chunks = []
        repo_info = file_path.split("docs/move_files/")[-1] if "docs/move_files/" in file_path else file_path
        file_context = f"File: {repo_info}\n\n"

        module_match = re.search(r"module\s+([a-zA-Z0-9_:]+)\s*{", content)
        module_name = module_match.group(1) if module_match else "unknown_module"

        # Comments that precede the module declaration
        header_pattern = r"(\/\/.*?|\s*\/\*[\s\S]*?\*\/\s*)*module"
        header_match = re.search(header_pattern, content)
        if header_match:
            header = header_match.group(0).replace("module", "")
            if header.strip():
                header_chunk = f"{file_context}Module: {module_name}\nHeader Comments: {header.strip()}"
                chunks.append(header_chunk)

        chunks.append(f"{file_context}Move Module: {module_name}")

        use_statements = re.findall(r"use\s+([^;]+);", content)
        if use_statements:
            use_chunk = f"{file_context}Module: {module_name}\nDependencies:\n"
            for stmt in use_statements:
                use_chunk += f"use {stmt};\n"
            chunks.append(use_chunk)

        struct_matches = re.finditer(r"struct\s+([a-zA-Z0-9_]+)(?:\s*<[^>]+>)?\s*{([^}]+)}", content)
        for struct_match in struct_matches:
            struct_name = struct_match.group(1)
            struct_body = struct_match.group(2)
            struct_chunk = f"{file_context}Module: {module_name}\nStruct: {struct_name}\n{struct_body.strip()}"
            chunks.append(struct_chunk)

        function_matches = re.finditer(
            r"(public\s+)?(inline\s+)?(fun\s+([a-zA-Z0-9_]+)(?:\s*<[^>]+>)?\s*\([^)]*\)(?:\s*:[^{]+)?\s*{([^}]+)})",
            content,
        )
        for func_match in function_matches:
            # group(3) is the full declaration, group(4) the name, group(5) the body
            func_full = func_match.group(3)
            func_name = func_match.group(4)
            func_body = func_match.group(5)
            func_chunk = f"{file_context}Module: {module_name}\nFunction: {func_name}\n{func_full.strip()}"
            chunks.append(func_chunk)

        if not chunks:
            text_chunks = self._chunk_text(content)
            for chunk in text_chunks:
                chunks.append(f"{file_context}{chunk}")

        return chunks

    def _chunk_text(self, text: str, max_chunk_size: int = 1000) -> List[str]:  # default size assumed; not legible in the dump
        """Split text into chunks based on paragraphs and size.

        Args:
            text: Text to split
            max_chunk_size: Maximum size of each chunk

        Returns:
            List of text chunks
        """
        paragraphs = re.split(r"\n\s*\n", text)
        chunks = []
        current_chunk = ""

        for para in paragraphs:
            if len(para) > max_chunk_size:
                # Oversized paragraph: flush the current chunk, then pack sentences
                if current_chunk:
                    chunks.append(current_chunk)
                    current_chunk = ""
                sentences = re.split(r"(?<=[.!?])\s+", para)
                temp_chunk = ""
                for sentence in sentences:
                    if len(temp_chunk) + len(sentence) <= max_chunk_size:
                        temp_chunk += (" " if temp_chunk else "") + sentence
                    else:
                        if temp_chunk:
                            chunks.append(temp_chunk)
                        temp_chunk = sentence
                if temp_chunk:
                    current_chunk = temp_chunk
            elif len(current_chunk) + len(para) <= max_chunk_size:
                current_chunk += ("\n\n" if current_chunk else "") + para
            else:
                chunks.append(current_chunk)
                current_chunk = para

        if current_chunk:
            chunks.append(current_chunk)

        return chunks
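The document dictionaries produced above pair each chunk with its embedding; a retrieval step (not part of this file) would rank them by cosine similarity against a query embedding. A minimal sketch, assuming plain numpy vectors — `retrieve` is a hypothetical helper, not part of the server:

```python
import numpy as np

def retrieve(query_emb: np.ndarray, documents: list, top_k: int = 3) -> list:
    """Return the top_k documents whose embeddings are most similar to the query."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Sort all documents by similarity to the query, highest first
    scored = sorted(documents, key=lambda d: cosine(query_emb, d["embedding"]), reverse=True)
    return scored[:top_k]

# Tiny demonstration with hand-made 2-D "embeddings" in place of model output
docs = [
    {"id": "a_0", "content": "about move modules", "embedding": np.array([1.0, 0.0])},
    {"id": "b_0", "content": "about chunking", "embedding": np.array([0.0, 1.0])},
]
best = retrieve(np.array([0.9, 0.1]), docs, top_k=1)
print(best[0]["id"])  # -> a_0
```

In the real flow the query vector would come from the same model via `get_embedding`, so query and document embeddings share one vector space.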

MCP directory API

We provide all of the information about listed MCP servers via our MCP directory API. For example:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ProbonoBonobo/sui-mcp-server'
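The same request can be made from Python; a minimal sketch using only the standard library (`server_api_url` is a hypothetical helper, and the response body is assumed to be JSON):

```python
import json
import urllib.request

def server_api_url(owner: str, name: str) -> str:
    """Build the directory API URL for a given server."""
    return f"https://glama.ai/api/mcp/v1/servers/{owner}/{name}"

def fetch_server_info(owner: str, name: str) -> dict:
    """GET the server entry and decode the JSON body."""
    with urllib.request.urlopen(server_api_url(owner, name)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# No network call here; just show the URL that would be requested.
print(server_api_url("ProbonoBonobo", "sui-mcp-server"))
# -> https://glama.ai/api/mcp/v1/servers/ProbonoBonobo/sui-mcp-server
```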

If you have feedback or need assistance with the MCP directory API, please join our Discord server.