helper.cpython-311.pyc•2.78 kB
�
^�h � �2 � d dl Z d dlZd dlZdedefd�Zdedefd�Zdedefd�Zdd
�Zedk rh e dd
d�� � 5 Z
e
� � � Zddd� � n# 1 swxY w Y ee� � Z
ede
� d�� � edde
z
� d�� � dS dS )� N�ticker�returnc �� � t j � t j � t � � � � }t j � |dd| � � }t
j |� d�� � }|S )Nz..�dataz/*.txt)�os�path�dirname�abspath�__file__�join�glob)r �
script_dir�data_dir�filess �#/Users/sharhad/mcp/ingest/helper.py�get_sec_filings_filesr sY � ����������!:�!:�;�;�J��w�|�|�J��f�f�=�=�H��I��)�)�)�*�*�E��L� � file_pathc �~ � t | d� � 5 }|� � � cd d d � � S # 1 swxY w Y d S )N�r)�open�read)r �fs r �get_sec_filings_file_contentr s~ � �
�i�� � � ���v�v�x�x�� � � � � � � � � � � ���� � � � � � s �2�6�6�summaryc � � t |d� � 5 }|� | � � d d d � � d S # 1 swxY w Y d S )N�w)r �write)r r r s r �
write_summaryr s� � �
�i�� � � �� �������� � � � � � � � � � � ���� � � � � � s �4�8�8�text-embedding-3-smallc �v � t j |� � }|� | � � }t |� � }|S )N)�tiktoken�encoding_for_model�encode�len)�text�model�encoding�tokens�
num_tokenss r �count_document_tokensr+ s4 � ��*�5�1�1�H�
�_�_�T�
"�
"�F��V���J��r �__main__z$../data/AAPL/AAPL_10K_2024-11-01.txtr zutf-8)r( z
Document has z tokenszMax chunk size available: i� z tokens remaining)r )r r
r"