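# auto_processor.py
"""
Automated Dataset Processor - Watches Google Drive folder for new files
and automatically processes them without manual intervention.
"""
import os
import time
import json
from datetime import datetime, timedelta
from typing import Set, Dict, List, Any

from dotenv import load_dotenv

# Project-local imports. The modules these names come from are not legible in this dump:
#   process_dataset_with_organization
#   get_drive_service, list_files_in_folder

# Load environment variables (e.g. MCP_SERVER_FOLDER_ID) from a .env file.
load_dotenv()

# The module ends with a main() function, run under an
# `if __name__ == "__main__": main()` guard.


class AutoDatasetProcessor: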
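    def __init__(self, server_folder_id: str = None, check_interval: int = ...,
                 processed_files_log: str = ...):
        """
        Initialize the auto processor.
        Args:
            server_folder_id: Google Drive folder ID to monitor
            check_interval: How often to check for new files (seconds)
            processed_files_log: File to track already processed files
        """
        # The default values of check_interval and processed_files_log are not
        # legible in this dump; the "..." placeholders stand in for them.
        self.server_folder_id = server_folder_id or os.getenv("MCP_SERVER_FOLDER_ID")
        self.check_interval = check_interval
        self.processed_files_log = processed_files_log
        self.drive_service = None
        self.processed_files = self._load_processed_files()

        if not self.server_folder_id:
            raise ValueError("MCP_SERVER_FOLDER_ID not found in environment variables")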

    def _load_processed_files(self) -> Dict[str, Dict[str, Any]]:
        """Load the list of already processed files."""
        if os.path.exists(self.processed_files_log):
            try:
                with open(self.processed_files_log, "r") as f:
                    return json.load(f)
            except Exception as e:
                print(f"Warning: Could not load processed files log: {e}")
        return {}
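
    # The name of this method and the indent width passed to json.dump are not
    # legible in this part of the dump; _save_processed_files and indent=2 are assumptions.
    def _save_processed_files(self):
        """Save the list of processed files."""
        try:
            with open(self.processed_files_log, "w") as f:
                json.dump(self.processed_files, f, indent=2)
        except Exception as e:
            print(f"Warning: Could not save processed files log: {e}")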