
Video RAG MCP Server

by FiloHany

ingest_data_tool

Loads video data from a specified directory into the RAG index for natural language search and interaction with video content.

Instructions

Loads data from a directory into the Ragie index. Wait until the data is fully ingested before continuing.

Args:
    directory (str): The directory to load data from.

Returns:
    str: A message indicating that the data was loaded successfully.

Input Schema

Name        Required   Description   Default
directory   Yes

Implementation Reference

  • server.py:6-22 (handler)
    The handler function for 'ingest_data_tool', decorated with @mcp.tool() for registration. It clears the existing index, ingests data from the specified directory using helper functions, and returns a success or error message. The function signature and docstring define the input schema.
    @mcp.tool()
    def ingest_data_tool(directory: str) -> str:
        """
        Loads data from a directory into the Ragie index. Wait until the data is fully ingested before continuing.
    
        Args:
            directory (str): The directory to load data from.
    
        Returns:
            str: A message indicating that the data was loaded successfully.
        """
        try:
            clear_index()
            ingest_data(directory)
            return "Data loaded successfully"   
        except Exception as e:
            return f"Failed to load data: {str(e)}"
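The handler reports both success and failure through its return string instead of raising, so a caller has to inspect the message. That contract can be sketched with the helpers injected as stubs (the `make_ingest_tool` factory and the stub callables are illustrative, not part of the server):

```python
def make_ingest_tool(clear_index, ingest_data):
    """Build the handler with injected helpers so its message contract can be
    exercised without a live Ragie index (factory and stubs are illustrative)."""
    def ingest_data_tool(directory: str) -> str:
        try:
            clear_index()
            ingest_data(directory)
            return "Data loaded successfully"
        except Exception as e:
            return f"Failed to load data: {e}"
    return ingest_data_tool

# Success path: both helpers are no-ops.
ok = make_ingest_tool(lambda: None, lambda d: None)
print(ok("videos/"))        # Data loaded successfully

# Failure path: the ingest helper raises, and the error surfaces in the message.
def boom(directory):
    raise FileNotFoundError(directory)

bad = make_ingest_tool(lambda: None, boom)
print(bad("missing/"))      # Failed to load data: missing/
```

Keeping the helpers injectable also makes the real handler easy to unit-test; the original binds `clear_index` and `ingest_data` at module scope.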
  • main.py:49-84 (helper)
    Helper function called by the tool handler to ingest files from the directory into the Ragie index. It uploads each file as a document in audio/video mode and polls until the document is ready.
    def ingest_data(directory):
        # Get list of files in directory
        directory_path = Path(directory)
        files = os.listdir(directory_path)
        
        for file in files:
            try:
                file_path = directory_path / file
                # Skip anything that is not a regular file; os.listdir also returns subdirectories
                if not file_path.is_file():
                    continue
                # Read file content
                with open(file_path, mode='rb') as f:
                    file_content = f.read()
                # Create document in Ragie
                response = ragie.documents.create(request={
                    "file": {
                        "file_name": file,
                        "content": file_content,
                    },
                    "mode": {
                        "video": "audio_video",
                        "audio": True
                    }
                })
                # Wait for document to be ready
                while True:
                    res = ragie.documents.get(document_id=response.id)
                    if res.status == "ready":
                        break
            
                    time.sleep(2)
    
                logger.info(f"Successfully uploaded {file}")
                
            except Exception as e:
                logger.error(f"Failed to process file {file}: {str(e)}")
                continue
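The readiness loop above polls every two seconds with no upper bound, so a document that never reaches "ready" would stall ingestion indefinitely. A bounded variant can be sketched as follows; `wait_until_ready`, the "failed" status value, and the default timeout are assumptions, not part of the Ragie SDK:

```python
import time

def wait_until_ready(poll, timeout=300.0, interval=2.0, clock=time.monotonic):
    """Poll `poll()` until it returns "ready", giving up after `timeout` seconds.

    `poll` stands in for `ragie.documents.get(...).status`; the "failed"
    status and the timeout values are illustrative assumptions.
    """
    deadline = clock() + timeout
    while True:
        status = poll()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("document processing failed")
        if clock() >= deadline:
            raise TimeoutError(f"document not ready after {timeout}s")
        time.sleep(interval)

# Stub that becomes ready on the third poll.
statuses = iter(["indexing", "indexing", "ready"])
wait_until_ready(lambda: next(statuses), timeout=10.0, interval=0.0)
```

Without a bound like this, one stuck document blocks the whole `ingest_data` loop and, in turn, the MCP call itself.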
  • main.py:21-47 (helper)
    Helper function called by the tool handler to clear all existing documents from the Ragie index before ingesting new data.
    def clear_index():
        while True:
            try:
                # List all documents
                response = ragie.documents.list()
                documents = response.result.documents
    
                # Process each document
                for document in documents:
                    try:
                        ragie.documents.delete(
                            document_id=document.id
                        )
                        logger.info(f"Deleted document {document.id}")
                    except Exception as e:
                        logger.error(f"Failed to delete document {document.id}: {str(e)}")
                        raise
    
                # Check if there are more documents
                if not response.result.pagination.next_cursor:
                    logger.warning("No more documents\n")
                    break
    
            except Exception as e:
                logger.error(f"Failed to retrieve or process documents: {str(e)}")
                raise
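`clear_index` drains the index by deleting every document on the current page and then re-listing from the start, rather than passing `next_cursor` back into `ragie.documents.list`. A cursor-driven version of the same drain can be sketched with hypothetical `list_page`/`delete` stand-ins (the real SDK's pagination fields may differ):

```python
def delete_all(list_page, delete):
    """Drain a paginated listing by deleting every document on each page.

    `list_page(cursor)` returns (doc_ids, next_cursor) and `delete(doc_id)`
    removes one document; both are hypothetical stand-ins for the Ragie
    client, not its real API.
    """
    cursor = None
    deleted = []
    while True:
        doc_ids, cursor = list_page(cursor)
        for doc_id in doc_ids:
            delete(doc_id)
            deleted.append(doc_id)
        if not cursor:
            return deleted

# In-memory stub: two pages of documents keyed by cursor.
pages = {None: (["a", "b"], "p2"), "p2": (["c"], None)}
removed = []
delete_all(pages.__getitem__, removed.append)
# removed == ["a", "b", "c"]
```

When every listed document is deleted, re-listing from the first page as the original does is also correct, since cursors can become stale once the underlying set changes.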
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It reveals that the operation is blocking ('Wait until the data is fully ingested') and indicates success/failure through a return message. However, it doesn't disclose critical behavioral traits such as what types of data are supported, whether the operation is idempotent, what happens to existing data in the index, error handling, or performance characteristics like rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with the core purpose stated first followed by parameter and return details. The three-sentence structure is efficient, though the 'Args' and 'Returns' sections could be integrated more seamlessly into the narrative flow. There's no wasted text, but minor improvements in cohesion are possible.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a data ingestion tool with no annotations, no output schema, and low schema description coverage (0%), the description is incomplete. It covers the basic operation and parameter but lacks details on data formats, indexing behavior, error scenarios, and what 'fully ingested' entails. For a tool that modifies an index, more comprehensive guidance is needed to ensure safe and effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds minimal semantic context beyond the input schema. It specifies that the 'directory' parameter is 'The directory to load data from,' which slightly clarifies the purpose but doesn't provide format requirements (e.g., local path, network path), supported directory structures, or examples. With 0% schema description coverage and only one parameter, this is adequate but leaves gaps in practical usage details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Loads') and resource ('data from a directory into the Ragie index'). It distinguishes from sibling tools like 'retrieve_data_tool' and 'show_video_tool' by focusing on ingestion rather than retrieval or display. However, it doesn't explicitly differentiate from potential similar ingestion tools beyond the named siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage guidance through the instruction 'Wait until the data is fully ingested before continuing,' suggesting this is a blocking operation that should be used when immediate continuation isn't needed. However, it lacks explicit guidance on when to use this tool versus alternatives (e.g., batch vs. streaming ingestion) or any prerequisites for the directory structure.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
