Glama
by omd0

detect_conversations

Analyzes SRT subtitle files to identify conversation boundaries, detect languages, and create optimized chunks for translation workflows.

Instructions

🚀 CHUNK-BASED TRANSLATION WORKFLOW INSTRUCTIONS 🚀

📋 OVERVIEW: This tool analyzes SRT files and creates intelligent chunks for efficient translation. It returns METADATA ONLY - use get_next_chunk() and translate_srt() for actual content.
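The metadata-only payload can be described with a TypeScript type. This is a sketch inferred from the handler's `JSON.stringify` call in the Implementation Reference, not a published schema. The shapes of `speakerDistribution` and `todos`, and the helper names `parseResult` and `canIterate`, are assumptions for illustration.

```typescript
// Hypothetical type for the JSON text returned by detect_conversations,
// inferred from the handler; fields marked "assumed" are guesses.
interface DetectConversationsResult {
  chunkCount: number;
  totalDuration: number; // milliseconds
  languageDistribution: Record<string, number>; // e.g. { ar: 45, en: 12 }
  speakerDistribution: Record<string, number>; // assumed shape
  previewChunk: Record<string, unknown> | null; // first chunk's metadata only
  message: string;
  validationStatus: { isValid: boolean; errors: unknown[]; warnings: unknown[] };
  sessionId?: string; // only when storeInMemory is true
  todos?: unknown[]; // only when createTodos is true; assumed to be an array
}

// The tool answers with one text item holding JSON, so parse it before use
function parseResult(text: string): DetectConversationsResult {
  return JSON.parse(text) as DetectConversationsResult;
}

// sessionId is optional; without it the chunks were not stored in memory,
// so (presumably) there is nothing for get_next_chunk to read
function canIterate(result: DetectConversationsResult): boolean {
  return result.chunkCount > 0 && typeof result.sessionId === "string";
}
```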

🔍 WHAT IT DOES:

  • SMART INPUT: Auto-detects file paths vs SRT content

  • Creates small chunks (1-3 subtitles each) optimized for AI processing

  • Detects languages (Arabic, English, Spanish, French) per chunk

  • Identifies speakers and conversation boundaries

  • Includes a translation priority field per chunk (high/medium/low); the current handler sets every chunk to medium

  • Stores chunks in memory to avoid context limits (when storeInMemory is true)

  • Creates individual TODO tasks for tracking progress (when createTodos is true)
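The "smart input" decision in the handler can be isolated as a pure function. This is a minimal sketch of the same search order: a value ending in `.srt` that exists is a path; a bare file name is also looked up under `examples/` and `samples/`; anything else is treated as SRT text. Like the handler, a missing path that contains a slash falls through and is treated as SRT text. `classifyContent` and `InputKind` are hypothetical names; `exists` stands in for `existsSync`, and template strings replace `path.join` to keep the sketch dependency-free.

```typescript
// Result of classifying the `content` argument (hypothetical type)
type InputKind =
  | { kind: "path"; path: string }
  | { kind: "content"; text: string };

function classifyContent(
  content: string,
  exists: (p: string) => boolean, // e.g. fs.existsSync in a real server
  cwd: string,
): InputKind {
  if (content.trim().length === 0) {
    throw new Error("Empty content provided");
  }
  if (content.endsWith(".srt")) {
    // Path as given (relative paths resolve against cwd)
    if (exists(content)) return { kind: "path", path: content };
    // Bare file name: try the examples/ and samples/ folders
    if (!content.includes("/")) {
      const candidates = [`${cwd}/examples/${content}`, `${cwd}/samples/${content}`];
      const found = candidates.find((p) => exists(p));
      if (found) return { kind: "path", path: found };
      throw new Error(`File not found: ${content}`);
    }
  }
  // Otherwise the string itself is parsed as SRT content
  return { kind: "content", text: content };
}
```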

📊 WHAT IT RETURNS (SMALL RESPONSE):

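  • chunkCount: Total number of chunks created

  • totalDuration: Sum of all subtitle display durations in milliseconds (gaps between subtitles are not counted)

  • languageDistribution: Language counts (e.g., {"ar": 45, "en": 12})

  • speakerDistribution: Speaker distribution across chunks

  • previewChunk: Metadata for the first chunk only

  • sessionId: ID for retrieving chunks later (present only when storeInMemory is true)

  • message: Instructions for next steps

  • validationStatus: Whether any chunks were produced, with error and warning lists

  • todos: Individual tasks for each chunk (present only when createTodos is true)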
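The handler computes `totalDuration` by summing `endTime - startTime` over every subtitle, so the gaps between subtitles are not included. It also reaches those fields through an `as unknown as number` cast, which only yields milliseconds if they already hold numbers at runtime. If timestamps are kept as SRT text, they need converting first. A minimal sketch, assuming the standard `HH:MM:SS,mmm` format; `Cue`, `srtTimestampToMs`, and `durations` are hypothetical names:

```typescript
// Hypothetical cue type: start/end kept as SRT timestamp text, e.g. "00:00:02,000"
interface Cue {
  start: string;
  end: string;
}

// Convert "HH:MM:SS,mmm" (a "." separator is also accepted) to milliseconds
function srtTimestampToMs(ts: string): number {
  const m = /^(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})$/.exec(ts.trim());
  if (!m) throw new Error(`Invalid SRT timestamp: ${ts}`);
  const [, h, min, s, ms] = m.map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// "displayed" mirrors the handler's reduce (on-screen time only);
// "span" runs from the first start to the last end, gaps included
function durations(cues: Cue[]): { displayed: number; span: number } {
  const displayed = cues.reduce(
    (total, c) => total + (srtTimestampToMs(c.end) - srtTimestampToMs(c.start)),
    0,
  );
  const span =
    cues.length === 0
      ? 0
      : srtTimestampToMs(cues[cues.length - 1].end) - srtTimestampToMs(cues[0].start);
  return { displayed, span };
}

const cues: Cue[] = [
  { start: "00:00:02,000", end: "00:00:07,000" },
  { start: "00:00:10,000", end: "00:00:12,500" },
];
console.log(durations(cues)); // { displayed: 7500, span: 10500 }
```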

🎯 RECOMMENDED WORKFLOW:

  1. Call detect_conversations with storeInMemory=true

  2. Review metadata to understand file structure (SMALL RESPONSE)

  3. Use get_next_chunk to process chunks one by one

  4. Use translate_srt() for actual translation

  5. Track progress with todo_management
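The first steps of this workflow can be sketched from the client side. This assumes a generic `callTool(name, args)` function supplied by whatever MCP client is in use, which resolves to the parsed JSON of the tool's text result. The `get_next_chunk` argument (`sessionId`) and response field (`chunk`, `null` once the session is exhausted) are assumptions; check that tool's own schema. The `translateChunk` callback is a hypothetical stand-in for a `translate_srt` call.

```typescript
// Hypothetical MCP client call returning the parsed JSON of the tool result
type CallTool = (name: string, args: Record<string, unknown>) => Promise<any>;

async function processAllChunks(
  callTool: CallTool,
  content: string,
  translateChunk: (chunk: unknown) => Promise<string>, // e.g. wraps translate_srt
): Promise<string[]> {
  // Step 1: analyze and keep chunks server-side
  const meta = await callTool("detect_conversations", { content, storeInMemory: true });
  const out: string[] = [];
  // Steps 3-4: pull chunks one at a time and translate each
  for (let i = 0; i < meta.chunkCount; i++) {
    const next = await callTool("get_next_chunk", { sessionId: meta.sessionId });
    if (next == null || next.chunk == null) break; // assumed end-of-session signal
    out.push(await translateChunk(next.chunk));
  }
  return out;
}
```

Iterating up to `chunkCount` bounds the loop even if the end-of-session signal differs from the one assumed here.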

💡 EXAMPLES:

File Path Input: {"content": "/path/to/file.srt", "storeInMemory": true, "createTodos": true}

SRT Content Input: {"content": "1\n00:00:02,000 --> 00:00:07,000\nHello world", "storeInMemory": true}
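When raw SRT text is passed as `content`, each line break must appear as a `\n` escape inside the JSON string, as in the second example. Building the arguments with `JSON.stringify` does this automatically:

```typescript
// Raw SRT block with real newline characters
const srt = "1\n00:00:02,000 --> 00:00:07,000\nHello world";

// JSON.stringify escapes each newline as the two characters "\" and "n"
const args = JSON.stringify({ content: srt, storeInMemory: true });
console.log(args);
// {"content":"1\n00:00:02,000 --> 00:00:07,000\nHello world","storeInMemory":true}
```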

⚠️ IMPORTANT:

  • This returns METADATA ONLY - no actual text content

  • Response is SMALL to avoid context overflow

  • Use get_next_chunk() to retrieve individual chunks

  • Use translate_srt() for actual translation

  • Store chunks in memory for large files to avoid context limits
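Storing chunks in memory and retrieving them one by one implies a per-session cursor. The handler keeps two maps, `chunkMemory` and `chunkIndex`, keyed by session ID. The class below is a hypothetical sketch of how such a pair could back `get_next_chunk`; it is not the server's implementation. The generated ID copies the handler's `srt-session-<timestamp>-<random>` pattern.

```typescript
// Hypothetical in-memory session store: chunks plus a read cursor per session
class ChunkSessionStore<T> {
  private chunks = new Map<string, T[]>();
  private cursor = new Map<string, number>();

  // Save chunks under the given ID, or generate one like the handler does
  store(items: T[], sessionId?: string): string {
    const id =
      sessionId || `srt-session-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
    this.chunks.set(id, items);
    this.cursor.set(id, 0);
    return id;
  }

  // Return the next chunk and advance the cursor; null once the session is exhausted
  next(sessionId: string): T | null {
    const items = this.chunks.get(sessionId);
    if (!items) throw new Error(`Unknown session: ${sessionId}`);
    const i = this.cursor.get(sessionId) ?? 0;
    if (i >= items.length) return null;
    this.cursor.set(sessionId, i + 1);
    return items[i];
  }
}
```

Nothing in this sketch evicts old sessions, so a long-running server built this way would keep every stored file in memory until restart.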

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | SRT file content OR file path to analyze (auto-detected) | none |
| storeInMemory | No | Store chunks in memory to avoid context limits | false |
| sessionId | No | Session ID for memory storage (auto-generated if not provided) | none |
| createTodos | No | Create individual TODO tasks for each chunk | false |
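The schema makes only `content` required and defaults both booleans to `false`. A handler that receives `args: any` could validate and normalize these up front; `normalizeArgs` and `DetectConversationsArgs` are hypothetical names, not part of the server.

```typescript
interface DetectConversationsArgs {
  content: string;
  storeInMemory: boolean;
  sessionId?: string;
  createTodos: boolean;
}

// Hypothetical: validate raw tool arguments and apply the schema defaults
function normalizeArgs(raw: Record<string, unknown>): DetectConversationsArgs {
  const { content, storeInMemory = false, sessionId, createTodos = false } = raw;
  if (typeof content !== "string" || content.trim() === "") {
    throw new Error("content must be a non-empty string (SRT text or .srt path)");
  }
  if (typeof storeInMemory !== "boolean" || typeof createTodos !== "boolean") {
    throw new Error("storeInMemory and createTodos must be booleans");
  }
  if (sessionId !== undefined && typeof sessionId !== "string") {
    throw new Error("sessionId must be a string");
  }
  // The check above guarantees sessionId is a string or undefined
  return { content, storeInMemory, sessionId: sessionId as string | undefined, createTodos };
}
```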

Implementation Reference

  • Main MCP tool handler for 'detect_conversations'. It parses the SRT input (file path or content), detects conversation chunks with detectConversationsAdvanced, and builds a metadata summary (chunk count, languages, speakers). It optionally stores the chunks in memory under a sessionId and creates TODOs, then returns compact metadata without the full content to avoid context limits.
    private async handleDetectConversations(args: any) {
      const { content: inputContent, storeInMemory = false, sessionId, createTodos = false } = args;
      
      // Smart input detection: check if content is a file path or SRT content
      let content = inputContent;
      // Start empty so the "File not found" check below can detect a failed lookup
      let srtContent = '';
      let isFilePath = false;
      
      // Treat content as a path if it ends in .srt and exists
      // (existsSync resolves relative paths against process.cwd())
      if (content.endsWith('.srt') && existsSync(content)) {
        isFilePath = true;
      }
      
      if (isFilePath) {
        try {
          srtContent = readFileSync(content, 'utf8');
          // Log to stderr: on the stdio transport, stdout carries the MCP protocol messages
          console.error(`📁 Reading SRT file from path: ${content}`);
        } catch (error) {
          throw new Error(`Failed to read file ${content}: ${error instanceof Error ? error.message : 'Unknown error'}`);
        }
      } else {
        // Check if content looks like SRT format (has subtitle blocks)
        if (content.trim().length === 0) {
          throw new Error('Empty content provided');
        }
        
        // If it's just a filename without path, try to find it
        if (content.endsWith('.srt') && !content.includes('/')) {
          const possiblePaths = [
            content,
            join(process.cwd(), content),
            join(process.cwd(), 'examples', content),
            join(process.cwd(), 'samples', content)
          ];
          
          for (const path of possiblePaths) {
            if (existsSync(path)) {
              try {
                srtContent = readFileSync(path, 'utf8');
                console.error(`📁 Found SRT file at: ${path}`);
                break;
              } catch (error) {
                continue;
              }
            }
          }
          
          if (!srtContent) {
            throw new Error(`File not found: ${content}. Searched in: ${possiblePaths.join(', ')}`);
          }
        } else {
          console.error(`📄 Processing SRT content directly (${content.length} characters)`);
          srtContent = content;
        }
      }
      
      const parseResult = parseSRTFile(srtContent);
    
      if (!parseResult.success || !parseResult.file) {
        const errorDetails = parseResult.errors?.map(e => `${e.type}: ${e.message}`).join(', ') || 'Unknown parsing error';
        throw new Error(`Failed to parse SRT file: ${errorDetails}`);
      }
    
      // Use advanced conversation detection with MUCH SMALLER chunks for AI processing
      const chunks = detectConversationsAdvanced(parseResult.file.subtitles, {
        boundaryThreshold: 0.1, // Very low threshold for maximum chunks
        minChunkSize: 1,        // Allow single subtitle chunks
        maxChunkSize: 3,        // VERY small max chunk size (1-3 subtitles)
        enableSpeakerDiarization: true,
        enableSemanticAnalysis: true,
      });
    
      // Create metadata for AI
      const chunkMetadata = chunks.map((chunk, index) => {
        const firstSubtitle = chunk.subtitles[0];
        const lastSubtitle = chunk.subtitles[chunk.subtitles.length - 1];
        
        const languageInfo = this.detectLanguage(chunk);
        
        return {
          id: chunk.id,
          startIndex: chunk.startIndex,
          endIndex: chunk.endIndex,
          startTime: firstSubtitle.startTime,
          endTime: lastSubtitle.endTime,
          // Double cast assumes startTime/endTime hold millisecond numbers at runtime
          duration: (lastSubtitle.endTime as unknown as number) - (firstSubtitle.startTime as unknown as number),
          subtitleCount: chunk.subtitles.length,
          speaker: chunk.context?.speaker || 'Unknown',
          languageInfo: languageInfo,
          // Fixed values: these fields are not computed per chunk
          translationPriority: 'medium',
          contentType: 'dialogue',
          topicKeywords: [],
          complexity: 'medium',
        };
      });
    
      // Build the payload once: only chunk metadata is returned, not the chunk text
      const responseData: Record<string, unknown> = {
        chunkCount: chunks.length,
        totalDuration: parseResult.file.subtitles.reduce((total, sub) => total + ((sub.endTime as unknown as number) - (sub.startTime as unknown as number)), 0),
        languageDistribution: this.analyzeLanguageDistribution(chunks),
        speakerDistribution: this.analyzeSpeakerDistribution(chunks),
        // Return only the first chunk's metadata as a preview, not all chunks
        previewChunk: chunkMetadata.length > 0 ? chunkMetadata[0] : null,
        message: `Detected ${chunks.length} chunks. Use get_next_chunk to retrieve individual chunks.`,
        validationStatus: {
          isValid: chunks.length > 0,
          errors: [],
          warnings: [],
        },
      };
    
      // Store chunks in memory if requested and report the session ID
      if (storeInMemory) {
        const actualSessionId = sessionId || `srt-session-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
        this.chunkMemory.set(actualSessionId, chunks);
        this.chunkIndex.set(actualSessionId, 0);
        responseData.sessionId = actualSessionId;
      }
    
      // Create TODO tasks if requested and include them in the response
      if (createTodos) {
        responseData.todos = await this.todoManager.createSRTProcessingTodos(
          'srt_file',
          chunks.length,
          'translation'
        );
      }
    
      // Serialize once into the MCP text content format
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(responseData, null, 2),
          },
        ],
      };
    }
  • Input schema validation for detect_conversations tool defining parameters: content (SRT or path), storeInMemory, sessionId, createTodos.
    inputSchema: {
      type: 'object',
      properties: {
        content: {
          type: 'string',
          description: 'SRT file content OR file path to analyze (auto-detected)',
        },
        storeInMemory: {
          type: 'boolean',
          description: 'Store chunks in memory to avoid context limits (default: false)',
          default: false,
        },
        sessionId: {
          type: 'string',
          description: 'Session ID for memory storage (optional, auto-generated if not provided)',
        },
        createTodos: {
          type: 'boolean',
          description: 'Create individual TODO tasks for each chunk (default: false)',
          default: false,
        },
      },
      required: ['content'],
    },
  • Tool registration in ListTools handler including name, detailed description of workflow, and input schema.
                name: 'detect_conversations',
                description: `🚀 CHUNK-BASED TRANSLATION WORKFLOW INSTRUCTIONS 🚀
    
    📋 OVERVIEW:
    This tool analyzes SRT files and creates intelligent chunks for efficient translation.
    It returns METADATA ONLY - use get_next_chunk() and translate_srt() for actual content.
    
    🔍 WHAT IT DOES:
    - SMART INPUT: Auto-detects file paths vs SRT content
    - Creates small chunks (1-3 subtitles each) optimized for AI processing
    - Detects languages (Arabic, English, Spanish, French) per chunk
    - Identifies speakers and conversation boundaries
    - Provides translation priority rankings (high/medium/low)
    - Stores chunks in memory to avoid context limits
    - Creates individual TODO tasks for tracking progress
    
    📊 WHAT IT RETURNS (SMALL RESPONSE):
    - chunkCount: Total number of chunks created
    - totalDuration: File duration in milliseconds
    - languageDistribution: Language counts (e.g., {"ar": 45, "en": 12})
    - previewChunk: Preview of first chunk metadata only
    - sessionId: For retrieving chunks later
    - message: Instructions for next steps
    - todos: Individual tasks for each chunk
    
    🎯 RECOMMENDED WORKFLOW:
    1. Call detect_conversations with storeInMemory=true
    2. Review metadata to understand file structure (SMALL RESPONSE)
    3. Use get_next_chunk to process chunks one by one
    4. Use translate_srt() for actual translation
    5. Track progress with todo_management
    
    💡 EXAMPLES:
    
    File Path Input:
    {"content": "/path/to/file.srt", "storeInMemory": true, "createTodos": true}
    
    SRT Content Input:
    {"content": "1\\n00:00:02,000 --> 00:00:07,000\\nHello world", "storeInMemory": true}
    
    ⚠️ IMPORTANT:
    - This returns METADATA ONLY - no actual text content
    - Response is SMALL to avoid context overflow
    - Use get_next_chunk() to retrieve individual chunks
    - Use translate_srt() for actual translation
    - Store chunks in memory for large files to avoid context limits`,
                inputSchema: {
                  type: 'object',
                  properties: {
                    content: {
                      type: 'string',
                      description: 'SRT file content OR file path to analyze (auto-detected)',
                    },
                    storeInMemory: {
                      type: 'boolean',
                      description: 'Store chunks in memory to avoid context limits (default: false)',
                      default: false,
                    },
                    sessionId: {
                      type: 'string',
                      description: 'Session ID for memory storage (optional, auto-generated if not provided)',
                    },
                    createTodos: {
                      type: 'boolean',
                      description: 'Create individual TODO tasks for each chunk (default: false)',
                      default: false,
                    },
                  },
                  required: ['content'],
                },
              },
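  • Core helper function called by the handler. It detects conversations in multiple passes: basic boundary detection, optional semantic analysis and speaker diarization, then size optimization. Its defaults (boundaryThreshold 0.7, chunks of 2-20 subtitles) are overridden by the handler to produce chunks of 1-3 subtitles. Returns the SRTChunk[] array used to build the metadata.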
    export function detectConversationsAdvanced(
      subtitles: SRTSubtitle[],
      options: {
        boundaryThreshold?: number;
        maxChunkSize?: number;
        minChunkSize?: number;
        enableSemanticAnalysis?: boolean;
        enableSpeakerDiarization?: boolean;
      } = {}
    ): SRTChunk[] {
      const {
        boundaryThreshold = 0.7,
        maxChunkSize = 20,
        minChunkSize = 2,
        enableSemanticAnalysis = true,
        enableSpeakerDiarization = true
      } = options;
    
      // First pass: Basic boundary detection with custom threshold
      const initialChunks = detectBasicBoundariesWithThreshold(subtitles, boundaryThreshold);
      
      let processedChunks = initialChunks;
      
      // Second pass: Semantic analysis (optional)
      if (enableSemanticAnalysis) {
        processedChunks = applySemanticAnalysis(processedChunks);
      }
      
      // Third pass: Speaker diarization (optional)
      if (enableSpeakerDiarization) {
        processedChunks = applySpeakerDiarization(processedChunks);
      }
      
      // Fourth pass: Size optimization
      processedChunks = optimizeChunkSizesWithLimits(processedChunks, maxChunkSize, minChunkSize);
      
      return processedChunks;
    }
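The fourth pass, `optimizeChunkSizesWithLimits`, is not shown. One plausible reading of `maxChunkSize` and `minChunkSize` is: split any chunk longer than the maximum, then merge a too-small chunk into its predecessor when the result still fits. The function below is a hypothetical sketch of that idea over plain arrays, not the library's code; `optimizeSizes` is an illustrative name.

```typescript
// Hypothetical size pass: each chunk is an array of subtitles (any item type)
function optimizeSizes<T>(chunks: T[][], maxSize: number, minSize: number): T[][] {
  if (maxSize < 1) throw new Error("maxSize must be at least 1");
  // 1) Split oversized chunks into pieces of at most maxSize items
  const split: T[][] = [];
  for (const chunk of chunks) {
    for (let i = 0; i < chunk.length; i += maxSize) {
      split.push(chunk.slice(i, i + maxSize));
    }
  }
  // 2) Merge an undersized chunk into the previous one when the result still fits
  const merged: T[][] = [];
  for (const chunk of split) {
    const prev = merged[merged.length - 1];
    if (chunk.length < minSize && prev && prev.length + chunk.length <= maxSize) {
      prev.push(...chunk);
    } else {
      merged.push([...chunk]);
    }
  }
  return merged;
}
```

In this sketch, the handler's settings (`maxChunkSize: 3`, `minChunkSize: 1`) never trigger the merge step, so every chunk ends up with 1 to 3 subtitles, consistent with the tool description.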
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does an excellent job disclosing behavioral traits: it explains the tool auto-detects input types (file paths vs content), stores chunks in memory to avoid context limits, creates TODO tasks, returns only metadata (not content), and produces small responses to prevent context overflow. It doesn't mention error handling or performance characteristics, keeping it from a perfect score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

While well-structured with clear sections, the description is verbose with decorative elements (emojis, all-caps headers) that don't add functional value. The core information could be conveyed more efficiently. However, every sentence does serve a purpose in explaining the tool's role in the workflow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations and no output schema, the description provides exceptional completeness: it explains what the tool does, how to use it in context with sibling tools, what parameters mean, what behavior to expect, what gets returned, and provides concrete examples. This fully compensates for the lack of structured metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description adds meaningful context beyond the schema: it explains the auto-detection behavior for the 'content' parameter, provides concrete examples of both input types, and clarifies the purpose of storeInMemory (to avoid context limits) and createTodos (for tracking progress). This adds substantial practical guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes SRT files and creates intelligent chunks for translation, distinguishing it from siblings like parse_srt (which likely parses without chunking) and translate_srt (which handles actual translation). It specifies the verb (analyzes/creates) and resource (SRT files) with specific scope (chunk-based translation workflow).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool versus alternatives, including a recommended workflow (step 1), explicit instructions to use get_next_chunk() for content retrieval and translate_srt() for translation, and warnings that this returns metadata only. It clearly positions this as the entry point in a multi-step process.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/omd0/srt-mcp'
