# get_video_retention
Analyze video viewer retention to identify drop-off points and engaging moments. Use this tool to optimize content, find boring sections, and improve completion rates by examining where viewers stop watching.
## Instructions
Analyze WHERE viewers stop watching in a video. USE WHEN: Optimizing video content, finding boring sections, identifying engaging moments, improving completion rates. RETURNS: 101 data points (0-100%) showing viewer count at each percent of video. EXAMPLES: 'Where do viewers drop off in video 1_abc123?', 'What parts get replayed?', 'Compare retention for anonymous vs logged-in users'. Shows exact percentages where audience is lost.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| entry_id | Yes | Video to analyze (required, format: '1_abc123'). Get from search_entries or get_media_entry. | |
| from_date | No | Start date (optional, defaults to 30 days ago) | |
| to_date | No | End date (optional, defaults to today) | |
| user_filter | No | Optional viewer segment: 'anonymous' (not logged in), 'registered' (logged in), 'user@email.com' (specific user), 'cohort:students' (named group). Compare different audience behaviors. | |
| compare_segments | No | Compare filtered segment vs all viewers | |
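For orientation, a request to this tool carries an `arguments` object shaped by the schema above. The sketch below (with made-up example values) builds such a payload and checks the single required field:

```python
# Hypothetical example payload for a get_video_retention call.
# Only entry_id is required; every other field is optional.
arguments = {
    "entry_id": "1_abc123",       # from search_entries or get_media_entry
    "from_date": "2024-01-01",    # defaults to 30 days ago if omitted
    "to_date": "2024-01-31",      # defaults to today if omitted
    "user_filter": "anonymous",   # viewer segment to analyze
    "compare_segments": True,     # compare the segment against all viewers
}

missing = [key for key in ["entry_id"] if key not in arguments]
# missing == [] — the payload satisfies the schema's required list
```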
## Implementation Reference
- The primary handler function implementing the get_video_retention tool. It fetches the 'percentiles' analytics report from the Kaltura API for the specified entry_id, processes the raw CSV data into 101 percentile points (0-100%), converts percentiles to video time positions using the entry duration, calculates retention percentages relative to initial viewers, identifies major drop-offs and replay hotspots, and returns richly formatted JSON with video metadata, retention curve data, and actionable insights.

```python
async def get_video_retention(
    manager: KalturaClientManager,
    entry_id: str,
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
    user_filter: Optional[str] = None,
    compare_segments: bool = False,
) -> str:
    """
    Analyze viewer retention throughout a video with percentile-level granularity.

    This function provides detailed retention curves showing exactly where viewers
    drop off or replay content within a single video. Returns 101 data points
    representing viewer behavior at each percent of the video duration.

    USE WHEN:
    - Analyzing where viewers stop watching within a video
    - Identifying segments that get replayed frequently
    - Optimizing video content structure and pacing
    - Comparing retention between different viewer segments
    - Understanding completion rates and engagement patterns

    Args:
        manager: Kaltura client manager
        entry_id: Video entry ID to analyze (required)
        from_date: Start date (optional, defaults to 30 days ago)
        to_date: End date (optional, defaults to today)
        user_filter: Filter by user type (optional):
            - None: All viewers (default)
            - "anonymous": Only non-logged-in viewers
            - "registered": Only logged-in viewers
            - "user@email.com": Specific user
            - "cohort:name": Named user cohort
        compare_segments: If True, compare filtered segment vs all viewers

    Returns:
        JSON with detailed retention analysis including TIME CONVERSION:
        {
            "video": {
                "id": "1_abc",
                "title": "Video Title",
                "duration_seconds": 300,
                "duration_formatted": "05:00"
            },
            "retention_data": [
                {
                    "percentile": 0,
                    "time_seconds": 0,
                    "time_formatted": "00:00",
                    "viewers": 1000,
                    "unique_users": 1000,
                    "retention_percentage": 100.0,
                    "replays": 0
                },
                {
                    "percentile": 10,
                    "time_seconds": 30,
                    "time_formatted": "00:30",
                    "viewers": 850,
                    "unique_users": 800,
                    "retention_percentage": 85.0,
                    "replays": 50
                },
                ...
            ],
            "insights": {
                "average_retention": 65.5,
                "completion_rate": 42.0,
                "fifty_percent_point": "02:30",
                "major_dropoffs": [
                    {"time": "00:30", "time_seconds": 30, "percentile": 10, "retention_loss": 15.0},
                    ...
                ],
                "replay_hotspots": [
                    {"time": "02:15", "time_seconds": 135, "percentile": 45, "replay_rate": 0.35},
                    ...
                ]
            }
        }

    Examples:
        # Basic retention analysis
        get_video_retention(manager, entry_id="1_abc123")

        # Compare anonymous vs all viewers
        get_video_retention(manager, entry_id="1_abc123", user_filter="anonymous", compare_segments=True)

        # Analyze specific user's viewing pattern
        get_video_retention(manager, entry_id="1_abc123", user_filter="john@example.com")
    """
    # Map user-friendly filters to API values
    user_ids = None
    if user_filter:
        if user_filter.lower() == "anonymous":
            user_ids = "Unknown"
        elif user_filter.lower() == "registered":
            # This requires getting all users vs anonymous
            # Note: comparison logic could be added here in future
            pass
        elif user_filter.startswith("cohort:"):
            # Handle cohort logic
            user_ids = user_filter[7:]  # Remove "cohort:" prefix
        else:
            user_ids = user_filter

    # Default date range if not provided
    if not from_date or not to_date:
        from datetime import datetime, timedelta

        end = datetime.now()
        start = end - timedelta(days=30)
        from_date = from_date or start.strftime("%Y-%m-%d")
        to_date = to_date or end.strftime("%Y-%m-%d")

    # Use the core analytics function with raw response format
    from .analytics_core import get_analytics_enhanced

    # Get raw percentiles data to avoid object creation issues
    result = await get_analytics_enhanced(
        manager=manager,
        from_date=from_date,
        to_date=to_date,
        report_type="percentiles",
        entry_id=entry_id,
        object_ids=entry_id,
        user_id=user_ids,
        limit=500,
        response_format="raw",
    )

    # Parse and enhance the result
    try:
        data = json.loads(result)

        # Get video metadata to extract duration
        try:
            from .media import get_media_entry

            video_info = await get_media_entry(manager, entry_id)
            video_data = json.loads(video_info)
            video_duration = video_data.get("duration", 0)
            video_title = video_data.get("name", "Unknown")
        except Exception:
            # Fallback for tests or when media info is not available
            # Try to determine duration from the data if we have 100 percentile
            video_duration = 300  # Default 5 minutes
            video_title = f"Video {entry_id}"

            # If we can access the raw response, try to get metadata from there
            if "kaltura_response" in data and isinstance(data["kaltura_response"], dict):
                # Sometimes duration might be in the response metadata
                if (
                    "totalCount" in data["kaltura_response"]
                    and data["kaltura_response"]["totalCount"] == 101
                ):
                    # 101 data points suggest percentiles 0-100, so we have full video coverage
                    # Default to 5 minutes if we can't determine actual duration
                    video_duration = 300

        # Create enhanced format with time conversion
        formatted_result = {
            "video": {
                "id": entry_id,
                "title": video_title,
                "duration_seconds": video_duration,
                "duration_formatted": f"{video_duration // 60:02d}:{video_duration % 60:02d}",
            },
            "date_range": {"from": from_date, "to": to_date},
            "filter": {"user_ids": user_ids or "all"},
            "retention_data": [],
        }

        # Process the Kaltura response and add time conversion
        if "kaltura_response" in data:
            kaltura_data = data["kaltura_response"]

            # Parse the CSV data with percentiles
            if "data" in kaltura_data and kaltura_data["data"]:
                # Split by newline or semicolon (Kaltura sometimes uses semicolons)
                if ";" in kaltura_data["data"] and "\n" not in kaltura_data["data"]:
                    rows = kaltura_data["data"].strip().split(";")
                else:
                    rows = kaltura_data["data"].strip().split("\n")

                # First pass: collect all data points
                raw_data_points = []
                for row in rows:
                    if row.strip():
                        # Parse percentile data (format: percentile|viewers|unique_users or CSV)
                        if "|" in row:
                            values = row.split("|")
                        else:
                            values = row.split(",")
                        if len(values) >= 3:
                            try:
                                percentile = int(values[0])
                                viewers = int(values[1])
                                unique_users = int(values[2])
                                raw_data_points.append(
                                    {
                                        "percentile": percentile,
                                        "viewers": viewers,
                                        "unique_users": unique_users,
                                    }
                                )
                            except (ValueError, TypeError):
                                continue

                # Find the maximum viewer count to use as initial reference
                # This handles cases where percentile 0 has 0 viewers
                max_viewers = max((p["viewers"] for p in raw_data_points), default=0)

                # If we have data at percentile 0 with viewers > 0, use that as initial
                # Otherwise, use the maximum viewer count as the reference point
                initial_viewers = 0
                for point in raw_data_points:
                    if point["percentile"] == 0 and point["viewers"] > 0:
                        initial_viewers = point["viewers"]
                        break
                if initial_viewers == 0:
                    # No viewers at start, use max viewers as reference
                    initial_viewers = max_viewers

                # Second pass: calculate retention percentages
                for point in raw_data_points:
                    percentile = point["percentile"]
                    viewers = point["viewers"]
                    unique_users = point["unique_users"]

                    # Calculate time position
                    time_seconds = int((percentile / 100.0) * video_duration)
                    time_formatted = f"{time_seconds // 60:02d}:{time_seconds % 60:02d}"

                    # Calculate retention percentage
                    if initial_viewers > 0:
                        retention_pct = viewers / initial_viewers * 100
                    else:
                        # If no initial viewers, show 0% retention
                        retention_pct = 0 if viewers == 0 else 100

                    formatted_result["retention_data"].append(
                        {
                            "percentile": percentile,
                            "time_seconds": time_seconds,
                            "time_formatted": time_formatted,
                            "viewers": viewers,
                            "unique_users": unique_users,
                            "retention_percentage": round(retention_pct, 2),
                            "replays": viewers - unique_users,
                        }
                    )

                # Calculate insights
                if formatted_result["retention_data"]:
                    retention_values = [
                        d["retention_percentage"] for d in formatted_result["retention_data"]
                    ]

                    # Find major drop-offs (>5% loss in 10 seconds / ~10 percentile points)
                    major_dropoffs = []
                    for i in range(10, len(formatted_result["retention_data"]), 10):
                        current = formatted_result["retention_data"][i]
                        previous = formatted_result["retention_data"][i - 10]
                        drop = previous["retention_percentage"] - current["retention_percentage"]
                        if drop >= 5:
                            major_dropoffs.append(
                                {
                                    "time": current["time_formatted"],
                                    "time_seconds": current["time_seconds"],
                                    "percentile": current["percentile"],
                                    "retention_loss": round(drop, 2),
                                }
                            )

                    # Find replay hotspots
                    replay_hotspots = []
                    for point in formatted_result["retention_data"]:
                        if point["unique_users"] > 0:
                            replay_rate = point["replays"] / point["unique_users"]
                            if replay_rate > 0.2:  # 20% replay rate threshold
                                replay_hotspots.append(
                                    {
                                        "time": point["time_formatted"],
                                        "time_seconds": point["time_seconds"],
                                        "percentile": point["percentile"],
                                        "replay_rate": round(replay_rate, 2),
                                    }
                                )

                    formatted_result["insights"] = {
                        "average_retention": round(sum(retention_values) / len(retention_values), 2),
                        "completion_rate": round(retention_values[-1] if retention_values else 0, 2),
                        "fifty_percent_point": next(
                            (
                                d["time_formatted"]
                                for d in formatted_result["retention_data"]
                                if d["retention_percentage"] <= 50
                            ),
                            "Never",
                        ),
                        "major_dropoffs": major_dropoffs[:5],  # Top 5 drop-offs
                        "replay_hotspots": sorted(
                            replay_hotspots, key=lambda x: x["replay_rate"], reverse=True
                        )[:5],
                    }

                # Keep raw response for reference
                formatted_result["kaltura_raw_response"] = kaltura_data
        elif "error" in data:
            return json.dumps(data, indent=2)

        if user_ids and compare_segments:
            formatted_result[
                "note"
            ] = "For segment comparison, call this function twice with different user filters"

        return json.dumps(formatted_result, indent=2)
    except Exception as e:
        # If parsing fails, return error
        return json.dumps(
            {
                "error": f"Failed to process retention data: {str(e)}",
                "video_id": entry_id,
                "filter": {"user_ids": user_ids or "all"},
            },
            indent=2,
        )
```
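To make the time-conversion step concrete, here is a minimal standalone sketch of the arithmetic the handler applies to each percentile row (the helper name `retention_point` is illustrative, not part of the codebase; the numbers match the docstring example):

```python
def retention_point(percentile, viewers, unique_users, initial_viewers, duration_seconds):
    """Convert one percentile row into the shape used in retention_data."""
    # Percentile 0-100 maps linearly onto the video's duration.
    time_seconds = int((percentile / 100.0) * duration_seconds)
    return {
        "percentile": percentile,
        "time_seconds": time_seconds,
        "time_formatted": f"{time_seconds // 60:02d}:{time_seconds % 60:02d}",
        "retention_percentage": round(viewers / initial_viewers * 100, 2),
        "replays": viewers - unique_users,  # total views minus unique viewers
    }

# 850 of 1000 initial viewers are still watching at the 10% mark
# of a 300-second video, with 50 of those views being replays.
point = retention_point(10, 850, 800, 1000, 300)
# → {"percentile": 10, "time_seconds": 30, "time_formatted": "00:30",
#    "retention_percentage": 85.0, "replays": 50}
```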
- src/kaltura_mcp/server.py:182-211 (schema): The JSON schema definition and tool metadata registration for get_video_retention in the MCP server's list_tools() handler, defining input parameters, descriptions, and usage guidance.

```python
types.Tool(
    name="get_video_retention",
    description="Analyze WHERE viewers stop watching in a video. USE WHEN: Optimizing video content, finding boring sections, identifying engaging moments, improving completion rates. RETURNS: 101 data points (0-100%) showing viewer count at each percent of video. EXAMPLES: 'Where do viewers drop off in video 1_abc123?', 'What parts get replayed?', 'Compare retention for anonymous vs logged-in users'. Shows exact percentages where audience is lost.",
    inputSchema={
        "type": "object",
        "properties": {
            "entry_id": {
                "type": "string",
                "description": "Video to analyze (required, format: '1_abc123'). Get from search_entries or get_media_entry.",
            },
            "from_date": {
                "type": "string",
                "description": "Start date (optional, defaults to 30 days ago)",
            },
            "to_date": {
                "type": "string",
                "description": "End date (optional, defaults to today)",
            },
            "user_filter": {
                "type": "string",
                "description": "Optional viewer segment: 'anonymous' (not logged in), 'registered' (logged in), 'user@email.com' (specific user), 'cohort:students' (named group). Compare different audience behaviors.",
            },
            "compare_segments": {
                "type": "boolean",
                "description": "Compare filtered segment vs all viewers",
            },
        },
        "required": ["entry_id"],
    },
),
```
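Because inputSchema is standard JSON Schema, a client can pre-validate arguments before sending a request. The hand-rolled check below is a sketch of the 'required' and 'type' constraints above, not code from the repository; a real client would typically use a JSON Schema library instead:

```python
# Minimal mirror of the tool's inputSchema for illustration.
SCHEMA = {
    "type": "object",
    "properties": {
        "entry_id": {"type": "string"},
        "from_date": {"type": "string"},
        "to_date": {"type": "string"},
        "user_filter": {"type": "string"},
        "compare_segments": {"type": "boolean"},
    },
    "required": ["entry_id"],
}

PY_TYPES = {"string": str, "boolean": bool}

def validate(args, schema=SCHEMA):
    """Return a list of violations of the required/type constraints."""
    errors = [f"missing: {name}" for name in schema["required"] if name not in args]
    for name, value in args.items():
        prop = schema["properties"].get(name)
        if prop is not None and not isinstance(value, PY_TYPES[prop["type"]]):
            errors.append(f"wrong type: {name}")
    return errors

errors = validate({"from_date": "2024-01-01"})
# → ["missing: entry_id"]
```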
- src/kaltura_mcp/server.py:505-506 (registration): The dispatch logic in the MCP server's call_tool() handler that routes requests for 'get_video_retention' to the imported handler function with the KalturaClientManager.

```python
elif name == "get_video_retention":
    result = await get_video_retention(kaltura_manager, **arguments)
```
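The `**arguments` unpacking above maps JSON argument names directly onto the handler's keyword parameters. A minimal async sketch of that dispatch pattern, using a stubbed handler rather than the real one:

```python
import asyncio

async def get_video_retention_stub(manager, entry_id, from_date=None, **_):
    # Stand-in for the real handler, which calls the Kaltura API.
    return f"retention for {entry_id} from {from_date or '30 days ago'}"

async def call_tool(name, arguments, manager="kaltura_manager"):
    # JSON argument names become keyword arguments on the handler.
    if name == "get_video_retention":
        return await get_video_retention_stub(manager, **arguments)
    raise ValueError(f"unknown tool: {name}")

result = asyncio.run(call_tool("get_video_retention", {"entry_id": "1_abc123"}))
# → "retention for 1_abc123 from 30 days ago"
```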
- src/kaltura_mcp/tools/__init__.py:4-10 (registration): Import and re-export of the get_video_retention handler from analytics.py for convenient access across the tools module.

```python
from .analytics import (
    get_analytics,
    get_analytics_timeseries,
    get_geographic_breakdown,
    get_quality_metrics,
    get_realtime_metrics,
    get_video_retention,
)
```