analyze_audio
Analyze speech audio to detect the speaker's emotional state, energy level, and stress level, and extract basic audio features such as pitch and duration.
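The page names features such as duration, pitch, and energy but does not say how they are computed. The following is a rough sketch, using only Python's standard library, of how such values could be derived from a 16-bit PCM WAV file. It is not the tool's implementation. The helper names `read_mono_pcm16` and `basic_features`, the normalized RMS energy measure, the naive autocorrelation pitch search, and the 50 to 500 Hz search range are all assumptions made for illustration.

```python
import array
import math
import sys
import wave


def read_mono_pcm16(path: str) -> tuple[list[int], int]:
    """Return (samples of the first channel, sample rate) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        pcm.byteswap()
    # Frames are interleaved across channels; keep only the first channel.
    return list(pcm[::channels]), rate


def basic_features(samples: list[int], rate: int,
                   fmin: float = 50.0, fmax: float = 500.0) -> dict:
    """Duration, normalized RMS energy, and a naive autocorrelation pitch estimate."""
    n = len(samples)
    duration_s = n / rate
    # RMS scaled relative to the 16-bit full-scale value 32768.
    rms = math.sqrt(sum(s * s for s in samples) / n) / 32768 if n else 0.0
    # Only search lags that correspond to pitches between fmin and fmax.
    min_lag = max(1, int(rate / fmax))
    max_lag = min(n - 1, int(rate / fmin))
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # None when no lag shows positive correlation (e.g. silence or too few samples).
    pitch_hz = rate / best_lag if best_lag else None
    return {"duration_s": duration_s, "rms": rms, "pitch_hz": pitch_hz}
```

An unframed autocorrelation search like this is easy to follow but slow on long clips and prone to octave errors on real speech; practical pitch trackers usually analyze short frames and add voicing detection.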
Instructions
Analyze audio for emotion and basic features.
This tool analyzes speech audio to detect the speaker's emotional state and extract basic audio features. The Lite tier does not include ASR (automatic speech recognition), so provide the 'text' parameter if you already have a transcription.
Args:
- audio_path: Path to the audio file (WAV format supported)
- text: Optional transcription text for context
Returns: Dictionary containing:
- transcription: None for Lite tier (ASR not included)
- note: Information about Lite tier limitations
- emotion: Object with primary emotion, confidence, secondary emotion, and scores
- speaker_state: Object with energy_level and stress_indicator
- features: Raw audio features (duration, pitch, energy, etc.)
Example: { "transcription": null, "note": "Lite tier does not include ASR...", "emotion": { "primary": "happy", "confidence": 0.85, "secondary": "excited", "scores": {"happy": 0.8, "excited": 0.6, ...} }, "speaker_state": { "energy_level": "high", "stress_indicator": "low" }, "features": {...} }
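A sketch of consuming the returned dictionary, based only on the fields listed under Returns and in the example. The helper `summarize_result` and its `min_confidence` cut-off are hypothetical; the tool itself does not define a confidence threshold. Fields are read with `.get()` because `transcription` is `None` on the Lite tier and the contents of `features` are not fully specified.

```python
from typing import Any


def summarize_result(result: dict[str, Any], min_confidence: float = 0.5) -> str:
    """Condense an analyze_audio result into a one-line summary."""
    emotion = result.get("emotion") or {}
    state = result.get("speaker_state") or {}
    primary = emotion.get("primary", "unknown")
    confidence = emotion.get("confidence", 0.0)
    # min_confidence is a caller-chosen cut-off, not something the tool defines.
    label = primary if confidence >= min_confidence else f"uncertain ({primary}?)"
    parts = [
        f"emotion={label}",
        f"energy={state.get('energy_level', 'unknown')}",
        f"stress={state.get('stress_indicator', 'unknown')}",
    ]
    # transcription is None on the Lite tier, so report it only when present.
    if result.get("transcription"):
        parts.append(f"transcript={result['transcription']!r}")
    return ", ".join(parts)
```

Applied to the example response with its `...` placeholders dropped, this returns `emotion=happy, energy=high, stress=low`.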
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| audio_path | Yes | Path to the audio file (WAV format supported) | |
| text | No | Optional transcription text for context | |
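Assuming the tool is exposed over the Model Context Protocol, these inputs travel as the `arguments` object of a `tools/call` request, e.g. `{"name": "analyze_audio", "arguments": {...}}`. A minimal sketch of building that object in Python follows; the helper name `build_analyze_audio_args` is hypothetical, and the local file check only makes sense if the client and the tool share a filesystem.

```python
import os
import wave
from typing import Optional


def build_analyze_audio_args(audio_path: str, text: Optional[str] = None) -> dict:
    """Build the arguments object for analyze_audio following the Input Schema."""
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(audio_path)
    # Only WAV is documented as supported. wave.open typically raises wave.Error
    # for files without a RIFF/WAVE header; it also rejects non-PCM WAV encodings,
    # so this check may be stricter than the tool itself.
    with wave.open(audio_path, "rb"):
        pass
    args = {"audio_path": audio_path}
    # 'text' is optional: include it only when a non-empty transcription exists.
    if text:
        args["text"] = text
    return args
```

On the Lite tier, passing `text` is the only way to give the analysis any transcript context, since ASR is not included.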
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Dictionary with transcription, note, emotion, speaker_state, and features | |