Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| OPENAI_API_KEY | Yes | Your OpenAI API key for accessing Whisper and GPT-4o models | |
| AUDIO_FILES_PATH | Yes | Path to your audio files directory | |
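
As a rough illustration of how these two variables might be checked before the server starts, here is a minimal Python sketch. The helper name `load_config` and the shape of the returned dictionary are assumptions made for this example; they are not taken from the server's code.

```python
import os
from pathlib import Path

REQUIRED_VARS = ("OPENAI_API_KEY", "AUDIO_FILES_PATH")


def load_config(env=None):
    """Validate the required variables and return them in a dict.

    Hypothetical helper for illustration only; not part of the server.
    """
    env = os.environ if env is None else env

    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )

    audio_dir = Path(env["AUDIO_FILES_PATH"]).expanduser()
    if not audio_dir.is_dir():
        raise RuntimeError(f"AUDIO_FILES_PATH is not a directory: {audio_dir}")

    return {"api_key": env["OPENAI_API_KEY"], "audio_dir": audio_dir}
```

Calling `load_config()` with no arguments reads from the process environment; passing a plain dictionary instead makes the check easy to exercise in isolation.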
Capabilities
Server capabilities have not been inspected yet.
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| get_latest_audio | Get the most recent audio file from the audio path. ONLY USE THIS IF THE USER ASKS FOR THE LATEST FILE. |
| list_audio_files | List, filter, and sort audio files from the audio path. Supports regex pattern matching, filtering by metadata (size, duration, date, format), and sorting. |
| convert_audio | A tool used to convert audio files to mp3 or wav, which are GPT-4o compatible formats. |
| compress_audio | A tool used to compress audio files larger than 25 MB. ONLY USE THIS IF THE USER REQUESTS COMPRESSION OR IF OTHER TOOLS FAIL DUE TO FILES BEING TOO LARGE. |
| transcribe_audio | A tool used to transcribe audio files. It is recommended to use |
| chat_with_audio | A tool used to chat with audio files. The response will be a response to the audio file sent. It is recommended to use |
| transcribe_with_enhancement | Transcribe audio with GPT-4 using specific enhancement prompts. Enhancement types: detailed (provides a detailed description including tone, emotion, and background); storytelling (transforms the transcription into a narrative); professional (formats the transcription in a formal, business-appropriate way); analytical (includes analysis of speech patterns, key points, and structure). Args: input_file_name (name of the input audio file to process); enhancement_type (type of enhancement to apply to the transcription); model (the transcription model to use); response_format (the response format); timestamp_granularities (optional timestamp granularities). Returns: TranscriptionResult with enhanced transcription. |
| create_audio | Create text-to-speech audio using OpenAI's TTS API with model and voice selection. |
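
The file-handling tools above (listing with regex filtering and sorting, picking the latest file, and deciding when compression is needed) can be pictured with a small Python sketch. This is not the server's implementation: the function names, the extension set, the sort keys, and the reading of "25 MB" as 25 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed extension set; the formats the server actually accepts are not listed here.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
# Assumption: the 25 MB threshold is interpreted as 25 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 25 * 1024 * 1024


def list_audio(directory, pattern=None, sort_by="name", reverse=False):
    """Return audio files in `directory`, optionally regex-filtered and sorted."""
    regex = re.compile(pattern) if pattern else None
    files = [
        p
        for p in Path(directory).iterdir()
        if p.is_file()
        and p.suffix.lower() in AUDIO_EXTENSIONS
        and (regex is None or regex.search(p.name))
    ]
    sort_keys = {
        "name": lambda p: p.name,
        "size": lambda p: p.stat().st_size,
        "modified": lambda p: p.stat().st_mtime,
    }
    return sorted(files, key=sort_keys[sort_by], reverse=reverse)


def latest_audio(directory):
    """Return the most recently modified audio file, or None if there is none."""
    files = list_audio(directory, sort_by="modified", reverse=True)
    return files[0] if files else None


def needs_compression(path):
    """True when the file is larger than the assumed 25 MB limit."""
    return Path(path).stat().st_size > SIZE_LIMIT_BYTES
```

Filtering by duration or format metadata, which `list_audio_files` also supports, would need an audio library and is left out of this sketch.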
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |