Skip to main content
Glama
VincentKaufmann

noapi-google-search-mcp

transcribe_local

Transcribe local audio or video files with timestamps using Whisper. Supports formats like mp3, mp4, and more, with cached results for instant repeats.

Instructions

Transcribe a local audio or video file with timestamps using Whisper.

Supports any format FFmpeg can decode: mp3, wav, m4a, flac, ogg, aac, mp4, mkv, webm, avi, mov, wma, opus, and more.

Results are cached — repeat requests for the same file are instant.

Sample prompts that trigger this tool: - "Transcribe this recording: /path/to/meeting.mp3" - "What's said in this video? /path/to/lecture.mp4" - "Transcribe ~/Downloads/interview.wav" - "Transcribe the audio file on my desktop"

Args: file_path: Absolute path to the audio or video file. model_size: Whisper model size (tiny/base/small/medium/large). Default: tiny. language: Language code (e.g. "en", "de", "fr"). Auto-detected if empty.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
file_pathYes
model_sizeNotiny
languageNo

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault
resultYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses format support (any FFmpeg-decodable format) and caching behavior (repeat requests are instant). However, it omits critical behavioral details such as error handling for nonexistent files, required permissions, processing time, or whether the operation is destructive. The caching note is helpful but incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear main sentence, followed by supported formats, caching note, sample prompts, and argument list. It is front-loaded with the essential purpose. The sample prompts are slightly redundant but do not significantly harm conciseness. Minor improvements could be made by removing repetitive examples.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (transcription with model and language options) and the presence of an output schema (so return values need not be detailed), the description covers the key aspects: file, model sizes, language detection, and caching. However, it lacks details on defaults for model_size and language (though they are in schema), and does not mention potential timeout or resource constraints for large files.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% coverage. It clarifies that 'file_path' must be an absolute path, explains 'model_size' values (tiny/base/small/medium/large) with a default, and specifies that 'language' is optional and auto-detected if empty. This compensates well for the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool's action ('Transcribe'), the resource ('local audio or video file'), and the method ('using Whisper with timestamps'). However, it does not explicitly distinguish itself from the sibling tool 'transcribe_video', which likely processes remote or different sources. The name provides some differentiation, but the description misses the opportunity to clarify the scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes sample prompts but fails to provide guidance on when to use this tool versus alternatives like 'transcribe_video' or other media tools. There is no mention of prerequisites (e.g., FFmpeg installation) or situations where a different tool would be more appropriate. The agent is left to infer usage context from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/VincentKaufmann/noapi-google-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server