Why this server?
This server directly addresses obtaining audio and transcribing it to text, though it focuses on online sources (YouTube, Bilibili, TikTok) rather than local recordings.
An MCP server that downloads videos and extracts audio from platforms such as YouTube, Bilibili, and TikTok, then transcribes them to text using OpenAI's Whisper model. License: MIT.
Why this server?
Although it focuses on video recognition, it also accepts audio input and processes it with Google's Gemini AI.
Provides tools for image, audio, and video recognition using Google's Gemini AI through the Model Context Protocol. License: MIT.
Why this server?
This server provides text-to-speech and supports multiple audio formats. It is only indirectly useful, since it converts text to speech rather than transcribing speech to text.
A Model Context Protocol server that integrates high-quality text-to-speech capabilities with Claude Desktop and other MCP-compatible clients, supporting multiple voice options and audio formats. License: MIT.
Why this server?
This server focuses on invoice processing and OCR, extracting text from invoice PDFs and images. Its relevance to transcribing recorded audio is limited: OCR reads text from images, so it would only help if the recording were a video whose frames contain on-screen text (such as captions) that could be exported as images and then processed.
A Python MCP server for invoice and receipt processing that uses OCR to extract data from PDFs and images, letting AI assistants process, extract text from, and merge invoice documents.
Why this server?
This server converts various file formats to Markdown, which is a convenient format for the transcription output.
A Model Context Protocol server that converts various file formats (PDF, PowerPoint, Word, Excel, images, etc.) to Markdown to make them accessible to LLMs. License: MIT.
Why this server?
This server extracts transcripts from YouTube videos, which relates to the transcription part of the request.
A Model Context Protocol server that enables AI assistants to extract transcripts from YouTube videos, allowing AI to analyze and work with video content directly. License: MIT.
Why this server?
This server converts various file types, including audio files, to Markdown, which helps the user save the transcription output in Markdown.
Converts various file types and web content to Markdown format. It provides a set of tools to transform PDFs, images, audio files, web pages, and more into easily readable and shareable Markdown text.
Why this server?
A FastAPI-based application that enables document embedding and semantic retrieval using the Qdrant vector database, allowing users to convert documents into embeddings and retrieve relevant content through natural language queries. This can be useful for managing the extracted text and transcriptions.
A multi-model platform that integrates RAG (Retrieval-Augmented Generation) with LLMs, supporting OCR via Tesseract and offering both a backend API and a frontend web interface.
Why this server?
Enables browser automation using Python scripts, offering operations like taking webpage screenshots, retrieving HTML content, and executing JavaScript. It does not record audio directly, but it could help automate access to the web pages whose audio the user wants to record.
Enables browser automation using Python scripts, offering operations like taking webpage screenshots, retrieving HTML content, and executing JavaScript.