baidu-ai-search

Official
by baidubce
agent_speech.ipynb (14.1 kB)
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Real-Time Voice Conversation\n", "**Note ⚠️: The real-time voice feature is currently in closed beta. If you run into any problems, please open an issue or send feedback in the WeChat group.**\n", "\n", "## Goal\n", "Implement a real-time voice conversation feature that supports multiple voices. Using the cookbook code as a reference, you can integrate real-time voice into your own platform or application through the AppBuilder-SDK.\n", "\n", "## How it works\n", "The program loops over the user's speech: it transcribes the audio to text, sends the text to the Agent, and plays the Agent's reply through TTS.\n", "* Use the large-model ASR to convert speech to text.\n", "* Use an Agent that you created yourself for the conversation, so that it fits your application scenario and understands context.\n", "* Use the large-model TTS to convert text to speech and play it back.\n", "\n", "## Prerequisites\n", "* Before using the built-in ASR and TTS components, enable the component services and purchase quota; see [Enable component services](https://cloud.baidu.com/doc/AppBuilder/s/Glqb6dfiz#3%E3%80%81%E5%BC%80%E9%80%9A%E7%BB%84%E4%BB%B6%E6%9C%8D%E5%8A%A1)\n", "* Install the pyaudio and webrtcvad packages with pip\n", "* Grant the program microphone access\n", "* Create your own Agent application\n", "\n", "## Example code" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Copyright (c) 2024 Baidu, Inc. All Rights Reserved.\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "#     http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "\n", "import os\n", "import time\n", "import wave\n", "import sys\n", "import pyaudio\n", "import webrtcvad\n", "import appbuilder\n", "import re\n", "\n", "# Create an API key on the Qianfan AppBuilder website; for the steps, see: https://cloud.baidu.com/doc/AppBuilder/s/Olq6grrt6#1%E3%80%81%E5%88%9B%E5%BB%BA%E5%AF%86%E9%92%A5\n", "# Set the environment variable\n", "os.environ[\"APPBUILDER_TOKEN\"] = (\n", "    \"...\"\n", ")\n", "# ID of a published AppBuilder application\n", "app_id = \"...\"\n", "appbuilder.logger.setLoglevel(\"ERROR\")\n", "\n", "CHUNK = 1024\n", "FORMAT = pyaudio.paInt16\n", "CHANNELS = 1 if sys.platform == 
\"darwin\" else 2\n", "RATE = 16000\n", "DURATION = 30  # frame length in ms\n", "CHUNK = RATE // 1000 * DURATION\n", "\n", "\n", "class Chatbot:\n", "    def __init__(self):\n", "        self.p = pyaudio.PyAudio()\n", "        self.tts = appbuilder.TTS()\n", "        self.asr = appbuilder.ASR()\n", "        self.agent = appbuilder.AppBuilderClient(app_id)\n", "        self.conversation_id = self.agent.create_conversation()\n", "\n", "    def run(self):\n", "        # Spoken greeting; in English: \"I am your personal chatbot. If you have any questions, just ask me.\"\n", "        self.run_tts_and_play_audio(\n", "            \"我是你的专属聊天机器人,如果你有什么问题,可以直接问我\"\n", "        )\n", "        while True:\n", "            # Record; skip the round if less than 1000 ms of speech was captured\n", "            audio_path = \"output.wav\"\n", "            print(\"Recording audio...\")\n", "            if self.record_audio(audio_path) < 1000:\n", "                time.sleep(1)\n", "                continue\n", "            print(\"Recording finished\")\n", "\n", "            # ASR\n", "            print(\"Running ASR...\")\n", "            query = self.run_asr(audio_path)\n", "            print(\"ASR finished\")\n", "\n", "            # Agent\n", "            print(\"query: \", query)\n", "            if len(query) == 0:\n", "                continue\n", "            answer = self.run_agent(query)\n", "            # Print any links and remove them from the text that will be spoken\n", "            results = re.findall(r\"(https?://[^\\s]+)\", answer)\n", "            for result in results:\n", "                print(\"Link:\", result)\n", "                answer = answer.replace(result, \"\")\n", "            print(\"answer:\", answer)\n", "\n", "            # TTS\n", "            print(\"Running TTS and playing audio...\")\n", "            self.run_tts_and_play_audio(answer)\n", "            print(\"TTS playback finished\")\n", "\n", "    def record_audio(self, path):\n", "        with wave.open(path, \"wb\") as wf:\n", "            wf.setnchannels(CHANNELS)\n", "            wf.setsampwidth(self.p.get_sample_size(FORMAT))\n", "            wf.setframerate(RATE)\n", "            stream = self.p.open(\n", "                format=FORMAT, channels=CHANNELS, rate=RATE, input=True\n", "            )\n", "            vad = webrtcvad.Vad(1)\n", "            not_speech_times = 0\n", "            speech_times = 0\n", "            total_times = 0\n", "            start_up_times = 33 * 5  # initial window of about 5 seconds (33 frames of 30 ms is about 1 second)\n", "            history_speech_times = 0\n", "            while True:\n", "                if history_speech_times > 33 * 10:\n", "                    break\n", "                data = stream.read(CHUNK, False)\n", "                if vad.is_speech(data, RATE):\n", "                    speech_times += 1\n", "                    wf.writeframes(data)\n", "                else:\n", "                    not_speech_times += 1\n", "                total_times += 1\n", "                if total_times >= 
start_up_times:\n", "                    history_speech_times += speech_times\n", "                    # Emulate a sliding window: stop if the window was mostly silence, otherwise reset the counters\n", "                    if float(not_speech_times) / float(total_times) > 0.7:\n", "                        break\n", "                    not_speech_times = 0\n", "                    speech_times = 0\n", "                    total_times = 0\n", "                    start_up_times = start_up_times / 2\n", "                    if start_up_times < 33:\n", "                        start_up_times = 33\n", "            stream.close()\n", "            return history_speech_times * DURATION\n", "\n", "    def run_tts_and_play_audio(self, text: str):\n", "        # Documentation for AppBuilder's built-in TTS; adjust the parameters as described there: https://github.com/baidubce/app-builder/tree/master/python/core/components/tts\n", "        msg = self.tts.run(\n", "            appbuilder.Message(content={\"text\": text}),\n", "            speed=5,\n", "            pitch=5,\n", "            volume=5,\n", "            person=0,\n", "            audio_type=\"pcm\",\n", "            model=\"paddlespeech-tts\",\n", "            stream=True,\n", "        )\n", "        stream = self.p.open(\n", "            format=self.p.get_format_from_width(2),\n", "            channels=1,\n", "            rate=24000,\n", "            output=True,\n", "            frames_per_buffer=2048,\n", "        )\n", "        for pcm in msg.content:\n", "            stream.write(pcm)\n", "        stream.stop_stream()\n", "        stream.close()\n", "\n", "    # Documentation for AppBuilder's built-in ASR; adjust the parameters as described there: https://github.com/baidubce/app-builder/blob/master/python/core/components/asr/README.md\n", "    def run_asr(self, audio_path: str):\n", "        with open(audio_path, \"rb\") as f:\n", "            content_data = {\"audio_format\": \"wav\", \"raw_audio\": f.read(), \"rate\": 16000}\n", "        msg = appbuilder.Message(content_data)\n", "        out = self.asr.run(msg)\n", "        text = out.content[\"result\"][0]\n", "        return text\n", "\n", "    def run_agent(self, query):\n", "        msg = self.agent.run(self.conversation_id, query, stream=True)\n", "        answer = \"\"\n", "        for content in msg.content:\n", "            answer += content.answer\n", "        return answer\n", "\n", "\n", "if __name__ == \"__main__\":\n", "    chatbot = Chatbot()\n", "    chatbot.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Usage\n", "\n", "Run the program directly.\n", "\n", "You can also replace any of the following modules with your own implementation or model:\n", "* record_audio: audio recording\n", "* run_asr: 
speech recognition; see the [AppBuilder ASR component documentation](https://github.com/baidubce/app-builder/blob/master/python/core/components/asr/README.md)\n", "* run_agent: Agent conversation\n", "* run_tts_and_play_audio: synthesizes the reply as speech and plays it; see the [AppBuilder TTS component documentation](https://github.com/baidubce/app-builder/blob/master/python/core/components/tts/README.md)\n", "\n", "**AppBuilder TTS component parameters**\n", "\n", "| Parameter | Type | Required | Description | Example |\n", "|------------|---------|----------|-------------|---------|\n", "| message | String | Yes | Text to convert to speech | Message(content={\"text\": \"text to synthesize\"}) |\n", "| model | String | No | Defaults to `baidu-tts`. Options: `paddlespeech-tts`, `baidu-tts` | paddlespeech-tts |\n", "| speed | Integer | No | Speech rate. Defaults to 5 (medium); range 0 to 15. Takes effect only with the `baidu-tts` model and is ignored with `paddlespeech-tts` | 5 |\n", "| pitch | Integer | No | Pitch. Defaults to 5 (medium); range 0 to 15. Takes effect only with the `baidu-tts` model and is ignored with `paddlespeech-tts` | 5 |\n", "| volume | Integer | No | Volume. Defaults to 5 (medium); range 0 to 15. Takes effect only with the `baidu-tts` model and is ignored with `paddlespeech-tts` | 5 |\n", "| person | Integer | No | Voice. Defaults to 0 (度小美). Standard voice library: 0 (度小美), 1 (度小宇), 3 (度逍遥-基础), 4 (度丫丫). Premium voice library: 5003 (度逍遥-精品), 5118 (度小鹿), 106 (度博文), 110 (度小童), 111 (度小萌), 103 (度米朵), 5 (度小娇). Top-tier voice library: 4003 (度逍遥-情感男声), 4106 (度博文-专业男主播), 4115 (度小贤-电台男主播), 4119 (度小鹿-甜美女声), 4105 (度灵儿-清激女声), 4117 (度小乔-活泼女声), 4100 (度小雯-活力女主播), 4103 (度米朵-可爱女声), 4144 (度姗姗-娱乐女声), 4278 (度小贝-知识女主播), 4143 (度清风-配音男声), 4140 (度小新-专业女主播), 4129 (度小彦-知识男主播), 4149 (度星河-广告男声), 4254 (度小清-广告女声), 4206 (度博文-综艺男声), 4226 (南方-电台女主播). Takes effect only with the `baidu-tts` model and is ignored with `paddlespeech-tts` | 0 |\n", "| audio_type | String | No | Audio format. With `baidu-tts`: `mp3` or `wav`. With `paddlespeech-tts` and non-streaming output: must be `wav`. With `paddlespeech-tts` and streaming output: must be `pcm` | wav |\n", "| stream | Bool | No | Defaults to False. Currently `paddlespeech-tts` supports streaming output; `baidu-tts` does not | False |\n", "| retry | Integer | No | Number of HTTP retries | 3 |\n", "| timeout | Integer | No | 
HTTP timeout | 5 |" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }
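
The stop condition inside `record_audio` can be tried out without a microphone. The sketch below is an illustration only and is not part of the notebook: `speech_duration_ms` and its parameter names are hypothetical, and a plain list of booleans stands in for the per-frame results that `webrtcvad` would return. It follows the same counting scheme as the notebook: a first window of about 5 seconds, later windows halved down to about 1 second, a stop when more than 70% of a window is silence, and a cap once more than about 10 seconds of speech has been counted.

```python
from typing import Iterable

FRAME_MS = 30        # each frame covers 30 ms, as in the notebook
FRAMES_PER_SEC = 33  # roughly one second of 30 ms frames


def speech_duration_ms(flags: Iterable[bool],
                       first_window: int = FRAMES_PER_SEC * 5,
                       max_speech: int = FRAMES_PER_SEC * 10,
                       silence_ratio: float = 0.7) -> int:
    """Return the milliseconds of speech counted before recording would stop.

    `flags` is a hypothetical stand-in for per-frame VAD decisions
    (True means the frame was classified as speech).
    """
    window = first_window
    speech = silence = total = 0
    history = 0  # speech frames accumulated over all completed windows
    for is_speech in flags:
        if history > max_speech:
            break  # enough speech collected, stop recording
        if is_speech:
            speech += 1
        else:
            silence += 1
        total += 1
        if total >= window:
            history += speech
            if silence / total > silence_ratio:
                break  # the window was mostly silence, the speaker has finished
            # Start a new, shorter window (never shorter than about 1 second)
            speech = silence = total = 0
            window = max(window / 2, FRAMES_PER_SEC)
    return history * FRAME_MS


if __name__ == "__main__":
    # About 5 seconds of speech followed by silence
    print(speech_duration_ms([True] * 165 + [False] * 83))
    # Silence only: the notebook skips recordings shorter than 1000 ms
    print(speech_duration_ms([False] * 200))
```

Because the window shrinks after the first check, a pause is detected faster once the user has started speaking than at the very beginning of a recording.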