baidu-ai-search

Official

by baidubce

agent_speech.ipynb•13.8 KiB

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Real-Time Voice Conversation\n",
    "**Note ⚠️: The real-time voice feature is currently in closed beta. If you run into any problems, please open an issue or report them in the WeChat group.**\n",
    "\n",
    "## Goal\n",
    "Implement a real-time voice conversation feature that supports multiple voices. Using this cookbook as a reference, you can integrate real-time voice into your own platform or application through AppBuilder-SDK.\n",
    "\n",
    "## How It Works\n",
    "The program processes the user's speech in a loop: it converts speech to text, runs the conversation, and then plays the reply aloud through TTS.\n",
    "* Speech is converted to text with the large-model ASR component.\n",
    "* The conversation runs on an Agent you create yourself, so it fits your use case and keeps conversational context.\n",
    "* The reply text is converted to speech with the large-model TTS component and played back.\n",
    "\n",
    "## Prerequisites\n",
    "* Before using the built-in ASR and TTS components, enable the component services and purchase quota; see [Enabling component services](https://cloud.baidu.com/doc/AppBuilder/s/Glqb6dfiz#3%E3%80%81%E5%BC%80%E9%80%9A%E7%BB%84%E4%BB%B6%E6%9C%8D%E5%8A%A1)\n",
    "* Install the `pyaudio` and `webrtcvad` dependencies with pip\n",
    "* Grant the program microphone access\n",
    "* Create your own Agent application\n",
    "\n",
    "## Example Code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright (c) 2024 Baidu, Inc. All Rights Reserved.\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     http://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License.\n",
    "\n",
    "import os\n",
    "import time\n",
    "import wave\n",
    "import sys\n",
    "import pyaudio\n",
    "import webrtcvad\n",
    "import appbuilder\n",
    "import re\n",
    "\n",
    "# Create an API key on the Qianfan AppBuilder website; for the steps, see: https://cloud.baidu.com/doc/AppBuilder/s/Olq6grrt6#1%E3%80%81%E5%88%9B%E5%BB%BA%E5%AF%86%E9%92%A5\n",
    "# Set the environment variable\n",
    "os.environ[\"APPBUILDER_TOKEN\"] = (\n",
    "    \"...\"\n",
    ")\n",
    "# ID of a published AppBuilder application\n",
    "app_id = \"...\"\n",
    "appbuilder.logger.setLoglevel(\"ERROR\")\n",
    "\n",
    "FORMAT = pyaudio.paInt16\n",
    "# Note: webrtcvad expects 16-bit mono PCM; verify the 2-channel setting on your platform.\n",
    "CHANNELS = 1 if sys.platform == \"darwin\" else 2\n",
    "RATE = 16000\n",
    "DURATION = 30  # frame length in ms\n",
    "CHUNK = RATE // 1000 * DURATION  # samples per frame (480 at 16 kHz)\n",
    "\n",
    "\n",
    "class Chatbot:\n",
    "    def __init__(self):\n",
    "        self.p = pyaudio.PyAudio()\n",
    "        self.tts = appbuilder.TTS()\n",
    "        self.asr = appbuilder.ASR()\n",
    "        self.agent = appbuilder.AppBuilderClient(app_id)\n",
    "        self.conversation_id = self.agent.create_conversation()\n",
    "\n",
    "    def run(self):\n",
    "        # Spoken greeting (Chinese): \"I am your personal chatbot; if you have a question, just ask me.\"\n",
    "        self.run_tts_and_play_audio(\n",
    "            \"我是你的专属聊天机器人，如果你有什么问题，可以直接问我\"\n",
    "        )\n",
    "        while True:\n",
    "            # Record; skip clips with less than 1 second of detected speech\n",
    "            audio_path = \"output.wav\"\n",
    "            print(\"Recording audio...\")\n",
    "            if self.record_audio(audio_path) < 1000:\n",
    "                time.sleep(1)\n",
    "                continue\n",
    "            print(\"Recording finished\")\n",
    "\n",
    "            # ASR\n",
    "            print(\"Running ASR...\")\n",
    "            query = self.run_asr(audio_path)\n",
    "            print(\"ASR finished\")\n",
    "\n",
    "            # Agent\n",
    "            print(\"query: \", query)\n",
    "            if len(query) == 0:\n",
    "                continue\n",
    "            answer = self.run_agent(query)\n",
    "            # Print any URLs and strip them from the text that will be spoken\n",
    "            results = re.findall(r\"(https?://[^\\s]+)\", answer)\n",
    "            for result in results:\n",
    "                print(\"Link:\", result)\n",
    "                answer = answer.replace(result, \"\")\n",
    "            print(\"answer:\", answer)\n",
    "\n",
    "            # TTS\n",
    "            print(\"Running TTS and playing audio...\")\n",
    "            self.run_tts_and_play_audio(answer)\n",
    "            print(\"TTS playback finished\")\n",
    "\n",
    "    def record_audio(self, path):\n",
    "        with wave.open(path, \"wb\") as wf:\n",
    "            wf.setnchannels(CHANNELS)\n",
    "            wf.setsampwidth(self.p.get_sample_size(FORMAT))\n",
    "            wf.setframerate(RATE)\n",
    "            stream = self.p.open(\n",
    "                format=FORMAT, channels=CHANNELS, rate=RATE, input=True\n",
    "            )\n",
    "            vad = webrtcvad.Vad(1)\n",
    "            not_speech_times = 0\n",
    "            speech_times = 0\n",
    "            total_times = 0\n",
    "            start_up_times = 33 * 5  # initial window of about 5 seconds (33 frames per second)\n",
    "            history_speech_times = 0\n",
    "            while True:\n",
    "                # Stop once roughly 10 seconds of speech have been collected\n",
    "                if history_speech_times > 33 * 10:\n",
    "                    break\n",
    "                data = stream.read(CHUNK, False)\n",
    "                if vad.is_speech(data, RATE):\n",
    "                    speech_times += 1\n",
    "                    wf.writeframes(data)\n",
    "                else:\n",
    "                    not_speech_times += 1\n",
    "                total_times += 1\n",
    "                if total_times >= 
start_up_times:\n",
    "                    history_speech_times += speech_times\n",
    "                    # Emulate a sliding window: stop if the window was mostly silence, otherwise reset the counters\n",
    "                    if float(not_speech_times) / float(total_times) > 0.7:\n",
    "                        break\n",
    "                    not_speech_times = 0\n",
    "                    speech_times = 0\n",
    "                    total_times = 0\n",
    "                    # Halve the window each round, down to a floor of about 1 second\n",
    "                    start_up_times = start_up_times / 2\n",
    "                    if start_up_times < 33:\n",
    "                        start_up_times = 33\n",
    "            stream.close()\n",
    "        return history_speech_times * DURATION\n",
    "\n",
    "    def run_tts_and_play_audio(self, text: str):\n",
    "        # Documentation for the built-in AppBuilder TTS component; adjust the parameters as described there: https://github.com/baidubce/app-builder/tree/master/python/core/components/tts\n",
    "        msg = self.tts.run(\n",
    "            appbuilder.Message(content={\"text\": text}),\n",
    "            speed=5,\n",
    "            pitch=5,\n",
    "            volume=5,\n",
    "            person=0,\n",
    "            audio_type=\"pcm\",\n",
    "            model=\"paddlespeech-tts\",\n",
    "            stream=True,\n",
    "        )\n",
    "        stream = self.p.open(\n",
    "            format=self.p.get_format_from_width(2),\n",
    "            channels=1,\n",
    "            rate=24000,\n",
    "            output=True,\n",
    "            frames_per_buffer=2048,\n",
    "        )\n",
    "        # Play each PCM chunk as soon as it arrives\n",
    "        for pcm in msg.content:\n",
    "            stream.write(pcm)\n",
    "        stream.stop_stream()\n",
    "        stream.close()\n",
    "\n",
    "    # Documentation for the built-in AppBuilder ASR component; adjust the parameters as described there: https://github.com/baidubce/app-builder/blob/master/python/core/components/asr/README.md\n",
    "    def run_asr(self, audio_path: str):\n",
    "        with open(audio_path, \"rb\") as f:\n",
    "            content_data = {\"audio_format\": \"wav\", \"raw_audio\": f.read(), \"rate\": 16000}\n",
    "        msg = appbuilder.Message(content_data)\n",
    "        out = self.asr.run(msg)\n",
    "        text = out.content[\"result\"][0]\n",
    "        return text\n",
    "\n",
    "    def run_agent(self, query):\n",
    "        msg = self.agent.run(self.conversation_id, query, stream=True)\n",
    "        # Concatenate the streamed chunks into the full answer\n",
    "        answer = \"\"\n",
    "        for content in msg.content:\n",
    "            answer += content.answer\n",
    "        return answer\n",
    "\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    chatbot = Chatbot()\n",
    "    chatbot.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage\n",
    "\n",
    "Run the program directly.\n",
    "\n",
    "You can also replace any of the following modules with your own implementation or model:\n",
    "* record_audio: audio recording\n",
    "* run_asr: 
speech recognition; see the [AppBuilder ASR component documentation](https://github.com/baidubce/app-builder/blob/master/python/core/components/asr/README.md)\n",
    "* run_agent: Agent conversation\n",
    "* run_tts_and_play_audio: speech synthesis and playback of the reply; see the [AppBuilder TTS component documentation](https://github.com/baidubce/app-builder/blob/master/python/core/components/tts/README.md)\n",
    "\n",
    "**AppBuilder TTS component parameters**\n",
    "| Parameter | Type | Required | Description | Example |\n",
    "|------------|---------|------|------------------------------|------------------|\n",
    "| message | String | Yes | Text to convert to speech | Message(content={\"text\": \"text to synthesize\"}) |\n",
    "| model | String | No | Defaults to `baidu-tts`; allowed values: `paddlespeech-tts`, `baidu-tts` | paddlespeech-tts |\n",
    "| speed | Integer | No | Speaking rate; default 5 (medium), range 0 to 15. Takes effect only with the `baidu-tts` model; ignored when the model is `paddlespeech-tts` | 5 |\n",
    "| pitch | Integer | No | Pitch; default 5 (medium), range 0 to 15. Takes effect only with the `baidu-tts` model; ignored when the model is `paddlespeech-tts` | 5 |\n",
    "| volume | Integer | No | Volume; default 5 (medium), range 0 to 15. Takes effect only with the `baidu-tts` model; ignored when the model is `paddlespeech-tts` | 5 |\n",
    "| person | Integer | No | Voice; default 0 (度小美). Standard voice library: 0 (度小美), 1 (度小宇), 3 (度逍遥-基础), 4 (度丫丫). Premium voice library (精品音库): 5003 (度逍遥-精品), 5118 (度小鹿), 106 (度博文), 110 (度小童), 111 (度小萌), 103 (度米朵), 5 (度小娇). Top-tier voice library (臻品音库): 4003 (度逍遥-情感男声), 4106 (度博文-专业男主播), 4115 (度小贤-电台男主播), 4119 (度小鹿-甜美女声), 4105 (度灵儿-清激女声), 4117 (度小乔-活泼女声), 4100 (度小雯-活力女主播), 4103 (度米朵-可爱女声), 4144 (度姗姗-娱乐女声), 4278 (度小贝-知识女主播), 4143 (度清风-配音男声), 4140 (度小新-专业女主播), 4129 (度小彦-知识男主播), 4149 (度星河-广告男声), 4254 (度小清-广告女声), 4206 (度博文-综艺男声), 4226 (南方-电台女主播). Takes effect only with the `baidu-tts` model; ignored when the model is `paddlespeech-tts` | 0 |\n",
    "| audio_type | String | No | Audio format. With the `baidu-tts` model: `mp3` or `wav`. With `paddlespeech-tts` and non-streaming output: `wav` only. With `paddlespeech-tts` and streaming output: `pcm` only | wav |\n",
    "| stream | Bool | No | Defaults to False. Currently `paddlespeech-tts` supports streaming output; `baidu-tts` does not | False |\n",
    "| retry | Integer | No | Number of HTTP retries | 3 |\n",
    "| timeout | Integer | No | 
HTTP timeout | 5 |"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
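
The stop rule in `record_audio` is easier to follow outside the audio loop. As a minimal sketch, assuming the per-frame VAD decisions are already available as a sequence of booleans, the function below replays the same windowing logic without a microphone. The name `speech_ms` and its parameters are hypothetical and are not part of AppBuilder-SDK; the constants mirror those in the notebook.

```python
from typing import Iterable

FRAME_MS = 30          # frame length, same as DURATION in the notebook
FRAMES_PER_SEC = 33    # roughly 1000 / 30, as the notebook assumes


def speech_ms(flags: Iterable[bool], max_speech_frames: int = 33 * 10) -> int:
    """Return milliseconds of speech counted before the stop rule fires.

    flags holds one VAD decision per 30 ms frame (True means speech).
    Hypothetical helper that mirrors the counters in record_audio.
    """
    window = FRAMES_PER_SEC * 5      # first window: about 5 seconds
    speech = silence = total = 0
    kept = 0                         # speech frames from completed windows
    for is_speech in flags:
        if kept > max_speech_frames:     # about 10 seconds of speech: stop
            break
        if is_speech:
            speech += 1
        else:
            silence += 1
        total += 1
        if total >= window:
            kept += speech
            if silence / total > 0.7:    # window was mostly silence: stop
                break
            speech = silence = total = 0
            # Halve the window, with a floor of about 1 second
            window = max(FRAMES_PER_SEC, window / 2)
    return kept * FRAME_MS
```

For example, `speech_ms([True] * 165 + [False] * 83)` counts one full 165-frame window of speech and then stops on the following all-silent window. In the notebook, `run` discards any recording for which the equivalent value is below 1000 ms.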