Skip to main content
Glama

baidu-ai-search

Official
by baidubce
qa_system_1_dataset.ipynb14.6 kB
{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# 企业级问答系统 —— 离线知识生产流程\n", "\n", "## 一、背景介绍\n", "\n", "智能问答系统是当前应用范围最广,最容易落地的大模型应用。通常来说,智能问答系统包括以下五部分:\n", "1. Query理解\n", "2. 混合检索召回\n", "3. 语义排序\n", "4. 答案生成\n", "5. 追问生成\n", "\n", "AppBuilder在问答系统沉淀了最佳实践流程,并以极易用的组件化的方式对外开放,可以使用页面操作 + 低代码的方式快速搭建自己的问答系统。\n", "\n", "<img src=\"./qa_system/qa_flow.png\" alt=\"drawing\" width=\"1000\"/>\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 二、实操流程\n", "\n", "整体使用流程包括以下三个环节:\n", "\n", "1. 登录[百度智能云千帆AppBuilder官网](https://cloud.baidu.com/product/AppBuilder)创建账户,\n", "\n", "<img src=\"./qa_system/product.png\" alt=\"drawing\" width=\"1000\"/>\n", "\n", "\n", "2. 在[百度智能云千帆AppBuilder控制台](https://console.bce.baidu.com/ai_apaas/dialogHome)右侧头像栏『API密钥』页面获取密钥,并复制。\n", "\n", "<img src=\"./qa_system/token.jpg\" alt=\"drawing\" width=\"1000\"/>\n", "\n", "\n", "<img src=\"./qa_system/get_token.png\" alt=\"drawing\" width=\"1000\"/>\n", "\n", "\n", "3. 引用AppBuilder SDK代码,调用Dataset API相关接口,创建数据集,并增、查、删除文档。" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 环境准备\n", "\n", "首先需要安装AppBuilder-SDK代码库,若已在开发环境安装,则可跳过此步。\n", "\n", "**注意:**: appbuilder-sdk 的python版本要求 3.9+" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Defaulting to user installation because normal site-packages is not writeable\n", "Collecting appbuilder-sdk\n", " Using cached appbuilder_sdk-0.6.0-py3-none-any.whl.metadata (768 bytes)\n", "Requirement already satisfied: requests in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (2.31.0)\n", "Requirement already satisfied: proto-plus==1.22.3 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.22.3)\n", "Requirement already satisfied: pydantic==2.6.4 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (2.6.4)\n", "Requirement already satisfied: numpy in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.24.4)\n", "Requirement already satisfied: SQLAlchemy~=1.4.50 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.4.52)\n", "Requirement already satisfied: urllib3<2.0.0 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.26.18)\n", "Requirement already satisfied: tenacity in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (8.2.3)\n", "Requirement already satisfied: pandas in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (2.2.1)\n", "Requirement already satisfied: openpyxl in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (3.1.2)\n", "Requirement already satisfied: pymochow>=1.1.2 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.1.2)\n", "Requirement already satisfied: parameterized in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (0.9.0)\n", "Requirement already satisfied: protobuf<5.0.0dev,>=3.19.0 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from proto-plus==1.22.3->appbuilder-sdk) (4.25.3)\n", "Requirement already satisfied: annotated-types>=0.4.0 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pydantic==2.6.4->appbuilder-sdk) (0.6.0)\n", "Requirement already satisfied: pydantic-core==2.16.3 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pydantic==2.6.4->appbuilder-sdk) (2.16.3)\n", "Requirement already satisfied: typing-extensions>=4.6.1 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pydantic==2.6.4->appbuilder-sdk) (4.10.0)\n", "Requirement already satisfied: orjson in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pymochow>=1.1.2->appbuilder-sdk) (3.9.15)\n", "Requirement already satisfied: future in /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/site-packages (from pymochow>=1.1.2->appbuilder-sdk) (0.18.2)\n", "Requirement already satisfied: et-xmlfile in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from openpyxl->appbuilder-sdk) (1.1.0)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pandas->appbuilder-sdk) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pandas->appbuilder-sdk) (2024.1)\n", "Requirement already satisfied: tzdata>=2022.7 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pandas->appbuilder-sdk) (2024.1)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from requests->appbuilder-sdk) (3.3.2)\n", "Requirement already satisfied: idna<4,>=2.5 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from requests->appbuilder-sdk) (3.6)\n", "Requirement already satisfied: certifi>=2017.4.17 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from requests->appbuilder-sdk) (2024.2.2)\n", "Requirement already satisfied: six>=1.5 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from python-dateutil>=2.8.2->pandas->appbuilder-sdk) (1.16.0)\n", "Using cached appbuilder_sdk-0.6.0-py3-none-any.whl (335 kB)\n", "Installing collected packages: appbuilder-sdk\n", "Successfully installed appbuilder-sdk-0.6.0\n" ] } ], "source": [ "!python3 -m pip install appbuilder-sdk" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 引入AppBuilder-SDK,并设置TOKEN\n", "\n", "**注意:** 请使用刚才在AppBuilder平台上新建的个人TOKEN" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AppBuilder 模块导入成功!\n", "您的AppBuilder Token为:your_appbuilder_token\n" ] } ], "source": [ "# 引入os模块,引入appbuilder 模块\n", "import os\n", "import appbuilder\n", "\n", "# 设置appbuilder的token密钥,从页面上复制粘贴token,覆盖此处的 \"your_appbuilder_token\"\n", "os.environ['APPBUILDER_TOKEN'] = \"your_appbuilder_token\"\n", "\n", "print(\"AppBuilder 模块导入成功!\")\n", "print(\"您的AppBuilder Token为:{}\".format(os.environ['APPBUILDER_TOKEN']))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 创建全新知识库\n", "\n", "我们从零开始创建一个新的知识库,并使用它来支持问答系统。我们将其命名为**说明书知识库**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "知识库创建成功! 知识库的ID为:c368a982-699f-4114-9e9a-c08b76c07f92, 知识库的名称为: 说明书知识库\n" ] } ], "source": [ "# 创建全新知识库\n", "dataset = appbuilder.console.Dataset.create_dataset(\"说明书知识库\")\n", "\n", "print(\"知识库创建成功! 知识库的ID为:{}, 知识库的名称为: {}\".format(\n", " dataset.dataset_id,\n", " dataset.dataset_name))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 上传文档到知识库\n", "\n", "在notebook的同级目录`./qa_system`下,我们提前准备了两篇文档:\n", "- `./qa_system/平板电脑说明书.pdf`\n", "- `./qa_system/显示器说明书.pdf`\n", "\n", "我们将这两篇文档上传到知识库中,用于后续问答应用的私域知识检索。" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "文档上传到知识库成功!知识库中的文档ID为:['c396c715-66d8-49cb-af9d-af0bd08a16e2', '817d29bb-be72-420b-bee1-c1d9b1116826']\n" ] } ], "source": [ "file_1 = \"./qa_system/平板电脑说明书.pdf\"\n", "file_2 = \"./qa_system/显示器说明书.pdf\"\n", "\n", "document_infos = dataset.add_documents([\n", " file_1, file_2\n", "])\n", "\n", "print(\"文档上传到知识库成功!知识库中的文档ID为:{}\".format(document_infos.document_ids))\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2.5 查找知识库中的文档\n", "\n", "问答系统中的知识库,常用增、删、查操作,我们刚才演示了增加文档的方法,下面演示如何查找文档。\n", "\n", "我们获取[百度智能云千帆AppBuilder控制台-我的知识](https://console.bce.baidu.com/ai_apaas/dataset)页面中,我们刚才创建的知识库中的文档列表。\n", "\n", "<img src=\"./qa_system/doc_list.png\" alt=\"drawing\" width=\"1000\"/>" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "文档列表获取成功!\n", "第二个的文档是: 平板电脑说明书.pdf, 文档ID是: c396c715-66d8-49cb-af9d-af0bd08a16e2, 文档字数是: 8568\n" ] } ], "source": [ "doc_list = dataset.get_documents(page=1,limit=10)\n", "print(\"文档列表获取成功!\")\n", "print(\"第二个的文档是: {}, 文档ID是: {}, 文档字数是: {}\".format(\n", " doc_list.data[1].name, doc_list.data[1].id, doc_list.data[1].word_count))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6 删除知识库中的文档\n", "\n", "下面我们演示如何删除知识库中的文档。我们尝试删除 『平板电脑说明书.pdf』 这篇文档" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "文档: 平板电脑说明书.pdf, 文档ID: c396c715-66d8-49cb-af9d-af0bd08a16e2 删除成功!\n", "文档删除成功,请刷新『知识库』前端页面,观察是否已无『平板电脑说明书.pdf』文档\n" ] } ], "source": [ "need_delete_doc_name = \"平板电脑说明书.pdf\"\n", "need_delete_doc_id = \"\"\n", "\n", "doc_list = dataset.get_documents(page=1,limit=10)\n", "for doc in doc_list.data:\n", " if doc.name == need_delete_doc_name:\n", " need_delete_doc_id = doc.id\n", " break\n", "\n", "if need_delete_doc_id == \"\":\n", " print(\"未找到指定文档,请检查 2.4 步骤是否正确上传,以及2.5步骤中是否展示了正确的文档列表\")\n", "else:\n", " dataset.delete_documents([need_delete_doc_id])\n", " print(\"文档: {}, 文档ID: {} 删除成功!\".format(\n", " need_delete_doc_name,\n", " need_delete_doc_id\n", " ))\n", "\n", "print(\"文档删除成功,请刷新『知识库』前端页面,观察是否已无『{}』文档\".format(need_delete_doc_name))\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "执行完上述步骤后,前端页面刷新,指定文档已经被成功删除\n", "\n", "<img src=\"./qa_system/delete_doc.png\" alt=\"drawing\" width=\"1000\"/>" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "当前创建的知识库的ID为:c368a982-699f-4114-9e9a-c08b76c07f92, 知识库的名称为: 说明书知识库\n" ] } ], "source": [ "# 再一次确认当前知识库的 Name 与 ID,方便后续 企业级问答系统 —— 在线问答流程 的使用\n", "\n", "print(\"当前创建的知识库的ID为:{}, 知识库的名称为: {}\".format(\n", " dataset.dataset_id,\n", " dataset.dataset_name))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 三、总结\n", "\n", "AppBuilder 为 基于RAG的 企业问答系统提供了 `appbuilder.console.dataset` API,可以方便的以代码态进行知识库的管理,实现基本的增删改查业务需求。\n", "\n", "本示例展示了如何从零创建知识库,并实现增、查、改的代码,并可以观察SDK与前端页面的联动。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 2 }

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/baidubce/app-builder'

If you have feedback or need assistance with the MCP directory API, please join our Discord server