PyTorch Documentation Search Tool (Project Paused)

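A prototype semantic search tool for PyTorch documentation with a command-line interface.

Current Status (April 19, 2025)

⚠️ This project is currently paused pending a major redesign.

The tool provides a basic command-line search interface for PyTorch documentation, but it still needs substantial improvement in several areas. The core embedding and search functionality works, but relevance quality and MCP integration need further development.

Example Output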

$ python scripts/search.py "How are multi-attention heads plotted out in PyTorch?"

Found 5 results for 'How are multi-attention heads plotted out in PyTorch?':

--- Result 1 (code) ---
Title: plot_visualization_utils.py
Source: plot_visualization_utils.py
Score: 0.3714
Snippet: # models. Let's start by analyzing the output of a Mask-RCNN model. Note that...

--- Result 2 (code) ---
Title: plot_transforms_getting_started.py
Source: plot_transforms_getting_started.py
Score: 0.3571
Snippet: https://github.com/pytorch/vision/tree/main/gallery/...

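What Works

✅ Basic semantic search: a command-line interface for querying PyTorch documentation
✅ Vector database: working ChromaDB integration for storing and querying embeddings
✅ Content differentiation: distinguishes between code and text content
✅ Interactive mode: option to run successive queries within a single session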

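What Needs Improvement

❌ Relevance quality: similarity scores of only 0.35-0.37 indicate that results match queries poorly
❌ Content coverage: specialized topics may be underrepresented in the database
❌ Chunking strategy: the current approach splits documents at arbitrary points
❌ Result presentation: snippets are too short and lack sufficient context
❌ MCP integration: connection timeouts prevent integration with Claude Code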
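
To put the scores in context: assuming the reported score is a cosine similarity between the query embedding and a chunk embedding (this README does not state which metric is configured), a value near 0.36 means the two vectors are only loosely aligned. A minimal sketch, with toy 3-dimensional vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (illustrative only, not real embeddings)
query = [1.0, 0.0, 0.0]
close_match = [0.9, 0.1, 0.0]   # points almost the same way as the query
weak_match = [0.4, 1.0, 0.2]    # shares only a small component with the query

print(round(cosine_similarity(query, close_match), 4))
print(round(cosine_similarity(query, weak_match), 4))
```

The second pair lands in the same range as the scores in the example output, which is why those results read as loosely related rather than on-topic.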

Getting Started

Environment Setup

Create a conda environment with all dependencies:

conda env create -f environment.yml
conda activate pytorch_docs_search

API Key Setup

The tool requires an OpenAI API key to generate embeddings:

export OPENAI_API_KEY=your_key_here

Command-Line Usage

# Search with a direct query
python scripts/search.py "your search query here"

# Run in interactive mode
python scripts/search.py --interactive

# Additional options
python scripts/search.py "query" --results 5  # Limit to 5 results
python scripts/search.py "query" --filter code  # Only code results
python scripts/search.py "query" --json  # Output in JSON format
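
The options above map naturally onto an argparse parser. The sketch below is hypothetical: the flag names come from the commands shown, but the defaults, the `text` choice for `--filter`, and the `build_parser` helper are assumptions, not the actual contents of scripts/search.py.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Search PyTorch documentation")
    parser.add_argument("query", nargs="?", help="search query text")
    parser.add_argument("--interactive", action="store_true",
                        help="run queries in a loop")
    parser.add_argument("--results", type=int, default=5,
                        help="maximum number of results (default is assumed)")
    parser.add_argument("--filter", choices=["code", "text"], default=None,
                        help="restrict results to one content type (choices assumed)")
    parser.add_argument("--json", action="store_true",
                        help="emit results as JSON")
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["tensor indexing", "--results", "3", "--filter", "code"]
)
print(args.query, args.results, args.filter, args.json)
```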

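Project Architecture

ptsearch/core/: core search functionality (database, embedding, search)
ptsearch/config/: configuration management
ptsearch/utils/: utility functions and logging
scripts/: command-line tools
data/: embedded documents and the database
ptsearch/protocol/: MCP protocol handling (currently unused)
ptsearch/transport/: transport implementations (STDIO, SSE) (currently unused)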

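Why This Project Is Paused

After evaluating the current implementation, we identified several challenges that call for a major redesign:

Data quality issues: the current embedding approach does not effectively capture semantic relationships between PyTorch concepts. Relevance scores of around 0.35-0.37 are too low for a high-quality user experience.
Chunking limitations: the current approach splits documents by character count rather than at conceptual boundaries, which produces fragmented results.
MCP integration problems: despite trying several implementation approaches, we ran into persistent timeouts when integrating with Claude Code:
- STDIO integration failed while establishing the connection
- A Flask server with SSE transport could not maintain a stable connection
- UVX deployment ran into similar timeout problems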
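
The chunking limitation is easy to reproduce. The sketch below splits a short document every 40 characters and shows that chunk boundaries fall wherever the counter lands, regardless of sentence or code boundaries. The `chunk_by_chars` helper and the chunk size are illustrative, not the project's actual code or settings.

```python
def chunk_by_chars(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = (
    "torch.nn.MultiheadAttention runs several attention heads in parallel.\n"
    "\n"
    "attn = torch.nn.MultiheadAttention(embed_dim=64, num_heads=8)\n"
)

chunks = chunk_by_chars(doc, 40)
for chunk in chunks:
    # repr() makes the cut points and embedded newlines visible
    print(repr(chunk))
```

No single chunk contains the complete constructor call, so a search hit on any one of them returns only part of the code.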

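Future Roadmap

When development resumes, we plan to focus on:

Improved chunking strategy: implement semantic chunking that preserves conceptual boundaries
Enhanced result formatting: provide more context and better snippet selection
Expanded documentation coverage: ensure all PyTorch topics are covered comprehensively
MCP integration redesign: work with the Claude team to resolve the timeout problems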
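
One simple approximation of boundary-preserving chunking is to split on blank lines and pack whole paragraphs into chunks up to a size budget. This is a sketch of the general idea, not the planned implementation: `chunk_by_paragraphs` and `max_chars` are hypothetical names, and real semantic chunking would likely also use headings or code-block structure.

```python
def chunk_by_paragraphs(text, max_chars):
    """Pack whole paragraphs into chunks of roughly max_chars characters.

    A paragraph longer than max_chars is kept intact as its own chunk
    rather than being cut in the middle.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

doc = (
    "Tensors are multi-dimensional arrays.\n\n"
    "They can live on CPU or GPU.\n\n"
    "x = torch.zeros(2, 3)\ny = x.to('cuda')"
)

chunks = chunk_by_paragraphs(doc, 70)
for chunk in chunks:
    print(repr(chunk))
```

Here the two prose paragraphs share a chunk and the code example stays whole in its own chunk, so a match on the code returns both lines together.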

Development

Running Tests

pytest -v tests/

Formatting Code

black .

License

MIT License
