add_document_tool
Index files, directories, YouTube videos, or GitHub repositories into PinRAG's searchable database for retrieval-augmented generation with automatic format detection and batch processing.
Instructions
Add files, directories, YouTube videos, or GitHub repos to the index.
Automatically detects format per path and indexes:
- GitHub (URL, e.g. https://github.com/owner/repo or github.com/owner/repo/tree/branch)
- YouTube (URL or video ID, e.g. https://youtu.be/xyz)
- PDF (.pdf)
- Discord export (.txt with DiscordChatExporter Guild:/Channel: header)
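The per-path detection listed above could be approximated as follows. This is a minimal sketch, not PinRAG's actual detection logic: `detect_format`, its `head` parameter, the regular expressions, and the assumption that a bare YouTube video ID is 11 characters long are all hypothetical.

```python
import re

# Assumed URL shapes; the real detection rules may differ.
GITHUB_RE = re.compile(r"^(https?://)?(www\.)?github\.com/[^/\s]+/[^/\s]+")
YOUTUBE_URL_RE = re.compile(r"^(https?://)?(www\.)?(youtube\.com|youtu\.be)/")
YOUTUBE_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")  # assumption: bare IDs are 11 chars


def detect_format(path: str, head: str = "") -> str:
    """Classify one path. `head` holds the first lines of a local file, if known."""
    if GITHUB_RE.match(path):
        return "github"
    if YOUTUBE_URL_RE.match(path) or YOUTUBE_ID_RE.match(path):
        return "youtube"
    lower = path.lower()
    if lower.endswith(".pdf"):
        return "pdf"
    # A Discord export is a .txt file whose header names a guild and a channel.
    if lower.endswith(".txt") and "Guild:" in head and "Channel:" in head:
        return "discord"
    return "unknown"
```

Checking URLs before file extensions keeps a repository URL that happens to end in `.pdf` from being treated as a local PDF.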
Pass one or more paths. Single path: paths=["/path/to/file.pdf"]. Uses
server config for vector store location and collection. Returns both
successful and failed entries so one bad path does not fail the batch.
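The "one bad path does not fail the batch" behavior can be illustrated with a small sketch. The function `index_batch`, the `index_one` callback, and the result keys (`indexed`, `failed`, `total_indexed`, `total_failed`) are hypothetical names chosen for illustration; the real return structure is only described as containing indexed entries, failed entries, and totals.

```python
from typing import Callable


def index_batch(paths: list[str], index_one: Callable[[str], int]) -> dict:
    """Index each path independently and collect successes and failures."""
    indexed: list[dict] = []
    failed: list[dict] = []
    for path in paths:
        try:
            chunks = index_one(path)  # may raise for an unsupported or missing path
            indexed.append({"path": path, "chunks": chunks})
        except Exception as exc:  # isolate the failure to this one path
            failed.append({"path": path, "error": str(exc)})
    return {
        "indexed": indexed,
        "failed": failed,
        "total_indexed": len(indexed),
        "total_failed": len(failed),
    }


def fake_index_one(path: str) -> int:
    """Stand-in indexer: accepts only PDFs and reports a fixed chunk count."""
    if not path.endswith(".pdf"):
        raise ValueError(f"unsupported format: {path}")
    return 3


result = index_batch(["/a.pdf", "/b.xyz"], fake_index_one)
```

Here `/b.xyz` ends up in `result["failed"]` while `/a.pdf` is still indexed, so the caller can retry only the failed entries.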
Args:
paths: List of file paths, directory paths, YouTube URLs, or GitHub URLs to index.
tags: Optional list of tags, one per path (same order as paths).
branch: For GitHub URLs: override branch (default: main). Ignored for other formats.
include_patterns: For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats.
exclude_patterns: For GitHub URLs: glob patterns to exclude. Ignored for other formats.
ctx: MCP request context (injected by the server; unused).
Returns:
Dictionary containing indexed entries, failed entries, and totals.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paths | Yes | Paths to index: file, directory, YouTube URL, or GitHub URL (e.g. https://github.com/owner/repo). Single path: ["/path/to/file.pdf"] or ["https://github.com/owner/repo"]. | |
| tags | No | Optional list of tags, one per path (same order as paths). | |
| branch | No | For GitHub URLs: override branch (default: main). Ignored for other formats. | |
| include_patterns | No | For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats. | |
| exclude_patterns | No | For GitHub URLs: glob patterns to exclude. Ignored for other formats. | |
Implementation Reference
- src/pinrag/mcp/server.py:119-186 (handler): The `add_document_tool` handler function. It processes files, directories, YouTube URLs, and GitHub repos by running `add_files` in a worker thread through a synchronous `_run` wrapper.

    async def add_document_tool(
        paths: Annotated[
            list[str],
            Field(
                description='Paths to index: file, directory, YouTube URL, or GitHub URL (e.g. https://github.com/owner/repo). Single path: ["/path/to/file.pdf"] or ["https://github.com/owner/repo"].'
            ),
        ],
        tags: Annotated[
            list[str] | None,
            Field(description="Optional list of tags, one per path (same order as paths)."),
        ] = None,
        branch: Annotated[
            str | None,
            Field(
                description="For GitHub URLs: override branch (default: main). Ignored for other formats."
            ),
        ] = None,
        include_patterns: Annotated[
            list[str] | None,
            Field(
                description='For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats.'
            ),
        ] = None,
        exclude_patterns: Annotated[
            list[str] | None,
            Field(
                description="For GitHub URLs: glob patterns to exclude. Ignored for other formats."
            ),
        ] = None,
        ctx: Context | None = None,
    ) -> dict:
        r"""Add files, directories, YouTube videos, or GitHub repos to the index.

        Automatically detects format per path and indexes:
        - GitHub (URL, e.g. https://github.com/owner/repo or github.com/owner/repo/tree/branch)
        - YouTube (URL or video ID, e.g. https://youtu.be/xyz)
        - PDF (.pdf)
        - Discord export (.txt with DiscordChatExporter Guild:/Channel: header)

        Pass one or more paths. Single path: paths=[\"/path/to/file.pdf\"]. Uses
        server config for vector store location and collection. Returns both
        successful and failed entries so one bad path does not fail the batch.

        Args:
            paths: List of file paths, directory paths, YouTube URLs, or GitHub URLs to index.
            tags: Optional list of tags, one per path (same order as paths).
            branch: For GitHub URLs: override branch (default: main). Ignored for other formats.
            include_patterns: For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]).
            exclude_patterns: For GitHub URLs: glob patterns to exclude. Ignored for other formats.
            ctx: MCP request context (injected by the server; unused).

        Returns:
            Dictionary containing indexed entries, failed entries, and totals.

        """

        def _run() -> dict:
            return add_files(
                paths=paths,
                persist_dir=config.get_persist_dir(),
                collection=config.get_collection_name(),
                tags=tags,
                branch=branch,
                include_patterns=include_patterns,
                exclude_patterns=exclude_patterns,
            )

        return await anyio.to_thread.run_sync(_run)
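The handler's last line hands the blocking `add_files` call to a worker thread so the async server stays responsive while indexing runs. The sketch below shows the same pattern with the standard library's `asyncio.to_thread`, swapped in for `anyio.to_thread.run_sync` so the example has no third-party dependency. `blocking_index` is a hypothetical stand-in for `add_files`.

```python
import asyncio
import threading
import time


def blocking_index(path: str) -> dict:
    """Stand-in for a slow, synchronous indexing call."""
    time.sleep(0.05)
    return {"path": path, "thread": threading.current_thread().name}


async def handler(path: str) -> dict:
    # Wrap the call in a zero-argument function, as the handler's `_run` does,
    # then run it off the event loop thread.
    def _run() -> dict:
        return blocking_index(path)

    return await asyncio.to_thread(_run)


result = asyncio.run(handler("/path/to/file.pdf"))
```

The closure captures the handler's arguments, which is why the real handler can pass a zero-argument `_run` instead of forwarding every keyword argument to the thread call.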