
add_document_tool

Index files, directories, YouTube videos, or GitHub repositories into PinRAG's searchable database for retrieval-augmented generation with automatic format detection and batch processing.

Instructions

Add files, directories, YouTube videos, or GitHub repos to the index.

Automatically detects format per path and indexes:
- GitHub (URL, e.g. https://github.com/owner/repo or github.com/owner/repo/tree/branch)
- YouTube (URL or video ID, e.g. https://youtu.be/xyz)
- PDF (.pdf)
- Discord export (.txt with DiscordChatExporter Guild:/Channel: header)

Pass one or more paths. Single path: paths=["/path/to/file.pdf"]. Uses
server config for vector store location and collection. Returns both
successful and failed entries so one bad path does not fail the batch.

Args:
    paths: List of file paths, directory paths, YouTube URLs, or GitHub URLs to index.
    tags: Optional list of tags, one per path (same order as paths).
    branch: For GitHub URLs: override branch (default: main). Ignored for other formats.
    include_patterns: For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]).
    exclude_patterns: For GitHub URLs: glob patterns to exclude. Ignored for other formats.
    ctx: MCP request context (injected by the server; unused).

Returns:
    Dictionary containing indexed entries, failed entries, and totals.
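
The per-path format detection described above can be pictured with a rough sketch. The regular expressions and the `guess_format` helper below are assumptions made for illustration only; PinRAG's actual detection rules are not shown on this page and may differ (for example, bare YouTube video IDs and directories are not handled here).

```python
import re
from pathlib import Path

# Hypothetical patterns; the real detection logic is not documented here.
_GITHUB_RE = re.compile(r"^(https?://)?(www\.)?github\.com/[^/\s]+/[^/\s]+")
_YOUTUBE_RE = re.compile(r"^(https?://)?(www\.)?(youtube\.com/watch\?|youtu\.be/)")


def guess_format(path: str) -> str:
    """Return a coarse format label for one entry of `paths`."""
    if _GITHUB_RE.match(path):
        return "github"
    if _YOUTUBE_RE.match(path):
        return "youtube"
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".txt":
        # A Discord export is a .txt file with a DiscordChatExporter
        # Guild:/Channel: header; confirming that would require reading
        # the file, which this sketch deliberately skips.
        return "discord_candidate"
    return "unknown"
```

Under these assumptions, `guess_format("github.com/owner/repo/tree/branch")` yields `"github"` and `guess_format("/path/to/file.pdf")` yields `"pdf"`.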

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| paths | Yes | Paths to index: file, directory, YouTube URL, or GitHub URL (e.g. https://github.com/owner/repo). Single path: ["/path/to/file.pdf"] or ["https://github.com/owner/repo"]. | |
| tags | No | Optional list of tags, one per path (same order as paths). | |
| branch | No | For GitHub URLs: override branch (default: main). Ignored for other formats. | |
| include_patterns | No | For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats. | |
| exclude_patterns | No | For GitHub URLs: glob patterns to exclude. Ignored for other formats. | |

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| result | Yes | | |
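
The output schema only names a single `result` field, and the description says it holds indexed entries, failed entries, and totals. The sketch below shows one way such a partial-failure batch result could be built. The key names (`indexed`, `failed`, `total_indexed`, `total_failed`) and the `index_batch` helper are assumptions for illustration, not the tool's documented output.

```python
from typing import Callable


def index_batch(paths: list[str], index_one: Callable[[str], int]) -> dict:
    """Index each path independently so one failure does not abort the batch."""
    indexed: list[dict] = []
    failed: list[dict] = []
    for path in paths:
        try:
            chunks = index_one(path)
            indexed.append({"path": path, "chunks": chunks})
        except Exception as exc:
            # Record the failure and continue with the remaining paths.
            failed.append({"path": path, "error": str(exc)})
    return {
        "indexed": indexed,
        "failed": failed,
        "total_indexed": len(indexed),
        "total_failed": len(failed),
    }
```

A caller would inspect the failed entries after the call rather than relying on an exception to signal a bad path.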

Implementation Reference

  • The `add_document_tool` handler function. It indexes files, directories, YouTube URLs, and GitHub repos by running the synchronous `add_files` function in a worker thread via `anyio.to_thread.run_sync`, so the async handler does not block the event loop.
    async def add_document_tool(
        paths: Annotated[
            list[str],
            Field(
                description='Paths to index: file, directory, YouTube URL, or GitHub URL (e.g. https://github.com/owner/repo). Single path: ["/path/to/file.pdf"] or ["https://github.com/owner/repo"].'
            ),
        ],
        tags: Annotated[
            list[str] | None,
            Field(description="Optional list of tags, one per path (same order as paths)."),
        ] = None,
        branch: Annotated[
            str | None,
            Field(
                description="For GitHub URLs: override branch (default: main). Ignored for other formats."
            ),
        ] = None,
        include_patterns: Annotated[
            list[str] | None,
            Field(
                description='For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats.'
            ),
        ] = None,
        exclude_patterns: Annotated[
            list[str] | None,
            Field(
                description="For GitHub URLs: glob patterns to exclude. Ignored for other formats."
            ),
        ] = None,
        ctx: Context | None = None,
    ) -> dict:
        r"""Add files, directories, YouTube videos, or GitHub repos to the index.
    
        Automatically detects format per path and indexes:
        - GitHub (URL, e.g. https://github.com/owner/repo or github.com/owner/repo/tree/branch)
        - YouTube (URL or video ID, e.g. https://youtu.be/xyz)
        - PDF (.pdf)
        - Discord export (.txt with DiscordChatExporter Guild:/Channel: header)
    
        Pass one or more paths. Single path: paths=[\"/path/to/file.pdf\"]. Uses
        server config for vector store location and collection. Returns both
        successful and failed entries so one bad path does not fail the batch.
    
        Args:
            paths: List of file paths, directory paths, YouTube URLs, or GitHub URLs to index.
            tags: Optional list of tags, one per path (same order as paths).
            branch: For GitHub URLs: override branch (default: main). Ignored for other formats.
            include_patterns: For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]).
            exclude_patterns: For GitHub URLs: glob patterns to exclude. Ignored for other formats.
            ctx: MCP request context (injected by the server; unused).
    
        Returns:
            Dictionary containing indexed entries, failed entries, and totals.
    
        """
    
        def _run() -> dict:
            return add_files(
                paths=paths,
                persist_dir=config.get_persist_dir(),
                collection=config.get_collection_name(),
                tags=tags,
                branch=branch,
                include_patterns=include_patterns,
                exclude_patterns=exclude_patterns,
            )
    
        return await anyio.to_thread.run_sync(_run)
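
The handler above offloads a blocking call to a worker thread. The same pattern can be sketched with only the standard library: the example below swaps in `asyncio.to_thread` for `anyio.to_thread.run_sync`, and `slow_index` is a hypothetical stand-in for `add_files`.

```python
import asyncio
import threading
import time


def slow_index(paths: list[str]) -> dict:
    """Hypothetical blocking function standing in for add_files."""
    time.sleep(0.05)  # simulate blocking I/O
    return {"count": len(paths), "thread": threading.current_thread().name}


async def handler(paths: list[str]) -> dict:
    # Run the blocking function in a worker thread so the event loop
    # remains free to serve other requests in the meantime.
    return await asyncio.to_thread(slow_index, paths)


def run_demo() -> dict:
    return asyncio.run(handler(["a.pdf", "b.pdf"]))
```

Wrapping the call in a closure, as `_run` does in the handler, is one way to bind keyword arguments before handing the callable to the thread runner.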

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Discloses critical behaviors: auto-format detection per path, partial failure tolerance (one bad path doesn't fail batch), server config dependency for vector store, and parameter applicability constraints (GitHub params ignored for other formats). Missing idempotency or rate limit disclosures.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with front-loaded purpose statement, followed by behavior details, formatted Args section, and Returns section. Minor redundancy between Args section and schema descriptions, but earns its place by presenting information in readable narrative form with practical examples.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive coverage of multi-format input complexity (PDF, Discord exports, GitHub, YouTube), vector store configuration context, and return value summary appropriate given output schema exists. Adequately prepares agent for the tool's multi-modal indexing behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, establishing baseline 3. Description adds value by providing concrete syntax examples in prose ('paths=["/path/to/file.pdf"]'), clarifying the ordering constraint between tags and paths ('one per path, same order'), and noting that ctx is injected/unused—context not fully captured by schema descriptions alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Opens with specific verb 'Add' and enumerates exact resource types (files, directories, YouTube videos, GitHub repos) and target destination 'to the index'. Implicitly distinguishes from sibling add_url_tool by specifically listing content types that require format detection, while add_url_tool likely handles generic URLs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides concrete usage examples ('Single path: paths=[...]') and explains batch error handling ('Returns both successful and failed entries'), but lacks explicit when-to-use guidance versus add_url_tool or prerequisites like file access permissions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ndjordjevic/pinrag'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.