add_document_tool

Index files, directories, YouTube videos, or GitHub repositories into PinRAG's searchable database for retrieval-augmented generation with automatic format detection and batch processing.

Instructions

Add files, directories, YouTube videos, or GitHub repos to the index.

Automatically detects format per path and indexes:
- GitHub (URL, e.g. https://github.com/owner/repo or github.com/owner/repo/tree/branch)
- YouTube (URL or video ID, e.g. https://youtu.be/xyz)
- PDF (.pdf)
- Discord export (.txt with DiscordChatExporter Guild:/Channel: header)
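
The per-path detection listed above could look roughly like the following. This is a minimal sketch, not PinRAG's actual logic: `detect_format`, the two regular expressions, and the returned labels are hypothetical, and the Discord check simply looks for the `Guild:` and `Channel:` header lines mentioned in the list. Bare YouTube video IDs are not handled here.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration; the real server may accept more URL shapes.
GITHUB_RE = re.compile(r"^(https?://)?github\.com/[\w.-]+/[\w.-]+(/tree/.+)?/?$")
YOUTUBE_RE = re.compile(r"^(https?://)?(www\.)?(youtube\.com/watch\?v=|youtu\.be/)[\w-]+")


def detect_format(path: str, head: str = "") -> str:
    """Return a format label for one path.

    `head` holds the first few lines of the file and is only consulted
    for .txt inputs, to recognise a DiscordChatExporter header.
    """
    if GITHUB_RE.match(path):
        return "github"
    if YOUTUBE_RE.match(path):
        return "youtube"
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".txt" and "Guild:" in head and "Channel:" in head:
        return "discord"
    return "unknown"
```

Checking URLs before file suffixes keeps a repository URL that happens to end in `.pdf` from being treated as a local PDF.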

Pass one or more paths. A single path is still passed as a list, e.g.
paths=["/path/to/file.pdf"]. The vector store location and collection come
from the server config. The result lists both successful and failed entries,
so one bad path does not fail the batch.
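
The batch behaviour described above, where a failing path is recorded instead of aborting the call, can be sketched as follows. `index_batch`, `index_one`, `fake_indexer`, and the result keys are hypothetical names chosen for illustration; the source only states that the result contains indexed entries, failed entries, and totals.

```python
from typing import Callable


def index_batch(paths: list[str], index_one: Callable[[str], int]) -> dict:
    """Index each path independently, collecting successes and failures."""
    indexed: list[dict] = []
    failed: list[dict] = []
    for path in paths:
        try:
            chunks = index_one(path)  # hypothetical per-path indexer
            indexed.append({"path": path, "chunks": chunks})
        except Exception as exc:  # one bad path must not abort the batch
            failed.append({"path": path, "error": str(exc)})
    return {
        "indexed": indexed,
        "failed": failed,
        "total_indexed": len(indexed),
        "total_failed": len(failed),
    }


def fake_indexer(path: str) -> int:
    """Stand-in indexer: rejects anything that is not a PDF."""
    if not path.endswith(".pdf"):
        raise ValueError(f"unsupported format: {path}")
    return 3


result = index_batch(["a.pdf", "b.docx", "c.pdf"], fake_indexer)
```

Here `result` reports two indexed entries and one failed entry for `b.docx`, and the third path is still processed after the failure.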

Args:
    paths: List of file paths, directory paths, YouTube URLs, or GitHub URLs to index.
    tags: Optional list of tags, one per path (same order as paths).
    branch: For GitHub URLs: override branch (default: main). Ignored for other formats.
    include_patterns: For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats.
    exclude_patterns: For GitHub URLs: glob patterns to exclude. Ignored for other formats.
    ctx: MCP request context (injected by the server; unused).

Returns:
    Dictionary containing indexed entries, failed entries, and totals.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| paths | Yes | Paths to index: file, directory, YouTube URL, or GitHub URL (e.g. https://github.com/owner/repo). Single path: ["/path/to/file.pdf"] or ["https://github.com/owner/repo"]. | |
| tags | No | Optional list of tags, one per path (same order as paths). | |
| branch | No | For GitHub URLs: override branch (default: main). Ignored for other formats. | |
| include_patterns | No | For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats. | |
| exclude_patterns | No | For GitHub URLs: glob patterns to exclude. Ignored for other formats. | |

Implementation Reference

  • The `add_document_tool` handler, which indexes files, directories, YouTube URLs, and GitHub repos by running the synchronous `add_files` call in a worker thread via `anyio.to_thread.run_sync`.
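    from typing import Annotated

    import anyio
    from pydantic import Field

    # Context (the MCP request context type), config, and add_files are imported
    # elsewhere in the module; those imports are not shown in this excerpt.

    async def add_document_tool(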
        paths: Annotated[
            list[str],
            Field(
                description='Paths to index: file, directory, YouTube URL, or GitHub URL (e.g. https://github.com/owner/repo). Single path: ["/path/to/file.pdf"] or ["https://github.com/owner/repo"].'
            ),
        ],
        tags: Annotated[
            list[str] | None,
            Field(description="Optional list of tags, one per path (same order as paths)."),
        ] = None,
        branch: Annotated[
            str | None,
            Field(
                description="For GitHub URLs: override branch (default: main). Ignored for other formats."
            ),
        ] = None,
        include_patterns: Annotated[
            list[str] | None,
            Field(
                description='For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats.'
            ),
        ] = None,
        exclude_patterns: Annotated[
            list[str] | None,
            Field(
                description="For GitHub URLs: glob patterns to exclude. Ignored for other formats."
            ),
        ] = None,
        ctx: Context | None = None,
    ) -> dict:
        r"""Add files, directories, YouTube videos, or GitHub repos to the index.
    
        Automatically detects format per path and indexes:
        - GitHub (URL, e.g. https://github.com/owner/repo or github.com/owner/repo/tree/branch)
        - YouTube (URL or video ID, e.g. https://youtu.be/xyz)
        - PDF (.pdf)
        - Discord export (.txt with DiscordChatExporter Guild:/Channel: header)
    
        Pass one or more paths. Single path: paths=["/path/to/file.pdf"]. Uses
        server config for vector store location and collection. Returns both
        successful and failed entries so one bad path does not fail the batch.
    
        Args:
            paths: List of file paths, directory paths, YouTube URLs, or GitHub URLs to index.
            tags: Optional list of tags, one per path (same order as paths).
            branch: For GitHub URLs: override branch (default: main). Ignored for other formats.
            include_patterns: For GitHub URLs: glob patterns for files to include (e.g. ["*.md", "src/**/*.py"]). Ignored for other formats.
            exclude_patterns: For GitHub URLs: glob patterns to exclude. Ignored for other formats.
            ctx: MCP request context (injected by the server; unused).
    
        Returns:
            Dictionary containing indexed entries, failed entries, and totals.
    
        """
    
        def _run() -> dict:
            return add_files(
                paths=paths,
                persist_dir=config.get_persist_dir(),
                collection=config.get_collection_name(),
                tags=tags,
                branch=branch,
                include_patterns=include_patterns,
                exclude_patterns=exclude_patterns,
            )
    
        return await anyio.to_thread.run_sync(_run)
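
The handler above keeps the event loop responsive by pushing the blocking `add_files` call onto a worker thread. The same pattern can be sketched with the standard library's `asyncio.to_thread`, used here as a stand-in for `anyio.to_thread.run_sync`; `slow_index` and `handler` are hypothetical placeholders, not part of PinRAG.

```python
import asyncio
import threading
import time


def slow_index(path: str) -> dict:
    """Hypothetical blocking call standing in for add_files."""
    time.sleep(0.05)  # simulate blocking I/O such as parsing or embedding
    in_main = threading.current_thread() is threading.main_thread()
    return {"path": path, "ran_in_main_thread": in_main}


async def handler(path: str) -> dict:
    # Run the blocking function in a worker thread so the event loop stays free
    # to serve other requests while indexing is in progress.
    return await asyncio.to_thread(slow_index, path)


result = asyncio.run(handler("/path/to/file.pdf"))
```

Wrapping the call in a nested function, as the original `_run` does, is a convenient way to bind keyword arguments before handing the callable to the thread runner.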

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ndjordjevic/pinrag'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.