remove_document_tool

Delete a document and its associated data from the PinRAG index to manage stored content and keep retrieval results relevant.

Instructions

Remove a document and all its chunks from the PinRAG index.

Deletes all chunks and embeddings for the given document. Use list_documents_tool to look up valid document_id values (e.g. "mybook.pdf", "bwgLXEQdq20", "discord-alicia-1200-pcb", or "owner/repo/path" for GitHub).
The vector store location and collection come from the server configuration.

Args:
    document_id: Document identifier to remove (from list_documents_tool).
    ctx: MCP request context (injected by the server; unused).

Returns:
    Dictionary containing deleted_chunks, document_id, persist_directory, collection_name.
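
For callers that handle the result programmatically, the returned dictionary can be modelled as a typed dictionary. This is a minimal sketch: the four keys come from the description above, while the `RemoveDocumentResult` name, the `describe_result` helper, the string types of `persist_directory` and `collection_name`, and the sample values are assumptions made for illustration. It also assumes that an unknown document yields `deleted_chunks` of 0.

```python
from typing import TypedDict


class RemoveDocumentResult(TypedDict):
    """Hypothetical type for the dictionary the tool is documented to return."""

    deleted_chunks: int
    document_id: str
    persist_directory: str  # assumed to be a string path
    collection_name: str  # assumed to be a string


def describe_result(result: RemoveDocumentResult) -> str:
    """Turn a removal result into a one-line, human-readable summary."""
    if result["deleted_chunks"] == 0:
        return f"No chunks found for {result['document_id']!r}; nothing was removed."
    return (
        f"Removed {result['deleted_chunks']} chunk(s) for {result['document_id']!r} "
        f"from collection {result['collection_name']!r}."
    )


# Sample values are made up for illustration.
example: RemoveDocumentResult = {
    "deleted_chunks": 42,
    "document_id": "mybook.pdf",
    "persist_directory": "chroma_db",
    "collection_name": "pinrag",
}
print(describe_result(example))
```

Checking `deleted_chunks` first lets a client distinguish a successful removal from a call that matched nothing.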

Input Schema

Name | Required | Description | Default
document_id | Yes | Document identifier to remove (from list_documents_tool). | (none)
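
The table describes a single required string argument, which maps naturally onto a small JSON Schema. The schema below is a hand-written approximation, not the schema the server actually emits, and `validate_arguments` is a hypothetical helper that checks only what the table states.

```python
import json

# Approximate input schema, reconstructed from the table (an assumption).
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "document_id": {
            "type": "string",
            "description": "Document identifier to remove (from list_documents_tool).",
        }
    },
    "required": ["document_id"],
}


def validate_arguments(arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    # Every required property must be present.
    for name in INPUT_SCHEMA["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Known properties declared as strings must hold string values.
    for name, value in arguments.items():
        spec = INPUT_SCHEMA["properties"].get(name)
        if spec and spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
    return problems


print(json.dumps(INPUT_SCHEMA, indent=2))
print(validate_arguments({"document_id": "mybook.pdf"}))
print(validate_arguments({}))
```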

Implementation Reference

  • The core logic for removing a document from the Chroma index and associated storage.
    def remove_document(
        document_id: str,
        persist_dir: str = "",
        collection: str | None = None,
    ) -> dict[str, Any]:
        """Remove a document and all its chunks and embeddings from the Chroma index.
    
        The document_id must match exactly the name shown in list_documents (e.g.
        "mybook.pdf" or "discord-alicia-1200-pcb"). Deletes child chunks from Chroma
        and, when parent-child retrieval is enabled, parent chunks from the docstore.
    
        Args:
            document_id: Document identifier to remove (same as in list_documents).
            persist_dir: Chroma persistence directory (default: "chroma_db").
            collection: Chroma collection name (default: "pinrag").
    
        Returns:
            Dictionary with "deleted_chunks" (int), "document_id" (str),
            "persist_directory", "collection_name".
    
        Raises:
            ValueError: If document_id is empty or collection is empty.
            FileNotFoundError: If persist_dir does not exist.
    
        """
        if not document_id or not str(document_id).strip():
            raise ValueError("document_id cannot be empty")
        if collection is None or not str(collection).strip():
            collection = get_collection_name()
        else:
            collection = str(collection).strip()
    
        _persist = (persist_dir or "").strip() or get_persist_dir()
        persist_path = _resolve_persist_dir_path(_persist)
        if not persist_path.exists():
            raise FileNotFoundError(f"Persistence directory does not exist: {_persist}")
    
        store = get_chroma_store(
            persist_directory=_persist,
            collection_name=collection,
        )
    
        # Get chunks matching this document_id (need metadatas for parent doc_ids when parent-child)
        data = store.get(
            where={"document_id": document_id.strip()},
            include=["metadatas"] if get_use_parent_child() else [],
        )
        ids = data.get("ids") or []
        deleted_count = len(ids)
    
        # When parent-child is enabled, also delete parent chunks from docstore
        if get_use_parent_child() and ids:
            metadatas = data.get("metadatas") or []
            parent_ids = set()
            for meta in metadatas:
  • The MCP tool handler function for `remove_document_tool`, which exposes the tool to MCP clients and calls the implementation in `tools.py`.
    async def remove_document_tool(
        document_id: Annotated[
            str,
            Field(description="Document identifier to remove (from list_documents_tool)."),
        ],
        ctx: Context | None = None,
    ) -> dict:
        """Remove a document and all its chunks from the PinRAG index.
    
        Deletes all chunks and embeddings for the given document. Use
        list_documents_tool to see document_ids (e.g. "mybook.pdf", "bwgLXEQdq20", "discord-alicia-1200-pcb", "owner/repo/path" for GitHub).
        Uses server config for vector store location and collection.
    
        Args:
            document_id: Document identifier to remove (from list_documents_tool).
            ctx: MCP request context (injected by the server; unused).
    
        Returns:
            Dictionary containing deleted_chunks, document_id, persist_directory, collection_name.
    
        """
    
        def _run() -> dict:
            return remove_document(
                document_id=document_id,
                persist_dir=config.get_persist_dir(),
                collection=config.get_collection_name(),
            )
    
        return await anyio.to_thread.run_sync(_run)
  • The decorator that registers `remove_document_tool` with the MCP server.
    @mcp.tool()
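
To make the control flow of the excerpts above easier to follow, here is a self-contained sketch that swaps Chroma for an in-memory list and uses the standard library's `asyncio.to_thread` in place of `anyio.to_thread.run_sync`. `InMemoryStore`, `remove_document_sketch`, and `remove_document_tool_sketch` are hypothetical stand-ins, not PinRAG code. The sketch leaves out the parent-child docstore cleanup and the persistence-directory checks, and it returns only two of the four documented keys.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for the vector store: a list of chunk records."""

    def __init__(self, chunks: list[dict]) -> None:
        self.chunks = chunks

    def get(self, where: dict) -> dict:
        """Return the IDs of chunks whose metadata matches every key in `where`."""
        ids = [
            c["id"]
            for c in self.chunks
            if all(c["metadata"].get(k) == v for k, v in where.items())
        ]
        return {"ids": ids}

    def delete(self, ids: list[str]) -> None:
        """Drop the chunks with the given IDs."""
        doomed = set(ids)
        self.chunks = [c for c in self.chunks if c["id"] not in doomed]


def remove_document_sketch(store: InMemoryStore, document_id: str) -> dict:
    """Validate the ID, find matching chunks, delete them, and report the count."""
    if not document_id or not document_id.strip():
        raise ValueError("document_id cannot be empty")
    doc_id = document_id.strip()
    ids = store.get(where={"document_id": doc_id})["ids"]
    if ids:
        store.delete(ids)
    return {"deleted_chunks": len(ids), "document_id": doc_id}


async def remove_document_tool_sketch(store: InMemoryStore, document_id: str) -> dict:
    """Run the blocking removal in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(remove_document_sketch, store, document_id)


store = InMemoryStore(
    [
        {"id": "c1", "metadata": {"document_id": "mybook.pdf"}},
        {"id": "c2", "metadata": {"document_id": "mybook.pdf"}},
        {"id": "c3", "metadata": {"document_id": "bwgLXEQdq20"}},
    ]
)
result = asyncio.run(remove_document_tool_sketch(store, "mybook.pdf"))
print(result)
print(len(store.chunks))
```

Offloading the synchronous call to a thread mirrors the handler's design: the vector store work is blocking, so running it directly in the coroutine would stall other requests on the same event loop.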

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ndjordjevic/pinrag'
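
The same request can be issued from Python with the standard library. This is a sketch under two assumptions: that the endpoint needs no authentication, as the curl command suggests, and that it responds with a JSON object. `build_request` and `fetch_server_info` are hypothetical helper names.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/ndjordjevic/pinrag"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the same GET request that the curl command sends."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)


# Call fetch_server_info() to perform the request; it needs network access.
```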

If you have feedback or need assistance with the MCP directory API, please join our Discord server.