Glama

remove_document_tool

Delete a document and its associated data (chunks and embeddings) from the PinRAG index to manage stored content and keep it relevant.

Instructions

Remove a document and all its chunks from the PinRAG index.

Deletes all chunks and embeddings for the given document. Use
list_documents_tool to see document_ids (e.g. "mybook.pdf", "bwgLXEQdq20", "discord-alicia-1200-pcb", "owner/repo/path" for GitHub).
Uses server config for vector store location and collection.

Args:
    document_id: Document identifier to remove (from list_documents_tool).
    ctx: MCP request context (injected by the server; unused).

Returns:
    Dictionary containing deleted_chunks, document_id, persist_directory, collection_name.
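For illustration, an MCP client invokes this tool with a JSON-RPC 2.0 `tools/call` request whose `arguments` object matches the input schema. The sketch below only builds the request payload (no transport or session handling). The helper name, the request `id`, and the sample `document_id` are illustrative assumptions.

```python
import json


def build_remove_request(document_id: str, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` request for remove_document_tool (hypothetical helper)."""
    # Mirror the server-side check: an empty or whitespace-only ID is rejected.
    if not document_id or not document_id.strip():
        raise ValueError("document_id cannot be empty")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "remove_document_tool",
            # Only document_id is sent; ctx is injected by the server, not the client.
            "arguments": {"document_id": document_id},
        },
    }


request = build_remove_request("mybook.pdf")
print(json.dumps(request))
```

Taking the ID from list_documents_tool output, rather than typing it by hand, avoids sending an identifier that matches nothing.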

Input Schema

Name         Required  Description                                                Default
document_id  Yes       Document identifier to remove (from list_documents_tool).  (none)
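Expressed as JSON Schema, this input corresponds roughly to the object below. This is an approximation that assumes Pydantic-style schema generation from the `Annotated[str, Field(...)]` parameter; the `title` value in particular is an assumption. `check_arguments` is a hypothetical minimal check (required keys, string type, and, as an extra strictness of this sketch, unknown keys), not a full JSON Schema validator.

```python
# Approximate input schema; the "title" value is an assumption.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "document_id": {
            "type": "string",
            "title": "Document Id",
            "description": "Document identifier to remove (from list_documents_tool).",
        }
    },
    "required": ["document_id"],
}


def check_arguments(arguments: dict, schema: dict = INPUT_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    problems = []
    for name in schema["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            # Stricter than plain JSON Schema, which allows extra keys by default.
            problems.append(f"unexpected argument: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
    return problems


print(check_arguments({"document_id": "bwgLXEQdq20"}))  # []
print(check_arguments({"document_id": "x", "ctx": None}))
```

The second call flags `ctx`, which appears in the docstring's Args section but not in the input schema.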

Output Schema

Name    Required  Description  Default
result  Yes
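The output schema lists a single required `result` field, which suggests the handler's dictionary is returned wrapped under `result` in the structured output. Assuming that shape, a caller might unwrap and check it as below. The helper name and the sample values are hypothetical, and the zero-count branch assumes the tool reports no matches rather than raising (the implementation excerpt is cut off before that point).

```python
EXPECTED_KEYS = {"deleted_chunks", "document_id", "persist_directory", "collection_name"}


def unwrap_result(structured: dict) -> dict:
    """Extract the remove_document_tool payload from {"result": {...}}."""
    payload = structured["result"]
    missing = EXPECTED_KEYS - set(payload)
    if missing:
        raise KeyError(f"missing keys in result: {sorted(missing)}")
    return payload


# Hypothetical response for a document that had 12 chunks.
sample = {
    "result": {
        "deleted_chunks": 12,
        "document_id": "mybook.pdf",
        "persist_directory": "chroma_db",
        "collection_name": "pinrag",
    }
}
payload = unwrap_result(sample)
if payload["deleted_chunks"] == 0:
    # Assumed meaning: no chunks matched the given document_id.
    print("nothing matched; check the ID with list_documents_tool")
else:
    print(f"removed {payload['deleted_chunks']} chunks of {payload['document_id']}")
```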

Implementation Reference

  • The core logic for removing a document from the Chroma index and associated storage.
    def remove_document(
        document_id: str,
        persist_dir: str = "",
        collection: str | None = None,
    ) -> dict[str, Any]:
        """Remove a document and all its chunks and embeddings from the Chroma index.
    
        The document_id must exactly match the name shown in list_documents (e.g.
        "mybook.pdf" or "discord-alicia-1200-pcb"). Deletes child chunks from Chroma
        and, when parent-child retrieval is enabled, parent chunks from the docstore.
    
        Args:
            document_id: Document identifier to remove (same as in list_documents).
            persist_dir: Chroma persistence directory; empty falls back to get_persist_dir() (default: "chroma_db").
            collection: Chroma collection name; None or empty falls back to get_collection_name() (default: "pinrag").
    
        Returns:
            Dictionary with "deleted_chunks" (int), "document_id" (str),
            "persist_directory", "collection_name".
    
        Raises:
            ValueError: If document_id is empty or whitespace-only.
            FileNotFoundError: If persist_dir does not exist.
    
        """
        if not document_id or not str(document_id).strip():
            raise ValueError("document_id cannot be empty")
        if collection is None or not str(collection).strip():
            collection = get_collection_name()
        else:
            collection = str(collection).strip()
    
        _persist = (persist_dir or "").strip() or get_persist_dir()
        persist_path = _resolve_persist_dir_path(_persist)
        if not persist_path.exists():
            raise FileNotFoundError(f"Persistence directory does not exist: {_persist}")
    
        store = get_chroma_store(
            persist_directory=_persist,
            collection_name=collection,
        )
    
        # Get chunks matching this document_id (need metadatas for parent doc_ids when parent-child)
        data = store.get(
            where={"document_id": document_id.strip()},
            include=["metadatas"] if get_use_parent_child() else [],
        )
        ids = data.get("ids") or []
        deleted_count = len(ids)
    
        # When parent-child is enabled, also delete parent chunks from docstore
        if get_use_parent_child() and ids:
            metadatas = data.get("metadatas") or []
            parent_ids = set()
            for meta in metadatas:
                ...  # excerpt truncated here; the rest of the function is not shown
  • The MCP tool handler function for `remove_document_tool`, which exposes the tool to MCP clients and calls the implementation in `tools.py`.
    async def remove_document_tool(
        document_id: Annotated[
            str,
            Field(description="Document identifier to remove (from list_documents_tool)."),
        ],
        ctx: Context | None = None,
    ) -> dict:
        """Remove a document and all its chunks from the PinRAG index.
    
        Deletes all chunks and embeddings for the given document. Use
        list_documents_tool to see document_ids (e.g. "mybook.pdf", "bwgLXEQdq20", "discord-alicia-1200-pcb", "owner/repo/path" for GitHub).
        Uses server config for vector store location and collection.
    
        Args:
            document_id: Document identifier to remove (from list_documents_tool).
            ctx: MCP request context (injected by the server; unused).
    
        Returns:
            Dictionary containing deleted_chunks, document_id, persist_directory, collection_name.
    
        """
    
        def _run() -> dict:
            return remove_document(
                document_id=document_id,
                persist_dir=config.get_persist_dir(),
                collection=config.get_collection_name(),
            )
    
        return await anyio.to_thread.run_sync(_run)
  • The decorator that registers `remove_document_tool` as an MCP tool.
    @mcp.tool()
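The two excerpts above can be reduced to a stdlib-only sketch of the same control flow: validate the ID, look up chunk IDs by `document_id` metadata, collect parent IDs when parent-child retrieval is enabled, delete, and report the count; the async handler then runs the blocking call in a worker thread. Everything here is hypothetical: `InMemoryStore` stands in for the Chroma store, the `parent_id` metadata key and the `docstore` dict are assumptions (the excerpt is cut off before that logic), the return value omits `persist_directory` and `collection_name`, and the standard library's `asyncio.to_thread` is swapped in for `anyio.to_thread.run_sync`.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for the Chroma store: chunk id -> metadata."""

    def __init__(self, records: dict[str, dict]):
        self.records = records

    def get(self, document_id: str) -> tuple[list[str], list[dict]]:
        ids = [i for i, m in self.records.items() if m.get("document_id") == document_id]
        return ids, [self.records[i] for i in ids]

    def delete(self, ids: list[str]) -> None:
        for i in ids:
            del self.records[i]


def remove_document(store, document_id, docstore=None, use_parent_child=False):
    """Sketch of the removal flow; docstore maps parent id -> parent chunk."""
    if not document_id or not str(document_id).strip():
        raise ValueError("document_id cannot be empty")
    doc_id = document_id.strip()
    ids, metadatas = store.get(doc_id)
    if use_parent_child and ids and docstore is not None:
        # Assumed metadata key linking each child chunk to its parent chunk.
        parent_ids = {m["parent_id"] for m in metadatas if "parent_id" in m}
        for pid in parent_ids:
            docstore.pop(pid, None)
    if ids:
        store.delete(ids)
    return {"deleted_chunks": len(ids), "document_id": doc_id}


async def remove_document_tool(store, document_id, **kwargs):
    # Run the blocking removal off the event loop, like the handler's anyio call.
    return await asyncio.to_thread(remove_document, store, document_id, **kwargs)


store = InMemoryStore({
    "c1": {"document_id": "mybook.pdf", "parent_id": "p1"},
    "c2": {"document_id": "mybook.pdf", "parent_id": "p1"},
    "c3": {"document_id": "bwgLXEQdq20", "parent_id": "p2"},
})
docstore = {"p1": "parent text 1", "p2": "parent text 2"}
result = asyncio.run(
    remove_document_tool(store, " mybook.pdf ", docstore=docstore, use_parent_child=True)
)
print(result)  # {'deleted_chunks': 2, 'document_id': 'mybook.pdf'}
print(sorted(store.records), sorted(docstore))  # ['c3'] ['p2']
```

Offloading to a thread keeps the MCP server's event loop responsive while the (synchronous) vector store deletion runs.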
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. Explicitly states the destructive scope ('Deletes all chunks and embeddings'), mentions the server config dependency for the vector store location, and documents the return structure. Could clarify whether the operation is reversible.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with a clear action statement. Uses docstring-style Args/Returns sections, which provide structure but repeat schema information. Each sentence adds value (mechanism, prerequisite, config dependency, examples).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Appropriate for a single-parameter destructive tool. Covers prerequisites (list_documents_tool), explains what gets destroyed (chunks/embeddings), documents return values, and notes configuration dependencies. Complete given the tool's limited complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage, with document_id well described. The description adds valuable examples of ID formats (e.g., 'bwgLXEQdq20'). However, it erroneously documents a 'ctx' parameter in the Args section that does not exist in the input schema, which may cause confusion.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear specific verb (Remove) + resource (document and chunks) + scope (PinRAG index). Distinguishes from siblings: contrasts with add_document_tool/add_url_tool (ingestion), list_documents_tool (listing), and query_tool (search).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly directs users to use list_documents_tool first to obtain document_ids, providing clear workflow guidance. Includes concrete examples of ID formats (e.g., 'mybook.pdf', 'owner/repo/path'). Lacks explicit 'when not to use' exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ndjordjevic/pinrag'
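The same request can be made from Python with the standard library's `urllib`. This is a sketch: the helper names are hypothetical, and decoding the body as JSON is an assumption, since the page does not document the response format.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/ndjordjevic/pinrag"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Prepare the same GET request as the curl command."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(timeout: float = 10.0) -> dict:
    """Perform the request and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(build_request(), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone only prepares it.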
