remove_document_tool
Delete a document and all of its associated data (chunks and embeddings) from the PinRAG index.
Instructions
Remove a document and all its chunks from the PinRAG index.
Deletes all chunks and embeddings for the given document. Use list_documents_tool to see document_ids (e.g. "mybook.pdf", "bwgLXEQdq20", "discord-alicia-1200-pcb", or "owner/repo/path" for GitHub).
Uses server config for vector store location and collection.
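When `remove_document` is called directly, blank arguments fall back to configured defaults, and a missing persistence directory fails before anything is deleted. The sketch below isolates those rules as a standalone illustration, not PinRAG code. `resolve_store_location`, `check_persist_dir`, and the two default constants are hypothetical. They stand in for the server's `get_persist_dir()` and `get_collection_name()` and use the defaults ("chroma_db", "pinrag") quoted in the docstring. The real code resolves the directory through `_resolve_persist_dir_path`, whose behaviour is not shown, so the plain `Path` check here is a simplification.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical stand-ins for PinRAG's get_persist_dir() / get_collection_name(),
# using the defaults quoted in the remove_document docstring.
DEFAULT_PERSIST_DIR = "chroma_db"
DEFAULT_COLLECTION = "pinrag"


def resolve_store_location(
    persist_dir: str = "", collection: str | None = None
) -> tuple[str, str]:
    """Apply remove_document-style fallbacks: blank or missing values use defaults."""
    if collection is None or not str(collection).strip():
        collection = DEFAULT_COLLECTION
    else:
        collection = str(collection).strip()
    persist = (persist_dir or "").strip() or DEFAULT_PERSIST_DIR
    return persist, collection


def check_persist_dir(persist: str) -> Path:
    """Fail early, as remove_document does, if the directory does not exist."""
    path = Path(persist)  # simplification of _resolve_persist_dir_path
    if not path.exists():
        raise FileNotFoundError(f"Persistence directory does not exist: {persist}")
    return path


print(resolve_store_location())                    # ('chroma_db', 'pinrag')
print(resolve_store_location(" data ", " docs "))  # ('data', 'docs')
```

Through the MCP tool, both values come from the server config, which is why the tool exposes only `document_id` as an input.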
Args:
document_id: Document identifier to remove (from list_documents_tool).
ctx: MCP request context (injected by the server; unused).
Returns:
Dictionary containing deleted_chunks, document_id, persist_directory, collection_name.
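A client can give the returned dictionary a static type so that key typos are caught early. The sketch below is an assumption built from the four key names listed above. The source docstring types only `deleted_chunks` (int) and `document_id` (str), so treating `persist_directory` and `collection_name` as strings is a guess, and `summarize_removal` is a hypothetical helper. The example values are illustrative, not real tool output.

```python
from typing import TypedDict


class RemoveDocumentResult(TypedDict):
    """Assumed shape of the dictionary returned by remove_document_tool."""

    deleted_chunks: int
    document_id: str
    persist_directory: str  # assumed str; untyped in the source docstring
    collection_name: str  # assumed str; untyped in the source docstring


def summarize_removal(result: RemoveDocumentResult) -> str:
    """Build a one-line, human-readable summary of a removal result."""
    count = result["deleted_chunks"]
    if count == 0:
        # In the excerpted implementation the count is len() of the chunk ids
        # matching document_id, so 0 most likely means nothing matched exactly.
        return f"No chunks found for {result['document_id']!r}; check list_documents_tool."
    return (
        f"Removed {count} chunk(s) of {result['document_id']!r} "
        f"from collection {result['collection_name']!r} in {result['persist_directory']}"
    )


# Illustrative input only.
example: RemoveDocumentResult = {
    "deleted_chunks": 42,
    "document_id": "mybook.pdf",
    "persist_directory": "chroma_db",
    "collection_name": "pinrag",
}
print(summarize_removal(example))
# Removed 42 chunk(s) of 'mybook.pdf' from collection 'pinrag' in chroma_db
```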
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| document_id | Yes | Document identifier to remove (from list_documents_tool). | |
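Because `document_id` must match the listed identifier exactly (the implementation filters Chroma with `where={"document_id": document_id.strip()}`), a client may want to check its input before calling the tool. The helper below is a hypothetical client-side sketch, not part of PinRAG. It mirrors the tool's empty-string check and uses the standard library's `difflib.get_close_matches` to suggest near misses from a list of ids assumed to come from `list_documents_tool`. Raising on an unknown id is a design choice of this sketch, not documented tool behaviour.

```python
import difflib


def prepare_document_id(candidate: str, known_ids: list[str]) -> str:
    """Validate a document_id against ids previously returned by list_documents_tool.

    Raises ValueError for blank input (mirroring remove_document) or for an id
    that matches no listed id exactly, suggesting close matches when available.
    """
    document_id = (candidate or "").strip()
    if not document_id:
        raise ValueError("document_id cannot be empty")
    if document_id in known_ids:
        return document_id
    # Offer up to three similar ids to help correct typos.
    suggestions = difflib.get_close_matches(document_id, known_ids, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown document_id {document_id!r}.{hint}")


# Example ids taken from the tool description.
listed = ["mybook.pdf", "bwgLXEQdq20", "discord-alicia-1200-pcb"]
print(prepare_document_id(" mybook.pdf ", listed))  # mybook.pdf
```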
Implementation Reference
- src/pinrag/mcp/tools.py:844-898 (handler): The core logic for removing a document from the Chroma index and associated storage.

```python
def remove_document(
    document_id: str,
    persist_dir: str = "",
    collection: str | None = None,
) -> dict[str, Any]:
    """Remove a document and all its chunks and embeddings from the Chroma index.

    The document_id must match exactly the name shown in list_documents
    (e.g. "mybook.pdf" or "discord-alicia-1200-pcb"). Deletes child chunks
    from Chroma and, when parent-child retrieval is enabled, parent chunks
    from the docstore.

    Args:
        document_id: Document identifier to remove (same as in list_documents).
        persist_dir: Chroma persistence directory (default: "chroma_db").
        collection: Chroma collection name (default: "pinrag").

    Returns:
        Dictionary with "deleted_chunks" (int), "document_id" (str),
        "persist_directory", "collection_name".

    Raises:
        ValueError: If document_id is empty.
        FileNotFoundError: If persist_dir does not exist.
    """
    if not document_id or not str(document_id).strip():
        raise ValueError("document_id cannot be empty")
    # An empty or missing collection falls back to the configured name.
    if collection is None or not str(collection).strip():
        collection = get_collection_name()
    else:
        collection = str(collection).strip()
    _persist = (persist_dir or "").strip() or get_persist_dir()
    persist_path = _resolve_persist_dir_path(_persist)
    if not persist_path.exists():
        raise FileNotFoundError(f"Persistence directory does not exist: {_persist}")
    store = get_chroma_store(
        persist_directory=_persist,
        collection_name=collection,
    )
    # Get chunks matching this document_id (need metadatas for parent doc_ids when parent-child)
    data = store.get(
        where={"document_id": document_id.strip()},
        include=["metadatas"] if get_use_parent_child() else [],
    )
    ids = data.get("ids") or []
    deleted_count = len(ids)
    # When parent-child is enabled, also delete parent chunks from docstore
    if get_use_parent_child() and ids:
        metadatas = data.get("metadatas") or []
        parent_ids = set()
        for meta in metadatas:
            ...  # excerpt ends here; the rest of tools.py:844-898 is not shown
```

- src/pinrag/mcp/server.py:279-308 (handler): The MCP tool handler function for `remove_document_tool`, which exposes the tool to MCP clients and calls the implementation in `tools.py`.
```python
async def remove_document_tool(
    document_id: Annotated[
        str,
        Field(description="Document identifier to remove (from list_documents_tool)."),
    ],
    ctx: Context | None = None,
) -> dict:
    """Remove a document and all its chunks from the PinRAG index.

    Deletes all chunks and embeddings for the given document. Use
    list_documents_tool to see document_ids (e.g. "mybook.pdf", "bwgLXEQdq20",
    "discord-alicia-1200-pcb", "owner/repo/path" for GitHub).
    Uses server config for vector store location and collection.

    Args:
        document_id: Document identifier to remove (from list_documents_tool).
        ctx: MCP request context (injected by the server; unused).

    Returns:
        Dictionary containing deleted_chunks, document_id, persist_directory,
        collection_name.
    """

    def _run() -> dict:
        return remove_document(
            document_id=document_id,
            persist_dir=config.get_persist_dir(),
            collection=config.get_collection_name(),
        )

    # Run the synchronous removal in a worker thread.
    return await anyio.to_thread.run_sync(_run)
```

- src/pinrag/mcp/server.py:277-277 (registration): The decorator registration for `remove_document_tool`.

```python
@mcp.tool()
```
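The handler offloads the synchronous `remove_document` call to a worker thread with `anyio.to_thread.run_sync`, which keeps blocking Chroma and disk work off the server's event loop. To stay dependency-free, the sketch below swaps in the standard library's `asyncio.to_thread` (Python 3.9+), which plays the same role under asyncio. `remove_document_stub`, `remove_document_tool_sketch`, and `heartbeat` are hypothetical stand-ins, not PinRAG code.

```python
import asyncio
import time
from typing import Any


def remove_document_stub(document_id: str) -> dict[str, Any]:
    """Hypothetical blocking stand-in for remove_document; sleeps to mimic slow I/O."""
    time.sleep(0.2)
    # Placeholder values, not real PinRAG output.
    return {
        "deleted_chunks": 0,
        "document_id": document_id,
        "persist_directory": "chroma_db",
        "collection_name": "pinrag",
    }


async def remove_document_tool_sketch(document_id: str) -> dict[str, Any]:
    """Run the blocking call in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(remove_document_stub, document_id)


async def heartbeat() -> None:
    """Hypothetical concurrent work, e.g. other requests the server is serving."""
    for _ in range(4):
        await asyncio.sleep(0.05)


async def main() -> tuple[dict[str, Any], float]:
    start = time.monotonic()
    # The two coroutines overlap because the blocking sleep runs off the event loop.
    result, _ = await asyncio.gather(
        remove_document_tool_sketch("mybook.pdf"),
        heartbeat(),
    )
    return result, time.monotonic() - start


result, elapsed = asyncio.run(main())
print(result["document_id"], f"{elapsed:.2f}s")
```

The gathered run should take about 0.2 s. Calling `remove_document_stub` directly inside the coroutine would block the loop, so `heartbeat` could not start until the removal finished and the total would grow to about 0.4 s.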