
delete_document

Remove documents from Paperlib MCP by deleting database entries and optionally associated PDF files to manage your academic literature collection.

Instructions

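Delete the specified document.

Deletes the document and all of its associated data (chunks, embeddings, ingest records, and so on) from the database. Optionally also deletes the PDF file in MinIO.

Args:
  doc_id: Unique identifier of the document.
  also_delete_object: Whether to also delete the PDF file in MinIO. Defaults to False.

Returns: The deletion result, including the number of records deleted.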
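A caller might consume the result along these lines. This is a minimal sketch: `summarize_delete_result` is a hypothetical helper, not part of Paperlib MCP. The keys it reads are the ones returned by the handler in the Implementation Reference, and any values passed to it in practice would come from a real tool call.

```python
from typing import Any


def summarize_delete_result(result: dict[str, Any]) -> str:
    """Turn a delete_document result dict into a one-line summary.

    Hypothetical helper for illustration; it only reads keys that the
    handler in the Implementation Reference returns.
    """
    if not result.get("success"):
        # Failure results carry "error" and "doc_id" only.
        error = result.get("error", "unknown error")
        return f"delete failed for {result.get('doc_id')}: {error}"

    counts = ", ".join(
        [
            f"{result.get('deleted_chunks', 0)} chunks",
            f"{result.get('deleted_embeddings', 0)} embeddings",
            f"{result.get('deleted_jobs', 0)} ingest jobs",
        ]
    )
    # "object_deleted" stays False unless also_delete_object was requested
    # and the storage deletion reported success.
    pdf_note = "PDF removed" if result.get("object_deleted") else "PDF kept"
    return f"deleted {result['doc_id']}: {counts} ({pdf_note})"
```

Checking `success` first matters because the handler reports a missing document or a raised exception through the return value rather than by raising.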

Input Schema

Name                Required  Description  Default
doc_id              Yes       -            -
also_delete_object  No        -            -

Output Schema

Name  Required  Description  Default

No arguments

Implementation Reference

  • The core handler function for the 'delete_document' tool. It explicitly deletes the document's rows from ingest_jobs and documents; per the code comment, rows in chunks and chunk_embeddings are removed by cascade when the documents row is deleted. If also_delete_object is set, it also deletes the associated PDF file from MinIO storage using delete_object. Returns a dict with a success flag and deletion counts.
    @mcp.tool()
    def delete_document(
        doc_id: str,
        also_delete_object: bool = False,
    ) -> dict[str, Any]:
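        """Delete the specified document.
        
        Deletes the document and all of its associated data (chunks, embeddings,
        ingest records, and so on) from the database. Optionally also deletes the
        PDF file in MinIO.
        
        Args:
            doc_id: Unique identifier of the document.
            also_delete_object: Whether to also delete the PDF file in MinIO.
                Defaults to False.
            
        Returns:
            The deletion result, including the number of records deleted.
        """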
        try:
            # Fetch the document record first
            doc = query_one(
                "SELECT pdf_key FROM documents WHERE doc_id = %s",
                (doc_id,)
            )
            
            if not doc:
                return {
                    "success": False,
                    "error": f"Document not found: {doc_id}",
                    "doc_id": doc_id,
                }
            
            pdf_key = doc["pdf_key"]
            
            # Count the rows that are about to be deleted
            stats = query_one(
                """
                SELECT 
                    (SELECT COUNT(*) FROM chunks WHERE doc_id = %s) as chunk_count,
                    (SELECT COUNT(*) FROM chunk_embeddings ce 
                     JOIN chunks c ON ce.chunk_id = c.chunk_id 
                     WHERE c.doc_id = %s) as embedding_count,
                    (SELECT COUNT(*) FROM ingest_jobs WHERE doc_id = %s) as job_count
                """,
                (doc_id, doc_id, doc_id)
            )
            
            # Delete the ingest records
            execute("DELETE FROM ingest_jobs WHERE doc_id = %s", (doc_id,))
            
            # Delete the document (chunks and embeddings are removed by cascade)
            execute("DELETE FROM documents WHERE doc_id = %s", (doc_id,))
            
            result = {
                "success": True,
                "doc_id": doc_id,
                "deleted_chunks": stats["chunk_count"] if stats else 0,
                "deleted_embeddings": stats["embedding_count"] if stats else 0,
                "deleted_jobs": stats["job_count"] if stats else 0,
                "object_deleted": False,
            }
            
            # Optionally delete the MinIO object
            if also_delete_object and pdf_key:
                delete_result = delete_object(pdf_key)
                result["object_deleted"] = delete_result.get("deleted", False)
                result["pdf_key"] = pdf_key
            
            return result
            
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "doc_id": doc_id,
            }
  • Registration of the fetch tools, including delete_document, by calling register_fetch_tools(mcp) in the main MCP server setup.
    from paperlib_mcp.tools.fetch import register_fetch_tools
    from paperlib_mcp.tools.writing import register_writing_tools
    
    # M2 GraphRAG tools
    from paperlib_mcp.tools.graph_extract import register_graph_extract_tools
    from paperlib_mcp.tools.graph_canonicalize import register_graph_canonicalize_tools
    from paperlib_mcp.tools.graph_community import register_graph_community_tools
    from paperlib_mcp.tools.graph_summarize import register_graph_summarize_tools
    from paperlib_mcp.tools.graph_maintenance import register_graph_maintenance_tools
    
    # M3 Review tools
    from paperlib_mcp.tools.review import register_review_tools
    
    # M4 Canonicalization & Grouping tools
    from paperlib_mcp.tools.graph_relation_canonicalize import register_graph_relation_canonicalize_tools
    from paperlib_mcp.tools.graph_claim_grouping import register_graph_claim_grouping_tools
    from paperlib_mcp.tools.graph_v12 import register_graph_v12_tools
    
    register_health_tools(mcp)
    register_import_tools(mcp)
    register_search_tools(mcp)
    register_fetch_tools(mcp)
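The count-then-delete flow of the handler can be tried out in isolation. The following is a minimal sketch under stated assumptions: it uses an in-memory SQLite database as a stand-in for the project's real database, the table layout is a hypothetical reduction that keeps only the columns the handler touches, and `make_demo_db` and `delete_document_sketch` are illustrative names that do not appear in Paperlib MCP. Unlike the handler above, it runs the counts and deletes inside one transaction.

```python
import sqlite3
from typing import Any


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema, sample rows)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces ON DELETE CASCADE when foreign keys are enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE documents (doc_id TEXT PRIMARY KEY, pdf_key TEXT);
        CREATE TABLE chunks (
            chunk_id INTEGER PRIMARY KEY,
            doc_id TEXT NOT NULL REFERENCES documents(doc_id) ON DELETE CASCADE
        );
        CREATE TABLE chunk_embeddings (
            chunk_id INTEGER PRIMARY KEY
                REFERENCES chunks(chunk_id) ON DELETE CASCADE
        );
        CREATE TABLE ingest_jobs (job_id INTEGER PRIMARY KEY, doc_id TEXT);
        INSERT INTO documents VALUES ('doc-1', 'papers/doc-1.pdf');
        INSERT INTO chunks VALUES (1, 'doc-1'), (2, 'doc-1');
        INSERT INTO chunk_embeddings VALUES (1), (2);
        INSERT INTO ingest_jobs VALUES (1, 'doc-1');
        """
    )
    return conn


def delete_document_sketch(conn: sqlite3.Connection, doc_id: str) -> dict[str, Any]:
    """Count dependent rows, then delete them, all in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        doc = conn.execute(
            "SELECT pdf_key FROM documents WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        if doc is None:
            return {"success": False, "error": f"Document not found: {doc_id}", "doc_id": doc_id}

        def count(sql: str) -> int:
            return conn.execute(sql, (doc_id,)).fetchone()[0]

        chunk_count = count("SELECT COUNT(*) FROM chunks WHERE doc_id = ?")
        embedding_count = count(
            "SELECT COUNT(*) FROM chunk_embeddings ce "
            "JOIN chunks c ON ce.chunk_id = c.chunk_id WHERE c.doc_id = ?"
        )
        job_count = count("SELECT COUNT(*) FROM ingest_jobs WHERE doc_id = ?")

        # ingest_jobs has no cascade here, so it is deleted explicitly.
        conn.execute("DELETE FROM ingest_jobs WHERE doc_id = ?", (doc_id,))
        # Deleting the document cascades to chunks, then to chunk_embeddings.
        conn.execute("DELETE FROM documents WHERE doc_id = ?", (doc_id,))

    return {
        "success": True,
        "doc_id": doc_id,
        "deleted_chunks": chunk_count,
        "deleted_embeddings": embedding_count,
        "deleted_jobs": job_count,
    }
```

Keeping the counts and both deletes in one transaction means a failure part-way leaves the data untouched. Whether the `execute` helper in the original code commits per statement is not shown in the source, so this is a design choice of the sketch rather than a description of the project.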
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that deletion affects 'all associated data' and mentions optional deletion of the PDF file in MinIO, which is valuable behavioral context. However, it does not address critical aspects such as required permissions, whether deletion is permanent or irreversible, rate limits, or error conditions. These are significant gaps for a destructive operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
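One way to surface the destructive nature of the tool is through annotation hints. The sketch below uses the hint names defined for tool annotations in the MCP specification (`readOnlyHint`, `destructiveHint`) and the defaults the specification assigns when a hint is absent. The values chosen for delete_document and the `needs_confirmation` helper are assumptions for illustration and are not taken from the Paperlib MCP source.

```python
from typing import Any

# Assumed hint values for delete_document; the published tool declares none.
DELETE_DOCUMENT_ANNOTATIONS: dict[str, Any] = {
    "title": "Delete document",
    "readOnlyHint": False,    # the tool modifies the database
    "destructiveHint": True,  # rows (and optionally the PDF) are removed
}


def needs_confirmation(annotations: dict[str, Any]) -> bool:
    """Decide whether a client should ask before calling a tool.

    Hypothetical client-side policy. Missing hints fall back to the
    specification's defaults: readOnlyHint=False, destructiveHint=True.
    """
    read_only = annotations.get("readOnlyHint", False)
    destructive = annotations.get("destructiveHint", True)
    # destructiveHint is only meaningful when the tool is not read-only.
    return (not read_only) and destructive
```

Hints are advisory, so they complement rather than replace a description that states plainly what is deleted and whether it can be undone.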

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with a clear purpose statement, scope clarification, parameter explanations, and return value indication. Every sentence adds value, though the Args/Returns formatting could be more integrated with the natural language flow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with no annotations, the description covers core functionality and parameters adequately. The output schema exists, so return values don't need detailed explanation. However, it lacks important context about safety, permissions, and irreversible consequences that would be crucial for responsible tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates well by explaining both parameters: 'doc_id' as the document's unique identifier and 'also_delete_object' controlling MinIO PDF deletion with its default value. This adds meaningful context beyond the bare schema, though it could specify format expectations for doc_id.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('delete the specified document') and resource ('the document and all of its associated data'), distinguishing it from sibling tools like 'get_document', 'update_document', and 'list_documents'. It precisely defines the scope of deletion beyond just the document record.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'update_document' or 'rechunk_document'. It doesn't mention prerequisites (e.g., document must exist), consequences, or when deletion might be irreversible versus other modification options.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.