Glama

ingest_status

Check import status for documents in Paperlib MCP, showing progress stages and error details to monitor PDF processing and troubleshoot issues.

Instructions

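View import status

View the import status of a given document or job, including per-stage progress and error details.

Args:
doc_id: document ID (looks up the most recent job for that document)
job_id: job ID (looks up that specific job directly)

Returns: import status information, including the status of each stage, an error summary, and a suggested fix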
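
As a rough illustration of how a caller might consume the returned status, the sketch below condenses a result dictionary into one line. The field names (`job_id`, `status`, `stages`, `suggested_action`) follow the handler's return value in the Implementation Reference; `summarize_status` and `STAGE_ORDER` are hypothetical helpers, and the assumption that a finished stage reports the status string "completed" is not confirmed by this page.

```python
from __future__ import annotations

from typing import Any

# Stage order copied from the IngestStage enum shown on this page.
STAGE_ORDER = ["HASHED", "UPLOADED", "EXTRACTED", "CHUNKED", "EMBEDDED", "COMMITTED"]


def summarize_status(result: dict[str, Any]) -> str:
    """Return a one-line summary of an ingest_status result (hypothetical helper)."""
    # Error responses carry an "error" key but no "status" key.
    if "status" not in result:
        return f"lookup failed: {result.get('error')}"

    stages = result.get("stages", {})
    # Assumption: a finished stage reports the status string "completed".
    done = [s for s in STAGE_ORDER if stages.get(s, {}).get("status") == "completed"]
    line = (
        f"job {result['job_id']}: {result['status']} "
        f"({len(done)}/{len(STAGE_ORDER)} stages completed)"
    )
    if result.get("suggested_action"):
        line += f"; next: {result['suggested_action']}"
    return line
```

Checking for the absence of `status` rather than the presence of `error` matters here, because a successful response also carries an `error` field (the job's own error text, possibly null).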

Input Schema

Name      Required   Description   Default
doc_id    No
job_id    No

Output Schema

No arguments

Implementation Reference

  • The handler function for the 'ingest_status' tool. It retrieves the status of an import job or document from the database, compiles stage statuses, document statistics, and suggests next actions.
    def ingest_status(
        doc_id: str | None = None,
        job_id: int | None = None,
    ) -> dict[str, Any]:
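        """View import status

        View the import status of a given document or job, including
        per-stage progress and error details.

        Args:
            doc_id: Document ID (looks up the most recent job for that document)
            job_id: Job ID (looks up that specific job directly)

        Returns:
            Import status information, including the status of each stage,
            an error summary, and a suggested fix
        """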
        try:
            if not doc_id and not job_id:
                return {
                    "error": "Must provide either doc_id or job_id",
                }
            
            # Fetch the job record
            if job_id:
                job = query_one(
                    """
                    SELECT job_id, doc_id, status, current_stage, 
                           started_at::text, finished_at::text, error
                    FROM ingest_jobs
                    WHERE job_id = %s
                    """,
                    (job_id,)
                )
            else:
                job = query_one(
                    """
                    SELECT job_id, doc_id, status, current_stage,
                           started_at::text, finished_at::text, error
                    FROM ingest_jobs
                    WHERE doc_id = %s
                    ORDER BY started_at DESC
                    LIMIT 1
                    """,
                    (doc_id,)
                )
            
            if not job:
                return {
                    "error": f"No ingest job found for {'job_id=' + str(job_id) if job_id else 'doc_id=' + doc_id}",
                    "doc_id": doc_id,
                    "job_id": job_id,
                }
            
            # Fetch per-stage details
            stages = query_all(
                """
                SELECT stage, status, message, created_at::text
                FROM ingest_job_items
                WHERE job_id = %s
                ORDER BY created_at
                """,
                (job["job_id"],)
            )
            
            # Build the stage-status map (every stage defaults to "pending")
            stage_status = {}
            for stage in IngestStage:
                stage_status[stage.value] = {
                    "status": "pending",
                    "message": None,
                    "timestamp": None,
                }
            
            for item in stages:
                stage_status[item["stage"]] = {
                    "status": item["status"],
                    "message": item["message"],
                    "timestamp": item["created_at"],
                }
            
            # Work out a suggested fix
            suggested_action = None
            if job["status"] == IngestStatus.FAILED.value:
                if job["current_stage"] == IngestStage.EMBEDDED.value or \
                   stage_status[IngestStage.EMBEDDED.value]["status"] == IngestStatus.FAILED.value:
                    suggested_action = f"Use reembed_document(doc_id='{job['doc_id']}') to retry embedding generation"
                elif job["current_stage"] == IngestStage.CHUNKED.value:
                    suggested_action = f"Use rechunk_document(doc_id='{job['doc_id']}', force=True) to retry chunking"
                else:
                    suggested_action = "Use import_pdf(file_path=..., force=True) to reimport from scratch"
            elif job["status"] == IngestStatus.RUNNING.value:
                suggested_action = "Job is still running. Wait for completion or check for stuck process."
            
            # Check the document's actual state
            doc_stats = None
            if job["doc_id"]:
                stats = query_one(
                    """
                    SELECT 
                        (SELECT COUNT(*) FROM chunks WHERE doc_id = %s) as chunk_count,
                        (SELECT COUNT(*) FROM chunk_embeddings ce 
                         JOIN chunks c ON ce.chunk_id = c.chunk_id 
                         WHERE c.doc_id = %s) as embedded_count
                    """,
                    (job["doc_id"], job["doc_id"])
                )
                if stats:
                    doc_stats = {
                        "chunk_count": stats["chunk_count"],
                        "embedded_count": stats["embedded_count"],
                        "missing_embeddings": stats["chunk_count"] - stats["embedded_count"],
                    }
                    
                    if doc_stats["missing_embeddings"] > 0 and job["status"] == IngestStatus.COMPLETED.value:
                        suggested_action = f"Use reembed_document(doc_id='{job['doc_id']}') to fill missing embeddings"
            
            return {
                "job_id": job["job_id"],
                "doc_id": job["doc_id"],
                "status": job["status"],
                "current_stage": job["current_stage"],
                "started_at": job["started_at"],
                "finished_at": job["finished_at"],
                "error": job["error"],
                "stages": stage_status,
                "document_stats": doc_stats,
                "suggested_action": suggested_action,
            }
            
        except Exception as e:
            return {
                "error": str(e),
                "doc_id": doc_id,
                "job_id": job_id,
            }
  • Imports and calls register_import_tools(mcp), which registers the ingest_status tool (along with import_pdf).
    from paperlib_mcp.tools.import_pdf import register_import_tools
    from paperlib_mcp.tools.search import register_search_tools
    from paperlib_mcp.tools.fetch import register_fetch_tools
    from paperlib_mcp.tools.writing import register_writing_tools
    
    # M2 GraphRAG tools
    from paperlib_mcp.tools.graph_extract import register_graph_extract_tools
    from paperlib_mcp.tools.graph_canonicalize import register_graph_canonicalize_tools
    from paperlib_mcp.tools.graph_community import register_graph_community_tools
    from paperlib_mcp.tools.graph_summarize import register_graph_summarize_tools
    from paperlib_mcp.tools.graph_maintenance import register_graph_maintenance_tools
    
    # M3 Review tools
    from paperlib_mcp.tools.review import register_review_tools
    
    # M4 Canonicalization & Grouping tools
    from paperlib_mcp.tools.graph_relation_canonicalize import register_graph_relation_canonicalize_tools
    from paperlib_mcp.tools.graph_claim_grouping import register_graph_claim_grouping_tools
    from paperlib_mcp.tools.graph_v12 import register_graph_v12_tools
    
    register_health_tools(mcp)
    register_import_tools(mcp)
  • Enum definitions for IngestStatus and IngestStage used by the ingest_status tool to categorize job statuses and stages.
    from enum import Enum


    class IngestStage(str, Enum):
        """Import stage"""
        HASHED = "HASHED"        # compute SHA256
        UPLOADED = "UPLOADED"    # upload to MinIO
        EXTRACTED = "EXTRACTED"  # extract text
        CHUNKED = "CHUNKED"      # split into chunks
        EMBEDDED = "EMBEDDED"    # generate embeddings
        COMMITTED = "COMMITTED"  # commit finished


    class IngestStatus(str, Enum):
        """Status"""
        PENDING = "pending"
        RUNNING = "running"
        COMPLETED = "completed"
        FAILED = "failed"
  • The register_import_tools function that defines and registers both import_pdf and ingest_status tools using @mcp.tool() decorators.
    def register_import_tools(mcp: FastMCP) -> None:
        """Register the PDF import tools"""
    
        @mcp.tool()
        async def import_pdf(
            file_path: str,
            title: str | None = None,
            authors: str | None = None,
            year: int | None = None,
            force: bool = False,
        ) -> dict[str, Any]:
            """Import a PDF paper into the knowledge base"""
            return await import_pdf_run(
                file_path=file_path,
                title=title,
                authors=authors,
                year=year,
                force=force,
            )
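
The suggested-action branching in the handler above can be read more easily when pulled out as a pure function. The following is a sketch that mirrors that branching, not the project's actual code layout: `suggest_action` and its parameter names are hypothetical, while the enum values and message strings are copied from the excerpts above.

```python
from __future__ import annotations

from enum import Enum


class IngestStage(str, Enum):
    HASHED = "HASHED"
    UPLOADED = "UPLOADED"
    EXTRACTED = "EXTRACTED"
    CHUNKED = "CHUNKED"
    EMBEDDED = "EMBEDDED"
    COMMITTED = "COMMITTED"


class IngestStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def suggest_action(
    job_status: str,
    current_stage: str,
    embedded_stage_status: str,
    doc_id: str,
    missing_embeddings: int = 0,
) -> str | None:
    """Hypothetical helper mirroring the handler's suggested-action rules."""
    action = None
    if job_status == IngestStatus.FAILED.value:
        # An embedding failure is checked first, whether it is reported on the
        # job's current stage or on the EMBEDDED stage item.
        if (
            current_stage == IngestStage.EMBEDDED.value
            or embedded_stage_status == IngestStatus.FAILED.value
        ):
            action = f"Use reembed_document(doc_id='{doc_id}') to retry embedding generation"
        elif current_stage == IngestStage.CHUNKED.value:
            action = f"Use rechunk_document(doc_id='{doc_id}', force=True) to retry chunking"
        else:
            action = "Use import_pdf(file_path=..., force=True) to reimport from scratch"
    elif job_status == IngestStatus.RUNNING.value:
        action = "Job is still running. Wait for completion or check for stuck process."

    # A job marked completed can still have chunks without embeddings.
    if missing_embeddings > 0 and job_status == IngestStatus.COMPLETED.value:
        action = f"Use reembed_document(doc_id='{doc_id}') to fill missing embeddings"
    return action
```

A completed job with no missing embeddings, and a job that is still pending, both yield no suggestion.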
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that this is a read operation ('查看' - view) and describes what information is returned (status, progress, errors, suggested fixes). However, it doesn't mention important behavioral aspects like whether this requires authentication, has rate limits, or what happens when both doc_id and job_id are provided. The description adds value but leaves gaps in behavioral understanding.
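
On the question of what happens when both identifiers are supplied, the handler shown in the Implementation Reference tests `job_id` first, so `job_id` appears to take precedence and `doc_id` is ignored for the lookup. The sketch below mirrors only that branching; `select_lookup` is a hypothetical name, not part of the server.

```python
from __future__ import annotations


def select_lookup(
    doc_id: str | None = None,
    job_id: int | None = None,
) -> tuple[str, str | int] | None:
    """Return which key the lookup would use, following the handler's branching."""
    if not doc_id and not job_id:
        # The handler returns {"error": "Must provide either doc_id or job_id"} here.
        return None
    if job_id:
        # job_id wins whenever it is truthy, even if doc_id is also given.
        return ("job_id", job_id)
    return ("doc_id", doc_id)
```

Because the checks rely on truthiness, a `job_id` of 0 or an empty `doc_id` is treated the same as an omitted argument.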

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It starts with a clear purpose statement, provides parameter semantics in a structured Args/Returns format, and every sentence adds value. No redundant information or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there's an output schema (though not shown), the description doesn't need to detail return values. It covers the tool's purpose, parameter usage, and return content at a high level. For a status-checking tool with 2 parameters and no annotations, this provides adequate context, though it could benefit from more behavioral details like authentication requirements or error handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides clear semantic explanations for both parameters: doc_id queries the latest job for a document, while job_id queries a specific job directly. This adds significant value beyond the bare schema, though it doesn't explain the relationship between the two parameters or what happens when both are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '查看指定文档或作业的导入状态,包括各阶段进度和错误信息' (View the import status of specified documents or jobs, including progress at each stage and error information). It specifies the verb ('查看' - view) and resource ('导入状态' - import status), but doesn't explicitly differentiate from sibling tools like 'graph_status' or 'health_check' that might also provide status information.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context through the parameter explanations: use doc_id to query the latest job for a document, or job_id to query a specific job. However, it doesn't provide explicit guidance on when to choose this tool versus alternatives like 'graph_status' or 'health_check', nor does it mention prerequisites or when-not-to-use scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'
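
For callers who prefer Python to curl, the same GET request can be sketched with the standard library. The URL is the one from the curl command above; the `Accept` header and the assumption that the endpoint returns JSON are not stated on this page, and `build_request` and `fetch_server_info` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    # Assumption: the API serves JSON, so ask for it explicitly.
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server_info(url: str = API_URL, timeout: float = 10.0):
    """Send the request and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone is side-effect free and can be inspected or reused.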

If you have feedback or need assistance with the MCP directory API, please join our Discord server.