Glama

list_documents

Retrieve and filter imported academic documents from your literature library with sorting options and embedding status filters.

Instructions

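List all imported documents

Retrieve a summary list of all documents in the literature library, with sorting and filtering.

Args:
  limit: maximum number of results to return; default 50
  offset: pagination offset; default 0
  order_by: sort field, one of "created_at" (default), "year", or "title"
  has_embeddings: filter; True = only documents with complete embeddings, False = only documents missing embeddings, None = all documents

Returns: a list of documents with basic metadata and chunk/embedding counts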
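Because `limit` and `offset` are plain offset pagination, a client that wants the whole library has to walk pages. A minimal sketch, assuming the tool can be invoked as a Python callable that returns the dictionary built by the handler in the Implementation Reference (how you actually call an MCP tool depends on your client). `iter_documents` and `fake_list_documents` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Iterator


def iter_documents(
    list_documents: Callable[..., dict[str, Any]],
    page_size: int = 50,
    **filters: Any,
) -> Iterator[dict[str, Any]]:
    """Yield every document by requesting successive limit/offset pages."""
    if page_size < 1:
        # A zero page size would request empty pages forever.
        raise ValueError("page_size must be at least 1")
    offset = 0
    while True:
        page = list_documents(limit=page_size, offset=offset, **filters)
        # The handler reports failures in an "error" key instead of raising.
        if "error" in page:
            raise RuntimeError(page["error"])
        docs = page["documents"]
        yield from docs
        # A short (or empty) page means there is nothing left to fetch.
        if len(docs) < page_size:
            return
        offset += page_size


# Hypothetical in-memory stand-in for the tool, used only to show the call pattern.
def fake_list_documents(limit=50, offset=0, order_by="created_at", has_embeddings=None):
    docs = [{"doc_id": i, "fully_embedded": i % 2 == 0} for i in range(5)]
    if has_embeddings is not None:
        docs = [d for d in docs if d["fully_embedded"] == has_embeddings]
    return {"total": len(docs), "documents": docs[offset:offset + limit]}


if __name__ == "__main__":
    ids = [d["doc_id"] for d in iter_documents(fake_list_documents, page_size=2, has_embeddings=False)]
    print(ids)  # [1, 3]
```

Offset pagination can skip or repeat rows if documents are imported between calls; this sketch does not guard against that.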

Input Schema

Name            Required  Description  Default
limit           No
offset          No
order_by        No                     created_at
has_embeddings  No
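The schema above carries no descriptions or constraints, and the handler silently replaces an unrecognised `order_by` with `created_at`. A minimal client-side sketch that builds the arguments dict and rejects such values before the call; `build_list_documents_args` is a hypothetical helper, and the non-negative bound on `limit`/`offset` is an assumption based on PostgreSQL rejecting negative LIMIT/OFFSET (the handler's `::text` casts and `NULLS LAST` suggest a PostgreSQL backend, but the page does not say so).

```python
from typing import Any, Optional

# Sort keys the handler recognises (mirrors valid_order_by in the implementation).
VALID_ORDER_BY = ("created_at", "year", "title")


def build_list_documents_args(
    limit: int = 50,
    offset: int = 0,
    order_by: str = "created_at",
    has_embeddings: Optional[bool] = None,
) -> dict[str, Any]:
    """Return a tool-call arguments dict, rejecting values the handler would mishandle."""
    # Assumption: a PostgreSQL backend, which errors on negative LIMIT/OFFSET;
    # the handler would surface that only as an "error" string.
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    # The handler falls back to created_at for unknown keys, so fail loudly here instead.
    if order_by not in VALID_ORDER_BY:
        raise ValueError(f"order_by must be one of {VALID_ORDER_BY}, got {order_by!r}")
    args: dict[str, Any] = {"limit": limit, "offset": offset, "order_by": order_by}
    # None means "no filter", so the key can simply be omitted.
    if has_embeddings is not None:
        args["has_embeddings"] = has_embeddings
    return args
```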

Output Schema

No arguments

Implementation Reference

  • The handler function that implements the logic for listing documents with optional pagination (limit/offset), sorting (order_by), and filtering by embedding status (has_embeddings). It constructs dynamic SQL queries to fetch document metadata and chunk statistics from the database.
    @mcp.tool()
    def list_documents(
        limit: int = 50,
        offset: int = 0,
        order_by: str = "created_at",
        has_embeddings: bool | None = None,
    ) -> dict[str, Any]:
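        """List all imported documents.

        Retrieve a summary list of all documents in the literature library, with sorting and filtering.

        Args:
            limit: maximum number of results to return; default 50
            offset: pagination offset; default 0
            order_by: sort field, one of "created_at" (default), "year", or "title"
            has_embeddings: filter; True = only documents with complete embeddings, False = only documents missing embeddings, None = all documents

        Returns:
            A list of documents with basic metadata and chunk/embedding counts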
        """
        try:
            # Validate order_by against a whitelist; unknown values fall back to created_at
            valid_order_by = {"created_at": "d.created_at", "year": "d.year", "title": "d.title"}
            order_column = valid_order_by.get(order_by, "d.created_at")
            
            # Build the base query
            base_query = """
                SELECT 
                    d.doc_id,
                    d.title,
                    d.authors,
                    d.year,
                    d.created_at::text,
                    COUNT(c.chunk_id) as chunk_count,
                    COUNT(ce.chunk_id) as embedded_count
                FROM documents d
                LEFT JOIN chunks c ON d.doc_id = c.doc_id
                LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
                GROUP BY d.doc_id
            """
            
            # Add the embedding-status filter (a HAVING clause, since it tests aggregates)
            if has_embeddings is True:
                # Only documents whose chunks all have embeddings
                base_query += " HAVING COUNT(c.chunk_id) > 0 AND COUNT(c.chunk_id) = COUNT(ce.chunk_id)"
            elif has_embeddings is False:
                # Only documents with at least one chunk missing an embedding
                base_query += " HAVING COUNT(c.chunk_id) > COUNT(ce.chunk_id)"
            
            # Add ordering (NULL years and titles sort last)
            if order_by == "year":
                base_query += f" ORDER BY {order_column} DESC NULLS LAST"
            elif order_by == "title":
                base_query += f" ORDER BY {order_column} ASC NULLS LAST"
            else:
                base_query += f" ORDER BY {order_column} DESC"
            
            # Add pagination
            base_query += " LIMIT %s OFFSET %s"
            
            docs = query_all(base_query, (limit, offset))
            
            # Get the total count (respecting the same filter)
            if has_embeddings is True:
                total_query = """
                    SELECT COUNT(*) as count FROM (
                        SELECT d.doc_id
                        FROM documents d
                        LEFT JOIN chunks c ON d.doc_id = c.doc_id
                        LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
                        GROUP BY d.doc_id
                        HAVING COUNT(c.chunk_id) > 0 AND COUNT(c.chunk_id) = COUNT(ce.chunk_id)
                    ) sub
                """
            elif has_embeddings is False:
                total_query = """
                    SELECT COUNT(*) as count FROM (
                        SELECT d.doc_id
                        FROM documents d
                        LEFT JOIN chunks c ON d.doc_id = c.doc_id
                        LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
                        GROUP BY d.doc_id
                        HAVING COUNT(c.chunk_id) > COUNT(ce.chunk_id)
                    ) sub
                """
            else:
                total_query = "SELECT COUNT(*) as count FROM documents"
            
            total = query_one(total_query)
            
            return {
                "total": total["count"] if total else 0,
                "limit": limit,
                "offset": offset,
                "order_by": order_by,
                "has_embeddings_filter": has_embeddings,
                "documents": [
                    {
                        "doc_id": d["doc_id"],
                        "title": d["title"],
                        "authors": d["authors"],
                        "year": d["year"],
                        "created_at": d["created_at"],
                        "chunk_count": d["chunk_count"],
                        "embedded_count": d["embedded_count"],
                        "fully_embedded": d["chunk_count"] > 0 and d["chunk_count"] == d["embedded_count"],
                    }
                    for d in docs
                ],
            }
            
        except Exception as e:
            return {
                "error": str(e),
                "total": 0,
                "documents": [],
            }
  • The line where register_fetch_tools is called on the MCP instance, which in turn registers the list_documents tool via its @mcp.tool() decorator.
    register_fetch_tools(mcp)
  • The registration function that defines and registers multiple fetch tools, including list_documents, using @mcp.tool() decorators.
    def register_fetch_tools(mcp: FastMCP) -> None:
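
The handler's counting and filtering logic can be exercised without a PostgreSQL server. The sketch below rebuilds the same grouped query on SQLite from Python's standard library, under several assumptions: the table layouts are minimal guesses containing only the columns the query touches; the PostgreSQL-specific `::text` cast is dropped; placeholders are `?` (sqlite3) rather than `%s`; and `build_queries`, `having_clause`, and `demo` are hypothetical names. Two deliberate departures from the original: the count query wraps the same grouped query instead of repeating it per filter, and `d.doc_id` is appended to ORDER BY as a tie-breaker. The `year` and `title` orderings use `NULLS LAST`, which needs SQLite 3.30 or later; the demo only runs the default ordering.

```python
import sqlite3
from typing import Optional

# Whitelist mapping public sort keys to SQL columns (mirrors valid_order_by).
ORDER_COLUMNS = {"created_at": "d.created_at", "year": "d.year", "title": "d.title"}

GROUPED = """
    SELECT d.doc_id, d.title, d.year, d.created_at,
           COUNT(c.chunk_id) AS chunk_count,
           COUNT(ce.chunk_id) AS embedded_count
    FROM documents d
    LEFT JOIN chunks c ON d.doc_id = c.doc_id
    LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
    GROUP BY d.doc_id
"""


def having_clause(has_embeddings: Optional[bool]) -> str:
    """Return the HAVING filter for the three-state has_embeddings argument."""
    if has_embeddings is True:
        return " HAVING COUNT(c.chunk_id) > 0 AND COUNT(c.chunk_id) = COUNT(ce.chunk_id)"
    if has_embeddings is False:
        return " HAVING COUNT(c.chunk_id) > COUNT(ce.chunk_id)"
    return ""


def build_queries(order_by: str, has_embeddings: Optional[bool]) -> tuple[str, str]:
    """Build (page_sql, count_sql) from one shared grouped query."""
    grouped = GROUPED + having_clause(has_embeddings)
    column = ORDER_COLUMNS.get(order_by, "d.created_at")
    if order_by == "year":
        order = f" ORDER BY {column} DESC NULLS LAST"
    elif order_by == "title":
        order = f" ORDER BY {column} ASC NULLS LAST"
    else:
        order = f" ORDER BY {column} DESC"
    # doc_id tie-breaker keeps LIMIT/OFFSET pages stable when sort keys repeat.
    page_sql = grouped + order + ", d.doc_id LIMIT ? OFFSET ?"
    count_sql = f"SELECT COUNT(*) FROM ({grouped}) AS sub"
    return page_sql, count_sql


def demo() -> dict:
    """Run every filter against a tiny fixture; map filter -> (doc_ids, total)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT, year INTEGER, created_at TEXT);
        CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY, doc_id INTEGER);
        CREATE TABLE chunk_embeddings (chunk_id INTEGER PRIMARY KEY);
        INSERT INTO documents VALUES
            (1, 'A', 2020, '2024-01-01'), (2, 'B', NULL, '2024-01-02'), (3, 'C', 2019, '2024-01-03');
        -- doc 1: both chunks embedded; doc 2: one of two embedded; doc 3: no chunks at all
        INSERT INTO chunks VALUES (10, 1), (11, 1), (20, 2), (21, 2);
        INSERT INTO chunk_embeddings VALUES (10), (11), (20);
    """)
    result = {}
    for flag in (True, False, None):
        page_sql, count_sql = build_queries("created_at", flag)
        ids = [row[0] for row in conn.execute(page_sql, (50, 0))]
        total = conn.execute(count_sql).fetchone()[0]
        result[flag] = (ids, total)
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```

The fixture exposes a consequence of the HAVING clauses that the tool description does not spell out: document 3 has no chunks, so it satisfies neither `has_embeddings=True` (which requires `COUNT(c.chunk_id) > 0`) nor `has_embeddings=False` (0 > 0 is false), and it appears only when the filter is None. The tie-breaker matters because SQL leaves the relative order of rows with equal sort keys unspecified, so without it, LIMIT/OFFSET pages over many same-year documents can overlap or skip rows.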

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions '支持排序和筛选' (supports sorting and filtering) which adds some behavioral context, but doesn't disclose critical details like whether this is a read-only operation, potential rate limits, authentication requirements, or what happens with large result sets beyond pagination. For a list operation with no annotations, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with purpose statement, capabilities overview, parameter explanations, and return value description. It's appropriately sized with no wasted sentences, though the bilingual presentation (Chinese with some English terms) could be slightly optimized for consistency.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (implied by 'Has output schema: true'), the description doesn't need to detail return values. It covers the core functionality, all parameters, and basic capabilities. For a list operation with 4 parameters and no annotations, it's reasonably complete, though could benefit from more behavioral context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining all 4 parameters in detail. It provides clear semantics for 'limit' (result count limit), 'offset' (pagination), 'order_by' (sorting fields with options), and 'has_embeddings' (filtering condition with three states). This adds significant value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '列出所有已导入的文档' (list all imported documents) and '获取文献库中所有文档的摘要列表' (get a summary list of all documents in the literature library). It specifies the resource (documents) and verb (list/get), though it doesn't explicitly differentiate from sibling tools like 'get_document' or 'search_*' tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'search_fts_only', 'search_hybrid', or 'search_vector_only' for filtered searches, or 'get_document' for detailed single-document retrieval. Usage context is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
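The Behavior and Usage Guidelines gaps could be closed in the docstring itself, since that is what the server exposes as the tool description. The stub below is a hypothetical rewrite, not the author's text: the read-only, fallback, error, and zero-chunk statements follow from the handler in the Implementation Reference, while the roles given to `get_document` and the `search_*` siblings are inferred from their names only and should be checked against those tools.

```python
from typing import Any, Optional


def list_documents(
    limit: int = 50,
    offset: int = 0,
    order_by: str = "created_at",
    has_embeddings: Optional[bool] = None,
) -> dict[str, Any]:
    """List imported documents in the literature library (read-only).

    Returns one summary row per document (doc_id, title, authors, year,
    created_at, chunk_count, embedded_count, fully_embedded) plus a "total"
    that respects the has_embeddings filter. Use limit/offset to page.

    When to use: browsing the library or finding documents whose embeddings
    are incomplete. To read one document in detail, use get_document. To
    find passages by content, use search_hybrid, search_fts_only, or
    search_vector_only.

    Args:
        limit: maximum rows to return (default 50).
        offset: rows to skip for pagination (default 0).
        order_by: "created_at" (default, newest first), "year" (newest
            first, unknown years last), or "title" (A-Z). Unknown values
            fall back to "created_at".
        has_embeddings: True = every chunk embedded; False = at least one
            chunk missing an embedding; None = all documents. Documents with
            no chunks appear only when this is None.

    Errors are returned in an "error" field rather than raised.
    """
    # Description sketch only; the real handler is shown in the Implementation Reference.
    raise NotImplementedError("description sketch only")
```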

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server