
paperqa-mcp-server

by menyoung

index_status

Check paper index health to diagnose query failures by showing indexed papers, errors, and unindexed documents.

Instructions

Check the health of the paper index.

Returns a summary of how many papers are indexed, how many have errors, and how many are unindexed. Use this to diagnose why paper_qa queries might be failing or timing out.
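As a sketch of how an agent might act on this diagnostic before querying: the implementation below appends "Ready to query." to the summary when the index is healthy, so that sentinel can gate paper_qa calls. The status string here is illustrative, not real output.

```python
# Hypothetical pre-flight check before calling paper_qa.
# index_status_result stands in for the string the index_status tool returns;
# the counts are made up for illustration.
index_status_result = (
    "Index status: 12/14 papers indexed, 1 errors, 1 unindexed. Ready to query.\n"
    "  Indexed: 12\n"
    "  Errors:  1\n"
    "  Unindexed: 1\n"
    "  Total files: 14"
)

# Gate further queries on the readiness sentinel in the message.
ready = "Ready to query." in index_status_result
print(ready)
```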

Input Schema


No arguments

Output Schema

Name      Required
result    Yes

Implementation Reference

  • The tool registration and entry point for 'index_status'.
    @mcp.tool()
    async def index_status() -> str:
        """Check the health of the paper index.
    
        Returns a summary of how many papers are indexed, how many have
        errors, and how many are unindexed. Use this to diagnose why
        paper_qa queries might be failing or timing out.
        """
        status = _index_status()
        lines = [
            f"Index status: {status['message']}",
            f"  Indexed: {status['indexed']}",
            f"  Errors:  {status['errored']}",
            f"  Unindexed: {status['unindexed']}",
            f"  Total files: {status['total']}",
        ]
        return "\n".join(lines)
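For illustration, here is what the formatted summary looks like for a hypothetical status dict (a sketch; the counts do not come from a real index):

```python
# Sketch: formatting a hypothetical status dict the same way index_status does.
status = {
    "message": "2/3 papers indexed, 1 errors",
    "indexed": 2,
    "errored": 1,
    "unindexed": 0,
    "total": 3,
}
lines = [
    f"Index status: {status['message']}",
    f"  Indexed: {status['indexed']}",
    f"  Errors:  {status['errored']}",
    f"  Unindexed: {status['unindexed']}",
    f"  Total files: {status['total']}",
]
summary = "\n".join(lines)
print(summary)
```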
  • The helper function containing the actual logic for checking the index status.
    import pathlib
    import pickle
    import zlib

    # Settings, _settings, and _UNINDEXED_THRESHOLD are defined elsewhere
    # in the module.
    def _index_status(settings: Settings | None = None) -> dict:
        """Read the index manifest and compare against files in the paper directory.
    
        Returns a dict with keys: indexed, errored, unindexed, total, ready, message.
        """
        if settings is None:
            settings = _settings()
        index_name = settings.get_index_name()
        index_dir = pathlib.Path(settings.agent.index.index_directory) / index_name
        paper_dir = pathlib.Path(settings.agent.index.paper_directory)
        files_filter = settings.agent.index.files_filter
    
        # Discover files PaperQA would try to index (same filter as paperqa)
        total = 0
        if paper_dir.is_dir():
            total = sum(1 for f in paper_dir.rglob("*") if files_filter(f))
    
        # Read the manifest
        manifest_path = index_dir / "files.zip"
        manifest: dict[str, str] = {}
        manifest_error = False
        if manifest_path.exists():
            try:
                manifest = pickle.loads(zlib.decompress(manifest_path.read_bytes()))
            except Exception:
                manifest_error = True
    
        errored = sum(1 for v in manifest.values() if v == "ERROR")
        indexed = len(manifest) - errored
        unindexed = max(0, total - len(manifest))
    
        ready = unindexed <= _UNINDEXED_THRESHOLD and not manifest_error
        if manifest_error:
            message = (
                f"Index manifest is corrupt ({total} files on disk)."
                " Rebuild the index from the terminal"
                " — see the paperqa-mcp-server README, step 6."
            )
        else:
            message = f"{indexed}/{total} papers indexed"
            if errored:
                message += f", {errored} errors"
            if unindexed:
                message += f", {unindexed} unindexed"
            if ready:
                message += ". Ready to query."
            else:
                message += (
                    ". Queries will fail or time out."
                    " Please finish building the index from the terminal"
                    " — see the paperqa-mcp-server README, step 6."
                )
    
        return {
            "indexed": indexed,
            "errored": errored,
            "unindexed": unindexed,
            "total": total,
            "ready": ready,
            "message": message,
        }
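The counting logic above can be exercised end to end with a throwaway manifest. This assumes the manifest is a zlib-compressed pickled dict mapping file names to status strings, as the read path above implies; the "OK" value is an assumed placeholder, since the helper only distinguishes "ERROR":

```python
import pathlib
import pickle
import tempfile
import zlib

# Build a throwaway manifest in the same on-disk format _index_status reads:
# a zlib-compressed pickle of {file name: status string}. "ERROR" marks a
# failed file; "OK" is an assumed placeholder for a successful one.
manifest = {"a.pdf": "OK", "b.pdf": "ERROR", "c.pdf": "OK"}

with tempfile.TemporaryDirectory() as tmp:
    manifest_path = pathlib.Path(tmp) / "files.zip"
    manifest_path.write_bytes(zlib.compress(pickle.dumps(manifest)))
    loaded = pickle.loads(zlib.decompress(manifest_path.read_bytes()))

# Same counting the helper performs, with the total file count on disk
# assumed to be 4 (one file not yet in the manifest).
total = 4
errored = sum(1 for v in loaded.values() if v == "ERROR")
indexed = len(loaded) - errored
unindexed = max(0, total - len(loaded))
print(indexed, errored, unindexed)  # → 2 1 1
```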
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It effectively discloses behavioral traits: it's a read-only diagnostic tool (implied by 'Check' and 'Returns'), and it specifies what information is returned (summary of indexed papers, errors, unindexed). However, it doesn't mention potential rate limits or authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by usage guidance. Both sentences earn their place by providing essential information without redundancy, making it highly efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (0 parameters, no annotations, but has an output schema), the description is complete. It explains what the tool does, when to use it, and what it returns, which is sufficient since the output schema will handle return value details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters with 100% schema coverage, so the baseline is 4. The description appropriately doesn't add parameter details, as none are needed, and instead focuses on the tool's purpose and output.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Check the health', 'Returns a summary') and resources ('paper index'), distinguishing it from the sibling 'paper_qa' tool by focusing on diagnostic status rather than querying content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly states when to use this tool ('to diagnose why paper_qa queries might be failing or timing out'), providing clear context and distinguishing it from the alternative sibling tool 'paper_qa'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
