paper-mcp

Name: paper-mcp
Author: MCPServings

by io.github.MCPServings

Server Details

Search arXiv/Semantic Scholar/OpenAlex + medical evidence (PubMed/Europe PMC) + LaTeX/PDF tools.

Status: Healthy
Last Tested: 2026-07-21 13:59
Transport: Streamable HTTP
URL
Repository: MCPServings/paper-mcp
GitHub Stars: 0

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality: B (3.2/5.0)

Tool Descriptions: B

Average 3.4/5 across 41 of 41 tools scored. Lowest: 2.3/5.

Server Coherence: A

Disambiguation: 3/5

Multiple tools overlap in functionality (e.g., forward citation search via get_paper_citations and get_openalex_citations; bibliography via get_paper_references and get_openalex_references). While descriptions clarify the source, an agent may mis-select without careful reading.
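
The overlap can be made explicit in agent-side code. The following is a minimal Python sketch, not part of the server: `pick_citation_tool` and its `direction` / `use_openalex` arguments are hypothetical names, and only the four tool names are taken from the evaluation above.

```python
# Hypothetical routing table: (use_openalex, direction) -> tool name.
# Only the tool names come from the server listing; the rest is illustrative.
_CITATION_TOOLS = {
    (False, "citations"): "get_paper_citations",
    (False, "references"): "get_paper_references",
    (True, "citations"): "get_openalex_citations",
    (True, "references"): "get_openalex_references",
}


def pick_citation_tool(direction: str, use_openalex: bool = False) -> str:
    """Return the tool name for a citation-graph lookup.

    direction: "citations" (papers citing the work) or
               "references" (the work's bibliography).
    """
    key = (bool(use_openalex), direction)
    if key not in _CITATION_TOOLS:
        raise ValueError(f"unknown direction: {direction!r}")
    return _CITATION_TOOLS[key]
```

Pinning the choice in a lookup like this keeps the source decision out of free-form tool selection, which is where the mis-selection risk sits.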

Naming Consistency: 4/5

Most tools follow a verb_noun pattern (e.g., search_papers, get_author, lint_latex). Some inconsistency between 'list_' and 'get_' prefixes (list_categories vs get_paper), but overall naming is clear and predictable.

Tool Count: 3/5

41 tools is high, covering many sub-domains (papers, authors, citations, OCR, datasets, medical search, etc.). While each tool appears justified, the count feels somewhat bloated for a single server, risking agent confusion.

Completeness: 5/5

The server offers a comprehensive set of operations for academic paper research: search, retrieval, full-text reading, citation/reference graphs, recommendation, author profiles, PDF extraction, formula/table recognition, and even medical evidence search. No critical gaps are apparent.

Available Tools

41 tools

autocomplete_papers (B)

Semantic Scholar: autocomplete paper titles for a partial query (fast type-ahead).

Parameters

Name	Required	Description	Default
`query`	Yes
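
For orientation, a call to this tool over MCP is a JSON-RPC `tools/call` request. The sketch below only builds the request body. The `build_tool_call` helper and the sample query string are illustrative assumptions; sending the request (session setup, headers, server URL) is left out because the listing does not give those details.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Sample partial title; the query text is made up for illustration.
payload = build_tool_call("autocomplete_papers", {"query": "attention is all"})
print(json.dumps(payload, indent=2))
```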

Tool Definition Quality: B (3.3/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'fast type-ahead' but does not disclose behavioral traits such as rate limits, result limits, or whether it is read-only. This leaves significant gaps for an agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, very concise and front-loaded with the purpose. However, it is somewhat terse and omits important details, preventing a perfect score.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema, no annotations), the description covers the basic purpose but lacks details about response format, limits, or error conditions. It is minimally viable but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'query' has no description in the schema (0% coverage). The description only says 'partial query (fast type-ahead)', adding minimal meaning beyond the schema. It does not explain input format or any constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it autocompletes paper titles for a partial query, specifying it's a fast type-ahead. This distinguishes it from sibling tools like search_papers which likely return full results.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for type-ahead scenarios, but it does not explicitly state when to use this tool vs. alternatives like search_papers or match_paper_title. More guidance on context would improve the score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

extract_pdf (A)

Extract a PDF to clean Markdown/LaTeX text via MinerU (great for papers behind no open-access full text — give the user's PDF and get readable text back). Provide pdf_url (downloaded server-side, SSRF-guarded) OR pdf_base64. formula/table toggle math/table reconstruction. Returns {task_id, status, cached, content, chars}: a recently-seen (cached) or small PDF comes back with content in one call; a fresh PDF (MinerU is GPU-heavy, minutes) returns status='running' + a task_id — then call extract_pdf_result(task_id) to fetch the text.

Parameters

Name	Required	Description	Default
`table`	No
`formula`	No
`pdf_url`	No
`pdf_base64`	No
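
The two input modes described above can be sketched as an argument builder. This is a minimal sketch under assumptions the schema shown here does not confirm: that `formula` and `table` are booleans, and that exactly one of `pdf_url` / `pdf_base64` should be sent. The `build_extract_pdf_args` helper is hypothetical.

```python
import base64
from typing import Any, Dict, Optional


def build_extract_pdf_args(
    pdf_url: Optional[str] = None,
    pdf_bytes: Optional[bytes] = None,
    formula: bool = True,  # assumed boolean toggle
    table: bool = True,    # assumed boolean toggle
) -> Dict[str, Any]:
    """Build arguments for extract_pdf from a URL or from raw PDF bytes."""
    # Assumption: the two input modes are mutually exclusive.
    if (pdf_url is None) == (pdf_bytes is None):
        raise ValueError("provide exactly one of pdf_url or pdf_bytes")
    args: Dict[str, Any] = {"formula": formula, "table": table}
    if pdf_url is not None:
        args["pdf_url"] = pdf_url
    else:
        # Raw bytes are base64-encoded into the pdf_base64 parameter.
        args["pdf_base64"] = base64.b64encode(pdf_bytes).decode("ascii")
    return args
```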

Tool Definition Quality: A (4.7/5.0)

Behavior: 5/5

Discloses critical behavioral traits: server-side download with SSRF guarding, caching for recent/small PDFs, async processing for large/fresh PDFs, and GPU-heavy nature. No annotations exist, so the description carries full burden and succeeds.

Conciseness: 4/5

The description is a coherent single paragraph that efficiently packs information. It is slightly long but well-structured with a logical flow. Could be split into multiple sentences for readability, but no waste.

Completeness: 5/5

Given the complexity (async, multiple input modes, caching, return structure), the description covers all necessary aspects. No output schema exists, but it describes the return values (task_id, status, cached, content, chars) and the follow-up tool.

Parameters: 5/5

With 0% schema description coverage, the description fully compensates by explaining all four parameters: pdf_url and pdf_base64 as input options, and formula/table as toggles for reconstruction. It adds value beyond the schema, e.g., 'SSRF-guarded' and 'GPU-heavy'.

Purpose: 5/5

The description clearly states it extracts a PDF to clean Markdown/LaTeX text via MinerU, specifically for papers without open-access full text. It distinguishes itself from similar siblings like read_paper by highlighting GPU-heavy processing and asynchronous behavior.

Usage Guidelines: 4/5

It explains when to use (for papers with no open-access full text) and mentions the asynchronous workflow requiring extract_pdf_result for large PDFs. It does not explicitly state when not to use or list alternatives, but the context is clear.

extract_pdf_result (A)

Fetch the result of an extract_pdf job by task_id. Returns {task_id, status, content, chars}: content is the extracted text once status='done'; while still 'running' content is null — call again shortly. Results expire server-side, so fetch reasonably soon.

Parameters

Name	Required	Description	Default
`task_id`	Yes
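
The polling behavior described above can be sketched as a small loop. This is an illustration under assumptions: `call_tool` is a hypothetical callable supplied by the MCP client that takes a tool name and arguments and returns the result as a dict, and the poll interval and attempt limit are arbitrary, since the listing does not say how long results are retained.

```python
import time
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_extraction(
    call_tool: ToolCaller,
    first_result: Dict[str, Any],
    poll_seconds: float = 5.0,  # arbitrary interval, not from the listing
    max_polls: int = 60,        # arbitrary cap, not from the listing
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll extract_pdf_result until content is available.

    first_result is the dict returned by the initial extract_pdf call.
    """
    result = first_result
    polls = 0
    while True:
        # Cached or finished jobs carry the text directly.
        if result.get("content") is not None:
            return result["content"]
        if result.get("status") != "running":
            raise RuntimeError(f"unexpected status: {result.get('status')!r}")
        if polls >= max_polls:
            raise TimeoutError(f"task {result.get('task_id')!r} still running")
        sleep(poll_seconds)
        result = call_tool("extract_pdf_result", {"task_id": result["task_id"]})
        polls += 1
```

Passing `sleep` as a parameter keeps the loop testable without real delays.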

Tool Definition Quality: A (4.2/5.0)

Behavior: 4/5

Discloses that content is null while status is 'running' and that results expire server-side, which are key behavioral traits. However, does not cover error cases or what happens with invalid task_id.

Conciseness: 5/5

Three sentences, front-loaded purpose, no redundant information. Every sentence adds value.

Completeness: 4/5

Adequately describes the expected response structure and polling behavior. Could specify the format of task_id (e.g., UUID) but not essential. Missing output schema but description covers key points.

Parameters: 4/5

With 0% schema coverage, the description explains task_id as coming from an extract_pdf job, which is sufficient for a single parameter. No additional details needed.

Purpose: 5/5

Description clearly states 'Fetch the result of an extract_pdf job by task_id', which is a specific verb+resource pair. It distinguishes itself from 'extract_pdf' (the submission tool) by focusing on result retrieval.

Usage Guidelines: 3/5

Implies usage after calling extract_pdf but does not explicitly state when to use vs alternatives or provide usage context. No mention of prerequisites or when not to use.

get_author (B)

Semantic Scholar: a single author's profile by id.

Parameters

Name	Required	Description	Default
`author_id`	Yes

Tool Definition Quality: B (3.1/5.0)

Behavior: 2/5

No annotations provided. Description does not disclose what the profile includes, authentication needs, rate limits, or any constraints beyond fetching a profile by ID.

Conciseness: 4/5

Extremely concise (8 words) and front-loaded with 'Semantic Scholar'. However, the brevity sacrifices detail needed for clarity.

Completeness: 3/5

For a simple tool with one parameter and no output schema, the description is minimally adequate but lacks specifics on author_id format and return value.

Parameters: 2/5

Schema has 0% description coverage for the only parameter. The description adds 'by id' but does not explain the format, example, or constraints for author_id.

Purpose: 5/5

The description clearly states the tool retrieves a single author's profile by ID from Semantic Scholar, with a specific verb and resource. It distinguishes from siblings like get_author_papers and search_authors.

Usage Guidelines: 2/5

No guidance on when to use this tool versus alternatives (e.g., get_author_papers, search_authors). No context on prerequisites or exclusions.

get_author_papers (C)

Semantic Scholar: all papers by a given author id, newest first.

Parameters

Name	Required	Description	Default
`start`	No
`author_id`	Yes
`max_results`	No
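
Since the parameters are undocumented, any paging code has to guess at their meaning. The sketch below assumes `start` is a zero-based offset and `max_results` is a page size, and that the tool result can be reduced to a list of paper dicts; none of this is confirmed by the listing. `fetch_page` is a hypothetical callable that wraps the actual tool call.

```python
from typing import Any, Callable, Dict, Iterator, List

PageFetcher = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def iter_author_papers(
    fetch_page: PageFetcher, author_id: str, page_size: int = 50
) -> Iterator[Dict[str, Any]]:
    """Yield an author's papers page by page (offset-style paging assumed)."""
    start = 0
    while True:
        page = fetch_page(
            {"author_id": author_id, "start": start, "max_results": page_size}
        )
        if not page:
            return
        yield from page
        # A short page is treated as the last one.
        if len(page) < page_size:
            return
        start += len(page)
```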

Tool Definition Quality: C (2.7/5.0)

Behavior: 2/5

No annotations are provided, so the description must carry the full behavioral burden. It only reveals sorting order (newest first) but omits pagination behavior, error responses, rate limits, or data freshness. For a tool with no annotations, this is insufficient.

Conciseness: 5/5

Single sentence, front-loaded with the core purpose. No extraneous content or repetition. It efficiently communicates the tool's main function.

Completeness: 2/5

Given the lack of annotations and output schema, the description should elaborate on return value structure, field contents, and error conditions. It only states 'all papers' with no indication of what fields are returned, leaving the agent underinformed.

Parameters: 1/5

Schema description coverage is 0%, and the description does not explain any of the three parameters (author_id, start, max_results) beyond the redundant 'author id'. The agent gains no additional insight into parameter meaning, defaults, or constraints.

Purpose: 4/5

Description clearly states the tool fetches all papers by a given author ID, sorted newest first. This is a specific verb-resource pair with distinct functionality. While siblings like search_papers or get_paper are not directly differentiated, the purpose is unambiguous.

Usage Guidelines: 2/5

No explicit guidance on when to use this tool vs alternatives like search_by_author or get_paper_authors. The context implies use cases for known author ID, but no when-not-to-use or prerequisites are provided.

get_authors_batch (B)

Semantic Scholar: fetch many authors at once by id.

Parameters

Name	Required	Description	Default
`ids`	Yes
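
Because the listing gives no maximum number of IDs per call, a caller may want to split long ID lists defensively. A minimal sketch, assuming `ids` is a list of strings; the `chunk_ids` helper and the default `batch_size` of 100 are hypothetical and do not reflect a documented server limit.

```python
from typing import Dict, List


def chunk_ids(ids: List[str], batch_size: int = 100) -> List[Dict[str, List[str]]]:
    """Split author ids into argument dicts for get_authors_batch.

    batch_size is an illustrative guess, not a documented limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Drop duplicates while keeping the first-seen order.
    unique = list(dict.fromkeys(ids))
    return [
        {"ids": unique[i : i + batch_size]}
        for i in range(0, len(unique), batch_size)
    ]
```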

Tool Definition Quality: B (3.0/5.0)

Behavior: 2/5

No annotations are provided, so the description carries the full burden. It does not disclose behavioral traits such as rate limits, result size constraints, error handling, or read-only nature beyond the implied 'fetch'.

Conciseness: 4/5

The description is one short sentence, front-loaded with the domain identifier. It is efficient with no wasted words, but could be expanded slightly without losing conciseness.

Completeness: 2/5

Given no output schema and one parameter, the description is too sparse. It does not explain the output format, max number of IDs allowed, or error handling, leaving significant gaps for a batch operation tool.

Parameters: 1/5

The input schema has one parameter (ids) with 0% description coverage. The description does not mention the parameter, its format, constraints, or examples, adding no value beyond the schema's bare structure.

Purpose: 5/5

The description clearly states the tool fetches many authors at once by ID from Semantic Scholar. The verb 'fetch' and resource 'authors' are specific, and the 'batch' aspect distinguishes it from the singular 'get_author' sibling.

Usage Guidelines: 3/5

The description implies usage for retrieving multiple authors by ID, but lacks explicit guidance on when to use this tool versus alternatives like get_author. No exclusions or context provided.

get_dataset_diffs (B)

Semantic Scholar Datasets: incremental diff (added/updated/deleted) for a dataset between two releases. Needs the key.

Parameters

Name	Required	Description	Default
`end_release`	No		latest
`dataset_name`	Yes
`start_release`	Yes
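
The required and optional parameters in the table above can be sketched as an argument builder. This is an illustration only: `build_dataset_diffs_args` is a hypothetical helper, and the release ID in the usage line is a made-up placeholder, since the listing does not document the release ID format. Leaving `end_release` out relies on the `latest` default shown in the table.

```python
from typing import Dict, Optional


def build_dataset_diffs_args(
    dataset_name: str, start_release: str, end_release: Optional[str] = None
) -> Dict[str, str]:
    """Build arguments for get_dataset_diffs."""
    if not dataset_name or not start_release:
        raise ValueError("dataset_name and start_release are required")
    args = {"dataset_name": dataset_name, "start_release": start_release}
    # Omit end_release so the server-side default ('latest') applies.
    if end_release is not None:
        args["end_release"] = end_release
    return args


# Placeholder release id, for illustration only.
diff_args = build_dataset_diffs_args("papers", "2024-01-02")
```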

Tool Definition Quality: B (3.0/5.0)

Behavior: 3/5

With no annotations, the description is the sole source of behavioral info. It indicates a read-only diff operation (not destructive), which is helpful. However, it lacks details about response format, pagination, and what the 'key' refers to (authentication vs field).

Conciseness: 5/5

The description is a single, compact sentence that conveys the core purpose and a key constraint. No unnecessary words, front-loaded with the function.

Completeness: 2/5

Given 3 parameters, no output schema, no annotations, and no parameter descriptions, the description is too sparse. It fails to explain key concepts like 'release', 'diff' format, or what the 'key' is. The agent lacks sufficient context to use this tool correctly.

Parameters: 2/5

The schema has 0% description coverage, so the description must compensate. It implies start_release and end_release as 'between two releases' but does not explain dataset_name or the optional end_release default. The 'key' phrase adds confusion as it is not a parameter.

Purpose: 4/5

The description clearly states it provides an incremental diff (added/updated/deleted) for a dataset between two releases. This distinguishes it from siblings like get_dataset_release (full dataset) and get_dataset_download_links. However, it does not explicitly mention the parameter dataset_name and the term 'key' is ambiguous.

Usage Guidelines: 2/5

The description only mentions 'Needs the key' as a prerequisite, but gives no guidance on when to use this tool versus other dataset-related siblings (e.g., get_dataset_release, list_dataset_releases). No alternatives or exclusions are provided.

get_dataset_download_links (A)

Semantic Scholar Datasets: get download links (presigned URLs) for one dataset in a release. Needs the API key.

Parameters

Name	Required	Description	Default
`release_id`	No		latest
`dataset_name`	Yes

Tool Definition Quality: A (3.8/5.0)

Behavior: 3/5

With no annotations, the description adds minimal behavioral context: it returns presigned URLs and requires an API key. It does not disclose potential expiration of URLs or further details, but the return type is hinted. For a simple read tool, this is adequate but not rich.

Conciseness: 5/5

A single concise sentence packs the essential information: what the tool does and a prerequisite. No redundant words.

Completeness: 3/5

Given the absence of output schema and low parameter documentation, the description is incomplete. It does not specify the return format, pagination, or any limitations. However, for a straightforward download-link retriever, it provides the core context.

Parameters: 2/5

Schema coverage is 0%, yet the description only implicitly refers to 'one dataset' (dataset_name) and does not explain release_id or its default value. The description does not compensate for the lack of schema documentation, leaving parameter meanings unclear.

Purpose: 5/5

The description clearly states the tool gets download links (presigned URLs) for one dataset in a release. The verb 'get' and resource 'download links' are specific, and it distinguishes itself from sibling tools like get_dataset_diffs and get_dataset_release.

Usage Guidelines: 4/5

The description mentions the need for an API key, which is a usage prerequisite. It implicitly indicates when to use (to obtain download links for a dataset) but does not explicitly exclude cases or compare with siblings. The differentiation is clear from the description alone.

get_dataset_release (B)

Semantic Scholar Datasets: which datasets a release contains (papers, abstracts, citations, embeddings, s2orc, tldrs…). release_id defaults to 'latest'.

Parameters

Name	Required	Description	Default
`release_id`	No		latest

Tool Definition Quality: B (3.1/5.0)

Behavior: 2/5

No annotations are provided, so the description carries the full burden. It only states what the tool returns (list of dataset types) and a default parameter value. It does not disclose behavioral traits such as read-only nature, error handling for invalid release_id, rate limits, or any side effects. For a tool with no annotations, this is insufficient.

Conciseness: 5/5

The description is a single concise sentence that front-loads the tool's purpose and includes key context (the default parameter value). Every word earns its place with no redundancy or unnecessary detail.

Completeness: 3/5

Given the tool's simplicity (one optional parameter, no output schema or annotations), the description is minimally adequate. It tells what the tool does and the default parameter. However, it lacks details about the return format, error scenarios, and what exactly 'datasets' includes. For a complex sibling set, more completeness would help.

Parameters: 2/5

The input schema has no description for the parameter release_id (0% coverage). The description adds only that it defaults to 'latest', which is already in the schema default field. It does not explain the format, possible values, or criteria for valid release IDs. Since schema coverage is low, the description should compensate, but it barely adds value.

Purpose: 4/5

The description clearly states the verb 'get', the resource 'dataset release', and what it returns ('which datasets a release contains') with examples like papers, abstracts, citations. It distinguishes from sibling list_dataset_releases by focusing on contents of a single release. However, it could be more precise about the output format.

Usage Guidelines: 3/5

Usage is implied: use when you need to know what datasets are in a specific release. It mentions the default value for release_id but provides no explicit guidance on when to use this tool versus alternatives like list_dataset_releases. No when-not-to-use or exclusion criteria are given.

get_openalex_citations (A)

OpenAlex: papers that CITE this work (forward citation graph), most-cited first.

Parameters

Name	Required	Description	Default
`start`	No
`work_id`	Yes
`max_results`	No
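
OpenAlex identifies works with keys made of `W` followed by digits, often shown as `https://openalex.org/W...` URLs. Whether this tool accepts the bare key, the URL, or both is not stated in the listing, so the sketch below simply normalizes either form to the bare key before building arguments. The `normalize_work_id` helper is hypothetical.

```python
import re

# Matches a bare OpenAlex work key or its openalex.org URL form.
_WORK_ID = re.compile(r"^(?:https?://openalex\.org/)?(W\d+)$", re.IGNORECASE)


def normalize_work_id(raw: str) -> str:
    """Reduce an OpenAlex work key or URL to the bare 'W<digits>' form."""
    match = _WORK_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"not an OpenAlex work id: {raw!r}")
    return match.group(1).upper()
```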

Tool Definition Quality: A (3.5/5.0)

Behavior: 3/5

Annotations are absent, so the description must cover behavioral traits. It discloses the citation direction and sort order but lacks details on pagination behavior, rate limits, authentication needs, or result structure. This is adequate but not comprehensive.

Conciseness: 5/5

The description is a single, well-structured sentence that conveys the core purpose without extraneous information. It is front-loaded and efficiently uses simple words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the moderate complexity (3 parameters, no output schema, no annotations), the description is incomplete. It does not explain the output format, how pagination works via start and max_results, or any OpenAlex-specific details (e.g., API limits, fields returned). Sibling tools suggest more context is available.
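The pagination gap noted above can be made concrete with a small sketch. It builds an MCP `tools/call` JSON-RPC request for get_openalex_citations and derives `start` from a page number. Reading `start` as a zero-based offset and `max_results` as a page size is an assumption, since the tool description does not say; the helper names, the page size, and the work id are placeholders.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def citation_page_args(work_id: str, page: int, page_size: int = 25) -> Dict[str, Any]:
    """Arguments for one page of citing works.

    ASSUMPTION: `start` is a zero-based offset and `max_results` is the
    page size. The server's actual semantics are undocumented.
    """
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size >= 1")
    return {"work_id": work_id, "start": page * page_size, "max_results": page_size}


# "W0000000000" is a placeholder, not a real OpenAlex work id.
request = build_tool_call(
    "get_openalex_citations",
    citation_page_args("W0000000000", page=2),
)
print(json.dumps(request, indent=2))
```

If the server instead treats `start` as a page index or a cursor, only `citation_page_args` would need to change.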

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description adds no meaning to the three parameters (start, work_id, max_results). The required work_id is not explained, and the optional parameters lack context beyond their names. Schema or description should clarify the format of work_id.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that this tool returns forward citations (papers that cite the given work) sorted by most-cited first. It uses specific language ('papers that CITE this work') and distinguishes itself from siblings like get_openalex_references (backward citations).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for obtaining citing papers but does not explicitly state when to use it versus alternatives like get_openalex_references or get_paper_citations. No exclusions or usage conditions are provided, leaving the agent to infer context from the tool name and siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_openalex_references (C)

OpenAlex: the works this one REFERENCES (its bibliography).

Parameters

Name	Required	Description	Default
`work_id`	Yes
`max_results`	No

Tool Definition Quality

C 2.4/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden but is too brief. It does not disclose authentication needs, rate limits, error handling, or what happens if work_id is invalid. Only the basic purpose is stated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely short (8 words), which is concise but omits critical details. It could include parameter information without becoming verbose; as written, it is under-specified.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of output schema and annotations, the description is incomplete. It does not explain the return value, pagination, or how max_results is used, leaving the agent without sufficient context for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no meaning beyond the input schema. Schema coverage is 0%; the parameters work_id and max_results are not explained. The agent gets no clue about the work_id format or how max_results affects results.
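To illustrate what "schema coverage" could mean here, the sketch below compares a bare input schema with a documented one and computes the share of parameters that carry a description. The coverage formula, the parameter types, and the description wording are all assumptions made for illustration; the page does not publish its scoring method, and the real server schema may differ.

```python
from typing import Any, Dict


def description_coverage(schema: Dict[str, Any]) -> float:
    """Fraction of top-level properties with a non-empty `description`.

    ASSUMPTION: this is one plausible definition of coverage, not the
    scorer's documented formula.
    """
    props = schema.get("properties", {})
    if not props:
        return 1.0  # nothing to describe
    described = sum(1 for p in props.values() if str(p.get("description", "")).strip())
    return described / len(props)


# Bare schema: names only (types are assumed).
bare = {
    "type": "object",
    "properties": {
        "work_id": {"type": "string"},
        "max_results": {"type": "integer"},
    },
    "required": ["work_id"],
}

# Hypothetical documented variant; the wording is illustrative only.
documented = {
    "type": "object",
    "properties": {
        "work_id": {
            "type": "string",
            "description": "Identifier of the work whose bibliography is requested.",
        },
        "max_results": {
            "type": "integer",
            "description": "Maximum number of referenced works to return.",
        },
    },
    "required": ["work_id"],
}

print(description_coverage(bare), description_coverage(documented))
```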

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves references (bibliography) of a work, distinguishing it from the sibling tool get_openalex_citations, which retrieves citing works. However, it does not use an explicit verb such as 'get' or 'list', so the action is only implied.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like get_openalex_citations or get_paper_references. The agent must infer usage from the description alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_openalex_trends (A)

OpenAlex: publication-trend analytics for a query — counts grouped by year (default), or by 'institutions.id', 'authorships.author.id', 'open_access.is_oa', 'type', 'language'. Returns aggregate counts only (cheap, no rows).

Parameters

Name	Required	Description	Default
`query`	Yes
`group_by`	No		publication_year

Tool Definition Quality

A 4.2/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool is cheap and returns only aggregate counts, but does not mention authorization, rate limits, or potential side effects. For a read-only query tool, this is moderate transparency; it lacks details on data freshness or pagination.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loaded with the tool's purpose and key behavior (aggregate counts, cheap). Every word adds value; no redundancy. It is well-structured and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has only 2 parameters, no output schema, and no nested objects, the description covers all essential aspects: input (query, group_by), output (aggregate counts per group), and behavior (cheap). It is complete enough for an agent to use correctly, though it could mention supported grouping values more formally.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description provides essential meaning not in the schema. It explains that 'query' is the search query and lists the possible values for 'group_by' (e.g., 'institutions.id', 'authorships.author.id'), including the default 'publication_year'. This adds significant value beyond the bare schema titles.
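Because the grouping values live only in the free-text description, a caller may want to check `group_by` before sending a request. The sketch below does that with the values the description lists plus the `publication_year` default. Whether the server accepts other values is unknown, so the allow-list is an assumption, and `trends_args` is a hypothetical helper name.

```python
from typing import Dict

# Values taken from the tool description; the server may accept more.
KNOWN_GROUP_BY = frozenset({
    "publication_year",        # default per the parameter table
    "institutions.id",
    "authorships.author.id",
    "open_access.is_oa",
    "type",
    "language",
})


def trends_args(query: str, group_by: str = "publication_year") -> Dict[str, str]:
    """Build arguments for get_openalex_trends, rejecting unlisted groupings."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if group_by not in KNOWN_GROUP_BY:
        raise ValueError(f"unlisted group_by value: {group_by!r}")
    return {"query": query, "group_by": group_by}


print(trends_args("graph neural networks"))
print(trends_args("graph neural networks", group_by="open_access.is_oa"))
```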

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'publication-trend analytics for a query' with counts grouped by specified fields. It lists the default grouping (year) and alternative group_by options, and distinguishes itself from siblings by noting it returns aggregate counts only (cheap, no rows). This contrasts with other tools that retrieve individual works or detailed data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the tool's purpose and output type (aggregate counts), but does not explicitly specify when to use this tool versus alternatives like search_openalex_works. However, the context from sibling names and the phrase 'Returns aggregate counts only (cheap, no rows)' provides implicit guidance that this is for trends, not detailed records.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_openalex_work (A)

OpenAlex: fetch one work's full record (316M-work, all-field corpus). id accepts OpenAlex Wxxxx, a DOI, or an arXiv id.

Parameters

Name	Required	Description	Default
`work_id`	Yes

Tool Definition Quality

A 3.7/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavioral traits. It does not mention any side effects, authentication requirements, rate limits, or that it is a read-only operation. The description only covers input format, not what happens on execution.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no redundant information. It front-loads the purpose and immediately clarifies ID formats, making it efficient for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description mentions the scale (316M-work corpus) but does not describe what the returned record contains or any limitations. With no output schema, the description could be more complete by hinting at the response structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, but the description adds essential meaning to the 'work_id' parameter by specifying accepted formats (OpenAlex Wxxxx, DOI, arXiv ID). This compensates well for the schema gap, though it could provide more format detail.
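The "more format detail" the assessment asks for might look like the following sketch, which classifies a `work_id` string into the three accepted kinds. The regular expressions are rough assumptions about each format, not the server's validation rules: old-style arXiv ids such as `hep-th/9901001` and URL-form DOIs are deliberately not handled, and the sample ids are placeholders.

```python
import re

# ASSUMPTION: approximate patterns, not the server's own validation.
OPENALEX_RE = re.compile(r"W\d+")
DOI_RE = re.compile(r"10\.\d{4,9}/\S+")
ARXIV_RE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")  # new-style ids only


def classify_work_id(work_id: str) -> str:
    """Return 'openalex', 'doi', 'arxiv', or 'unknown' for an identifier."""
    value = work_id.strip()
    if OPENALEX_RE.fullmatch(value):
        return "openalex"
    if DOI_RE.fullmatch(value):
        return "doi"
    if ARXIV_RE.fullmatch(value):
        return "arxiv"
    return "unknown"


for sample in ("W123456789", "10.1000/example", "2101.00001v2", "not-an-id"):
    print(sample, "->", classify_work_id(sample))
```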

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches one work's full record from OpenAlex, specifying the accepted ID formats (OpenAlex Wxxxx, DOI, arXiv ID). This is specific and distinguishes it from sibling tools like search_openalex_works or get_openalex_citations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for fetching a single work record but does not provide explicit guidance on when to use this tool versus alternatives (e.g., for batch fetching or citations). No when-not-to-use or prerequisite information is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_paper (B)

Fetch one paper by id, with full abstract and PDF link.

Parameters

Name	Required	Description	Default
`source`	No		arxiv
`paper_id`	Yes

Tool Definition Quality

B 3/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the burden. It mentions output includes abstract and PDF link, which is helpful. However, it does not disclose error behavior (e.g., if paper_id not found) or any side effects. For a simple fetch, this is acceptable but minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence of 11 words, very concise. However, it omits important details about parameters, which slightly reduces efficiency. It is front-loaded but under-specified.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple 2-parameter tool with no output schema, the description is mostly adequate. It would benefit from explaining the source parameter and possibly the return format. It covers the primary use case but lacks completeness regarding parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has two parameters (source, paper_id) with 0% description coverage. The description only mentions 'by id' but does not explain the 'source' parameter or its default value. This leaves ambiguity for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it fetches one paper by id with full abstract and PDF link. However, it does not differentiate from the sibling 'read_paper' which may have similar functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool versus alternatives like 'read_paper' or 'search_papers'. No when-to-use or when-not-to-use information is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_paper_authors (C)

Semantic Scholar: the authors of a paper (with h-index, paper/citation counts).

Parameters

Name	Required	Description	Default
`start`	No
`paper_id`	Yes
`max_results`	No

Tool Definition Quality

C 2.7/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description does not disclose behavioral traits such as pagination, error handling, authentication requirements, or rate limits. It gives only minimal information about the returned data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with clear front-loading of source (Semantic Scholar) and resource. Efficient but could be expanded to improve other dimensions.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (3 parameters, 40 sibling tools, no output schema, no annotations), the description is too brief. It omits pagination details, the return structure, and any comparison to similar tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description fails to explain the usage of 'start' and 'max_results' parameters. The description only mentions author fields but does not clarify parameter purposes beyond the schema itself.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states that the tool retrieves the authors of a paper with h-index, paper counts, and citation counts, which clearly defines the resource and key fields. However, it could be more explicit about the verb (e.g., 'get' or 'retrieve').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus siblings like get_author (single author) or search_authors. No mention of prerequisites, filtering, or alternative use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_paper_citations (B)

Semantic Scholar: papers that CITE this one (forward citation graph). id accepts S2 id / DOI: / ARXIV: / CorpusId:.

Parameters

Name	Required	Description	Default
`start`	No
`paper_id`	Yes
`max_results`	No

Tool Definition Quality

B 3.4/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must cover behavioral traits. It only mentions ID formats and basic purpose, missing details on authentication, rate limits, or behavior on invalid IDs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, efficient and front-loaded. No wasted words, but could benefit from slightly more detail without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, there are no annotations, and only 1 of 3 parameters is described. The description is too incomplete for an agent to understand the return format or pagination behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only paper_id gets some explanation (accepted formats). start and max_results are not described; since schema coverage is 0%, the description fails to add meaning for these parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves papers that cite the given paper (forward citation graph) and specifies accepted ID formats. It distinguishes from siblings like get_paper_references.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for forward citations and lists ID formats, but does not explicitly state when not to use the tool or compare it with alternatives like get_paper_references.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_paper_references (C)

Semantic Scholar: papers this one REFERENCES (its bibliography). id accepts S2 id / DOI: / ARXIV: / CorpusId:.

Parameters

Name	Required	Description	Default
`start`	No
`paper_id`	Yes
`max_results`	No

Tool Definition Quality

C 2.7/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description must disclose behavioral traits. It only mentions the source (Semantic Scholar) and accepted ID formats. It does not state that the operation is read-only, any rate limits, required permissions, or what happens if the ID is invalid. The minimal disclosure is insufficient given the lack of annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one concise sentence that front-loads the core purpose, with no redundant words. It could be structured to list parameters or usage scenarios, but it remains efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 parameters, no output schema, and many sibling tools, the description is incomplete. It lacks information on pagination, return structure, or filtering. The agent would need to infer too much to use this tool correctly in complex scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It only adds meaning for paper_id (acceptable formats). Parameters start and max_results are not explained at all. An agent would not understand pagination or the effect of these parameters without additional context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves papers referenced by a given paper (its bibliography). It specifies accepted identifier formats, which adds specificity. It does not explicitly distinguish itself from sibling tools like get_paper_citations or get_openalex_references, although the mention of 'references' is sufficient to imply the direction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No usage guidance is provided. The description does not indicate when to use this tool over alternatives (e.g., get_paper_citations for citations, or get_openalex_references for OpenAlex data). An agent would lack context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_papers_batch (A)

Semantic Scholar: fetch many papers at once by id (S2/DOI:/ARXIV:/CorpusId:), up to ~500 per call.

Parameters

Name	Required	Description	Default
`ids`	Yes

Tool Definition Quality

A 4.3/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description covers important behavioral traits: the batch limit (~500) and accepted ID formats. However, it does not disclose behavior on invalid IDs, partial results, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence with no wasted words. Front-loaded with the service name and key action ('fetch many papers at once by id'). Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema or annotations, the description covers the ID formats and batch limit adequately. Lacks details on output format or error behavior, which would be helpful for a batch operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, but the description adds significant meaning: it specifies the exact ID formats (S2, DOI, ARXIV, CorpusId) and the batch limit, which are not present in the schema.
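A caller holding more ids than the stated limit would need to split them before calling the tool. The sketch below chunks a list into batches of at most 500, taking the description's "~500 per call" at face value. The exact server limit is not documented here, so `batch_size` is an assumption, and `chunk_ids` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def chunk_ids(ids: Sequence[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each at most `batch_size` long.

    ASSUMPTION: 500 mirrors the "~500 per call" wording; lower it if the
    server rejects full-size batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for offset in range(0, len(ids), batch_size):
        yield list(ids[offset:offset + batch_size])


# Placeholder ids using the CorpusId: prefix from the tool description.
ids = [f"CorpusId:{n}" for n in range(1200)]
batches = list(chunk_ids(ids))
print([len(batch) for batch in batches])
```

Each batch would then be passed as the `ids` argument of one get_papers_batch call.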

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches many papers at once by ID, specifying supported ID formats (S2, DOI, ARXIV, CorpusId) and a batch limit of ~500. This distinguishes it from siblings like get_paper (single) and search_papers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when fetching multiple papers by ID, but does not explicitly state when to use this tool versus alternatives like get_paper or search_papers. No exclusion criteria or alternative recommendations are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lint_latex (A)

Lint a LaTeX snippet: report errors and return an auto-fixed version. Input code (the LaTeX source). Returns {errors, fixed_code, summary_en, summary_zh, elapsed_ms}.

Parameters

Name	Required	Description	Default
`code`	Yes

Tool Definition Quality

A 4.4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, but the description discloses the tool's outputs (errors, fixed_code, summaries, elapsed_ms) and implies it is non-destructive. It does not mention authentication or rate limits, but the behavioral context is clear for a simple linting operation.
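Since the return shape is given only as a list of key names, a client could model it as below and check a response for missing keys. The key names come from the description; the value types in `LintResult` and the element shape of `errors` are assumptions, and `missing_keys` is a hypothetical helper.

```python
from typing import Any, Dict, List, TypedDict


class LintResult(TypedDict):
    """Assumed shape of a lint_latex response (types are guesses)."""
    errors: List[Any]      # element structure is not documented
    fixed_code: str
    summary_en: str
    summary_zh: str
    elapsed_ms: float


# Key names as listed in the tool description.
EXPECTED_KEYS = ("errors", "fixed_code", "summary_en", "summary_zh", "elapsed_ms")


def missing_keys(payload: Dict[str, Any]) -> List[str]:
    """Return the expected keys absent from `payload`, sorted by name."""
    return sorted(set(EXPECTED_KEYS) - set(payload))


# A deliberately partial payload, made up for illustration.
partial = {"errors": [], "fixed_code": "\\section{Intro}"}
print(missing_keys(partial))
```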

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences: the first explains the purpose and summarizes the output, the second names the input, and the third details the return structure. No unnecessary words, well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers the essential purpose, input, and outputs. It could mention typical error types or limitations, but is generally complete enough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'code' has 0% schema coverage. The description adds 'the LaTeX source' which clarifies its purpose beyond the schema's generic 'Code'. This adds meaningful context, though format constraints could be detailed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool lints LaTeX snippets, reports errors, and returns an auto-fixed version. This is specific and distinct from sibling tools (no other linting tools are present).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for LaTeX linting, and given the sibling tools are unrelated (search, paper, author tools), it effectively distinguishes itself. However, no explicit when-not or alternative suggestions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_categories (C)

List common subject category codes for filtering/recent.

Parameters

Name	Required	Description	Default
`source`	No		arxiv

Tool Definition Quality

C 2.9/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It only states the action without disclosing any behavioral traits like read-only nature, authentication needs, or side effects, which is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, concise but lacking structure and clarity; it uses informal phrasing ('for filtering/recent') that could be more precise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity and lack of output schema, the description covers the basic purpose but omits details on output format, parameter behavior, and potential edge cases, leaving gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, and the description does not mention the 'source' parameter or explain its meaning beyond the schema default, adding no value over the structured fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it lists subject category codes for filtering or recent-paper listing, with a specific verb and resource. It does not explicitly differentiate itself from siblings, although no other sibling tool lists category codes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage context ('for filtering/recent') but provides no explicit guidance on when to use this tool vs alternatives, nor any when-not-to-use conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_dataset_releases (A)

Semantic Scholar Datasets: list all available release ids (dated snapshots of the full corpus).

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A (3.6/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry full behavioral disclosure. It only states the basic operation, without mentioning any behavioral traits like pagination, rate limits, authentication, or what a release id entails.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is concise and front-loaded with 'Semantic Scholar Datasets:' for context. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a parameterless list tool with no output schema, the description is complete enough. It conveys the purpose clearly, though it could optionally indicate the format or ordering of release ids.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters, so the schema coverage is 100%. Per guidelines, baseline is 4. The description adds no parameter info, but none is needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists all available release ids (dated snapshots of the full corpus), using a specific verb and resource. It is easily distinguishable from sibling tools, which focus on papers, authors, searches, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives, such as get_dataset_release or other list tools. There is no mention of prerequisites or context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_ocr_models (A)

List the OCR models available for recognize_formula / recognize_table.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A (3.7/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, and the description does not disclose behavioral traits such as output format, rate limits, or side effects. That is minimal disclosure, even for a read-only listing tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, non-redundant sentence that conveys the core purpose efficiently, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (no params, no output schema), the description is adequate but could be improved by specifying what the list contains (e.g., model IDs, names).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With zero parameters and 100% schema coverage, the description adds no param-specific info but baseline is 4 as per guidelines.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists OCR models and specifies they are for use with recognize_formula and recognize_table, distinguishing it from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context (before recognition tools) but does not explicitly state when to use or provide alternatives, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_openalex_topics (B)

OpenAlex: search the topic taxonomy (~4500 topics) to find the right subject term for filtering or recent-work queries.

Parameters

Name	Required	Description	Default
`query`	Yes
`max_results`	No

Tool Definition Quality

B (3.2/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full responsibility for behavioral disclosure. The description only states the purpose and does not mention any behavioral traits such as read-only nature, pagination, rate limits, or output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the core functionality. While concise, it could benefit from additional context without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 parameters, no output schema), the description provides minimal but adequate context for an AI agent to infer basic usage. However, it lacks details on parameter format and behavior, which may lead to suboptimal invocations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description partially compensates by indicating the query parameter is for searching topic names. However, it does not describe the max_results parameter or provide details on expected input formats.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches the OpenAlex topic taxonomy (~4500 topics) to find subject terms, which is a specific verb and resource. However, it does not explicitly differentiate from sibling search tools like search_authors or search_openalex_works.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for filtering or recent-work queries, providing context for when to use this tool. However, it lacks explicit guidance on when not to use it or comparisons to alternatives among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_paper_sources (A)

List available paper corpora.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. 'List' implies a non-destructive read operation, which is transparent. No hidden behaviors or contradictions are present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, zero wasted words. Perfectly concise for a simple tool with no parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description does not explain what a 'paper corpus' is or what the output format looks like. For a simple list tool, additional context could be helpful but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so baseline is 4. The description adds no parameter info, but schema coverage is 100%, so no gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'list' and resource 'available paper corpora', clearly indicating what the tool does. It distinguishes from sibling tools that list other entities like categories or dataset releases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

While no explicit when-to-use or when-not-to-use guidance is given, the name and description clearly indicate its purpose, and among siblings it is unique enough to infer when to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_recent (B)

List the latest papers in a subject category, newest first.

Parameters

Name	Required	Default
`start`	No
`source`	No	arxiv
`category`	Yes
`max_results`	No
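
Since the description leaves `start` and `max_results` undocumented, the sketch below shows one plausible reading of them, wrapped in a standard MCP `tools/call` JSON-RPC request. It assumes that `start` is a zero-based offset and `max_results` a page size; neither is confirmed by the listing. The helper name `list_recent_request` and the category code `cs.CL` are illustrative only.

```python
import json


def list_recent_request(category, page=0, page_size=10, source="arxiv", request_id=1):
    """Build an MCP tools/call request for list_recent.

    Assumption (not stated by the tool description): `start` is a
    zero-based offset and `max_results` is the number of items per page.
    """
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "list_recent",
            "arguments": {
                "category": category,
                "source": source,
                "start": page * page_size,  # offset of the first item on this page
                "max_results": page_size,
            },
        },
    }


if __name__ == "__main__":
    # Third page of ten results for a hypothetical category code.
    print(json.dumps(list_recent_request("cs.CL", page=2), indent=2))
```

If the server interprets `start` differently (for example as a date or a cursor), only the `start` computation would need to change.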

Tool Definition Quality

B (3.2/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses only that results are 'newest first'; it does not mention pagination, the default source, or which fields are returned. It lacks essential behavioral detail for an agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, very concise. However, it is slightly under-specified; a bit more detail on parameters could be added without losing conciseness. Still, it follows the principle of being front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters, no output schema, and no annotations, the description is incomplete. It fails to explain parameters, return format, or behavior like pagination and default source. An agent would have insufficient context to use it reliably.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so description must compensate. However, it only implies the 'category' parameter via the phrase 'subject category', and does not explain 'start', 'source', or 'max_results'. Adds almost no meaning beyond the schema names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List'), the resource ('papers'), and the scope (the 'latest' papers in a 'subject category', 'newest first'). This differentiates it from sibling tools like search_papers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing recent papers in a category, but provides no explicit guidance on when to use this tool versus alternatives like search_papers or list_categories. It also states no when-not-to-use conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

match_paper_title (A)

Semantic Scholar: find the single paper whose title best matches the given text (exact-match lookup).

Parameters

Name	Required	Description	Default
`title`	Yes

Tool Definition Quality

A (3.5/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the full behavioral burden. However, it contains an internal contradiction: 'best matches the given text' suggests fuzzy matching while 'exact-match lookup' suggests exact string matching. This ambiguity undermines transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. It front-loads the key purpose and constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and no annotations, the description should provide more behavioral and return-value context. It states the tool returns a single paper but does not describe the output format or fields, making it incomplete for an agent to understand results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must explain the parameter. It only says 'given text', which barely adds meaning beyond the parameter name 'title'. It does not specify case sensitivity, formatting, or whether partial matches are allowed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool finds a single paper whose title best matches given text via exact-match lookup. It uses a specific verb ('find') and resource ('paper title match'), and distinguishes from sibling tools like search_papers by emphasizing 'single paper' and 'exact-match'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing an exact or best match to a title text. It contrasts with search siblings by specifying 'single paper', but does not explicitly exclude alternatives or mention when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_paper (A)

Read a paper's full text. format='markdown' (default, body with formulas as $LaTeX$), 'html' (raw LaTeXML HTML), or 'latex' (the original LaTeX manuscript from the e-print source). arXiv only; id like 2401.01234.

Parameters

Name	Required	Default
`format`	No	markdown
`source`	No	arxiv
`paper_id`	Yes
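
The description names three `format` values and gives one example id, which is enough for a client-side pre-check before calling the tool. The sketch below is one way to do that. It assumes new-style arXiv identifiers (four digits, a dot, four or five digits, optional version suffix), matching the example `2401.01234`; whether the server also accepts old-style ids such as `hep-th/9901001` is not stated, so the pattern here rejects them. The helper name `read_paper_arguments` is hypothetical.

```python
import re

# The three values listed in the tool description.
ALLOWED_FORMATS = {"markdown", "html", "latex"}

# Assumption: new-style arXiv ids only (YYMM.NNNN or YYMM.NNNNN, optional vN).
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")


def read_paper_arguments(paper_id, fmt="markdown"):
    """Validate inputs and return the arguments dict for read_paper."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}, got {fmt!r}")
    if not ARXIV_ID.fullmatch(paper_id):
        raise ValueError(f"{paper_id!r} does not look like a new-style arXiv id")
    # The description says the tool is arXiv only, so source is fixed here.
    return {"paper_id": paper_id, "format": fmt, "source": "arxiv"}


if __name__ == "__main__":
    print(read_paper_arguments("2401.01234", fmt="latex"))
```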

Tool Definition Quality

A (4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the burden of behavioral transparency. It discloses the available formats and the arXiv-only constraint, but does not mention potential errors, rate limits, or size limitations. As a read operation, safety is implied but not stated. The description is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences: the first states the core function, the second lists the format options, and the third adds the source constraint and an example ID. Every word serves a purpose, and the most important information is front-loaded. No redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of three parameters and no output schema, the description covers the format options, the paper_id format, and the source limitation. It is mostly complete but lacks details about the return structure or error behavior. It sufficiently addresses the main user needs for a tool that reads paper text.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, so the description must compensate. It explains the 'format' parameter in detail (markdown with $LaTeX$ formulas, raw HTML, original LaTeX), and gives an example for 'paper_id'. The 'source' parameter is implicitly described as arXiv-only, but not explicitly explained in the parameter context. Overall, it adds significant meaning beyond the bare schema, especially for format options.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's verb ('Read'), resource ('paper's full text'), and specifics such as available formats (markdown, html, latex) and the source (arXiv only). It distinguishes itself from sibling tools like 'get_paper' by focusing on full text retrieval rather than metadata. The example ID clarifies the required format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives an example of the paper_id format and limits usage to arXiv, implying when to use the tool. However, it does not explicitly contrast with alternatives like 'get_paper' or 'search_papers', nor does it state when not to use it. Usage is implied but lacks explicit guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recognize_formula (A)

Recognize a math formula from an image and return LaTeX. Provide image_url (downloaded server-side) OR image_base64. model: deepseek-ocr (default), paddleocr-vl, or texify. Returns {latex, model, elapsed_ms}.

Parameters

Name	Required	Default
`model`	No	deepseek-ocr
`image_url`	No
`image_base64`	No
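
Both image parameters are optional in the schema, while the description asks for one or the other, so a caller may want to enforce that rule locally. The sketch below does so under two assumptions that the listing does not confirm: that supplying both inputs should be treated as an error, and that `image_base64` expects plain standard base64 without a data-URI prefix. The helper name `recognize_formula_arguments` is hypothetical; the model names come from the description.

```python
import base64

# Model names as listed in the tool description.
KNOWN_MODELS = ("deepseek-ocr", "paddleocr-vl", "texify")


def recognize_formula_arguments(image_url=None, image_bytes=None, model="deepseek-ocr"):
    """Return the arguments dict for recognize_formula.

    Assumption: exactly one of image_url / image_bytes must be given.
    """
    if (image_url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of image_url or image_bytes")
    if model not in KNOWN_MODELS:
        raise ValueError(f"model must be one of {KNOWN_MODELS}, got {model!r}")
    arguments = {"model": model}
    if image_url is not None:
        arguments["image_url"] = image_url
    else:
        # Assumption: plain standard base64, no "data:image/...;base64," prefix.
        arguments["image_base64"] = base64.b64encode(image_bytes).decode("ascii")
    return arguments


if __name__ == "__main__":
    print(recognize_formula_arguments(image_url="https://example.org/formula.png"))
```

The same shape should apply to recognize_table, since its listed parameters are identical.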

Tool Definition Quality

A (4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses the return format including elapsed_ms and that image_url is downloaded server-side. Lacking annotations, the description carries the burden but omits details like supported image formats, size limits, or error behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at four short sentences, front-loading the purpose and then covering input, model, and output details without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers return format and input options. It implies one of image_url or image_base64 must be provided, but does not clarify that both are optional in schema, nor does it discuss error conditions. Still, it is fairly complete for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description adds meaning by explaining the model parameter with a list of options and default, and clarifies the OR relationship between image_url and image_base64, which is not evident from the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb ('Recognize'), the resource ('math formula from an image'), and the output ('return LaTeX'). It distinguishes itself from sibling tools like recognize_table by specifying 'math formula' and listing the return format as LaTeX.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides instructions on how to provide input (image_url or image_base64) and lists model options. However, it does not explicitly differentiate from sibling tools like recognize_table or indicate when not to use this tool, leaving room for confusion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recognize_table (A)

Recognize a table from an image and return LaTeX tabular code. Provide image_url OR image_base64. model: deepseek-ocr (default), paddleocr-vl, or texify. Returns {latex, model, elapsed_ms}.

Parameters

Name	Required	Default
`model`	No	deepseek-ocr
`image_url`	No
`image_base64`	No

Tool Definition Quality

A (4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the output format {latex, model, elapsed_ms} and the input constraints (one of image_url or image_base64). However, it does not mention behavior when both inputs are provided, error conditions, or any side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at four short sentences. The first states the core function and output, the next two cover inputs and model options, and the last gives the return shape. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (3 params, no output schema, no annotations), the description covers the essential aspects: function, input methods, model choices, and output format. It could elaborate on error handling or model differences, but is largely sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description explains the 'model' parameter with valid values and states the either/or requirement for image_url and image_base64. This adds significant meaning beyond the schema's type/default fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Recognize a table from an image' and the output 'return LaTeX tabular code'. This verb+resource+output combination is specific and distinguishes it from siblings like 'recognize_formula'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for table recognition and specifies input options (image_url or image_base64) and model choices. However, it does not explicitly state when to use this tool over siblings like 'recognize_formula' or provide exclusions for specific scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recommend_papers_for_paper (A)

Semantic Scholar: recommend papers similar to one paper. pool='recent' (last open corpus) or 'all-cs' (all of CS). If the 'recent' pool yields nothing (common for older papers), it automatically retries the 'all-cs' pool.

Parameters

Name	Required	Default
`pool`	No	recent
`paper_id`	Yes
`max_results`	No
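
The retry behavior in the description can be pictured with a small sketch. This is an illustration of the described fallback, not the server's actual code: `fetch` is a hypothetical stand-in for whatever upstream recommendation call the server makes, and the default `max_results` of 10 is an assumption.

```python
def recommend_with_fallback(fetch, paper_id, pool="recent", max_results=10):
    """Try the requested pool; if 'recent' yields nothing, retry with 'all-cs'.

    `fetch(paper_id, pool, max_results)` is a hypothetical callable that
    returns a list of recommended papers. Returns (pool_used, results).
    """
    results = fetch(paper_id, pool, max_results)
    if not results and pool == "recent":
        pool = "all-cs"
        results = fetch(paper_id, pool, max_results)
    return pool, results


if __name__ == "__main__":
    # Stub that mimics an older paper: nothing in 'recent', one hit in 'all-cs'.
    def fake_fetch(paper_id, pool, max_results):
        return [] if pool == "recent" else ["related-paper"]

    print(recommend_with_fallback(fake_fetch, "some-paper-id"))
```

Returning the pool that actually produced the results lets a caller tell a direct hit from a fallback, which the description does not say the tool reports.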

Tool Definition Quality

A (3.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the retry logic when 'recent' pool yields no results, but omits details on rate limits, authorization, or output format. Given no annotations, this is partial transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each adding value: purpose, pool options, and retry behavior. No wasted words, front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the description covers purpose and pool behavior, it lacks details on return format, error handling, and other parameters. With no output schema, this leaves gaps for agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description adds meaning for the 'pool' parameter (explaining valid values and retry behavior), but provides no information for 'paper_id' or 'max_results', leaving the agent to infer their semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool recommends papers similar to a given paper, with explicit pool options. It distinguishes from siblings like recommend_papers_from_examples by specifying it takes a single paper as input.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on pool selection and the automatic retry behavior. However, it does not explicitly contrast with sibling recommendation tools, so usage context is clear but not exhaustive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recommend_papers_from_examples (grade B)

Semantic Scholar: recommend papers from positive (and optional negative) example paper ids.

Parameters

Name	Required	Description	Default
`max_results`	No
`negative_ids`	No
`positive_ids`	Yes

Tool Definition Quality

B (3/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must convey behavior. It states that papers are recommended based on examples, but lacks details on algorithm, limitations, or error handling. Adequate but minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence, front-loaded with the tool's source and function. Could be slightly more detailed without losing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 3 parameters and no output schema. The description fails to explain parameters or return value, leaving the agent with insufficient information for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain the meaning or constraints of parameters like 'max_results', 'positive_ids', or 'negative_ids'. No value added over parameter names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the action (recommend papers) and the resource (from positive and optional negative example paper IDs). It distinguishes the tool from siblings like 'recommend_papers_for_paper', which uses a single paper.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description does not mention when-not or provide any context for selection among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_all (grade A)

Aggregated search across arXiv, Semantic Scholar and OpenAlex at once. Fans out concurrently, de-duplicates the same work across corpora (by DOI or title) and re-ranks with Reciprocal Rank Fusion, so papers found by several sources rank highest. Each hit lists which sources found it and an ids map ({source: id}) you can pass to get_paper / read_paper / the citation tools. Prefer this over search_papers for a broad lookup.

Parameters

Name	Required	Default
`query`	Yes
`sources`	No	arxiv,semanticscholar,openalex
`per_source`	No
`max_results`	No
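
The fusion step in the description above can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion with de-duplication by DOI or normalized title, not the server's actual implementation: the hit fields (`id`, `title`, `doi`), the constant `k = 60`, and the helper names `work_key` and `fuse` are all assumptions made for the example.

```python
import re
from collections import defaultdict


def work_key(hit):
    """De-duplication key: the DOI when present, otherwise a normalized title."""
    doi = (hit.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = re.sub(r"[^a-z0-9]+", " ", hit.get("title", "").lower()).strip()
    return "title:" + title


def fuse(results_by_source, k=60):
    """Merge ranked lists; each appearance adds 1 / (k + rank) to a work's score."""
    scores = defaultdict(float)
    merged = {}
    for source, hits in results_by_source.items():
        for rank, hit in enumerate(hits, start=1):
            key = work_key(hit)
            scores[key] += 1.0 / (k + rank)
            entry = merged.setdefault(
                key, {"title": hit.get("title", ""), "sources": [], "ids": {}}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
                entry["ids"][source] = hit["id"]
    ordered = sorted(merged, key=lambda key: scores[key], reverse=True)
    return [dict(merged[key], score=scores[key]) for key in ordered]


# Hypothetical hits, invented for illustration only.
results = {
    "arxiv": [
        {"id": "a1", "title": "Paper A", "doi": "10.1/a"},
        {"id": "a2", "title": "Paper B"},
    ],
    "openalex": [
        {"id": "W9", "title": "paper b"},
        {"id": "W7", "title": "Paper C"},
    ],
}
fused = fuse(results)
```

In this toy input, "Paper B" is found by both sources, so its two reciprocal-rank contributions add up and it moves ahead of "Paper A", which only one source ranked first. Note that a key built this way will not merge a hit that carries a DOI with a copy of the same work that lacks one; a real implementation would need to handle that case.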

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations were provided, but the description discloses key behavioral traits: concurrent fan-out, de-duplication, and Reciprocal Rank Fusion. It also describes the output structure (sources, ids map). No information on rate limits or performance, but it adds value beyond schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, dense paragraph that conveys all necessary information efficiently. Each sentence adds value, and the key recommendation is placed at the end. Could be slightly improved by structuring, but overall well-written.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no output schema, and no annotations, the description explains the high-level behavior well but lacks detail on parameter semantics and error handling. The tool has moderate complexity, and the description provides sufficient context for broad understanding but not full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain individual parameters like query, sources, per_source, or max_results. Only hints about output fields are given. The agent must infer parameter meaning from context, which is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs aggregated search across arXiv, Semantic Scholar, and OpenAlex, with de-duplication and RRF re-ranking. It also differentiates from sibling tool search_papers by explicitly recommending this tool for broad lookups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises preferring this tool over search_papers for broad lookups. It also explains the output format and how to pass the ids to other tools. However, it does not say when not to use it, nor does it mention any prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_authors (grade B)

Semantic Scholar: search for authors by name; returns profiles with h-index and paper/citation counts.

Parameters

Name	Required	Description	Default
`query`	Yes
`start`	No
`max_results`	No
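
Since the description does not document pagination, the sketch below shows one plausible way a client could page through results with `start` and `max_results`. It assumes that `start` is a zero-based offset and that the tool returns a plain list of records; neither is confirmed by the tool definition. `call_tool`, `iter_results`, and the fake backend are hypothetical stand-ins for a real MCP client.

```python
def iter_results(call_tool, tool, query, page_size=20, max_pages=5):
    """Yield results page by page, advancing `start` by the number of items received."""
    start = 0
    for _ in range(max_pages):
        page = call_tool(tool, {"query": query, "start": start, "max_results": page_size})
        yield from page
        if len(page) < page_size:  # a short (or empty) page means the end was reached
            break
        start += len(page)


# Stand-in backend with 45 fake author records (hypothetical data, not real output).
AUTHORS = [f"author-{i}" for i in range(45)]


def fake_author_tool(name, args):
    """Pretend tool call: slice the fake records by the assumed offset and page size."""
    return AUTHORS[args["start"]: args["start"] + args["max_results"]]


found = list(iter_results(fake_author_tool, "search_authors", "smith"))
```

The `max_pages` cap keeps an agent from looping indefinitely if the server ignores `start` and keeps returning the same full page.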

Tool Definition Quality

B (3.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It partially discloses behavior by stating the output includes h-index and counts, but it does not mention any safety traits (e.g., read-only), pagination limits, authentication needs, or whether results are sorted. For a search tool, this is adequate but incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that efficiently conveys the source (Semantic Scholar), action, and output. It is front-loaded and contains no extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 3 parameters and no output schema, the description is serviceable but lacks guidance on pagination, result limits, or how to handle large result sets. It does not fully orient the agent on the tool's capabilities relative to siblings.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains the 'query' parameter implicitly via 'search for authors by name', but does not add meaning for 'start' or 'max_results' beyond their schema titles. The schema titles 'Start' and 'Max Results' are somewhat self-explanatory, but the description offers no additional context on defaults or behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (search), resource (authors), and method (by name). It also specifies the return fields (h-index, paper/citation counts). However, it does not differentiate from sibling tools like 'get_author' or 'search_by_author', which could cause confusion about when to prefer this tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for searching authors by name but provides no guidance on when not to use this tool, nor does it mention alternatives among the many sibling author-related tools. There is no context about prerequisites or edge cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_by_author (grade C)

Find papers by a specific author, newest first.

Parameters

Name	Required	Default
`start`	No
`author`	Yes
`source`	No	arxiv
`max_results`	No

Tool Definition Quality

C (2.3/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description carries full burden. It only reveals the sorting order but does not mention read-only nature, rate limits, authentication needs, or any side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely short (8 words) and lacks necessary details. While concise, it under-specifies the tool, making it less helpful than it could be.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 4 parameters, no output schema, no annotations, and only a terse description, the tool definition is severely incomplete. An agent would lack sufficient context to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description adds no explanation for the four parameters (author, start, source, max_results). The tool name implies author is the filter, but no details are provided about defaults or expected formats.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Find papers'), resource ('papers'), filter ('by a specific author'), and sort order ('newest first'). However, it does not differentiate from the sibling tool 'get_author_papers', which may have similar functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'search_papers' or 'get_author_papers'. The description lacks any context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_medical (grade A)

Evidence-graded MEDICAL literature search (PubMed + Europe PMC). Unlike search_all (generic, ranks high-cited reviews/guidelines above trials), this filters by research type via PubMed Publication-Type tags and re-ranks by the evidence pyramid (meta-analysis / systematic review > RCT > cohort > ...), so the actual clinical trials surface first. Open-access full text is pulled from Europe PMC by PMID. query should be English keyword/boolean text (PubMed maps it); do natural-language/multilingual understanding upstream. Returns hits with pmid/doi/study_type/evidence_level/citations/abstract and, when open-access, fulltext.

Parameters

Name	Required	Default
`query`	Yes
`year_from`	No
`max_results`	No
`study_types`	No	rct,meta-analysis,systematic-review
`fetch_fulltext`	No
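
The evidence-pyramid re-ranking described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the server's code: the numeric tiers, the `cohort` label, and the tie-break on citation count are invented for the example, while the `pmid`, `study_type`, and `citations` field names follow the return fields listed in the description.

```python
# Lower tier = stronger evidence. Meta-analyses and systematic reviews share the
# top tier, following the "meta-analysis / systematic review > RCT > cohort" order.
EVIDENCE_TIER = {
    "meta-analysis": 0,
    "systematic-review": 0,
    "rct": 1,
    "cohort": 2,
}
UNKNOWN_TIER = len(EVIDENCE_TIER) + 1  # anything unrecognized sorts last


def rerank_by_evidence(hits):
    """Sort by evidence tier first, then by citation count (highest first)."""
    return sorted(
        hits,
        key=lambda h: (
            EVIDENCE_TIER.get(h.get("study_type"), UNKNOWN_TIER),
            -h.get("citations", 0),
        ),
    )


# Hypothetical hits, invented for illustration only.
hits = [
    {"pmid": "1", "study_type": "cohort", "citations": 900},
    {"pmid": "2", "study_type": "rct", "citations": 40},
    {"pmid": "3", "study_type": "meta-analysis", "citations": 15},
    {"pmid": "4", "study_type": "rct", "citations": 120},
]
ranked = rerank_by_evidence(hits)
```

The highly cited cohort study ends up last here, which is the contrast with citation-driven ranking that the description draws against search_all.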

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full burden. It details significant behaviors: evidence pyramid re-ranking, filtering by PubMed Publication-Type tags, open-access full text retrieval from Europe PMC via PMID, and the return fields. It does not mention rate limits, authentication, or data freshness, but the disclosed behaviors are substantial.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at about 5 sentences, front-loaded with the core purpose. It efficiently packs purpose, differentiation, behavioral details, parameter hints, and return fields. However, the second sentence is quite long and somewhat dense.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no output schema), the description covers purpose, usage guidelines, behavioral transparency, and parameter semantics well. It lists return fields and explains ranking logic. While it could mention more about the output format or error handling, it is fairly complete for an AI agent to understand selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description adds considerable value. It explains that 'query' should be English keyword/boolean text. For 'study_types', it mentions 'PubMed Publication-Type tags', and the schema default ('rct,meta-analysis,systematic-review') shows the expected value format. 'fetch_fulltext' has implied context. 'year_from' and 'max_results' are not explained, but the overall parameter context is good.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Evidence-graded MEDICAL literature search (PubMed + Europe PMC)', clearly stating the tool's purpose. It distinguishes from sibling 'search_all' by explaining differences in ranking (evidence pyramid vs high-cited reviews) and filtering by research type. The verb 'search' and resource 'medical literature' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly contrasts with 'search_all' and advises when to use this tool ('actual clinical trials surface first'). It specifies that queries should be English keyword/boolean text and suggests doing natural-language understanding upstream, providing clear usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_openalex_authors (grade C)

OpenAlex: search authors; returns profiles with h-index, i10-index, works/citation counts and institutions.

Parameters

Name	Required	Description	Default
`query`	Yes
`start`	No
`max_results`	No

Tool Definition Quality

C (2.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description lists the output fields, providing some behavioral insight, but fails to disclose pagination behavior, sorting, authentication needs, or any side effects. Since annotations are absent, the description should cover more behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, making it concise, but it omits crucial details about parameters and usage. Brevity without completeness reduces effectiveness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with three parameters and no output schema, the description is insufficient. It does not explain query semantics, result format, pagination, or how it differs from similar tools, leaving major gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description offers no explanation of the three input parameters (query, start, max_results). With 0% schema description coverage, the description was needed to clarify parameter meaning, but it provides none, leaving the agent to infer from names only.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches authors and specifies the returned fields (h-index, i10-index, citations, institutions), distinguishing from sibling tools that search works or institutions. However, it does not differentiate from the sibling 'search_authors' which may have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like 'get_author' for specific author IDs or 'search_authors' which might have different features. The description lacks any contextual usage instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_openalex_institutions (grade B)

OpenAlex: search institutions (universities, labs) with ROR id, country, works/citation counts.

Parameters

Name	Required	Description	Default
`query`	Yes
`max_results`	No

Tool Definition Quality

B (3.2/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It hints at filtering criteria but does not disclose behavior like pagination, rate limits, or read-only nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence is concise and front-loaded with source ('OpenAlex') and resource ('institutions'). Could include more structure but is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and 0% schema coverage; description only partially covers what is returned (ROR id, country, counts) but lacks details on structure, pagination, or edge cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so description must compensate. It mentions possible search fields but does not explain how they relate to the 'query' parameter or provide format/syntax details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it searches institutions (universities, labs) in OpenAlex, with specific fields like ROR id, country, and citation counts. Differentiates from sibling tools like search_authors or search_openalex_works.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus others, but the name and description imply it's for institution searches. Lacks exclusions or alternative tool mentions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_openalex_works (grade B)

OpenAlex: advanced filtered work search. Filters: from_year, to_year, is_oa (open access only), min_citations, institution_id. sort_by: relevance|newest|cited.

Parameters

Name	Required	Default
`is_oa`	No
`query`	No
`sort_by`	No	relevance
`to_year`	No
`from_year`	No
`max_results`	No
`min_citations`	No
`institution_id`	No
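
As an illustration of how the filters and `sort_by` values listed above fit together, the sketch below assembles an argument dictionary for this tool. The parameter names and the three `sort_by` values come from the definition; the value types (integer years, boolean `is_oa`), the year-order check, and the helper name `openalex_works_args` are assumptions, since the schema documents none of them.

```python
ALLOWED_SORT = {"relevance", "newest", "cited"}


def openalex_works_args(query=None, from_year=None, to_year=None, is_oa=None,
                        min_citations=None, institution_id=None,
                        sort_by="relevance", max_results=None):
    """Build the argument dict, dropping unset filters and rejecting bad values."""
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"sort_by must be one of {sorted(ALLOWED_SORT)}")
    if from_year is not None and to_year is not None and from_year > to_year:
        raise ValueError("from_year must not be later than to_year")
    args = {
        "query": query,
        "from_year": from_year,
        "to_year": to_year,
        "is_oa": is_oa,
        "min_citations": min_citations,
        "institution_id": institution_id,
        "sort_by": sort_by,
        "max_results": max_results,
    }
    # Only send parameters the caller actually set.
    return {name: value for name, value in args.items() if value is not None}


args = openalex_works_args(query="protein folding", from_year=2020, is_oa=True, sort_by="cited")
```

Validating `sort_by` on the client side gives an agent a clear error before the call is made, rather than depending on however the server reports an unknown value.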

Tool Definition Quality

B (3.3/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It mentions filters and sort options but omits critical details like pagination, rate limits, empty result handling, or the role of the 'query' parameter. The description is insufficiently transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: a single sentence followed by a compact list. It is front-loaded with the tool's purpose. However, it could be more structured by grouping filters or adding brief usage notes.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 8 parameters and no output schema, the description lacks completeness. It does not explain the 'query' or 'max_results' parameters, nor does it describe the return value or result limits. A user cannot fully understand the tool's behavior from this description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds some meaning by explaining filters (e.g., 'is_oa (open access only)', 'sort_by: relevance|newest|cited'), but it covers only 6 of 8 parameters and omits 'query' and 'max_results'. With 0% schema coverage, the description partially compensates but remains incomplete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an 'advanced filtered work search' for OpenAlex, listing specific filters and sort options. This distinguishes it from siblings like get_openalex_work (single work retrieval) and search_papers (general paper search).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through its title and filter listing, but it does not explicitly state when to use this tool over alternatives or provide any exclusions. The agent is left to infer context from the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_papers (grade C)

Search academic papers. Returns normalized hits with a short abstract preview; call get_paper for the full record.

Parameters

Name	Required	Default
`query`	Yes
`start`	No
`source`	No	arxiv
`sort_by`	No	relevance
`max_results`	No

Tool Definition Quality

C (2.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavior. It states that the tool returns 'normalized hits with a short abstract preview', which is useful, but it gives no details on side effects, authentication, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with two sentences, front-loading the purpose. However, the extreme brevity sacrifices necessary detail for a tool with multiple parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and no schema descriptions, the description is incomplete. It omits critical details like pagination, source selection, and sorting, which are essential for correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% for 5 parameters, but the description does not explain any parameter. It fails to add semantics beyond the schema, leaving all parameters undocumented in plain language.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Search academic papers' with a specific verb and resource. It distinguishes itself from 'get_paper' by mentioning it returns previews, but doesn't explicitly differentiate from other sibling search tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description suggests using 'get_paper' for full records, providing a clear alternative. However, it doesn't specify when to use this tool vs. other sibling search tools or any prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_papers_bulk (grade B)

Semantic Scholar: bulk paper search (up to 1000 hits, sortable e.g. 'citationCount:desc' or 'publicationDate:desc', with a continuation token). Filters: fields_of_study, year (e.g. '2020-2024'), venue, publication_types, open_access_pdf.

Parameters

Name	Required	Description	Default
`sort`	No
`year`	No
`query`	Yes
`token`	No
`venue`	No
`max_results`	No
`fields_of_study`	No
`open_access_pdf`	No
`publication_types`	No
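
The continuation-token mechanism mentioned in the description can be sketched as a simple loop. This is a hedged illustration only: the response shape (a dict with `hits` and `token` keys), the `call_tool` callable, and the helper `collect_bulk` are assumptions, because the tool publishes no output schema. The `year` and `sort` example values are taken from the description.

```python
def collect_bulk(call_tool, query, limit=1000, **filters):
    """Follow continuation tokens until `limit` hits are collected or no token remains."""
    hits, token = [], None
    while len(hits) < limit:
        args = {"query": query, **filters}
        if token:
            args["token"] = token
        page = call_tool("search_papers_bulk", args)
        page_hits = page.get("hits", [])
        hits.extend(page_hits)
        token = page.get("token")
        if not token or not page_hits:  # no further pages, or an empty page
            break
    return hits[:limit]


# Stand-in for a real MCP client call, serving two fixed pages (hypothetical data).
def fake_bulk_tool(name, args):
    pages = {
        None: {"hits": ["p1", "p2"], "token": "t1"},
        "t1": {"hits": ["p3"], "token": None},
    }
    return pages[args.get("token")]


papers = collect_bulk(
    fake_bulk_tool,
    "graph neural networks",
    year="2020-2024",
    sort="citationCount:desc",
)
```

Stopping on an empty page as well as on a missing token guards against a server that keeps returning a token after the results have run out.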

Tool Definition Quality

B (3.4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key behaviors: returns up to 1000 hits, supports sorting and continuation token. However, with no annotations, it fails to mention whether it's read-only, rate limits, or authorization needs. Partially transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence plus a compact filter list efficiently conveys the core functionality. Front-loads key info. Could be slightly more structured for quick scanning, but no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters, no output schema, and no annotations, the description covers main purpose and filter options but lacks details on parameter formats, output structure, and pagination behavior. Adequate but leaves gaps for autonomous agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaning beyond pure schema by listing filter parameters and giving sort examples. But it doesn't explain format for all parameters (e.g., venue, publication_types) or constraints. Incomplete but adds value above baseline (0% schema coverage).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a bulk paper search from Semantic Scholar with specific capabilities (up to 1000 hits, sorting, continuation token). Mentioning filters and sortable fields distinguishes it from sibling tools like search_papers and other search variants.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. It implies bulk usage but doesn't contrast with search_papers (likely for fewer results) or other search tools. Missing when-not-to-use and prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_snippets (B)

Semantic Scholar: search INSIDE paper full text and return matching text snippets (not just titles/abstracts).

Parameters

Name	Required	Description	Default
`query`	Yes
`max_results`	No

Tool Definition Quality

B (3.4/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must cover behavioral traits. It only states basic function (search and return snippets) without details on result format, limits, or pagination. This is insufficient for a tool with no additional metadata.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with no extraneous words. It is front-loaded with the key differentiator.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool and the lack of output schema or annotations, the description is adequate but has gaps: it does not mention result structure, error handling, or how to interpret snippets. More detail would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, yet the description does not explain the parameters (query, max_results) beyond the implied search context. For a low-coverage schema, the description should add meaning, but it falls short.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches inside paper full text and returns snippets, explicitly distinguishing it from title/abstract searches. This differentiation from siblings like search_papers is strong.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for cases that need in-text snippets, but it does not state when not to use it or compare it with alternatives. Sibling context suggests broader searches belong to other tools, but there is no direct guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
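The claim file described above could be generated with a short script like the one below. This is a sketch under stated assumptions: the JSON structure and schema URL are taken from the snippet above, while the `web_root` directory, the function names, and the basic email check are illustrative and not part of Glama's tooling.

```python
import json
from pathlib import Path
from typing import Dict, List

# Schema URL copied from the claim snippet shown above.
GLAMA_SCHEMA = "https://glama.ai/mcp/schemas/connector.json"


def build_claim(emails: List[str]) -> Dict[str, object]:
    """Return the claim document for the given maintainer emails."""
    if not emails:
        raise ValueError("at least one maintainer email is required")
    for email in emails:
        # Cheap sanity check only; Glama performs the real verification.
        if "@" not in email:
            raise ValueError(f"not an email address: {email!r}")
    return {
        "$schema": GLAMA_SCHEMA,
        "maintainers": [{"email": email} for email in emails],
    }


def write_claim(web_root: str, emails: List[str]) -> Path:
    """Write <web_root>/.well-known/glama.json and return its path.

    `web_root` is assumed to be the directory your server exposes at "/".
    """
    target = Path(web_root) / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(build_claim(emails), indent=2) + "\n", encoding="utf-8")
    return target
```

For example, `write_claim("public", ["your-email@example.com"])` would produce `public/.well-known/glama.json`; the address used must be the one tied to your Glama account, as noted above.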
