Server Details

Papers With Code MCP — browse ML research papers and their code repositories

Status: Healthy
Transport: Streamable HTTP
Repository: pipeworx-io/mcp-paperswithcode
GitHub Stars: 0

Tool Descriptions: B

Average 3.8/5 across 9 of 9 tools scored. Lowest: 2.9/5.

Server Coherence: A

Disambiguation: 4/5

Most tools have clearly distinct purposes (e.g., search_papers vs. get_paper vs. get_repositories). However, the generic tools (ask_pipeworx, discover_tools, and the memory tools) overlap in purpose with the domain-specific tools, causing mild ambiguity.

Naming Consistency: 3/5

Tool names mix verb_noun patterns (search_papers, get_paper, get_repositories) with generic single-word verbs (forget, recall, remember) and the nonstandard 'ask_pipeworx' and 'discover_tools'. The inconsistency in style and verb choice reduces predictability.

Tool Count: 5/5

With 9 tools, the count is well-scoped for a papers-with-code server. Each tool serves a clear purpose, and the set is neither too thin nor overly large.

Completeness: 3/5

The tool set covers core search and retrieval for papers and repositories, but offers no data-management operations (e.g., no create/update/delete for papers) and no access to other common Papers With Code resources such as authors, evaluations, or datasets. The generic memory tools partially compensate but are not domain-specific.

Available Tools (9 tools)

ask_pipeworx: A

Ask a question in plain English and get an answer from the best available data source. Pipeworx picks the right tool, fills the arguments, and returns the result. No need to browse tools or learn schemas — just describe what you need. Examples: "What is the US trade deficit with China?", "Look up adverse events for ozempic", "Get Apple's latest 10-K filing".

Parameters (JSON Schema)

question (required): Your question or request in natural language
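As a concrete illustration, here is a minimal sketch of calling this tool with the official Python MCP SDK over Streamable HTTP. The endpoint URL is a placeholder, since this page does not display it:

```python
# Minimal sketch: invoking ask_pipeworx over Streamable HTTP with the
# official Python MCP SDK (pip install mcp). SERVER_URL is hypothetical;
# the real endpoint is not shown on this page.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

SERVER_URL = "https://example.com/mcp"  # placeholder endpoint


async def main() -> None:
    # streamablehttp_client yields read/write streams (plus a session-id getter).
    async with streamablehttp_client(SERVER_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "ask_pipeworx",
                {"question": "What is the US trade deficit with China?"},
            )
            print(result.content)


asyncio.run(main())
```

The later tool snippets on this page reuse the `session` opened here.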
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It explains that the tool picks the right underlying tool and fills in arguments, but doesn't disclose limits such as data recency, source accuracy, or rate limits. Adequate, but more behavioral context would help.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three sentences plus examples, front-loaded with key purpose. Efficient but could be slightly more concise by integrating examples more tightly. Overall well-structured and no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (single parameter, no output schema), the description is complete enough to guide an agent. It explains what the tool does and how to use it. Missing details on response format or error handling, but these are minor given the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single parameter described as 'Your question or request in natural language.' Description reinforces this with examples but doesn't add new parameter semantics beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool accepts a natural language question and returns an answer from the best data source. It distinguishes itself from sibling tools by abstracting away tool selection and argument filling, making its purpose unique and specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use the tool: when you have a question in plain English and want the system to handle tool and schema selection. This implicitly steers agents toward it, rather than the sibling tools, for direct queries. Example questions guide usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

discover_tools: A

Search the Pipeworx tool catalog by describing what you need. Returns the most relevant tools with names and descriptions. Call this FIRST when you have 500+ tools available and need to find the right ones for your task.

Parameters (JSON Schema)

limit (optional): Maximum number of tools to return (default 20, max 50)
query (required): Natural language description of what you want to do (e.g., "analyze housing market trends", "look up FDA drug approvals", "find trade data between countries")
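Assuming a session opened as in the ask_pipeworx sketch above, a call might look like this (the query text is taken from the schema's own examples):

```python
# Sketch: search the tool catalog by intent; "limit" caps results (default 20).
result = await session.call_tool(
    "discover_tools",
    {"query": "find trade data between countries", "limit": 10},
)
```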
Behavior: 4/5

No annotations are provided, so the description carries the full burden. It discloses the search behavior (returns relevant tools based on a natural language query), the limit parameter, and that it returns names and descriptions. However, it does not mention limitations like query language support, result sorting, or whether it performs fuzzy matching. A score of 4 is justified because the description gives a clear mental model of operation without deep technical detail.

Conciseness: 5/5

The description is three short sentences, each adding essential information: what the tool does, what it returns, and when to call it. No filler or redundant content. The imperative 'Call this FIRST' is direct and impactful.

Completeness: 4/5

Given that there is no output schema and no annotations, the description must explain return format and behavioral expectations. It mentions 'returns the most relevant tools with names and descriptions', which covers the output. However, it does not specify if the output includes a relevance score or ordering, nor does it discuss error handling or query syntax. With sibling tools focused on specific tasks, the description is sufficient for an agent to use this tool effectively, but lacks edge-case detail.

Parameters: 3/5

Schema description coverage is 100%, meaning both parameters (query and limit) are already described in the schema. The description adds that the query should be a 'natural language description' and that the limit defaults to 20 (max 50). This adds some value beyond the schema's type and description fields, but the schema already provides adequate semantic meaning. Baseline 3 is appropriate.

Purpose: 5/5

The description uses a specific verb ('Search') and resource ('Pipeworx tool catalog'), clearly states the return type ('most relevant tools with names and descriptions'), and distinguishes from sibling tools by suggesting calling this FIRST. It also explains the scenario (500+ tools) and goal (find right tools), making its purpose unambiguous.

Usage Guidelines: 5/5

The description explicitly says 'Call this FIRST when you have 500+ tools available and need to find the right ones for your task,' providing clear when-to-use guidance and implying it's a preliminary step before other tools. The sibling tools include specific functions like get_paper, search_papers, etc., and this tool is positioned as a discovery layer, differentiating it from them.

forget: C

Delete a stored memory by key.

Parameters (JSON Schema)

key (required): Memory key to delete
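A hedged sketch of a call, reusing the session from the first example; the key value is illustrative:

```python
# Sketch: delete one stored memory by key. Presumably irreversible, and the
# behavior for a missing key is undocumented (see the notes below).
result = await session.call_tool("forget", {"key": "target_ticker"})
```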
Behavior: 2/5

No annotations provided, so description carries full burden. It states deletion but does not disclose whether the operation is irreversible, if confirmation is needed, or any side effects (e.g., cascading deletions).

Conciseness: 4/5

Description is a single, concise sentence that is front-loaded with the verb 'Delete'. Every word is necessary, though the description could be slightly more specific about the resource type.

Completeness: 2/5

Given no output schema and a simple parameter, the description is minimal. It does not explain return value (e.g., success/failure) or error conditions (e.g., key not found).

Parameters: 3/5

Schema description coverage is 100% (parameter 'key' documented in schema). Description adds no extra meaning beyond 'delete by key', so baseline 3 is appropriate.

Purpose: 4/5

The description clearly states the action (delete) and the resource (stored memory), with the specific parameter 'key'. It is distinguishable from siblings like 'remember' (create) and 'recall' (read).

Usage Guidelines: 2/5

No guidance on when to use this tool vs alternatives like 'recall' or 'remember'. Does not mention prerequisites (e.g., memory must exist) or when not to use it.

get_paper: A

Get full details for a specific paper by ID. Returns title, abstract, authors, publication venue, and links.

Parameters (JSON Schema)

id (required): Papers With Code paper ID (e.g., "attention-is-all-you-need")
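Using the schema's own example ID, and the session from the first sketch, a call could look like:

```python
# Sketch: fetch full details for one paper by its Papers With Code ID.
result = await session.call_tool(
    "get_paper", {"id": "attention-is-all-you-need"}
)
```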
Behavior: 3/5

No annotations provided, so description carries full burden. It states the tool returns full details (title, abstract, authors, publication venue, and links), which is useful. However, it does not disclose any potential side effects (unlikely for a read tool), rate limits, or whether the ID must be exact. Since the tool is a simple getter, the description is adequate but not comprehensive.

Conciseness: 5/5

Two short sentences with no redundancy. Front-loaded with the core action and identifier method. Every word adds value.

Completeness: 4/5

Tool is simple (1 param, no output schema, no nested objects). Description covers what it does and what it returns. Could mention that output structure matches typical paper metadata, but schema already describes the ID. For a simple lookup, this is nearly complete.

Parameters: 3/5

Schema description coverage is 100% (one parameter 'id' with description). The schema identifies the ID as a 'Papers With Code paper ID' and gives an example; the description adds no semantic enrichment beyond that, so baseline 3 is appropriate.

Purpose: 5/5

The description states 'Get full details for a specific paper by ID', pairing a specific verb ('Get') and resource (a single paper) with an explicit identifier. This clearly distinguishes it from siblings like 'search_papers' (which returns multiple results) and 'trending_papers' (a curated list).

Usage Guidelines: 4/5

The description implies usage when you have a specific Papers With Code ID and need full metadata for a single paper. It does not explicitly state when not to use it, but the context of sibling tools (search_papers for queries, trending_papers for discovery) provides implicit guidance. Lacks explicit alternatives for different use cases.

get_repositories: A

Find code implementations linked to a paper (by paper ID). Returns repo URLs, star counts, framework, and official implementation flag.

Parameters (JSON Schema)

id (required): Papers With Code paper ID
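A sketch of chaining this with get_paper: the same paper ID (here the example from get_paper's schema) retrieves the linked code repositories:

```python
# Sketch: list code implementations for a paper; the ID is illustrative.
result = await session.call_tool(
    "get_repositories", {"id": "attention-is-all-you-need"}
)
```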
Behavior: 3/5

No annotations are present, so the description must convey behavioral traits. It correctly indicates a read operation (get) and lists the output fields. However, it does not disclose potential issues like rate limits, authentication needs, or behavior for invalid/missing IDs.

Conciseness: 5/5

The description is extremely concise: two sentences that front-load the purpose and then list key return fields. Every word adds value with no redundancy.

Completeness: 4/5

Given the simplicity of the tool (single required parameter, no output schema, no annotations), the description is nearly complete. It could mention what happens when the paper ID is not found, but otherwise covers the essential details for correct invocation.

Parameters: 3/5

Schema description coverage is 100% (the id parameter is described as 'Papers With Code paper ID'). The description adds no additional detail about the parameter beyond that, but since the schema already covers it, the baseline is 3. No extra value is provided.

Purpose: 5/5

Clearly states it retrieves code repositories for a paper, specifying the exact resource (repositories linked to a paper) and action (get by paper ID). The description also lists the returned fields (repo URL, stars, framework, official status), making the tool's purpose unambiguous.

Usage Guidelines: 4/5

The description clearly says the tool requires a paper ID, which is explicit in the input schema. It does not, however, provide guidance on when to use this tool versus alternatives like get_paper or search_papers, or note any limitations (e.g., what happens if no repositories exist).

recall: A

Retrieve a previously stored memory by key, or list all stored memories (omit key). Use this to retrieve context you saved earlier in the session or in previous sessions.

Parameters (JSON Schema)

key (optional): Memory key to retrieve (omit to list all keys)
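Both modes in one sketch, again reusing the earlier session; the key is illustrative (borrowed from remember's schema examples):

```python
# Sketch: fetch one memory by key, then list all stored keys by omitting it.
one = await session.call_tool("recall", {"key": "subject_property"})
all_keys = await session.call_tool("recall", {})
```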
Behavior: 3/5

No annotations provided, so description carries full burden. It states that the tool retrieves data (non-destructive), can list all keys, and notes that saved context may come from previous sessions. However, it doesn't disclose limits on key length or memory size. This is adequate but not thorough.

Conciseness: 4/5

Two sentences, concise and front-loaded with the main action. The second sentence adds useful context. No wasted words, though it could be slightly more compact.

Completeness: 4/5

Given no output schema, the description explains the two modes (retrieve by key, list all). It covers the main use cases, though it could mention whether keys are case-sensitive or what happens if a key doesn't exist.

Parameters: 4/5

Schema coverage is 100% with one parameter. Description adds value by explaining that omitting key lists all stored memories, which is a key usage pattern not obvious from the schema alone.

Purpose: 5/5

Description clearly states the tool retrieves a memory by key or lists all memories when key is omitted, using specific verbs 'retrieve' and 'list' and the resource 'memory'. It distinguishes itself from sibling tools like 'remember' (which stores) and 'forget' (which deletes).

Usage Guidelines: 4/5

Description explains when to use this tool ('retrieve context you saved earlier') and implicitly differentiates from 'remember' (store) and 'forget' (delete). However, it doesn't explicitly say when not to use it or mention alternatives beyond implicit context.

remember: A

Store a key-value pair in your session memory. Use this to save intermediate findings, user preferences, or context across tool calls. Authenticated users get persistent memory; anonymous sessions last 24 hours.

Parameters (JSON Schema)

key (required): Memory key (e.g., "subject_property", "target_ticker", "user_preference")
value (required): Value to store (any text — findings, addresses, preferences, notes)
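A sketch with an illustrative key and value (the key is taken from the schema examples); note that overwrite semantics for an existing key are undocumented:

```python
# Sketch: store a key-value pair; overwrite behavior is not documented.
result = await session.call_tool(
    "remember",
    {"key": "user_preference", "value": "prefers PyTorch implementations"},
)
```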
Behavior: 3/5

Annotations are absent, so the description carries the full burden. It discloses memory persistence details (authenticated vs anonymous) but does not mention any limits on key length, value size, or total number of entries, nor what happens on overwriting an existing key. This leaves gaps in behavioral understanding.

Conciseness: 5/5

Three sentences, all essential: the first states the core function, the second lists typical use cases, the third clarifies persistence behavior. No fluff, information front-loaded.

Completeness: 4/5

Given the tool's simplicity (2 required string params, no output schema, no nested objects), the description is largely complete. However, it omits any mention of overwrite behavior or limits, which would improve completeness for an agent.

Parameters: 3/5

Schema description coverage is 100%, so the description adds limited value beyond schema. It reinforces the purpose of storing 'intermediate findings, user preferences, or context' but does not elaborate on parameter constraints or examples beyond the schema's own hints (e.g., example keys). Baseline 3 is appropriate.

Purpose: 5/5

The description clearly states it stores a key-value pair in session memory. It specifies the resource ('session memory') and the action ('store'), distinguishing it from siblings like 'forget' and 'recall', which handle removal and retrieval respectively.

Usage Guidelines: 5/5

The description provides explicit guidance on when to use this tool: to save intermediate findings, user preferences, or context across tool calls. It also notes persistence differences based on authentication status, which helps the agent decide usage context.

search_papers: B

Search ML research papers by keyword. Returns title, authors, abstract, conference, and links. Use when exploring a research topic or finding papers on specific methods.

Parameters (JSON Schema)

limit (optional): Number of results to return (default: 10, max: 50)
query (required): Search query (e.g., "attention transformer")
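Using the schema's example query, with the session from the first sketch:

```python
# Sketch: keyword search over ML papers; "limit" defaults to 10 (max 50).
result = await session.call_tool(
    "search_papers", {"query": "attention transformer", "limit": 10}
)
```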
Behavior: 2/5

No annotations are provided, so the description carries full burden. It does not disclose behavioral traits such as rate limits, data source freshness, whether results are paginated, or any side effects. The description is minimal and lacks important context for safe invocation.

Conciseness: 4/5

The description is three short sentences, concise and front-loaded with purpose and output. The list of returned fields could arguably be trimmed, but nothing here is wasteful.

Completeness: 3/5

Given the tool is simple (2 params, no output schema), the description covers the basic purpose and output. However, it lacks context on search behavior (e.g., exact vs fuzzy matching, sorting) and does not leverage the opportunity to compensate for missing annotations.

Parameters: 3/5

Schema description coverage is 100%: both 'query' and 'limit' have descriptions in the schema. The description adds no new semantics beyond listing output fields, so baseline 3 is appropriate.

Purpose: 4/5

The description clearly states the tool searches ML research papers by keyword and specifies the output fields (title, authors, abstract, conference, links). It distinguishes from siblings like 'get_paper' (retrieves a specific paper) and 'trending_papers' (trending rather than search).

Usage Guidelines: 3/5

The description implies use for keyword-based search but does not explicitly state when to use this vs alternatives like 'get_paper' or 'trending_papers'. No exclusions or prerequisites are mentioned.
