DataCite MCP Server

by alexwade

search_dois

Search over 125 million research DOIs using full-text queries and filters by resource type, funder, year, repository, and more.

Instructions

Search DataCite's index of 125M+ research DOIs. Supports full-text queries and filters by resource type, funder, year, repository, and more.

Input Schema

Name                 Required   Description   Default
query                No
resource_type        No
funder_ror_id        No
affiliation_ror_id   No
client_id            No
provider_id          No
prefix               No
published_year       No
sort                 No
page_size            No
page_cursor          No
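
For illustration, the arguments for a filtered search might look like the sketch below. The `SearchDoisArgs` interface is a hypothetical hand-written mirror of the Zod schema shown under Implementation Reference, not a type exported by the server, and the ROR ID is a made-up placeholder.

```typescript
// Hypothetical type mirroring the tool's input schema. Every field is optional;
// page_size falls back to the schema default of 10 when omitted.
interface SearchDoisArgs {
  query?: string;
  resource_type?: string;      // one of the schema's enum values, e.g. "Dataset"
  funder_ror_id?: string;
  affiliation_ror_id?: string;
  client_id?: string;
  provider_id?: string;
  prefix?: string;
  published_year?: number;     // integer
  sort?: string;               // one of the schema's sort values, e.g. "-published"
  page_size?: number;          // integer, 1 to 100
  page_cursor?: string;        // next_cursor from a previous response
}

// Example: datasets from 2023 matching a free-text query, 25 per page.
const exampleArgs: SearchDoisArgs = {
  query: "sea surface temperature",
  resource_type: "Dataset",
  published_year: 2023,
  funder_ror_id: "ror.org/00example0", // placeholder, not a real ROR ID
  sort: "-published",
  page_size: 25,
};

console.log(JSON.stringify(exampleArgs, null, 2));
```

Fields left out are simply not sent as filters, so the smallest useful call is a `query` on its own.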

Implementation Reference

  • The registerTool function registers the 'search_dois' MCP tool. The handler (lines 88-161) parses Zod-validated input, builds DataCite API query parameters with Elasticsearch-style field filters, calls the DataCite /dois endpoint with caching, extracts cursor-based pagination, formats results via formatDoiSummary, and returns the response as JSON.
    export function registerTool(server: McpServer): void {
      server.tool(
        "search_dois",
        "Search DataCite's index of 125M+ research DOIs. Supports full-text queries and filters by resource type, funder, year, repository, and more.",
        SearchSchema.shape,
        async (params) => {
          const input = SearchSchema.parse(params);
    
          const apiParams: Record<string, string | number | boolean> = {
            "page[size]": input.page_size,
            detail: true,
          };
    
          // Build Elasticsearch query clauses. Field-path filters are appended here
          // rather than as dedicated API params because:
          //  - types.resourceTypeGeneral:{value} is verified to work; resource-type-id
          //    requires lowercase-hyphenated values (computationalnotebook → broken).
          //  - funderIdentifier / affiliationIdentifier require full https://ror.org/ URIs;
          //    the dedicated funder-id / affiliation-id params have the same requirement
          //    but the query-path form is verified correct against the live index.
          const queryClauses: string[] = [];
          if (input.query)            queryClauses.push(`(${input.query})`);
          if (input.resource_type)    queryClauses.push(`types.resourceTypeGeneral:${input.resource_type}`);
          if (input.funder_ror_id)    queryClauses.push(`fundingReferences.funderIdentifier:"${normalizeRorId(input.funder_ror_id)}"`);
          if (input.affiliation_ror_id) queryClauses.push(`creators.affiliation.affiliationIdentifier:"${normalizeRorId(input.affiliation_ror_id)}"`);
          if (input.published_year)   queryClauses.push(`publicationYear:${input.published_year}`);
          if (queryClauses.length)    apiParams["query"] = queryClauses.join(" AND ");
    
          if (input.client_id)   apiParams["client-id"]    = input.client_id;
          if (input.provider_id) apiParams["provider-id"]  = input.provider_id;
          if (input.prefix)      apiParams["prefix"]       = input.prefix;
          if (input.sort && input.sort !== "relevance") apiParams["sort"] = input.sort;
          if (input.page_cursor) apiParams["page[cursor]"] = input.page_cursor;
    
          const cacheKey = JSON.stringify(
            Object.entries(apiParams).sort(([a], [b]) => a.localeCompare(b))
          );
    
          try {
            const response = await getCached<SearchResponse>(
              searchCache,
              cacheKey,
              () => dataciteClient.get<SearchResponse>("/dois", apiParams)
            );
    
            // Extract next cursor from links.next URL
            let next_cursor: string | null = null;
            if (response.links?.next) {
              try {
                const nextUrl = new URL(response.links.next);
                next_cursor = nextUrl.searchParams.get("page[cursor]");
              } catch {
                // ignore parse errors
              }
            }
    
            const results = (response.data ?? []).map(formatDoiSummary);
    
            return {
              content: [
                {
                  type: "text" as const,
                  text: JSON.stringify(
                    {
                      results,
                      total_results: response.meta?.total ?? results.length,
                      next_cursor,
                    },
                    null,
                    2
                  ),
                },
              ],
            };
          } catch (err) {
            const msg = err instanceof Error ? err.message : String(err);
            throw apiError(msg);
          }
        }
      );
    }
  • SearchSchema defines the input schema using Zod: optional query string, resource_type enum (30 values), funder_ror_id, affiliation_ror_id, client_id, provider_id, prefix, published_year, sort enum, page_size (default 10, max 100), and page_cursor.
    const SearchSchema = z.object({
      query: z.string().optional(),
      resource_type: z
        .enum([
          "Audiovisual",
          "Book",
          "BookChapter",
          "Collection",
          "ComputationalNotebook",
          "ConferencePaper",
          "ConferenceProceeding",
          "DataPaper",
          "Dataset",
          "Dissertation",
          "Event",
          "Image",
          "Instrument",
          "InteractiveResource",
          "Journal",
          "JournalArticle",
          "Model",
          "OutputManagementPlan",
          "PeerReview",
          "PhysicalObject",
          "Preprint",
          "Report",
          "Service",
          "Software",
          "Sound",
          "Standard",
          "StudyRegistration",
          "Text",
          "Workflow",
          "Other",
        ])
        .optional(),
      funder_ror_id: z.string().optional(),
      affiliation_ror_id: z.string().optional(),
      client_id: z.string().optional(),
      provider_id: z.string().optional(),
      prefix: z.string().optional(),
      published_year: z.number().int().optional(),
      sort: z
        .enum([
          "relevance",
          "created",
          "-created",
          "updated",
          "-updated",
          "published",
          "-published",
          "citation-count",
          "-citation-count",
          "view-count",
          "-view-count",
        ])
        .optional(),
      page_size: z.number().int().min(1).max(100).default(10),
      page_cursor: z.string().optional(),
    });
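  • registerSearchDois(server) is called inside registerAllTools() to register the tool on the MCP server instance. The name suggests it is the registerTool export shown above, imported under an alias; the excerpt below is truncated.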
    export function registerAllTools(server: McpServer): void {
      registerSearchDois(server);
  • normalizeRorId normalizes ROR identifiers (bare ID, ror.org/..., http/https variants) to the canonical https://ror.org/... form, used by the handler for funder and affiliation filters.
    function normalizeRorId(raw: string): string {
      const s = raw.trim();
      if (s.startsWith("https://ror.org/")) return s;
      if (s.startsWith("http://ror.org/"))  return `https://ror.org/${s.slice("http://ror.org/".length)}`;
      if (s.startsWith("ror.org/"))         return `https://${s}`;
      return `https://ror.org/${s}`;
    }
  • formatDoiSummary is called by the handler to transform each DataCite DoiRecord into a condensed response object with doi, title, creators, year, resource_type, publisher, abstract_snippet, and counts.
    export function formatDoiSummary(record: DoiRecord): object {
      const a = record.attributes;
      const title = a.titles?.[0]?.title ?? "(no title)";
      const creators = (a.creators ?? []).slice(0, 3).map(formatCreator);
      const firstDesc = a.descriptions?.[0]?.description ?? "";
      const abstract_snippet = firstDesc.length > 300 ? firstDesc.slice(0, 300) + "…" : firstDesc;
    
      return {
        doi: a.doi ?? record.id,
        title,
        creators,
        year: a.publicationYear,
        resource_type: a.types?.resourceTypeGeneral ?? a.resourceTypeGeneral,
        publisher: a.publisher,
        abstract_snippet: abstract_snippet || undefined,
        view_count: a.viewCount,
        download_count: a.downloadCount,
        citation_count: a.citationCount,
      };
    }
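
The clause-building step of the handler can be tried in isolation. The sketch below copies `normalizeRorId` from the excerpt above and restates the handler's query logic as a pure function; `QueryInput` and `buildQuery` are hypothetical names introduced here, and the ROR ID is a placeholder.

```typescript
// Copy of the normalizeRorId helper from the excerpt above.
function normalizeRorId(raw: string): string {
  const s = raw.trim();
  if (s.startsWith("https://ror.org/")) return s;
  if (s.startsWith("http://ror.org/")) return `https://ror.org/${s.slice("http://ror.org/".length)}`;
  if (s.startsWith("ror.org/")) return `https://${s}`;
  return `https://ror.org/${s}`;
}

// Hypothetical subset of the tool input that feeds the query string.
interface QueryInput {
  query?: string;
  resource_type?: string;
  funder_ror_id?: string;
  affiliation_ror_id?: string;
  published_year?: number;
}

// Mirrors the handler: each present field becomes one clause, joined with AND.
function buildQuery(input: QueryInput): string | undefined {
  const clauses: string[] = [];
  if (input.query) clauses.push(`(${input.query})`);
  if (input.resource_type) clauses.push(`types.resourceTypeGeneral:${input.resource_type}`);
  if (input.funder_ror_id) {
    clauses.push(`fundingReferences.funderIdentifier:"${normalizeRorId(input.funder_ror_id)}"`);
  }
  if (input.affiliation_ror_id) {
    clauses.push(`creators.affiliation.affiliationIdentifier:"${normalizeRorId(input.affiliation_ror_id)}"`);
  }
  if (input.published_year) clauses.push(`publicationYear:${input.published_year}`);
  return clauses.length ? clauses.join(" AND ") : undefined;
}

const q = buildQuery({
  query: "climate OR weather",
  resource_type: "Dataset",
  funder_ror_id: "00example0", // placeholder, not a real ROR ID
  published_year: 2023,
});
console.log(q);
// (climate OR weather) AND types.resourceTypeGeneral:Dataset AND fundingReferences.funderIdentifier:"https://ror.org/00example0" AND publicationYear:2023
```

Wrapping the free-text part in parentheses keeps an `OR` inside the user's query grouped as one clause, so it is not read together with the `AND`-joined filters.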
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. The description mentions it indexes 125M+ DOIs, but lacks details on pagination, sorting defaults, rate limits, or auth requirements. Adequate for a search tool but incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
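
Pagination, which the description leaves out, follows from the handler: each response carries a `next_cursor`, which is passed back as `page_cursor` until it comes back `null`. A minimal sketch, assuming a hypothetical `SearchFn` that stands in for whatever MCP client invokes `search_dois`; it is stubbed with in-memory pages here so the example runs offline.

```typescript
// Shape of the JSON payload the handler serializes into its text content.
interface SearchPage {
  results: unknown[];
  total_results: number;
  next_cursor: string | null;
}

// Hypothetical signature for invoking search_dois; a real MCP client call would go here.
type SearchFn = (args: {
  query?: string;
  page_size?: number;
  page_cursor?: string;
}) => Promise<SearchPage>;

// Follow next_cursor until it is null or maxPages is reached.
async function collectAll(search: SearchFn, query: string, maxPages = 10): Promise<unknown[]> {
  const all: unknown[] = [];
  let cursor: string | undefined = undefined;
  for (let page = 0; page < maxPages; page++) {
    const res: SearchPage = await search({ query, page_size: 2, page_cursor: cursor });
    all.push(...res.results);
    if (res.next_cursor === null) break; // no further pages
    cursor = res.next_cursor;
  }
  return all;
}

// In-memory stub with three pages, for demonstration only.
const fakePages: Record<string, SearchPage> = {
  start: { results: ["a", "b"], total_results: 5, next_cursor: "c1" },
  c1: { results: ["c", "d"], total_results: 5, next_cursor: "c2" },
  c2: { results: ["e"], total_results: 5, next_cursor: null },
};
const fakeSearch: SearchFn = async (args) => fakePages[args.page_cursor ?? "start"];

collectAll(fakeSearch, "example").then((items) => console.log(items.length)); // prints 5
```

The `maxPages` cap is a design choice of this sketch, not of the server: with an index of this size, an unbounded loop over a broad query could issue a very large number of requests.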

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with purpose. No wasted words, but could be structured to list filters more clearly. Still, very efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 11 parameters, no output schema, and no description of return structure or pagination, the description is incomplete for an agent to use effectively. Lacks details on how to handle results beyond the search itself.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
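
Since no output schema is published, the return structure can be read off the handler and `formatDoiSummary`. The interfaces below are a hand-written sketch inferred from that code, not types exported by the server; the field types (for example `year` as a number) and the contents of each `creators` entry are assumptions, and the sample DOI is made up.

```typescript
// Inferred from formatDoiSummary; optional fields may be absent in the JSON.
interface DoiSummary {
  doi: string;
  title: string;               // "(no title)" when the record has none
  creators: unknown[];         // at most three entries; shape set by formatCreator (not shown)
  year?: number;
  resource_type?: string;
  publisher?: unknown;         // passed through from the DataCite record unchanged
  abstract_snippet?: string;   // first description, cut to 300 characters plus "…"
  view_count?: number;
  download_count?: number;
  citation_count?: number;
}

// Inferred from the object the handler passes to JSON.stringify.
interface SearchDoisResult {
  results: DoiSummary[];
  total_results: number;
  next_cursor: string | null;
}

// The tool returns one text content item whose text is pretty-printed JSON.
function parseSearchResult(text: string): SearchDoisResult {
  const parsed = JSON.parse(text) as SearchDoisResult;
  if (!Array.isArray(parsed.results)) throw new Error("results missing");
  return parsed;
}

// Made-up sample payload for demonstration.
const sampleText = JSON.stringify({
  results: [{ doi: "10.1234/example", title: "(no title)", creators: [] }],
  total_results: 1,
  next_cursor: null,
});
const sampleResult = parseSearchResult(sampleText);
console.log(sampleResult.results[0].doi, sampleResult.next_cursor);
```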

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It names only a few filters (resource type, funder, year, repository) and does not explain parameter specifics such as the expected format of affiliation_ror_id or how the cursor is used. Insufficient for 11 parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
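
One way to close this gap would be per-parameter descriptions. The map below is a sketch of what such text could say, drawn only from the handler and schema shown under Implementation Reference. The wording is illustrative, `parameterNotes` is a hypothetical name, and the glosses for `client_id` and `provider_id` assume DataCite's usual meaning of those terms.

```typescript
// Illustrative per-parameter notes derived from the handler and schema excerpts.
const parameterNotes: Record<string, string> = {
  query: "Free-text query; wrapped in parentheses and AND-ed with the field filters.",
  resource_type: "A resourceTypeGeneral value such as Dataset or Software (exact-case enum).",
  funder_ror_id: "Funder ROR ID; a bare ID, ror.org/..., or http(s)://ror.org/... is accepted.",
  affiliation_ror_id: "Creator affiliation ROR ID; same accepted forms as funder_ror_id.",
  client_id: "Sent as the client-id API parameter (assumed: a DataCite repository ID).",
  provider_id: "Sent as the provider-id API parameter (assumed: a DataCite member ID).",
  prefix: "DOI prefix filter, sent as the prefix API parameter.",
  published_year: "Integer publication year, matched against publicationYear.",
  sort: "Omitted or 'relevance' sends no sort parameter; other enum values are passed through.",
  page_size: "Integer from 1 to 100; defaults to 10.",
  page_cursor: "Pass next_cursor from the previous response to fetch the next page.",
};

for (const [name, note] of Object.entries(parameterNotes)) {
  console.log(`${name}: ${note}`);
}
```

With Zod, text like this could be attached to each field through `.describe()`, so that it reaches the published schema.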

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches DataCite's index of DOIs, with specific capabilities like full-text queries and filters. It distinguishes itself from siblings like get_doi (single retrieval) and search_by_person (person search).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for broad searches with filtering, but does not explicitly state when not to use it or list alternatives. However, the context of sibling tools makes the primary use case clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/alexwade/datacite-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.