Gallica/BnF MCP Server

by ukicar

advanced_search

Search the Gallica digital library using custom CQL queries to find documents by creator, type, subject, language, or other metadata fields.

Instructions

Perform an advanced search using custom CQL query syntax. Examples: dc.creator all "Victor Hugo" and dc.type all "monographie", dc.subject all "Paris" and dc.type all "carte", dc.language all "eng".
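The example queries above all follow the same pattern: an index name, the `all` relation, a quoted term, and clauses joined with `and`. A minimal sketch of assembling such strings follows. The helper names `cqlClause` and `cqlAnd` are hypothetical and not part of the server, and the backslash escaping follows the general CQL convention, which is assumed (not confirmed by the source) to apply to Gallica.

```typescript
// Hypothetical helpers for illustration; only the resulting CQL strings
// mirror the examples in the tool description.
type CqlClause = { index: string; value: string };

// Build one clause such as: dc.creator all "Victor Hugo"
function cqlClause(clause: CqlClause): string {
  // Escape backslashes and double quotes so the quoted term stays well-formed.
  const escaped = clause.value.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `${clause.index} all "${escaped}"`;
}

// Join several clauses with the boolean operator "and".
function cqlAnd(clauses: CqlClause[]): string {
  return clauses.map(cqlClause).join(' and ');
}

const hugoBooksQuery = cqlAnd([
  { index: 'dc.creator', value: 'Victor Hugo' },
  { index: 'dc.type', value: 'monographie' },
]);
```

Here `hugoBooksQuery` holds the first example query from the description and could be passed as the `query` argument.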

Input Schema

Name          Required  Description                                  Default
query         Yes       Custom CQL query string
max_results   No        Maximum number of results to return (1-50)   config.defaultMaxRecords
start_record  No        Starting record for pagination               config.defaultStartRecord
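A caller may want to normalize arguments before invoking the tool. The sketch below is one possible approach, not the server's own validation: the function name, the interface, and both default values are assumptions (the real defaults come from the server's `config` object and are not shown on this page). It is also stricter than the published handler, which only caps the upper bound at 50.

```typescript
interface AdvancedSearchArgs {
  query: string;
  max_results?: number;
  start_record?: number;
}

// Placeholder defaults (assumed): the real values are config.defaultMaxRecords
// and config.defaultStartRecord, whose contents are not shown in the source.
const ASSUMED_DEFAULT_MAX_RESULTS = 10;
const ASSUMED_DEFAULT_START_RECORD = 1;

// Fall back when the value is missing or not a finite number.
function toInteger(value: number | undefined, fallback: number): number {
  return value !== undefined && isFinite(value) ? Math.floor(value) : fallback;
}

function normalizeAdvancedSearchArgs(args: AdvancedSearchArgs): Required<AdvancedSearchArgs> {
  const query = args.query.trim();
  if (query.length === 0) {
    throw new Error('query must be a non-empty CQL string');
  }
  const requestedMax = toInteger(args.max_results, ASSUMED_DEFAULT_MAX_RESULTS);
  const requestedStart = toInteger(args.start_record, ASSUMED_DEFAULT_START_RECORD);
  return {
    query,
    // Keep max_results inside the documented 1-50 range.
    max_results: Math.min(Math.max(requestedMax, 1), 50),
    // SRU record positions are 1-based, so never go below 1.
    start_record: Math.max(requestedStart, 1),
  };
}
```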

Implementation Reference

  • Tool definition with input schema and handler function for advanced_search. Defines the tool name, description, input parameters (query, max_results, start_record), and handler that calls searchApi.advancedSearch().
    export function createAdvancedSearchTool(searchApi: SearchAPI) {
      return {
        name: 'advanced_search',
        description: 'Perform an advanced search using custom CQL query syntax. Examples: dc.creator all "Victor Hugo" and dc.type all "monographie", dc.subject all "Paris" and dc.type all "carte", dc.language all "eng".',
        inputSchema: {
          type: 'object',
          properties: {
            query: {
              type: 'string',
              description: 'Custom CQL query string',
            },
            max_results: {
              type: 'number',
              description: 'Maximum number of results to return (1-50)',
              default: config.defaultMaxRecords,
            },
            start_record: {
              type: 'number',
              description: 'Starting record for pagination',
              default: config.defaultStartRecord,
            },
          },
          required: ['query'],
        },
        handler: async (args: unknown) => {
          const parsed = searchParamsSchema.extend({ query: z.string() }).parse(args);
          return await searchApi.advancedSearch(
            parsed.query,
            parsed.max_results ?? config.defaultMaxRecords,
            parsed.start_record ?? config.defaultStartRecord
          );
        },
      };
    }
  • The SearchAPI.advancedSearch method implementation: a thin wrapper that passes the custom CQL query to the private search() method. Note that search() takes startRecord before maxRecords, so the two numeric arguments are swapped in the call.
    advancedSearch(
      query: string,
      maxResults: number = config.defaultMaxRecords,
      startRecord: number = config.defaultStartRecord
    ): Promise<SearchResult> {
      return this.search(query, startRecord, maxResults);
    }
  • Core search method that sends the HTTP request to the Gallica SRU API and returns the parsed search results. On failure it does not throw; it logs the error and returns an empty result with an error message attached.
    private async search(
      query: string,
      startRecord: number = config.defaultStartRecord,
      maxRecords: number = config.defaultMaxRecords
    ): Promise<SearchResult> {
      logger.info(`[SEARCH] Executing search query: "${query}" (startRecord: ${startRecord}, maxRecords: ${maxRecords})`);
      const params = {
        version: '1.2',
        operation: 'searchRetrieve',
        query,
        startRecord: String(startRecord),
        maximumRecords: String(Math.min(maxRecords, 50)), // Cap at 50 like Python
      };
    
      try {
        logger.debug(`[SEARCH] Calling Gallica SRU API with params:`, params);
        const xmlBody = await this.httpClient.getXml(this.sruUrl, params);
        logger.debug(`[SEARCH] Received XML response, length: ${xmlBody.length} bytes`);
        const result = this.parseSruResponse(xmlBody, query);
        logger.info(`[SEARCH] Search completed: ${result.records.length} records returned out of ${result.metadata.total_records} total`);
        return result;
      } catch (error) {
        logger.error(`[SEARCH] Error during Gallica API request: ${error instanceof Error ? error.message : String(error)}`);
        logger.error(`[SEARCH] Error stack:`, error instanceof Error ? error.stack : 'No stack trace');
        return {
          metadata: {
            query,
            total_records: '0',
            records_returned: 0,
            date_retrieved: new Date().toISOString().replace('T', ' ').substring(0, 19),
          },
          records: [],
          error: error instanceof Error ? error.message : String(error),
          parameters: params,
        };
      }
    }
  • XML parsing logic that parses SRU XML responses and extracts Dublin Core fields (title, creator, subject, etc.) and Gallica URLs from records.
    private parseSruResponse(xmlBody: string, query: string): SearchResult {
      try {
        const parser = new XMLParser({
          ignoreAttributes: false,
          attributeNamePrefix: '@_',
          textNodeName: '#text',
          parseAttributeValue: true,
        });
    
        const result = parser.parse(xmlBody);
    
        // Navigate through SRU response structure
        const sruResponse = result['srw:searchRetrieveResponse'] || result.searchRetrieveResponse;
        if (!sruResponse) {
          throw new Error('Invalid SRU response structure');
        }
    
        const numberOfRecords = sruResponse['srw:numberOfRecords']?.['#text'] || 
                                sruResponse.numberOfRecords?.['#text'] ||
                                sruResponse['srw:numberOfRecords'] ||
                                sruResponse.numberOfRecords ||
                                '0';
    
        const records = sruResponse['srw:records']?.['srw:record'] || 
                       sruResponse.records?.record ||
                       [];
    
        const recordsArray = Array.isArray(records) ? records : records ? [records] : [];
    
        const parsedRecords: Array<Record<string, string | string[] | undefined>> = [];
    
        for (const record of recordsArray) {
          const recordData = record['srw:recordData']?.['oai_dc:dc'] ||
                            record.recordData?.['oai_dc:dc'] ||
                            record['srw:recordData'] ||
                            record.recordData;
    
          if (!recordData) continue;
    
          const recordDict: Record<string, string | string[] | undefined> = {};
    
          // Extract Dublin Core fields
          const dcFields = [
            'title', 'creator', 'contributor', 'publisher', 'date',
            'description', 'type', 'format', 'identifier', 'source',
            'language', 'relation', 'coverage', 'rights', 'subject',
          ];
    
          for (const field of dcFields) {
            const elements = recordData[`dc:${field}`] || recordData[field];
            if (elements) {
              const values = Array.isArray(elements) ? elements : [elements];
              const textValues = values
                .map((v: unknown) => {
                  if (typeof v === 'string') return v.trim();
                  if (v && typeof v === 'object' && '#text' in v) return String(v['#text']).trim();
                  return String(v).trim();
                })
                .filter((v: string) => v.length > 0);
    
              if (textValues.length > 0) {
                const value: string | string[] = textValues.length === 1 ? textValues[0]! : textValues;
                recordDict[field] = value;
              }
            }
          }
    
          // Extract Gallica URL from identifiers
          const identifiers = recordDict.identifier;
          if (identifiers) {
            const idArray = Array.isArray(identifiers) ? identifiers : [identifiers];
            for (const identifier of idArray) {
              if (typeof identifier === 'string' && identifier.includes('gallica.bnf.fr/ark:')) {
                recordDict.gallica_url = identifier;
                break;
              }
            }
          }
    
          parsedRecords.push(recordDict);
        }
    
        return {
          metadata: {
            query,
            total_records: String(numberOfRecords),
            records_returned: parsedRecords.length,
            date_retrieved: new Date().toISOString().replace('T', ' ').substring(0, 19),
          },
          records: parsedRecords,
        };
      } catch (error) {
        logger.error(`Error parsing XML response: ${error instanceof Error ? error.message : String(error)}`);
        return {
          metadata: {
            query,
            total_records: '0',
            records_returned: 0,
            date_retrieved: new Date().toISOString().replace('T', ' ').substring(0, 19),
          },
          records: [],
          error: `XML parsing error: ${error instanceof Error ? error.message : String(error)}`,
        };
      }
    }
  • Tool registration - creates the advancedSearch tool instance and adds it to the tools array for MCP server registration.
    const advancedSearch = createAdvancedSearchTool(searchApi);
    const naturalLanguageSearch = createNaturalLanguageSearchTool(searchApi);
    
    // Register extended item tools (4 new tools)
    const getItemDetails = createGetItemDetailsTool(itemsClient);
    const getItemPages = createGetItemPagesTool(itemsClient);
    const getPageImage = createGetPageImageTool(iiifClient);
    const getPageText = createGetPageTextTool(textClient);
    
    // Register sequential reporting tool
    const sequentialReporting = createSequentialReportingTool(reportingServer);
    
    // Register all tools with error handling
    const tools = [
      searchByTitle,
      searchByAuthor,
      searchBySubject,
      searchByDate,
      searchByDocumentType,
      advancedSearch,
      naturalLanguageSearch,
      getItemDetails,
      getItemPages,
      getPageImage,
      getPageText,
      sequentialReporting,
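The request that the core search method sends can be illustrated without the HTTP client. The sketch below rebuilds the same five SRU parameters shown in the search() excerpt and serializes them into a URL. The function name is hypothetical, and the endpoint constant is an assumption, since the source only refers to it as `this.sruUrl`.

```typescript
// Assumed endpoint; the excerpt above only refers to this.sruUrl.
const ASSUMED_SRU_URL = 'https://gallica.bnf.fr/SRU';

// Hypothetical helper: serialize the SRU searchRetrieve parameters into a URL.
function buildSruRequestUrl(
  query: string,
  startRecord: number,
  maxRecords: number,
  baseUrl: string = ASSUMED_SRU_URL
): string {
  // Same parameter names and values as the params object in search().
  const params: Array<[string, string]> = [
    ['version', '1.2'],
    ['operation', 'searchRetrieve'],
    ['query', query],
    ['startRecord', String(startRecord)],
    ['maximumRecords', String(Math.min(maxRecords, 50))], // capped at 50
  ];
  const queryString = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
  return `${baseUrl}?${queryString}`;
}

const exampleSruUrl = buildSruRequestUrl('dc.language all "eng"', 1, 100);
```

Requesting 100 records still produces `maximumRecords=50`, which matches the cap applied in search().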
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions 'advanced search' and provides CQL examples, but doesn't disclose important behavioral traits: whether this is read-only or has side effects, authentication requirements, rate limits, error handling, or what the search results contain. For a search tool with no annotation coverage, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
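Judging from the implementation excerpts, the tool only issues a read request, and a failed request resolves to a result with an empty `records` array and an `error` string instead of throwing. A caller could distinguish the three outcomes roughly as follows. The `SearchResultSketch` interface is inferred from the return statements in the excerpts and is an assumption; the project's actual `SearchResult` type is not shown.

```typescript
// Shape inferred from the search() and parseSruResponse() excerpts (assumed).
interface SearchResultSketch {
  metadata: {
    query: string;
    total_records: string; // note: a string, not a number
    records_returned: number;
    date_retrieved: string;
  };
  records: Array<Record<string, string | string[] | undefined>>;
  error?: string;
  parameters?: Record<string, string>;
}

// Hypothetical helper: summarize what happened in one line.
function describeOutcome(result: SearchResultSketch): string {
  if (result.error !== undefined) {
    return `failed: ${result.error}`;
  }
  if (result.records.length === 0) {
    return 'no matches';
  }
  return `${result.metadata.records_returned} of ${result.metadata.total_records} records`;
}
```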

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise with two sentences: a clear purpose statement followed by helpful examples. The examples are relevant and illustrate the tool's capability without being verbose. However, the structure could be improved by front-loading more critical information about when to use this versus other search tools.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (advanced search with custom query syntax), no annotations, and no output schema, the description is incomplete. It doesn't explain what the search returns, how results are structured, error conditions, or performance characteristics. For a tool that presumably returns search results, the lack of output information combined with minimal behavioral context makes this inadequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
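On the question of how results are structured: per the parsing excerpt, each record maps Dublin Core field names to either a single string or an array of strings, plus an optional `gallica_url`. A consumer therefore has to handle both cases. The sketch below shows one way to do that; the helper name and the sample record are invented for illustration.

```typescript
// Record shape as declared in parseSruResponse().
type GallicaRecord = Record<string, string | string[] | undefined>;

// Hypothetical helper: always return a field's values as an array,
// whether the parser stored one string, several, or nothing.
function fieldValues(record: GallicaRecord, field: string): string[] {
  const value = record[field];
  if (value === undefined) {
    return [];
  }
  return Array.isArray(value) ? value : [value];
}

// Invented sample data; not an actual Gallica record.
const sampleRecord: GallicaRecord = {
  title: 'Example title',
  creator: ['Example Author A', 'Example Author B'],
  gallica_url: 'https://gallica.bnf.fr/ark:/12148/EXAMPLE',
};

const sampleTitles = fieldValues(sampleRecord, 'title');
const sampleCreators = fieldValues(sampleRecord, 'creator');
const sampleSubjects = fieldValues(sampleRecord, 'subject');
```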

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add any parameter semantics beyond what's in the schema - it mentions CQL query syntax generally but doesn't provide additional details about the 'query' parameter format, constraints, or examples that go beyond the schema's 'Custom CQL query string' description. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
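One parameter relationship worth spelling out is pagination: `start_record` and `max_results` together select a window of the result set, and the result metadata reports `total_records` (as a string) and `records_returned`. A minimal sketch of computing the next `start_record` from those two fields follows. The helper name is hypothetical, and it assumes record positions are 1-based, as in SRU.

```typescript
interface PageMetadata {
  total_records: string; // returned as a string by the server code
  records_returned: number;
}

// Hypothetical helper: return the start_record for the next page,
// or null when there is nothing further to fetch.
function nextStartRecord(metadata: PageMetadata, currentStart: number): number | null {
  const total = parseInt(metadata.total_records, 10);
  if (isNaN(total) || metadata.records_returned === 0) {
    return null;
  }
  const next = currentStart + metadata.records_returned;
  // Positions are 1-based, so the last valid position equals the total.
  return next <= total ? next : null;
}
```

For example, with 120 total records and pages of 50, successive calls would use `start_record` values 1, 51, and 101.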

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the purpose: 'Perform an advanced search using custom CQL query syntax.' It specifies the verb ('perform an advanced search') and resource (search functionality), but doesn't explicitly differentiate from sibling tools like 'natural_language_search' or other search_by_* tools. The examples help illustrate the scope but don't provide explicit differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With multiple sibling search tools (natural_language_search, search_by_author, search_by_date, etc.), there's no indication of when this advanced CQL search is preferred over simpler, more specific search tools. The examples show CQL syntax but don't establish usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ukicar/sweet-bnf'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.