Riksarkivet MCP Server

search_transcribed

Search transcribed historical documents from Swedish National Archives using keywords, wildcards, and advanced query syntax to find specific content in digitized archives.

Instructions

Search for keywords in transcribed historical documents from the Swedish National Archives (Riksarkivet).

This tool searches through historical documents and returns matching pages with their transcriptions.
Supports advanced Solr query syntax including wildcards, fuzzy search, Boolean operators, and proximity searches.

Key features:
- Returns document metadata, page numbers, and text snippets containing the keyword
- Provides direct links to page images and ALTO XML transcriptions
- Supports pagination via offset parameter for comprehensive discovery
- Advanced search syntax for precise queries

Search syntax examples:
- Basic: "Stockholm" - exact term search
- Wildcards: "Stock*", "St?ckholm", "*holm" - match patterns
- Fuzzy: "Stockholm~" or "Stockholm~1" - find similar words (typos, variants)
- Proximity: '"Stockholm trolldom"~10' - words within 10 words of each other
- Boolean: "(Stockholm AND trolldom)", "(Stockholm OR Göteborg)", "(Stockholm NOT trolldom)"
- Boosting: "Stockholm^4 trol*" - increase relevance of specific terms
- Complex: "((troll* OR häx*) AND (Stockholm OR Göteborg))" - combine operators

NOTE: Always wrap Boolean searches in grouping parentheses (), and use quotes "" to group multiple words into a phrase.
For example, use '((skatt* OR guld* OR silver*) AND (stöld* OR stul*))' instead of '(skatt* OR guld* OR silver*) AND (stöld* OR stul*)'. The grouped form returns results; the ungrouped form returns 0 results.

Prefer fuzzy search, e.g. ((stöld~2 OR tjufnad~2) AND (silver* OR guld*)) AND (döm* OR straff*), because many transcriptions are produced by OCR/HTR AI models and contain common errors. Also account for older Swedish spellings, e.g. (((präst* OR prest*) OR (kyrko* OR kyrck*)) AND ((silver* OR silfv*) OR (guld* OR gull*)))
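
The grouping rule above can be applied mechanically when queries are assembled in code. The following is a minimal sketch, not part of the server: the helper names `any_of` and `all_of` are hypothetical, and the helpers only build a query string that is always wrapped in an outer pair of parentheses.

```python
def any_of(*terms: str) -> str:
    """Join terms with OR inside one pair of parentheses (hypothetical helper)."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*groups: str) -> str:
    """Join already-grouped clauses with AND inside an outer pair of parentheses."""
    return "(" + " AND ".join(groups) + ")"


# Spelling variants (modern and older Swedish) belong in the same OR group.
query = all_of(
    any_of("skatt*", "guld*", "silver*"),
    any_of("stöld*", "stul*"),
)
print(query)
```

The printed string is the grouped form recommended in the note, so a caller never has to remember the outer parentheses by hand.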

Proximity guide:

    Use quotes around the search terms

    "term1 term2"~N ✅
    term1 term2~N ❌

    Only 2 terms work reliably

    "kyrka stöld"~10 ✅
    "kyrka silver stöld"~10 ❌

    The number indicates maximum word distance

    ~3 = within 3 words
    ~10 = within 10 words
    ~50 = within 50 words

    📊 Working Examples by Category:

    Crime & Punishment:
    "tredje stöld"~5           # Third-time theft
    "dömd hänga"~10            # Sentenced to hang
    "inbrott natt*"~5          # Burglary at night
    "kyrka stöld"~10           # Church theft

    Values & Items:
    "hundra daler"~3           # Hundred dalers
    "stor* stöld*"~5           # Major theft
    "guld* ring*"~10           # Gold ring
    "silver* kalk*"~10         # Silver chalice

    Complex Combinations:
    ("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*
    # Church thefts or church thieves in the 1700s

    ("inbrott natt*"~5) AND (guld* OR silver*)
    # Night burglaries involving gold or silver

    ("första resan" AND stöld*) OR ("tredje stöld"~5)
    # First-time theft OR third theft (within proximity)

    🔧 Troubleshooting Tips:
    If proximity search returns no results:

    Check your quotes - Must wrap both terms
    Reduce to 2 terms - Drop extra words
    Try exact terms first - Before wildcards
    Increase distance - Try ~10 instead of ~3
    Simplify wildcards - Use on one term only

    💡 Advanced Strategy:
    Layer your searches from simple to complex:
    Step 1: "kyrka stöld"~10
    Step 2: ("kyrka stöld"~10 OR "kyrka tjuv*"~10)
    Step 3: (("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*)
    Step 4: (("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*) AND (guld* OR silver*)

    Most Reliable Proximity Patterns:

    Exact + Exact: "hundra daler"~3
    Exact + Wildcard: "inbrott natt*"~5
    Wildcard + Wildcard (sometimes): "stor* stöld*"~5

    The key is that proximity operators in this system work best with exactly 2 terms in quotes, and you can then combine multiple proximity searches using Boolean operators outside the quotes!
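
The two-term rule and the layered strategy above can be sketched as a small query builder. This is an illustration only: `proximity` is a hypothetical helper, not something the server exposes, and it merely formats the string and rejects inputs that the guide marks as unreliable.

```python
def proximity(term1: str, term2: str, distance: int) -> str:
    """Build a two-term proximity clause of the form "term1 term2"~N."""
    for term in (term1, term2):
        # Each side must be exactly one word; three-term phrases are unreliable.
        if not term or " " in term:
            raise ValueError("each side must be a single word")
    if distance < 1:
        raise ValueError("distance must be a positive integer")
    return f'"{term1} {term2}"~{distance}'


# Layer the search from simple to complex, combining clauses outside the quotes.
theft = proximity("kyrka", "stöld", 10)
thief = proximity("kyrka", "tjuv*", 10)

step1 = theft
step2 = f"({theft} OR {thief})"
step3 = f"({step2} AND 17*)"
step4 = f"{step3} AND (guld* OR silver*)"

for step in (step1, step2, step3, step4):
    print(step)
```

Each printed line corresponds to one of the four steps listed in the guide, so a search can start at step 1 and only move on when the simpler query returns hits.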



Parameters:
- keyword: Search term or Solr query (required)
- offset: Starting position for pagination - use 0, then 50, 100, etc. (required)
- max_results: Maximum documents to return per query (documented default: 10; the handler signature under Implementation Reference sets 50)
- max_hits_per_document: Maximum matching pages per document (default: 3)
- max_response_tokens: Maximum tokens in response (default: 15000)

Best practices:
- Start with offset=0 and increase by 50 to discover all matches
- Search related terms and variants for comprehensive coverage
- Use wildcards (*) for word variations: "troll*" finds "trolldom", "trolleri", "trollkona"
- Use fuzzy search (~) for historical spelling variants
- Use browse_document tool to view full page transcriptions of interesting results
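
The offset-based pagination described in the best practices can be sketched as a loop. This is a hedged illustration: `fetch_page` is a hypothetical stand-in for however your MCP client invokes `search_transcribed`, and the stop condition (an empty page) and the `max_pages` cap are assumptions, not documented server behavior.

```python
from typing import Callable, Iterator, List


def iter_pages(
    fetch_page: Callable[[str, int], List[str]],
    keyword: str,
    page_size: int = 50,
    max_pages: int = 20,
) -> Iterator[List[str]]:
    """Yield result pages, advancing the offset by page_size until a page is empty."""
    for page_number in range(max_pages):
        offset = page_number * page_size  # 0, 50, 100, ...
        page = fetch_page(keyword, offset)
        if not page:
            break
        yield page


def fake_fetch(keyword: str, offset: int) -> List[str]:
    """In-memory stand-in for a real tool call, holding 120 fake hits."""
    data = [f"hit-{i}" for i in range(120)]
    return data[offset:offset + 50]


pages = list(iter_pages(fake_fetch, "troll*"))
print([len(page) for page in pages])  # → [50, 50, 20]
```

Passing the fetch function in as a parameter keeps the loop independent of any particular client library.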

Input Schema

Name                     Required    Description    Default
keyword                  Yes
offset                   Yes
max_results              No
max_hits_per_document    No
max_response_tokens      No

Output Schema

Name      Required    Description    Default
result    Yes

Implementation Reference

  • MCP tool registration with name 'search_transcribed', detailed description, and parameter schema (keyword, offset, max_results, etc.). Defines the tool's input schema and usage instructions.
    @search_mcp.tool(
        name="search_transcribed",
        description="""Search for keywords in transcribed historical documents from the Swedish National Archives (Riksarkivet).
    
        This tool searches through historical documents and returns matching pages with their transcriptions.
        Supports advanced Solr query syntax including wildcards, fuzzy search, Boolean operators, and proximity searches.
    
        Key features:
        - Returns document metadata, page numbers, and text snippets containing the keyword
        - Provides direct links to page images and ALTO XML transcriptions
        - Supports pagination via offset parameter for comprehensive discovery
        - Advanced search syntax for precise queries
    
        Search syntax examples:
        - Basic: "Stockholm" - exact term search
        - Wildcards: "Stock*", "St?ckholm", "*holm" - match patterns
        - Fuzzy: "Stockholm~" or "Stockholm~1" - find similar words (typos, variants)
        - Proximity: '\"Stockholm trolldom\"~10' - words within 10 words of each other
        - Boolean: "(Stockholm AND trolldom)", "(Stockholm OR Göteborg)", "(Stockholm NOT trolldom)"
        - Boosting: \"Stockholm^4 trol*\" - increase relevance of specific terms
        - Complex: "((troll* OR häx*) AND (Stockholm OR Göteborg))" - combine operators
    
        NOTE: make sure to use grouping () for any boolean search; also, \"\" is important to group multiple words
        E.g. do '((skatt* OR guld* OR silver*) AND (stöld* OR stul*))' instead of '(skatt* OR guld* OR silver*) AND (stöld* OR stul*)', i.e. prefer grouping as that will return results; non-grouping will return 0 results
    
        Also prefer to use fuzzy search, i.e. something like ((stöld~2 OR tjufnad~2) AND (silver* OR guld*)) AND (döm* OR straff*), as many transcriptions are OCR/HTR AI based with common errors. Also account for old Swedish, i.e. (((präst* OR prest*) OR (kyrko* OR kyrck*)) AND ((silver* OR silfv*) OR (guld* OR gull*)))
    
        Proximity guide:
    
            Use quotes around the search terms
    
            "term1 term2"~N ✅
            term1 term2~N ❌
    
            Only 2 terms work reliably
    
            "kyrka stöld"~10 ✅
            "kyrka silver stöld"~10 ❌
    
            The number indicates maximum word distance
    
            ~3 = within 3 words
            ~10 = within 10 words
            ~50 = within 50 words
    
            📊 Working Examples by Category:
            Crime & Punishment:
            "tredje stöld"~5           # Third-time theft
            "dömd hänga"~10            # Sentenced to hang  
            "inbrott natt*"~5          # Burglary at night
            "kyrka stöld"~10           # Church theft
            Values & Items:
            "hundra daler"~3           # Hundred dalers
            "stor* stöld*"~5           # Major theft
            "guld* ring*"~10           # Gold ring
            "silver* kalk*"~10         # Silver chalice
            Complex Combinations:
            ("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*
            # Church thefts or church thieves in 1700s
    
            ("inbrott natt*"~5) AND (guld* OR silver*)  
            # Night burglaries involving gold or silver
    
            ("första resan" AND stöld*) OR ("tredje stöld"~5)
            # First-time theft OR third theft (within proximity)
            🔧 Troubleshooting Tips:
            If proximity search returns no results:
    
            Check your quotes - Must wrap both terms
            Reduce to 2 terms - Drop extra words
            Try exact terms first - Before wildcards
            Increase distance - Try ~10 instead of ~3
            Simplify wildcards - Use on one term only
    
            💡 Advanced Strategy:
            Layer your searches from simple to complex:
            Step 1: "kyrka stöld"~10
            Step 2: ("kyrka stöld"~10 OR "kyrka tjuv*"~10)
            Step 3: (("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*)
            Step 4: (("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*) AND (guld* OR silver*)
            Most Reliable Proximity Patterns:
    
            Exact + Exact: "hundra daler"~3
            Exact + Wildcard: "inbrott natt*"~5
            Wildcard + Wildcard (sometimes): "stor* stöld*"~5
    
            The key is that proximity operators in this system work best with exactly 2 terms in quotes, and you can then combine multiple proximity searches using Boolean operators outside the quotes!
    
    
    
        Parameters:
        - keyword: Search term or Solr query (required)
        - offset: Starting position for pagination - use 0, then 50, 100, etc. (required)
        - max_results: Maximum documents to return per query (default: 10)
        - max_hits_per_document: Maximum matching pages per document (default: 3)
        - max_response_tokens: Maximum tokens in response (default: 15000)
    
        Best practices:
        - Start with offset=0 and increase by 50 to discover all matches
        - Search related terms and variants for comprehensive coverage
        - Use wildcards (*) for word variations: "troll*" finds "trolldom", "trolleri", "trollkona"
        - Use fuzzy search (~) for historical spelling variants
        - Use browse_document tool to view full page transcriptions of interesting results
        """,
    )
  • Primary handler function for the search_transcribed tool. Instantiates services, calls search_operations.search_transcribed, formats results using SearchDisplayService, applies token limits and pagination info, handles exceptions.
    async def search_transcribed(
        keyword: str,
        offset: int,
        max_results: int = 50,
        max_hits_per_document: int = 3,
        max_response_tokens: int = 15000,
    ) -> str:
        try:
            search_operations = SearchOperations(http_client=default_http_client)
            search_display_service = SearchDisplayService(formatter=PlainTextFormatter())
    
            search_result = search_operations.search_transcribed(
                keyword=keyword,
                offset=offset,
                max_results=max_results,
                max_hits_per_document=max_hits_per_document,
            )
    
            formatted_results = search_display_service.format_search_results(
                search_result,
                maximum_documents_to_display=max_results,
            )
    
            formatted_results = _apply_token_limit_if_needed(formatted_results, max_response_tokens)
    
            formatted_results = _append_pagination_info_if_needed(formatted_results, search_result, offset, max_results)
    
            return formatted_results
    
        except Exception as e:
            return format_error_message(
                f"Search failed: {str(e)}",
                error_suggestions=[
                    "Try a simpler search term",
                    "Check if the service is available",
                    "Reduce max_results",
                ],
            )
  • SearchOperations.search_transcribed: Wrapper that calls SearchAPI.search_transcribed_text and constructs SearchResult object.
    def search_transcribed(
        self,
        keyword: str,
        offset: int = 0,
        max_results: int = 10,
        max_hits_per_document: Optional[int] = None,
    ) -> SearchResult:
        """Search for transcribed text across document collections.
    
        Executes a keyword search across all transcribed documents in the Riksarkivet
        collections.
    
        Args:
            keyword: Search term or phrase to look for in transcribed text.
            offset: Number of results to skip for pagination.
            max_results: Maximum number of documents to return.
            max_hits_per_document: Limit hits per document (None for unlimited).
    
        Returns:
            SearchResult containing search hits, total count, and metadata.
        """
        # Execute search and build operation in one step
        hits, total_hits = self.search_api.search_transcribed_text(keyword, max_results, offset, max_hits_per_document)
    
        search_result = SearchResult(
            hits=hits,
            total_hits=total_hits,
            keyword=keyword,
            offset=offset,
            enriched=False,
        )
    
        return search_result
  • Core implementation in SearchAPI: Builds search parameters, executes HTTP request to Riksarkivet search API, processes JSON response into list of SearchHit objects and total count.
    def search_transcribed_text(
        self,
        search_keyword: str,
        maximum_documents: int = DEFAULT_MAX_RESULTS,
        pagination_offset: int = 0,
        maximum_hits_per_document: Optional[int] = None,
    ) -> Tuple[List[SearchHit], int]:
        """Fast search for keyword in transcribed materials.
    
        Args:
            search_keyword: Search term
            maximum_documents: Maximum number of documents to fetch from API
            pagination_offset: Pagination offset
            maximum_hits_per_document: Maximum number of page hits to return per document (None = all)
    
        Returns:
            tuple: (list of SearchHit objects, total number of results)
        """
        search_parameters = self._build_search_parameters(search_keyword, maximum_documents, pagination_offset)
    
        try:
            search_result_data = self._execute_search_request(search_parameters)
    
            retrieved_documents = self._extract_documents_from_response(search_result_data, maximum_documents)
    
            collected_search_hits = self._collect_hits_from_documents(retrieved_documents, maximum_hits_per_document)
    
            total_available_results = search_result_data.get("totalHits", len(collected_search_hits))
    
            return collected_search_hits, total_available_results
    
        except Exception as error:
            raise Exception(f"Search failed: {error}") from error
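
The handler's two post-processing helpers, `_apply_token_limit_if_needed` and `_append_pagination_info_if_needed`, are not shown in the listing above. The sketch below illustrates what steps of that kind might look like; it is not the project's implementation. The function names, the four-characters-per-token estimate, and the wording of the messages are all assumptions made for illustration.

```python
def truncate_to_token_budget(text: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Cut text to roughly max_tokens, assuming a fixed characters-per-token ratio."""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Hypothetical marker so the caller can tell the response was shortened.
    return text[:max_chars] + "\n\n[Response truncated]"


def pagination_hint(total_hits: int, offset: int, max_results: int) -> str:
    """Return a next-offset hint, or an empty string when nothing further remains."""
    next_offset = offset + max_results
    if next_offset >= total_hits:
        return ""
    return f"More results available: use offset={next_offset}"


print(truncate_to_token_budget("a" * 10, max_tokens=2))
print(pagination_hint(total_hits=120, offset=0, max_results=50))
```

A character-based estimate avoids a tokenizer dependency at the cost of accuracy; a real implementation may well count tokens differently.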
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and does so comprehensively. It explains the search capabilities (Solr query syntax), return format (document metadata, page numbers, text snippets), pagination behavior, and provides extensive examples and troubleshooting guidance for effective use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

While the description is comprehensive and well-structured with clear sections, it is extremely lengthy with extensive examples, troubleshooting tips, and advanced strategies that could be condensed. The core information is front-loaded, but the overall length exceeds what's typically needed for effective tool selection.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the search functionality, 0% schema description coverage, no annotations, and the presence of an output schema, the description provides complete contextual information. It covers purpose, usage, parameters, behavioral characteristics, examples, troubleshooting, and integration with sibling tools, making it fully self-contained for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing detailed semantic explanations for all parameters. It explains that 'keyword' supports advanced Solr syntax with extensive examples and that 'offset' paginates in specific increments, and it lists default values for the remaining parameters in the 'Parameters' section, with usage guidance in 'Best practices'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool searches for keywords in transcribed historical documents from the Swedish National Archives and returns matching pages with their transcriptions. It distinguishes itself from the sibling 'browse_document' tool by focusing on search rather than on viewing full transcriptions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool vs alternatives, mentioning to use 'browse_document tool to view full page transcriptions of interesting results.' It also includes troubleshooting tips and advanced strategy sections that guide users on effective usage patterns.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/AI-Riksarkivet/ra-mcp'