search_transcribed
Find specific keywords within transcribed historical documents from the Swedish National Archives to locate relevant information in archival records.
Instructions
Search for keywords in transcribed historical documents from Riksarkivet
Input Schema
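| Name | Required | Description | Default |
|---|---|---|---|
| context_padding | No | | |
| keyword | Yes | Search term or Solr query | |
| max_hits_per_document | No | Maximum matching pages per document | 3 |
| max_pages_with_context | No | | |
| max_response_tokens | No | Maximum tokens in response | 15000 |
| max_results | No | Maximum documents to return per query | 50 |
| offset | Yes | Starting position for pagination (0, then 50, 100, ...) | |
| show_context | No | | |
| truncate_page_text | No | | |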
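Only `keyword` and `offset` are required. As a hedged sketch of what a client might send, the hypothetical helper below (not part of ra_mcp) assembles an arguments dict and checks the parameter names against the table; types and defaults are left to the server, and the non-negative `offset` check is an assumption.

```python
from typing import Any, Dict

# Optional parameter names copied from the input schema table.
OPTIONAL_PARAMETERS = {
    "context_padding",
    "max_hits_per_document",
    "max_pages_with_context",
    "max_response_tokens",
    "max_results",
    "show_context",
    "truncate_page_text",
}


def build_search_arguments(keyword: str, offset: int, **optional: Any) -> Dict[str, Any]:
    """Hypothetical helper: build arguments for a search_transcribed call."""
    unknown = set(optional) - OPTIONAL_PARAMETERS
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    # Assumption: a negative offset is never meaningful for pagination.
    if offset < 0:
        raise ValueError("offset must be >= 0")
    arguments: Dict[str, Any] = {"keyword": keyword, "offset": offset}
    arguments.update(optional)
    return arguments


args = build_search_arguments("trolldom", 0, max_results=50)
print(args)  # {'keyword': 'trolldom', 'offset': 0, 'max_results': 50}
```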
Implementation Reference
- src/ra_mcp/search_tools.py:195-233 (handler): The primary handler function for the MCP `search_transcribed` tool. It initializes services, performs the search via `SearchOperations`, formats results with `SearchDisplayService`, applies token limits and pagination info, and handles errors.

```python
async def search_transcribed(
    keyword: str,
    offset: int,
    max_results: int = 50,
    max_hits_per_document: int = 3,
    max_response_tokens: int = 15000,
) -> str:
    try:
        search_operations = SearchOperations(http_client=default_http_client)
        search_display_service = SearchDisplayService(formatter=PlainTextFormatter())

        search_result = search_operations.search_transcribed(
            keyword=keyword,
            offset=offset,
            max_results=max_results,
            max_hits_per_document=max_hits_per_document,
        )

        formatted_results = search_display_service.format_search_results(
            search_result,
            maximum_documents_to_display=max_results,
        )
        formatted_results = _apply_token_limit_if_needed(formatted_results, max_response_tokens)
        formatted_results = _append_pagination_info_if_needed(formatted_results, search_result, offset, max_results)
        return formatted_results
    except Exception as e:
        return format_error_message(
            f"Search failed: {e}",
            error_suggestions=[
                "Try a simpler search term",
                "Check if the service is available",
                "Reduce max_results",
            ],
        )
```
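The handler calls `_apply_token_limit_if_needed` and `_append_pagination_info_if_needed`, but their bodies are not shown. Purely as an illustration of what such steps could do, here are two hypothetical stand-ins; the names are invented, and the 4-characters-per-token estimate is an assumption, not necessarily how ra_mcp counts tokens.

```python
def truncate_to_token_budget(text: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Hypothetical stand-in for _apply_token_limit_if_needed.

    Assumes roughly `chars_per_token` characters per token; the real
    helper's token counting is not shown in the source.
    """
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text  # Within budget: return unchanged
    notice = "\n\n[Response truncated to fit max_response_tokens]"
    # Reserve room for the notice so the result fits whenever the budget exceeds the notice length
    keep = max(max_chars - len(notice), 0)
    return text[:keep] + notice


def pagination_hint(total_hits: int, offset: int, max_results: int) -> str:
    """Hypothetical stand-in for _append_pagination_info_if_needed."""
    next_offset = offset + max_results
    if next_offset >= total_hits:
        return ""  # This page reaches the end of the result set
    return f"\n\nMore results available: call again with offset={next_offset}."


page = "..." * 10
print(truncate_to_token_budget(page, 15000) == page)  # True
print(pagination_hint(total_hits=120, offset=0, max_results=50).strip())
# More results available: call again with offset=50.
```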
- src/ra_mcp/search_tools.py:90-194 (registration): The `@search_mcp.tool` decorator registers the `search_transcribed` tool, providing its name, detailed description, usage instructions, search syntax examples, parameter documentation (keyword, offset, max_results, etc.), and best practices.

```python
@search_mcp.tool(
    name="search_transcribed",
    description="""Search for keywords in transcribed historical documents from the Swedish National Archives (Riksarkivet).

This tool searches through historical documents and returns matching pages with their transcriptions. Supports advanced Solr query syntax including wildcards, fuzzy search, Boolean operators, and proximity searches.

Key features:
- Returns document metadata, page numbers, and text snippets containing the keyword
- Provides direct links to page images and ALTO XML transcriptions
- Supports pagination via offset parameter for comprehensive discovery
- Advanced search syntax for precise queries

Search syntax examples:
- Basic: "Stockholm" - exact term search
- Wildcards: "Stock*", "St?ckholm", "*holm" - match patterns
- Fuzzy: "Stockholm~" or "Stockholm~1" - find similar words (typos, variants)
- Proximity: '\"Stockholm trolldom\"~10' - words within 10 words of each other
- Boolean: "(Stockholm AND trolldom)", "(Stockholm OR Göteborg)", "(Stockholm NOT trolldom)"
- Boosting: \"Stockholm^4 trol*\" - increase relevance of specific terms
- Complex: "((troll* OR häx*) AND (Stockholm OR Göteborg))" - combine operators

NOTE: Always wrap any Boolean search in grouping parentheses (), and use quotes \"\" to group multiple words.
E.g., write '((skatt* OR guld* OR silver*) AND (stöld* OR stul*))' instead of '(skatt* OR guld* OR silver*) AND (stöld* OR stul*)': the grouped form returns results, while the ungrouped form returns 0 results.
Also prefer fuzzy search, e.g. '(((stöld~2 OR tjufnad~2) AND (silver* OR guld*)) AND (döm* OR straff*))', since many transcriptions are produced by OCR/HTR AI models and contain common recognition errors.
Also account for old Swedish spellings, e.g. '(((präst* OR prest*) OR (kyrko* OR kyrck*)) AND ((silver* OR silfv*) OR (guld* OR gull*)))'.

Proximity guide:
Use quotes around the search terms:
"term1 term2"~N ✅
term1 term2~N ❌

Only 2 terms work reliably:
"kyrka stöld"~10 ✅
"kyrka silver stöld"~10 ❌

The number indicates the maximum word distance:
~3 = within 3 words
~10 = within 10 words
~50 = within 50 words

📊 Working Examples by Category:

Crime & Punishment:
"tredje stöld"~5 # Third-time theft
"dömd hänga"~10 # Sentenced to hang
"inbrott natt*"~5 # Burglary at night
"kyrka stöld"~10 # Church theft

Values & Items:
"hundra daler"~3 # Hundred dalers
"stor* stöld*"~5 # Major theft
"guld* ring*"~10 # Gold ring
"silver* kalk*"~10 # Silver chalice

Complex Combinations:
(("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*) # Church thefts or church thieves in the 1700s
(("inbrott natt*"~5) AND (guld* OR silver*)) # Night burglaries involving gold or silver
(("första resan" AND stöld*) OR ("tredje stöld"~5)) # First-time theft OR third theft (within proximity)

🔧 Troubleshooting Tips:
If a proximity search returns no results:
- Check your quotes: they must wrap both terms
- Reduce to 2 terms: drop extra words
- Try exact terms first, before wildcards
- Increase the distance: try ~10 instead of ~3
- Simplify wildcards: use them on one term only

💡 Advanced Strategy:
Layer your searches from simple to complex:
Step 1: "kyrka stöld"~10
Step 2: ("kyrka stöld"~10 OR "kyrka tjuv*"~10)
Step 3: (("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*)
Step 4: ((("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*) AND (guld* OR silver*))

Most Reliable Proximity Patterns:
Exact + Exact: "hundra daler"~3
Exact + Wildcard: "inbrott natt*"~5
Wildcard + Wildcard (sometimes): "stor* stöld*"~5

The key is that proximity operators in this system work best with exactly 2 terms in quotes; you can then combine multiple proximity searches using Boolean operators outside the quotes.

Parameters:
- keyword: Search term or Solr query (required)
- offset: Starting position for pagination - use 0, then 50, 100, etc. (required)
- max_results: Maximum documents to return per query (default: 50)
- max_hits_per_document: Maximum matching pages per document (default: 3)
- max_response_tokens: Maximum tokens in response (default: 15000)

Best practices:
- Start with offset=0 and increase by 50 to discover all matches
- Search related terms and variants for comprehensive coverage
- Use wildcards (*) for word variations: "troll*" finds "trolldom", "trolleri", "trollkona"
- Use fuzzy search (~) for historical spelling variants
- Use the browse_document tool to view full page transcriptions of interesting results
""",
)
```
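The NOTE and the proximity guide boil down to a few composition rules: wrap every Boolean combination in parentheses, quote exactly two terms for proximity, and layer clauses from simple to complex. A minimal sketch of hypothetical helpers (`any_of`, `all_of`, `near`, `fuzzy` are not part of ra_mcp) that enforce those rules when building a `keyword` string:

```python
def any_of(*clauses: str) -> str:
    """OR alternatives inside one group, e.g. spelling variants."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """AND clauses inside an outer group, as the NOTE recommends."""
    return "(" + " AND ".join(clauses) + ")"


def near(first: str, second: str, distance: int) -> str:
    """Quoted two-term proximity query: "first second"~distance."""
    # The guide reports that only two quoted terms work reliably.
    for term in (first, second):
        if not term or " " in term or '"' in term:
            raise ValueError(f"proximity terms must be single words: {term!r}")
    return f'"{first} {second}"~{distance}'


def fuzzy(term: str, max_edits: int = 2) -> str:
    """Fuzzy term to tolerate OCR/HTR noise, e.g. stöld~2."""
    # Lucene/Solr fuzzy queries accept an edit distance of 0, 1 or 2.
    if not 0 <= max_edits <= 2:
        raise ValueError("max_edits must be 0, 1 or 2")
    return f"{term}~{max_edits}"


# Layering from simple to complex, following the Advanced Strategy steps
step1 = near("kyrka", "stöld", 10)
step2 = any_of(step1, near("kyrka", "tjuv*", 10))
step3 = all_of(step2, "17*")
step4 = all_of(step3, any_of("guld*", "silver*"))
print(step4)
# ((("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*) AND (guld* OR silver*))
```

Because every helper emits its own outer parentheses, any nesting of them yields a fully grouped query, which is the form the NOTE says returns results.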
- Core search logic helper called by the handler (`SearchOperations.search_transcribed`): invokes `SearchAPI.search_transcribed_text` and constructs a `SearchResult` object.

```python
from typing import Optional


# Method of SearchOperations (class body and SearchResult import not shown in the source)
def search_transcribed(
    self,
    keyword: str,
    offset: int = 0,
    max_results: int = 10,
    max_hits_per_document: Optional[int] = None,
) -> SearchResult:
    """Search for transcribed text across document collections.

    Executes a keyword search across all transcribed documents in the
    Riksarkivet collections.

    Args:
        keyword: Search term or phrase to look for in transcribed text.
        offset: Number of results to skip for pagination.
        max_results: Maximum number of documents to return.
        max_hits_per_document: Limit hits per document (None for unlimited).

    Returns:
        SearchResult containing search hits, total count, and metadata.
    """
    # Execute the search and build the result in one step
    hits, total_hits = self.search_api.search_transcribed_text(keyword, max_results, offset, max_hits_per_document)
    search_result = SearchResult(
        hits=hits,
        total_hits=total_hits,
        keyword=keyword,
        offset=offset,
        enriched=False,
    )
    return search_result
```
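The best practices recommend starting at offset=0 and stepping by 50 until all matches are found. A minimal client-side sketch of that loop, assuming a search callable that, like `search_transcribed_text`, takes (keyword, max_results, offset) and returns a page of hits plus the total count, and assuming `offset` and the total are counted in the same unit; `iterate_all_hits` and `fake_search` are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

# Assumed callable shape: (keyword, max_results, offset) -> (hits on this page, total results)
SearchFn = Callable[[str, int, int], Tuple[List[str], int]]


def iterate_all_hits(search: SearchFn, keyword: str, page_size: int = 50) -> Iterator[str]:
    """Yield hits page by page at offsets 0, page_size, 2 * page_size, ..."""
    offset = 0
    while True:
        hits, total_hits = search(keyword, page_size, offset)
        yield from hits
        offset += page_size
        # Stop once past the reported total, or if a page comes back empty
        if offset >= total_hits or not hits:
            break


def fake_search(keyword: str, max_results: int, offset: int) -> Tuple[List[str], int]:
    """Hypothetical in-memory stand-in for the Riksarkivet API (120 results)."""
    corpus = [f"{keyword}-doc{i}" for i in range(120)]
    return corpus[offset:offset + max_results], len(corpus)


all_hits = list(iterate_all_hits(fake_search, "trolldom"))
print(len(all_hits))  # 120
```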
- Low-level API client method (`SearchAPI.search_transcribed_text`) that performs the actual HTTP search request to the Riksarkivet API, parses the response, and creates `SearchHit` objects from transcribed text snippets.

```python
from typing import List, Optional, Tuple


# Method of SearchAPI; DEFAULT_MAX_RESULTS and SearchHit are defined elsewhere (not shown)
def search_transcribed_text(
    self,
    search_keyword: str,
    maximum_documents: int = DEFAULT_MAX_RESULTS,
    pagination_offset: int = 0,
    maximum_hits_per_document: Optional[int] = None,
) -> Tuple[List[SearchHit], int]:
    """Fast search for keyword in transcribed materials.

    Args:
        search_keyword: Search term
        maximum_documents: Maximum number of documents to fetch from API
        pagination_offset: Pagination offset
        maximum_hits_per_document: Maximum number of page hits to return per document (None = all)

    Returns:
        tuple: (list of SearchHit objects, total number of results)
    """
    search_parameters = self._build_search_parameters(search_keyword, maximum_documents, pagination_offset)

    try:
        search_result_data = self._execute_search_request(search_parameters)
        retrieved_documents = self._extract_documents_from_response(search_result_data, maximum_documents)
        collected_search_hits = self._collect_hits_from_documents(retrieved_documents, maximum_hits_per_document)
        # Fall back to the number of collected hits if the response has no "totalHits"
        total_available_results = search_result_data.get("totalHits", len(collected_search_hits))
        return collected_search_hits, total_available_results
    except Exception as error:
        raise Exception(f"Search failed: {error}") from error
```
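The body of `_collect_hits_from_documents` is not shown. A hedged sketch of the per-document cap its docstring describes ("None = all"), together with the `totalHits` fallback used above; documents are modeled as plain lists of page-hit strings, an assumption for illustration (the real code builds `SearchHit` objects), and `cap_hits_per_document` / `total_results` are hypothetical names.

```python
from typing import Any, Dict, List, Optional, Sequence


def cap_hits_per_document(
    documents: Sequence[Sequence[str]], max_hits_per_document: Optional[int]
) -> List[str]:
    """Flatten per-document page hits, keeping at most N hits per document.

    None means no limit, matching "(None = all)" in the docstring above.
    """
    collected: List[str] = []
    for page_hits in documents:
        if max_hits_per_document is None:
            collected.extend(page_hits)
        else:
            collected.extend(page_hits[:max_hits_per_document])
    return collected


def total_results(response: Dict[str, Any], collected: List[str]) -> int:
    # Same fallback as the source: prefer "totalHits", else count the collected hits
    return response.get("totalHits", len(collected))


documents = [["p1", "p2", "p3", "p4"], ["q1"]]
hits = cap_hits_per_document(documents, 3)
print(hits)                     # ['p1', 'p2', 'p3', 'q1']
print(total_results({}, hits))  # 4
```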