This tool searches through historical documents and returns matching pages with their transcriptions.
Supports advanced Solr query syntax including wildcards, fuzzy search, Boolean operators, and proximity searches.
Key features:
- Returns document metadata, page numbers, and text snippets containing the keyword
- Provides direct links to page images and ALTO XML transcriptions
- Supports pagination via offset parameter for comprehensive discovery
- Advanced search syntax for precise queries
Search syntax examples:
- Basic: "Stockholm" - exact term search
- Wildcards: "Stock*", "St?ckholm", "*holm" - match patterns
- Fuzzy: "Stockholm~" or "Stockholm~1" - find similar words (typos, variants)
- Proximity: '"Stockholm trolldom"~10' - words within 10 words of each other
- Boolean: "(Stockholm AND trolldom)", "(Stockholm OR Göteborg)", "(Stockholm NOT trolldom)"
- Boosting: "Stockholm^4 trol*" - increase relevance of specific terms
- Complex: "((troll* OR häx*) AND (Stockholm OR Göteborg))" - combine operators
NOTE: Wrap every Boolean search in grouping parentheses (), and use double quotes "" to group multiple words into a phrase.
E.g. use '((skatt* OR guld* OR silver*) AND (stöld* OR stul*))' instead of '(skatt* OR guld* OR silver*) AND (stöld* OR stul*)'. The grouped form returns results; the ungrouped form returns 0 results.
Prefer fuzzy search as well, e.g. '(((stöld~2 OR tjufnad~2) AND (silver* OR guld*)) AND (döm* OR straff*))', because many transcriptions are produced by OCR/HTR models and contain common recognition errors. Also account for older Swedish spellings, e.g. '(((präst* OR prest*) OR (kyrko* OR kyrck*)) AND ((silver* OR silfv*) OR (guld* OR gull*)))'.
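The grouping rule above can be applied mechanically. The following is a minimal sketch, assuming queries are assembled as plain strings before being passed as the keyword parameter; the helper names any_of, all_of and fuzzy are hypothetical and not part of this tool.

```python
def fuzzy(term: str, distance: int = 2) -> str:
    """Append a fuzzy operator, e.g. 'stöld' -> 'stöld~2'."""
    # Assumption: only the edit distances 1 and 2 are accepted here.
    if distance not in (1, 2):
        raise ValueError("edit distance must be 1 or 2")
    return f"{term}~{distance}"


def any_of(*terms: str) -> str:
    """Join terms with OR inside one pair of parentheses."""
    if not terms:
        raise ValueError("at least one term is required")
    return "(" + " OR ".join(terms) + ")"


def all_of(*groups: str) -> str:
    """Join groups with AND and wrap the whole expression in parentheses."""
    if not groups:
        raise ValueError("at least one group is required")
    return "(" + " AND ".join(groups) + ")"


# Every level is wrapped, so the outermost expression is always grouped.
grouped = all_of(any_of("skatt*", "guld*", "silver*"), any_of("stöld*", "stul*"))
fuzzy_query = all_of(any_of(fuzzy("stöld"), fuzzy("tjufnad")), any_of("silver*", "guld*"))
print(grouped)
print(fuzzy_query)
```

Because all_of always adds the outer parentheses, a query built this way cannot end up in the ungrouped form that returns 0 results.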
Proximity guide:
Use quotes around the search terms
"term1 term2"~N ✅
term1 term2~N ❌
Only 2 terms work reliably
"kyrka stöld"~10 ✅
"kyrka silver stöld"~10 ❌
The number indicates maximum word distance
~3 = within 3 words
~10 = within 10 words
~50 = within 50 words
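The proximity rules above (quotes around both terms, exactly 2 terms, distance after the closing quote) can be enforced by a small builder. This is a sketch only; the function name proximity is hypothetical and the validation reflects the guide above rather than any documented constraint of the search backend.

```python
def proximity(term1: str, term2: str, distance: int = 10) -> str:
    """Build a two-term proximity query such as '"kyrka stöld"~10'."""
    for term in (term1, term2):
        # Each argument must be a single word, so the quoted phrase
        # always contains exactly 2 terms.
        if not term or '"' in term or any(ch.isspace() for ch in term):
            raise ValueError(f"not a single search term: {term!r}")
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return f'"{term1} {term2}"~{distance}'


print(proximity("kyrka", "stöld", 10))
print(proximity("inbrott", "natt*", 5))
```

Passing a multi-word string such as "kyrka silver" raises ValueError, which catches the unreliable three-term form before the query is sent.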
📊 Working Examples by Category:
Crime & Punishment:
"tredje stöld"~5 # Third-time theft
"dömd hänga"~10 # Sentenced to hang
"inbrott natt*"~5 # Burglary at night
"kyrka stöld"~10 # Church theft
Values & Items:
"hundra daler"~3 # Hundred dalers
"stor* stöld*"~5 # Major theft
"guld* ring*"~10 # Gold ring
"silver* kalk*"~10 # Silver chalice
Complex Combinations:
(("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*)
# Church thefts or church thieves in the 1700s
(("inbrott natt*"~5) AND (guld* OR silver*))
# Night burglaries involving gold or silver
(("första resan" AND stöld*) OR ("tredje stöld"~5))
# First-time theft OR third theft (within proximity)
🔧 Troubleshooting Tips:
If proximity search returns no results:
Check your quotes - Must wrap both terms
Reduce to 2 terms - Drop extra words
Try exact terms first - Before wildcards
Increase distance - Try ~10 instead of ~3
Simplify wildcards - Use on one term only
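The troubleshooting tips above can be turned into an ordered list of fallback queries to try one after another. The sketch below is one possible ordering, not a prescribed procedure; the function name fallback_queries and the wider default of 10 are assumptions made for illustration.

```python
from typing import List


def fallback_queries(term1: str, term2: str, distance: int = 3, wider: int = 10) -> List[str]:
    """Return progressively relaxed two-term proximity queries."""
    # Strip trailing wildcards to get the exact forms of both terms.
    exact1, exact2 = term1.rstrip("*"), term2.rstrip("*")
    far = max(distance, wider)
    candidates = [
        f'"{exact1} {exact2}"~{distance}',  # exact terms first
        f'"{exact1} {exact2}"~{far}',       # then a larger distance
        f'"{exact1} {term2}"~{far}',        # then a wildcard on one term only
    ]
    # Drop duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(candidates))


for query in fallback_queries("stor*", "stöld*", distance=5):
    print(query)
```

Each candidate still has exactly 2 terms inside the quotes, so every fallback stays within the pattern that works reliably.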
💡 Advanced Strategy:
Layer your searches from simple to complex:
Step 1: "kyrka stöld"~10
Step 2: ("kyrka stöld"~10 OR "kyrka tjuv*"~10)
Step 3: (("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*)
Step 4: ((("kyrka stöld"~10 OR "kyrka tjuv*"~10) AND 17*) AND (guld* OR silver*))
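The layering steps above can also be generated programmatically. This is a minimal sketch, assuming the proximity clauses and filters are already valid query fragments; layered_queries is a hypothetical helper, and it wraps every Boolean level in parentheses, following the grouping note earlier in this description.

```python
from typing import List, Sequence


def layered_queries(clauses: Sequence[str], filters: Sequence[str]) -> List[str]:
    """Return queries ordered from the simplest to the most specific."""
    if not clauses:
        raise ValueError("at least one proximity clause is required")
    steps = [clauses[0]]  # Step 1: a single proximity search
    current = clauses[0]
    if len(clauses) > 1:
        # Step 2: OR together all proximity variants.
        current = "(" + " OR ".join(clauses) + ")"
        steps.append(current)
    for extra in filters:
        # Later steps: narrow with one more AND filter, fully grouped.
        current = f"({current} AND {extra})"
        steps.append(current)
    return steps


steps = layered_queries(
    ['"kyrka stöld"~10', '"kyrka tjuv*"~10'],
    ["17*", "(guld* OR silver*)"],
)
for number, query in enumerate(steps, start=1):
    print(f"Step {number}: {query}")
```

Running the steps in order makes it easy to see which added filter is responsible when the result count drops to zero.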
Most Reliable Proximity Patterns:
Exact + Exact: "hundra daler"~3
Exact + Wildcard: "inbrott natt*"~5
Wildcard + Wildcard (sometimes): "stor* stöld*"~5
In short: proximity operators in this system work best with exactly 2 terms inside the quotes. To cover more terms, combine several two-term proximity searches with Boolean operators outside the quotes.
Parameters:
- keyword: Search term or Solr query (required)
- offset: Starting position for pagination - use 0, then 50, 100, etc. (required)
- max_results: Maximum documents to return per query (default: 10)
- max_hits_per_document: Maximum matching pages per document (default: 3)
- max_response_tokens: Maximum tokens in response (default: 15000)
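The parameter list above can be mirrored in a small container that fills in the documented defaults. This is an illustrative sketch: the class name SearchArgs and its validation checks are assumptions, and how the resulting dictionary is handed to the tool depends on the calling client.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SearchArgs:
    """Arguments for one search call, with the defaults listed above."""
    keyword: str                      # required
    offset: int                       # required
    max_results: int = 10
    max_hits_per_document: int = 3
    max_response_tokens: int = 15000

    def __post_init__(self) -> None:
        # Assumed sanity checks; the tool itself may validate differently.
        if not self.keyword.strip():
            raise ValueError("keyword must not be empty")
        if self.offset < 0:
            raise ValueError("offset must be 0 or greater")


args = asdict(SearchArgs(keyword='"kyrka stöld"~10', offset=0))
print(args)
```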
Best practices:
- Start with offset=0 and increase by 50 to discover all matches
- Search related terms and variants for comprehensive coverage
- Use wildcards (*) for word variations: "troll*" finds "trolldom", "trolleri", "trollkona"
- Use fuzzy search (~) for historical spelling variants
- Use the browse_document tool to view full page transcriptions of interesting results