ra-mcp (WIP)
MCPs for Riksarkivet
An MCP server and command-line tools for searching and browsing transcribed historical documents from the Swedish National Archives (Riksarkivet).
Features
Full-text search across millions of transcribed historical documents
Complete page transcriptions with accurate text extraction from historical manuscripts
Reference-based document browsing using official archive reference codes
Contextual search highlighting to identify relevant content quickly
High-resolution image access to original document scans via IIIF
Getting Started
Quick Setup
How to Use
1. Search for Keywords
Find documents containing specific words or phrases:
Options:
--max N - Maximum search results (default: 50)
--max-display N - Maximum results to display (default: 20)
--context - Show full page transcriptions
--max-pages N - Maximum pages to load context for (default: 10)
--context-padding N - Include N pages before/after each hit for context (default: 0)
--no-grouping - Show pages individually instead of grouped by document
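To make the effect of --context-padding concrete, here is a minimal sketch of how padded page selection could work. It is not taken from the ra-mcp source: the function name `pages_with_padding` and its parameters are hypothetical, and the real tool may select pages differently.

```python
def pages_with_padding(hit_pages, padding, first_page=1, last_page=None):
    """Return sorted, de-duplicated page numbers: each hit plus `padding` neighbours.

    Hypothetical helper for illustration; not part of the ra-mcp API.
    """
    selected = set()
    for page in hit_pages:
        for candidate in range(page - padding, page + padding + 1):
            # Skip pages that fall outside the document.
            if candidate < first_page:
                continue
            if last_page is not None and candidate > last_page:
                continue
            selected.add(candidate)
    return sorted(selected)


# Hits on pages 5 and 12 with one page of padding on each side.
print(pages_with_padding([5, 12], padding=1))  # [4, 5, 6, 11, 12, 13]
```

With a padding of 0 (the documented default) only the hit pages themselves are returned, and overlapping neighbourhoods are merged rather than repeated.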
2. Browse Specific Documents
When you find interesting documents, browse them directly:
Options:
--page or --pages - Page numbers (e.g., "5", "1-10", "5,7,9")
--search-term - Highlight this term in the text
--max-display N - Maximum pages to display (default: 20)
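The page formats listed above (a single page, a range, or a comma-separated list) can be read as a small grammar. The sketch below shows one way to expand such a specification into page numbers. The function `parse_page_spec` is a hypothetical illustration, not the parser used by the tool, and it assumes that ranges are inclusive.

```python
def parse_page_spec(spec):
    """Expand a page specification such as "5", "1-10" or "5,7,9" into a sorted list.

    Hypothetical helper for illustration; ranges are assumed to be inclusive.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "5,,7"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_spec("5"))      # [5]
print(parse_page_spec("10-15"))  # [10, 11, 12, 13, 14, 15]
print(parse_page_spec("5,7,9"))  # [5, 7, 9]
```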
3. Search with Full Context
The --context flag shows complete page transcriptions instead of just snippets:
Output Features
🔍 Search Results
When you run a search, results are presented with:
Document grouping - Related pages grouped together for context
Institution & dates - Archive location and document dates
Page numbers - Specific pages containing your search terms
Highlighted snippets - Preview text with keywords emphasized
Browse commands - Ready-to-run commands for deeper exploration
Example output:
📄 Full Page Display
With the --context flag, you get complete page transcriptions featuring:
Full text transcriptions - Complete page content from ALTO XML
Keyword highlighting - Your search terms highlighted in yellow
Rich metadata - Document titles, dates, and archive hierarchy
Direct access links - Quick links to images, XML, and interactive viewer
Context indicators - Clear marking of surrounding pages when using --context-padding
Example output:
🔗 Available Resources
Each result provides direct access to:
| Resource | Description | Use Case |
| --- | --- | --- |
| ALTO XML | Structured transcription data with precise positioning | Text analysis, data extraction |
| IIIF Images | High-resolution document scans with zoom/crop support | Visual inspection, citations |
| Bildvisning | Interactive web viewer with search highlighting | Online browsing, sharing |
| Collections | IIIF metadata for document series | Understanding document context |
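The zoom and crop support mentioned for IIIF Images comes from the IIIF Image API, whose URLs follow the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such URLs. The base URL and the identifier are placeholders, not real Riksarkivet values, and the helper name `iiif_image_url` is hypothetical; the links printed by the tool itself are the authoritative ones.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Hypothetical helper for illustration. The identifier is percent-encoded,
    as the IIIF Image API requires for characters such as "/".
    """
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Placeholder server and identifier (assumptions, not real endpoints).
BASE = "https://iiif.example.org/image"

# Whole page at the largest size the server offers.
print(iiif_image_url(BASE, "doc_0001"))

# Crop an 800x600 pixel region starting at (100, 200), scaled to 400 pixels wide.
print(iiif_image_url(BASE, "doc_0001", region="100,200,800,600", size="400,"))
```

Cropping to the region that contains a search hit is a common way to produce citation-ready excerpts without downloading the full scan.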
Examples
Basic Workflow
Search for a keyword:
uv run ra search "Stockholm"
Get full context for interesting hits:
uv run ra search "Stockholm" --context --max-pages 3
Include surrounding pages for additional context:
uv run ra search "Stockholm" --context --context-padding 1 --max-pages 3
Browse specific documents:
uv run ra browse "SE/RA/123456" --page "10-15" --search-term "Stockholm"
Advanced Usage
Technical Details
Riksarkivet APIs & Data Sources
This tool integrates with multiple Riksarkivet APIs to provide comprehensive access to historical documents:
Current Integrations
Search API - Primary endpoint for full-text search across transcribed materials (Documentation)
IIIF Collections - Access to digitized document collections via IIIF standard (Documentation)
ALTO XML - Structured text transcriptions with precise positioning data
IIIF Images - High-resolution document images with zoom and cropping capabilities
Bildvisning - Interactive document viewer with search highlighting
OAI-PMH - Metadata harvesting for archive records and references (Documentation)
Additional Resources
The Riksarkivet Data Platform Wiki provides comprehensive documentation for building additional MCP integrations.
Experimental Features
Förvaltningshistorik - Semantic search interface (under evaluation)
AI-Riksarkivet HTRflow - Handwritten text recognition pipeline (PyPI package)
Troubleshooting
Common Issues
No results found: Try broader search terms or check spelling
Page not loading: Some pages may not have transcriptions available
Network timeouts: Tool includes retry logic, but very slow connections may time out
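For scripts that call Riksarkivet services directly, a retry wrapper with exponential backoff is one common way to handle transient timeouts. The sketch below is a generic illustration of that pattern, not the retry logic shipped with the tool. The function `call_with_retries` and its parameters are hypothetical.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation()` and retry on timeout or connection errors.

    Hypothetical helper for illustration. The delay doubles after each
    failed attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated slow connection")
    return "page text"


waits = []  # record delays instead of actually sleeping
print(call_with_retries(flaky_fetch, sleep=waits.append))  # page text
print(waits)  # [1.0, 2.0]
```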
Getting Help
MCP Server Development
Running the MCP Server
Testing with MCP Inspector
Use the MCP Inspector to test and debug the MCP server:
The MCP Inspector provides a web interface to test server tools, resources, and prompts during development.
Building and Publishing with Dagger
The project uses Dagger for containerized builds and publishing to Docker registries. Pre-built images are available on Docker Hub.
Prerequisites
Dagger CLI installed
Docker registry credentials (for publishing)
Available Commands
Build locally:
Run tests:
Build and publish to Docker registry:
Available Dagger Functions
build: Creates a production-ready container image using the Dockerfile
test: Runs the test suite using pytest with coverage reporting
publish: Builds and publishes container image to registry with authentication
build-local: Build with custom environment variables and registry settings
The Dagger configuration is located in .dagger/main.go and provides a complete CI/CD pipeline for the project.
Current MCP Server Implementation
The MCP server provides access to transcribed historical documents from the Swedish National Archives (Riksarkivet) through three primary tools and two resources:
🔧 Available Tools
1. search_transcribed
Search for keywords in transcribed materials with pagination support.
2. browse_document
Browse specific pages of a document by reference code.
3. get_document_structure
Get document structure and metadata without fetching content.
📚 Available Resources
riksarkivet://contents/table_of_contents - Complete guide index (Innehållsförteckning)
riksarkivet://guide/{filename} - Specific guide sections (e.g., '01_Domstolar.md', '02_Fangelse.md')
🔄 Typical Workflow
Search → search_transcribed("trolldom", offset=0) to find relevant documents
Paginate → Continue with offset=50, 100, 150... for comprehensive discovery
Browse → Use browse_document() to view specific pages with full transcriptions
Structure → Use get_document_structure() to understand document organization
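The pagination step in this workflow can be sketched as a loop that keeps raising the offset until a page of results comes back short or empty. The code below is an illustration only. `collect_all_hits` and `fake_search_page` are hypothetical names, and the stand-in search function replaces a real call to `search_transcribed`, whose exact return shape is not specified here.

```python
def collect_all_hits(search_page, keyword, page_size=50, max_hits=500):
    """Gather hits by calling `search_page(keyword, offset, limit)` repeatedly.

    Hypothetical helper for illustration. Stops when a batch is empty or
    shorter than `page_size`, or once `max_hits` have been collected.
    """
    hits = []
    offset = 0
    while len(hits) < max_hits:
        batch = search_page(keyword, offset, page_size)
        if not batch:
            break
        hits.extend(batch)
        if len(batch) < page_size:
            break  # last, partially filled page
        offset += page_size  # 0, 50, 100, 150, ...
    return hits[:max_hits]


# In-memory stand-in for the search tool (assumption: 120 matching hits).
FAKE_INDEX = [f"hit-{n}" for n in range(120)]


def fake_search_page(keyword, offset, limit):
    return FAKE_INDEX[offset:offset + limit]


all_hits = collect_all_hits(fake_search_page, "trolldom")
print(len(all_hits))  # 120
```

The `max_hits` cap is a design choice to keep a very common keyword from triggering an unbounded number of requests.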
💡 Search Strategy Tips
Start with show_context=False to maximize hit coverage
Use pagination (increasing offsets) to find all matches
Enable show_context=True only when you need full page text for specific hits
Browse specific pages for detailed examination with keyword highlighting