ra-mcp (WIP)

MCPs for Riksarkivet

An MCP server and command-line tools for searching and browsing transcribed historical documents from the Swedish National Archives (Riksarkivet).

Features

Full-text search across millions of transcribed historical documents
Complete page transcriptions with accurate text extraction from historical manuscripts
Reference-based document browsing using official archive reference codes
Contextual search highlighting to identify relevant content quickly
High-resolution image access to original document scans via IIIF

Getting Started

Quick Setup

# Search for anything
uv run ra search "Stockholm"

How to Use

1. Search for Keywords

Find documents containing specific words or phrases:

# Basic search
uv run ra search "Stockholm"

# Search with full page transcriptions
uv run ra search "trolldom" --context --max-pages 5

# Search with surrounding pages for context
uv run ra search "trolldom" --context --context-padding 1 --max-pages 3

# Search without document grouping
uv run ra search "vasa" --context --no-grouping --max-pages 3

Options:

--max N - Maximum search results (default: 50)
--max-display N - Maximum results to display (default: 20)
--context - Show full page transcriptions
--max-pages N - Maximum pages to load context for (default: 10)
--context-padding N - Include N pages before/after each hit for context (default: 0)
--no-grouping - Show pages individually instead of grouped by document
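
When the CLI is driven from a script, the options above can be assembled programmatically. The following is a minimal sketch, not part of ra-mcp: `build_search_command` and its parameter names are hypothetical, and only the flags listed above are used. Options left unset are omitted so that the CLI defaults apply.

```python
import shlex


def build_search_command(keyword, *, max_results=None, max_display=None,
                         context=False, max_pages=None,
                         context_padding=None, no_grouping=False):
    """Assemble the argv list for `uv run ra search` (hypothetical helper)."""
    argv = ["uv", "run", "ra", "search", keyword]
    if max_results is not None:
        argv += ["--max", str(max_results)]
    if max_display is not None:
        argv += ["--max-display", str(max_display)]
    if context:
        argv.append("--context")
    if max_pages is not None:
        argv += ["--max-pages", str(max_pages)]
    if context_padding is not None:
        argv += ["--context-padding", str(context_padding)]
    if no_grouping:
        argv.append("--no-grouping")
    return argv


command = build_search_command("trolldom", context=True,
                               context_padding=1, max_pages=3)
# shlex.join quotes arguments only where the shell needs it
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run(command)`, which avoids shell quoting problems with keywords that contain spaces.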

2. Browse Specific Documents

When you find interesting documents, browse them directly:

# View single page
uv run ra browse "SE/RA/123" --page 5

# View page range
uv run ra browse "SE/RA/123" --pages "1-10"

# View specific pages with search highlighting
uv run ra browse "SE/RA/123" --page "5,7,9" --search-term "Stockholm"

Options:

--page or --pages - Page numbers (e.g., "5", "1-10", "5,7,9")
--search-term - Highlight this term in the text
--max-display N - Maximum pages to display (default: 20)
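
The page formats accepted by --page/--pages ("5", "1-10", "5,7,9", and mixed forms such as "1,5,10-12") can be illustrated with a small parser. This is a sketch of how such a spec might be expanded, not the tool's own parsing code; `parse_page_spec` is a hypothetical name.

```python
def parse_page_spec(spec):
    """Expand a page spec such as "1,5,10-12" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_spec("1,5,10-12"))  # [1, 5, 10, 11, 12]
```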

3. Search with Full Context

The --context flag shows complete page transcriptions instead of just snippets:

# Search with full page transcriptions
uv run ra search "Stockholm" --context --max-pages 5

# Include surrounding pages for additional context
uv run ra search "trolldom" --context --context-padding 2

# Show pages individually instead of grouped by document
uv run ra search "vasa" --context --no-grouping
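
To make the effect of --context-padding concrete, the sketch below computes which pages would be shown for a set of hit pages and a padding value. It is an illustration under assumptions: `pages_with_padding` is a hypothetical helper, pages are assumed to start at 1, and the interaction with --max-pages is not modelled.

```python
def pages_with_padding(hit_pages, padding, first_page=1):
    """Return hit pages plus `padding` neighbours on each side, sorted and deduplicated."""
    selected = set()
    for page in hit_pages:
        for neighbour in range(page - padding, page + padding + 1):
            if neighbour >= first_page:  # assumption: no pages before the first
                selected.add(neighbour)
    return sorted(selected)


print(pages_with_padding([7, 8, 52], padding=1))  # [6, 7, 8, 9, 51, 52, 53]
```

Adjacent hits share neighbours, so the number of pages loaded grows more slowly than hits × (2 × padding + 1) when hits cluster together.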

Output Features

🔍 Search Results

When you run a search, results are presented with:

Document grouping - Related pages grouped together for context
Institution & dates - Archive location and document dates
Page numbers - Specific pages containing your search terms
Highlighted snippets - Preview text with keywords emphasized
Browse commands - Ready-to-run commands for deeper exploration

Example output:

Document: SE/RA/310187/1 - Kommissorialrätt i Stockholm ang. trolldom
Institution: Riksarkivet i Stockholm/Täby | Date: 1676 - 1677

├─ Page 2: "...Kommissorialrätt i Stockholm ang. **trolldom** 1676..."
├─ Page 7: "...som sig medh någon klagomåhl öfwer detta **Trolldoms** wäsende..."
├─ Page 8: "...till hemmande af denne **Trolldoms** Sundh på åthskillige orter..."
├─ Page 52: "...hustru Anna förklarades skyldig till **trolldom** och förde..."
└─ Page 72: "...bekände han sig hafwa brukat **trolldom** emot sina fiender..."

Browse commands:
uv run ra browse "SE/RA/310187/1" --page 7 --search-term "trolldom"
uv run ra browse "SE/RA/310187/1" --pages "2,7,8,52,72" --search-term "trolldom"
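
The grouping and the ready-to-run browse commands shown in the example output can be sketched as follows. This is an illustration only, not the tool's implementation: both function names are hypothetical, and hits are assumed to arrive as (reference code, page number) pairs.

```python
from collections import defaultdict


def group_hits_by_document(hits):
    """Group (reference_code, page) pairs into {reference_code: sorted pages}."""
    grouped = defaultdict(set)
    for reference_code, page in hits:
        grouped[reference_code].add(page)
    return {ref: sorted(pages) for ref, pages in grouped.items()}


def browse_command(reference_code, pages, search_term):
    """Format one browse command covering all hit pages of a document."""
    page_list = ",".join(str(page) for page in pages)
    return (f'uv run ra browse "{reference_code}" '
            f'--pages "{page_list}" --search-term "{search_term}"')


hits = [("SE/RA/310187/1", 7), ("SE/RA/310187/1", 2), ("SE/RA/310187/1", 8)]
for ref, pages in group_hits_by_document(hits).items():
    print(browse_command(ref, pages, "trolldom"))
```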

📄 Full Page Display

With the --context flag, you get complete page transcriptions featuring:

Full text transcriptions - Complete page content from ALTO XML
Keyword highlighting - Your search terms highlighted in yellow
Rich metadata - Document titles, dates, and archive hierarchy
Direct access links - Quick links to images, XML, and interactive viewer
Context indicators - Clear marking of surrounding pages when using --context-padding

Example output:

═══ SE/RA/310187/1 - Page 7 ═══
Title: Kommissorialrätt i Stockholm ang. trolldom
Date: 1676-1677 | Institution: Riksarkivet i Stockholm/Täby

skäligt sin emillan förafskeda, och det eftter Kongl. Senarens förordning, att alla dhe, som sig medh någon klagomåhl öfwer detta **Trolldoms** wäsende angifwa wela, ther medh skola inställa sigh för höga öfwerheten, huilket alt Högwälborne Herr General Leutnanten och Gouverneuren medh dhe brefwen han oss tillsändt hafwer, jämwäl och i muntel samptahl medh oss sigh förklarat, som wij och nu wid thetta tillfället Commissionen till hemmande af denne **Trolldoms** Sundh på åthskillige orter...

Links:
📄 ALTO XML: https://sok.riksarkivet.se/dokument/alto/SE_RA_310187_1_007.xml
🖼️ Image: https://lbiiif.riksarkivet.se/arkiv/SE_RA_310187_1_007.jpg
🔍 Bildvisning: https://sok.riksarkivet.se/bildvisning/SE_RA_310187_1#007
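
In the example output, a search for "trolldom" also marks the inflected form "Trolldoms" in full. One way to reproduce that behaviour is a case-insensitive match on words that start with the search term. The sketch below is an assumption about how such highlighting could work, not the tool's own code; `highlight` is a hypothetical name, and the `**` marker simply mirrors the example output.

```python
import re


def highlight(text, term, marker="**"):
    """Wrap each word that starts with `term` (case-insensitive) in `marker`."""
    pattern = re.compile(rf"\b{re.escape(term)}\w*", re.IGNORECASE)
    return pattern.sub(lambda match: f"{marker}{match.group(0)}{marker}", text)


print(highlight("klagomåhl öfwer detta Trolldoms wäsende", "trolldom"))
# klagomåhl öfwer detta **Trolldoms** wäsende
```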

🔗 Available Resources

Each result provides direct access to:

Resource	Description	Use Case
ALTO XML	Structured transcription data with precise positioning	Text analysis, data extraction
IIIF Images	High-resolution document scans with zoom/crop support	Visual inspection, citations
Bildvisning	Interactive web viewer with search highlighting	Online browsing, sharing
Collections	IIIF metadata for document series	Understanding document context
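
The three per-page links in the example output for SE/RA/310187/1, page 7, share a visible shape: slashes in the reference code become underscores, and the page number is zero-padded to three digits. The sketch below reproduces only that shape. It is inferred from a single example, `resource_links` is a hypothetical helper, and whether every document follows this pattern is an assumption that should be checked against the Riksarkivet API documentation.

```python
def resource_links(reference_code, page):
    """Build per-page links following the URL shapes in the example output."""
    doc_id = reference_code.replace("/", "_")
    page_id = f"{page:03d}"  # assumption: three-digit zero padding, as in "007"
    return {
        "alto_xml": f"https://sok.riksarkivet.se/dokument/alto/{doc_id}_{page_id}.xml",
        "image": f"https://lbiiif.riksarkivet.se/arkiv/{doc_id}_{page_id}.jpg",
        "bildvisning": f"https://sok.riksarkivet.se/bildvisning/{doc_id}#{page_id}",
    }


for name, url in resource_links("SE/RA/310187/1", 7).items():
    print(f"{name}: {url}")
```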

Examples

Basic Workflow

Search for a keyword:
uv run ra search "Stockholm"
Get full context for interesting hits:
uv run ra search "Stockholm" --context --max-pages 3
Include surrounding pages for additional context:
uv run ra search "Stockholm" --context --context-padding 1 --max-pages 3
Browse specific documents:
uv run ra browse "SE/RA/123456" --page "10-15" --search-term "Stockholm"

Advanced Usage

# Comprehensive search with context and surrounding pages
uv run ra search "trolldom" --context --context-padding 2 --max-pages 8

# Targeted document browsing
uv run ra browse "SE/RA/760264" --pages "1,5,10-12" --search-term "trolldom"

# Large search with selective display
uv run ra search "trolldom" --max 100 --max-display 30

Technical Details

Riksarkivet APIs & Data Sources

This tool integrates with multiple Riksarkivet APIs to provide comprehensive access to historical documents:

Current Integrations

Search API - Primary endpoint for full-text search across transcribed materials (Documentation)
IIIF Collections - Access to digitized document collections via IIIF standard (Documentation)
ALTO XML - Structured text transcriptions with precise positioning data
IIIF Images - High-resolution document images with zoom and cropping capabilities
Bildvisning - Interactive document viewer with search highlighting
OAI-PMH - Metadata harvesting for archive records and references (Documentation)

Additional Resources

The Riksarkivet Data Platform Wiki provides comprehensive documentation for building additional MCP integrations.

Experimental Features

Förvaltningshistorik - Semantic search interface (under evaluation)
AI-Riksarkivet HTRflow - Handwritten text recognition pipeline (PyPI package)

Troubleshooting

Common Issues

No results found: Try broader search terms or check spelling
Page not loading: Some pages may not have transcriptions available
Network timeouts: The tool includes retry logic, but very slow connections may still time out
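
For scripts that call the archive APIs directly, a similar retry strategy can be added on the caller's side. The following is a generic sketch of retrying with exponential backoff. It is not the retry logic used inside ra-mcp, and `call_with_retries`, its defaults, and the choice of exceptions are all assumptions.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying on connection errors and timeouts with backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Demonstration with a function that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


print(call_with_retries(flaky, sleep=lambda seconds: None))  # ok
```

Passing `sleep` as a parameter keeps the demonstration instant and makes the backoff easy to test.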

Getting Help

uv run ra --help
uv run ra search --help
uv run ra browse --help
uv run ra serve --help

MCP Server Development

Running the MCP Server

# Install dependencies
uv sync && uv pip install -e .

# Run the main MCP server (stdio)
cd src/ra_mcp && python server.py

# Run with SSE/HTTP transport on port 8000
cd src/ra_mcp && python server.py --http

Testing with MCP Inspector

Use the MCP Inspector to test and debug the MCP server:

# Test the server interactively
npx @modelcontextprotocol/inspector uv run python src/ra_mcp/server.py

The MCP Inspector provides a web interface to test server tools, resources, and prompts during development.

Building and Publishing with Dagger

The project uses Dagger for containerized builds and publishing to Docker registries. Pre-built images are available on Docker Hub.

Prerequisites

Dagger CLI installed
Docker registry credentials (for publishing)

Available Commands

Build locally:

dagger call build

Run tests:

dagger call test

Build and publish to Docker registry:

# Set environment variables
export DOCKER_USERNAME="your-username"
export DOCKER_PASSWORD="your-password"

# Build and publish
dagger call publish \
  --docker-username=env:DOCKER_USERNAME \
  --docker-password=env:DOCKER_PASSWORD \
  --image-repository="riksarkivet/ra-mcp" \
  --tag="latest" \
  --source=.

Available Dagger Functions

build: Creates a production-ready container image using the Dockerfile
test: Runs the test suite using pytest with coverage reporting
publish: Builds and publishes container image to registry with authentication
build-local: Builds with custom environment variables and registry settings

The Dagger configuration is located in .dagger/main.go and provides a complete CI/CD pipeline for the project.

Current MCP Server Implementation

The MCP server provides access to transcribed historical documents from the Swedish National Archives (Riksarkivet) through three primary tools and two resources:

🔧 Available Tools

1. search_transcribed

Search for keywords in transcribed materials with pagination support.

search_transcribed(
    keyword="trolldom",          # Search term
    offset=0,                    # Pagination offset (required)
    show_context=False,          # Full page text (default: False for more results)
    max_results=10,              # Maximum results to return
    max_hits_per_document=3      # Max hits per document
)

2. browse_document

Browse specific pages of a document by reference code.

browse_document(
    reference_code="SE/RA/310187/1",  # Document reference
    pages="7,8,52",                   # Page numbers or ranges
    highlight_term="trolldom",        # Optional keyword highlighting
    max_pages=20                      # Maximum pages to display
)

3. get_document_structure

Get document structure and metadata without fetching content.

get_document_structure(
    reference_code="SE/RA/310187/1",  # Document reference (or use pid)
    include_manifest_info=True        # Include IIIF manifest details
)

📚 Available Resources

riksarkivet://contents/table_of_contents - Complete guide index (Innehållsförteckning)
riksarkivet://guide/{filename} - Specific guide sections (e.g., '01_Domstolar.md', '02_Fangelse.md')

🔄 Typical Workflow

Search → search_transcribed("trolldom", offset=0) to find relevant documents
Paginate → Continue with offset=50, 100, 150... for comprehensive discovery
Browse → Use browse_document() to view specific pages with full transcriptions
Structure → Use get_document_structure() to understand document organization

💡 Search Strategy Tips

Start with show_context=False to maximize hit coverage
Use pagination (increasing offsets) to find all matches
Enable show_context=True only when you need full page text for specific hits
Browse specific pages for detailed examination with keyword highlighting
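
The pagination strategy above can be sketched as a loop that keeps raising the offset until a page comes back short. This is an illustration under assumptions: `collect_all_hits` is a hypothetical helper, the search function is assumed to return a list of hits (the real MCP tool returns formatted text through an MCP client), and a fake search function stands in for the tool call.

```python
def collect_all_hits(search, keyword, page_size=50, max_pages=20):
    """Call `search` with increasing offsets until a short or empty page is returned."""
    hits = []
    for page_index in range(max_pages):
        batch = search(keyword=keyword, offset=page_index * page_size,
                       max_results=page_size, show_context=False)
        hits.extend(batch)
        if len(batch) < page_size:
            break  # last page reached
    return hits


# Stand-in for the MCP tool call: serves slices of 120 fake hits.
FAKE_HITS = [f"hit-{n}" for n in range(120)]


def fake_search(keyword, offset, max_results, show_context):
    return FAKE_HITS[offset:offset + max_results]


print(len(collect_all_hits(fake_search, "trolldom")))  # 120
```

The `max_pages` cap guards against looping indefinitely if a backend never returns a short page.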