Riksarkivet MCP Server

<div align="center">
  <img src="assets/logo-rm-bg.png" alt="RA-MCP Logo" width="350">
</div>

# ra-mcp

> **⚠️ Work in Progress (WIP)**
>
> This repository is the result of two hackathons and is currently under development. It's more of a proof of concept than a production-ready solution. The codebase, documentation, and build processes are being continuously refined.
>
> **Please note:**
> - Expect changes and breaking updates
> - APIs and interfaces may change without notice
> - Use in production environments at your own risk
> - Contributions and feedback are welcome as we work toward stability

[![Tests](https://github.com/AI-Riksarkivet/ra-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/AI-Riksarkivet/ra-mcp/actions/workflows/ci.yml)
[![Publish](https://github.com/AI-Riksarkivet/ra-mcp/actions/workflows/publish.yml/badge.svg)](https://github.com/AI-Riksarkivet/ra-mcp/actions/workflows/publish.yml)
[![Secret Leaks](https://github.com/AI-Riksarkivet/ra-mcp/actions/workflows/trufflehog.yml/badge.svg)](https://github.com/AI-Riksarkivet/ra-mcp/actions/workflows/trufflehog.yml)

## MCPs for Riksarkivet

An MCP server and command-line tools for searching and browsing transcribed historical documents from the Swedish National Archives (Riksarkivet).

## Features

- **Full-text search** across millions of transcribed historical documents
- **Complete page transcriptions** with accurate text extraction from historical manuscripts
- **Reference-based document browsing** using official archive reference codes
- **Contextual search highlighting** to identify relevant content quickly
- **High-resolution image access** to original document scans via IIIF

## Getting Started

## MCP

To add ra-mcp to ChatGPT or Claude over Streamable HTTP, use this URL: `https://riksarkivet-ra-mcp.hf.space/mcp`

### Claude Code

```bash
claude mcp add --transport http ra-mcp https://riksarkivet-ra-mcp.hf.space/mcp
```

### IDEs

```bash
cat > mcp.json <<'EOF'
{
  "mcpServers": {
    "ra-mcp": {
      "type": "streamable-http",
      "url": "https://riksarkivet-ra-mcp.hf.space/mcp",
      "note": "ra-mcp server (FastMCP) - via Streamable HTTP"
    }
  }
}
EOF
```

## CLI

Install the CLI:

```bash
uv pip install ra-mcp
# or
uv add ra-mcp
```

## How to Use

### 1. Search for Keywords

Find documents containing specific words or phrases:

```bash
# Basic search
uv run ra search "Stockholm"

# Search with full page transcriptions
uv run ra search "trolldom" --browse --max-pages 5

# Wildcard search - single character (?)
uv run ra search "St?ckholm"   # Matches "Stockholm", "Stäckholm", etc.

# Wildcard search - multiple characters (*)
uv run ra search "Stock*"      # Matches "Stockholm", "Stocksund", "Stocken", etc.
uv run ra search "St*holm"     # Matches "Stockholm", "Strömholm", etc.
uv run ra search "*holm"       # Matches "Stockholm", "Söderholm", etc.

# Fuzzy search - find similar words
uv run ra search "Stockholm~"    # Matches "Stockholm", "Stokholm", "Stokholms", etc.
uv run ra search "Stockholm~1"   # Matches "Stockholm", "Stokholm" (max edit distance: 1)

# Proximity search - find words within distance
uv run ra search '"Stockholm trolldom"~10'   # "Stockholm" and "trolldom" within 10 words

# Boosting terms - increase relevance of specific terms
uv run ra search "Stockholm^4 trol*"            # Boost "Stockholm" relevance with wildcard
uv run ra search '("Stockholm dom*"^4 Reg*)'    # Boost entire phrase with wildcard

# Boolean operators - combine search terms
uv run ra search "(Stockholm AND trolldom)"   # Both terms required
uv run ra search "(Stockholm OR Göteborg)"    # Either term (or both)
uv run ra search "(Stockholm NOT trolldom)"   # Stockholm but not trolldom
uv run ra search "+Stockholm -trolldom"       # Require Stockholm, exclude trolldom

# Grouping - create complex queries with sub-queries
uv run ra search "((Stockholm OR Göteborg) AND troll*)"             # Either city, combined with words starting with "troll"
uv run ra search "((troll* OR häx*) AND (Stockholm OR Göteborg))"   # Complex grouping
```

**Search Options:**

- `--browse` - Show full page transcriptions
- `--max N` - Maximum search results (default: 50)
- `--max-display N` - Maximum results to display (default: 20)
- `--max-pages N` - Maximum pages to load context for (default: 10)
- `--max-hits-per-vol N` - Maximum hits to return per volume (default: 3)

**Search Types:**

| Type | Syntax | Example | Description |
|------|--------|---------|-------------|
| **Exact** | `"word"` | `"Stockholm"` | Find exact matches |
| **Wildcard (single)** | `?` | `"St?ckholm"` | Matches any single character |
| **Wildcard (multiple)** | `*` | `"Stock*"` | Matches zero or more characters |
| **Fuzzy** | `~` | `"Stockholm~"` | Finds similar terms based on edit distance (default: 2) |
| **Fuzzy (custom)** | `~N` | `"Stockholm~1"` | Finds similar terms with max edit distance N (0-2) |
| **Proximity** | `"word1 word2"~N` | `"Stockholm trolldom"~10` | Finds terms within N words of each other |
| **Boosting** | `^N` | `"Stockholm^4 trol*"` | Increases relevance of boosted term (default: 1) |
| **Boolean AND** | `AND` or `&&` | `(Stockholm AND trolldom)` | Both terms must be present |
| **Boolean OR** | `OR` or `\|\|` | `(Stockholm OR Göteborg)` | Either term (or both) must be present |
| **Boolean NOT** | `NOT` or `!` | `(Stockholm NOT trolldom)` | First term without second term |
| **Required/Exclude** | `+` / `-` | `+Stockholm -trolldom` | Require term (+) or exclude term (-) |
| **Grouping** | `(...)` | `((Stockholm OR Göteborg) AND troll*)` | Group clauses to form sub-queries |

### 2. Browse Specific Documents

When you find interesting documents, browse them directly:

```bash
# View single page
uv run ra browse "SE/RA/123" --page 5

# View page range
uv run ra browse "SE/RA/123" --pages "1-10"

# View specific pages with search highlighting
uv run ra browse "SE/RA/123" --page "5,7,9" --search-term "Stockholm"
```

**Options:**

- `--page` or `--pages` - Page numbers (e.g., "5", "1-10", "5,7,9")
- `--search-term` - Highlight this term in the text
- `--max-display N` - Maximum pages to display (default: 20)

### 3. Search with Full Context

The `--browse` flag shows complete page transcriptions instead of just snippets:

```bash
# Search with full page transcriptions
uv run ra search "Stockholm" --browse --max-pages 5
```

## Output Features

### 🔍 Search Results

When you run a search, results are presented with:

- **Document grouping** - Related pages grouped together for context
- **Institution & dates** - Archive location and document dates
- **Page numbers** - Specific pages containing your search terms
- **Highlighted snippets** - Preview text with keywords emphasized
- **Browse commands** - Ready-to-run commands for deeper exploration

**Example output:**

```
Document: SE/RA/310187/1 - Kommissorialrätt i Stockholm ang. trolldom
Institution: Riksarkivet i Stockholm/Täby | Date: 1676 - 1677
├─ Page 2: "... **trolldom** ..."
├─ Page 7: "... **Trolldoms** ..."
├─ Page 8: "... **Trolldoms**..."

Browse commands:
  uv run ra browse "SE/RA/310187/1" --page 7 --search-term "trolldom"
  uv run ra browse "SE/RA/310187/1" --pages "2,7,8,52,72" --search-term "trolldom"
```

### 📄 Full Page Display

With the `--browse` flag, you get complete page transcriptions featuring:

- **Full text transcriptions** - Complete page content from ALTO XML
- **Keyword highlighting** - Your search terms highlighted in yellow
- **Rich metadata** - Document titles, dates, and archive hierarchy
- **Direct access links** - Quick links to images, XML, and interactive viewer

**Example output:**

```
═══ SE/RA/310187/1 - Page 7 ═══
Title: Kommissorialrätt i Stockholm ang. trolldom
Date: 1676-1677 | Institution: Riksarkivet i Stockholm/Täby

....

Links:
  📄 ALTO XML: https://sok.riksarkivet.se/dokument/alto/SE_RA_310187_1_007.xml
  🖼️ Image: https://lbiiif.riksarkivet.se/arkiv/SE_RA_310187_1_007.jpg
  🔍 Bildvisning: https://sok.riksarkivet.se/bildvisning/SE_RA_310187_1#007
```

### 🔗 Available Resources

Each result provides direct access to:

| Resource | Description | Use Case |
|----------|-------------|----------|
| **ALTO XML** | Structured transcription data with precise positioning | Text analysis, data extraction |
| **IIIF Images** | High-resolution document scans with zoom/crop support | Visual inspection, citations |
| **Bildvisning** | Interactive web viewer with search highlighting | Online browsing, sharing |
| **Collections** | IIIF metadata for document series | Understanding document context |

## Examples

### Basic Workflow

1. **Search for a keyword:**

   ```bash
   uv run ra search "Stockholm"
   ```

2. **Get full context for interesting hits:**

   ```bash
   uv run ra search "Stockholm" --browse --max-pages 3
   ```

3. **Browse specific documents:**

   ```bash
   uv run ra browse "SE/RA/123456" --page "10-15" --search-term "Stockholm"
   ```

### Advanced Usage

```bash
# Comprehensive search with full page content
uv run ra search "trolldom" --browse --max-pages 8

# Targeted document browsing
uv run ra browse "SE/RA/760264" --pages "1,5,10-12" --search-term "trolldom"

# Large search with selective display
uv run ra search "trolldom" --max 100 --max-display 30
```

## Technical Details

### Riksarkivet APIs & Data Sources

This tool integrates with multiple Riksarkivet APIs to provide comprehensive access to historical documents:

#### Current Integrations

- **[Search API](https://data.riksarkivet.se/api/records)** - Primary endpoint for full-text search across transcribed materials ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/Search-API))
- **[IIIF Collections](https://lbiiif.riksarkivet.se/collection/arkiv)** - Access to digitized document collections via the IIIF standard ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/IIIF))
- **[ALTO XML](https://sok.riksarkivet.se/dokument/alto)** - Structured text transcriptions with precise positioning data
- **[IIIF Images](https://lbiiif.riksarkivet.se)** - High-resolution document images with zoom and cropping capabilities
- **[Bildvisning](https://sok.riksarkivet.se/bildvisning)** - Interactive document viewer with search highlighting
- **[OAI-PMH](https://oai-pmh.riksarkivet.se/OAI)** - Metadata harvesting for archive records and references ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/OAI-PMH))

#### Additional Resources

The [Riksarkivet Data Platform Wiki](https://github.com/Riksarkivet/dataplattform/wiki) provides comprehensive documentation for building additional MCP integrations.

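The per-page links in the Full Page Display example output appear to follow a regular pattern: the reference code with `/` replaced by `_`, followed by a zero-padded page number. The sketch below reproduces only that pattern. `page_links` is a hypothetical helper, not part of ra-mcp, and the assumption that every document follows the pattern in the example is not confirmed by this README; real identifiers may differ.

```python
def page_links(reference_code: str, page: int) -> dict[str, str]:
    """Build per-page links following the pattern in the README's example output.

    Assumption: the URL shapes below are inferred from a single example and
    may not hold for every document.
    """
    doc_id = reference_code.replace("/", "_")  # "SE/RA/310187/1" -> "SE_RA_310187_1"
    page_id = f"{page:03d}"                    # 7 -> "007"
    return {
        "alto_xml": f"https://sok.riksarkivet.se/dokument/alto/{doc_id}_{page_id}.xml",
        "image": f"https://lbiiif.riksarkivet.se/arkiv/{doc_id}_{page_id}.jpg",
        "bildvisning": f"https://sok.riksarkivet.se/bildvisning/{doc_id}#{page_id}",
    }


links = page_links("SE/RA/310187/1", 7)
for name, url in links.items():
    print(f"{name}: {url}")
```

For the document and page used in the example output, this yields the same three URLs listed there.
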
#### Experimental Features

- **[Förvaltningshistorik](https://forvaltningshistorik.riksarkivet.se/Index.htm)** - Semantic search interface (under evaluation)
- **[AI-Riksarkivet HTRflow](https://pypi.org/project/htrflow/)** - Handwritten text recognition pipeline (PyPI package)

## Troubleshooting

### Common Issues

1. **No results found**: Try broader search terms or check spelling
2. **Page not loading**: Some pages may not have transcriptions available
3. **Network timeouts**: Tool includes retry logic, but very slow connections may time out

### Getting Help

```bash
uv run ra --help
uv run ra search --help
uv run ra browse --help
uv run ra serve --help
```

## MCP Server Development

```bash
# clone repo
git clone https://github.com/AI-Riksarkivet/ra-mcp.git
```

### Running the MCP Server

```bash
# Install dependencies
uv sync && uv pip install -e .

# Run the main MCP server (stdio)
cd src/ra_mcp && uv run ra serve

# Run with SSE/HTTP transport on port 8000
cd src/ra_mcp && uv run ra serve --http
```

### Testing with MCP Inspector

Use the [MCP Inspector](https://github.com/modelcontextprotocol/inspector) to test and debug the MCP server:

```bash
# Test the server interactively
npx @modelcontextprotocol/inspector uv run ra serve --http
```

The MCP Inspector provides a web interface to test server tools, resources, and prompts during development.

### Building and Publishing with Dagger

The project uses Dagger for containerized builds and publishing to Docker registries. Pre-built images are available on [Docker Hub](https://hub.docker.com/r/riksarkivet/ra-mcp).

#### Prerequisites

- [Dagger CLI](https://docs.dagger.io/install) installed
- Docker registry credentials (for publishing)

#### Available Commands

**Build locally:**

```bash
dagger call build
```

**Run tests:**

```bash
dagger call test
```

**Build and publish to Docker registry:**

```bash
# Set environment variables
export DOCKER_PASSWORD="your-password"

# Build and publish
dagger call publish \
  --docker-username="username" \
  --docker-password=env:DOCKER_PASSWORD \
  --image-repository="riksarkivet/ra-mcp" \
  --tag="latest" \
  --source=.
```

#### Available Dagger Functions

- `build`: Creates a production-ready container image using the Dockerfile
- `test`: Runs the test suite using pytest with coverage reporting
- `publish`: Builds and publishes the container image to a registry with authentication
- `build-local`: Builds with custom environment variables and registry settings

The Dagger configuration is located in `.dagger/main.go` and provides a complete CI/CD pipeline for the project.

![image](https://github.com/user-attachments/assets/bde56408-5135-4a2a-baf3-f26c32fab9dc)

___

## Current MCP Server Implementation

The MCP server provides access to transcribed historical documents from the Swedish National Archives (Riksarkivet) through the tools and resources documented below:

### 🔧 Available Tools

#### 1. **search_transcribed**

Search for keywords in transcribed materials with pagination support.

```python
search_transcribed(
    keyword="trolldom",        # Search term
    offset=0,                  # Pagination offset (required)
    show_context=False,        # Full page text (default: False for more results)
    max_results=10,            # Maximum results to return
    max_hits_per_document=3    # Max hits per document
)
```

#### 2. **browse_document**

Browse specific pages of a document by reference code.

```python
browse_document(
    reference_code="SE/RA/310187/1",   # Document reference
    pages="7,8,52",                    # Page numbers or ranges
    highlight_term="trolldom",         # Optional keyword highlighting
    max_pages=20                       # Maximum pages to display
)
```

### 📚 Available Resources

- **riksarkivet://contents/table_of_contents** - Complete guide index (Innehållsförteckning)
- **riksarkivet://guide/{filename}** - Specific guide sections (e.g., '01_Domstolar.md', '02_Fangelse.md')

### 🔄 Typical Workflow

1. **Search** → `search_transcribed("trolldom", offset=0)` to find relevant documents
2. **Paginate** → Continue with `offset=50, 100, 150...` for comprehensive discovery
3. **Browse** → Use `browse_document()` to view specific pages with full transcriptions

### 💡 Search Strategy Tips

- Start with `show_context=False` to maximize hit coverage
- Use pagination (increasing offsets) to find all matches
- Enable `show_context=True` only when you need full page text for specific hits
- Browse specific pages for detailed examination with keyword highlighting

___
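
The offset-based pagination from the Typical Workflow can be sketched as a small loop on the client side. This is a minimal illustration, not the server's implementation: `paginate` and `fake_search` are hypothetical names, and the sketch assumes the search call returns a list of hits and that a page shorter than `page_size` means there are no more results.

```python
from typing import Callable, Iterator


def paginate(
    search: Callable[..., list],
    keyword: str,
    page_size: int = 50,
    max_pages: int = 10,
) -> Iterator:
    """Yield hits by calling `search` with increasing offsets (0, 50, 100, ...).

    Assumption: `search` accepts keyword, offset and max_results and returns
    a list; a short or empty page is treated as the end of the results.
    """
    for page in range(max_pages):
        hits = search(keyword=keyword, offset=page * page_size, max_results=page_size)
        yield from hits
        if len(hits) < page_size:
            break


def fake_search(keyword: str, offset: int, max_results: int) -> list:
    """Stand-in for the search_transcribed tool: serves slices of 120 fake hits."""
    data = [f"{keyword}-{i}" for i in range(120)]
    return data[offset:offset + max_results]


all_hits = list(paginate(fake_search, "trolldom"))
print(len(all_hits))
```

In a real client, `fake_search` would be replaced by whatever function forwards the call to the MCP tool. The `max_pages` cap keeps a broad query from paging indefinitely.
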
