
Oxenstierna

# ra-mcp (WIP)

## MCPs for Riksarkivet

An MCP server and command-line tools for searching and browsing transcribed historical documents from the Swedish National Archives (Riksarkivet).

## Features

- **Full-text search** across millions of transcribed historical documents
- **Complete page transcriptions** with accurate text extraction from historical manuscripts
- **Reference-based document browsing** using official archive reference codes
- **Contextual search highlighting** to identify relevant content quickly
- **High-resolution image access** to original document scans via IIIF

## Getting Started

### Quick Setup

```bash
# Search for anything - uv will auto-install dependencies
uv run tools/ra.py search "Stockholm"
```

## How to Use

### 1. Search for Keywords

Find documents containing specific words or phrases:

```bash
# Basic search
uv run tools/ra.py search "Stockholm"

# Search with full page transcriptions
uv run tools/ra.py search "trolldom" --context --max-pages 5

# Search without document grouping
uv run tools/ra.py search "vasa" --context --no-grouping --max-pages 3
```

**Options:**

- `--max N` - Maximum search results (default: 50)
- `--max-display N` - Maximum results to display (default: 20)
- `--context` - Show full page transcriptions
- `--max-pages N` - Maximum pages to load context for (default: 10)
- `--no-grouping` - Show pages individually instead of grouped by document

### 2. Browse Specific Documents

When you find interesting documents, browse them directly:

```bash
# View single page
uv run tools/ra.py browse "SE/RA/123" --page 5

# View page range
uv run tools/ra.py browse "SE/RA/123" --pages "1-10"

# View specific pages with search highlighting
uv run tools/ra.py browse "SE/RA/123" --page "5,7,9" --search-term "Stockholm"
```

**Options:**

- `--page` or `--pages` - Page numbers (e.g., "5", "1-10", "5,7,9")
- `--search-term` - Highlight this term in the text
- `--max-display N` - Maximum pages to display (default: 20)

### 3. Get Full Context

See complete pages with surrounding context for better understanding:

```bash
# Find pages with keyword and show full transcriptions
uv run tools/ra.py show-pages "Stockholm" --max-pages 5

# Include surrounding pages for context
uv run tools/ra.py show-pages "trolldom" --context-padding 2

# Show pages individually
uv run tools/ra.py show-pages "vasa" --no-grouping
```

**Options:**

- `--max-pages N` - Maximum pages to display (default: 10)
- `--context-padding N` - Include N pages before/after each hit (default: 1)
- `--no-grouping` - Show pages individually instead of grouped by document

## Output Features

### Search Results

- **Grouped by document** for better context
- **Institution and date** information
- **Page numbers** with search hits
- **Snippet previews** with keyword highlighting
- **Browse command examples** for further exploration

### Full Page Display

- **Complete transcriptions** from ALTO XML
- **Keyword highlighting** in yellow
- **Document metadata** (title, date, hierarchy)
- **Direct links** to images, ALTO XML, and Bildvisning
- **Context pages** marked clearly

### Links Provided

- **ALTO XML** - Full transcription data
- **IIIF Images** - High-resolution document images
- **Bildvisning** - Interactive viewer with search highlighting
- **Collections & Manifests** - IIIF metadata

## Examples

### Basic Workflow

1. **Search for a keyword:**

   ```bash
   uv run tools/ra.py search "Stockholm"
   ```

2. **Get full context for interesting hits:**

   ```bash
   uv run tools/ra.py search "Stockholm" --context --max-pages 3
   ```

3. **Browse specific documents:**

   ```bash
   uv run tools/ra.py browse "SE/RA/123456" --page "10-15" --search-term "Stockholm"
   ```

### Advanced Usage

```bash
# Comprehensive search with context
uv run tools/ra.py show-pages "handelsbalansen" --context-padding 2 --max-pages 8

# Targeted document browsing
uv run tools/ra.py browse "SE/RA/760264" --pages "1,5,10-12" --search-term "export"

# Large search with selective display
uv run tools/ra.py search "järnväg" --max 100 --max-display 30
```

## Technical Details

### Riksarkivet APIs & Data Sources

This tool integrates with multiple Riksarkivet APIs to provide comprehensive access to historical documents:

#### Current Integrations

- **[Search API](https://data.riksarkivet.se/api/records)** - Primary endpoint for full-text search across transcribed materials ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/Search-API))
- **[IIIF Collections](https://lbiiif.riksarkivet.se/collection/arkiv)** - Access to digitized document collections via IIIF standard ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/IIIF))
- **[ALTO XML](https://sok.riksarkivet.se/dokument/alto)** - Structured text transcriptions with precise positioning data
- **[IIIF Images](https://lbiiif.riksarkivet.se)** - High-resolution document images with zoom and cropping capabilities
- **[Bildvisning](https://sok.riksarkivet.se/bildvisning)** - Interactive document viewer with search highlighting
- **[OAI-PMH](https://oai-pmh.riksarkivet.se/OAI)** - Metadata harvesting for archive records and references ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/OAI-PMH))

#### Additional Resources

The [Riksarkivet Data Platform Wiki](https://github.com/Riksarkivet/dataplattform/wiki) provides comprehensive documentation for building additional MCP integrations.
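The page arguments used in the usage examples (`"5"`, `"1-10"`, `"1,5,10-12"`) and the `--context-padding N` option can be illustrated with a small sketch. This is not taken from `tools/ra.py`; the function names `parse_page_spec` and `pad_pages` are hypothetical, and the real tool may handle duplicates, ordering, and invalid input differently.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a page specification such as "1,5,10-12" into sorted page numbers.

    Hypothetical helper: illustrates the syntax accepted by --page/--pages,
    not the actual parsing code in tools/ra.py.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


def pad_pages(hits: list[int], padding: int) -> list[int]:
    """Add `padding` pages before and after each hit, never going below page 1.

    Hypothetical helper mirroring the description of --context-padding N.
    """
    padded: set[int] = set()
    for page in hits:
        padded.update(range(max(1, page - padding), page + padding + 1))
    return sorted(padded)


print(parse_page_spec("1,5,10-12"))  # [1, 5, 10, 11, 12]
print(pad_pages([5, 9], 1))          # [4, 5, 6, 8, 9, 10]
```

Using a set keeps overlapping ranges and adjacent hits from producing duplicate pages, which is one plausible way to keep each page displayed once.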
#### Experimental Features

- **[Förvaltningshistorik](https://forvaltningshistorik.riksarkivet.se/Index.htm)** - Semantic search interface (under evaluation)
- **AI-Riksarkivet HTRflow** - Handwritten text recognition pipeline (PyPI package)

## Troubleshooting

### Common Issues

1. **No results found**: Try broader search terms or check spelling
2. **Page not loading**: Some pages may not have transcriptions available
3. **Network timeouts**: Tool includes retry logic, but very slow connections may time out

### Getting Help

```bash
uv run tools/ra.py --help
uv run tools/ra.py search --help
uv run tools/ra.py browse --help
uv run tools/ra.py show-pages --help
```

## MCP Server Development

### Running the MCP Server

```bash
# Install dependencies
uv sync && uv pip install -e .

# Run the main MCP server (stdio)
cd src/ra_mcp && python server.py

# Run with SSE/HTTP transport on port 8000
cd src/ra_mcp && python server.py --http
```

### Testing with MCP Inspector

Use the [MCP Inspector](https://github.com/modelcontextprotocol/inspector) to test and debug the MCP server:

```bash
# Test the server interactively
npx @modelcontextprotocol/inspector uv run python src/ra_mcp/server.py
```

The MCP Inspector provides a web interface to test server tools, resources, and prompts during development.

![image](https://github.com/user-attachments/assets/bde56408-5135-4a2a-baf3-f26c32fab9dc)

___

## Current MCP Server Implementation

```
This server provides access to the Swedish National Archives (Riksarkivet) through multiple APIs.

SEARCH-BASED WORKFLOW (start here):
- search_records: Search for content by keywords (e.g., "coffee", "medical records")
- get_collection_info: Explore what's available in a collection
- get_all_manifests_from_pid: Get all image batches from a collection
- get_manifest_info: Get details about a specific image batch
- get_manifest_image: Download specific images from a batch
- get_all_images_from_pid: Download all images from a collection

URL BUILDING TOOLS:
- build_image_url: Build IIIF Image URLs with custom parameters
- get_image_urls_from_manifest: Get all URLs from an image batch
- get_image_urls_from_pid: Get all URLs from a collection

TYPICAL WORKFLOW:
1. search_records("your keywords") → find PIDs
2. get_collection_info(pid) → see what's available
3. get_manifest_info(manifest_id) → explore specific image batch
4. get_manifest_image(manifest_id, image_index) → download specific image

Example PID: LmOmKigRrH6xqG3GjpvwY3
```

___
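To give a sense of what "IIIF Image URLs with custom parameters" means, here is a minimal sketch that follows the general IIIF Image API URL pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. It is not the server's `build_image_url` implementation: the function name `build_iiif_image_url`, its defaults, and the identifier `example-image-id` are assumptions for illustration, and the parameter values Riksarkivet's image server accepts may differ.

```python
from urllib.parse import quote

# Base URL of the IIIF image service as listed in this README.
IIIF_BASE = "https://lbiiif.riksarkivet.se"


def build_iiif_image_url(
    identifier: str,
    region: str = "full",      # "full" or "x,y,w,h" in pixels
    size: str = "max",         # "max", "w," (width only), ",h", or "w,h"
    rotation: str = "0",       # degrees clockwise
    quality: str = "default",  # e.g. "default", "color", "gray"
    fmt: str = "jpg",
    base: str = IIIF_BASE,
) -> str:
    """Assemble an IIIF Image API style URL (hypothetical helper)."""
    # Percent-encode the identifier so characters like "/" cannot be
    # mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    return f"{base}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Crop a 1000x800 pixel region from the top-left corner, scaled to 500 px wide.
# "example-image-id" is a placeholder, not a real Riksarkivet identifier.
print(build_iiif_image_url("example-image-id", region="0,0,1000,800", size="500,"))
# https://lbiiif.riksarkivet.se/example-image-id/0,0,1000,800/500,/0/default.jpg
```

In practice the identifier would come from a manifest returned by a tool such as `get_manifest_info`, rather than being typed by hand.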
