Arke Institute MCP Server
A Model Context Protocol (MCP) server that provides AI assistants with semantic search capabilities across the Arke Institute's extensive archive of NARA (National Archives and Records Administration) records and presidential libraries.
Features
Semantic Search: Natural language queries powered by OpenAI embeddings and Pinecone vector search
Rich Entity Types: Search across institutions, collections, series, file units, and digitized objects
Extracted Text: Access OCR'd content from scanned documents and PDFs
Complete Metadata: Full NARA catalog records, access restrictions, physical locations, and hierarchical relationships
Fast Responses: Sub-second search times with parallel API processing
Easy Integration: Works with Claude Desktop, Cloudflare AI Playground, and any MCP client
What is Arke Institute?
The Arke Institute provides semantic access to historical archives, starting with digitizing and indexing the complete holdings of the National Archives. The search API powers discovery across millions of historical documents, photographs, and records.
Installation
Deploy to Cloudflare Workers
Or via command line:
Your MCP server will be deployed to: arke-mcp-server.<your-account>.workers.dev/sse
Local Development
The server runs at http://localhost:8787
MCP Tool Reference
search_arke
Perform semantic search across Arke Institute archives.
Parameters:
query (string, required): Natural language search query
topK (number, optional): Number of results (1-100, default: 10)
namespaces (array, optional): Filter by entity type(s)
Available Namespaces:
institution - Institutional collections
collection - Record collections
series - Record series
fileUnit - File units
digitalObject - Digital objects (scanned documents, images, PDFs with extracted text)
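The parameter rules and namespace list above can be sketched as a small validation helper. This is a minimal illustration, not the server's actual code: the names `SearchArkeArgs` and `normalizeSearchArgs` are hypothetical, and treating topK as an integer is an assumption (the reference only says "number").

```typescript
// Namespaces as listed in the tool reference.
const NAMESPACES = ["institution", "collection", "series", "fileUnit", "digitalObject"] as const;
type Namespace = (typeof NAMESPACES)[number];

// Hypothetical argument shape for search_arke, after defaults are applied.
interface SearchArkeArgs {
  query: string;
  topK: number;
  namespaces?: Namespace[];
}

function normalizeSearchArgs(input: {
  query: string;
  topK?: number;
  namespaces?: string[];
}): SearchArkeArgs {
  const query = input.query.trim();
  if (query.length === 0) {
    throw new Error("query is required");
  }

  // Documented range is 1-100 with a default of 10.
  // Assumption: topK must be a whole number.
  const topK = input.topK ?? 10;
  if (!Number.isInteger(topK) || topK < 1 || topK > 100) {
    throw new Error("topK must be an integer between 1 and 100");
  }

  const result: SearchArkeArgs = { query, topK };
  if (input.namespaces !== undefined) {
    const known = NAMESPACES as readonly string[];
    const unknown = input.namespaces.filter((ns) => known.indexOf(ns) === -1);
    if (unknown.length > 0) {
      throw new Error("unknown namespace(s): " + unknown.join(", "));
    }
    result.namespaces = input.namespaces as Namespace[];
  }
  return result;
}
```

Omitting namespaces leaves the field unset, which matches the documented behavior of searching all entity types.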
Returns:
Formatted search results including:
Similarity scores (0-1 range, higher = better match)
Entity titles and descriptions
NARA identifiers and persistent identifiers (PIs)
Date ranges and record types
Parent/child entity relationships
Physical locations and access restrictions
IPFS content identifiers (CIDs)
Extracted text from digitized documents
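Because similarity scores fall in the 0-1 range with higher meaning a better match, a client might keep only strong matches and rank them. The sketch below assumes the formatted results have already been parsed into objects. The `ArkeResult` field names are hypothetical and are not the server's actual schema.

```typescript
// Hypothetical parsed result; field names are assumptions for illustration only.
interface ArkeResult {
  title: string;
  score: number; // similarity score, 0-1, higher = better match
  namespace: string;
}

// Keep results at or above minScore, best match first.
function bestMatches(results: ArkeResult[], minScore: number): ArkeResult[] {
  return results
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```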
Usage Examples
Example 1: General Search
Searches all entity types for Apollo 11 content.
Example 2: Search Digitized Documents
Searches only digitized objects (with extracted text) for WWII photos.
Example 3: Search File Units
Searches file units and digitized objects for climate-related presidential speeches.
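The argument payloads for these three examples are not shown above, so the following is a reconstruction under stated assumptions rather than the original listings. The query strings are illustrative guesses, and `toToolCall` is a hypothetical helper that wraps arguments in an MCP `tools/call` JSON-RPC request.

```typescript
interface SearchArgs {
  query: string;
  topK?: number;
  namespaces?: string[];
}

// Example 1: no namespaces filter, so all entity types are searched.
const generalSearch: SearchArgs = { query: "Apollo 11 moon landing" };

// Example 2: restrict to digitized objects (which carry extracted text).
const digitizedOnly: SearchArgs = {
  query: "World War II photographs",
  namespaces: ["digitalObject"],
};

// Example 3: file units and digitized objects together.
const fileUnitsAndObjects: SearchArgs = {
  query: "presidential speeches about climate change",
  namespaces: ["fileUnit", "digitalObject"],
};

// Hypothetical helper: builds the JSON-RPC 2.0 message an MCP client sends
// to invoke a tool. Most clients construct this for you.
function toToolCall(args: SearchArgs, id: number) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "search_arke", arguments: args },
  };
}
```

In practice an assistant such as Claude fills in these arguments from the user's natural language request.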
Connect to Claude Desktop
To use this MCP server with Claude Desktop, add it to your configuration:
Open Claude Desktop Settings > Developer > Edit Config
Add the server configuration:
For local development:
Restart Claude Desktop
You should see the search_arke tool available
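The configuration snippets for the steps above are not shown. As a rough sketch, a Claude Desktop config entry for a remote SSE server commonly goes through the `mcp-remote` proxy package. That choice, the `arke` server key, and the `arkeConfig` helper are assumptions, not confirmed by this README.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Builds a config object pointing Claude Desktop at the given SSE endpoint.
// Assumption: the connection is proxied with `npx mcp-remote <url>`.
function arkeConfig(serverUrl: string): ClaudeDesktopConfig {
  return {
    mcpServers: {
      arke: { command: "npx", args: ["mcp-remote", serverUrl] },
    },
  };
}

// Local development: the dev server address plus the /sse endpoint.
const localConfigJson = JSON.stringify(arkeConfig("http://localhost:8787/sse"), null, 2);

// Deployed: substitute your own account subdomain for the placeholder.
const deployedConfigJson = JSON.stringify(
  arkeConfig("https://arke-mcp-server.<your-account>.workers.dev/sse"),
  null,
  2
);
```

Either JSON string could be pasted into the file opened by Edit Config. Merge it with any existing `mcpServers` entries instead of replacing them.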
Connect to Cloudflare AI Playground
Enter your deployed MCP server URL:
arke-mcp-server.<your-account>.workers.dev/sse
Start using the search_arke tool directly in the playground
Example Conversations
Finding Historical Documents
User: "Find documents about the Space Shuttle Discovery missions"
Claude (using search_arke):
Returns relevant file units, digitized speeches, and mission records with extracted text content.
Researching Presidential Libraries
User: "Show me Clinton administration documents about Japan relations in the 1990s"
Claude (using search_arke):
Returns presidential library materials with dates, locations, and full text content.
Architecture
Project Structure
API Endpoints
/sse - Server-Sent Events endpoint for MCP protocol (recommended)
/mcp - Standard HTTP MCP endpoint
/ - Server information and health check
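One way to picture how a Worker might dispatch on these three paths is the sketch below. It is not the project's actual routing code: `classifyEndpoint` is a hypothetical helper, and routing sub-paths under /sse to the SSE handler is an assumption.

```typescript
type Endpoint = "sse" | "mcp" | "info" | "not_found";

// Maps a request pathname to the endpoint that would handle it.
function classifyEndpoint(pathname: string): Endpoint {
  // Assumption: sub-paths such as /sse/... belong to the SSE transport.
  if (pathname === "/sse" || pathname.startsWith("/sse/")) return "sse";
  if (pathname === "/mcp") return "mcp";
  if (pathname === "/") return "info"; // server information and health check
  return "not_found";
}
```

A fetch handler would then switch on the returned value and hand the request to the matching transport, or answer 404 for "not_found".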
Development
Type Checking
Code Formatting
Deploy to Production
Performance
Search latency: ~500-900ms (including vector search, entity fetching, and formatting)
Namespace fetching: Cached at server initialization
Concurrent searches: Fully supported via Cloudflare Workers
Limitations
Maximum 100 results per query (topK parameter)
Searches are read-only (no write operations)
Rate limits are those of the Cloudflare Workers free tier, or of whichever plan your account uses
Related Projects
Arke Search API: search.arke.institute - The underlying search service
Arke IPFS API: api.arke.institute - Entity manifest and metadata retrieval
MCP Specification: modelcontextprotocol.io
Contributing
Contributions welcome! Please open issues or pull requests on GitHub.
License
MIT License - see LICENSE file for details
Support
For questions or issues:
Open a GitHub issue
Contact the Arke Institute team
Check the MCP documentation
Acknowledgments
Built with Cloudflare Workers
Powered by Arke Institute search infrastructure
NARA data from the National Archives