scifinder-route-mcp
Integrates with OpenAI-compatible APIs, through configurable chat-completion and embedding endpoints, for LLM-based extraction of reaction steps and for generating embeddings used in semantic search.
Supports PostgreSQL as a database backend, with optional pgvector for vector storage and similarity search; it reports connectivity status and falls back to SQLite.
Supports Redis as an optional queue backend for durable job processing, configured via queue.redis_url, as an alternative to the default SQLite queue.
NAS-hosted MCP server for indexing and searching reaction-step-level synthesis routes from local SciFinder exports. It is designed to run long-term on Docker/NAS with a read-only inbox, durable SQLite queue fallback, optional external OCR/LLM/vector/parser/structure-recognition APIs, and an operational Admin Web UI.
GHCR visibility note: if anonymous pulls fail, open GitHub → Packages → scifinder-route-mcp → Package settings → Change visibility → Public. The compose file is already configured for ghcr.io/kettly1260/scifinder-route-mcp:latest.
Quick Start With Prebuilt Image
The published Docker image targets both linux/amd64 and linux/arm64.
git clone https://github.com/kettly1260/scifinder-route-mcp.git
cd scifinder-route-mcp
cp .env.example .env
mkdir -p nas-data nas-inbox
docker compose -f docker-compose.image.yml up -d
Then open:
Admin Web UI: http://<nas-host>:8001/
MCP SSE: http://<nas-host>:8000/sse

Put SciFinder exports into nas-inbox, then click Scan Inbox in the Admin Web UI or call the MCP scan_inbox tool. The image compose file uses only the image: key and does not build locally.
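Calling scan_inbox is an MCP tool call, and MCP carries tool calls as JSON-RPC 2.0 messages with the method tools/call. The sketch below only builds such a request and reads the text content of a result. It assumes scan_inbox accepts an empty arguments object, and it leaves the SSE transport and the initialize handshake to a real MCP client (for example, the official MCP Python SDK). The helper names are hypothetical.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC only requires them to be unique within a session.
_request_ids = itertools.count(1)


def tool_call_request(name, arguments=None):
    """Build a JSON-RPC 2.0 request that invokes an MCP tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        # Assumption: scan_inbox needs no arguments, so an empty object is sent.
        "params": {"name": name, "arguments": arguments or {}},
    }


def tool_result_text(response):
    """Join the text items of a tools/call result, raising on protocol or tool errors."""
    if "error" in response:  # JSON-RPC level failure, e.g. an unknown tool name
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    text = "\n".join(
        item["text"] for item in result.get("content", []) if item.get("type") == "text"
    )
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError(text or "tool reported an error")
    return text


request = tool_call_request("scan_inbox")
print(json.dumps(request))
```

The same helpers would serve get_parse_job_status or list_parse_jobs; only the tool name and arguments change.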
Local Build Deployment
docker compose up -d --build
Persistent paths:
./nas-data -> /data
./nas-inbox -> /inbox (read-only in the container)
./nas-data/uploads -> /data/uploads (HTTP upload and sidecar staging)

Parsing is asynchronous in the NAS profile. Jobs are stored durably in SQLite; after a container restart, interrupted running jobs are re-queued. Poll get_parse_job_status or list_parse_jobs until completion.
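The restart behaviour described above can be illustrated with a small SQLite job table. This is a minimal sketch, not the project's actual schema: the table name, the columns, and the queued/running/done status values are assumptions. The key idea is that any job still marked running when the container stopped is put back to queued at startup.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the jobs table; pass a file path so jobs survive restarts."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " document TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'queued')"
        )
    return conn


def enqueue(conn, document):
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO jobs (document) VALUES (?)", (document,))
    return cur.lastrowid


def claim_next(conn):
    """Move the oldest queued job to 'running' and return its id, or None if idle."""
    while True:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            # Compare-and-set: succeeds only if no other worker claimed the job first.
            cur = conn.execute(
                "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'queued'",
                (row[0],),
            )
        if cur.rowcount == 1:
            return row[0]


def finish(conn, job_id, status="done"):
    with conn:
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))


def recover_interrupted(conn):
    """Run at startup: jobs left 'running' by a crash or restart go back to 'queued'."""
    with conn:
        cur = conn.execute("UPDATE jobs SET status = 'queued' WHERE status = 'running'")
    return cur.rowcount
```

Because the status lives in the database file rather than in worker memory, recovery is a single UPDATE, and a re-queued job keeps its original position in line.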
Environment and Runtime Config
Copy .env.example to .env. Docker-level settings such as published ports, volumes, container network, and restart policy belong in .env/Compose only. The Admin Web UI never edits Docker files and never controls host Docker.
Application config that can be changed at runtime is read from /data/config.yaml; to customize it, copy config.example.yaml to ./nas-data/config.yaml. Hot-reloadable sections include:
server.async_jobs, server.max_workers, server.storage_backend
queue.backend, queue.redis_url
security.allow_external_paths, security.token, security.users
ingest.scan_extensions
integrations.*
extraction.llm_schema_version, extraction.llm_prompt_profile, extraction.llm_cost_limit_usd
thresholds.verification_confidence_threshold
retention.evidence_retention_days, retention.cache_retention_days

Use the MCP tools get_config, update_config, validate_config, and reload_config, or use the Admin Web UI.
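Before sending an update_config call, it can help to check which keys fall inside the hot-reloadable sections listed above. The sketch below flattens a nested config into dotted keys and matches them against that list, treating integrations.* as a wildcard and nested keys under a listed entry as covered. The value "redis" for queue.backend and the key server.port are hypothetical examples; ports, in particular, belong in .env/Compose rather than config.yaml.

```python
from fnmatch import fnmatchcase

# Sections listed above as hot-reloadable; "integrations.*" matches any nested key.
HOT_RELOADABLE = (
    "server.async_jobs", "server.max_workers", "server.storage_backend",
    "queue.backend", "queue.redis_url",
    "security.allow_external_paths", "security.token", "security.users",
    "ingest.scan_extensions",
    "integrations.*",
    "extraction.llm_schema_version", "extraction.llm_prompt_profile",
    "extraction.llm_cost_limit_usd",
    "thresholds.verification_confidence_threshold",
    "retention.evidence_retention_days", "retention.cache_retention_days",
)


def flatten(config, prefix=""):
    """Turn nested dicts into dotted keys: {"queue": {"backend": "x"}} -> {"queue.backend": "x"}."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def is_hot(key):
    """True if the key matches a listed entry or sits underneath one."""
    return any(fnmatchcase(key, p) or key.startswith(p + ".") for p in HOT_RELOADABLE)


def split_by_reloadability(update):
    """Return (hot, cold): keys covered by the hot-reload list vs. everything else."""
    flat = flatten(update)
    hot = {k: v for k, v in flat.items() if is_hot(k)}
    cold = {k: v for k, v in flat.items() if k not in hot}
    return hot, cold


hot, cold = split_by_reloadability({
    # Assumption: "redis" is an accepted queue.backend value.
    "queue": {"backend": "redis", "redis_url": "redis://redis:6379/0"},
    "server": {"port": 9000},  # hypothetical key: ports belong in .env, not config.yaml
})
print(sorted(hot), sorted(cold))
```

Anything that lands in cold is either a typo or a setting that needs a container restart or a Compose change rather than a hot reload.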
Admin Web UI
The Admin Web UI provides operational controls for:
- health/status cards and mounted storage diagnostics
- token-protected config changes
- queue status, recent jobs, failed-job retry
- HTTP upload endpoint for sidecar/client upload
- LLM endpoint/model/enable toggle, schema version, prompt profile, cost limit
- embedding endpoint/model, vector rebuild, vector index status and errors
- OCR endpoint/model, OCR backlog status
- document parser endpoint/model, parser fallback and endpoint health
- structure recognition endpoint/model health
- PostgreSQL URL/backend status with SQLite fallback
- DOI low-confidence queue count
- evaluation latest metrics
- SQLite backup, retention dry-run cleanup, NAS storage usage
- compound registry count and search via MCP

MCP Tools
Implemented tools:
health_check
get_config
update_config
validate_config
reload_config
scan_inbox
register_document
upload_document
get_parse_job_status
list_parse_jobs
retry_parse_job
retry_failed_jobs
search_reaction_steps
get_reaction_step
get_reaction_provenance
record_doi_verification
reparse_document
export_evaluation_set
compute_evaluation_metrics
get_evaluation_status
rebuild_vector_index
get_vector_index_status
semantic_search_reaction_steps
search_compounds
get_compound
merge_compounds
search_by_smiles
recognize_structure_image
backup_database
get_storage_usage
cleanup_evidence_cache
test_integration_endpoint

Feature Matrix
| Area | Status | Notes |
|---|---|---|
| Docker/NAS SSE service | Implemented | Compose and prebuilt-image compose supported. |
| GHCR multi-arch image workflow | Implemented | |
| Read-only inbox scanning | Implemented | |
| HTTP upload staging | Implemented | |
| Sidecar watcher | Implemented | |
| Durable queue | Implemented | SQLite queue is default; restart recovery and retry tools. Redis is optional/degraded via config status, not required. |
| SQLite storage | Implemented | Source documents, jobs, reaction steps, provenance, DOI verification, vector rows, compounds, metrics. |
| PostgreSQL backend | Runnable degraded integration | |
| pgvector | Optional/degraded | SQLite stores embeddings as JSON and cosine-searches them; Postgres/pgvector reports endpoint/backend status. |
| PDF/HTML/MHTML/text parsing | Implemented | Built-in parser remains the fallback. |
| External document parser | Implemented | |
| OCR worker | Implemented adapter | |
| Rule extraction | Implemented | Candidate blocks and structured fields. |
| LLM JSON structuring | Implemented adapter | OpenAI-compatible |
| Embedding/vector index | Implemented adapter | OpenAI-compatible |
| Compound registry | Implemented | CAS/SMILES/InChIKey text extraction, alias registry, reaction roles; RDKit optional. |
| Image structure recognition | Implemented adapter | |
| Multi-user authorization | Implemented | |
| Evaluation metrics | Implemented | JSONL gold-set metrics and latest metric status. |
| Backup/retention | Implemented | SQLite backup, storage usage, evidence/cache cleanup dry-run. |
| Endpoint health checks | Implemented | LLM, embedding, OCR, parser, structure recognition, Postgres. |
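The pgvector row notes that, without pgvector, embeddings are stored as JSON in SQLite and compared by cosine similarity, cos(a, b) = a·b / (|a| |b|). A brute-force version of that idea fits in a few lines of standard-library Python. The step_vectors table and its columns are hypothetical, and a full scan like this is only practical for modest index sizes.

```python
import heapq
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS step_vectors ("
            " step_id TEXT PRIMARY KEY, embedding TEXT NOT NULL)"
        )
    return conn


def store(conn, step_id, vector):
    # Embeddings are serialized as a JSON array of floats.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO step_vectors (step_id, embedding) VALUES (?, ?)",
            (step_id, json.dumps(vector)),
        )


def search(conn, query, top_k=5):
    """Score every stored vector against the query and return the top_k (step_id, score) pairs."""
    rows = conn.execute("SELECT step_id, embedding FROM step_vectors")
    scored = ((cosine(query, json.loads(emb)), step_id) for step_id, emb in rows)
    best = heapq.nlargest(top_k, scored)
    return [(step_id, score) for score, step_id in best]
```

An approximate index such as pgvector becomes worthwhile once scanning every row per query is too slow.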
External API Schemas
All external services are optional. If a service is not configured or fails, the server returns a degraded/skipped/error status instead of crashing the process.
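One way to get this behaviour is to route every external call through a wrapper that never lets an exception escape. The sketch below is an illustration, not the server's code: the status names come from this README, but the rules (skipped when no endpoint is configured, degraded when a built-in fallback such as the built-in parser was used, error otherwise) and the logger name are assumptions.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("scifinder_route.integrations")  # hypothetical logger name


def call_optional(
    name: str,
    endpoint: Optional[str],
    call: Callable[[str], Any],
    fallback: Optional[Callable[[], Any]] = None,
) -> Dict[str, Any]:
    """Call an optional external service without letting its failure crash the process."""
    if not endpoint:
        # Not configured: skip, but still run the built-in fallback if there is one.
        return {"integration": name, "status": "skipped",
                "result": fallback() if fallback else None}
    try:
        return {"integration": name, "status": "ok", "result": call(endpoint)}
    except Exception as exc:  # network errors, bad JSON, timeouts, ...
        logger.warning("%s endpoint failed: %s", name, exc)
        reason = f"{type(exc).__name__}: {exc}"
        if fallback is not None:
            return {"integration": name, "status": "degraded",
                    "reason": reason, "result": fallback()}
        return {"integration": name, "status": "error", "reason": reason, "result": None}
```

Returning a status dictionary instead of raising lets health checks and the Admin Web UI report each integration independently.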
Embedding endpoint: POST <endpoint>/embeddings
{"model":"bge-m3","input":["text"]}
The expected response can be OpenAI-like:
{"data":[{"embedding":[0.1,0.2]}]}
LLM endpoint: POST <endpoint>/chat/completions, OpenAI-compatible. The assistant content must be strict JSON with reaction-step fields.
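On the client side, both OpenAI-compatible endpoints reduce to small JSON-handling steps: build the embedding request, pull vectors out of data[].embedding, and read choices[0].message.content from a chat completion, rejecting anything that is not strict JSON (for example, JSON wrapped in a Markdown fence). This sketch assumes nothing about the reaction-step fields themselves and only checks that the content is a JSON object; the function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def embedding_request(model: str, texts: Iterable[str]) -> Dict[str, Any]:
    """Request body for POST <endpoint>/embeddings."""
    return {"model": model, "input": list(texts)}


def parse_embeddings(body: str) -> List[List[float]]:
    """Extract vectors from an OpenAI-like {"data": [{"embedding": [...]}]} response."""
    items = json.loads(body)["data"]
    # OpenAI responses carry an "index" per item; sort by it when present (stable otherwise).
    items = sorted(items, key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


def parse_step_json(body: str) -> Dict[str, Any]:
    """Read the assistant message from a chat/completions response and require strict JSON."""
    content = json.loads(body)["choices"][0]["message"].get("content") or ""
    try:
        step = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"assistant content is not strict JSON: {exc}") from exc
    if not isinstance(step, dict):
        raise ValueError("assistant content must be a JSON object")
    return step
```

Rejecting fenced or chatty output early keeps malformed LLM responses out of the reaction-step store.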
OCR endpoint: POST <endpoint>/ocr
{"model":"mineru-layout","file_path":"/data/uploads/file.pdf"}
Expected response:
{"text":"OCR text", "confidence":0.85}
Document parser endpoint: POST <endpoint>/parse
{"model":"parser-name","file_path":"/data/uploads/file.pdf"}
Expected response:
{"file_type":"pdf","title":"...","doi":"10....","chunks":[{"text":"...","page_number":1,"parser_name":"external","parser_version":"1"}]}
Structure recognition endpoint: POST <endpoint>/recognize
{"model":"decimer","image_path":"/data/evidence/page1.png"}
Expected response:
{"structures":[{"smiles":"CCO","confidence":0.7}]}

Sidecar Watcher
Create sidecar.yaml on a client machine:
watch_dir: /path/to/scifinder/exports
server_url: http://nas-host:8001
token: change-me
include_patterns:
- "*.pdf"
- "*.html"
settle_seconds: 3
upload_mode: http
poll_seconds: 2
Run:
scifinder-route-sidecar sidecar.yaml
The sidecar polls by default and does not require watchdog, making it suitable for Windows/macOS/Linux clients.
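Polling without watchdog can be sketched like this: list the watch directory every poll_seconds, and treat a file as ready once its size and modification time have stayed unchanged for settle_seconds. This is an illustrative sketch keyed to the sidecar.yaml fields, not the sidecar's own code. It only looks at top-level files, and the upload callable is a hypothetical stand-in for the HTTP upload to server_url.

```python
import fnmatch
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Set, Tuple

Signature = Tuple[int, float]  # (size in bytes, modification time)


def snapshot(watch_dir: str, include_patterns: Iterable[str]) -> Dict[str, Signature]:
    """Size and mtime of every top-level file whose name matches an include pattern."""
    patterns = list(include_patterns)
    found = {}
    for entry in Path(watch_dir).iterdir():
        if entry.is_file() and any(fnmatch.fnmatch(entry.name, p) for p in patterns):
            stat = entry.stat()
            found[str(entry)] = (stat.st_size, stat.st_mtime)
    return found


class SettleTracker:
    """Report a file once its size and mtime have stayed unchanged for settle_seconds."""

    def __init__(self, settle_seconds: float):
        self.settle_seconds = settle_seconds
        self._first_seen: Dict[str, Tuple[Signature, float]] = {}
        self._reported: Set[str] = set()

    def update(self, snap: Dict[str, Signature], now: float) -> List[str]:
        ready = []
        for path, sig in snap.items():
            previous = self._first_seen.get(path)
            if previous is None or previous[0] != sig:
                # New or still being written: restart the settle timer.
                self._first_seen[path] = (sig, now)
                self._reported.discard(path)
            elif path not in self._reported and now - previous[1] >= self.settle_seconds:
                ready.append(path)
                self._reported.add(path)
        # Forget files that disappeared so a re-created file is treated as new.
        for gone in set(self._first_seen) - set(snap):
            del self._first_seen[gone]
            self._reported.discard(gone)
        return sorted(ready)


def watch(watch_dir: str, include_patterns: List[str], settle_seconds: float,
          poll_seconds: float, upload: Callable[[str], None]) -> None:
    """Poll forever, calling upload(path) once for each settled file."""
    tracker = SettleTracker(settle_seconds)
    while True:
        for path in tracker.update(snapshot(watch_dir, include_patterns), time.monotonic()):
            upload(path)
        time.sleep(poll_seconds)
```

Waiting for a stable size and mtime avoids uploading a SciFinder export that is still being written; a file that changes after upload is reported again once it settles.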
Authorization
Legacy single-token mode:
SCIFINDER_ROUTE_TOKEN=change-me
Multi-user token mode:
SCIFINDER_ROUTE_USERS=alice:viewer-token:viewer,bob:operator-token:operator,root:admin-token:admin
Roles:
- viewer: search/read/status
- operator: scan/reparse/retry/vector/evaluation/integration tests
- admin: config/backup/cleanup/secret operations

Development
python -m pytest -q
Optional Docker check:
docker compose build
docker compose -f docker-compose.image.yml config