Subindex MCP Server
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| SUBINDEX_API_KEY | Yes | API key from subindex.ai/account. | |
| SUBINDEX_API_URL | No | Override the API base URL for dev/staging environments. | |
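As a minimal sketch of how a launcher script might check these variables before starting the server: the `SubindexSettings` type and `load_settings` helper below are hypothetical names for illustration, not part of the server. Because the table lists no default for `SUBINDEX_API_URL`, the sketch leaves it as `None` when unset rather than guessing a URL.

```python
import os
from typing import Mapping, NamedTuple, Optional


class SubindexSettings(NamedTuple):
    """Hypothetical container for the two documented environment variables."""
    api_key: str
    api_url: Optional[str]  # None means "no override; use the server's own default"


def load_settings(env: Mapping[str, str] = os.environ) -> SubindexSettings:
    """Read SUBINDEX_API_KEY (required) and SUBINDEX_API_URL (optional)."""
    api_key = env.get("SUBINDEX_API_KEY", "").strip()
    if not api_key:
        # The table marks this variable as required.
        raise RuntimeError("SUBINDEX_API_KEY is required (see subindex.ai/account)")
    # Treat an empty or missing override as "not set".
    api_url = env.get("SUBINDEX_API_URL", "").strip() or None
    return SubindexSettings(api_key, api_url)
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.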
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| get_balance | Return the current account credit balance in USD. |
| get_usage | Return API usage history. Dates use the YYYY-MM-DD format. |
| upload_file | Upload a file to Subindex. RECOMMENDED: always try with file_path first. If the server can read the file (stdio/uvx transport), it uploads in one step and returns session_id. With file_path (one step, stdio/uvx transport): upload_file(filename="data.xlsx", file_type="excel", file_size=12345, file_path="/abs/path/data.xlsx") → server reads and uploads the file → returns session_id, s3_key. Without file_path (two steps, HTTP/Railway transport or any remote server): upload_file(filename="data.xlsx", file_type="excel", file_size=12345) → returns upload_url + curl_command → run the curl_command (requires shell/Bash access), then call start_table_validation(session_id, s3_key, filename). Note: the two-step path requires shell access to run curl. If you have no shell (e.g. Claude Desktop), use the stdio/uvx transport instead so file_path works. file_type must be one of: "excel", "csv", "pdf". The presigned URL expires in ~15 minutes, so run curl immediately. |
| start_table_validation | Confirm the upload and detect matching prior configs. Call this immediately after upload_file completes (or after the curl upload finishes when using HTTP/Railway transport). Returns config_matches with match_score; if the score is >= 0.85, a prior config can be reused directly. instructions: Optional natural-language description of what to validate and how (e.g. "This table lists clinical trials; validate that trial IDs, phase, and primary endpoints are accurate"). When provided, the upload interview is bypassed: the AI reads the table structure + instructions and generates a config directly without asking clarifying questions. Preview is auto-triggered immediately after. config_id: Optional ID of a known prior configuration to reuse directly. When provided, skips matching and the interview entirely, applies the config, and queues the preview immediately. Response includes preview_queued=true and job_id. Use when you already know the config_id (e.g. from a previous job's get_results response). Config generation and the 3-row preview are free. Full validation is charged at approve_validation; you still see the cost at preview_complete before anything is billed. If balance is insufficient at that point, approve_validation returns an insufficient_balance error. |
| get_job_status | One-shot job status check. Prefer wait_for_job for tracking long-running jobs. Use this for quick status inspections or as a fallback when wait_for_job is not appropriate. For polling loops, wait_for_job is more efficient: it holds the connection, emits live MCP progress notifications, and handles the multi-phase pipeline (table-maker → preview) automatically. Key statuses: queued / processing → call wait_for_job instead of re-polling manually; preview_complete → approve_validation (or refine_config); completed → get_results; failed → check error field. |
| get_job_messages | Fetch live progress messages for a running job. Pass since_seq from the last message to receive only new messages. |
| wait_for_job | Wait for a job to reach a terminal state, emitting live MCP progress notifications. Preferred over manually looping get_job_status. The MCP host shows a live progress indicator while this tool holds the connection, at no extra token cost. Architecture: every poll cycle polls both the messages endpoint and the status endpoint. This separation means messages drive the visual indicator (they are more real-time) while status is the authoritative source for workflow transitions. Neither endpoint is used as a shortcut to skip the other; both are polled every cycle so transient failures in one don't cause false terminations. Progress is always monotonically non-decreasing. Within a phase, msg_progress can oscillate (e.g. QC triggers new row-discovery rounds in the table-maker), but the emitted value is clamped so it never drops below last_emitted. Across phases a geometric slice scheme is used so the bar never goes backward regardless of how many phases occur. Progress geometry (lazy split): starts with the full 0–99 range so single-phase jobs (e.g. full validation after approve_validation) map their native 0–100% directly across the whole bar. On each intermediate phase transition, 80% of the current range is "spent" on the completed phase and the remaining 20% is handed to the next phase, keeping progress monotonic for any number of QC re-discovery rounds or pipeline stages. True terminal always emits exactly 100. Terminal states: preview_complete, failed, completed-without-intermediate-step. Intermediate: completed + current_step in (Config Generation, Table Making, Claim Extraction, …); the tool advances the phase and keeps polling. Returns the same payload shape as get_job_status so downstream tools (approve_validation, get_results, etc.) apply directly. job_id: the session_id value returned by upload_file / start_table_validation / start_table_maker. "job_id" and "session_id" are the same string; every workflow uses session_id as its job identifier throughout the pipeline. timeout_seconds: max wall time before returning last known state (default 900). Upload-interview + config-gen phases and large table previews can take up to 15 minutes, so set timeout_seconds=900 or higher for long-running jobs. poll_interval: seconds between poll cycles (default 10). warmup_seconds: when > 0, applies synthetic sqrt-curve progress from 0→70% over this many seconds during the pre-message phase (before the first progress message or intermediate step arrives). Use this when the pipeline has a silent setup phase (e.g. instructions= mode where the backend runs an internal AI interview + config generation before preview messages begin). The warmup is automatically disabled once the first intermediate step completes (phase-split takes over). For instructions= mode, pass 300. |
| approve_validation | Approve a preview and start full validation processing. job_id: the session_id value; "job_id" and "session_id" are the same string. approved_cost_usd MUST be provided and must match the estimated cost from the preview_complete response. This prevents accidental billing without first reviewing preview results and the cost estimate. |
| get_results | Fetch the final validated/enriched results for a completed job. job_id: the session_id value; "job_id" and "session_id" are the same string. Automatically downloads table_metadata.json and embeds it inline so no separate HTTP fetch is needed. Key fields in the response: results.markdown_table: START HERE. Self-contained markdown document with the full validated table (all rows, all values), confidence icons, viewer/download links, and a guide to navigating citations. Read this first. results.metadata.rows[]: per-row data keyed by row_key; each cell has value, confidence, comment (with citations). Legacy files may use full_value; both are equivalent. results.interactive_viewer_url: share with humans; renders sources + confidence. results.download_url: enriched Excel file for offline sharing. |
| update_table | Re-run validation on a previously processed table (update in place). source_version can pin a specific prior result version; omit for latest. |
| start_reference_check | Submit a reference-check job to fact-check text or a document. For inline text: start_reference_check(text="The claims to fact-check..."). For a PDF or document, upload it first, then pass the s3_key: upload_file(file_path, file_type="pdf") → returns s3_key, then start_reference_check(s3_key=s3_key). Do NOT call start_table_validation for PDFs; that starts the table validation pipeline, which is not what you want for a reference check. Designed for text with 4 or more factual claims; fewer claims may produce low-quality results. The job runs in three phases. Set auto_approve=True to skip the approval gate and run straight through to completion automatically. |
| start_table_maker | Start a Table Maker conversation to generate a research table. Describe the table you want in natural language, e.g.: 'Create a table of AI startups that raised Series A in 2024 with columns: company name, funding amount, investors, product description.' auto_start: When True, the AI skips the confirmation step and generates the table immediately from the message alone, without asking clarifying questions or showing a structure for approval. Use when the message fully describes the desired table and no back-and-forth is needed. |
| get_conversation | Poll a conversation for new messages or a status change. Key statuses: processing → poll again in ~15s; user_reply_needed → send_conversation_reply; trigger_execution → preview is auto-queued, switch to get_job_status. |
| send_conversation_reply | Send a user reply in an ongoing conversation (interview or table-maker). After sending, poll get_conversation for the AI's next response. |
| refine_config | Refine the generated validation config using natural language instructions. Example instructions: 'Add a column for LinkedIn URL. Remove the revenue column. Make email validation stricter.' Set defer_preview=True if you plan to do structural editing (exclude_row, add_pending_row, etc.) before the preview; this prevents a premature auto-preview from firing before your edits are complete. |
| wait_for_conversation | Wait for a conversation turn to complete, emitting live synthetic progress. Preferred over manually polling get_conversation. Since conversation processing has no native progress signal, this tool emits time-based synthetic progress (advancing quickly at first, then slowing as it approaches expected_seconds) so the MCP host shows a "still thinking" indicator rather than a frozen bar. Returns when any of these conditions is met: user_reply_needed=True → AI asked a question, call send_conversation_reply; trigger_execution=True → AI approved execution, preview is auto-queued, switch to wait_for_job(session_id); non-processing status → unexpected terminal (inspect status field); timeout → returns last known state with _wait_timeout note. Applies to all conversation types: upload interview, table-maker interview, config refinement. expected_seconds: typical AI response time for this turn (default 120). First table-maker turn (research + planning): ~120–180s. Upload interview first turn (CSV analysis + plan): ~90–150s. Follow-up confirmations ("yes, proceed"): ~30–60s. poll_interval: seconds between status checks (default 8). timeout_seconds: max wall time before returning (default 900). Upload interview turns can take up to 15 minutes, so set accordingly. |
| get_preview_state | Get the current structural editing state of a session. Returns the preview table, excluded rows, pending rows, ignored columns, and current row order so you can review before triggering the full validation run. |
| exclude_row | Exclude a row from the full validation run. Call with confirmed=False first to see a warning, then re-call with confirmed=True to apply. Reversible via include_row at any point before approving the full validation run. |
| include_row | Re-include a previously excluded row. No confirmation needed. Can be called at any point before approving the full validation run. |
| add_pending_row | Add a new entity as a pending row to be included in the full validation run. The source Excel is NOT modified; the row is stored in session state and injected in-memory at validation time. Fully reversible before approving the full run. The full-run cost quote updates to include the new row count. |
| reorder_preview_rows | Set the output row order for the validation run. This is a display preference; it does not affect which rows are validated or the validation itself. Rows not in the list are sorted to the end. |
| trigger_preview | Trigger a preview run after structural editing is complete. Call this after finishing row/column edits (exclude_row, add_pending_row, add_column, etc.) to explicitly queue a preview job. This clears skip_auto_preview and queues the preview. Then call wait_for_job(session_id) to track progress. |
| add_validated_rows | Add new rows to a completed validation table. Deduplication runs against existing rows. If confirmed=False, returns a cost quote (N_new_rows x per_row_rate from last run). If confirmed=True, appends rows to source Excel, runs validation on new rows only, and merges results into the output Excel. Only available after full validation completes (status=completed). |
| discover_rows | Discover and add new rows to an existing validated table using AI-powered search. Uses the existing table's config and validated data to plan a targeted row discovery run. The planner derives search strategy from the config, then RowDiscovery finds candidates and QC filters them. If confirmed=False, returns a cost estimate. If confirmed=True, enqueues the discovery pipeline (planner -> search -> QC -> pending_rows). Discovered rows land in pending_rows with source='row_discover'. Run add_validated_rows to validate them, or trigger a preview to see them. |
| patch_column | Add a new column to a completed validation table. If confirmed=False, returns a cost ceiling estimate (max-case: all rows x all columns x per_cell_cost x 1.25; likely much less if run within 1 day of original validation due to cache). If confirmed=True, adds column header to source Excel, runs validation with updated config (old columns hit cache, new column fully validated), merges new column results into output Excel. No QC on column patch runs (single-column QC is not meaningful; full-table QC ran on the original validation). Only available after full validation completes (status=completed). |
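Two pieces of the table above lend themselves to a short sketch: the status-to-next-tool mapping from the get_job_status row, and the "lazy split" progress geometry from the wait_for_job row. The code below is an illustration only, not the server's implementation. `next_step` and `ProgressBar` are hypothetical names, and the geometry follows one literal reading of the description: the bar starts at 0–99, each phase transition moves the lower bound up by 80% of the remaining range, and every emitted value is clamped so it never falls below the last one.

```python
from dataclasses import dataclass
from typing import Optional

# Status -> suggested next tool, as listed in the get_job_status row.
NEXT_STEP = {
    "queued": "wait_for_job",
    "processing": "wait_for_job",
    "preview_complete": "approve_validation",  # refine_config is the alternative
    "completed": "get_results",
    "failed": None,  # no tool to call; inspect the error field instead
}


def next_step(status: str) -> Optional[str]:
    """Return the tool to call next, or None when the job has failed."""
    if status not in NEXT_STEP:
        raise ValueError(f"unrecognized status: {status!r}")
    return NEXT_STEP[status]


@dataclass
class ProgressBar:
    """Hypothetical model of the lazy-split scheme (an assumption, not server code)."""
    lo: float = 0.0            # start of the range owned by the current phase
    hi: float = 99.0           # 100 is reserved for the true terminal state
    last_emitted: float = 0.0

    def emit(self, phase_percent: float) -> float:
        """Map a phase-local 0-100 value into [lo, hi], never moving backward."""
        pct = min(max(phase_percent, 0.0), 100.0)
        value = self.lo + (self.hi - self.lo) * pct / 100.0
        self.last_emitted = max(self.last_emitted, value)
        return self.last_emitted

    def advance_phase(self) -> None:
        """Spend 80% of the current range on the phase that just completed."""
        self.lo += 0.8 * (self.hi - self.lo)

    def finish(self) -> float:
        """The true terminal state always reports exactly 100."""
        self.last_emitted = 100.0
        return self.last_emitted
```

Under this reading, a phase that reports 50% and then oscillates down to 30% keeps the bar at 49.5, and after one transition the next phase works inside the remaining 79.2–99 slice.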
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| generate_table | Generate a new research table from a natural language description. description: What the table should cover (topic, entities, scope). columns: Optional comma-separated list of columns to include. |
| validate_file | Validate an existing Excel or CSV file with AI fact-checking. file_path: Absolute path to the local .xlsx or .csv file. instructions: Optional description of what to validate (bypasses the interview). |
| fact_check_text | Fact-check a text passage by extracting and verifying its claims. text: The text to fact-check (works best with 4+ factual claims). |
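For clients that talk to the server directly, a prompt is fetched with the MCP `prompts/get` method. The sketch below builds such a JSON-RPC request; `prompt_request` is a hypothetical helper, and the prompt and argument names are the ones listed in the table above. It assumes the standard MCP request shape and does not send anything over a transport.

```python
import json
from typing import Dict


def prompt_request(request_id: int, name: str, arguments: Dict[str, str]) -> str:
    """Serialize a JSON-RPC 2.0 `prompts/get` request for the given prompt."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "prompts/get",
        # Prompt arguments are passed as string values keyed by argument name.
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(payload)


# Example: request the fact_check_text prompt with its single `text` argument.
message = prompt_request(1, "fact_check_text", {"text": "Passage with several claims."})
```

The same helper works for generate_table (description, optional columns) and validate_file (file_path, optional instructions).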
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |