
research-loop

research-loop is a self-contained slice of an AI research platform: an AI-moderated interview turns a discussion guide into a real conversation, each finished session is automatically distilled into a summary, chapters, highlights and tags, and every transcript folds into a semantically searchable repository where every answer cites the exact transcript moment it came from. The whole repository is exposed as an MCP server, and an eval harness keeps the AI honest with published numbers.

It is one coherent product that touches all four project areas in Great Question's internship posting: semantic search across interview content, a realtime agentic AI moderator, MCP tool structuring, and evals across the tools and the moderator.

Requirements: Node 20–24. Node 26 breaks next build (a Node 26 × Next 15.5.19 resolver incompatibility — see Known issues); npm run dev, npm run typecheck, and npm test are unaffected.


60-second reviewer quickstart

The fastest path is MCP — the demo is meant to be queried, not clicked. Point Claude at the deployed server:

claude mcp add --transport http loop https://research-loop-ten.vercel.app/api/mcp \
  --header "Authorization: Bearer <token-from-application>"

The live endpoint is gated by a bearer token (shared in the application materials). When running locally, leave MCP_BEARER_TOKEN unset and drop the --header flag. Once the server is added, a suggested first question:

"Ask the repository: why does this candidate want to work at Great Question?"

Claude calls the ask_repository tool and answers with cited quotes drawn from the candidate's own AI-moderated interview — each citation deep-links to the exact transcript moment. The demo is the cover letter.

The five MCP tools: list_sessions, get_session, search_repository, ask_repository, get_eval_results.
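For reviewers who want to script the endpoint rather than go through Claude, the sketch below builds the JSON-RPC request that MCP's tools/call method expects over Streamable HTTP. It only constructs the request and does not send it. The helper name buildToolCall and the question argument name are assumptions for illustration; the real argument names are whatever lib/mcp/tools.ts documents, and a full client would normally perform the MCP initialize handshake first.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Plain description of the HTTP request, so it can be handed to any HTTP client.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) the HTTP request for one tool call.
// `buildToolCall` is a hypothetical helper, not part of the repo.
function buildToolCall(
  endpoint: string,
  tool: string,
  args: Record<string, unknown>,
  bearerToken?: string,
  id: number = 1,
): HttpRequestSpec {
  const payload: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Streamable HTTP servers may answer with plain JSON or an SSE stream.
    Accept: "application/json, text/event-stream",
  };
  // Locally MCP_BEARER_TOKEN is unset, so the header is simply omitted.
  if (bearerToken) headers["Authorization"] = `Bearer ${bearerToken}`;
  return { url: endpoint, method: "POST", headers, body: JSON.stringify(payload) };
}

// Local call with no token; `question` is an assumed argument name.
const req = buildToolCall("http://localhost:3000/api/mcp", "ask_repository", {
  question: "Why does this candidate want to work at Great Question?",
});
```

The returned object maps directly onto fetch(req.url, { method, headers, body }) once a session has been initialized.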

Run it locally

git clone <repo-url> research-loop && cd research-loop
npm install
cp .env.example .env.local        # then fill in keys (see below)
npm run db:init                   # apply the SQLite schema
npm run seed                      # seed guides + interview transcripts
npm run dev                       # http://localhost:3000

npm run db:init and the transcript-seeding stage of npm run seed run without any API keys. The analysis stage (embeddings + summary/chapters/highlights/tags) only runs when keys are present; after adding keys, run npm run seed -- --analyze-only to backfill it.

Environment keys (see .env.example for the annotated list):

A single OpenRouter key powers everything — its OpenAI-compatible API serves both chat (Claude + GPT models) and embeddings.

| Var | Needed for |
| --- | --- |
| OPENROUTER_API_KEY | everything: analysis, Ask synthesis, embeddings, eval judge/rerank |
| ANTHROPIC_MODEL | quality model, OpenRouter id (default nex-agi/nex-n2-pro:free, which runs free) |
| ANTHROPIC_FALLBACK_MODELS | comma-separated free fallback chain, tried in order only when the prior model 429s/errors (default openai/gpt-oss-120b:free,google/gemma-4-31b-it:free) |
| ANTHROPIC_FAST_MODEL | cheap model for rerank + persona bots (default openai/gpt-4o-mini) |
| OPENAI_EMBED_MODEL | embeddings (default openai/text-embedding-3-small) |
| DATABASE_URL | libSQL file (default file:./data/research-loop.db) |
| PUBLIC_BASE_URL | base for deep links in MCP/Ask responses |
| MCP_BEARER_TOKEN | optional: gates the MCP endpoint; unset = open |

Voice needs a direct OpenAI key. OpenRouter does not proxy OpenAI's Realtime API, so with an OpenRouter key voice mode is unavailable and the UI falls back to text mode (the dependable path anyway — see limitations).

No .env is committed and no key is required to typecheck or run the tests — imports stay lazy with respect to the environment, so the token-free test suite is green on a clean machine.
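The "lazy with respect to the environment" idea can be sketched as follows: nothing is read at import time, defaults come from the table above, and a missing key only throws at the point where a key is actually needed. This is an illustration, not the contents of lib/env.ts; readEnv, requireApiKey, pick and the AppEnv shape are hypothetical names.

```typescript
// Any key/value source; in the app this would be process.env (assumption).
type EnvSource = Record<string, string | undefined>;

// Hypothetical typed view of the variables listed in the table above.
interface AppEnv {
  openRouterApiKey: string | undefined;
  qualityModel: string;
  fallbackModels: string[];
  fastModel: string;
  embedModel: string;
  databaseUrl: string;
}

// Use the value when it is set and non-blank, otherwise the documented default.
function pick(value: string | undefined, fallback: string): string {
  const trimmed = (value ?? "").trim();
  return trimmed.length > 0 ? trimmed : fallback;
}

// Called on demand, never at module load, so imports stay side-effect free.
function readEnv(source: EnvSource): AppEnv {
  const key = (source["OPENROUTER_API_KEY"] ?? "").trim();
  return {
    openRouterApiKey: key.length > 0 ? key : undefined,
    qualityModel: pick(source["ANTHROPIC_MODEL"], "nex-agi/nex-n2-pro:free"),
    fallbackModels: pick(
      source["ANTHROPIC_FALLBACK_MODELS"],
      "openai/gpt-oss-120b:free,google/gemma-4-31b-it:free",
    )
      .split(",")
      .map((id) => id.trim())
      .filter((id) => id.length > 0),
    fastModel: pick(source["ANTHROPIC_FAST_MODEL"], "openai/gpt-4o-mini"),
    embedModel: pick(source["OPENAI_EMBED_MODEL"], "openai/text-embedding-3-small"),
    databaseUrl: pick(source["DATABASE_URL"], "file:./data/research-loop.db"),
  };
}

// Only code paths that really call a model ask for the key.
function requireApiKey(env: AppEnv): string {
  if (env.openRouterApiKey === undefined) {
    throw new Error("OPENROUTER_API_KEY is not set");
  }
  return env.openRouterApiKey;
}
```

With this shape, typecheck and the token-free tests never reach requireApiKey, which is why they can pass on a clean machine.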



What's inside

┌──────────────────────────────────────────────────────────────┐
│ Next.js 15 app (App Router) — one deployable artifact         │
│                                                               │
│  /                       landing + "start interview"          │
│  /interview/[id]         text moderator (voice: experimental) │
│  /sessions               completed sessions list              │
│  /sessions/[id]          transcript + chapters + highlights   │
│  /ask                    repository Q&A with cited quotes     │
│  /api/mcp                MCP server (Streamable HTTP)          │
│  /api/...                session + interview + ask endpoints   │
└───────────────┬───────────────────────────────────────────────┘
                │
   ┌────────────┼─────────────────┬────────────────────┐
   ▼            ▼                 ▼                    ▼
 OpenAI      OpenRouter        OpenRouter           libSQL / SQLite
 Realtime    (chat: analysis,  (embeddings:         sessions, segments,
 (voice —    Ask synthesis,    text-embedding-      embeddings (blob),
 direct      judge, rerank)    3-small)             chapters, highlights,
 OpenAI key                                         tags, eval_runs
 only)

The single OpenRouter key serves both chat and embeddings; only the experimental voice path needs a direct OpenAI Realtime key.

Routes

| Route | Kind | Purpose |
| --- | --- | --- |
| / | page | Landing: pitch, a card per guide, the claude mcp add line |
| /interview/[id] | page | Text-mode interview client; ?mode=voice for the experimental WebRTC voice client |
| /sessions | page | List of all sessions with status, label, guide, date |
| /sessions/[id] | page | Transcript + summary + chapters + highlights + tags; #t=<ms> scroll-highlights a moment |
| /ask | page | Natural-language Q&A with inline cited quotes |
| /api/mcp | route | MCP Streamable HTTP endpoint (GET/POST/DELETE) |
| /api/sessions | route | POST create a session |
| /api/interview/[id]/turn | route | POST one text-mode moderator exchange |
| /api/interview/[id]/segment | route | POST persist one voice transcript segment |
| /api/interview/[id]/end | route | POST finish + analyze a session |
| /api/ask | route | POST repository Q&A (cited answer) |
| /api/realtime/token | route | POST mint an ephemeral OpenAI Realtime token |

The moderator brain — the "realtime agentic AI moderator" pattern

The moderator isn't a script reader. Its behavior is the sum of three things: instructions (lib/moderator/instructions.ts — warm/neutral persona, a probing rule of ≤2 follow-ups on shallow answers, no leading questions, a time-box, a consent open and an "anything I expected you to ask?" close), explicit state (ModeratorState in lib/moderator/textLoop.ts tracks covered topic ids and probes-per-topic so guide progress is real state, not vibes), and structured tool outputs (each turn returns a Zod-validated { utterance, covered_topic_ids, probe_topic_id, phase }). The same buildInstructions drives both the voice (Realtime) session and the text loop, so a prompt change is felt in both modes and is exercised by the evals. That instructions + state + structured-output loop is exactly what "realtime agentic AI moderator" means in the posting.
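To make the state + structured-output part of that loop concrete, here is a minimal sketch. It is not the code in lib/moderator/textLoop.ts: the hand-written type guard stands in for the Zod schema, and the field layout of ModeratorState, the applyTurn helper and the way the two-probe cap is enforced are all assumptions. Only the four output field names come from the description above.

```typescript
// Structured output of one moderator turn (field names from the text above).
interface ModeratorTurn {
  utterance: string;
  covered_topic_ids: string[];
  probe_topic_id: string | null;
  phase: string;
}

// Assumed layout of the explicit state: covered topics plus probes per topic.
interface ModeratorState {
  covered: Set<string>;
  probes: Map<string, number>;
}

// The probing rule allows at most two follow-ups per topic.
const MAX_PROBES_PER_TOPIC = 2;

// Stand-in for the Zod schema: accept only objects with the expected shape.
function isModeratorTurn(value: unknown): value is ModeratorTurn {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const ids = v["covered_topic_ids"];
  const probe = v["probe_topic_id"];
  return (
    typeof v["utterance"] === "string" &&
    Array.isArray(ids) &&
    ids.every((id) => typeof id === "string") &&
    (probe === null || typeof probe === "string") &&
    typeof v["phase"] === "string"
  );
}

// Fold one validated turn into a new state without mutating the old one.
// `probeAllowed` is false when the model tries a third probe on a topic.
function applyTurn(
  state: ModeratorState,
  turn: ModeratorTurn,
): { state: ModeratorState; probeAllowed: boolean } {
  const covered = new Set(state.covered);
  for (const id of turn.covered_topic_ids) covered.add(id);

  const probes = new Map(state.probes);
  let probeAllowed = true;
  if (turn.probe_topic_id !== null) {
    const used = probes.get(turn.probe_topic_id) ?? 0;
    probeAllowed = used < MAX_PROBES_PER_TOPIC;
    if (probeAllowed) probes.set(turn.probe_topic_id, used + 1);
  }
  return { state: { covered, probes }, probeAllowed };
}
```

Because progress is tracked outside the prompt, a caller could reject or rewrite a turn whose probeAllowed flag is false instead of trusting the model to count its own follow-ups.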


Eval results

Numbers below are from an npm run evals -- --quick run on the default models (quality/judge: nex-agi/nex-n2-pro:free, the free quality model the demo actually ships; rerank: openai/gpt-4o-mini; embeddings: openai/text-embedding-3-small). --quick is a 3-persona / 10-retrieval / 6-Ask subset; the full suite (npm run evals) covers all 12 personas, ~28 retrieval pairs and ~20 Ask questions. Regenerate any time: evals/REPORT.md is overwritten on each run. The harness fails soft without a key.

The three suites (see evals/ and the moderator/retrieval/faithfulness design):

Moderator quality — scripted participant personas (terse, rambly, off-topic, hostile, over-sharer, …) run automated text interviews against the moderator; an LLM judge (rubric 1–5 with rationale and two calibration examples) scores each dimension; mean over seeds.

| Dimension | Score (1–5) |
| --- | --- |
| Coverage | 4.00 |
| Probing | 3.67 |
| Neutrality | 5.00 |
| Flow | 4.67 |

Retrieval quality — hand-written question → gold-segment pairs over the seeded sessions, embedding-only vs. embedding + rerank.

| Pipeline | recall@5 | recall@10 | MRR |
| --- | --- | --- | --- |
| Embedding-only | 90% | 100% | 0.814 |
| Embedding + rerank | 100% | 100% | 0.950 |
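For readers unfamiliar with the two metrics, the sketch below shows how recall@k and MRR are conventionally computed from ranked results. It assumes exactly one gold segment per question, which may not match how the harness in evals/ scores pairs; RetrievalCase, recallAtK and meanReciprocalRank are hypothetical names.

```typescript
// One eval pair: the gold segment id and what the pipeline returned, best first.
interface RetrievalCase {
  gold: string;
  ranked: string[];
}

// Fraction of questions whose gold segment appears in the top k results.
function recallAtK(cases: RetrievalCase[], k: number): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    if (c.ranked.slice(0, k).indexOf(c.gold) !== -1) hits += 1;
  }
  return hits / cases.length;
}

// Mean of 1/rank of the gold segment; a missing gold segment contributes 0.
function meanReciprocalRank(cases: RetrievalCase[]): number {
  if (cases.length === 0) return 0;
  let total = 0;
  for (const c of cases) {
    const index = c.ranked.indexOf(c.gold);
    if (index !== -1) total += 1 / (index + 1);
  }
  return total / cases.length;
}
```

As a purely illustrative reading of the scale: with ten questions, nine golds at rank 1 and one at rank 2 would give an MRR of (9 + 0.5) / 10 = 0.95.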

Citation faithfulness: ask_repository answers checked by a verifier. Does every cited quote exist and actually support the claim it's attached to?

| Metric | Value |
| --- | --- |
| % claims cited | 100% |
| % citations faithful | 100% |
| % answers w/ all quotes verbatim | 100% |
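The "all quotes verbatim" row is the one part of this suite that can be checked mechanically. A minimal sketch of such a check, assuming each citation carries a segment id and the quoted text (the Citation shape and function names are hypothetical, and whitespace normalization is an assumed leniency):

```typescript
// Assumed citation shape: the quoted text and the segment it points at.
interface Citation {
  segmentId: string;
  quote: string;
}

// Collapse whitespace runs so line breaks in a transcript don't cause misses.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// True only when every quote is a non-empty substring of its cited segment.
function allQuotesVerbatim(
  citations: Citation[],
  segments: Map<string, string>,
): boolean {
  return citations.every((c) => {
    const source = segments.get(c.segmentId);
    const quote = normalizeWhitespace(c.quote);
    if (source === undefined || quote.length === 0) return false;
    return normalizeWhitespace(source).indexOf(quote) !== -1;
  });
}
```

Whether a quote actually supports its claim is a judgment call that a string check cannot make, which is where the verifier comes in.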

Evals are how I knew when to stop prompt-tuning — prompt changes were kept or reverted based on these numbers. The rerank's lift (MRR 0.81 → 0.95) is exactly the kind of signal the harness exists to surface.


Decisions & tradeoffs

  • Realtime API, not a hand-rolled STT→LLM→TTS pipeline. OpenAI's Realtime API gives natural low-latency voice with built-in turn-taking; the moderator "brain" lives in the session instructions + tool calls. A hand-rolled pipeline would be more controllable but cost days of latency-tuning the demo doesn't need. Voice is shipped as experimental; text mode is the dependable path (see limitations).

  • No vector DB, on purpose. Embeddings are stored as float32 blobs and scored with brute-force cosine over a few hundred segments — microseconds, no infra to babysit. At Great Question scale (tens of thousands of interview hours) this flips: you'd want an ANN index (HNSW/IVF), smarter chunking than one-row-per-turn, and any retrieval change gated behind the retrieval eval before it ships. Knowing when not to reach for a vector DB is the point.

  • Small MCP surface, on purpose. Five tools, not twenty-five. Each description states when to use it and when not to (e.g. search_repository says "use ask_repository instead when you want a synthesized answer"), and documents its exact response shape. lib/mcp/tools.ts is the single source of truth, consumed by both the route handler and the tests.

  • In-memory moderator state. Covered-topic / probe tracking lives in a per-process map keyed by session id. A restart mid-interview resets that tracking, but the transcript is durable in the DB and is replayed into every moderatorStep call, so the model re-derives context — graceful degradation, not breakage. A multi-process deployment would persist it.

  • PII scrub before any third-party call. lib/pii.ts redacts emails / phones / long digit runs, and analysis scrubs once and reuses the scrubbed text for both the model call and embeddings, so raw text never leaves the process — mirroring Great Question's PII-masking approach, cheaply.

  • Text-mode fallback de-risks voice. The same guide and the same brain run over a plain text loop, so the demo works with no mic and a live-demo mic failure can't sink it.
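The "no vector DB" decision above comes down to very little code. The sketch below decodes float32 blobs and ranks rows by brute-force cosine similarity. It is an illustration rather than the contents of lib/search/: the row shape and function names are hypothetical, and it assumes blobs were written in the platform's native byte order.

```typescript
// Assumed row shape: a segment id plus its embedding stored as raw bytes.
interface EmbeddingRow {
  id: string;
  blob: Uint8Array;
}

// Reinterpret the bytes as float32 values. Copying first guarantees the
// underlying buffer starts at offset 0, so the view is correctly aligned.
function blobToVector(blob: Uint8Array): Float32Array {
  const copy = blob.slice();
  return new Float32Array(copy.buffer, 0, Math.floor(copy.byteLength / 4));
}

// Cosine similarity; returns 0 when either vector has zero length or norm.
function cosine(a: Float32Array, b: Float32Array): number {
  const n = Math.min(a.length, b.length);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every row against the query and keep the k best, highest first.
function topK(
  query: Float32Array,
  rows: EmbeddingRow[],
  k: number,
): { id: string; score: number }[] {
  return rows
    .map((row) => ({ id: row.id, score: cosine(query, blobToVector(row.blob)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The cost is linear in the number of segments, which is fine at a few hundred rows and is exactly the part an ANN index would replace at larger scale.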


Security

The deployed app is public, so it's hardened to a level a reviewer can trust at a glance:

  • Secrets never reach the client. The OpenRouter key and Turso token are server-only env vars; no NEXT_PUBLIC_ exposure, nothing in the browser bundle. Verified against every "use client" component.

  • Every API route validates input with Zod (typed bodies, length caps), all DB access is parameterized (no SQL injection), and PII is scrubbed before any third-party call.

  • The MCP endpoint requires a bearer token (MCP_BEARER_TOKEN) in production; unauthorized requests get a 401.

  • Per-IP rate limiting on the cost-bearing and write endpoints, and generic error responses — internal errors are logged server-side, never returned to the caller (no stack traces, paths, or DB/model details leak).

  • Security headers on every response: HSTS, X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Referrer-Policy, Permissions-Policy, and a CSP locking down frame-ancestors/object-src/base-uri.

  • Free by construction: no paid model in the quality path. The quality model (analysis, Ask synthesis, eval judge) defaults to a free OpenRouter reasoning model (nex-agi/nex-n2-pro:free). When it hits its daily cap, complete() in lib/llm/anthropic.ts transparently walks a chain of other free models (openai/gpt-oss-120b:free, then google/gemma-4-31b-it:free, deliberately different providers) until one answers. So the live demo can't flake when one free model is rate-limited, and the quality pipeline still costs $0, with no card-on-file exposure to run up at all. (Only the rerank step and embeddings touch a paid model, and at pennies; see below.)

  • Cost containment. The cost-bearing endpoints (ask_repository and the interview turns) sit behind the MCP bearer token / rate limiting, and OpenRouter enforces a hard credit cap as a final backstop — so even with the pennies-level rerank + embedding spend, nothing can run up a meaningful bill.

  • npm audit is clean of runtime risk. The remaining advisories are all in dev/build tooling (vitest/vite test runner, postcss used only at build time) — none ship to production, and the only "fix" downgrades Next.js to v9, so it's intentionally not applied.
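The fallback behavior described in the "Free by construction" item can be sketched as a loop over model ids. This is not the actual complete() in lib/llm/anthropic.ts, whose signature is not shown here; completeWithFallback and the injected call function are hypothetical, and the sketch treats every thrown error (a 429 included) as a reason to move on.

```typescript
// Injected model call, so the loop can be exercised without a network.
type ModelCall = (model: string, prompt: string) => Promise<string>;

// Try each model in order and return the first successful answer.
// If every model fails, rethrow the last error seen.
async function completeWithFallback(
  models: string[],
  prompt: string,
  call: ModelCall,
): Promise<{ model: string; text: string }> {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      const text = await call(model, prompt);
      return { model, text };
    } catch (error) {
      // For example a rate-limited free model; remember it and keep going.
      lastError = error;
    }
  }
  throw lastError;
}
```

The primary model would go first in the list, followed by the ids from ANTHROPIC_FALLBACK_MODELS, so the chain is only walked when an earlier call fails.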

Honest caveats: rate limiting is in-memory per serverless instance (best-effort against a single-source flood, not a hard global cap — a production deploy would use a shared store), and the interview endpoints are intentionally open so reviewers can run a live interview.
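To show why the in-memory limiter is only best-effort, here is a minimal fixed-window sketch. The repo's actual algorithm, limits and key derivation are not documented here, so the class name, the fixed-window choice and the numbers in any usage are assumptions.

```typescript
// Per-key counter for the current window.
interface Bucket {
  start: number;
  count: number;
}

// Fixed-window limiter held in process memory. Each serverless instance
// gets its own Map, so the cap applies per instance, not globally.
class FixedWindowLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request identified by `key` (e.g. an IP) may proceed.
  allow(key: string, now: number = Date.now()): boolean {
    const bucket = this.buckets.get(key);
    if (bucket === undefined || now - bucket.start >= this.windowMs) {
      // First request, or the previous window has expired: start a new one.
      this.buckets.set(key, { start: now, count: 1 });
      return true;
    }
    if (bucket.count >= this.limit) return false;
    bucket.count += 1;
    return true;
  }
}
```

A shared store would replace the Map with an atomic counter keyed the same way, which is what turns the per-instance cap into a global one.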

Known issues

  • next build fails on Node 26. Node 26 changed fs.readlinkSync on regular files (EINVAL → EISDIR), which trips Next 15.5.19's own module resolver with EISDIR: illegal operation on a directory, readlink …. It is not an app-code bug — npm run dev, npm run typecheck, and npm test all pass on Node 26. Use Node 20–24 to build (.nvmrc pins 22, and CI builds on Node 22). Don't "fix" it by upgrading deps.

Honest limitations

  • Voice mode is experimental and disabled on the OpenRouter key. The WebRTC + Realtime path is wired end to end, but OpenRouter doesn't proxy OpenAI's Realtime API, so the live demo runs in text mode (the dependable, fully-exercised path). Supply a direct OPENAI_API_KEY to enable voice.

  • Research participants are synthetic. Three of the four seeded sessions (Maya, Tomáš, Priya) are clearly labeled (synthetic); only the candidate self-interview is a real person. Synthetic data is labeled everywhere it appears, including in the seed guard that requires the (synthetic) label.

  • Single-process state. Moderator progress state is in-memory (above).

  • No auth on interview links beyond unguessable ids; the MCP endpoint is open unless MCP_BEARER_TOKEN is set. The demo data is non-sensitive by design.

  • Eval n is small. 12 personas, ~28 retrieval pairs, ~20 Ask questions: enough to catch regressions and guide prompt tuning, not a statistical claim.

  • Timestamps are estimated, not measured. Seeded transcripts have no real audio; turn durations are derived from word count (~150 wpm) so the timeline is deterministic and the gold set is stable.
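The word-count timing in the last item, and the #t=<ms> deep links it feeds, can be sketched like this. The function names are hypothetical and the seed script may round or pad differently; only the ~150 wpm rate, the /sessions/[id] route and the #t=<ms> fragment come from this README.

```typescript
// Speaking rate used to turn word counts into durations.
const WORDS_PER_MINUTE = 150;

// Estimated duration of one turn in milliseconds.
function estimateDurationMs(text: string): number {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return Math.round((words / WORDS_PER_MINUTE) * 60000);
}

// Start offset of each turn, assuming turns play back to back from t = 0.
function assignStartTimes(turns: string[]): number[] {
  const starts: number[] = [];
  let cursor = 0;
  for (const turn of turns) {
    starts.push(cursor);
    cursor += estimateDurationMs(turn);
  }
  return starts;
}

// Deep link to a transcript moment, using the #t=<ms> fragment.
function deepLink(baseUrl: string, sessionId: string, startMs: number): string {
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/sessions/${encodeURIComponent(sessionId)}#t=${startMs}`;
}
```

Because the offsets depend only on the text, reseeding produces the same timeline every time, which is what keeps the gold set stable.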


Repo map

research-loop/
├── app/                    Next.js routes (pages + /api, incl. /api/mcp)
├── lib/
│   ├── moderator/          guide, instructions builder, text-loop brain
│   ├── analysis/           summary/chapters/highlights/tags pipeline
│   ├── search/             cosine, retrieve, ask-with-citations
│   ├── mcp/                the 5 tool defs (single source of truth)
│   ├── llm/                OpenRouter client (chat + embeddings) + wrappers
│   ├── db/                 schema, libSQL client, typed queries
│   ├── ui/                 small markdown + time helpers
│   ├── env.ts              lazy typed env access
│   └── pii.ts              PII scrub
├── evals/                  eval harness (npm run evals) → evals/REPORT.md
├── scripts/                init-db, seed, seed-data fixtures
└── tests/                  vitest, token-free (CI-safe)

Scripts

| Script | What it does | Keys? |
| --- | --- | --- |
| npm run dev | Next.js dev server | no |
| npm run build | production build (Node 20–24 only, see Known issues) | no |
| npm run typecheck | tsc --noEmit | no |
| npm test | vitest, token-free | no |
| npm run db:init | apply the SQLite schema | no |
| npm run seed | seed guides + transcripts; analysis stage runs only with keys | partial |
| npm run evals | run the eval suites → evals/REPORT.md (spends tokens) | yes |

Working on this repo with an agent? See AGENTS.md for the map, the load-bearing contracts, and how to verify a change.

License

MIT — see LICENSE.
