# SYSTEM_DESIGN.md — CLEAN SCOPE (Google‑Only, No Docker/Final‑Stage)
> Technical design for the **Google‑only MCP Router** MVP. This version removes containerization (Docker/k8s), production rollout/CD, and any final‑stage features.
## 1. High‑Level Architecture (MVP)
```mermaid
flowchart LR
Caller((Caller)) -->|PSTN/VoIP| VoiceAgent["Voice Agent (Vapi/ElevenLabs)"]
VoiceAgent -->|MCP Tool Calls| MCP[MCP Router Service]
subgraph Google Cloud
Cal[Google Calendar API]
Mail[Gmail API]
end
MCP --> Cal
MCP --> Mail
MCP --> TokenStore[(Encrypted Token Store)]
MCP --> Metrics[(Metrics/Logs/Traces)]
```
**Responsibilities**
- **Voice Agent**: ASR/TTS/turn taking. Minimal orchestration.
- **MCP Router**: exposes 5 curated tools; validates and authorizes requests; enforces policy; normalizes errors; handles OAuth and idempotency.
- **Google**: system of record (Calendar events, Gmail send).
## 2. Tech Stack — Rationale & Versions (No Containers)
| Layer | Choice | Version | Rationale |
|---|---|---:|---|
| Language | **TypeScript** | ≥ 5.5 | Strong typing, great ecosystem |
| Runtime | **Node.js** | LTS (≥ 20) | Modern TLS/crypto/fetch |
| HTTP | **Fastify** | ≥ 4 | High‑perf, schema‑first |
| JSON Schema | **ajv** | ≥ 8 | Fast validation |
| OAuth | **openid-client** | ≥ 6 | PKCE, refresh flows |
| Google SDK | **googleapis** | ≥ 136 | Calendar/Gmail wrappers |
| Time/TZ | **luxon** | ≥ 3 | IANA TZ, DST‑safe |
| Cache/Idempotency | **Redis** (or SQLite MVP) | 7.x | Keys, TTL, small set |
| Config | **zod**/**convict** | latest | Typed env parsing |
| Logging | **pino** | ≥ 9 | JSON logs |
| Metrics | **prom‑client** | latest | Prometheus exporter |
| Tracing | **OpenTelemetry** | 1.x | W3C traceparent |
| Testing | **vitest** + **nock** | latest | Fast tests, HTTP mocks |
> Deployment is assumed simple (VM or PaaS). CI is retained; CD/canary out of scope.
## 3. Service Structure
```
src/
  server.ts                          # Fastify bootstrap + MCP registry
  mcp/
    index.ts                         # tool wiring & error adapter
    tools/
      calendar.find_free_slots.ts
      calendar.create_event.ts
      calendar.cancel_event.ts
      email.send.ts
      time.resolve.ts
    schemas/
      calendar.find_free_slots.request.json
      calendar.find_free_slots.response.json
      ...
  adapters/
    google-calendar.ts               # freebusy, events.list/insert/delete
    gmail-send.ts                    # send MIME via Gmail
  policy/
    calendars.json                   # allowlist; working hours
    rules.ts                         # overlap, weekday guards
  auth/
    oauth.ts                         # login URL, callback, refresh
    token-store.ts                   # AES‑GCM encrypt/decrypt, rotate
  templates/
    meeting_confirm.hbs
    cancel_confirm.hbs
  util/
    time.ts                          # luxon helpers; clamp; ISO
    errors.ts                        # unified error taxonomy
    idempotency.ts                   # (key,fingerprint)→outcome
test/
  unit/*.spec.ts
  contracts/*.spec.ts
README.md
```
## 4. MCP Layer & Contracts
- **Registry**: `domain.verb_noun` → {requestSchema, handler, responseSchema}.
- **Wrapper**: every tool returns either `{ok: true, data?}` or `{ok: false, error}`.
- **Error codes**: `INVALID_ARGUMENT`, `UNAUTHENTICATED`, `FORBIDDEN`, `NOT_FOUND`, `CONFLICT`, `RATE_LIMITED`, `INTERNAL`.
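
The registry and wrapper described above could look roughly like the following. This is a minimal sketch, not the service's actual code: `ToolFailure`, `registerTool`, `callTool`, and the `validate` callback (a stand-in for ajv schema validation) are hypothetical names introduced for illustration, and response-schema checking is omitted.

```typescript
// Unified error codes from the list above.
type ErrorCode =
  | "INVALID_ARGUMENT" | "UNAUTHENTICATED" | "FORBIDDEN" | "NOT_FOUND"
  | "CONFLICT" | "RATE_LIMITED" | "INTERNAL";

interface ToolError { code: ErrorCode; message: string }
type ToolResult<T> = { ok: true; data?: T } | { ok: false; error: ToolError };

// Hypothetical error type a handler throws to signal a specific unified code.
class ToolFailure extends Error {
  readonly code: ErrorCode;
  constructor(code: ErrorCode, message: string) {
    super(message);
    this.code = code;
  }
}

interface ToolDefinition {
  // Stand-in for JSON Schema validation: returns a problem description or null.
  validate: (input: unknown) => string | null;
  handler: (input: unknown) => Promise<unknown>;
}

const registry = new Map<string, ToolDefinition>();

function registerTool(name: string, def: ToolDefinition): void {
  // Enforce the `domain.verb_noun` naming convention.
  if (!/^[a-z]+\.[a-z_]+$/.test(name)) throw new Error(`bad tool name: ${name}`);
  registry.set(name, def);
}

function fail(code: ErrorCode, message: string): ToolResult<never> {
  return { ok: false, error: { code, message } };
}

// Single entry point: validates, runs the handler, and never throws to the caller.
async function callTool(name: string, input: unknown): Promise<ToolResult<unknown>> {
  const def = registry.get(name);
  if (!def) return fail("NOT_FOUND", `unknown tool: ${name}`);
  const problem = def.validate(input);
  if (problem) return fail("INVALID_ARGUMENT", problem);
  try {
    return { ok: true, data: await def.handler(input) };
  } catch (err) {
    if (err instanceof ToolFailure) return fail(err.code, err.message);
    return fail("INTERNAL", "unexpected error");
  }
}
```

Because `callTool` catches everything, the voice agent only ever sees the two result shapes, and unexpected exceptions collapse to `INTERNAL` without leaking provider details.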
## 5. OAuth & Token Model
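- OAuth 2.0 authorization code flow with PKCE (web client). Scopes: `calendar.events`, `gmail.send`. The free/busy queries in Section 7 are not authorized by `calendar.events` alone; add a read scope such as `calendar.readonly`.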
- Token store encrypts refresh tokens with **AES‑256‑GCM**; key material from OS KMS/secret store.
- Refresh access tokens before expiry; on a `401`, refresh once and retry the call, and if it still fails, surface `UNAUTHENTICATED`.
- Monthly rotation in staging; production strategy out of scope.
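
The AES‑256‑GCM step of the token store can be sketched with Node's built-in `node:crypto` module. The function names and the stored layout (`iv | authTag | ciphertext`, base64-encoded) are assumptions made for this example; key loading from the secret store and rotation are left out.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // 96-bit nonce, the size recommended for GCM
const TAG_BYTES = 16; // default GCM authentication tag length

// Assumed blob layout: iv | authTag | ciphertext, base64-encoded.
function encryptToken(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256-GCM needs a 32-byte key");
  const iv = randomBytes(IV_BYTES); // fresh nonce per encryption; never reuse with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptToken(blob: string, key: Buffer): string {
  const raw = Buffer.from(blob, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not verify (wrong key or tampered data).
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Storing the nonce and tag next to the ciphertext keeps each record self-describing, so a rotated key only requires decrypting with the old key and re-encrypting with the new one.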
## 6. Business Policy Enforcement
- **Work hours**: Mon–Fri 08:00–18:00 America/Chicago.
- **Calendar allowlist**: enforced via `calendars.json`.
- **Overlap prevention**: freebusy merge → `409 CONFLICT` on collision.
- **Idempotency**: mutating tools require `idempotency_key`; TTL=24h cache.
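
The idempotency rule can be illustrated with an in-memory stand-in for the Redis-backed store. This is a sketch under assumptions: `IdempotencyStore`, `fingerprint`, and `canonical` are hypothetical names, returning `CONFLICT` when a key is reused with different arguments is a choice made here rather than something the design specifies, and concurrent in-flight calls with the same key are not handled.

```typescript
import { createHash } from "node:crypto";

type Outcome =
  | { ok: true; data?: unknown }
  | { ok: false; error: { code: string; message: string } };

interface Entry { fingerprint: string; outcome: Outcome; expiresAt: number }

const TTL_MS = 24 * 60 * 60 * 1000; // TTL=24h, as stated above

// Key-order-independent serialization of JSON-compatible arguments.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function fingerprint(tool: string, args: unknown): string {
  return createHash("sha256").update(`${tool}\n${canonical(args)}`).digest("hex");
}

class IdempotencyStore {
  private readonly entries = new Map<string, Entry>();
  private readonly now: () => number;

  // The clock is injectable so tests can control expiry.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  async run(key: string, fp: string, action: () => Promise<Outcome>): Promise<Outcome> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      if (hit.fingerprint !== fp) {
        // Assumption: same key with different arguments is rejected.
        return { ok: false, error: { code: "CONFLICT", message: "idempotency_key reused with different arguments" } };
      }
      return hit.outcome; // replay the recorded outcome without re-running the action
    }
    const outcome = await action();
    this.entries.set(key, { fingerprint: fp, outcome, expiresAt: this.now() + TTL_MS });
    return outcome;
  }
}
```

A retried `calendar.create_event` with the same `idempotency_key` and arguments then returns the original outcome instead of creating a second event.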
## 7. Slotting Logic (Free/Busy)
```mermaid
flowchart TD
A[Window] --> B[Clamp to work hours]
B --> C[Fetch freebusy (allowlisted)]
C --> D[Merge intervals]
D --> E[Invert to free]
E --> F[Filter >= duration]
F --> G[Return slots]
```
- Normalize to input TZ; return ISO‑8601 with offset.
- Attendees: if domain‑visible, merge; otherwise lower `confidence`.
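
The merge, invert, and filter steps of the flow above can be sketched on plain epoch-millisecond intervals. The function names are illustrative, and clamping to work hours, the freebusy fetch, and the `confidence` handling are assumed to happen outside this snippet.

```typescript
// Half-open interval [start, end) in epoch milliseconds.
interface Interval { start: number; end: number }

// Merge overlapping or touching busy intervals into a sorted, disjoint list.
function mergeIntervals(busy: Interval[]): Interval[] {
  const sorted = [...busy].sort((a, b) => a.start - b.start);
  const merged: Interval[] = [];
  for (const cur of sorted) {
    const last = merged[merged.length - 1];
    if (last && cur.start <= last.end) {
      last.end = Math.max(last.end, cur.end);
    } else {
      merged.push({ ...cur });
    }
  }
  return merged;
}

// Return the gaps inside `window` that no busy interval covers.
function invertToFree(window: Interval, mergedBusy: Interval[]): Interval[] {
  const free: Interval[] = [];
  let cursor = window.start;
  for (const b of mergedBusy) {
    if (b.end <= cursor) continue;     // entirely before the cursor
    if (b.start >= window.end) break;  // entirely after the window
    if (b.start > cursor) free.push({ start: cursor, end: b.start });
    cursor = Math.max(cursor, b.end);
  }
  if (cursor < window.end) free.push({ start: cursor, end: window.end });
  return free;
}

// Keep only the free gaps long enough to hold a meeting of `durationMin` minutes.
function findFreeSlots(window: Interval, busy: Interval[], durationMin: number): Interval[] {
  const minMs = durationMin * 60_000;
  return invertToFree(window, mergeIntervals(busy)).filter((s) => s.end - s.start >= minMs);
}
```

Sorting dominates the cost, so the whole pass is O(n log n) in the number of busy blocks, which is small for a single day's window.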
## 8. Performance & Caching (MVP targets)
- p95 budgets: `time.resolve` ≤ 150 ms; `calendar.find_free_slots` ≤ 700 ms; `calendar.create_event` ≤ 900 ms; `email.send` ≤ 350 ms.
- HTTP keep‑alive + connection pooling; short‑TTL (e.g., 15s) cache for repeat freebusy.
- Schema cache by `$id`.
## 9. Observability (kept minimal)
- **Logs**: pino JSON, include `{tool, request_id, latency_ms, code}`; mask emails.
- **Metrics**: Prometheus — `tool_calls_total{tool,code}`, `tool_latency_ms_bucket{tool}`, `oauth_refresh_total`.
- **Tracing**: OpenTelemetry auto‑instrument; propagate `traceparent`.
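
Email masking for log lines could be done with a small helper applied before fields reach the logger. The masking format (first character, `***`, then the domain) and the function names are assumptions for this sketch; the regular expression is deliberately simple and not a full address parser.

```typescript
// Mask a single address: keep the first character and the domain.
function maskEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at <= 0) return "***"; // not a usable address; hide it entirely
  return `${email.charAt(0)}***${email.slice(at)}`;
}

// Pragmatic pattern for addresses embedded in free text.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Mask every address found inside a log message.
function maskEmailsInText(text: string): string {
  return text.replace(EMAIL_RE, (match) => maskEmail(match));
}
```

For example, `maskEmailsInText("sent to bob.smith@corp.io ok")` yields `"sent to b***@corp.io ok"`.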
## 10. Error Normalization Map (examples)
| Provider | HTTP | Provider Code | Unified Code |
|---|---:|---|---|
| Calendar | 403 | rateLimitExceeded | RATE_LIMITED |
| Calendar | 409 | conflict | CONFLICT |
| Gmail | 401 | invalidCredentials | UNAUTHENTICATED |
| Gmail | 400 | failedPrecondition | INVALID_ARGUMENT |
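
One way to implement the map is to match on the provider code first and fall back to the HTTP status, so that a `403` carrying `rateLimitExceeded` is not mistaken for a permission failure. The `ProviderError` shape and the status fallbacks are assumptions for this sketch; only the four rows above come from the table.

```typescript
type ErrorCode =
  | "INVALID_ARGUMENT" | "UNAUTHENTICATED" | "FORBIDDEN" | "NOT_FOUND"
  | "CONFLICT" | "RATE_LIMITED" | "INTERNAL";

// Hypothetical shape extracted from a Google API error response.
interface ProviderError {
  provider: "calendar" | "gmail";
  status: number;
  reason?: string;
}

// Rows from the normalization table, keyed by provider code.
const BY_REASON = new Map<string, ErrorCode>([
  ["rateLimitExceeded", "RATE_LIMITED"],
  ["conflict", "CONFLICT"],
  ["invalidCredentials", "UNAUTHENTICATED"],
  ["failedPrecondition", "INVALID_ARGUMENT"],
]);

// Assumed fallback when the provider code is missing or unknown.
const BY_STATUS = new Map<number, ErrorCode>([
  [400, "INVALID_ARGUMENT"],
  [401, "UNAUTHENTICATED"],
  [403, "FORBIDDEN"],
  [404, "NOT_FOUND"],
  [409, "CONFLICT"],
  [429, "RATE_LIMITED"],
]);

function normalizeProviderError(err: ProviderError): ErrorCode {
  const byReason = err.reason === undefined ? undefined : BY_REASON.get(err.reason);
  return byReason ?? BY_STATUS.get(err.status) ?? "INTERNAL";
}
```

Anything unrecognized, including provider `5xx` responses, collapses to `INTERNAL`.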
## 11. Tool Schemas (excerpt)
`calendar.find_free_slots.request`:
```json
{
  "type": "object",
  "additionalProperties": false,
  "required": ["window_start_iso", "window_end_iso"],
  "properties": {
    "duration_min": {"type": "integer", "minimum": 5, "maximum": 480, "default": 30},
    "window_start_iso": {"type": "string", "format": "date-time"},
    "window_end_iso": {"type": "string", "format": "date-time"},
    "attendees": {"type": "array", "items": {"type": "string", "format": "email"}},
    "work_hours_only": {"type": "boolean", "default": true},
    "tz": {"type": "string", "default": "America/Chicago"}
  }
}
```
`calendar.find_free_slots.response`:
```json
{
  "type": "object",
  "required": ["ok"],
  "properties": {
    "ok": {"type": "boolean"},
    "data": {
      "type": "object",
      "properties": {
        "slots": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["start_iso", "end_iso"],
            "properties": {
              "start_iso": {"type": "string", "format": "date-time"},
              "end_iso": {"type": "string", "format": "date-time"}
            }
          }
        },
        "source_calendar_ids": {"type": "array", "items": {"type": "string"}},
        "confidence": {"type": "string", "enum": ["HIGH", "MEDIUM", "LOW"]}
      }
    },
    "error": {"$ref": "#/definitions/error"}
  },
  "definitions": {
    "error": {
      "type": "object",
      "required": ["code", "message"],
      "properties": {
        "code": {"type": "string", "enum": ["INVALID_ARGUMENT", "UNAUTHENTICATED", "FORBIDDEN", "NOT_FOUND", "CONFLICT", "RATE_LIMITED", "INTERNAL"]},
        "message": {"type": "string"},
        "provider_code": {"type": "string"},
        "retry_after_ms": {"type": "integer"}
      }
    }
  }
}
```
> Apply analogous schemas for `calendar.create_event`, `calendar.cancel_event`, `email.send`, `time.resolve`.
## 12. Testing Strategy (kept)
- Contract: AJV round‑trips for all tools.
- Policy: off‑hours `FORBIDDEN`; overlap `CONFLICT`; duration bounds.
- Adapter: Calendar (ok/rateLimitExceeded/5xx), Gmail (ok/401 refresh).
- Idempotency: replay `create_event`/`cancel_event`.
- Time: deterministic with `now_iso`, DST boundaries.
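
The policy and time cases become deterministic when the work-hours check takes the instant as an argument instead of reading the clock. The sketch below uses the built-in `Intl.DateTimeFormat` rather than luxon so that it runs without dependencies; `WorkPolicy`, `localWeekdayAndMinutes`, and `isWithinWorkHours` are hypothetical names.

```typescript
interface WorkPolicy {
  tz: string;        // IANA zone, e.g. "America/Chicago"
  days: string[];    // e.g. ["Mon", "Tue", "Wed", "Thu", "Fri"]
  startMin: number;  // minutes after local midnight, inclusive
  endMin: number;    // minutes after local midnight, exclusive
}

// Convert an instant to the local weekday and minutes-after-midnight in `tz`.
function localWeekdayAndMinutes(instant: Date, tz: string): { weekday: string; minutes: number } {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: tz,
    weekday: "short",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23",
  }).formatToParts(instant);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  return { weekday: get("weekday"), minutes: Number(get("hour")) * 60 + Number(get("minute")) };
}

// True when the instant falls on an allowed weekday inside the local work window.
function isWithinWorkHours(iso: string, policy: WorkPolicy): boolean {
  const instant = new Date(iso);
  if (Number.isNaN(instant.getTime())) throw new Error(`invalid ISO timestamp: ${iso}`);
  const { weekday, minutes } = localWeekdayAndMinutes(instant, policy.tz);
  return policy.days.includes(weekday) && minutes >= policy.startMin && minutes < policy.endMin;
}
```

A DST test can then pin two Mondays on either side of the 2024 spring transition: 13:30 UTC is 07:30 local on 2024‑03‑04 (outside work hours) and 08:30 local on 2024‑03‑11 (inside).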
## 13. Configuration & Envs (excerpt)
| Name | Example | Description |
|---|---|---|
| `DEFAULT_TZ` | America/Chicago | IANA TZ |
| `ALLOWLIST_CALENDAR_IDS` | primary,work@domain.com | Allowed calendars |
| `WORK_HOURS_START` | 08:00 | Local time |
| `WORK_HOURS_END` | 18:00 | Local time |
| `WORK_DAYS` | Mon,Tue,Wed,Thu,Fri | Enforced weekdays |
| `GOOGLE_CLIENT_ID` | … | OAuth client id |
| `GOOGLE_CLIENT_SECRET` | … | OAuth secret |
| `GOOGLE_REDIRECT_URI` | http://localhost:8080/oauth/google/callback | Must exactly match the redirect URI registered for the OAuth client |
| `TOKEN_ENC_KEY` | base64:… | Base64‑encoded 32‑byte key for AES‑256‑GCM |
| `REDIS_URL` | redis://localhost:6379 | Idempotency/cache |
| `LOG_LEVEL` | info | Logging level |
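
Typed parsing of the policy-related variables might look like the sketch below. The stack table names zod/convict for this job; plain TypeScript is used here only so the example runs without dependencies, and `AppConfig`, `loadConfig`, and the helper names are hypothetical. Treating every listed variable as required is also an assumption.

```typescript
interface AppConfig {
  defaultTz: string;
  allowlistCalendarIds: string[];
  workStartMin: number; // minutes after local midnight
  workEndMin: number;
  workDays: string[];
  logLevel: string;
}

type Env = Record<string, string | undefined>;

function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") throw new Error(`missing env var ${name}`);
  return value.trim();
}

// Parse "HH:MM" (24-hour) into minutes after midnight.
function parseHHMM(value: string, name: string): number {
  const m = /^([01]\d|2[0-3]):([0-5]\d)$/.exec(value);
  if (!m) throw new Error(`${name} must be HH:MM, got "${value}"`);
  return Number(m[1]) * 60 + Number(m[2]);
}

function csv(value: string): string[] {
  return value.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

// Pass `process.env` in production; tests can pass a plain object.
function loadConfig(env: Env): AppConfig {
  const workStartMin = parseHHMM(required(env, "WORK_HOURS_START"), "WORK_HOURS_START");
  const workEndMin = parseHHMM(required(env, "WORK_HOURS_END"), "WORK_HOURS_END");
  if (workEndMin <= workStartMin) throw new Error("WORK_HOURS_END must be after WORK_HOURS_START");
  return {
    defaultTz: required(env, "DEFAULT_TZ"),
    allowlistCalendarIds: csv(required(env, "ALLOWLIST_CALENDAR_IDS")),
    workStartMin,
    workEndMin,
    workDays: csv(required(env, "WORK_DAYS")),
    logLevel: required(env, "LOG_LEVEL"),
  };
}
```

Failing fast at startup on a malformed `WORK_HOURS_START` is cheaper than discovering it when the first off-hours booking slips through.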
## 14. Out‑of‑Scope (explicit)
- Containers (Docker/k8s), CD/canary, multi‑region HA.
- Slack/Discord/SMS; inbox reads; rescheduling; ICS attachments.
## 15. Open Questions (MVP)
- Grammar‑only `time.resolve` vs. ML fallback?
- Holds (tentative events) vs. commit‑on‑consent?
- How to convey confidence to the voice agent for ambiguous times?