MCP OAuth Test

by emave

MCP + OAuth 2.1 (PKCE) multi-tenant mockup

A self-contained research mockup: one Node process is BOTH an OAuth 2.1 Authorization Server (PKCE/S256) AND an MCP Resource Server with per-tenant tool filtering and an append-only audit log.

Run

npm install
npm run dev        # server on http://localhost:3000

Drive it (CLI harness)

npm run harness -- --tenant tenant-a --tool echo --args '{"message":"hi"}'
npm run harness -- --tenant tenant-b --tool export_report --args '{"reportId":"r1"}'
npm run harness -- --tenant tenant-a --tool export_report --args '{"reportId":"r1"}'   # denied (cross-tenant)
npm run harness -- --tenant tenant-a --tool admin_purge --args '{"confirm":true}' --scope "tools:call admin"

Use with Claude Code

The server speaks the OAuth flow Claude Code's remote-MCP client expects (Streamable HTTP, RFC 9728/8414 discovery, PKCE S256, Dynamic Client Registration).

npm run dev            # http://localhost:3000
claude mcp add --transport http mockup http://localhost:3000/mcp

Then trigger the connection (e.g. list tools). Claude Code will:

  1. hit /mcp, get 401 + WWW-Authenticate pointing at the resource metadata;

  2. discover the authorization server and dynamically register itself (POST /register);

  3. open a browser to the login/consent page at /authorize.
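The first two steps come down to reading one header and building one well-known URL. The sketch below is illustrative only: the helper names are hypothetical and not taken from this repository, and it assumes the server follows the `resource_metadata` challenge parameter from RFC 9728 and the well-known path rule from RFC 8414.

```typescript
// Hypothetical helper: pull the resource-metadata URL out of a 401 challenge,
// e.g. `Bearer resource_metadata="http://localhost:3000/.well-known/..."`.
function resourceMetadataUrl(wwwAuthenticate: string): string | undefined {
  const match = /resource_metadata="([^"]+)"/i.exec(wwwAuthenticate);
  return match?.[1];
}

// Hypothetical helper: RFC 8414 places the well-known segment between the
// host and any issuer path, after dropping a trailing slash.
function authServerMetadataUrl(issuer: string): string {
  const u = new URL(issuer);
  const path = u.pathname.replace(/\/$/, "");
  return `${u.origin}/.well-known/oauth-authorization-server${path}`;
}
```

A client would fetch the first URL, read the authorization server's issuer from the returned document, and then fetch the second URL to find the registration, authorization, and token endpoints.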

Sign in with a demo user (password demo):

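| user  | tenant   | can call                                               |
|-------|----------|--------------------------------------------------------|
| alice | tenant-a | echo, search_documents, admin_purge (has admin)        |
| bob   | tenant-a | echo, search_documents (admin_purge denied: no admin)  |
| carol | tenant-b | echo, search_documents, export_report                  |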

After authorizing, Claude Code exchanges the code for a token and calls tools scoped to that user's tenant. The login is the authority that grants scope — there is no self-service admin.

Mock fidelity gaps (deliberate)

Plaintext passwords, no CSRF tokens, no sessions/cookies, open client registration, no refresh-token rotation, and a programmatic GET /authorize?tenant_id=…&sub=… shortcut used by the CLI harness/tests. These are research-mockup simplifications, not production patterns. (See spec §7 / §12.)

Test

npm test                       # unit + e2e
npx tsx tests/benchmark.ts 16 500   # audit-throughput benchmark (server must be running)

Flow

discovery/authorize (PKCE challenge, mock tenant login) → redirect with code → /token (PKCE verify, JWT minted with tenant_id) → /mcp initialize + tools/list (filtered per tenant) → tools/call (re-checked against tenant policy, audited before responding).
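The PKCE part of this flow can be sketched with Node's built-in crypto module. This is a minimal illustration of the S256 method from RFC 7636, not the mockup's actual code; the function names are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Client side: a high-entropy verifier (32 random bytes, base64url-encoded).
function makeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256: challenge = BASE64URL(SHA-256(ASCII(verifier))), without padding.
function s256Challenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Server side at /token: recompute the challenge from the presented verifier
// and compare it with the challenge stored at /authorize.
function verifyPkce(verifier: string, storedChallenge: string): boolean {
  return s256Challenge(verifier) === storedChallenge;
}
```

The client sends the challenge to /authorize and the verifier to /token, so an intercepted authorization code is useless without the verifier.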

Note: every denied tool call (cross-tenant or scope-gated) is recorded in the audit log as tool_denied; tools/list remains filtered per tenant.
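A rough sketch of the list-then-recheck pattern follows, using the tenants and tools from the demo-user table. Everything here is an assumption made for illustration: the data shapes, the function names, and the `tool_called` event are hypothetical (only `tool_denied` is named in this README), and the sketch filters tools/list by tenant only.

```typescript
type Claims = { sub: string; tenant_id: string; scope: string };
type AuditRow = { event: "tool_called" | "tool_denied"; tenant_id: string; sub: string; tool: string };

// Hypothetical policy tables mirroring the demo-user table.
const TENANT_TOOLS: Record<string, string[]> = {
  "tenant-a": ["echo", "search_documents", "admin_purge"],
  "tenant-b": ["echo", "search_documents", "export_report"],
};
const REQUIRED_SCOPE: Record<string, string> = { admin_purge: "admin" };

const auditLog: AuditRow[] = []; // stand-in for the append-only audit table

// tools/list: only the tools the caller's tenant owns.
function listTools(claims: Claims): string[] {
  return TENANT_TOOLS[claims.tenant_id] ?? [];
}

// tools/call: re-check tenant and scope, and audit before returning a result.
function authorizeCall(claims: Claims, tool: string): boolean {
  const inTenant = listTools(claims).includes(tool);
  const needed = REQUIRED_SCOPE[tool];
  const hasScope = needed === undefined || claims.scope.split(" ").includes(needed);
  const allowed = inTenant && hasScope;
  auditLog.push({
    event: allowed ? "tool_called" : "tool_denied",
    tenant_id: claims.tenant_id,
    sub: claims.sub,
    tool,
  });
  return allowed;
}
```

Re-checking at call time matters because a client can send tools/call for a tool name it never saw in tools/list.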

Bottlenecks & tradeoffs

See docs/superpowers/specs/2026-06-29-mcp-oauth-multitenant-design.md §11 for the full analysis. Headline findings:

  1. Audit log is the throughput ceiling — every tools/call writes one append-only row synchronously before responding. SQLite has a single writer, so concurrent calls serialize on the write lock.

  2. Stateless JWT vs revocation lag — local verification means no per-request DB hit, but tokens stay valid until expiry; you cannot have stateless, revocable, and cheap all at once.

  3. Authorization-code consume race — codes must be single-use; atomic consume adds another write-serialization point.

  4. Multi-tenant noisy-neighbor — all tenants share one process, CPU, and audit-write lock; a single tenant hammering tools degrades everyone.

  5. Self-contained fidelity gap — because the RS verifies tokens it minted itself, there is no real network/trust boundary; JWKS rotation, clock skew, discovery-cache staleness, and AS-downtime behavior are not observable.

  6. Streamable HTTP session state — stateless mode scales horizontally but loses server→client push; sessionful mode enables push but grows memory and requires durable binding.
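Finding 3 (single-use authorization codes) can be illustrated with an in-memory store. This is a sketch under stated assumptions rather than the mockup's storage code: the `AuthCode` shape and `consumeCode` are hypothetical, and a SQLite-backed version would need the same effect in one statement (for example a conditional delete or update), which is where the extra write serialization comes from.

```typescript
type AuthCode = { clientId: string; codeChallenge: string; expiresAt: number };

// Hypothetical in-memory code store, keyed by the authorization code string.
const codes = new Map<string, AuthCode>();

// Look up and remove the code in one synchronous step. No await sits between
// the read and the delete, so two concurrent /token requests on the same
// event loop cannot both redeem the same code.
function consumeCode(code: string, now: number = Date.now()): AuthCode | undefined {
  const entry = codes.get(code);
  if (entry === undefined) return undefined;
  codes.delete(code); // burn the code even if it turns out to be expired
  return entry.expiresAt > now ? entry : undefined;
}
```

The second redemption attempt for the same code gets `undefined`, which the token endpoint would map to an `invalid_grant` error.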

Measured throughput (audit-write ceiling)

concurrency 1:  584 calls/s
concurrency 16: 1360 calls/s

Interpretation: only ~2.3x throughput at 16x parallelism — sub-linear scaling.
The synchronous, single-writer SQLite audit append serializes tool calls;
that append (written before each tool result returns) is the throughput ceiling.
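The arithmetic behind that interpretation, worked through as a small script. The Amdahl's-law step is an added modelling assumption and not part of the original analysis: it treats the workload as a fixed serial fraction plus a perfectly parallel remainder, which is only a rough fit for lock contention.

```typescript
// Measured figures from the benchmark output above.
const callsAt1 = 584;
const callsAt16 = 1360;
const workers = 16;

// Speedup relative to the single-worker run.
function speedup(base: number, scaled: number): number {
  return scaled / base;
}

// Amdahl's law solved for the serial fraction s:
//   1/S = s + (1 - s)/N   =>   s = (1/S - 1/N) / (1 - 1/N)
function amdahlSerialFraction(s: number, n: number): number {
  return (1 / s - 1 / n) / (1 - 1 / n);
}

const observed = speedup(callsAt1, callsAt16);           // about 2.33
const efficiency = observed / workers;                   // about 0.15
const serial = amdahlSerialFraction(observed, workers);  // about 0.39

console.log({ observed, efficiency, serial });
```

Under that model, roughly 39% of each call behaves as serialized work, which is consistent with the synchronous audit append dominating the request path.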

The detailed analysis, including secondary tradeoffs (#7–10) and out-of-scope items, is in §11–12 of the spec.

Mock fidelity gaps (cross-ref spec §12)

These are deliberate simplifications — the enforcement paths are real, the authority backing them is not:

  • Scopes are self-service: /authorize persists whatever scope the client requests (mock login, no consent authority), so the scope×tenant gating demonstrates the enforcement path, not authoritative scope granting.

  • Refresh-token grant is not client-bound: no client_id check on refresh, and the refresh flow is not integration-tested.

  • Localhost HTTP only: no TLS; token confidentiality relies on loopback isolation alone.
