
Your AI Chatbot Just Exposed Your CEO's Salary to an Intern


Agent Identity
MCP Security
OAuth Delegation
AI Access Control

  1. The Setup That Feels Safe
  2. How Most Teams Actually Wire This
  3. The Fix That Feels Right (And Still Isn't)
  4. The Real Question
  5. One Token, Three Questions
  6. What I'd Actually Check in My Own Stack
  7. My Take
  8. Acknowledgements

                  Chatbots answer questions. Agents act on them. That shift sounds small until you trace what it means for access control, which is exactly what Sahan Dilshan and Hasini Samarathunga, both engineers at WSO2, did at MCP Dev Summit Mumbai 2026. Their talk walks through one mundane internal tool, an HR chat assistant, and shows, step by step, how it leaks a CEO's salary to an intern. Not through a jailbreak. Through wiring that looked correct at every step.

                  The Setup That Feels Safe

                  Picture an HR assistant behind SSO, internal only, talking to an MCP server that holds salaries and performance reviews. The CEO has full access to her own context. The intern has access only to the records they are authorized to view. It feels safe because every individual piece of it is doing its job correctly.

                  Then the intern asks the bot what the CEO makes. And the bot answers.

                  Same bot, same kind of question, two people who are supposed to get two different answers, and it doesn't know the difference. No prompt injection needed. No clever wording. Just a straightforward question to a system that was never actually checking who was asking.

                  How Most Teams Actually Wire This

                  Here's the wiring that produces that failure, and it's the wiring almost every team building agent-to-MCP integrations ships by default. The agent authenticates to the MCP server with its own credentials: client_credentials, an API key, a personal access token. The server validates that token, sees a trusted client, and executes. Every box in that chain does exactly what it's supposed to do. The user exists only inside the prompt text. Never in the security context the server actually checks.

                  Walk it step by step and the failure is almost boring:

                  1. The request. The intern asks: "What's the CEO's salary?"

                  2. Tool selection. The agent picks get_salary() and calls the MCP server using its own token.

                  3. Execution. The server validates the trusted client and runs the query against HR as a service account with full access.

                  4. The leak. The HR service returns the data, because as far as it's concerned, a trusted, fully-privileged caller asked for it.

                  The leak isn't a bug in any one component. It's what every component does correctly, in sequence, with no step anywhere that asks who's actually behind the request.
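
The four steps above can be sketched in a few lines. This is a minimal illustration, not the speakers' demo code: the in-memory salary table, the token string, and the function names `mcp_get_salary` and `agent_handle` are all assumptions made up for the example.

```python
# Hypothetical in-memory HR data; names and figures are invented for illustration.
SALARIES = {"ceo": 900_000, "intern": 30_000}

# Hypothetical table of client credentials the server trusts.
TRUSTED_CLIENT_TOKENS = {"agent-secret-token": "hr-agent"}


def mcp_get_salary(bearer_token: str, employee: str) -> int:
    """Server-side tool handler that only authenticates the calling client."""
    client = TRUSTED_CLIENT_TOKENS.get(bearer_token)
    if client is None:
        raise PermissionError("unknown client")
    # The token is valid, so the query runs with the service account's
    # full access. Nothing here knows which human is asking.
    return SALARIES[employee]


def agent_handle(user: str, prompt: str) -> int:
    """Agent side: the user exists only as prompt text, never as a credential."""
    # 'user' is never sent to the server; only the agent's own token is.
    if "ceo" in prompt.lower():
        return mcp_get_salary("agent-secret-token", "ceo")
    return mcp_get_salary("agent-secret-token", user)


# The intern asks, and the server has no way to refuse.
leaked = agent_handle("intern", "What's the CEO's salary?")
```

Every function here behaves as designed. The gap is that `mcp_get_salary` has no parameter through which the intern's identity could even arrive.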

                  The Fix That Feels Right (And Still Isn't)

                  The obvious next move: forward the user's token instead of the agent's. Now the user finally shows up in the request. The server sees Alice, scoped to exactly what she's allowed to see. Feels right.

                  Except now the agent disappears entirely:

                  1. Alice logs in through the agent.

                  2. The agent receives Alice's token, sub: alice.

                  3. The agent forwards that token straight to the MCP server.

                  4. The server validates it, confirms it's Alice, and scopes the response to her access.

                  5. But the agent can replay that exact same token on its own, with no human anywhere near the keyboard.

                  6. Either way, the token still just says sub: alice.

                  The audit log reads sub=alice → get_salary(emp=self) → 200 OK, and there's no way to tell from that line whether Alice asked, or her agent decided to ask again on her behalf while she was getting coffee.
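
A small sketch of why that log line is ambiguous, assuming a hypothetical `audit_line` helper that formats whatever claims the forwarded token carries: a human-initiated call and an agent replay produce byte-identical entries.

```python
def audit_line(claims: dict, tool: str, status: str) -> str:
    """Format an audit entry from the only identity the token carries."""
    return f"sub={claims['sub']} → {tool} → {status}"


# Alice's forwarded token (claims only; signature handling is omitted here).
alice_token = {"sub": "alice", "scope": "hr:read_self"}

# Call 1: Alice typed the question herself.
human_request = audit_line(alice_token, "get_salary(emp=self)", "200 OK")

# Call 2: the agent replayed the same token with nobody at the keyboard.
agent_replay = audit_line(alice_token, "get_salary(emp=self)", "200 OK")

# Nothing in the token separates the two cases, so the log cannot either.
indistinguishable = human_request == agent_replay
```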

                  Dilshan and Samarathunga's talk names three tempting non-fixes that all fail for related reasons:

                  • Trust the agent's own credentials. The classic confused deputy: you've authorized the agent, not the user, so every request runs at full privilege regardless of who triggered it.

                  • Forward the user's identity. Scopes the response correctly but erases the agent from the log, so you can no longer distinguish a real user from a replaying agent.

                  • Guardrails in the system prompt. Not a security boundary at all. A prompt is a suggestion an attacker can walk straight through with the right injection.

                  The Real Question

                  Their reframe cuts through all three: whose identity actually gets checked? Today, in most of these stacks, it's the agent's identity that gets checked, and it carries broad scope, so every user who talks to that agent inherits something close to god-mode. The identity that gets ignored is the one that should matter most: who actually asked, and what they're personally entitled to see.

                  The fix isn't picking one of the two identities. It's carrying both, at the same time, in a way the server can actually distinguish.

                  One Token, Three Questions

                  This is where the talk gets specific instead of just diagnostic. Every access decision needs to answer three questions: who made the decision, who authorized it, and whose scope applies.

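                  | Token type      | Who decided?                            | Who authorized? | Whose scope?                      |
                  |-----------------|-----------------------------------------|-----------------|-----------------------------------|
                  | Agent token only | Agent                                  | Agent           | Agent (god-mode, one broad scope) |
                  | User token only | Invisible (was it Alice, or the agent?) | User            | User                              |
                  | Delegated (OBO) | Agent                                   | User            | User                              |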

                  A delegated, on-behalf-of (OBO) token answers all three correctly at once. The token carries sub: alice to name the user the call is made on behalf of, scope: hr:read_self to limit it to what Alice personally is entitled to, and a nested act: { sub: hr-agent } claim to name the agent that's actually making the call. The resource server reads that and sees, precisely, "Alice, via the HR agent." Not Alice alone. Not the agent alone. Both, named, in one token.

                  {
                    "sub": "alice",
                    "scope": "hr:read_self",
                    "act": {
                      "sub": "hr-agent"
                    },
                    "aud": "hr-mcp-server"
                  }
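
One standard way for an agent to obtain a token shaped like this is OAuth 2.0 Token Exchange (RFC 8693), which defines the `act` claim along with the `subject_token` and `actor_token` request parameters. Whether the talk's demo uses this exact grant is an assumption on my part. The sketch below only builds the form-encoded request body; the token values are placeholders, and sending it to a real authorization server is left out.

```python
from urllib.parse import parse_qs, urlencode


def build_token_exchange_form(user_token: str, agent_token: str) -> str:
    """Build an RFC 8693 token-exchange request body (form-encoded)."""
    params = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        # The user the agent wants to act on behalf of.
        "subject_token": user_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        # The agent's own credential, identifying the acting party.
        "actor_token": agent_token,
        "actor_token_type": "urn:ietf:params:oauth:token-type:access_token",
        # Target service and scope, matching the example token above.
        "audience": "hr-mcp-server",
        "scope": "hr:read_self",
    }
    return urlencode(params)


# Placeholder token strings; real ones would come from the login flow.
form = build_token_exchange_form("alice-access-token", "hr-agent-access-token")
decoded = parse_qs(form)
```

The design point is that both credentials go in together: the authorization server, not the agent, decides whether this agent may act for this user and mints the combined token.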

                  The payoff shows up immediately in the logs:

                  • Service-account audit line: read salary, no chain, no subject, no meaning. Functionally useless after the fact.

                  • OBO audit line: hr-agent on-behalf-of alice (intern) → get_salary(emp=CEO) → DENIED by policy. It names who asked, who acted, what they tried, and what happened.

                  The pattern chains, too. When one agent calls another, the act claim nests: agent A acting for the user, agent B acting for agent A, a backend service acting for agent B, each hop adding another act layer rather than replacing the one before it. The full chain travels in the token, so a downstream audit log names every actor that touched the request, not just whoever happened to make the last call.
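
A minimal sketch of reading that chain back out of a token. It assumes the RFC 8693 convention that the outermost `act` is the current actor and each nested `act` is an earlier one; the actor names `agent-a`, `agent-b`, and `backend-service` are hypothetical.

```python
def actor_chain(claims: dict) -> list[str]:
    """Return actors from most recent to earliest by walking nested act claims."""
    chain = []
    act = claims.get("act")
    while isinstance(act, dict):
        chain.append(act["sub"])
        act = act.get("act")
    return chain


# Three hops: agent A for alice, agent B for agent A, backend for agent B.
token = {
    "sub": "alice",
    "scope": "hr:read_self",
    "act": {
        "sub": "backend-service",
        "act": {
            "sub": "agent-b",
            "act": {"sub": "agent-a"},
        },
    },
}

chain = actor_chain(token)

# One audit string naming every actor, ending with the user.
audit = " on-behalf-of ".join(chain + [token["sub"]])
```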

                  What I'd Actually Check in My Own Stack

                  Their five-point closer is short on purpose, and it's worth taking literally rather than nodding past:

                  1. Agents act, they don't just answer, which means the access model that worked for a Q&A chatbot doesn't automatically carry over.

                  2. The agent's identity is not the user's identity, full stop. Conflating the two is exactly the mistake the HR demo walks through.

                  3. Delegate with on-behalf-of tokens rather than choosing between agent-only or user-only.

                  4. Auditability has to be designed in, not bolted on after an incident makes it urgent.

                  5. IAM doesn't stop prompt injection. It bounds the blast radius when injection, or any other failure mode, inevitably happens anyway.

                  Their live demo, an actual HR assistant repo with a runtime toggle for turning enforcement on and off, makes this concrete rather than theoretical. With enforcement on, the intern's question about the CEO's salary gets denied at the resource server, even after a human nominally "approves" the escalation, because the server checks scope, sub-ownership, and audience independently of whatever the approval flow said. Flip enforcement off and repeat the same question as the same intern, and the leak happens immediately. Same agent, same code, same question. The only thing that changed is whether the resource server actually enforced who was asking.
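
The toggle behavior described above can be sketched as follows. This is not code from the demo repo: the function name `authorize_get_salary`, its parameters, and the return strings are assumptions, and token signature validation is omitted. The point it illustrates is that the approval flag is accepted and then deliberately ignored.

```python
def authorize_get_salary(
    claims: dict,
    employee: str,
    *,
    enforce: bool = True,
    human_approved: bool = False,
) -> str:
    """Decide a get_salary call from token claims alone."""
    if not enforce:
        # Enforcement off: any validated caller gets through (the leak).
        return "200 OK"
    # human_approved is intentionally unused: approval is not authorization.
    if claims.get("aud") != "hr-mcp-server":
        return "DENIED: wrong audience"
    scopes = claims.get("scope", "").split()
    # Sub-ownership: hr:read_self only covers the subject's own record.
    if "hr:read_self" in scopes and employee == claims.get("sub"):
        return "200 OK"
    return "DENIED by policy"


intern_token = {
    "sub": "intern",
    "scope": "hr:read_self",
    "act": {"sub": "hr-agent"},
    "aud": "hr-mcp-server",
}

denied = authorize_get_salary(intern_token, "ceo", human_approved=True)
leaked = authorize_get_salary(intern_token, "ceo", enforce=False)
own_record = authorize_get_salary(intern_token, "intern")
```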

                  That's the detail that stuck with me most. Human approval is not authorization. The resource server's own check is.

                  My Take

                  What I appreciated about this talk is how unglamorous the failure mode is. There's no exotic jailbreak, no adversarial suffix, no model behaving strangely. Two completely reasonable-looking architectures, agent-token-only and user-token-forwarded, both fail in ways that are easy to miss in a design review because each one looks like it solved the problem it was aimed at. The agent-token version looks secure because the server validates something. The user-token version looks secure because the right person shows up in the log. Neither actually answers all three questions a real access decision needs answered.

                  Real incidents back this up at a scale that's hard to wave off as theoretical. Meta's AI support assistant was tricked into reassigning account recovery emails earlier this year, leading to a wave of Instagram account takeovers that the company is still cleaning up. Around the same time, a security firm's autonomous agent found its way into McKinsey's internal AI platform Lilli through unauthenticated API endpoints and a SQL injection flaw, reaching tens of millions of internal messages in about two hours. Neither incident needed a sophisticated attacker. Both needed a system that trusted whichever caller showed up with a valid-looking token, exactly the gap this talk is pointing at.

                  Acknowledgements

                  This article draws on a talk delivered by Sahan Dilshan, Associate Technical Lead, and Hasini Samarathunga, Senior Software Engineer, both at WSO2, at MCP Dev Summit Mumbai 2026. Their live demo, an HR assistant built to show exactly this enforcement gap in action, is open source at github.com/sahandilshan/hr-assistant-sample. Thanks to the Agentic AI Foundation and the Linux Foundation for organizing an event where this kind of grounded, demo-backed security talk gets stage time.

                  Written by Om-Shree-0709 (@Om-Shree-0709)