We don't need to hack your AI Agent to hack your AI Agent
Nina Piontek
Security Expert
Matthias Marx @matthias-marx
Pentesting Team Lead

…and we don't need an AI agent for that either :)

Research by: Rene Rehme, Nina Piontek, Matthias Marx


Enterprise AI assistants are being deployed at remarkable speed. The pressure to ship something usable — a chat interface backed by a large language model, connected to internal knowledge bases and identity systems — has companies standing up new services in days or weeks. This pace creates an underappreciated attack surface: not the AI itself, but the surrounding infrastructure. The most severe vulnerabilities in AI assistant deployments often have nothing to do with prompt injection, adversarial inputs, or model manipulation.

The Setup

Prompted by the recent media attention around McKinsey's AI agent being hacked in only two hours, we conducted a routine review of a publicly accessible AI assistant operated by a large enterprise organisation. In a JavaScript asset loaded by the application's frontend, we identified an embedded backend API URL. This is a common and often unremarkable finding — backends have URLs, and it's not always avoidable for client-side code to know where to send requests. In this case, however, what sat behind that URL turned out to be the keys to the kingdom.

The backend was built on Django, the popular Python web framework. More specifically, it was running with Django's debug mode enabled — in production, exposed to the public internet.

I. Debug Mode as an Information Firehose

Django's debug mode is an extremely powerful development aid. When enabled, any unhandled server error causes Django to return a richly formatted HTML error page, complete with a full stack trace, the values of local variables at every frame in the call stack, and — most critically — a dump of all environment variables available to the process.

This is invaluable when you're sitting at your laptop trying to understand why a migration failed. It is catastrophic when the server is internet-facing.
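The fix is as simple as the failure: Django's own deployment checklist recommends driving `DEBUG` from the environment so that production defaults to off. A minimal `settings.py` sketch — the variable names `DJANGO_DEBUG` and `DJANGO_ALLOWED_HOSTS` are our own convention, not anything we observed in this deployment:

```python
import os

def env_flag(name: str, default: str = "") -> bool:
    """Parse a boolean feature flag from the environment."""
    return os.environ.get(name, default).lower() in ("1", "true", "yes")

# Default to production-safe behaviour: DEBUG stays off unless a
# developer explicitly opts in for a local environment.
DEBUG = env_flag("DJANGO_DEBUG")

# With DEBUG off, Django serves a terse error page and rejects Host
# headers not listed here, instead of rendering the debug firehose.
ALLOWED_HOSTS = [h for h in os.environ.get("DJANGO_ALLOWED_HOSTS", "").split(",") if h]
```

An unset or misspelled value fails safe to `DEBUG = False`, which is the property that was missing here.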

Sending a malformed GET request to a single API endpoint was enough to trigger a server error and receive this debug page. No authentication was required. No special tooling was needed. The contents were wonderful:

  • Full API route listing — every endpoint the backend exposed, including undocumented internal routes
  • Application credentials — specifically, ADMIN_USERNAME and ADMIN_USER_PWD, the credentials for the Django administrative superuser account, stored directly as environment variables. This was of course helpful, even though at this point we realized that simply consulting Wikipedia's list of the most common passwords would have done the trick as well.
  • System prompt contents — the full initial instructions used to configure the AI model's behaviour, including internal guidelines and operational constraints

The system prompt exposure alone is significant. LLM system prompts often encode security-relevant assumptions ("always refuse to discuss X", "never output raw database entries"), internal business logic, and the architectural shape of the application. This information is directly useful to anyone attempting prompt injection or model manipulation — even if such an attack had otherwise been the harder path.

II. Credentials to Administrative Control

With the exposed admin credentials in hand, we authenticated to the Django admin interface. This is the standard, built-in Django admin panel — a fully-featured management UI that, in this deployment, had not been restricted, rate-limited, or placed behind an additional authentication factor.
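One common mitigation — hypothetical here, since this deployment had none — is to gate the admin path on a network allow-list before any admin view runs, typically as middleware. The core check, with framework plumbing omitted and illustrative addresses standing in for real VPN ranges:

```python
ADMIN_PATH_PREFIX = "/admin/"

# Illustrative allow-list; in a real deployment this would come from
# configuration, e.g. the corporate VPN egress addresses.
ADMIN_ALLOWED_IPS = {"10.0.0.5", "10.0.0.6"}

def admin_request_allowed(path: str, client_ip: str) -> bool:
    """Permit admin-panel requests only from the internal allow-list.

    Non-admin routes are unaffected; rejected admin requests should be
    answered with a 404 rather than a 403, to avoid confirming that an
    admin panel exists at that path.
    """
    if not path.startswith(ADMIN_PATH_PREFIX):
        return True
    return client_ip in ADMIN_ALLOWED_IPS
```

Combined with multi-factor authentication and rate limiting on the login form, this removes the admin panel from the internet-facing attack surface even when credentials leak.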

The breadth of access was comprehensive. We could:

  • Enumerate, create, modify and delete all user accounts on the platform
  • Read every conversation — all chat messages between users and the AI assistant, including any sensitive information users had shared with the system
  • Access all user-uploaded files associated with the platform
  • Modify AI agent configuration — the personas, model parameters, and behavioural settings driving the assistant
  • Inspect and alter authentication settings — including OAuth associations, permission groups, and session data
  • Read full access and usage logs — including shared URL access patterns

From a threat model perspective, this represents complete compromise of the application layer. Every user of the platform, every piece of data they had shared with the AI, and the integrity of the AI's behaviour were all within reach.

III. Lateral Movement via Exposed Identity Tokens

The impact did not stop at the application boundary. Within the admin interface, we identified OAuth tokens belonging to authenticated users — tokens issued by the organisation's enterprise identity provider, Microsoft Entra ID.

These tokens were stored and displayed in plaintext within the administrative interface, apparently as part of the platform's social authentication integration. The tokens were live and valid.

Decoding one such token revealed that its granted scopes included User.Read, User.Read.All, and User.ReadBasic.All. The User.Read.All and User.ReadBasic.All scopes permit querying profile information for all users in the tenant via the Microsoft Graph API. We confirmed this by issuing a sample API call, which returned complete user profile data.
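Inspecting a token's scopes requires no secret: a JWT's payload is simply base64url-encoded JSON sitting between two dots. A small sketch of the unverified decode we describe above — the `scp` claim name follows the format of Entra ID access tokens:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the payload (claims) segment of a JWT without verification.

    A JWT is three base64url segments: header.payload.signature. We only
    inspect claims here; the signature is not checked.
    """
    payload_b64 = token.split(".")[1]
    # Restore the '=' padding that base64url encoding strips.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

With a `User.Read.All`-scoped token in hand, a single `GET https://graph.microsoft.com/v1.0/users` request carrying it as an `Authorization: Bearer` header is enough to start enumerating tenant profiles.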

The affected tenant contained millions of employee accounts — a number that convinced us to start this write-up and submit our findings swiftly, to prevent any illicit abuse.

In effect, a misconfigured Django debug flag had created a direct path from the online AI agent to the personal and professional profile data of the organisation's entire current and prior workforce — and we didn't even have to do any fancy AI hacking.

The Vulnerability Chain

It is worth tracing the full path, because each step required essentially no skill:

Public JS asset
    → Discover backend URL
        → Unauthenticated GET request triggers debug error page
            → Environment variables expose admin credentials
                → Access admin panel
                    → See live OAuth tokens
                        → Query Microsoft Graph
                            → Access millions of user profiles

The chain is entirely composed of well-known, documented issues. None of them are novel. None of them require bypassing AI-specific protections. The poor agent itself was certainly not at fault and, for all we know, may never even have witnessed what was happening.

Why This Happens in AI Deployments

Hasty AI deployments amplify a familiar pattern: speed pressure from management keeps the focus on the AI model's capabilities, leaving the surrounding infrastructure as an afterthought — and security thinking concentrates where the attention is, rather than where the exposure is.

AI applications also accumulate powerful credentials by design: document access, calendar access, identity tokens. These are legitimate requirements, but they make the credential surface far larger than a conventional web application. A single environment variable dump can expose access to systems well beyond the application itself.
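This is why a single environment dump is so damaging: the process environment mixes application secrets with credentials for neighbouring systems, and all of them leak together. As a rough illustration, even a naive name-based scan — the hint list below is our own, and `ADMIN_USER_PWD` is the variable actually observed here — shows how much such a dump hands over at once:

```python
import os

# Substrings that commonly mark credential-bearing variable names.
SECRET_HINTS = ("PWD", "PASSWORD", "TOKEN", "KEY", "SECRET")

def suspicious_env_vars(environ=os.environ):
    """Environment variable names that look like credentials — exactly
    what a debug-page environment dump serves to an attacker."""
    return sorted(
        name for name in environ
        if any(hint in name.upper() for hint in SECRET_HINTS)
    )
```

A secrets manager that injects credentials at the point of use, rather than parking them in the process environment, keeps a dump like this from doubling as a key ring.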

Responsible Disclosure

We disclosed these findings to the affected organisation promptly. The issues described in this post have since been remediated.

Are you deploying an AI assistant? We're happy to help. We'll start with the boring stuff — infra config, credentials, environment hygiene — and once that's sorted, move on to the fancy part: prompt injection, adversarial inputs, and making your model say things it really shouldn't.


We are a security research and consulting team specialising in vulnerability research, offensive assessments, and emerging technologies. If our research publications or capabilities interest you, reach out or join our team.

Security Research Labs is a member of the Allurity family.