An autonomous AI agent, with no credentials, no insider knowledge, and no human guidance after launch, broke into McKinsey’s internal AI platform and gained full read-write access to the production database in two hours. The breach exposed 46.5 million chat messages covering strategy, M&A, and client engagements, 728,000 confidential files, 57,000 user accounts, and 95 system prompts that controlled how the chatbot responded to McKinsey’s 40,000+ consultants. The security firm CodeWall disclosed the findings on March 9, 2026, and the attack method was not some novel zero-day exploit. It was SQL injection, a vulnerability class that has existed since the 1990s.
The twist: the vulnerability sat in a spot that OWASP ZAP, one of the most widely used security scanners, did not flag. Standard tooling missed it. An AI agent found it in minutes.
The Attack: SQL Injection Through JSON Field Names
CodeWall’s AI agent started the same way any external attacker would: by probing publicly accessible surfaces. It discovered Lilli’s API documentation, which was exposed to the internet. Within that documentation, the agent identified 22 API endpoints that required no authentication at all. No API key, no OAuth token, no session cookie. Open endpoints on a production system holding tens of millions of confidential records.
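A first pass at auditing this failure mode can be automated. The sketch below classifies how each endpoint responds to a request carrying no credentials; the endpoint paths and status codes are invented for illustration (a real audit would replay actual traffic with the auth headers stripped), but any 2xx answer on a non-public route is exactly the condition the agent exploited:

```python
def classify_unauthenticated(status_code: int) -> str:
    """Classify the response to a request sent with NO credentials.
    A 2xx on a non-public endpoint is the red flag from this incident."""
    if 200 <= status_code < 300:
        return "OPEN"        # served data without auth: investigate
    if status_code in (401, 403):
        return "PROTECTED"   # the auth layer is doing its job
    return "OTHER"           # 404, 405, 5xx: needs manual review

# Hypothetical inventory: endpoint -> status of a credential-free request
observed = {
    "/api/v1/search/users": 200,
    "/api/v1/documents": 401,
    "/api/v1/feedback": 403,
    "/api/v1/prompts": 200,
}
open_endpoints = [path for path, status in observed.items()
                  if classify_unauthenticated(status) == "OPEN"]
print(open_endpoints)  # ['/api/v1/search/users', '/api/v1/prompts']
```

Running a check like this against every documented endpoint, on every deploy, is cheap; the expensive part is maintaining an honest inventory of which endpoints are supposed to be public.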
One of these unauthenticated endpoints handled user search queries. The agent began testing it and noticed something unusual: the JSON field names in API requests were being reflected verbatim in database error messages. Reflecting raw input in error messages is a mistake developers are taught to avoid in their first semester of web security coursework.
Here is why this matters technically. When you send a JSON payload like {"username": "john"} to an API, the value (john) is typically parameterized in the SQL query. McKinsey’s developers did parameterize the values correctly. But the keys, the field names themselves (username), were being concatenated directly into the SQL string. The Decoder describes this as “a decades-old technique” applied in a context that modern scanners do not check.
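A minimal reconstruction of the bug class makes the asymmetry concrete. This uses SQLite and an invented schema (McKinsey's actual code and tables are not public); note that the values go through placeholders while the keys are pasted into the query string:

```python
import sqlite3

def search_users_vulnerable(conn, filters):
    """Build a WHERE clause from a JSON-style dict of filters.
    Values are parameterized (safe); keys are concatenated straight
    into the SQL string -- the flaw class reported in this breach."""
    clauses = " AND ".join(f"{key} = ?" for key in filters)  # keys: UNSAFE
    sql = f"SELECT id, username FROM users WHERE {clauses}"
    return conn.execute(sql, list(filters.values())).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "john"), (2, "alice")])

# Benign request works as intended
print(search_users_vulnerable(conn, {"username": "john"}))  # [(1, 'john')]

# Hostile *value*: parameterization neutralizes it, no rows match
print(search_users_vulnerable(conn, {"username": "' OR 1=1 --"}))  # []

# Hostile *key*: lands inside the query text and bypasses the filter
rows = search_users_vulnerable(conn, {"1=1 OR username": "nobody"})
print(rows)  # [(1, 'john'), (2, 'alice')] -- every row comes back
```

The fix is equally simple to state: field names must be validated against an allow-list of known columns before they ever touch the SQL string, because placeholders cannot protect identifiers.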
The agent ran 15 blind iterations, each time crafting a slightly different JSON key to extract more information from the error messages. By the fifteenth attempt, production data was flowing back. Full read-write access to the entire database.
What Standard Scanners Missed
This is the detail that should alarm every security team running DAST tools in their CI/CD pipeline. OWASP ZAP, which is the de facto open-source web application security scanner, tested Lilli and found nothing. The reason: ZAP injects test payloads into parameter values, not parameter names. It tested {"username": "' OR 1=1--"} but never {"' OR 1=1--": "value"}. The attack surface was in a dimension that the scanner was not programmed to check.
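The positional blind spot is easy to express as code. The sketch below models a value-position fuzzer in a deliberately simplified way (real ZAP behavior is far more sophisticated, but the injection *positions* it targets are the point): the two mutation strategies produce disjoint request shapes, and only the second ever sends a hostile key:

```python
import json

SQLI_PROBES = ["' OR 1=1--", "\" OR \"1\"=\"1", "1; DROP TABLE users--"]

def fuzz_values(payload: dict):
    """Mimics a value-position fuzzer: mutate each value, never a key."""
    for key in payload:
        for probe in SQLI_PROBES:
            yield {**payload, key: probe}

def fuzz_keys(payload: dict):
    """What the agent effectively did: mutate the key itself."""
    for key, value in payload.items():
        for probe in SQLI_PROBES:
            mutated = dict(payload)
            del mutated[key]
            mutated[probe] = value
            yield mutated

base = {"username": "john"}
value_cases = list(fuzz_values(base))
key_cases = list(fuzz_keys(base))

# A value-position fuzzer never emits a request with a hostile key
assert all("username" in case for case in value_cases)
print(json.dumps(value_cases[0]))  # {"username": "' OR 1=1--"}
print(json.dumps(key_cases[0]))    # {"' OR 1=1--": "john"}
```

Every request in the first set looks structurally normal to the vulnerable backend; every request in the second set puts attacker-controlled text where the query builder expects a column name.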
CodeWall’s AI agent did not have this blind spot. It observed error messages, hypothesized about the query structure, and adapted its payloads over 15 iterations. This is exactly the kind of reasoning that makes AI agents effective at offensive security: they do not follow a fixed playbook. They learn from responses and adjust in real time.
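CodeWall has not published the agent's actual payloads, but the iterative pattern described is classic blind injection: propose a hypothesis, observe a binary signal (such as which of two error messages comes back), and extend. A toy oracle makes the loop concrete; the hidden value and alphabet here are invented for illustration:

```python
import string

SECRET = "prompt_v95"  # stands in for data the attacker cannot read directly

def oracle(guess_prefix: str) -> bool:
    """Simulated blind-injection oracle: the target reveals only whether
    a condition holds (e.g. via differing errors), never the data itself."""
    return SECRET.startswith(guess_prefix)

def extract_blind(oracle, alphabet=string.ascii_lowercase + string.digits + "_"):
    """Recover a hidden value one character at a time, the way a blind
    SQL injection loop does: propose, observe true/false, extend."""
    recovered = ""
    while True:
        for ch in alphabet:
            if oracle(recovered + ch):
                recovered += ch
                break
        else:
            return recovered  # no character extends the prefix: done

print(extract_blind(oracle))  # prints the hidden value: prompt_v95
```

A scanner with a fixed payload list cannot run this loop, because each probe depends on what the previous response revealed; an agent that reasons over responses can.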
The Scale of Exposure: 46.5 Million Messages in Plaintext
The numbers are staggering even by enterprise breach standards. According to multiple reports, CodeWall’s agent accessed:
- 46.5 million chat messages covering strategy discussions, M&A analysis, and client engagements, all stored in plaintext
- 728,000 files containing confidential client data
- 57,000 user accounts with associated metadata
- 95 system prompts that defined Lilli’s behavior, guardrails, citation methods, and response formatting
McKinsey’s Lilli is not a side project. It is the firm’s flagship AI platform, described internally as a knowledge management tool that helps consultants access McKinsey’s proprietary research and frameworks. Tens of thousands of consultants across the firm use it daily. Every question they asked Lilli, every strategic scenario they explored, every client name they mentioned, was sitting in that database.
The Writable Prompts Problem
The most dangerous finding was not the data exposure. It was the fact that Lilli’s 95 system prompts were writable through the same SQL injection vulnerability. These prompts control everything about how Lilli responds: what guardrails it applies, how it cites sources, what information it surfaces, what it refuses to answer.
Because the SQL injection gave read-write access, an attacker could have modified these prompts with a single HTTP request. No deployment pipeline, no code review, no infrastructure access needed. Just one crafted API call to rewrite the instructions that governed how Lilli answered every query from every consultant.
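To see how one request could rewrite every prompt, consider a hypothetical update handler with the same flaw (SQLite, with invented handler, table, and payload; the real schema is not public). A single hostile JSON key is enough to turn a scoped update into a table-wide overwrite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prompts (id INTEGER, body TEXT)")
conn.executemany("INSERT INTO prompts VALUES (?, ?)",
                 [(1, "cite sources"), (2, "refuse confidential data")])

def update_prompt_vulnerable(conn, prompt_id, fields):
    """Hypothetical update handler with the reported flaw: values are
    parameterized, but JSON keys are concatenated into the SQL."""
    sets = ", ".join(f"{col} = ?" for col in fields)  # keys: UNSAFE
    sql = f"UPDATE prompts SET {sets} WHERE id = ?"
    conn.execute(sql, [*fields.values(), prompt_id])

# Legitimate use: updates exactly one prompt
update_prompt_vulnerable(conn, 1, {"body": "cite sources v2"})

# One request with a hostile key: the injected fragment consumes the
# placeholders, comments out the real WHERE clause, and hits every row
update_prompt_vulnerable(conn, 999,
    {"body = ? WHERE id = ? OR 1=1 --": "POISONED INSTRUCTIONS"})

print(conn.execute("SELECT body FROM prompts").fetchall())
# [('POISONED INSTRUCTIONS',), ('POISONED INSTRUCTIONS',)]
```

From the application's point of view this is one ordinary-looking API call, which is why writable prompts behind an injectable query layer are so dangerous: there is no deploy, no diff, and nothing in the release pipeline to catch.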
The implications of writable system prompts at this scale are severe. An attacker could have:
- Poisoned strategic recommendations by subtly biasing Lilli’s outputs on M&A targets, market sizing, or competitive analysis
- Disabled safety guardrails so Lilli would freely share confidential information it was supposed to restrict
- Modified citation behavior so Lilli would attribute fabricated data to real McKinsey research
- Installed persistent backdoors in the prompt logic that would survive application restarts
This is not theoretical. The technical capability was demonstrated. The only reason it did not happen is that CodeWall was conducting an authorized red team engagement, not an actual attack.
Why AI Agents Find What Scanners Miss
The McKinsey breach illustrates a structural shift in how vulnerabilities get discovered. Traditional DAST (Dynamic Application Security Testing) tools like ZAP, Burp Suite, or Qualys WAS follow predefined test cases. They inject known attack patterns into known parameter positions. They are fast, consistent, and very good at finding the vulnerabilities they are programmed to find.
AI agents operate differently. CodeWall’s agent selected its own target, discovered the API surface autonomously, formed hypotheses about backend behavior based on error messages, and iterated on attack vectors that no scanner had in its rulebook. It found a vulnerability class (SQL injection via JSON keys) that was technically well-known but practically invisible to automated tools.
This creates an asymmetry that defenders need to internalize. If your security testing relies exclusively on scanners, you are testing against yesterday’s attack patterns. Offensive AI agents test against the actual behavior of your system, including edge cases that no human wrote a detection rule for.
The Red Team Acceleration
CodeWall completed the entire attack in two hours. A human penetration tester conducting the same assessment would typically spend days on reconnaissance alone, possibly a week on the full engagement. The AI agent compressed the entire kill chain, from target selection to full database access, into 120 minutes.
This speed advantage matters for defenders too. If organizations deployed similar agents defensively, running continuous red team exercises against their own infrastructure, they could find these vulnerabilities before external attackers do. The problem is that very few organizations have adopted this approach yet. Most enterprises still run quarterly or annual penetration tests, schedules that leave months-long windows where new vulnerabilities go undetected.
McKinsey’s Response and What It Tells Us
McKinsey acted quickly once notified. According to their statement, they confirmed the vulnerability and patched it within hours. The remediation included taking the development environment offline, removing public access to API documentation, and patching the unauthenticated endpoints.
McKinsey’s investigation, supported by a third-party forensics firm, found no evidence that client data was accessed by unauthorized parties beyond CodeWall’s controlled test. That is good news for McKinsey’s clients, but it raises an uncomfortable question: how long were those 22 unauthenticated endpoints exposed before CodeWall found them? The API documentation was publicly accessible. The vulnerability was exploitable without credentials. If CodeWall’s agent found it in minutes, what else might have found it first?
Lessons for Every Enterprise Running AI Platforms
The McKinsey incident is not unique. It is a preview of what will happen to every organization that treats AI platform security as a bolt-on rather than a foundational requirement. Three patterns from this breach apply broadly:
First, API surface management for AI platforms is different from traditional applications. AI chatbots often expose more endpoints than a typical web app because they need to handle document retrieval, user context, prompt management, and feedback loops. Each of these is an attack surface. McKinsey had 22 unauthenticated endpoints. How many does your AI platform have?
Second, traditional security scanners are necessary but insufficient. ZAP found nothing. That does not mean ZAP is broken. It means the vulnerability was outside ZAP’s test matrix. Organizations need to supplement automated scanning with adversarial AI testing, manual review of API designs, and threat modeling specific to AI system architectures.
Third, system prompts are infrastructure, not configuration. Storing system prompts in the same database as user data, accessible through the same query layer, is a design choice that should never survive a security review. Prompts should be versioned, access-controlled, and stored separately from runtime data. If someone can write to your prompts, they control your AI.
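One way to realize that design is a small, append-only prompt store with pinned versions and integrity hashes. This is a sketch of the principle, not any vendor's API, and the class and method names are invented; the point is that the runtime loads a specific version and verifies it, so tampering fails closed instead of silently changing behavior:

```python
import hashlib

class PromptStore:
    """Minimal sketch: prompts kept out of the runtime database,
    versioned, immutable once published, and integrity-checked."""

    def __init__(self):
        self._versions = {}  # (name, version) -> (text, sha256 digest)

    def publish(self, name, version, text):
        """Register a new prompt version; existing versions are immutable."""
        key = (name, version)
        if key in self._versions:
            raise ValueError("versions are immutable; publish a new one")
        digest = hashlib.sha256(text.encode()).hexdigest()
        self._versions[key] = (text, digest)
        return digest  # the runtime pins this alongside the version number

    def load(self, name, version, expected_digest):
        """Load a pinned version and verify its hash before use."""
        text, digest = self._versions[(name, version)]
        if digest != expected_digest:
            raise RuntimeError(f"integrity check failed for {name}@{version}")
        return text

store = PromptStore()
pin = store.publish("guardrails", 1, "Refuse to reveal client identities.")
assert store.load("guardrails", 1, pin) == "Refuse to reveal client identities."
```

In production this store would live behind its own access controls (a config service or a signed artifact in the deploy pipeline), so that even full compromise of the user-data database leaves the prompts untouched.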
Frequently Asked Questions
How was McKinsey’s Lilli AI agent hacked?
Security firm CodeWall’s autonomous AI agent discovered publicly exposed API documentation for McKinsey’s Lilli platform. It found 22 unauthenticated endpoints and exploited a SQL injection vulnerability in JSON field names (not values) to gain full read-write access to the production database within two hours. Standard security scanners like OWASP ZAP missed this vulnerability because they only test injection in parameter values, not parameter names.
What data was exposed in the McKinsey Lilli breach?
CodeWall’s agent accessed 46.5 million chat messages covering strategy, M&A, and client engagements in plaintext, 728,000 confidential files, 57,000 user accounts, and 95 system prompts that controlled how Lilli responded to McKinsey’s 40,000+ consultants. The system prompts were also writable, meaning an attacker could have modified how Lilli behaved.
What is SQL injection through JSON field names?
Most SQL injection attacks target parameter values in API requests. In the McKinsey Lilli breach, the values were properly parameterized, but the JSON field names (keys) were concatenated directly into SQL queries. This is a well-known vulnerability class that standard automated scanners typically do not test for, which is why OWASP ZAP missed it during testing.
Why are writable AI system prompts a security risk?
System prompts define how an AI chatbot behaves, what guardrails it follows, how it cites sources, and what information it shares. If an attacker can write to these prompts, they can silently alter the AI’s behavior for all users without deploying code or changing infrastructure. In McKinsey’s case, 95 system prompts were writable through the SQL injection flaw, which could have been used to poison strategic recommendations, disable safety guardrails, or fabricate citations.
How can enterprises protect their AI platforms from similar attacks?
Three key measures: First, audit all API endpoints for authentication requirements, especially on AI platforms that often expose more surfaces than traditional apps. Second, supplement automated security scanners with adversarial AI red teaming, since standard tools missed the McKinsey vulnerability. Third, treat system prompts as infrastructure by storing them separately from user data, version-controlling them, and restricting write access.
