Promptfoo is a CLI tool that throws thousands of automated attacks at your AI agent and tells you which ones get through. It scans for 50+ vulnerability types including prompt injection, PII leakage, RBAC bypass, and unauthorized tool execution. With 18,000+ GitHub stars, 350,000+ users, and confirmed adoption at over 25% of Fortune 500 companies, it has become the default open-source choice for AI red teaming. On March 9, 2026, OpenAI announced it was acquiring Promptfoo for approximately $86 million, the clearest signal yet that agent security testing has moved from nice-to-have to infrastructure.
This matters because most teams building AI agents still ship without any adversarial testing at all. They run a few manual prompts, check that the happy path works, and deploy. Promptfoo automates the part where someone tries to break your agent before a real attacker does.
How Promptfoo Works: Plugins, Strategies, and LLM-as-Judge
Promptfoo’s red teaming architecture has three components that work together: plugins generate adversarial inputs, strategies determine delivery techniques, and an LLM judge grades the results.
Plugins: What to Attack
Each plugin targets a specific vulnerability type. The prompt injection plugin generates inputs designed to override system instructions. The PII plugin probes for personal data leakage. The RBAC plugin tests whether the agent respects permission boundaries. You enable the ones relevant to your application in a YAML config file:
redteam:
plugins:
- prompt-injection
- pii-direct
- rbac
- shell-injection
- excessive-agency
strategies:
- jailbreak
- base64
- crescendo
Promptfoo ships with over 50 plugins covering security, privacy, harmful content, bias, and industry-specific compliance checks for healthcare, finance, and insurance applications.
Strategies: How to Attack
Strategies wrap plugin-generated payloads in delivery techniques designed to bypass content filters. Base64 encoding, leetspeak obfuscation, multi-turn escalation (Crescendo), and Meta’s GOAT framework for conversational attacks are all built in. A January 2026 release added emoji encoding and a “Mischievous User” strategy that simulates subtly manipulative users across multi-turn conversations.
The most powerful strategy is Hydra, Promptfoo’s adaptive attack system. Hydra maintains persistent memory across an entire scan, pivoting between conversation branches and refining attacks based on previous responses. If a direct prompt injection fails, Hydra automatically tries indirect approaches, encoding tricks, and multi-step escalation. This is closer to how a human red teamer actually operates: adapting strategy based on what the target reveals.
LLM-as-Judge Grading
After each attack, an LLM (GPT-5 by default) evaluates the agent’s response to determine whether the vulnerability was exploited. Did the agent leak PII? Did it follow the injected instruction? Did it execute an unauthorized action? The judge produces a pass/fail verdict with an explanation, and results aggregate into a dashboard showing your agent’s overall security posture.
This matters because manual review of thousands of attack-response pairs is not feasible. Automated grading lets you run comprehensive scans as part of your CI/CD pipeline, catching regressions before they reach production.
What Promptfoo Actually Catches
The vulnerability taxonomy maps directly to the OWASP LLM Top 10 and covers six categories that matter for agent builders.
Security and access control. Prompt injection (direct and indirect), SQL injection, shell injection, SSRF, BOLA (Broken Object Level Authorization), RBAC bypass, debug access exposure, and system prompt extraction. These are the vulnerabilities that let attackers take control of your agent or access data they should not see.
Data privacy. Direct PII exposure, session data leakage across users, and compliance-specific checks for COPPA and FERPA. If your agent handles any personal data, and most business agents do, these probes find the places where that data leaks.
Excessive agency. This is the OWASP risk category that is unique to agents. Can the agent be convinced to take actions beyond its intended scope? Can it be tricked into calling tools it should not have access to? Promptfoo tests for goal hijacking, unauthorized tool execution, and privilege escalation through conversation manipulation.
Harmful content and bias. Hate speech generation, self-harm content, radicalization pathways, and bias across age, gender, disability, and race. These probes are essential for customer-facing agents where a single toxic response becomes a PR incident.
Misinformation. Hallucination detection, false claim generation, competitor impersonation, and unauthorized professional advice (medical, legal, financial). The tool checks whether your agent confidently states things that are not true, a failure mode that traditional tests rarely catch.
Industry-specific compliance. Pre-built plugin sets for healthcare (HIPAA violations, incorrect medical knowledge), finance (calculation errors, compliance violations), insurance (PHI disclosure, coverage discrimination), and e-commerce (pricing manipulation, inventory claims).
Promptfoo vs. PyRIT vs. Garak: Which Red Teaming Tool Fits
Three open-source tools dominate AI red teaming in 2026. They overlap, but each optimizes for a different workflow.
Promptfoo (MIT, 18K+ stars) is the generalist. It combines red teaming with general-purpose evaluation, runs from a CLI with YAML configs, and integrates natively into CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins). Best for: teams that want red teaming as part of their development workflow, not as a separate security exercise. Its 50+ plugins and adaptive Hydra system make it the most comprehensive scanner.
PyRIT (MIT, Microsoft) is the programmatic orchestration framework. Written in Python, it treats red teaming as a coding exercise: you write Python scripts that define attack campaigns using orchestrators, scorers, and converters. Best for: security researchers and red team professionals who want full control over attack logic and already work in Python. PyRIT offers more granular programmatic control but requires more setup than Promptfoo’s declarative YAML approach.
Garak (Apache 2.0, NVIDIA) is the vulnerability scanner. It ships with 120+ probe modules that test model endpoints for known vulnerability patterns. Best for: testing raw model behavior before integration into an agent. Garak focuses on model-level vulnerabilities rather than the application-layer issues (tool misuse, RBAC bypass, agentic excessive agency) that Promptfoo targets.
If you are building agents and want security testing in your pipeline, start with Promptfoo. If you are a security team running structured red team campaigns, evaluate PyRIT. If you are evaluating base model safety before building on top of it, Garak fills that gap.
The OpenAI Acquisition: What Changes and What Does Not
On March 9, 2026, OpenAI announced the acquisition of Promptfoo. The deal valued the 11-person startup at approximately $86 million based on its $85.5 million post-money valuation from a July 2025 funding round led by a]16z. The technology is being integrated into OpenAI Frontier, the enterprise agent management platform launched in February 2026 that already counts Uber, State Farm, Intuit, and Thermo Fisher Scientific among its early customers.
Three things to note about what this means.
Promptfoo stays open source. OpenAI confirmed the MIT license remains. The GitHub repository continues to accept contributions. This is consistent with OpenAI’s pattern of maintaining open-source acquisitions (they did the same with Rockset’s query engine). Whether this holds in three years is a separate question, but for now, the tool remains free.
The integration target is enterprise, not developer tooling. Frontier is OpenAI’s play for regulated industries: healthcare, financial services, government. These organizations need auditable security testing for their AI agents. Promptfoo’s automated red teaming plugged directly into that gap. The 25%+ Fortune 500 adoption gave OpenAI instant enterprise distribution.
Vendor lock-in risk is real. Promptfoo currently supports 30+ LLM providers, including Anthropic, Google, Meta, and open-source models via Ollama. The open-source promise protects this for now. But the commercial roadmap will inevitably prioritize OpenAI’s own models and APIs. Teams building on non-OpenAI stacks should monitor the project closely for any changes to multi-provider support.
For agent builders today, the practical impact is minimal. Promptfoo works the same way it did last week. The acquisition validates the category: if OpenAI is spending $86 million on agent security testing, your team should probably be spending some engineering time on it too.
Getting Started: A 15-Minute Red Team Scan
Install Promptfoo and run your first scan against any LLM-powered endpoint:
npx promptfoo@latest init --no-interactive
npx promptfoo@latest redteam init
This generates a promptfooconfig.yaml with sensible defaults. Point it at your agent’s API endpoint:
targets:
- id: https
config:
url: "https://your-agent.example.com/api/chat"
method: POST
headers:
Authorization: "Bearer {{env.API_KEY}}"
body:
message: "{{prompt}}"
responseParser: "json.response"
Run the scan:
npx promptfoo@latest redteam run
npx promptfoo@latest redteam report
Promptfoo generates adversarial test cases, sends them to your endpoint, grades the responses, and produces a report showing which vulnerability categories your agent is exposed to. The free tier includes 10,000 red team probes per month, enough for continuous scanning during development.
For CI/CD integration, add it to your GitHub Actions workflow:
- name: Red team scan
run: npx promptfoo@latest redteam run --ci
env:
API_KEY: ${{ secrets.AGENT_API_KEY }}
A failing scan blocks the deployment, just like a failing unit test. That is the workflow shift Promptfoo enables: treating security as a continuous gate, not a quarterly audit.
Frequently Asked Questions
What is Promptfoo and what does it do?
Promptfoo is an open-source CLI tool for red teaming and evaluating AI agents and LLM applications. It automatically generates thousands of adversarial attacks against your AI system, testing for 50+ vulnerability types including prompt injection, PII leakage, RBAC bypass, and excessive agency. It uses an LLM-as-judge system to grade whether each attack succeeded and produces security reports.
Is Promptfoo still open source after the OpenAI acquisition?
Yes. OpenAI confirmed that Promptfoo remains open source under the MIT license after the March 2026 acquisition. The GitHub repository continues to accept community contributions. The technology is being integrated into OpenAI Frontier, their enterprise platform, but the open-source tool remains available independently.
How does Promptfoo compare to PyRIT and Garak?
Promptfoo is the generalist with 50+ plugins and CI/CD integration via YAML config. PyRIT (Microsoft) is a Python framework for programmatic red team campaigns with more granular control. Garak (NVIDIA) is a model-level vulnerability scanner with 120+ probes focused on base model safety rather than application-layer agent issues. For agent builders wanting security in their dev pipeline, Promptfoo is typically the best starting point.
What vulnerability types can Promptfoo detect in AI agents?
Promptfoo detects 50+ vulnerability types across six categories: security and access control (prompt injection, SQL injection, RBAC bypass), data privacy (PII exposure, session leaks), excessive agency (goal hijacking, unauthorized tool use), harmful content and bias, misinformation (hallucinations, false claims), and industry-specific compliance issues for healthcare, finance, and insurance applications.
How much does Promptfoo cost?
Promptfoo’s open-source CLI is free under the MIT license. The hosted service offers a free tier with 10,000 red team probes per month, which is sufficient for development-stage testing. Paid enterprise tiers are available through OpenAI Frontier for organizations needing higher volume, persistent dashboards, and enterprise support.
