Photo by Roman Synkevych on Unsplash

One in seven pull requests on GitHub now involves an AI reviewer. Less than two years ago, that number was one in ninety. The shift did not come from a single product launch or feature announcement. It came from a quiet architectural bet: treating the repository itself as the execution environment for AI agents.

GitHub Next, the company’s R&D lab, published the framework behind this bet on February 5, 2026. They call it Continuous AI: natural-language rules combined with agentic reasoning, executed continuously inside your repository through GitHub Actions. Not a chatbot. Not an autocomplete engine. A new category of CI that handles the judgment calls your YAML pipelines never could.

Related: Software Factories: When AI Agents Build Software Without Human Review

What Continuous AI Actually Is

Traditional CI is deterministic. Tests pass or fail. Builds succeed or break. Linters flag violations against a static ruleset. Continuous AI targets the work that cannot be expressed as a rule or a flow chart: code review that understands intent, documentation that stays in sync with code changes, issue triage that reads natural language, and fault analysis that explains why a CI run failed rather than just reporting that it did.

Idan Gazit, head of GitHub Next, frames it this way: “Any time something can’t be expressed as a rule or a flow chart is a place where AI becomes incredibly helpful.” The insight is that most developer workflows already run on event triggers (push, pull request, schedule). What they lack is reasoning at the trigger point.

GitHub Next identifies eight categories of Continuous AI:

  • Continuous Documentation: Detecting when code and docs drift apart, then generating updates
  • Continuous Code Improvement: Flagging performance anti-patterns like regex compilation inside loops
  • Continuous Triage: Labeling and routing issues using NLP instead of static keyword matching
  • Continuous Summarization: Weekly project digests synthesizing commits, PRs, and CI results
  • Continuous Fault Analysis: Explaining CI failures in plain language, not just printing stack traces
  • Continuous Quality: Enforcing coding standards that go beyond what a linter can check
  • Continuous Accessibility: Catching accessibility regressions on every deploy
  • Continuous Team Motivation: Celebrating milestones (yes, really)

Each one runs as a GitHub Action, triggered by repository events, processed by an LLM, and constrained by explicit permissions. The output is always an artifact a developer already knows how to review: a PR comment, an issue label, a commit suggestion.
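To make one of these concrete, a Continuous Fault Analysis rule could be a few sentences of prose. This is a hypothetical example, written in the natural-language rule style that gh aw (covered below) accepts; the exact wording and file name are invented for illustration:

```markdown
# Explain CI failures

When a workflow run fails, read the job logs.
Summarize the most likely root cause in plain language
and post it as a comment on the associated pull request.
Do not attempt to fix the failure or re-run the job.
```

The last line matters: scoping a rule to read-and-report keeps it inside the read-only guardrails that Continuous AI applies by default.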

The gh aw CLI: Writing Agent Rules in Markdown

The technical core of Continuous AI is gh aw, a CLI tool from GitHub Next that turns natural-language rules into GitHub Actions workflows. The setup takes three steps.

Step 1: Write Rules in Markdown

You create a .github/agents/ directory and drop in markdown files. Each file is a rule:

```markdown
# Auto-label bug reports

When a new issue is opened, read its title and body.
If it describes a crash, unexpected behavior, or error,
add the "bug" label. If it's a feature request, add "enhancement".
```

No YAML syntax. No regex matchers. Plain English (or German, or any language the underlying model supports).

Step 2: Compile to Secure Workflows

Running gh aw compile transforms your markdown rules into .lock.yml files: standard GitHub Actions workflows with the AI reasoning baked in. The compiled output enforces read-only permissions by default, sandboxed execution, tool allowlisting, and network isolation.
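The compiled output is ordinary Actions YAML. Here is a sketch of what a compiled .lock.yml might contain; the structure is simplified and the agent-step invocation is invented for illustration, not actual gh aw output:

```yaml
# .github/workflows/auto-label-bug-reports.lock.yml (illustrative sketch)
name: Auto-label bug reports
on:
  issues:
    types: [opened]

permissions:
  contents: read   # read-only by default
  issues: write    # the one write scope this rule needs

jobs:
  agent:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The agent step embeds the markdown rule as the prompt,
      # runs it against the configured engine, and writes the
      # resulting label back through the GitHub API.
      - name: Run agentic rule
        run: gh aw run auto-label-bug-reports   # hypothetical invocation
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

The point of compiling to a .lock.yml rather than interpreting markdown at runtime is auditability: the file you review in the PR is the exact workflow that will execute.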

Step 3: Trigger via GitHub Actions

The compiled workflows fire on standard GitHub events: pull_request, issues, push, schedule. The LLM processes the event payload against your rule, produces output, and writes it back as a PR comment, issue label, or commit suggestion.

Supported AI engines include Copilot, Claude, Codex, and custom processors. You pick the model per rule.
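Engine selection happens per rule. Assuming frontmatter-style configuration on the rule file (the field names here are illustrative, not confirmed gh aw syntax), pinning a trigger and a model might look like this:

```markdown
---
on:
  pull_request:
    types: [opened, synchronize]
engine: claude          # or copilot, codex, a custom processor
permissions:
  contents: read
  pull-requests: write
---

# Review for performance anti-patterns

When a pull request changes application code, look for
regex compilation inside loops or queries inside loops,
and leave a review comment on the offending lines.
```

Per-rule engine choice means you can route cheap, high-volume rules (triage, labeling) to one model and reserve a stronger model for code review.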

Related: AI Agent Frameworks Compared: LangGraph, CrewAI, AutoGen

Copilot, Claude, and Codex as Repository Agents

The Continuous AI framework is model-agnostic, but three agents already have first-class GitHub integration.

GitHub Copilot Coding Agent

The Copilot coding agent works like a junior developer you assign issues to. You tag an issue with @copilot, or mention it in VS Code with @github Open a pull request to refactor this query generator. The agent boots a VM, clones your repo, analyzes the codebase using RAG powered by GitHub’s code search index, and pushes commits to a draft PR.

James Zabinski, DevEx Lead at EY, describes the workflow: “The Copilot coding agent is opening up doors for human developers to have their own agent-driven team, all working in parallel.”

The key constraint: the agent never merges code. It creates PRs. As Gazit puts it, “The PR is the existing noun where developers expect to review work.” Human approval is required before any change lands on a protected branch.

Claude and Codex on GitHub

As of February 4, 2026, Anthropic’s Claude and OpenAI’s Codex are available as coding agents for Copilot Pro+ and Enterprise customers. No additional subscription. You assign issues to them the same way you assign to Copilot, or mention @claude or @codex for review feedback. Each session consumes one premium request.

This makes GitHub the first major platform where three competing AI models run as peer agents inside the same repository, all governed by the same permission model, all producing the same artifact type: pull requests.

Related: GPT-5.3-Codex vs. Claude Opus 4.6: The Coding Agent Wars

The Numbers: AI Review Is Already Mainstream

A study analyzing 40.3 million pull requests found that AI agent participation in PRs grew from 1.1% in February 2024 to 14.9% by November 2025: a thirteenfold increase in 21 months. Three agents control 72% of all AI review activity: CodeRabbit (~33%), Copilot (~29%), and Gemini (~10%).

The productivity data is equally blunt. A study of 4,800 developers found tasks completed 55% faster with Copilot. PR turnaround time dropped from 9.6 days to 2.4 days at organizations using automated review: a 75% reduction. Copilot now generates an average of 46% of code written by its users, rising to 61% for Java developers.

GitHub Next ran their own experiment to validate Continuous AI economics: they generated 1,400 tests across 45 days for approximately $80 in LLM token costs, achieving near-complete coverage. That is the cost profile that makes continuous, always-on AI economically viable even for small teams.

An academic study of 8,031 AI-authored PRs found that agents modify CI/CD configurations in only 3.25% of all changes. The overwhelming majority of agent work is application code, not infrastructure. GitHub Actions accounts for 96.77% of all CI/CD changes made by AI agents; agents rarely touch Jenkins, CircleCI, or other CI systems. PRs with CI/CD changes merged at 67.77%, versus 71.80% for non-CI/CD changes, suggesting reviewers are more cautious when agents touch pipeline configs.

What Continuous AI Gets Wrong (For Now)

GitHub’s own documentation is refreshingly honest about the limits. Agent mode is not good for altering domain invariants without human review, redesigning service boundaries, replacing logic that requires institutional knowledge, or debugging deep runtime issues.

The security surface is real. Every agent is an identity with credentials. Only slightly more than half of AI-generated code is correct and secure. When prompts are ambiguous, LLMs optimize for the shortest path to a passing result, even if that means using dangerous functions. Source code accounts for 42% of AI risk-related data policy violations, as developers upload proprietary code to AI services without reviewing what gets sent.

GitHub has built guardrails: agents run read-only by default, cannot push to default branches (only branches they create), require human approval before CI/CD workflows execute, and have internet access limited to customizable trusted destinations. The requestor cannot approve their own agent’s pull request. These constraints are meaningful, but they only work if teams actually enforce them.
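In plain GitHub Actions terms, the read-only default corresponds to a minimal permissions block, and teams can enforce it at the workflow level regardless of what any compiler emits. This sketch uses standard Actions syntax; the job contents are placeholders:

```yaml
# Baseline for any agent workflow: start from zero permissions,
# then grant only the scopes the rule actually needs.
permissions: {}          # drop every default token scope

jobs:
  triage:
    runs-on: ubuntu-latest
    permissions:
      contents: read     # read the repo
      issues: write      # the single write scope this job needs
    steps:
      - name: Label incoming issue
        run: echo "agent step runs here"   # placeholder
```

Setting `permissions: {}` at the top level and widening per job makes the blast radius of a misbehaving agent explicit in code review, which is exactly where the Continuous AI model wants it.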

The 3.25% CI/CD change rate also reveals a deeper gap. Agents are good at writing application code. They struggle with infrastructure, build systems, and the glue that connects repositories to production. If your bottleneck is flaky tests or broken deploys rather than writing feature code, Continuous AI will not solve it yet.

How GitLab and Atlassian Compare

GitHub is not the only platform building agentic CI. GitLab Duo now includes merge request summaries, root cause analysis, and autonomous workflow agents in its Premium and Ultimate tiers. Duo’s advantage is that it inherits GitLab’s DevSecOps stack: every AI suggestion gets cross-checked against SAST/DAST gates and license policies automatically, while Copilot relies on external scanners like SonarQube or CodeQL.

Atlassian’s Rovo Dev takes a different angle. It understands project context from Jira, Confluence, and Bitbucket together. When a pipeline test fails, Rovo Dev triages the failure, attempts a fix, generates a PR, and re-runs the pipeline. Natural-language pipeline steps in Bitbucket Cloud are coming in 2026, augmenting static scripts with AI reasoning.

The competitive pattern is clear: every major DevOps platform is converging on the same idea. The repository and its surrounding tools become the agent’s workspace. The PR becomes the agent’s output format. Human review becomes the control mechanism. GitHub’s advantage is scale (4.7 million paid Copilot users, 90% of Fortune 100 companies) and the fact that their agent permission model already supports multiple competing AI models in a single workflow.

Related: What Are AI Agents? A Practical Guide for Business Leaders

Getting Started with Continuous AI

If you want to try this on an existing repository, the ramp is gentle:

  1. Install the CLI: gh extension install githubnext/gh-aw
  2. Create a rule directory: .github/agents/ in your repo
  3. Write your first rule: Start with something low-risk, like auto-labeling issues or generating PR summaries
  4. Compile: gh aw compile generates the Actions workflow
  5. Merge the workflow file and let it run on the next event trigger

The awesome-continuous-ai repository catalogs the ecosystem: tools like Penify.dev for documentation, Diffblue for automated testing, CodeRabbit for review, and GenAIScript for writing custom Continuous AI workflows in JavaScript.

Start with read-only tasks. Triage, summarization, and documentation are safe. Move to code suggestions once you have confidence in the model’s output quality for your specific codebase. And keep humans in the approval loop for anything that touches production branches.

Frequently Asked Questions

What is Continuous AI on GitHub?

Continuous AI is a framework from GitHub Next that combines natural-language rules with agentic reasoning, executed continuously inside repositories through GitHub Actions. Unlike traditional CI which handles deterministic tasks (pass/fail tests), Continuous AI handles judgment-heavy work like code review, documentation updates, issue triage, and fault analysis using LLMs.

How does the gh aw CLI tool work?

The gh aw CLI from GitHub Next has a three-step workflow: write natural-language rules in markdown files inside a .github/agents/ directory, compile them into secure GitHub Actions workflows using gh aw compile, and run them automatically on repository events like pull requests, pushes, or schedules. Supported AI engines include Copilot, Claude, and Codex.

Can Claude and Codex run as agents directly on GitHub?

Yes. As of February 4, 2026, Anthropic’s Claude and OpenAI’s Codex are available in public preview on GitHub for Copilot Pro+ and Enterprise customers. You can assign issues to them, mention @claude or @codex for review feedback, and each session consumes one premium request. No additional subscription is required.

What percentage of GitHub pull requests involve AI reviewers?

A study of 40.3 million pull requests found that AI agent participation grew from 1.1% in February 2024 to 14.9% by November 2025, meaning roughly one in seven PRs now involves an AI reviewer. The top three AI review agents are CodeRabbit (33% share), Copilot (29%), and Gemini (10%).

Is Continuous AI safe to use in production repositories?

GitHub has built security constraints into the system: agents run read-only by default, cannot push to default branches, require human approval before CI/CD workflows execute, and have limited internet access. However, only slightly more than half of AI-generated code is considered correct and secure, so human review of agent-produced PRs remains critical.