The OpenAI Agents SDK shipped its most important update on February 5, 2026, and most developers missed it. Version 0.8.0 added three features that move the SDK from “interesting experiment” to “production-viable framework”: tool-level human approval via needs_approval, full run state serialization through RunState, and codex tool integration for agents that can write and execute code autonomously.

Before v0.8, building an approval workflow with the OpenAI Agents SDK meant hacking around the framework. You had to intercept tool calls, manage state yourself, and stitch everything back together when the human finally responded. Now it is a first-class feature. Three lines of code give you a production-grade human-in-the-loop pattern.

Here is what changed, why it matters, and how to use each feature with working code.

Related: Human-in-the-Loop AI Agents: When to Let Agents Act and When to Hit Pause

What v0.8.0 and v0.8.1 Actually Ship

The v0.8.0 release merged over 30 pull requests from 13 contributors. The headline features are HITL support and RunState management, but the release also includes changes that affect existing code.

Breaking behavior changes you need to know about:

Synchronous tool functions now run on worker threads via asyncio.to_thread() instead of the event loop. If your tool relies on thread-local state or thread affinity, it will break silently. The fix: migrate to async implementations, or make your thread dependencies explicit.
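To see why thread-local state is the failure mode here, consider a plain-Python sketch (no SDK involved, hypothetical tool names): a sync function that reads `threading.local` works when called on the thread that set the value, but loses it when dispatched through asyncio.to_thread, because the worker thread has its own locals.

```python
import asyncio
import threading

# Hypothetical sync "tool" that depends on thread-local state.
_local = threading.local()

def set_user(user_id: str) -> None:
    _local.user_id = user_id

def my_tool() -> str:
    # Breaks if run on a thread other than the one that called set_user().
    return getattr(_local, "user_id", "MISSING")

async def main() -> tuple[str, str]:
    set_user("alice")                            # set on the calling thread
    direct = my_tool()                           # same thread: sees "alice"
    threaded = await asyncio.to_thread(my_tool)  # worker thread: sees "MISSING"
    return direct, threaded

direct, threaded = asyncio.run(main())
print(direct, threaded)  # alice MISSING
```

Note that asyncio.to_thread does propagate contextvars, so migrating thread-local state to contextvars.ContextVar is one way to make the dependency explicit.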

MCP tool failures are now handled differently. The default behavior returns a model-visible error string instead of raising an exception. If your error handling depends on catching exceptions from MCP tool calls, set mcp_config={"failure_error_function": None} to restore the old fail-fast behavior.
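The two behaviors can be illustrated in plain Python (a hypothetical helper for illustration, not the SDK's actual internals): with an error function, a tool failure becomes a string the model can read and react to; with it set to None, the exception propagates to your code.

```python
from typing import Callable, Optional

# Hypothetical sketch of the two MCP failure modes; not SDK code.
def call_tool(tool: Callable[[], str],
              failure_error_function: Optional[Callable[[Exception], str]]) -> str:
    try:
        return tool()
    except Exception as exc:
        if failure_error_function is None:
            raise  # fail-fast: pre-v0.8 style, exception reaches your code
        return failure_error_function(exc)  # model-visible error string

def broken_tool() -> str:
    raise RuntimeError("MCP server timed out")

# Default-style behavior: the model sees an error string and can retry.
msg = call_tool(broken_tool, lambda e: f"Tool failed: {e}")
print(msg)  # Tool failed: MCP server timed out

# Fail-fast behavior: the exception propagates to the caller.
try:
    call_tool(broken_tool, None)
except RuntimeError as e:
    print("caught:", e)
```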

Other additions in v0.8.0:

  • Structured agent tool input support for typed parameters
  • Max turns error handlers so agents fail gracefully instead of raising MaxTurnsExceeded
  • Session customization parameters for persistent conversations
  • MCP tool meta resolver for dynamic tool discovery
  • Image response support from MCP servers
  • Configurable tool error formatting

Version 0.8.1, released the next day on February 6, delivered four fixes: codex_tool thread reuse within run contexts, configurable max turns in the REPL, persistence fixes for streamed run-again tool items, and backward compatibility for handoff target resolution in run-state agent maps.

The SDK has since continued shipping at a rapid pace, reaching v0.12.5 on PyPI as of March 2026. But v0.8.0 is the version where the SDK became genuinely usable for enterprise workflows.

needs_approval: Tool-Level Human Oversight in Three Lines

This is the simplest HITL pattern in any framework: add needs_approval=True to a tool decorator, and the agent pauses before executing that tool. No state machines. No custom middleware. No webhook infrastructure.

from agents import Agent, Runner, function_tool

@function_tool(needs_approval=True)
def transfer_funds(from_account: str, to_account: str, amount: float):
    """Transfer funds between accounts. Requires human approval."""
    return banking_api.transfer(from_account, to_account, amount)

agent = Agent(
    name="Finance Agent",
    instructions="Help users manage their accounts.",
    tools=[transfer_funds],
)

result = await Runner.run(agent, input="Transfer $5,000 from checking to savings")

When the agent decides to call transfer_funds, the run pauses. The result object contains an interruptions list with every pending approval:

if result.interruptions:
    for item in result.interruptions:
        print(f"Agent: {item.agent.name}")
        print(f"Tool: {item.raw_item.name}")
        print(f"Args: {item.raw_item.arguments}")
        # Human reviews and decides
        result.state.approve(item)  # or result.state.reject(item)

    # Resume from where the agent paused
    result = await Runner.run(agent, state=result.state)

Dynamic approval with a function. You do not always want blanket approval requirements. A $50 transfer might be fine. A $50,000 transfer needs a human. Pass a callable instead of a boolean:

import json

async def check_amount(run_context, params, call_id):
    args = json.loads(params)
    return args.get("amount", 0) > 1000

@function_tool(needs_approval=check_amount)
def transfer_funds(from_account: str, to_account: str, amount: float):
    """Transfer funds between accounts."""
    return banking_api.transfer(from_account, to_account, amount)

Now the agent transfers small amounts without interruption, but pauses for anything over $1,000. This is the pattern that makes HITL practical at scale: not every action needs a human, only the ones that cross a risk threshold.
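Because the predicate is plain Python, the risk policy can be unit-tested without touching the SDK at all. A sketch that generalizes the threshold per tool (the policy table and helper here are hypothetical, not an SDK feature):

```python
import json

# Hypothetical per-tool approval thresholds; not an SDK feature.
APPROVAL_THRESHOLDS = {
    "transfer_funds": 1000,
    "issue_refund": 500,
}

def requires_approval(tool_name: str, params: str) -> bool:
    """Return True when the call's amount exceeds the tool's threshold."""
    threshold = APPROVAL_THRESHOLDS.get(tool_name)
    if threshold is None:
        return False  # tools without a threshold run freely
    args = json.loads(params)
    return args.get("amount", 0) > threshold

print(requires_approval("transfer_funds", '{"amount": 50}'))    # False
print(requires_approval("transfer_funds", '{"amount": 5000}'))  # True
```

Keeping the policy in one testable function also gives compliance teams a single place to audit, rather than thresholds scattered across tool definitions.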

Related: AI Agent Permission Boundaries: The Compliance Pattern Every Enterprise Needs

RunState: Pause an Agent, Serialize It, Resume Tomorrow

The approval flow above works within a single process. But production workflows rarely stay in one process. A customer submits a request at 2 PM. The compliance officer reviews it at 9 AM the next day. Your server may restart, redeploy, or scale down in between.

RunState solves this by making the entire agent execution state serializable. You can write it to a database, a message queue, or a file, then reconstruct the exact execution context later.

# Agent hits an approval point
result = await Runner.run(agent, input="Delete all inactive customer records")

if result.interruptions:
    # Serialize the entire state
    state_json = result.state.to_json()
    # Store it wherever you want
    await database.save("pending_approval_123", json.dumps(state_json))

# --- Hours or days later, different process ---

state_data = json.loads(await database.load("pending_approval_123"))
restored_state = await RunState.from_json(state_data, starting_agent=agent)

# Review and approve
for item in restored_state.get_interruptions():
    restored_state.approve(item)

# Resume execution
result = await Runner.run(agent, state=restored_state)

What gets serialized. RunState captures: the current turn number, the active agent, all generated items and model responses, the original user input, approval states, usage metrics, and session identifiers. It is a complete snapshot of where the agent was when it paused.

Context serialization. If your agent uses a custom context object, RunState handles common types automatically: dictionaries pass through directly, Pydantic models serialize via model_dump(), and dataclasses use asdict(). For custom objects, provide explicit serializer and deserializer functions:

state_json = result.state.to_json(
    context_serializer=lambda ctx: {"user_id": ctx.user_id, "session": ctx.session_id},
)

restored = await RunState.from_json(
    state_data,
    starting_agent=agent,
    context_deserializer=lambda d: MyContext(user_id=d["user_id"], session=d["session_id"]),
)
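The fallback order described above can be sketched as a standalone helper (an approximation for illustration only, not the SDK's actual serialization code):

```python
import dataclasses
from typing import Any

def serialize_context(ctx: Any) -> dict:
    """Approximate the context fallback order: dict, Pydantic, dataclass."""
    if isinstance(ctx, dict):
        return ctx                      # dicts pass through directly
    if hasattr(ctx, "model_dump"):
        return ctx.model_dump()         # Pydantic models
    if dataclasses.is_dataclass(ctx):
        return dataclasses.asdict(ctx)  # dataclasses
    raise TypeError(
        f"Cannot serialize {type(ctx).__name__}; "
        "provide an explicit context_serializer"
    )

@dataclasses.dataclass
class MyContext:
    user_id: str
    session_id: str

print(serialize_context({"user_id": "u1"}))
print(serialize_context(MyContext(user_id="u2", session_id="s9")))
```

Anything that falls through all three branches is exactly the case where you need the explicit serializer and deserializer shown above.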

Partial approval. You do not need to resolve every interruption at once. Approve two out of five pending items, resume, and the agent continues executing the approved calls while the remaining three stay paused. This enables multi-level approval chains where a manager approves routine items and an executive handles the high-value ones.

The Codex Tool: Agents That Write and Run Code

The codex_tool integration, refined in v0.8.1 with thread reuse, connects your agent to OpenAI’s Codex for workspace-scoped tasks. The agent can run shell commands, edit files, and invoke MCP tools autonomously.

This is not a simple “generate code and paste it” feature. The codex_tool creates a sandboxed execution environment where the agent can:

  • Run shell commands in a controlled workspace
  • Read and modify files within scope
  • Use MCP tools available in the environment
  • Maintain context across multiple operations via thread reuse (added in v0.8.1)

Thread reuse was the critical v0.8.1 fix. Without it, every codex_tool invocation started from scratch. With it, the agent builds up context across a conversation: it reads a file, understands the codebase, makes changes, and verifies them, all within the same thread context.

Combining codex_tool with needs_approval. The real power comes from mixing these features. Let the agent write code freely, but require approval before it executes anything:

from agents_extensions.experimental.codex import codex_tool

code_tool = codex_tool(needs_approval=True)

agent = Agent(
    name="DevOps Agent",
    instructions="Help engineers with infrastructure changes.",
    tools=[code_tool],
)

The agent generates a database migration script, pauses, shows the human the exact SQL it plans to execute, and only proceeds after approval. This is the pattern every team running AI-assisted DevOps needs.

Building a Production Approval Workflow

Here is a complete example that combines all three features into a workflow you could deploy behind a web API:

import json
from agents import Agent, Runner, RunState, function_tool

# Define tools with different approval levels
@function_tool
def read_customer_data(customer_id: str):
    """Read customer information. No approval needed."""
    return db.get_customer(customer_id)

@function_tool(needs_approval=True)
def update_customer_record(customer_id: str, field: str, value: str):
    """Update customer record. Requires approval."""
    return db.update_customer(customer_id, field, value)

async def high_value_check(run_context, params, call_id):
    args = json.loads(params)
    return args.get("amount", 0) > 500

@function_tool(needs_approval=high_value_check)
def issue_refund(customer_id: str, amount: float, reason: str):
    """Issue a refund. Approval needed for amounts over $500."""
    return billing.refund(customer_id, amount, reason)

support_agent = Agent(
    name="Customer Support",
    instructions="Handle customer requests efficiently and accurately.",
    tools=[read_customer_data, update_customer_record, issue_refund],
)

# API endpoint: start a request
async def handle_request(user_message: str) -> dict:
    result = await Runner.run(support_agent, input=user_message)

    if result.interruptions:
        state_json = result.state.to_json()
        request_id = await store_pending(state_json, result.interruptions)
        return {
            "status": "pending_approval",
            "request_id": request_id,
            "pending": [
                {
                    "tool": item.raw_item.name,
                    "args": item.raw_item.arguments,
                }
                for item in result.interruptions
            ],
        }

    return {"status": "completed", "output": result.final_output}

# API endpoint: approve and resume
async def handle_approval(request_id: str, decisions: dict) -> dict:
    state_data = await load_pending(request_id)
    state = await RunState.from_json(state_data, starting_agent=support_agent)

    for item in state.get_interruptions():
        tool_name = item.raw_item.name
        if decisions.get(tool_name) == "approved":
            state.approve(item)
        else:
            state.reject(item, rejection_message=decisions.get(f"{tool_name}_reason", ""))

    result = await Runner.run(support_agent, state=state)

    if result.interruptions:
        # More approvals needed (multi-step workflow)
        return await handle_request_continued(result)

    return {"status": "completed", "output": result.final_output}

This pattern integrates with Temporal, Restate, or DBOS for durable execution. If your server crashes between the approval request and the human response, the serialized state in your database means nothing is lost.

Related: AI Agent Frameworks Compared: LangGraph, CrewAI, AutoGen

How v0.8 Stacks Up Against the Competition

The HITL landscape across frameworks looks different now. LangGraph has had interrupt() for graph-level checkpoints since mid-2025, and its approach gives you more control over where in a complex graph the agent pauses. CrewAI offers human_input=True at the task level. Pydantic AI builds approval into its type-safe tool definitions.

The OpenAI Agents SDK’s approach is more opinionated and narrower: tool-level approval with serializable state. You do not get LangGraph’s graph-level interrupts or CrewAI’s task-level human input. But you get a simpler mental model and arguably the cleanest API for the most common use case: “pause before this specific tool runs.”

The v0.8 release also matters because of shipping velocity. The SDK went from v0.0.1 to v0.12.5 in about seven weeks. That is not just version inflation. Each release has added substantive features: MCP support, handoffs, guardrails, tracing, websocket transport, tool search integration. For teams already in the OpenAI ecosystem, the SDK is becoming harder to ignore.


Frequently Asked Questions

What is needs_approval in the OpenAI Agents SDK?

needs_approval is a parameter on the @function_tool decorator that tells the SDK to pause the agent run before executing that tool. It accepts either True (always require approval) or an async function that returns a boolean for dynamic, conditional approval. When triggered, the run returns interruptions that a human can approve or reject via RunState.approve() and RunState.reject().

How do you serialize and resume an agent run in the OpenAI Agents SDK?

Use RunState.to_json() to serialize the entire execution state to a JSON-compatible dictionary, store it in a database or queue, then reconstruct it later with RunState.from_json(). The serialized state includes the current turn, active agent, all generated items, model responses, approval states, and usage metrics. After restoring, resolve any pending approvals and pass the state back to Runner.run().

What is the codex_tool in the OpenAI Agents SDK?

The codex_tool connects an agent to OpenAI’s Codex for running shell commands, editing files, and invoking MCP tools within a sandboxed workspace. Version 0.8.1 added thread reuse, which lets the agent maintain context across multiple operations in the same conversation instead of starting fresh each time.

How does OpenAI Agents SDK HITL compare to LangGraph interrupt?

LangGraph’s interrupt() works at the graph level, pausing execution at specific nodes in a state graph. The OpenAI Agents SDK’s needs_approval works at the tool level, pausing before specific tool calls. LangGraph gives more control over complex multi-step workflows. The OpenAI SDK offers a simpler API for the most common pattern: stopping before a sensitive tool runs.

What version of the OpenAI Agents SDK introduced human-in-the-loop support?

Version 0.8.0, released on February 5, 2026, introduced human-in-the-loop support with the needs_approval parameter, RunState serialization, and interruption handling. Version 0.8.1 followed the next day with codex_tool thread reuse and additional fixes.