
OpenAI’s Responses API shipped three upgrades in February 2026 that quietly changed what “building with OpenAI” means. Hosted shell containers give every agent its own Debian 12 terminal. Server-side compaction keeps agents coherent across sessions that span 5 million tokens and 150 tool calls. And SKILL.md support lets agents load modular, versioned playbooks at runtime. Combined, these three features turn the Responses API from a model endpoint into a full agent platform, complete with compute, memory, and reusable capabilities.

If you have been tracking the Assistants API deprecation, this is the answer to “what replaces it?” The Responses API is not a marginal improvement. It is OpenAI’s bet that the winning AI platform is the one that provides the entire agent runtime, not just the model weights.

Related: OpenAI Kills the Assistants API: MCP Won the Agent Interoperability War

Hosted Shell: Every Agent Gets a Terminal

The most visible upgrade is container_auto, a parameter you set in your Responses API request that provisions an OpenAI-hosted Debian 12 container for your agent. Not a sandboxed code interpreter. A full terminal environment with Python 3.11, Node.js 22, Java 17, Go 1.23, and Ruby 3.1 pre-installed.

{
  "model": "gpt-4.1",
  "tools": [
    {
      "type": "shell",
      "container": "container_auto"
    }
  ],
  "input": "Fetch the latest exchange rates and generate a CSV report"
}

That single API call gives your agent a terminal where it can pip install packages, run scripts, download files, process data, and write outputs to /mnt/data. The container has controlled internet access, so agents can pull dependencies and hit external APIs. When the session ends, the container is torn down. No persistent state leaks between runs.
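In Python, the same request can be assembled as a plain payload (a sketch: it only builds and inspects the dictionary, mirroring the `shell` tool type and `container_auto` value from the JSON above; actually sending it requires an API key and is omitted so the snippet stays self-contained):

```python
# Sketch: build the Responses API payload for a hosted-shell session.
# The "shell" tool type and "container_auto" value mirror the JSON example
# above; sending the request is left out deliberately.

def build_shell_request(prompt: str, model: str = "gpt-4.1") -> dict:
    """Assemble a Responses API request that provisions a hosted shell."""
    return {
        "model": model,
        "tools": [
            {
                "type": "shell",
                "container": "container_auto",  # OpenAI-hosted Debian 12 container
            }
        ],
        "input": prompt,
    }

payload = build_shell_request(
    "Fetch the latest exchange rates and generate a CSV report"
)
print(payload["tools"][0]["container"])  # container_auto
```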

Why This Matters More Than Code Interpreter

OpenAI’s Code Interpreter (now called “code runner”) has existed since 2023, but it was always a constrained sandbox. Python only, limited packages, no network access, no multi-language support. The hosted shell is categorically different:

  • Multi-language execution. Your agent can write a Python script to process data, a Node.js service to serve it, and a bash script to glue them together, all in one session.
  • Dependency installation. pip install pandas, npm install puppeteer, apt-get install ffmpeg, all possible inside the container.
  • Network access. Agents can curl APIs, clone repositories, and download datasets. This unlocks workflows that were impossible in the old sandbox.
  • File I/O with artifacts. Everything written to /mnt/data is available as a downloadable artifact after the session completes.

For teams building data pipelines, report generators, or code analysis tools on OpenAI’s platform, the hosted shell eliminates the need to run your own sandbox infrastructure. That is one less container orchestration layer to maintain, one less security boundary to worry about.

Related: AI Agent Sandboxing: MicroVMs, gVisor, and WASM for Safe Code Execution

Server-Side Compaction: Memory That Does Not Fade

Long-running agents have a fundamental problem: context windows are finite. Once a conversation exceeds the model’s token limit, you either truncate (losing early context) or summarize (losing precision). Both options degrade agent performance on multi-step workflows.

OpenAI’s server-side compaction takes a different approach. When the token count crosses a configured threshold, the model analyzes its prior conversation state and produces a compressed representation that preserves key facts, decisions, and intermediate results. This is not truncation or naive summarization. The compacted state is an encrypted, token-efficient artifact that the model is specifically trained to produce and consume.

{
  "model": "gpt-4.1",
  "context_management": {
    "compact_threshold": 100000
  },
  "input": "Continue analyzing the dataset from where we left off"
}
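A toy client-side model makes the threshold mechanics concrete (purely illustrative: real compaction is a server-side trained behavior that client code cannot reproduce; only the 100,000-token threshold value comes from the config above):

```python
# Illustrative only: simulate when a compact_threshold of 100,000 tokens
# would fire as a session accumulates messages. The real compaction is
# performed server-side by the model, not by client code like this.

COMPACT_THRESHOLD = 100_000

def should_compact(token_counts: list[int], threshold: int = COMPACT_THRESHOLD) -> bool:
    """Return True once the running token total crosses the threshold."""
    return sum(token_counts) >= threshold

session = [30_000, 40_000, 25_000]
print(should_compact(session))  # False: 95,000 tokens, still under threshold

session.append(10_000)
print(should_compact(session))  # True: 105,000 tokens crosses 100,000
```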

Triple Whale’s Proof Point

E-commerce analytics platform Triple Whale was one of the early testers. Their agent, Moby, ran a session involving 5 million tokens and 150 tool calls with no accuracy drop. That is roughly equivalent to a human analyst working through a complex investigation across hundreds of data queries, remembering every finding, every dead end, and every intermediate conclusion.

Before compaction, agents would start hallucinating or losing track of earlier findings after 20-30 tool calls. Triple Whale reported that Moby could “dig deeper, longer, and further without losing the thread.” For workflows like financial reconciliation, security log analysis, or multi-table data exploration, this changes what agents can reliably handle.

How Compaction Differs from Truncation

Approach               | What happens                            | Risk
Truncation             | Oldest messages dropped                 | Loses early context; agent forgets setup instructions
Naive summary          | LLM summarizes history                  | Loses precision; numbers and specifics get lost
Server-side compaction | Model produces trained compressed state | Preserves key facts in encrypted token-efficient format

The practical difference is that compaction is trained behavior, not prompt engineering. OpenAI’s latest models (GPT-4.1 and above) are specifically fine-tuned to produce and consume compaction artifacts. The result is not a lossy summary but a structured state checkpoint that the model can expand back into working context.
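The truncation failure mode in the first row is easy to demonstrate with a toy message buffer (illustrative only; it shows what truncation loses, not how compaction works):

```python
# Toy demonstration of the truncation failure mode: dropping the oldest
# messages silently discards the setup instructions the agent was given.
# (Compaction itself cannot be reproduced client-side.)

def truncate(messages: list[str], keep_last: int) -> list[str]:
    """Naive context management: keep only the most recent messages."""
    return messages[-keep_last:]

history = [
    "SETUP: always report amounts in EUR",   # early instruction
    "Tool call 1: fetched Q1 revenue",
    "Tool call 2: fetched Q2 revenue",
    "Tool call 3: computed QoQ growth",
]

window = truncate(history, keep_last=2)
print("SETUP: always report amounts in EUR" in window)  # False: instruction lost
```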

Agent Skills: Modular Capabilities via SKILL.md

The third upgrade is native support for the SKILL.md standard, the same open specification that Anthropic created for Claude Code and that OpenAI adopted for Codex. A skill is a directory containing a SKILL.md manifest (YAML frontmatter + markdown instructions) plus optional scripts, templates, and reference files.

---
name: quarterly-report
description: Generate formatted quarterly financial reports
version: 1.2.0
---

# Quarterly Report Skill

## When to Use
Activate when the user requests quarterly financials,
revenue breakdowns, or period-over-period comparisons.

## Instructions
1. Query the data warehouse for the specified quarter
2. Calculate YoY and QoQ growth rates
3. Generate charts using matplotlib
4. Export to PDF with company branding

When a skill is loaded into a Responses API session, the agent consults its instructions whenever a matching task arises. Skills are model-invoked: the agent reads the available skill manifests and decides which ones to activate based on context. You do not manually trigger them.
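A minimal parser for the manifest above shows how little structure a skill needs (a sketch: it handles only the flat `key: value` frontmatter shown in the example, not full YAML):

```python
# Sketch: split a SKILL.md manifest into a frontmatter dict and the
# markdown body. Handles only flat key: value frontmatter, not full YAML.

def parse_skill_md(text: str) -> tuple[dict, str]:
    """Split SKILL.md into a frontmatter dict and the markdown body."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

manifest = """---
name: quarterly-report
description: Generate formatted quarterly financial reports
version: 1.2.0
---

# Quarterly Report Skill
"""

meta, body = parse_skill_md(manifest)
print(meta["name"], meta["version"])  # quarterly-report 1.2.0
```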

Skills + Shell = Repeatable Agent Workflows

The real power emerges when skills and the hosted shell combine. A skill can reference scripts bundled in its directory. The agent reads the SKILL.md instructions, identifies the relevant scripts, and executes them inside the hosted shell container. This means you can package an entire workflow (data fetching, transformation, report generation) as a versioned skill that any agent can pick up and run.

Consider a compliance audit skill:

compliance-audit/
  SKILL.md
  scripts/
    check_gdpr_fields.py
    validate_retention_policy.py
  references/
    eu_ai_act_requirements.md

An agent with this skill loaded can run the full audit inside its shell container, reference the EU AI Act requirements document for context, and produce a structured report. The skill is versioned, testable, and shareable across teams.
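Discovering the bundled scripts in a layout like this takes only a few lines (a sketch: it recreates the compliance-audit directory from the example in a temp folder, then lists the scripts the way tooling might before handing them to the shell container):

```python
# Sketch: build the compliance-audit skill layout from the example above
# in a temp directory, then discover its bundled scripts.

import tempfile
from pathlib import Path

def list_skill_scripts(skill_dir: Path) -> list[str]:
    """Return the names of Python scripts bundled with a skill."""
    scripts = skill_dir / "scripts"
    return sorted(p.name for p in scripts.glob("*.py")) if scripts.is_dir() else []

root = Path(tempfile.mkdtemp()) / "compliance-audit"
(root / "scripts").mkdir(parents=True)
(root / "references").mkdir()
(root / "SKILL.md").write_text("---\nname: compliance-audit\n---\n")
(root / "scripts" / "check_gdpr_fields.py").write_text("# GDPR field checks\n")
(root / "scripts" / "validate_retention_policy.py").write_text("# retention checks\n")

print(list_skill_scripts(root))
# ['check_gdpr_fields.py', 'validate_retention_policy.py']
```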

This is materially different from MCP tool integration. MCP gives agents structured access to external services (databases, APIs, file systems). Skills give agents procedural knowledge: step-by-step instructions for how to do specific tasks. MCP is the toolbox. Skills are the training manual. Both integrate with the Responses API, and the combination of MCP tools + SKILL.md instructions + hosted shell compute is what makes the Responses API an agent platform rather than just an inference endpoint.

Related: AI Agent Skills Marketplace: The New Plugin Ecosystem

The Platform Play: Brain, Office, Memory, Manual

VentureBeat’s framing of this upgrade captures the strategy precisely: OpenAI is no longer just selling a “brain” (the model). It is selling the “office” (the hosted shell container), the “memory” (server-side compaction), and the “training manual” (skills).

This is a vertical integration play. Before these upgrades, using OpenAI for agents meant: call the model, get tokens back, handle everything else yourself. You needed your own sandbox for code execution, your own context management for long sessions, and your own skill/prompt management system. Now OpenAI bundles all three into the API layer.

What This Means for Architecture Decisions

If you are building agents on OpenAI, the calculus has changed:

Before: Use GPT-4 for inference, run your own Docker containers for code execution, implement custom context windowing, manage prompt templates in your repo.

After: Use the Responses API with container_auto for execution, compact_threshold for context management, and SKILL.md files for reusable capabilities. Your infrastructure footprint shrinks to “call the API and manage your skills directory.”
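Put together, the "after" stack collapses into one request shape (a sketch merging the two JSON examples shown earlier; how SKILL.md files attach on the wire is not shown in those examples, so skills are deliberately omitted here rather than guessed at):

```python
# Sketch: one Responses API payload combining the hosted-shell and
# compaction parameters from the earlier examples. (The wire format for
# attaching SKILL.md skills is not shown above, so it is omitted.)

def build_agent_request(prompt: str) -> dict:
    return {
        "model": "gpt-4.1",
        "tools": [{"type": "shell", "container": "container_auto"}],
        "context_management": {"compact_threshold": 100_000},
        "input": prompt,
    }

req = build_agent_request("Audit last quarter's transactions")
print(sorted(req))  # ['context_management', 'input', 'model', 'tools']
```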

The tradeoff is lock-in. Every feature that moves from your infrastructure to OpenAI’s platform makes migration harder. If you use hosted shell extensively, switching to Anthropic or an open-source model means rebuilding your execution layer. If you rely on server-side compaction, you need to implement your own context management. The convenience is real, but so is the dependency.

For teams that want portability, the SKILL.md standard is the escape hatch. Because it is an open specification supported by both OpenAI and Anthropic, your skills work across providers. The shell and compaction features are OpenAI-specific, but the skills layer is portable.

Related: AI Agent Frameworks Compared: LangGraph, CrewAI, AutoGen

Who Should Use These Features (And Who Should Not)

Use hosted shell if you are building agents that need to execute arbitrary code, install dependencies, or produce file artifacts, and you do not want to manage your own sandbox infrastructure. Data analysis agents, report generators, and code review bots are natural fits.

Skip hosted shell if you need persistent environments between API calls (containers are ephemeral), you need GPU access for ML workloads, or your security requirements prohibit running code on third-party infrastructure.

Use compaction if your agents run multi-step workflows with more than 20-30 tool calls, or if you need agents to maintain coherence across sessions that exceed 100K tokens. Financial analysis, security investigation, and research synthesis agents benefit most.

Skip compaction if your agent interactions are short (under 50K tokens) or your workflows are stateless. Compaction adds latency when triggered and costs additional tokens for the compression step.

Use SKILL.md if you want reusable, versioned agent capabilities that work across OpenAI and Anthropic products. Skills are especially valuable for teams standardizing agent behavior across projects.

Skip SKILL.md if your agent performs a single, well-defined task that does not vary. Not everything needs to be a skill. A simple function call suffices for straightforward operations.

Frequently Asked Questions

What is OpenAI’s Responses API?

The Responses API is OpenAI’s primary API for building AI agents. It replaced the Chat Completions endpoint for agent workloads and is the successor to the deprecated Assistants API. It supports built-in tools (web search, file search, code execution), MCP server integration, hosted shell containers, server-side compaction, and SKILL.md agent skills.

What is container_auto in the OpenAI Responses API?

container_auto is a parameter that provisions an OpenAI-hosted Debian 12 container for your agent. It includes Python 3.11, Node.js 22, Java 17, Go 1.23, and Ruby 3.1 pre-installed, with controlled internet access and the ability to install additional dependencies. The container is ephemeral and torn down after the session ends.

How does server-side compaction work in the Responses API?

Server-side compaction triggers when the token count crosses a configured threshold. The model analyzes its prior conversation state and produces a compressed, encrypted representation that preserves key facts and decisions. Unlike truncation, compaction is trained behavior: GPT-4.1 and above are specifically fine-tuned to produce and consume these compressed state artifacts.

What is the SKILL.md format for AI agent skills?

SKILL.md is an open standard for defining modular agent capabilities. A skill is a directory containing a SKILL.md file with YAML frontmatter (name, description, version) and markdown instructions, plus optional scripts and reference files. Both OpenAI and Anthropic support the same specification, making skills portable across providers.

Is the OpenAI Responses API replacing the Assistants API?

Yes. OpenAI deprecated the Assistants API in August 2025 with a hard sunset date of August 26, 2026. The Responses API, paired with the Conversations API for stateful interactions, is the official replacement. OpenAI published a migration guide covering the key architectural differences.