
Every coding agent ships its own API. Claude Code uses a CLI with JSON output. Codex has a REST endpoint. Amp speaks its own protocol. If you are building a platform that needs to run multiple coding agents in sandboxed environments, you are writing and maintaining three, four, or five separate integrations. Rivet’s Sandbox Agent SDK collapses that into a single HTTP/SSE interface: install a 15MB Rust binary inside any sandbox, and every coding agent speaks the same language.

The project hit 1,200 GitHub stars in under two months. Nathan Flurry, Rivet’s CTO, describes it as “the Vercel AI SDK for coding agents.” That comparison is instructive: just as the Vercel AI SDK abstracts away model provider differences for chat completions, the Sandbox Agent SDK abstracts away agent provider differences for autonomous coding sessions.


The Problem: API Fragmentation Meets Unsafe Execution

Running a coding agent autonomously means giving an LLM the ability to execute arbitrary shell commands on a real filesystem. That alone is reason enough to sandbox it. But the fragmentation problem is equally painful for anyone building on top of these agents.

Claude Code outputs structured JSON events through its CLI. OpenAI’s Codex exposes a REST endpoint instead. Amp, OpenCode, and Cursor each handle sessions, permissions, and event streaming differently. Want to let users pick their preferred coding agent? You need to build and maintain a separate integration for each one, handle each event schema, and normalize the output into a consistent format your application understands.

A Hacker News thread on sandboxing coding agents documented agents attempting to create fake npm tarballs with forged SHA-512 hashes, masking failures with shell operators, cloning workspaces to bypass file restrictions, and building userland networking stacks to circumvent container-level network controls. According to Anthropic’s own engineering blog, sandboxing reduces permission prompts by 84%, which matters for truly autonomous operation.

The Sandbox Agent SDK addresses both problems: isolation comes from the sandbox provider, and control comes from the unified API.

How the Sandbox Agent SDK Works

The architecture is straightforward. A single Rust binary (sandbox-agent) runs inside your sandbox environment. Your application connects to it over HTTP and receives events via Server-Sent Events (SSE). The binary handles all the complexity of starting, configuring, and communicating with whichever coding agent you specify.

Two Modes of Operation

HTTP Server mode is the simpler of the two. Start the binary:

sandbox-agent server --token "$TOKEN" --host 127.0.0.1 --port 2468

Then connect from any language via REST. Create a session, send prompts, stream events. The full OpenAPI spec documents every endpoint.
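As a rough illustration of what "connect via REST" means in practice, here is a minimal TypeScript sketch that only builds request descriptions. The paths (/agents, /sessions, /sessions/{id}/events) and the bearer-token header are taken from the curl examples later in this article. The sandboxAgentClient helper, the HttpRequest shape, and the Accept header are assumptions made for illustration and are not part of the SDK, so check the OpenAPI spec before relying on any of it.

```typescript
// Plain request description; the shape is compatible with fetch(url, init).
interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical helper (not part of the SDK): holds the base URL and token
// for one sandbox-agent server and builds the three requests used here.
function sandboxAgentClient(baseUrl: string, token: string) {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const auth = { Authorization: `Bearer ${token}` };
  return {
    listAgents(): HttpRequest {
      return { url: `${root}/agents`, method: "GET", headers: { ...auth } };
    },
    createSession(agent: string, permissionMode: string): HttpRequest {
      return {
        url: `${root}/sessions`,
        method: "POST",
        headers: { ...auth, "Content-Type": "application/json" },
        body: JSON.stringify({ agent, permissionMode }),
      };
    },
    streamEvents(sessionId: string): HttpRequest {
      return {
        url: `${root}/sessions/${encodeURIComponent(sessionId)}/events`,
        method: "GET",
        // Assumption: standard SSE content negotiation.
        headers: { ...auth, Accept: "text/event-stream" },
      };
    },
  };
}

const client = sandboxAgentClient("http://127.0.0.1:2468/", "your-secret-token");
const createReq = client.createSession("claude-code", "auto-approve");
// To actually send it: await fetch(createReq.url, createReq)
```

Keeping request construction separate from the network call makes the integration easy to unit test without a running sandbox.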

Embedded TypeScript SDK mode wraps the binary in a Node.js package:

import { SandboxAgent } from "sandbox-agent";

const agent = await SandboxAgent.start();
const session = await agent.createSession({
  agent: "claude-code",
  permissionMode: "auto-approve"
});

for await (const event of session.streamEvents()) {
  console.log(event.type, event.data);
}

Both modes expose the same normalized event schema, so switching between them requires no changes to your event handling code.
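One way to take advantage of that is to keep your event handling in a function that knows nothing about the transport. The sketch below assumes only the event.type and event.data fields shown in the SDK example; createDispatcher and the handler names are hypothetical helpers, not SDK APIs.

```typescript
// Envelope matching the `event.type` / `event.data` fields used above.
interface AgentEvent {
  type: string;
  data: unknown;
}

type Handler = (data: unknown) => void;

// Hypothetical helper: route each event to the handler registered for its
// type. Returns false when nothing handled the event.
function createDispatcher(handlers: Record<string, Handler>, fallback?: Handler) {
  return (ev: AgentEvent): boolean => {
    // Own-property check so types like "toString" never hit Object.prototype.
    const own = Object.prototype.hasOwnProperty.call(handlers, ev.type);
    const handler = own ? handlers[ev.type] : fallback;
    if (!handler) return false;
    handler(ev.data);
    return true;
  };
}

const seen: string[] = [];
const dispatch = createDispatcher(
  {
    "session.started": () => { seen.push("started"); },
    "item.delta": (data) => { seen.push(`delta:${JSON.stringify(data)}`); },
    "session.ended": () => { seen.push("ended"); },
  },
  () => { seen.push("other"); }, // catch-all for event types not listed
);

// Embedded SDK mode:
//   for await (const ev of session.streamEvents()) dispatch(ev);
// HTTP mode:
//   call dispatch(ev) for every event you parse off the SSE stream.
```

Because dispatch takes one event at a time, the same function can sit behind either mode without modification.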

The Normalized Event Schema

This is where the SDK delivers the most value. Regardless of which agent you run, you get a consistent set of events:

  • session.started / session.ended for lifecycle management
  • item.started / item.delta / item.completed for messages and tool calls with streaming
  • question.requested / question.resolved for human-in-the-loop interactions
  • permission.requested / permission.resolved for tool execution approvals

Every event follows the same structure. Your application parses one schema, not one per agent. You can persist sessions to Postgres, ClickHouse, or Rivet Actors, replay them later for debugging, and build analytics across agents without writing per-agent normalization logic.
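To make the replay idea concrete, here is a small TypeScript sketch that folds a stored event log back into a transcript. The event names come from the list above. The payload fields (itemId, text, id) are assumptions for illustration, not the SDK's documented schema, and applyEvent and Transcript are hypothetical names.

```typescript
// Event names are from the article; payload fields are ASSUMED for this sketch.
type AgentEvent =
  | { type: "session.started" | "session.ended"; data: Record<string, never> }
  | { type: "item.started" | "item.completed"; data: { itemId: string } }
  | { type: "item.delta"; data: { itemId: string; text: string } }
  | {
      type: "question.requested" | "question.resolved"
          | "permission.requested" | "permission.resolved";
      data: { id: string };
    };

interface Transcript {
  active: boolean;
  items: Record<string, { text: string; done: boolean }>;
  pending: string[]; // ids of unanswered questions / permission requests
}

const emptyTranscript: Transcript = { active: false, items: {}, pending: [] };

// Pure reducer: the same function serves live streaming and replay.
function applyEvent(t: Transcript, ev: AgentEvent): Transcript {
  switch (ev.type) {
    case "session.started":
      return { ...t, active: true };
    case "session.ended":
      return { ...t, active: false };
    case "item.started":
      return { ...t, items: { ...t.items, [ev.data.itemId]: { text: "", done: false } } };
    case "item.delta": {
      const prev = t.items[ev.data.itemId] ?? { text: "", done: false };
      const next = { ...prev, text: prev.text + ev.data.text };
      return { ...t, items: { ...t.items, [ev.data.itemId]: next } };
    }
    case "item.completed": {
      const prev = t.items[ev.data.itemId] ?? { text: "", done: false };
      return { ...t, items: { ...t.items, [ev.data.itemId]: { ...prev, done: true } } };
    }
    case "question.requested":
    case "permission.requested":
      return { ...t, pending: [...t.pending, ev.data.id] };
    case "question.resolved":
    case "permission.resolved":
      return { ...t, pending: t.pending.filter((id) => id !== ev.data.id) };
    default:
      return t;
  }
}

// A stored log, e.g. rows read back from your database in order.
const sample: AgentEvent[] = [
  { type: "session.started", data: {} },
  { type: "item.started", data: { itemId: "m1" } },
  { type: "item.delta", data: { itemId: "m1", text: "Hello, " } },
  { type: "permission.requested", data: { id: "p1" } },
  { type: "permission.resolved", data: { id: "p1" } },
  { type: "item.delta", data: { itemId: "m1", text: "world" } },
  { type: "item.completed", data: { itemId: "m1" } },
  { type: "session.ended", data: {} },
];

const replayed = sample.reduce(applyEvent, emptyTranscript);
```

Storing the raw events and deriving state with a reducer like this means a bug in your rendering logic can be fixed and replayed against old sessions without re-running the agent.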


Supported Agents and Sandbox Providers

The SDK currently supports six coding agents: Claude Code, Codex, OpenCode (experimental), Cursor, Amp, and Pi. Each agent type maps to a configuration object you pass when creating a session. Switching agents is a one-line config change, not an integration rewrite.

On the sandbox side, the SDK is deliberately provider-agnostic. It runs inside:

| Sandbox Provider | Isolation Model | Cold Start |
| --- | --- | --- |
| E2B | Firecracker microVMs (same tech as AWS Lambda) | ~150ms |
| Daytona | Docker containers with persistent workspaces | 27-90ms |
| Docker Sandboxes | MicroVMs with network isolation | Varies |
| Vercel Sandboxes | Cloud sandboxes for coding agents | Varies |
| Any Linux environment | Whatever you configure | N/A |

Rivet even reverse-engineered Docker’s undocumented microVM API to add support before Docker officially documented it. The install is a single curl command that downloads the binary.

The key distinction: E2B, Daytona, and Docker provide the isolation. The Sandbox Agent SDK provides the agent control layer on top. You pick your sandbox, install the SDK binary, and get a unified API regardless of the combination.

Who Built This and Why It Matters

Rivet started as a Y Combinator W23 company building infrastructure for multiplayer games. Nathan Flurry had independently built infrastructure serving 15 million monthly active users and 20,000 concurrent players for games like Krunker.io while still in high school. The pivot to AI agent infrastructure makes sense when you consider the overlap: both problems require stateful, long-running processes with real-time event streaming and crash recovery.

Their flagship open-source product, Rivet Actors (5,200 GitHub stars), provides stateful serverless primitives similar to Cloudflare Durable Objects. The Sandbox Agent SDK builds on this: when running with Rivet Actors, you get automatic transcript persistence across crashes, real-time event broadcasting to connected clients, and horizontal scaling for concurrent agent sessions.

The broader signal here is that coding agent infrastructure is becoming its own category. Just as we saw API gateways, container orchestrators, and observability platforms emerge as cloud computing matured, the AI coding agent ecosystem is generating the same kind of middleware layer. The Sandbox Agent SDK sits at the intersection of two fast-moving spaces: sandbox isolation and agent API standardization.

Ironclad uses Rivet’s platform for their Contract AI assistant. On the community side, Gigacode, an experimental companion project, wires OpenCode’s terminal UI to any coding agent through the SDK.

Getting Started: From Zero to Running Agent in Five Minutes

Assuming you have an E2B account (or any Linux environment):

# Install the binary
curl -fsSL https://raw.githubusercontent.com/rivet-dev/sandbox-agent/main/install.sh | sh

# Extract credentials for your preferred agent
sandbox-agent credentials extract-env --export

# Start the server
sandbox-agent server --token "your-secret-token" --port 2468

From your application:

# List available agents
curl http://localhost:2468/agents \
  -H "Authorization: Bearer your-secret-token"

# Create a session
curl -X POST http://localhost:2468/sessions \
  -H "Authorization: Bearer your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"agent": "claude-code", "permissionMode": "auto-approve"}'

# Stream events for a session (replace {id} with your session's ID)
curl -N http://localhost:2468/sessions/{id}/events \
  -H "Authorization: Bearer your-secret-token"

The built-in inspector UI at /ui/ provides a web-based debugger for examining sessions and event payloads during development. This is useful for understanding the exact event flow before you build your parsing logic.
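When you do get to that parsing logic in HTTP mode, the events endpoint delivers Server-Sent Events, so you need something that turns raw text chunks into complete messages. The following is a minimal sketch of an incremental parser for the standard text/event-stream format. createSseParser is a hypothetical helper, and the idea that each message's data field carries a JSON-encoded event is an assumption you should verify against the OpenAPI spec or the inspector.

```typescript
interface SseMessage {
  event: string; // value of the "event:" field, "message" when absent
  data: string;  // all "data:" lines of the message joined with "\n"
}

// Incremental parser for the text/event-stream wire format.
// feed() accepts arbitrary chunks (they may split lines or messages) and
// returns the messages completed by that chunk.
// Simplification: handles "\n" and "\r\n" line endings, not bare "\r",
// and ignores the "id:" and "retry:" fields.
function createSseParser() {
  let buffer = "";
  let eventName = "";
  let dataLines: string[] = [];

  return function feed(chunk: string): SseMessage[] {
    buffer += chunk;
    const out: SseMessage[] = [];
    let idx: number;
    while ((idx = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, idx).replace(/\r$/, "");
      buffer = buffer.slice(idx + 1);

      if (line === "") {
        // A blank line terminates the current message.
        if (dataLines.length > 0) {
          out.push({ event: eventName || "message", data: dataLines.join("\n") });
        }
        eventName = "";
        dataLines = [];
        continue;
      }
      if (line.charAt(0) === ":") continue; // comment or keep-alive

      const colon = line.indexOf(":");
      const field = colon < 0 ? line : line.slice(0, colon);
      let value = colon < 0 ? "" : line.slice(colon + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // strip one space

      if (field === "event") eventName = value;
      else if (field === "data") dataLines.push(value);
    }
    return out;
  };
}

const feed = createSseParser();
// Typical wiring: decode each chunk of the HTTP response body to text,
// pass it to feed(), then JSON.parse(message.data) for every message.
```

Buffering partial lines matters here: network chunks rarely line up with message boundaries, so parsing each chunk in isolation would drop or corrupt events.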

What Is Missing (For Now)

The SDK is at v0.4.0 and still early. A few gaps worth noting:

No Python SDK yet. The TypeScript SDK works well, and the HTTP API is language-agnostic, but a native Python client is only listed on the roadmap. Python-heavy teams will have to write raw HTTP calls for now.

Agent support varies. The Claude Code and Codex integrations are solid. OpenCode support is experimental. Feature parity across agents depends on how much each agent exposes through its own interface.

No built-in cost tracking. The SDK normalizes events but does not aggregate token usage or cost across agents. You will need to build that yourself from the event stream data.
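If you do build it yourself, the aggregation can be small. The sketch below is heavily assumption-laden: it supposes that some events carry a usage object with inputTokens and outputTokens and that you tag each event with the agent that produced it. Neither field is confirmed by the SDK documentation, and the prices are placeholder numbers, so treat this as a shape to adapt rather than working integration code.

```typescript
// ASSUMPTION: events expose an agent label and an optional usage payload.
// These field names are hypothetical, not the SDK's documented schema.
interface UsageEvent {
  type: string;
  agent: string;
  data: { usage?: { inputTokens: number; outputTokens: number } };
}

// Hypothetical USD prices per million tokens; supply your provider's rates.
interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface UsageTotals {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Sum token counts and estimated cost per agent across an event log.
function aggregateUsage(
  events: UsageEvent[],
  prices: Record<string, Price>,
): Record<string, UsageTotals> {
  const totals: Record<string, UsageTotals> = {};
  for (const ev of events) {
    const usage = ev.data.usage;
    if (!usage) continue; // most events carry no usage information

    const row = totals[ev.agent] ?? { inputTokens: 0, outputTokens: 0, costUsd: 0 };
    row.inputTokens += usage.inputTokens;
    row.outputTokens += usage.outputTokens;

    const price = prices[ev.agent];
    if (price) {
      row.costUsd +=
        (usage.inputTokens * price.inputPerMTok +
          usage.outputTokens * price.outputPerMTok) / 1e6;
    }
    totals[ev.agent] = row;
  }
  return totals;
}

const usageLog: UsageEvent[] = [
  { type: "item.completed", agent: "claude-code", data: { usage: { inputTokens: 1000, outputTokens: 500 } } },
  { type: "item.delta", agent: "claude-code", data: {} },
  { type: "item.completed", agent: "claude-code", data: { usage: { inputTokens: 2000, outputTokens: 1500 } } },
];
const usageTotals = aggregateUsage(usageLog, {
  "claude-code": { inputPerMTok: 2, outputPerMTok: 10 }, // placeholder rates
});
```

Agents without a price entry still get token totals, just with a cost of zero, which keeps the report usable while you fill in rates.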

Enterprise features are Rivet-specific. Crash recovery, transcript persistence, and horizontal scaling require Rivet Actors. Running the standalone binary gives you the API normalization but not the infrastructure layer.

These are the kinds of gaps you would expect from a two-month-old project. The core value proposition, one API for multiple coding agents in sandboxes, works today.

Frequently Asked Questions

What is the Sandbox Agent SDK?

The Sandbox Agent SDK is an open-source Rust binary by Rivet that runs inside sandbox environments and exposes a universal HTTP/SSE API for controlling coding agents like Claude Code, Codex, Amp, OpenCode, Cursor, and Pi. Instead of integrating with each agent’s proprietary API, you write one integration against the SDK.

Which coding agents does the Sandbox Agent SDK support?

As of v0.4.0, the SDK supports Claude Code, Codex, OpenCode (experimental), Cursor, Amp, and Pi. Switching between agents is a configuration change, not an integration rewrite.

Which sandbox providers work with the Sandbox Agent SDK?

The SDK is sandbox-agnostic and runs inside E2B (Firecracker microVMs), Daytona (Docker containers), Docker Sandboxes, Vercel Sandboxes, or any Linux environment. It provides the agent control layer; the sandbox provider handles isolation.

How is the Sandbox Agent SDK different from E2B or Daytona?

E2B and Daytona are sandbox providers that supply isolated execution environments. The Sandbox Agent SDK is an agent control layer that runs inside those sandboxes. E2B gives you the microVM; the SDK gives you the HTTP API to start, control, and stream events from whichever coding agent runs inside it.

Is the Sandbox Agent SDK production-ready?

The SDK is at v0.4.0 and still early-stage. Core functionality (session management, event streaming, multi-agent support) works reliably. Enterprise features like crash recovery and horizontal scaling require Rivet Actors. A Python SDK is on the roadmap but not yet available.