Agent architectures by example
It's possible to cross the service boundary without rebuilding everything OpenCode provides. Depending on the use case, you may need to implement only some of the layers.
The single biggest design decision is whether you are building a stateful or a stateless agent. Statefulness can be achieved by keeping the agent "always on", hosted on a VPS for example. But that approach doesn't scale: you pay even when the agent is idle.
Alternatively, relying on ephemeral environments brings a persistence challenge: how do you preserve state when the environment is torn down?
Part 5 walks through real projects to illustrate how agents are assembled from different technical bricks, reviewing a variety of architectural choices.
Claude in the Box: the job agent
Agent Framework: Claude Agent SDK
Cloud services: Cloudflare Worker + Cloudflare Sandbox
Layers: transport + artifacts persistence
Link: github.com/craigsdennis/claude-in-the-box
Description:
- This is a job agent, not a chatbot. No conversation, no back-and-forth during execution, no session to resume.
- Use case: a job that is best performed by an agent, e.g. extracting structured data from a document.
- A ~100-line project that wraps the Claude Agent SDK.
User journey: the client sends a POST request with a prompt and stays connected. The agent's raw output streams back in real time: progress messages, tool calls, intermediate results. When the agent finishes, the Worker collects the final output files (the artifacts), stores them in KV, and returns them to the client.
Technical flow:
- The Worker receives the POST and spins up a Cloudflare Sandbox.
- The agent runs inside the sandbox using the Claude Agent SDK's query() function. It reads and writes files and runs bash commands — all within the container.
- The agent's stdout is streamed back through the Worker to the client as chunked HTTP. This is the live feed — a mix of everything the agent does.
- When the agent finishes, the Worker reads the output files (e.g. fetched.md, review.md) from the sandbox filesystem. The Worker stores them in Cloudflare KV (keyed by a cookie) so the client can retrieve them after the sandbox is destroyed.
Browser → HTTP POST
→ Cloudflare Worker (~100 lines)
→ Cloudflare Sandbox
→ Claude Agent SDK query()
← streams stdout back
→ reads artifacts → stores in KV
→ destroys sandbox
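The glue described above can be sketched in a few lines. This is an illustrative TypeScript sketch using the web-standard streams API available in Workers, not the project's actual code; `streamAgentOutput` and the shape of the stdout source are assumptions:

```typescript
// Minimal sketch (illustrative names): bridge an agent's stdout chunks into a
// ReadableStream that can back a chunked HTTP response.
function streamAgentOutput(
  stdout: AsyncIterable<string>
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      // Forward each stdout chunk to the client as soon as it arrives.
      for await (const chunk of stdout) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close(); // closing the stream ends the chunked response
    },
  });
}

// Inside a Worker fetch handler, the stream would back the Response directly:
//   return new Response(streamAgentOutput(agentStdout), {
//     headers: { "content-type": "text/plain; charset=utf-8" },
//   });
```

Note what is missing: nothing is written back to the stream, which is exactly why this transport is watch-only.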
Highlight: why does Cloudflare require two layers, Worker + Sandbox?
Cloudflare Workers are like application "valets":
- They are the front door for internet traffic (they handle HTTP requests) and decide what to do / which services to call. In technical terms, they route, orchestrate, and connect to Cloudflare services like KV and Durable Objects.
- Additional benefit: Workers sleep between requests and bill only for the time they run — cheap and instant.
- Limitation: they run in a V8 isolate — a lightweight JavaScript sandbox with no filesystem, no shell, and a 30-second CPU time limit. They cannot run the Claude Agent SDK.
The Sandbox is the opposite:
- It is a full Ubuntu container with bash, Node.js, a filesystem, and no time limit — everything the agent needs.
- But it has no public URL. It cannot receive requests from the internet or talk to Cloudflare services directly.
Neither can do the whole job alone. The Worker provides the service boundary (HTTP endpoint, streaming, artifact storage). The Sandbox provides the execution environment (bash, filesystem, long-running agent). The ~100 lines of glue between them wire up the HTTP endpoint, bridge the stream, and collect artifacts.
Server layers implementation
| Layer | Status | Implementation |
|---|---|---|
| Authentication | Skipped | Anyone can call the endpoint. |
| Network resilience | Skipped | If the connection drops, the work is lost. |
| Transport | Implemented (minimal) | Chunked HTTP streaming — the user watches progress in real time, but cannot send anything back. |
| Routing | Skipped | No session IDs, no conversations to switch between. Each request is independent. |
| Persistence | Partial | Final artifacts only (stored in KV). No conversation history, no ability to resume. |
| Lifecycle | Skipped | The agent dies with the request. Close the tab and the work stops. |
sandbox-agent: the adapter
Agent Framework: Agent-agnostic (supports Claude Code, Codex, OpenCode, Amp)
Cloud services: None — runs inside any sandbox (designed to be embedded)
Layers: transport + partial routing
Link: github.com/rivet-dev/sandbox-agent
Description:
- This is a transport adapter. It solves one problem (giving every coding agent a unified HTTP+SSE transport) and leaves everything else to the consumer.
- Use case: when a developer wants to deploy a variety of coding agents in sandboxes, this provides a built-in transport solution. The developer doesn't need to understand each agent's native protocol, and doesn't need to change anything when switching sandbox providers.
Technical flow:
- The daemon starts inside a sandbox and listens on an HTTP port.
- The client creates a session via REST, specifying which agent to run (Claude Code, Codex, OpenCode, Amp).
- The daemon spawns the agent process and translates its native protocol into a universal event schema with sequence numbers.
- Events stream to the client over SSE.
- When the agent needs approval (e.g. to run a bash command), the daemon converts the blocking terminal prompt into an SSE event. The client replies via a REST endpoint.
- If the client disconnects, it reconnects and resumes from the last-seen sequence number.
Your App (anywhere)
| HTTP + SSE
v
+--[sandbox boundary]-------------------+
| sandbox-agent (Rust daemon) |
| claude | codex | opencode |
| [filesystem, bash, git, tools...] |
+----------------------------------------+
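The reconnect-and-resume step can be sketched with a sequence-numbered event log. This is an illustrative TypeScript model, not sandbox-agent's actual Rust internals; the names `AgentEvent` and `EventLog` are assumptions:

```typescript
// Illustrative model of a resumable event stream: every event gets a
// monotonically increasing sequence number, and a reconnecting client
// replays whatever it missed.
interface AgentEvent {
  seq: number;
  type: string;
  payload: unknown;
}

class EventLog {
  private events: AgentEvent[] = [];
  private nextSeq = 1;

  // Called by the daemon each time the agent emits something.
  append(type: string, payload: unknown): AgentEvent {
    const event: AgentEvent = { seq: this.nextSeq++, type, payload };
    this.events.push(event);
    return event;
  }

  // Called on reconnect: return every event after the client's last-seen seq.
  since(lastSeen: number): AgentEvent[] {
    return this.events.filter((e) => e.seq > lastSeen);
  }
}
```

The design choice worth noting: resumability lives entirely on the server side, so any client that remembers a single integer can recover.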
Highlight: the Transport layer
Transport is how a client and a server exchange data over a network. There is a spectrum of transport modes, from simplest to most capable:
| Mode | What the user experiences | Interaction | Reconnection |
|---|---|---|---|
| HTTP request/response | Submit a task, wait, get the full result when done. No progress updates while the agent works. | One-shot. | N/A. |
| Chunked HTTP streaming | Submit a task, watch the agent's output stream in real time — like a terminal in the browser. | Watch only — the user cannot send input mid-stream. | None. Connection drops = work lost. |
| Server-Sent Events (SSE) | Same real-time streaming, but the connection survives drops. The browser reconnects automatically and resumes from the last event. | Watch + interact via separate requests (e.g. approve a command via a button click). | Built-in (automatic). |
| WebSocket | Full interaction while the agent works — approve commands, provide context, cancel tasks. Multiple users can watch the same session. | Bidirectional, real-time. | Application must implement. |
Claude-in-the-Box uses chunked HTTP streaming. sandbox-agent outputs SSE. Ramp Inspect uses WebSocket. Each step up adds capability and complexity.
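The jump from chunked HTTP to SSE is mostly a wire-format change: SSE events can carry an `id:` field, and the browser's `EventSource` automatically resends the last id it saw in a `Last-Event-ID` header when it reconnects, letting the server replay the gap. A minimal sketch of the server-side formatting (`formatSseEvent` is an illustrative name):

```typescript
// SSE wire format: an optional "id:" line, one or more "data:" lines, and a
// blank-line terminator. The id is what makes automatic resume possible.
function formatSseEvent(id: number, data: unknown): string {
  return `id: ${id}\ndata: ${JSON.stringify(data)}\n\n`;
}
```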
Now, the agents that sandbox-agent supports speak different native protocols — none of which are network transports:
- JSONL on stdout — Claude Code and Amp run as child processes, spawned per message. They write one JSON object per line to stdout.
- JSON-RPC over stdio — Codex runs a persistent server process (codex app-server) that communicates via structured JSON-RPC requests and responses over stdin/stdout. Still a local process — not network-accessible.
- HTTP server — OpenCode already runs its own HTTP+SSE server (see Part 4). It is network-accessible without translation. For OpenCode, sandbox-agent is not necessary.
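The JSONL-on-stdout case hides a subtlety: stdout arrives in arbitrary chunks, so one JSON object can be split across reads. A minimal line-buffering parser (illustrative TypeScript, not the daemon's actual Rust code):

```typescript
// Line-buffering JSONL parser: buffer incoming chunks until a newline
// completes a line, then parse the line as one JSON message.
function createJsonlParser(onMessage: (msg: unknown) => void) {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for later
    for (const line of lines) {
      if (line.trim().length > 0) onMessage(JSON.parse(line));
    }
  };
}
```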
Server layers implementation
| Layer | Status | Implementation |
|---|---|---|
| Authentication | Skipped | Runs inside a sandbox — assumes the sandbox boundary provides isolation. |
| Network resilience | Partial | SSE sequence numbers allow clients to reconnect and resume from last-seen event. |
| Transport | Implemented | HTTP + SSE — structured event stream with sequence numbers for reconnection. REST endpoints for approvals/cancellation. |
| Routing | Partial | In-memory session management — multiple sessions per daemon, but no persistent session registry. |
| Persistence | None | If the daemon crashes or the sandbox is destroyed, there is no way to recover or reconnect to a conversation. |
| Lifecycle | Minimal | Agent process managed by the daemon, but no background continuation beyond the sandbox's lifetime. |
Ramp Inspect — the full production stack
Agent Framework: OpenCode
Cloud services: Modal Sandbox VMs + Cloudflare Durable Objects + Cloudflare Workers
Layers: transport + routing + persistence + lifecycle + authentication + network resilience (all layers)
Link: builders.ramp.com/post/why-we-built-our-background-agent
Description: Ramp's internal background coding agent that creates pull requests from task descriptions. Reached ~30% of all merged PRs within months.
User journey: an engineer describes a task in Slack, the web UI, or a Chrome extension. The agent works in the background — the engineer can close the tab, switch clients, come back later from a different device. When done, the agent posts a PR or a Slack notification. Multiple engineers can watch the same session simultaneously.
Technical flow:
- Each task gets a session — one session = one Durable Object + one Modal VM + one conversation. The session ID is the permanent address for the task.
- The client connects via WebSocket to a Cloudflare Worker, which routes the connection to the session's Durable Object.
- The DO is the hub: it holds WebSocket connections from all clients watching this session, stores conversation history in embedded SQLite, and forwards messages to the Modal VM. When the agent produces output, the DO broadcasts it to every connected client.
- The VM runs OpenCode with a full dev environment: git, npm, pytest, Postgres, Chromium, Sentry integration.
- The agent works independently of any client connection. If all clients disconnect, the VM keeps running.
- On completion, the agent posts results via Slack notification or GitHub PR.
- Modal VMs have a 24-hour maximum TTL. Before the VM is terminated, its state is captured through Modal's snapshot API — a full point-in-time capture of the filesystem (code, dependencies, build artifacts, environment). The snapshot can be restored into a fresh VM days later.
Clients (Slack, Web UI, Chrome Extension, VS Code)
→ Cloudflare Workers
→ Durable Object (per-session: SQLite, WebSocket Hub, Event Stream)
→ Modal Sandbox VM (OpenCode agent, full dev environment)
Highlight: Durable Objects as the coordination layer
In Part 4, we saw that OpenCode is a single-server agent — it has session management, persistence, and transport, but all scoped to one machine. To make it globally accessible, you need global routing, persistent state that survives restarts, and WebSocket management across clients. This is the gap Ramp filled with Durable Objects.
A Durable Object is a stateful micro-server with a globally unique ID (while Workers are stateless). Any request from anywhere in the world can reach a specific DO by its ID — Cloudflare routes it automatically. Each DO has its own embedded SQLite database (up to 10 GB), and it can hold WebSocket connections. It runs single-threaded, which matches the agent pattern: one session = one sequential execution context.
What makes DOs useful for agents specifically:
- Global routing without a registry. The DO ID is the session address. No load balancer, no session-affinity configuration, no lookup table. A client in Tokyo and a client in New York both reach the same DO by passing the same ID.
- State that survives hibernation. When no clients are active, the DO hibernates — it is evicted from memory but the WebSocket connections are kept alive at Cloudflare's edge, and the SQLite data persists. Billing stops. When a client sends a message, the DO wakes up, the message is delivered, and processing continues. The client does not know the DO was hibernating.
- Re-attach for free. If a client actually disconnects (browser closed, network drop), a new connection to the same DO ID restores the session. The conversation history is in SQLite. Cloudflare's Agents SDK (which builds on DOs) goes further: it automatically syncs state on reconnection and can resume streaming from where it left off.
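The "DO ID is the session address" pattern can be sketched as follows. The Cloudflare calls (`idFromName`, `get`) are real Durable Object APIs but are shown as comments since they only run inside a Worker; the `SESSIONS` binding name and the URL scheme are hypothetical:

```typescript
// Pure helper (hypothetical URL scheme): extract the session id from a path
// like /sessions/<id>/ws. The session id is the only routing key needed.
function sessionIdFromPath(pathname: string): string | null {
  const match = pathname.match(/^\/sessions\/([^\/]+)/);
  return match ? match[1] : null;
}

// In the Worker's fetch handler (Cloudflare-specific, shown as comments;
// SESSIONS is a hypothetical Durable Object binding):
//   const sessionId = sessionIdFromPath(new URL(request.url).pathname);
//   const id = env.SESSIONS.idFromName(sessionId); // same name => same DO, globally
//   return env.SESSIONS.get(id).fetch(request);    // Cloudflare routes to that DO
```

No registry, no lookup table: the deterministic name-to-ID mapping is the entire routing layer.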
Why a Modal VM is required on top of the DO:
A DO is a lightweight JavaScript runtime — it cannot run bash, access a filesystem, or execute agent tools. It is the coordination layer (routing, state, WebSocket), not the execution layer. Code execution happens in a separate VM or container. This is why Ramp pairs DOs with Modal VMs: the DO routes and remembers, the VM computes.
Server layers implementation
| Layer | Status | Implementation |
|---|---|---|
| Authentication | Internal only | Restricted to Ramp employees — no public access. |
| Network resilience | Implemented | WebSocket with DO hibernation — connections survive idle periods, clients reconnect seamlessly. |
| Transport | Implemented | WebSocket — bidirectional, real-time, multiple clients connect to the same session simultaneously. |
| Routing | Implemented | Cloudflare Durable Objects — per-session, globally routed, guaranteed affinity by session ID. |
| Persistence | Implemented (two layers) | DO SQLite for conversation state + Modal snapshots for full VM state (code, deps, environment). |
| Lifecycle | Implemented (full) | Agent survives client disconnection — background continuation is the core design principle. |
Cloudflare Moltworker — the platform provides the layers
Agent engine: Pi SDK (LLM abstraction + core agent loop)
Agent product: OpenClaw (personal AI assistant built on Pi SDK — multi-channel gateway, session management, skills platform)
Cloud services: Cloudflare Worker + Durable Objects + Sandbox + R2 + AI Gateway
Layers: ALL (transport, routing, persistence, lifecycle, authentication, network resilience)
Link: github.com/cloudflare/moltworker — blog: blog.cloudflare.com/moltworker-self-hosted-ai-agent
Description:
- OpenClaw (previously Moltbot, ex-Clawbot, ex-Clawdis) has been all the rage since January: a personal assistant that you can work with from your messaging app. There are different hosting options, the first being your own computer or a VPS. The Cloudflare Moltworker project provides an option to deploy it on the Cloudflare ecosystem.
- The stack has three layers: Pi SDK provides the agent engine (LLM calls, tool execution, agent loop). OpenClaw builds a complete personal assistant on top of Pi — multi-channel inbox (WhatsApp, Telegram, Slack, Discord), its own session management, a skills platform, and companion apps. Moltworker is the deployment layer — it packages OpenClaw into a Cloudflare container, handles authentication (Cloudflare Access), persists state to R2, and proxies requests from the internet to the agent.
User journey: the user accesses their agent via a browser, protected by Cloudflare Access (Zero Trust). They chat with the agent, which can browse the web, execute code, and remember context across sessions. They can close the browser and come back — conversations persist. The agent can also run autonomously on a cron schedule with no client connected at all.
Technical flow:
- The browser connects through Cloudflare Access, which enforces identity-based authentication before any request reaches the application.
- The Worker receives the request and routes it to the appropriate Durable Object instance.
- The Durable Object establishes a WebSocket connection with the client and manages the container lifecycle — same pattern as Ramp (DO → compute), but here the compute is a Cloudflare Container instead of a Modal VM.
- The container (a full Linux VM) runs the OpenClaw agent. It has an R2 bucket mounted at /data/moltbot via s3fs for persistent storage.
- When the user goes idle, the container sleeps (configurable via sleepAfter). The Durable Object hibernates without dropping the WebSocket.
- On the next message, the DO wakes, the container restarts, and the R2 mount provides continuity — session memory and artifacts survive the restart.
Internet → Cloudflare Access (Zero Trust)
→ Worker (V8 isolate, API router)
→ Durable Object (routing, state, WebSocket)
→ Container (Linux VM, managed via Sandbox)
→ /data/moltbot → R2 Bucket (via s3fs)
→ OpenClaw (Pi SDK agent)
Highlight: how persistence works with ephemeral compute
Both Ramp and Moltworker face the same problem: the agent runs in an ephemeral machine (Modal VM or Cloudflare Container) that will eventually be destroyed. How do you keep state across restarts?
The two projects made different design decisions:
- With Modal and its snapshot feature, the full state of the VM is saved and restored. There is no need to decide ahead of time what information must be saved.
- Cloudflare Containers don't have an equivalent feature, so Moltworker adds its own persistence layer: the agent gets a kind of virtual drive backed by a Cloudflare R2 bucket (a storage product similar to AWS S3). Part of the filesystem (mounted at /data/moltbot) is saved automatically, but not all of it.
| | Ramp (Modal) | Moltworker (Cloudflare) |
|---|---|---|
| What dies | VM is terminated after 24-hour TTL | Container filesystem is wiped on sleep |
| Conversation state | Stored in Durable Object (SQLite) — survives VM restarts | Stored in Durable Object (SQLite) — survives container restarts |
| Code, deps, environment | Modal snapshot API — full point-in-time capture of the VM filesystem. Taken before termination, restored into a fresh VM later. | R2 bucket mounted at /data/moltbot via s3fs — everything written there survives. No snapshot, just continuous persistence. |
| What survives | Everything (full VM state frozen and restored) | Only what's explicitly written to /data/moltbot |
| What's lost | Nothing (if snapshotted before termination) | Anything on the container filesystem outside the R2 mount |
| Trade-off | Full fidelity but requires snapshot orchestration | Simpler but selective — you must design for it |
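The Moltworker side of the trade-off ("you must design for it") boils down to a path rule. A deliberately trivial sketch, assuming the /data/moltbot mount point from the table:

```typescript
// Illustrative rule for the Moltworker design: only paths under the R2 mount
// survive a container restart; everything else is wiped with the filesystem.
const PERSISTENT_ROOT = "/data/moltbot";

function survivesRestart(path: string): boolean {
  return path === PERSISTENT_ROOT || path.startsWith(PERSISTENT_ROOT + "/");
}
```

The rule is trivial, but the agent has to follow it everywhere: a dependency installed outside the mount, or a scratch file in /tmp, silently disappears on the next wake.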
Server layers implementation
| Layer | Status | Implementation |
|---|---|---|
| Authentication | Implemented | Cloudflare Access (Zero Trust) — identity-based access control before any request reaches the application. |
| Network resilience | Implemented | DO hibernation keeps WebSocket alive during idle periods. Container wakes on next message. |
| Transport | Implemented | WebSocket (via Durable Objects) + HTTP API for the entrypoint Worker. |
| Routing | Implemented | Durable Object instance IDs — globally routable, all requests for same ID reach the same location. |
| Persistence | Implemented | Multi-layer: DO SQLite for conversation, R2 bucket mounted via s3fs for artifacts and session memory. |
| Lifecycle | Implemented | Agent survives client disconnection. DO hibernates. Containers sleep/wake. Cron enables autonomous runs. |
What to keep in mind
- Not every use case needs all the layers. Claude in the Box ships a useful product with just HTTP streaming and KV storage.
- Transport is a spectrum — pick the simplest that fits. Chunked HTTP for job agents (Claude in the Box), SSE for streaming with reconnection (sandbox-agent), WebSocket for bidirectional interaction and multiplayer (Ramp, Moltworker). Each step up adds capability and complexity.
- Background continuation requires decoupling the agent from the HTTP handler. The agent runs in its own process or container, not inside the request.
- Statefulness is the main design choice and the principal source of complexity: resumable conversations require persistent routing (so the client finds the right session) plus storage and coordination layers that outlive the agent's execution environment.