Recommendation

Make Cloudflare Containers the default place Anvil writes code.

Keep Superset as an explicit handoff tool for heavyweight debugging, human takeover, or workflows that need a full developer machine. The normal path should be a Cloudflare-hosted code session with a Forge worktree, Codex, GitHub, Slack progress, verification, and PR creation wired together by the Anvil worker.

Target Architecture

This mirrors the strongest Ramp Inspect pattern: each task gets a sandbox with the repo, dependencies, internal tools, verification, screenshots, Slack and GitHub integration, and enough session state to resume without involving a laptop.

Ingress: Slack, GitHub, Linear

Incoming tasks keep their thread, requester, repo, branch, and permission context.

Control plane: Anvil Think Gateway

Classifies task, enforces policy, chooses Cloudflare runner or Superset tool.

Session actor: Durable Object

Owns run journal, WebSocket/live status, callback auth, and container lifecycle.

Execution: Cloudflare Container

Runs Codex against a Forge worktree, executes tests, captures diffs and artifacts.

Outputs: PR, Slack, artifacts

Worker-owned tools create branches, PRs, comments, screenshots, and final updates.
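
One way to keep thread context attached to a single session actor is to derive the actor's name deterministically from the originating thread. The sketch below is illustrative only: the TaskContext shape, its field names, and the key format are assumptions, not part of the existing Anvil code.

```typescript
// Hypothetical shape of the context an incoming task carries.
interface TaskContext {
  source: "slack" | "github" | "linear";
  workspaceId: string; // e.g. a Slack team ID
  channelId: string;
  threadId: string;    // e.g. a Slack thread timestamp
  requester: string;
  repo: string;
  branch: string;
}

// Derive a stable session name from the thread, not from the requester,
// so any later reply in the same thread maps to the same session.
// Each part is URI-encoded so the ":" separator stays unambiguous.
function sessionNameFor(ctx: TaskContext): string {
  const parts = [ctx.source, ctx.workspaceId, ctx.channelId, ctx.threadId];
  return parts.map((part) => encodeURIComponent(part)).join(":");
}
```

In a Worker, a name like this could be passed to the Durable Object namespace's idFromName so that a follow-up message reaches the same actor.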

Boundary Decision

Do not move all of Superset into Cloudflare. Move the default autonomous code execution path. Leave Superset as a deliberate action the agent can call when the task needs a human-visible workstation or broader local environment.

Cloudflare runner handles normal coding work.

Branch creation, repo edits, tests, typechecks, lint, PR creation, review response, screenshots through a browser service, and Slack follow-up.

Superset remains a tool.

The Think gateway exposes create_superset_workspace for tasks that explicitly request it, need Mac-only tools, need a persistent GUI handoff, or exceed Cloudflare container limits.
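
The routing rule above could be sketched as a small pure function in the gateway. The TaskSignals fields are hypothetical; how Anvil would actually detect each condition is not specified here.

```typescript
// Hypothetical task signals; field names are illustrative, not Anvil's schema.
interface TaskSignals {
  requestsSuperset: boolean;       // the task explicitly asks for Superset
  needsMacOnlyTools: boolean;
  needsPersistentGui: boolean;     // a human needs a long-lived GUI handoff
  exceedsContainerLimits: boolean;
}

type Runner = "cloudflare" | "superset";

interface RoutingDecision {
  runner: Runner;
  reasons: string[]; // why Superset was chosen; empty on the default path
}

// Default to the Cloudflare runner; choose Superset only when a listed
// reason applies, and keep the reasons so the decision can be audited.
function routeTask(signals: TaskSignals): RoutingDecision {
  const reasons: string[] = [];
  if (signals.requestsSuperset) reasons.push("explicit request");
  if (signals.needsMacOnlyTools) reasons.push("Mac-only tools");
  if (signals.needsPersistentGui) reasons.push("persistent GUI handoff");
  if (signals.exceedsContainerLimits) reasons.push("exceeds container limits");
  return { runner: reasons.length > 0 ? "superset" : "cloudflare", reasons };
}
```

Returning the reasons alongside the runner makes it cheap to report in Slack why a task left the default path.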

Forge runtime patterns should be reused.

The existing Forge worker runtime already separates Worker-held credentials from a DB-free runtime container. Anvil should follow the same callback-token, event-journal, R2/archive, and narrow proxy model.
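
As a rough illustration of the callback-token and event-journal pattern, the sketch below keeps an append-only journal that accepts events only from a caller holding the session's token. It is not Forge's implementation: the class, its methods, and the plain string token are assumptions standing in for whatever the Forge runtime does today.

```typescript
interface JournalEvent {
  seq: number;    // increases by one per event within a session
  atMs: number;
  kind: string;   // e.g. "log", "diff", "status" (illustrative values)
  detail: string;
}

class RunJournal {
  private readonly callbackToken: string;
  private readonly events: JournalEvent[] = [];

  constructor(callbackToken: string) {
    this.callbackToken = callbackToken;
  }

  // Called when the container posts back. Callers without the token are
  // rejected. A real check would verify a signed token, not compare strings.
  append(token: string, kind: string, detail: string, atMs: number): JournalEvent {
    if (token !== this.callbackToken) {
      throw new Error("callback rejected: invalid token");
    }
    const event: JournalEvent = { seq: this.events.length + 1, atMs, kind, detail };
    this.events.push(event);
    return event;
  }

  // Events after a given sequence number, for clients that reconnect.
  since(seq: number): JournalEvent[] {
    return this.events.filter((event) => event.seq > seq);
  }
}
```

Sequence numbers let a status page or Slack updater resume from the last event it saw instead of replaying the whole run.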

Run Lifecycle

1. Admit

Persist a task record first, attach Slack thread metadata, map the requester to repo permissions, and reserve capacity before starting a container.

2. Prepare worktree

Prefer an Artifacts repo fork per task. Fall back to a warm container image or R2 session snapshot with a fresh GitHub App installation token.

3. Execute

Start the per-session container, mount the worktree, run Codex, and route privileged operations through Worker-owned MCP/tools.

4. Verify and publish

Run requested checks, capture diffs/screenshots/logs, push a branch or PR, and send the final Slack update with evidence.

5. Resume

If the user replies later, wake the same Durable Object, restore the Artifacts repo/session archive, and continue with the same thread and branch.
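
The five steps above can be read as a small state machine that the session actor enforces. This is a minimal sketch: the step names and the choice that a resume goes straight back to execution are assumptions made for illustration.

```typescript
type Step = "admit" | "prepare" | "execute" | "verify_publish" | "resume";

// Allowed next steps for each lifecycle step. A resume restores the
// archived worktree and then continues executing on the same branch.
const NEXT_STEPS: Record<Step, Step[]> = {
  admit: ["prepare"],
  prepare: ["execute"],
  execute: ["verify_publish"],
  verify_publish: ["resume"],
  resume: ["execute"],
};

// Move to the next step, refusing transitions the lifecycle does not allow
// (for example, executing before the task has been admitted and prepared).
function advance(current: Step, next: Step): Step {
  if (NEXT_STEPS[current].indexOf(next) === -1) {
    throw new Error(`illegal transition: ${current} -> ${next}`);
  }
  return next;
}
```

Keeping the table in one place makes it harder for a retry or a duplicate webhook to start a container for a task that was never admitted.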

Cloudflare Product Map

Use the smallest set that makes the runner durable and safe. The core runtime is Workers, Durable Objects, Containers, Artifacts, R2, Queues, Workflows, Secrets Store, and AI Gateway.

Workers (Core)

Anvil ingress, Think gateway, policy checks, tool proxying, GitHub/Slack webhooks, and public status endpoints.

Durable Objects (Core)

One actor per code session for serialization, live state, event journal, WebSockets, callbacks, and container lifecycle.

Containers (Core)

Linux execution environment for Codex, repo tools, package managers, tests, and build commands.

Artifacts (Beta / request access)

Git-compatible, versioned file tree per task. Ideal for isolated worktrees, diffs, resumes, and handoff to Git-aware tools.

ArtifactFS (Beta follow-up)

Mount large repos without paying the full clone cost at startup. Start with Git clone if Forge remains fast enough.

R2 (Core)

Session archives, logs, screenshots, tarballs, status snapshots, and warm-cache artifacts when Artifacts is unavailable.

Queues (Use with admission)

Backpressure and retry buffer between incoming tasks and code-session dispatch.

Workflows (Good fit)

Durable multi-step orchestration for prepare, run, verify, PR, wait-for-human, and resume flows.

AI Gateway (Control plane)

LLM usage telemetry, rate limits, retries, fallbacks, and spend controls for Codex/model calls.

Secrets Store (Security)

Centralize GitHub App, Slack, Superset, and model secrets; inject only short-lived scoped credentials into sessions.

Browser Run (Frontend checks)

Headless UI verification and screenshots without stuffing a browser into every code container.

D1 or DO SQLite (Metadata)

Use for Anvil metadata only if existing storage is not enough. Do not duplicate Forge product state unnecessarily.

Rollout Plan

Build in thin vertical slices. The first useful milestone is a Cloudflare container that can clone Forge, run Codex, return a diff, and post status back to the same Slack thread.

Phase 0: Inventory and route flag

Document the current Anvil Think harness, Superset handoff endpoint, Slack reaction/thread code, GitHub App credentials, and job state. Add a task policy flag: runner = cloudflare | superset, defaulting to Cloudflare only for a small allowlist.
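
A minimal sketch of how the route flag might resolve, assuming an explicit flag always wins and the allowlist only changes the default. What the allowlist is keyed on (repo, requester, or task type) is not decided above, so the key here is an opaque string.

```typescript
type RunnerFlag = "cloudflare" | "superset";

// Resolve the runner for a task. An explicit flag wins; otherwise the
// default is Cloudflare only for allowlisted keys and Superset for the rest.
function resolveRunner(
  allowlistKey: string,
  explicitFlag: RunnerFlag | undefined,
  cloudflareAllowlist: string[],
): RunnerFlag {
  if (explicitFlag !== undefined) return explicitFlag;
  return cloudflareAllowlist.indexOf(allowlistKey) !== -1 ? "cloudflare" : "superset";
}
```

Growing the allowlist then becomes the rollout lever, with no code change needed to move more work onto the Cloudflare path.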

Phase 1: Cloudflare runner spike

Add an Anvil code-session Durable Object and container image with Codex, pnpm, GitHub CLI, Node, and Forge repo prerequisites. Clone Forge, run a no-op or small lint command, stream logs, and archive the workspace to R2.

Phase 2: PR-capable MVP

Use a GitHub App token per session, push a branch, open a PR, post Slack updates, and expose Superset as an explicit fallback tool. Keep all Slack/GitHub writes Worker-mediated.

Phase 3: Artifacts-backed worktrees

Request Cloudflare Artifacts beta access, create one repo per task, fork/import Forge, test ArtifactFS versus clone speed, and use Artifacts as the resume/diff boundary.

Phase 4: Productize background sessions

Add web status UI, multiplayer session links, screenshot/live preview workflow, model and cost controls via AI Gateway, and org-level capacity and security policy.

Design Guardrails

No long-lived secrets in containers.

Containers receive callback JWTs and short-lived GitHub tokens. Slack, Superset, GitHub App private keys, model keys, and Forge DB credentials stay in Worker/Secrets Store.
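
One way to enforce this guardrail is to build the container environment through a single function that fails closed. The sketch below is an assumption-heavy illustration: the key names CALLBACK_JWT and GITHUB_TOKEN, the ScopedCredential shape, and the TTL ceiling are all hypothetical.

```typescript
// Hypothetical names for the only credentials a container may receive.
const CONTAINER_ALLOWED_KEYS = ["CALLBACK_JWT", "GITHUB_TOKEN"];

interface ScopedCredential {
  value: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

// Build the container environment, failing closed on anything that is not
// an allowlisted, unexpired, short-lived credential.
function buildContainerEnv(
  creds: Record<string, ScopedCredential>,
  nowMs: number,
  maxTtlMs: number,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of Object.keys(creds)) {
    const cred = creds[key];
    if (cred === undefined) continue;
    if (CONTAINER_ALLOWED_KEYS.indexOf(key) === -1) {
      throw new Error(`refusing to inject ${key} into a container`);
    }
    const ttlMs = cred.expiresAtMs - nowMs;
    if (ttlMs <= 0) throw new Error(`${key} is already expired`);
    if (ttlMs > maxTtlMs) throw new Error(`${key} lives too long for a session`);
    env[key] = cred.value;
  }
  return env;
}
```

Throwing instead of silently dropping a key makes a misconfigured session fail before the container starts, which is easier to notice than a leaked secret.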

Durable intake before container start.

Persist queued/admissible work and acquire capacity before starting external compute. This matches Forge's existing lease-first container workflow guidance.
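
The ordering can be sketched as "write the record, then try to lease, then start compute". This is not Forge's lease implementation: the AdmissionState object is an in-memory stand-in for durable storage, and all names are hypothetical.

```typescript
type AdmissionStatus = "queued" | "leased";

interface TaskRecord {
  taskId: string;
  status: AdmissionStatus;
}

// In-memory stand-in for durable storage (for example Durable Object storage).
interface AdmissionState {
  records: Record<string, TaskRecord>;
  maxConcurrent: number;
}

function leasedCount(state: AdmissionState): number {
  return Object.keys(state.records).filter(
    (id) => state.records[id]?.status === "leased",
  ).length;
}

// Persist first, then try to lease. Callers start a container only when the
// returned record is "leased". Re-admitting the same task is idempotent.
function admit(state: AdmissionState, taskId: string): TaskRecord {
  let record: TaskRecord | undefined = state.records[taskId];
  if (record === undefined) {
    record = { taskId, status: "queued" };
    state.records[taskId] = record; // the write happens before any compute
  }
  if (record.status === "queued" && leasedCount(state) < state.maxConcurrent) {
    record.status = "leased";
  }
  return record;
}

// Free the lease when the session ends so a queued task can be admitted.
function release(state: AdmissionState, taskId: string): void {
  delete state.records[taskId];
}
```

Because a task that cannot get a lease stays recorded as queued, a redelivered Slack or GitHub event finds the existing record instead of starting a second container.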

Explicit egress and tool mediation.

Use container outbound interception where possible; require Worker-side tool calls for privileged actions and structured audit events for every write.
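
A minimal sketch of Worker-side mediation, assuming every privileged write goes through one wrapper that records a structured audit event. The ToolCall fields and tool names are hypothetical, and the wrapper is synchronous only to keep the example short; real tool calls would be asynchronous.

```typescript
interface ToolCall {
  sessionId: string;
  tool: string;   // e.g. "github.create_pr" (hypothetical tool name)
  target: string; // what is being written to
  atMs: number;
}

interface AuditEvent extends ToolCall {
  outcome: "ok" | "denied" | "error";
}

// Run a privileged write on the Worker side and record exactly one audit
// event for it, whether it succeeds, is denied, or throws.
function mediateWrite<T>(
  audit: AuditEvent[],
  allowedTools: string[],
  call: ToolCall,
  action: () => T,
): T {
  if (allowedTools.indexOf(call.tool) === -1) {
    audit.push({ ...call, outcome: "denied" });
    throw new Error(`tool not allowed: ${call.tool}`);
  }
  let result: T;
  try {
    result = action();
  } catch (err) {
    audit.push({ ...call, outcome: "error" });
    throw err;
  }
  audit.push({ ...call, outcome: "ok" });
  return result;
}
```

Recording denials and errors as well as successes keeps the audit trail useful when investigating what a session tried to do, not only what it managed to do.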

Known Risks

Do not assume every Superset workflow moves on day one.

Cloudflare Containers are a good fit for code execution, but some Forge tasks may need Docker-in-Docker, Mac-only tools, long interactive browser sessions, unusual network access, or larger persistent disks. Those should route to Superset until proven in the runner.

Artifacts is still beta.

Treat Artifacts as the target substrate, not the first hard dependency. Start with clone plus R2 session archive; switch once beta access and performance are validated.

Cold start and dependency install time can kill adoption.

Borrow Ramp's lesson: prebuild images or snapshots on a schedule so startup time is mostly model time, not clone and install time.

Verification has to be first-class.

A code runner that only edits files is weaker than Superset. It needs tests, lint, typecheck, browser screenshots, logs, and a clear “could not verify” path.
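
The "could not verify" path is easier to keep honest when it is a distinct verdict rather than the absence of a failure. The sketch below assumes hypothetical check names and a three-way verdict; neither is an existing Anvil type.

```typescript
type CheckStatus = "passed" | "failed" | "skipped";

interface CheckResult {
  name: string; // e.g. "test", "lint", "typecheck", "screenshot"
  status: CheckStatus;
}

type Verdict = "verified" | "failed" | "could_not_verify";

interface VerificationSummary {
  verdict: Verdict;
  failed: string[];
  unverified: string[]; // requested checks with no passing or failing result
}

// A run is "verified" only when every requested check passed and at least
// one check actually ran. Skipped or missing checks never count as passing.
function summarizeVerification(
  requested: string[],
  results: CheckResult[],
): VerificationSummary {
  const failed = results.filter((r) => r.status === "failed").map((r) => r.name);
  const passed = results.filter((r) => r.status === "passed").map((r) => r.name);
  const unverified = requested.filter(
    (name) => passed.indexOf(name) === -1 && failed.indexOf(name) === -1,
  );
  let verdict: Verdict = "verified";
  if (failed.length > 0) verdict = "failed";
  else if (unverified.length > 0 || passed.length === 0) verdict = "could_not_verify";
  return { verdict, failed, unverified };
}
```

The final Slack update can then list the unverified checks by name instead of implying that a quiet run was a clean one.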

What I Would Build First

One end-to-end task type: Slack mention to Cloudflare diff.

Trigger from Slack, create a code session DO, start the container, clone Forge, run Codex on a tiny branch-safe prompt, stream logs to Slack, archive the diff to R2, and shut down.

Then add GitHub PR creation.

Once the runner can reliably produce diffs and verification output, wire in branch push and PR creation through the Worker gateway.

Then optimize startup with Artifacts or snapshots.

Measure cold clone/install time first. If it is more than a small fraction of total runtime, add warm images and evaluate ArtifactFS.
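
The measurement could be as simple as the sketch below, which averages the share of each run spent on clone and install. The RunTimings shape is hypothetical, and the threshold is deliberately a parameter because no specific cutoff has been chosen.

```typescript
interface RunTimings {
  cloneMs: number;
  installMs: number;
  totalMs: number; // wall-clock time for the whole session
}

// Fraction of a run spent on clone and install rather than model or test time.
function startupFraction(t: RunTimings): number {
  return t.totalMs > 0 ? (t.cloneMs + t.installMs) / t.totalMs : 0;
}

// Average the fraction over measured runs and compare it with a threshold.
function shouldAddWarmImages(runs: RunTimings[], threshold: number): boolean {
  if (runs.length === 0) return false; // no measurements, no decision
  const sum = runs.reduce((acc, run) => acc + startupFraction(run), 0);
  return sum / runs.length > threshold;
}
```

For example, two ten-minute runs that spend three and two minutes on startup average 25% startup time, which would trip a 10% threshold.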

Source Notes

  • Ramp Inspect: sandboxed VM per session, prebuilt repo images and snapshots, Slack/GitHub/preview workflows, and ~30% of merged frontend/backend PRs written by Inspect, per the Ramp Builders article.
  • Cloudflare Containers: serverless containers controlled from Workers; useful for full filesystem, Linux-like environments, and resource-intensive code. (Cloudflare docs)
  • Durable Object Container API: each container is managed by a Durable Object, which also has storage, alarms, lifecycle, port, and outbound interception APIs. (Cloudflare docs)
  • Cloudflare Artifacts: Git-compatible, versioned file trees; best practice is one repo per agent/session/application, with ArtifactFS available for large repos where startup time matters. (Cloudflare docs)
  • Workflows, AI Gateway, and Secrets Store cover durable step execution, model observability/control, and centrally managed secrets.