How Autobuild Works
Published May 8, 2026 · Last updated May 12, 2026 · 6 min read
You've submitted a request and Autobuild is running. But what is actually happening? This article explains the mental model: how Autobuild organizes work, who does what, and what keeps your codebase safe while agents operate.
The three levels of work
Every piece of work in Autobuild lives at one of three levels: an initiative, a feature, or an executable.
An initiative is the top-level unit. It represents a meaningful outcome, such as a capability shipped, a refactor landed, or a set of bugs resolved. Its status moves through active, paused, completed, and archived.
A feature sits inside an initiative. It represents a coherent slice of the initiative's scope that can be planned and executed independently. Feature status moves through proposed, planning, approved, in progress, paused, completed, and cancelled.
An executable sits inside a feature. It is the atomic unit of work — one discrete task assigned to an agent, producing a concrete output. Most executables produce a pull request. Others produce documents, configuration files, research, or other artifacts. Executable status moves through planned, queued, in progress, in review, paused, blocked, failed, completed, and cancelled.
The hierarchy lets Autobuild plan large efforts coherently: the initiative scopes the goal, features break it into independently shippable pieces, and executables are the individual tasks that get done.
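As a rough illustration, the three levels and their statuses could be modeled as nested records. This is a minimal sketch, not Autobuild's actual schema: the class names, field names, and default statuses are assumptions made for the example. Only the status values themselves come from the descriptions above.

```python
from dataclasses import dataclass, field
from enum import Enum


class InitiativeStatus(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    COMPLETED = "completed"
    ARCHIVED = "archived"


class FeatureStatus(Enum):
    PROPOSED = "proposed"
    PLANNING = "planning"
    APPROVED = "approved"
    IN_PROGRESS = "in progress"
    PAUSED = "paused"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


class ExecutableStatus(Enum):
    PLANNED = "planned"
    QUEUED = "queued"
    IN_PROGRESS = "in progress"
    IN_REVIEW = "in review"
    PAUSED = "paused"
    BLOCKED = "blocked"
    FAILED = "failed"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


@dataclass
class Executable:
    title: str
    # Assumed default: the article does not state the initial status.
    status: ExecutableStatus = ExecutableStatus.PLANNED


@dataclass
class Feature:
    title: str
    status: FeatureStatus = FeatureStatus.PROPOSED  # assumed default
    executables: list[Executable] = field(default_factory=list)


@dataclass
class Initiative:
    title: str
    status: InitiativeStatus = InitiativeStatus.ACTIVE  # assumed default
    features: list[Feature] = field(default_factory=list)


def count_executables(initiative: Initiative) -> int:
    """Total number of executables across all features of an initiative."""
    return sum(len(feature.executables) for feature in initiative.features)
```

The nesting mirrors the containment described above: executables belong to exactly one feature, and features belong to exactly one initiative.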
The orchestrator
When you create an initiative, Autobuild starts an orchestrator — a dedicated agent thread responsible for the initiative as a whole.
The orchestrator reads your spec and breaks the work down: it creates features, creates executables within those features, and determines the order in which executables can run. Some executables can run in parallel. Others must wait for a dependency to complete first.
When an executable is ready to start, the orchestrator dispatches it to a runner — a subthread that takes on a focused task and reports back when it finishes, fails, or needs input. The orchestrator can dispatch up to five runners concurrently and does not need to poll them. As runners complete and executables move to completed, the orchestrator re-evaluates what's ready to run next. This continues until all features in the initiative are done.
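The concurrency limit can be pictured with a small asyncio sketch. It is illustrative only: `run_all` and the `runner` callable are hypothetical names, and the real orchestrator is an agent thread rather than a Python event loop. The sketch shows two ideas from the paragraph above: at most five runners are active at once, and the orchestrator awaits completion instead of polling.

```python
import asyncio

MAX_RUNNERS = 5  # from the article: up to five runners concurrently


async def run_all(tasks, runner):
    """Run every task through `runner`, never more than MAX_RUNNERS at a time.

    Returns (results, peak), where `peak` is the highest number of
    runners that were active simultaneously.
    """
    slots = asyncio.Semaphore(MAX_RUNNERS)
    active = 0
    peak = 0
    results = {}

    async def dispatch(task):
        nonlocal active, peak
        async with slots:  # waits here while all five slots are taken
            active += 1
            peak = max(peak, active)
            try:
                # The runner reports back by returning; nothing polls it.
                results[task] = await runner(task)
            finally:
                active -= 1

    await asyncio.gather(*(dispatch(task) for task in tasks))
    return results, peak
```

A caller would pass any coroutine function as `runner`, for example one that sleeps briefly and returns a status string, and invoke it with `asyncio.run(run_all(tasks, runner))`.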
Dependencies and sequencing
Executables are not always independent. An executable can declare that it depends on other executables. When it does, it stays in planned until all of its dependencies reach completed. Only then does it become eligible to run. An executable with unresolved dependencies shows as blocked.
This dependency model is how Autobuild sequences work with logical ordering — writing tests after the code they cover, deploying infrastructure before the application that uses it — while running everything else in parallel.
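One way to picture this sequencing is to group executables into waves, where each wave contains everything whose dependencies are already completed. The sketch below assumes dependencies are given as a plain dictionary of names; the function names and the example executables are hypothetical, and Autobuild's own scheduler may work differently.

```python
def ready_to_run(dependencies, completed):
    """Return the executables whose dependencies have all completed."""
    return sorted(
        name
        for name, deps in dependencies.items()
        if name not in completed and set(deps) <= completed
    )


def plan_waves(dependencies):
    """Group executables into waves that could each run in parallel."""
    completed = set()
    waves = []
    while len(completed) < len(dependencies):
        wave = ready_to_run(dependencies, completed)
        if not wave:
            # Nothing is eligible but work remains: a cycle or an unknown name.
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(wave)
        completed.update(wave)
    return waves


example = {
    "write-code": [],
    "provision-infra": [],
    "write-tests": ["write-code"],
    "deploy-app": ["provision-infra", "write-code"],
}
print(plan_waves(example))
# [['provision-infra', 'write-code'], ['deploy-app', 'write-tests']]
```

Here the tests and the deployment wait for their prerequisites, while the two independent executables in the first wave can run side by side.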
The sandbox: where agents work
Before an agent can write code, Autobuild needs a place to run it safely. That place is a sandbox — an isolated environment built from a snapshot of your repository.
When you connect a repository, Autobuild builds a snapshot of it. The Settings tab shows each repository and its current sandbox status:
| Status | What it means |
|---|---|
| Available | No snapshot has been built yet |
| Pending | A build has been requested and is queued |
| Building… | The snapshot is actively being built |
| Ready | The sandbox is live and agents can use it |
| Failed | The build failed — you can retry or reconfigure |
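Read as a rule, the table says that agents can only be dispatched against a repository whose sandbox is Ready. A minimal sketch of that check follows; the enum and function names are assumptions for illustration and not part of any Autobuild API.

```python
from enum import Enum


class SandboxStatus(Enum):
    AVAILABLE = "Available"  # no snapshot has been built yet
    PENDING = "Pending"      # build requested and queued
    BUILDING = "Building…"   # snapshot is being built
    READY = "Ready"          # sandbox is live
    FAILED = "Failed"        # build failed


def agents_can_use(status: SandboxStatus) -> bool:
    """Only a Ready sandbox can host agent work."""
    return status is SandboxStatus.READY


def can_retry(status: SandboxStatus) -> bool:
    """A failed build can be retried or reconfigured."""
    return status is SandboxStatus.FAILED
```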
When an agent is dispatched to an executable, it works inside this sandboxed environment. It creates a branch, writes code, runs the build, and opens a pull request, all without touching your main branch. Nothing reaches your main branch until you review and merge the PR.
This is the core safety guarantee: your main branch is not modified until you review and merge the pull request.
If your repository requires setup steps before code can run — installing dependencies, running a build script, setting environment variables — you can configure install commands per repository from the Settings tab. These commands run when the sandbox is built.
Pull requests and review
Every executable that produces code creates one pull request. The pull request is the handoff from agent to human.
The Pull Requests tab shows all open PRs from Autobuild executables. Each row shows the PR title and number, which repository it targets, CI status (Passed, Failed, or Pending), review status (Pending review, Approved, Changes requested, or Commented), and the linked executable's current status.
You review and merge the PR on GitHub. Autobuild detects the merge and marks the executable as completed.
Babysit mode
If you want to stay close as an agent works — to redirect it early or check progress — you can enable babysit mode on an open PR. Each PR row includes a Pause babysit action for executables in queued, in progress, or in review status, and a Resume babysit action for executables in paused status. Babysit mode doesn't stop the agent; it keeps you in the loop.
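The mapping from executable status to the babysit action shown on a PR row can be written as a small lookup. This is a sketch of the rule as described above; the function name is hypothetical, and returning `None` for the remaining statuses is an assumption, since the article does not say what those rows show.

```python
PAUSABLE_STATUSES = {"queued", "in progress", "in review"}


def babysit_action(executable_status: str):
    """Return the babysit action a PR row would offer for a given status."""
    if executable_status in PAUSABLE_STATUSES:
        return "Pause babysit"
    if executable_status == "paused":
        return "Resume babysit"
    return None  # assumed: no babysit action for other statuses
```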
Release strategies
When you create an initiative, you choose a release strategy that controls how PRs are structured:
- Independent PRs: each executable gets its own PR, merged to the main branch independently as it's ready. Good for isolated changes.
- Release branch: executables merge into a shared release branch. That branch is then opened as a single PR to your main branch. Good for coordinated changes that should land together.
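The practical difference between the two strategies is where each executable's work merges and how many PRs end up targeting your main branch. The sketch below makes that concrete. The branch names and function names are hypothetical, and it assumes every executable in the initiative produces code.

```python
from enum import Enum


class ReleaseStrategy(Enum):
    INDEPENDENT_PRS = "independent_prs"
    RELEASE_BRANCH = "release_branch"


def merge_target(strategy, main_branch="main", release_branch="release/my-initiative"):
    """Branch that an individual executable's work merges into."""
    if strategy is ReleaseStrategy.INDEPENDENT_PRS:
        return main_branch
    return release_branch


def prs_into_main(strategy, executable_count):
    """Number of PRs that target the main branch under each strategy."""
    if strategy is ReleaseStrategy.INDEPENDENT_PRS:
        return executable_count  # one PR per executable
    return 1  # the release branch is opened as a single PR
```

With four code-producing executables, independent PRs would mean four separate reviews against main, while a release branch would mean one combined review.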
What Autobuild doesn't do
It doesn't make architectural decisions for you. Autobuild executes tasks as scoped. A precise spec produces precise output. A vague spec produces interpretations.
It doesn't merge pull requests. Every PR requires a human to review and merge. Autobuild tracks the merge; the merge action is yours.
It doesn't overwrite your work. Agents work on branches. If an executable fails or produces something unwanted, you can close the PR and cancel the executable — nothing else changes in your codebase.
It doesn't guarantee CI will pass. If CI fails, the CI badge shows Failed. You can update the executable's status or address the failure manually.