Autonomous Coding Orchestrator

Decisions

Pending: Human-in-the-loop — always approve PRs or auto-merge with confidence threshold?
Pending: Polling interval — fixed, adaptive, or event-driven?
Pending: Crash recovery — resume in-flight jobs or restart from scratch?

User Tasks

Summary

Build the central 24/7 process that runs the autonomous coding agency: poll for planned FRs, dispatch them through the agent pipeline (plan → code → test → review), manage the feedback loop, and deliver PRs.

Problem / Motivation

All the pieces of the coding agency (agents, routing, job queue, sandboxes, git workflow) are being built as independent modules. Without an orchestrator to tie them together, nothing runs autonomously: the user must manually invoke each step (score the FR, pick an agent, run it, review the output, create a PR).

The orchestrator is the brain that turns a collection of tools into an autonomous system. It is the difference between “a toolkit” and “a coding agency.”

Proposed Solution

A src/opus/orchestrator/ module containing:

Main Loop: Continuously polls for work, dispatches jobs, monitors progress. Runs as a long-lived process (systemd service on VPS).
Pipeline Executor: Sequences agent stages: Planner → Coder → Tester → Reviewer, with feedback loops on review failure.
Supervisor: Monitors agent health, handles timeouts, manages retries and escalation.
Configuration: Polling intervals, concurrency limits, auto-merge rules, notification settings.
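
As a rough illustration of what the Configuration component might hold, here is a minimal sketch. The class name `OrchestratorConfig` and every field name are hypothetical. The defaults are borrowed from values mentioned elsewhere in this document (30s to 5 min polling, concurrency of 1 for Phase 1, no auto-merge) and are not settled decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrchestratorConfig:
    """Hypothetical settings container; field names are illustrative only."""

    poll_min_seconds: float = 30.0        # polling interval while jobs are active
    poll_max_seconds: float = 300.0       # polling ceiling while idle (5 min)
    max_concurrent_pipelines: int = 1     # Phase 1 runs a single agent at a time
    auto_merge: bool = False              # human approves every PR by default
    notify_channels: tuple[str, ...] = ("stdout",)

    def __post_init__(self) -> None:
        # Reject obviously inconsistent values as early as possible
        if self.poll_min_seconds <= 0 or self.poll_max_seconds < self.poll_min_seconds:
            raise ValueError("polling bounds must satisfy 0 < min <= max")
        if self.max_concurrent_pipelines < 1:
            raise ValueError("max_concurrent_pipelines must be at least 1")
```

Freezing the dataclass keeps the configuration immutable once the main loop has started, so a reload would build a new instance rather than mutate shared state.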

The orchestrator does NOT contain agent logic — it delegates to the existing modules:

FR-045 for complexity scoring
FR-046 for job queuing
FR-054 for LLM calls (indirectly, via agents)
FR-055 for sandboxed execution
FR-057 for code review
FR-058 for git workflow

Open Questions

1. Human-in-the-Loop

Question: Should the orchestrator require human approval for PRs, or auto-merge based on confidence?

Option	Description
A) Always require approval	Every PR needs human review. Safest, but bottlenecks on human availability
B) Auto-merge with confidence threshold	If all tests pass and review confidence >= 90, auto-merge. Faster but riskier
C) Configurable per FR priority	High-priority FRs need approval, low-priority auto-merge. Balanced

Recommendation: Option A for Phase 1-2 — human approves every PR. Option C for Phase 3+ once confidence in the system is established. Auto-merge should never be the default for a new system.
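
The three options could be expressed as one policy function, which would make the Phase 3 switch from Option A to Option C a configuration change. This is a sketch only: `MergePolicy`, `ReviewOutcome`, `needs_human_approval`, the 0-100 confidence scale, and the `"high"`/`"low"` priority labels are all assumptions. It also assumes that low-priority FRs under Option C are still gated on passing tests and the confidence threshold.

```python
from dataclasses import dataclass
from enum import Enum


class MergePolicy(Enum):
    ALWAYS_APPROVE = "always_approve"        # Option A
    CONFIDENCE_THRESHOLD = "threshold"       # Option B
    PER_PRIORITY = "per_priority"            # Option C


@dataclass(frozen=True)
class ReviewOutcome:
    tests_passed: bool
    confidence: int      # reviewer confidence, assumed to be on a 0-100 scale
    fr_priority: str     # assumed labels: "high" or "low"


def needs_human_approval(
    policy: MergePolicy, outcome: ReviewOutcome, threshold: int = 90
) -> bool:
    """Return True when a human must approve the PR before it is merged."""
    if policy is MergePolicy.ALWAYS_APPROVE:
        return True
    # Options B and C only auto-merge when tests pass and confidence is high enough
    auto_ok = outcome.tests_passed and outcome.confidence >= threshold
    if policy is MergePolicy.CONFIDENCE_THRESHOLD:
        return not auto_ok
    # Option C: high-priority FRs always go to a human
    if outcome.fr_priority == "high":
        return True
    return not auto_ok
```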

Decision:

2. Polling Strategy

Question: How should the orchestrator discover new work?

Option	Description
A) Fixed interval polling	Check every N seconds. Simple, predictable, wastes cycles when idle
B) Adaptive polling	Poll frequently when jobs are active, back off when idle. Efficient
C) Event-driven	File watcher on vault/10_features/03_planned/. Instant reaction, more complex

Recommendation: Option B — adaptive polling is the right balance. Start at 30s, back off to 5 min when idle, return to 30s when jobs are detected.
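
One possible shape for the adaptive interval is sketched below. The 30s floor and 5 min ceiling come from the recommendation above. The class name `AdaptivePoller` and the doubling back-off factor are assumptions, since the recommendation does not say how quickly the interval should grow.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePoller:
    min_interval: float = 30.0      # seconds, used while jobs are active
    max_interval: float = 300.0     # seconds (5 min), ceiling while idle
    backoff_factor: float = 2.0     # assumption: interval doubles per idle poll
    current: float = 30.0

    def next_interval(self, jobs_detected: bool) -> float:
        """Return how long to sleep before the next poll."""
        if jobs_detected:
            # Work was found: snap back to the fast interval
            self.current = self.min_interval
        else:
            # Nothing to do: grow the interval, capped at the ceiling
            self.current = min(self.current * self.backoff_factor, self.max_interval)
        return self.current
```

With these defaults, consecutive idle polls sleep for 60, 120, 240, and then 300 seconds, and the first poll that finds a job resets the interval to 30 seconds.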

Decision:

3. Crash Recovery

Question: What happens when the orchestrator restarts after a crash?

Option	Description
A) Restart all in-flight jobs	Simple but wastes completed work
B) Resume from last checkpoint	Each pipeline stage checkpoints progress. Resume from last completed stage
C) Mark as failed, let user decide	Conservative — no automatic recovery. User re-triggers manually

Recommendation: Option B — the job registry (FR-046) tracks stage completion. On restart, query for running jobs and resume from their last checkpoint.
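
A minimal sketch of checkpoint-based resume follows. It assumes the registry can return job records as dictionaries with `id`, `status`, and `last_completed_stage` keys. That record shape and the function names are hypothetical, since the FR-046 interface is not defined here.

```python
from typing import Optional

# Stage order taken from the pipeline described in this document
STAGES = ["plan", "code", "test", "review"]


def next_stage(last_completed: Optional[str]) -> Optional[str]:
    """Return the stage to resume at, or None if every stage had finished."""
    if last_completed is None:
        return STAGES[0]
    idx = STAGES.index(last_completed) + 1
    return STAGES[idx] if idx < len(STAGES) else None


def plan_recovery(jobs: list[dict]) -> dict[str, Optional[str]]:
    """Map each job that was running at crash time to its resume stage."""
    return {
        job["id"]: next_stage(job.get("last_completed_stage"))
        for job in jobs
        if job["status"] == "running"
    }
```

A job whose checkpoint is `code` resumes at `test`, and a job with no checkpoint starts again at `plan`. A `None` result means all four stages completed before the crash and only the follow-up step (for example PR creation) remains.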

Decision:

Phase Overview

Phase	Description	Status
Phase 1	Main loop — poll for planned FRs, dispatch single agent, wait for result	—
Phase 2	Pipeline mode — plan → code → test → review → fix loop	—
Phase 3	Multi-agent parallel execution, concurrent FRs	—
Phase 4	Continuous operation (systemd, crash recovery, health monitoring)	—

Phase 1: Single Agent Dispatch —

Prerequisite: Design doc designs/drafts/autonomous-coding-agency.md must be reviewed and moved to designs/approved/ before implementation starts.

Goal: Build the main loop that picks up a planned FR, dispatches a single coding agent, and reports the result.

File / Feature	Details	Owner	Status
`src/opus/orchestrator/main.py`	Entry point: initialize components, start main loop	opus	—
`src/opus/orchestrator/loop.py`	Main loop: poll vault for planned FRs, pick highest priority, dispatch	opus	—
`src/opus/orchestrator/config.py`	Configuration: polling interval, vault paths, concurrency limit (1 for Phase 1)	opus	—
FR discovery	Read `vault/10_features/03_planned/` directory, parse frontmatter, filter by status	opus	—
Single agent dispatch	Create sandbox (FR-055), invoke coder agent, capture result	opus	—
Status updates	Update FR frontmatter with `status: in-progress` when dispatched, `status: done` when complete	opus	—
Result reporting	Log completion/failure, notify user (stdout for Phase 1)	mv	—
Unit tests	Mock vault, mock agent, test loop dispatches correctly	mv	—
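
The FR discovery row could look roughly like the sketch below. It assumes each FR is a Markdown file whose frontmatter sits between `---` lines and holds flat `key: value` pairs, that planned FRs carry `status: planned`, and that `priority` is an integer where a lower number is more urgent. None of these conventions are confirmed by this document, and a real implementation would likely use a YAML parser instead of the hand-rolled one here.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def discover_planned(vault_dir: Path) -> list[tuple[Path, dict[str, str]]]:
    """Return planned FRs from vault_dir, most urgent first."""
    found = []
    for path in sorted(vault_dir.glob("*.md")):
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        if meta.get("status") == "planned":
            found.append((path, meta))
    # Assumption: lower integer means higher priority; missing priority sorts last
    found.sort(key=lambda item: int(item[1].get("priority", "99")))
    return found
```

The main loop would then dispatch the first entry of `discover_planned(Path("vault/10_features/03_planned"))`, if any.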

Phase 2: Pipeline Execution —

Goal: Implement the full agent pipeline: Planner → Coder → Tester → Reviewer with feedback loop.

File / Feature	Details	Owner	Status
`src/opus/orchestrator/pipeline.py`	Pipeline executor: sequence agent stages, pass context between stages	opus	—
Planner stage	Planner agent reads FR spec, outputs implementation plan	opus	—
Coder stage	Coder agent executes plan in sandboxed worktree	opus	—
Tester stage	Tester agent runs test suite, reports pass/fail with details	mv	—
Reviewer stage	Reviewer agent (FR-057) reviews diff, outputs structured feedback	mv	—
Feedback loop	On review failure: extract issues, send to coder, re-code, re-test, re-review (max 3 iterations)	mv	—
PR creation	On pipeline success: create PR via FR-058 git workflow	opus	—
Pipeline state	Track current stage per job in registry (FR-046)	opus	—
Integration tests	Full pipeline with mock agents: plan → code → test → review → PR	mv	—
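
The feedback loop in the table above can be sketched with agents modeled as plain callables. The `Review` type, the `run_pipeline` signature, and the choice to count a test failure as one of the three attempts are assumptions; the source only specifies the loop for review failures.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    approved: bool
    issues: list[str] = field(default_factory=list)


def run_pipeline(
    spec: str,
    planner: Callable[[str], str],
    coder: Callable[[str, list[str]], str],
    tester: Callable[[str], bool],
    reviewer: Callable[[str], Review],
    max_iterations: int = 3,
) -> tuple[bool, int]:
    """Run plan -> code -> test -> review; return (success, attempts used)."""
    plan = planner(spec)
    issues: list[str] = []
    for attempt in range(1, max_iterations + 1):
        diff = coder(plan, issues)        # issues is empty on the first pass
        if not tester(diff):
            issues = ["tests failed"]     # assumption: test failures also loop back
            continue
        verdict = reviewer(diff)
        if verdict.approved:
            return True, attempt          # caller would create the PR here
        issues = verdict.issues           # feed review findings back to the coder
    return False, max_iterations
```

Passing the agents in as callables keeps the executor free of agent logic and lets the integration tests substitute mocks for each stage.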

Phase 3: Parallel Execution —

Goal: Run multiple FR pipelines concurrently with independent sandboxes.

File / Feature	Details	Owner	Status
`src/opus/orchestrator/scheduler.py`	Concurrent scheduler: manage multiple pipeline instances	opus	—
Concurrency control	Configurable max concurrent pipelines (default: 3)	opus	—
Resource management	Track sandbox count, LLM API usage, prevent resource exhaustion	opus	—
Priority preemption	Higher-priority job can pause lower-priority one (via FR-046 queue)	opus	—
Inter-pipeline isolation	Each pipeline has its own sandbox, no shared mutable state	opus	—
Progress dashboard	Real-time view of all running pipelines and their current stage	opus	—
Integration tests	Run 3 pipelines concurrently, verify isolation and completion	mv	—
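
Concurrency control with a configurable cap could be sketched with an `asyncio.Semaphore`, as below. The function name `run_all` and the assumption that each pipeline is an async callable taking an FR identifier are illustrative; the scheduler could equally be built on threads or processes.

```python
import asyncio
from typing import Awaitable, Callable


async def run_all(
    fr_ids: list[str],
    run_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,
) -> list[str]:
    """Run one pipeline per FR, never more than max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(fr_id: str) -> str:
        # Each pipeline waits here until a slot is free
        async with semaphore:
            return await run_one(fr_id)

    # gather returns results in the same order as fr_ids
    return await asyncio.gather(*(guarded(fr_id) for fr_id in fr_ids))
```

This only bounds how many pipelines run at once. Priority preemption and per-sandbox resource tracking would need additional state on top of it.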

Phase 4: Continuous Operation —

Goal: Run the orchestrator as a production service: crash recovery, health monitoring, notifications.

File / Feature	Details	Owner	Status
`src/opus/orchestrator/service.py`	Systemd-compatible service wrapper: start, stop, reload	opus	—
Crash recovery	On restart: query registry for `running` jobs, resume from last checkpoint	opus	—
Health monitoring	Periodic self-check: is the loop running? Are agents responsive?	opus	—
Notifications	Notify user on: pipeline completion, pipeline failure, stuck agent, crash recovery	mv	—
Metrics	Track: jobs/day, success rate, average pipeline duration, agent utilization	opus	—
Graceful shutdown	On SIGTERM: finish current pipeline stage, save state, exit cleanly	opus	—
Systemd unit file	`opus-orchestrator.service` for deployment on VPS (FR-019)	opus	—
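
Graceful shutdown, as described in the table above, might be handled with a flag that the SIGTERM handler sets and the stage loop checks between stages. The names `ShutdownFlag` and `run_stages` and the `save_state` callback are hypothetical. The sketch assumes state is saved after every completed stage so that a later resume has a checkpoint to start from.

```python
import signal
import threading
from typing import Callable


class ShutdownFlag:
    """Records that SIGTERM arrived without interrupting the running stage."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def install(self) -> None:
        # Must be called from the main thread
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame) -> None:
        self._event.set()

    @property
    def requested(self) -> bool:
        return self._event.is_set()


def run_stages(
    stages: list[tuple[str, Callable[[], None]]],
    flag: ShutdownFlag,
    save_state: Callable[[list[str]], None],
) -> list[str]:
    """Run stages in order; stop after the current one if shutdown was requested."""
    completed: list[str] = []
    for name, stage in stages:
        stage()                       # the in-progress stage is allowed to finish
        completed.append(name)
        save_state(completed)         # checkpoint after every stage
        if flag.requested:
            break
    return completed
```

Under systemd, the stop timeout would need to be longer than the slowest stage, otherwise the service is killed before the checkpoint is written.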

Prerequisites / Gap Analysis

Requirements

Requirement	Description
REQ-0	Design doc reviewed and approved (`designs/approved/autonomous-coding-agency.md`)
REQ-1	Python project scaffold (FR-009) — orchestrator is a Python module
REQ-2	Custom agents (FR-043) — orchestrator dispatches to agents
REQ-3	Job registry (FR-046) — tracks all dispatched work
REQ-4	LLM provider (FR-054) — agents need LLM access
REQ-5	Sandboxed execution (FR-055) — agents need isolated environments

Current State

Component	Status	Details
Python scaffold	—	FR-009 not started
Agent definitions	—	FR-043 not started
Job registry	—	FR-046 not started
LLM abstraction	—	FR-054 not started
Sandbox	—	FR-055 not started

Gap (What’s missing?)

Gap	Effort	Blocker?
Python scaffold (FR-009)	Med	Yes
Agent definitions (FR-043)	High	Yes — orchestrator dispatches to agents
Job registry (FR-046)	Med	Yes for Phase 2+ — Phase 1 can work without it
LLM abstraction (FR-054)	Med	Yes — agents need LLM calls
Sandbox (FR-055)	Med	Yes — agents need isolated environments
VPS (FR-019)	High	Only for Phase 4 (production deployment)

Test

Manual tests

Test	Expected	Owner	Actual	Last
Main loop discovers planned FR in vault	FR detected and selected for dispatch	opus	pending	-
Single agent dispatch creates sandbox and invokes agent	Agent runs in isolated worktree	opus	pending	-
FR status updated to in-progress on dispatch	Frontmatter updated	opus	pending	-
Pipeline sequences stages correctly	Plan → code → test → review in order	opus	pending	-
Review failure triggers feedback loop to coder	Coder receives issues, fixes, re-review	mv	pending	-
Feedback loop exits after max iterations	Pipeline fails gracefully after 3 attempts	opus	pending	-
PR created on successful pipeline completion	PR exists on GitHub with correct description	opus	pending	-
3 concurrent pipelines run without interference	All complete independently	opus	pending	-
Crash recovery resumes in-flight pipeline	Resumes from last completed stage	opus	pending	-
Graceful shutdown saves state and exits	No data loss, clean exit	opus	pending	-

AI-verified tests

Scenario	Expected behavior	Verification method
…	…	…

E2E tests

Scenario	Assertion
…	…

Integration tests

Component	Coverage
…	…

Unit tests

Component	Tests	Coverage
…	…	…

History

Date	Event	Details
2026-03-12	Created	Central piece of autonomous coding agency architecture

References

FR-009 (Python Project Scaffold) — code infrastructure prerequisite
FR-043 (Custom Agents) — agent definitions dispatched by the orchestrator
FR-045 (Complexity Routing) — scoring determines execution strategy
FR-046 (Job Registry & Priority Queue) — job tracking and prioritization
FR-054 (LLM Provider Abstraction) — agents use this for LLM calls
FR-055 (Sandboxed Code Execution) — agents run in isolated environments
FR-057 (Code Review Pipeline) — reviewer stage in the pipeline
FR-058 (Agent Git Workflow) — PR creation and branch management
FR-019 (VPS Deployment) — production hosting for the orchestrator
vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview
