Decisions

  • Pending: Human-in-the-loop — always approve PRs or auto-merge with confidence threshold?
  • Pending: Polling interval — fixed, adaptive, or event-driven?
  • Pending: Crash recovery — resume in-flight jobs or restart from scratch?

User Tasks


Summary

Build the central 24/7 process that runs the autonomous coding agency: poll for planned FRs, dispatch them through the agent pipeline (plan → code → test → review), manage the feedback loop, and deliver PRs.

Problem / Motivation

All the pieces of the coding agency (agents, routing, job queue, sandboxes, git workflow) are being built as independent modules. Without an orchestrator to tie them together, nothing runs autonomously. The user must manually invoke each step: score the FR, pick an agent, run it, review the output, create a PR.

The orchestrator is the brain that turns a collection of tools into an autonomous system. It is the difference between “a toolkit” and “a coding agency.”

Proposed Solution

A src/opus/orchestrator/ module containing:

  1. Main Loop: Continuously polls for work, dispatches jobs, monitors progress. Runs as a long-lived process (systemd service on VPS).
  2. Pipeline Executor: Sequences agent stages: Planner → Coder → Tester → Reviewer, with feedback loops on review failure.
  3. Supervisor: Monitors agent health, handles timeouts, manages retries and escalation.
  4. Configuration: Polling intervals, concurrency limits, auto-merge rules, notification settings.
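A minimal sketch of the Configuration component (4), assuming a frozen dataclass in src/opus/orchestrator/config.py. The class name, the field names, and the validation rules are hypothetical. The default values come from this spec: 30 s to 5 min polling, one concurrent pipeline in Phase 1, three review iterations, and approval required for every PR.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OrchestratorConfig:
    """Hypothetical orchestrator settings; field names are illustrative only."""

    # Directory scanned for planned FRs (path from the FR discovery step).
    planned_dir: Path = Path("vault/10_features/03_planned")
    # Adaptive polling bounds in seconds: 30 s while busy, up to 5 min when idle.
    poll_min_s: float = 30.0
    poll_max_s: float = 300.0
    # Phase 1 runs one pipeline at a time; Phase 3 raises this (default 3 there).
    max_concurrent: int = 1
    # Upper bound on review -> fix -> re-review iterations.
    max_review_iterations: int = 3
    # Merge policy: "always_approve" (A), "confidence" (B) or "by_priority" (C).
    merge_mode: str = "always_approve"
    auto_merge_threshold: int = 90
    # Notification target; stdout only in Phase 1.
    notify_channel: str = "stdout"

    def __post_init__(self) -> None:
        # Reject settings that would make the main loop misbehave.
        if not 0 < self.poll_min_s <= self.poll_max_s:
            raise ValueError("poll_min_s must be positive and <= poll_max_s")
        if self.max_concurrent < 1:
            raise ValueError("max_concurrent must be at least 1")
        if self.merge_mode not in {"always_approve", "confidence", "by_priority"}:
            raise ValueError(f"unknown merge_mode: {self.merge_mode}")
```

Phase 3 would then construct OrchestratorConfig(max_concurrent=3) instead of editing code; freezing the dataclass keeps a running loop from mutating its own settings.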

The orchestrator does NOT contain agent logic — it delegates to the existing modules:

  • FR-045 for complexity scoring
  • FR-046 for job queuing
  • FR-054 for LLM calls (indirectly, via agents)
  • FR-055 for sandboxed execution
  • FR-057 for code review
  • FR-058 for git workflow
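The delegation boundary can be expressed as structural interfaces, so the orchestrator depends only on method shapes and tests can pass in fakes. This is a sketch under assumptions: the Protocol names, method names, signatures, and the branch naming scheme below are hypothetical stand-ins, not the real APIs of FR-045, FR-046, FR-055, FR-057, or FR-058.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class ComplexityScorer(Protocol):  # stand-in for FR-045
    def score(self, fr_id: str) -> int: ...


class JobQueue(Protocol):  # stand-in for FR-046
    def enqueue(self, fr_id: str, priority: int) -> str: ...


class Sandbox(Protocol):  # stand-in for FR-055
    def create(self, fr_id: str) -> str: ...


class Reviewer(Protocol):  # stand-in for FR-057
    def review(self, diff: str) -> bool: ...


class GitWorkflow(Protocol):  # stand-in for FR-058
    def open_pr(self, branch: str, title: str) -> str: ...


@dataclass
class Orchestrator:
    """Wires modules together; contains no agent logic of its own."""

    scorer: ComplexityScorer
    queue: JobQueue
    sandbox: Sandbox
    reviewer: Reviewer
    git: GitWorkflow

    def submit(self, fr_id: str, priority: int) -> tuple[str, int]:
        """Score the FR and enqueue it; returns (job_id, complexity)."""
        complexity = self.scorer.score(fr_id)
        job_id = self.queue.enqueue(fr_id, priority)
        return job_id, complexity

    def prepare(self, fr_id: str) -> str:
        """Create an isolated workspace and return its identifier."""
        return self.sandbox.create(fr_id)

    def deliver(self, fr_id: str, diff: str) -> str | None:
        """Open a PR only if the reviewer approves the diff."""
        if not self.reviewer.review(diff):
            return None
        # Hypothetical branch naming; FR-058 owns the real convention.
        return self.git.open_pr(branch=f"agent/{fr_id}", title=f"{fr_id}: agent implementation")
```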

Open Questions

1. Human-in-the-Loop

Question: Should the orchestrator require human approval for PRs, or auto-merge based on confidence?

| Option | Description |
|---|---|
| A) Always require approval | Every PR needs human review. Safest, but bottlenecks on human availability |
| B) Auto-merge with confidence threshold | If all tests pass and review confidence >= 90, auto-merge. Faster but riskier |
| C) Configurable per FR priority | High-priority FRs need approval, low-priority FRs auto-merge. Balanced |

Recommendation: Option A for Phases 1 and 2, so a human approves every PR. Move to Option C in Phase 3 and later, once confidence in the system is established. Auto-merge should never be the default for a new system.
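All three options reduce to one gate evaluated after the pipeline succeeds. A sketch with hypothetical names (MergeMode, requires_human_approval) and one extra rule that the table only states for Option B: failing tests always force human review, whatever the mode. The threshold of 90 is the value from Option B.

```python
from enum import Enum


class MergeMode(Enum):
    ALWAYS_APPROVE = "A"  # Option A: a human approves every PR
    CONFIDENCE = "B"      # Option B: auto-merge at or above a confidence threshold
    BY_PRIORITY = "C"     # Option C: high-priority FRs need approval


def requires_human_approval(
    mode: MergeMode,
    tests_passed: bool,
    review_confidence: int,
    high_priority: bool,
    threshold: int = 90,
) -> bool:
    """Return True when the PR must wait for a human instead of auto-merging."""
    if not tests_passed:
        # Assumption: never auto-merge a PR with failing tests, in any mode.
        return True
    if mode is MergeMode.ALWAYS_APPROVE:
        return True
    if mode is MergeMode.CONFIDENCE:
        # Auto-merge only when confidence >= threshold.
        return review_confidence < threshold
    # BY_PRIORITY: high-priority FRs need approval, low-priority ones auto-merge.
    return high_priority
```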

Decision:

2. Polling Strategy

Question: How should the orchestrator discover new work?

| Option | Description |
|---|---|
| A) Fixed interval polling | Check every N seconds. Simple, predictable, wastes cycles when idle |
| B) Adaptive polling | Poll frequently when jobs are active, back off when idle. Efficient |
| C) Event-driven | File watcher on vault/10_features/03_planned/. Instant reaction, more complex |

Recommendation: Option B — adaptive polling is the right balance. Start at 30s, back off to 5 min when idle, return to 30s when jobs are detected.
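A sketch of Option B under one assumption: the spec fixes the bounds (30 s active, 5 min idle) but not how the interval grows, so this version doubles it after each empty poll and snaps back to 30 s as soon as work is found. AdaptivePoller and run_loop are hypothetical names; max_polls and the injectable sleep exist only so the loop can be tested, and the real loop would run indefinitely.

```python
from __future__ import annotations

import time
from typing import Callable


class AdaptivePoller:
    """Exponential backoff between polls, capped at max_s, reset on work."""

    def __init__(self, min_s: float = 30.0, max_s: float = 300.0, factor: float = 2.0) -> None:
        self.min_s = min_s
        self.max_s = max_s
        self.factor = factor  # assumption: the spec does not fix the growth rate
        self.interval = min_s

    def next_interval(self, found_work: bool) -> float:
        """Return how long to sleep before the next poll."""
        if found_work:
            self.interval = self.min_s
        else:
            self.interval = min(self.interval * self.factor, self.max_s)
        return self.interval


def run_loop(
    poll_once: Callable[[], bool],
    poller: AdaptivePoller,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> list[float]:
    """Poll, then sleep for the adaptive interval; returns the waits used."""
    waits: list[float] = []
    for _ in range(max_polls):
        wait = poller.next_interval(found_work=poll_once())
        waits.append(wait)
        sleep(wait)
    return waits
```

With these defaults an idle orchestrator waits 60, 120, and 240 s, then 300 s per poll, and drops back to 30 s the first time a poll finds work.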

Decision:

3. Crash Recovery

Question: What happens when the orchestrator restarts after a crash?

| Option | Description |
|---|---|
| A) Restart all in-flight jobs | Simple but wastes completed work |
| B) Resume from last checkpoint | Each pipeline stage checkpoints progress. Resume from last completed stage |
| C) Mark as failed, let user decide | Conservative: no automatic recovery. User re-triggers manually |

Recommendation: Option B — the job registry (FR-046) tracks stage completion. On restart, query for running jobs and resume from their last checkpoint.
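The resume logic is small once the registry records which stages finished. A sketch, assuming a hypothetical JobRecord shape (FR-046's real schema may differ) and the stage names plan, code, test, review, pr. A stage that was mid-flight at crash time has no checkpoint, so it is re-run from its start; stages therefore need to be safe to repeat.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stage names, in pipeline order.
STAGES = ("plan", "code", "test", "review", "pr")


@dataclass
class JobRecord:
    """Hypothetical registry row; FR-046 defines the real schema."""

    job_id: str
    status: str  # e.g. "running", "done", "failed"
    completed_stages: list[str] = field(default_factory=list)


def remaining_stages(job: JobRecord) -> list[str]:
    """Stages still to run, resuming after the last completed checkpoint."""
    if not job.completed_stages:
        return list(STAGES)
    last = job.completed_stages[-1]
    return list(STAGES[STAGES.index(last) + 1:])


def recover(jobs: list[JobRecord]) -> dict[str, list[str]]:
    """On restart, map every job left in 'running' to the stages it still needs."""
    return {job.job_id: remaining_stages(job) for job in jobs if job.status == "running"}
```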

Decision:


Phase Overview

| Phase | Description | Status |
|---|---|---|
| Phase 1 | Main loop: poll for planned FRs, dispatch single agent, wait for result | |
| Phase 2 | Pipeline mode: plan → code → test → review → fix loop | |
| Phase 3 | Multi-agent parallel execution, concurrent FRs | |
| Phase 4 | Continuous operation (systemd, crash recovery, health monitoring) | |

Phase 1: Single Agent Dispatch

Prerequisite: Design doc designs/drafts/autonomous-coding-agency.md must be reviewed and moved to designs/approved/ before implementation starts.

Goal: Build the main loop that picks up a planned FR, dispatches a single coding agent, and reports the result.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/orchestrator/main.py | Entry point: initialize components, start main loop | opus | |
| src/opus/orchestrator/loop.py | Main loop: poll vault for planned FRs, pick highest priority, dispatch | opus | |
| src/opus/orchestrator/config.py | Configuration: polling interval, vault paths, concurrency limit (1 for Phase 1) | opus | |
| FR discovery | Read vault/10_features/03_planned/ directory, parse frontmatter, filter by status | opus | |
| Single agent dispatch | Create sandbox (FR-055), invoke coder agent, capture result | opus | |
| Status updates | Update FR frontmatter with status: in-progress when dispatched, status: done when complete | opus | |
| Result reporting | Log completion/failure, notify user (stdout for Phase 1) | mv | |
| Unit tests | Mock vault, mock agent, test loop dispatches correctly | mv | |
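FR discovery and the status update can be sketched with the standard library alone. Assumptions: FRs are .md files whose frontmatter is flat key: value pairs between --- lines, and priority is an integer where 1 is most urgent. The function names are hypothetical, and a real YAML parser would be safer for anything beyond flat keys.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

# Frontmatter block at the very start of the file: ---\n ... \n---\n
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


@dataclass
class PlannedFR:
    path: Path
    meta: dict[str, str]


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat key: value lines; nested YAML is not supported."""
    match = FRONTMATTER.match(text)
    if not match:
        return {}
    meta: dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def discover(planned_dir: Path) -> list[PlannedFR]:
    """Return FRs whose frontmatter status is still 'planned'."""
    frs = []
    for path in sorted(planned_dir.glob("*.md")):
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        if meta.get("status") == "planned":
            frs.append(PlannedFR(path, meta))
    return frs


def pick_next(frs: list[PlannedFR]) -> PlannedFR | None:
    """Most urgent first; assumes integer priority with 1 as the highest."""
    return min(frs, key=lambda fr: int(fr.meta.get("priority", "99")), default=None)


def set_status(path: Path, status: str) -> None:
    """Rewrite the status: line inside the frontmatter block only."""
    text = path.read_text(encoding="utf-8")
    match = FRONTMATTER.match(text)
    if not match:
        raise ValueError(f"{path} has no frontmatter")
    block = re.sub(r"(?m)^status:.*$", f"status: {status}", match.group(1))
    path.write_text(f"---\n{block}\n---\n" + text[match.end():], encoding="utf-8")
```

The loop would call set_status(fr.path, "in-progress") on dispatch and set_status(fr.path, "done") on completion.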

Phase 2: Pipeline Execution

Goal: Implement the full agent pipeline: Planner → Coder → Tester → Reviewer with feedback loop.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/orchestrator/pipeline.py | Pipeline executor: sequence agent stages, pass context between stages | opus | |
| Planner stage | Planner agent reads FR spec, outputs implementation plan | opus | |
| Coder stage | Coder agent executes plan in sandboxed worktree | opus | |
| Tester stage | Tester agent runs test suite, reports pass/fail with details | mv | |
| Reviewer stage | Reviewer agent (FR-057) reviews diff, outputs structured feedback | mv | |
| Feedback loop | On review failure: extract issues, send to coder, re-code, re-test, re-review (max 3 iterations) | mv | |
| PR creation | On pipeline success: create PR via FR-058 git workflow | opus | |
| Pipeline state | Track current stage per job in registry (FR-046) | opus | |
| Integration tests | Full pipeline with mock agents: plan → code → test → review → PR | mv | |
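The Phase 2 control flow (plan once, then loop code, test, review until approval or three iterations) fits in one function. A sketch: agents are passed in as plain callables, and ReviewResult, PipelineOutcome, and run_pipeline are hypothetical names. The spec only defines the loop for review failures; treating a test failure as a consumed iteration with the issue "tests failed" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReviewResult:
    approved: bool
    issues: list[str] = field(default_factory=list)


@dataclass
class PipelineOutcome:
    success: bool
    iterations: int
    last_issues: list[str]


def run_pipeline(
    fr_spec: str,
    plan: Callable[[str], str],
    code: Callable[[str, list[str]], str],
    run_tests: Callable[[str], bool],
    review: Callable[[str], ReviewResult],
    max_iterations: int = 3,
) -> PipelineOutcome:
    """Plan once, then loop code -> test -> review until approval or the cap."""
    implementation_plan = plan(fr_spec)
    issues: list[str] = []
    for attempt in range(1, max_iterations + 1):
        # The coder sees the plan plus any issues from the previous round.
        diff = code(implementation_plan, issues)
        if not run_tests(diff):
            issues = ["tests failed"]  # assumption: test failure uses up an iteration
            continue
        result = review(diff)
        if result.approved:
            return PipelineOutcome(True, attempt, [])
        issues = result.issues
    # Cap reached: fail gracefully and keep the last issues for the report.
    return PipelineOutcome(False, max_iterations, issues)
```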

Phase 3: Parallel Execution

Goal: Run multiple FR pipelines concurrently with independent sandboxes.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/orchestrator/scheduler.py | Concurrent scheduler: manage multiple pipeline instances | opus | |
| Concurrency control | Configurable max concurrent pipelines (default: 3) | opus | |
| Resource management | Track sandbox count, LLM API usage, prevent resource exhaustion | opus | |
| Priority preemption | Higher-priority job can pause lower-priority one (via FR-046 queue) | opus | |
| Inter-pipeline isolation | Each pipeline has its own sandbox, no shared mutable state | opus | |
| Progress dashboard | Real-time view of all running pipelines and their current stage | opus | |
| Integration tests | Run 3 pipelines concurrently, verify isolation and completion | mv | |
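Concurrency control (default 3) maps naturally onto an asyncio semaphore. A sketch, assuming pipelines are async callables that return True on success; run_all is a hypothetical name, and priority preemption and resource tracking are deliberately left out.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def run_all(
    fr_ids: list[str],
    pipeline: Callable[[str], Awaitable[bool]],
    max_concurrent: int = 3,
) -> dict[str, bool]:
    """Run one pipeline per FR, never more than max_concurrent at once."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(fr_id: str) -> tuple[str, bool]:
        async with gate:
            try:
                return fr_id, await pipeline(fr_id)
            except Exception:
                # One failing pipeline must not cancel or fail its siblings.
                return fr_id, False

    results = await asyncio.gather(*(guarded(fr_id) for fr_id in fr_ids))
    return dict(results)
```

Because each pipeline owns its sandbox, the semaphore is the only shared object here, which matches the no-shared-mutable-state rule above.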

Phase 4: Continuous Operation

Goal: Run the orchestrator as a production service: crash recovery, health monitoring, notifications.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/orchestrator/service.py | Systemd-compatible service wrapper: start, stop, reload | opus | |
| Crash recovery | On restart: query registry for running jobs, resume from last checkpoint | opus | |
| Health monitoring | Periodic self-check: is the loop running? Are agents responsive? | opus | |
| Notifications | Notify user on: pipeline completion, pipeline failure, stuck agent, crash recovery | mv | |
| Metrics | Track: jobs/day, success rate, average pipeline duration, agent utilization | opus | |
| Graceful shutdown | On SIGTERM: finish current pipeline stage, save state, exit cleanly | opus | |
| Systemd unit file | opus-orchestrator.service for deployment on VPS (FR-019) | opus | |
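Graceful shutdown works best when the signal handler only sets a flag and the loop checks it between stages, so the stage in progress always finishes and checkpoints. A sketch with hypothetical names (GracefulShutdown, run_stages); run_stage and save_checkpoint stand in for the pipeline executor and the FR-046 registry.

```python
from __future__ import annotations

import signal
import threading
from typing import Callable


class GracefulShutdown:
    """Turns SIGTERM/SIGINT into a flag that is checked between stages."""

    def __init__(self) -> None:
        self.requested = threading.Event()

    def install(self) -> None:
        # Python only allows installing signal handlers from the main thread.
        signal.signal(signal.SIGTERM, self._handle)
        signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum: int, frame: object) -> None:
        # Do no real work in the handler; just record the request.
        self.requested.set()


def run_stages(
    stages: list[str],
    run_stage: Callable[[str], None],
    save_checkpoint: Callable[[str], None],
    shutdown: GracefulShutdown,
) -> list[str]:
    """Run stages in order; a stage that has started always completes and checkpoints."""
    done: list[str] = []
    for stage in stages:
        if shutdown.requested.is_set():
            break
        run_stage(stage)
        save_checkpoint(stage)
        done.append(stage)
    return done
```

systemctl stop sends SIGTERM by default and escalates to SIGKILL if the process outlives the unit's TimeoutStopSec (90 s unless configured otherwise), so a long stage either needs a larger timeout in opus-orchestrator.service or its own cancellation point.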

Prerequisites / Gap Analysis

Requirements

| Requirement | Description |
|---|---|
| REQ-0 | Design doc reviewed and approved (designs/approved/autonomous-coding-agency.md) |
| REQ-1 | Python project scaffold (FR-009): the orchestrator is a Python module |
| REQ-2 | Custom agents (FR-043): the orchestrator dispatches to agents |
| REQ-3 | Job registry (FR-046): tracks all dispatched work |
| REQ-4 | LLM provider (FR-054): agents need LLM access |
| REQ-5 | Sandboxed execution (FR-055): agents need isolated environments |

Current State

| Component | Status | Details |
|---|---|---|
| Python scaffold | | FR-009 not started |
| Agent definitions | | FR-043 not started |
| Job registry | | FR-046 not started |
| LLM abstraction | | FR-054 not started |
| Sandbox | | FR-055 not started |

Gap (What’s missing?)

| Gap | Effort | Blocker? |
|---|---|---|
| Python scaffold (FR-009) | Med | Yes |
| Agent definitions (FR-043) | High | Yes: orchestrator dispatches to agents |
| Job registry (FR-046) | Med | Yes for Phase 2+; Phase 1 can work without it |
| LLM abstraction (FR-054) | Med | Yes: agents need LLM calls |
| Sandbox (FR-055) | Med | Yes: agents need isolated environments |
| VPS (FR-019) | High | Only for Phase 4 (production deployment) |

Test

Manual tests

| Test | Expected | Owner | Actual | Last |
|---|---|---|---|---|
| Main loop discovers planned FR in vault | FR detected and selected for dispatch | opus | pending | - |
| Single agent dispatch creates sandbox and invokes agent | Agent runs in isolated worktree | opus | pending | - |
| FR status updated to in-progress on dispatch | Frontmatter updated | opus | pending | - |
| Pipeline sequences stages correctly | Plan → code → test → review in order | opus | pending | - |
| Review failure triggers feedback loop to coder | Coder receives issues and fixes them; diff is re-reviewed | mv | pending | - |
| Feedback loop exits after max iterations | Pipeline fails gracefully after 3 attempts | opus | pending | - |
| PR created on successful pipeline completion | PR exists on GitHub with correct description | opus | pending | - |
| 3 concurrent pipelines run without interference | All complete independently | opus | pending | - |
| Crash recovery resumes in-flight pipeline | Resumes from last completed stage | opus | pending | - |
| Graceful shutdown saves state and exits | No data loss, clean exit | opus | pending | - |

AI-verified tests

| Scenario | Expected behavior | Verification method |
|---|---|---|

E2E tests

| Scenario | Assertion |
|---|---|

Integration tests

| Component | Coverage |
|---|---|

Unit tests

| Component | Tests | Coverage |
|---|---|---|

History

| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Central piece of autonomous coding agency architecture |

References

  • FR-009 (Python Project Scaffold) — code infrastructure prerequisite
  • FR-043 (Custom Agents) — agent definitions dispatched by the orchestrator
  • FR-045 (Complexity Routing) — scoring determines execution strategy
  • FR-046 (Job Registry & Priority Queue) — job tracking and prioritization
  • FR-054 (LLM Provider Abstraction) — agents use this for LLM calls
  • FR-055 (Sandboxed Code Execution) — agents run in isolated environments
  • FR-057 (Code Review Pipeline) — reviewer stage in the pipeline
  • FR-058 (Agent Git Workflow) — PR creation and branch management
  • FR-019 (VPS Deployment) — production hosting for the orchestrator
  • vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview