Decisions
- Pending: Human-in-the-loop — always approve PRs or auto-merge with confidence threshold?
- Pending: Polling interval — fixed, adaptive, or event-driven?
- Pending: Crash recovery — resume in-flight jobs or restart from scratch?
User Tasks
Summary
Build the central 24/7 process that runs the autonomous coding agency: poll for planned FRs, dispatch them through the agent pipeline (plan → code → test → review), manage the feedback loop, and deliver PRs.
Problem / Motivation
All the pieces of the coding agency (agents, routing, job queue, sandboxes, git workflow) are being built as independent modules. Without an orchestrator to tie them together, nothing runs autonomously. The user must manually invoke each step: score the FR, pick an agent, run it, review the output, create a PR.
The orchestrator is the brain that turns a collection of tools into an autonomous system. It is the difference between “a toolkit” and “a coding agency.”
Proposed Solution
A src/opus/orchestrator/ module containing:
- Main Loop: Continuously polls for work, dispatches jobs, monitors progress. Runs as a long-lived process (systemd service on VPS).
- Pipeline Executor: Sequences agent stages: Planner → Coder → Tester → Reviewer, with feedback loops on review failure.
- Supervisor: Monitors agent health, handles timeouts, manages retries and escalation.
- Configuration: Polling intervals, concurrency limits, auto-merge rules, notification settings.
The orchestrator does NOT contain agent logic — it delegates to the existing modules:
- FR-045 for complexity scoring
- FR-046 for job queuing
- FR-054 for LLM calls (indirectly, via agents)
- FR-055 for sandboxed execution
- FR-057 for code review
- FR-058 for git workflow
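
The delegation described above could be expressed with injected interfaces, so the orchestrator only wires collaborators together. This is a minimal sketch: `ComplexityScorer`, `JobQueue`, `Orchestrator.submit`, and the two in-memory fakes are hypothetical names, not the actual APIs of FR-045 or FR-046.

```python
from dataclasses import dataclass
from typing import Protocol


class ComplexityScorer(Protocol):
    """Hypothetical stand-in for the FR-045 scoring module."""

    def score(self, fr_id: str) -> int: ...


class JobQueue(Protocol):
    """Hypothetical stand-in for the FR-046 job queue."""

    def enqueue(self, fr_id: str, complexity: int) -> str: ...


@dataclass
class Orchestrator:
    """Holds references to collaborators; contains no agent logic itself."""

    scorer: ComplexityScorer
    queue: JobQueue

    def submit(self, fr_id: str) -> str:
        # Delegate scoring and queuing; the orchestrator only sequences the calls.
        complexity = self.scorer.score(fr_id)
        return self.queue.enqueue(fr_id, complexity)


class FixedScorer:
    """In-memory fake that always returns the same score."""

    def score(self, fr_id: str) -> int:
        return 5


class ListQueue:
    """In-memory fake that records enqueued jobs in a list."""

    def __init__(self) -> None:
        self.jobs: list[tuple[str, int]] = []

    def enqueue(self, fr_id: str, complexity: int) -> str:
        self.jobs.append((fr_id, complexity))
        return f"job-{len(self.jobs)}"
```

With this shape, unit tests can construct `Orchestrator(scorer=FixedScorer(), queue=ListQueue())` and never touch a real agent, sandbox, or LLM.
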
Open Questions
1. Human-in-the-Loop
Question: Should the orchestrator require human approval for PRs, or auto-merge based on confidence?
| Option | Description |
|---|---|
| A) Always require approval | Every PR needs human review. Safest, but bottlenecks on human availability |
| B) Auto-merge with confidence threshold | If all tests pass and review confidence >= 90, auto-merge. Faster but riskier |
| C) Configurable per FR priority | High-priority FRs need approval, low-priority auto-merge. Balanced |
Recommendation: Option A for Phase 1-2 — human approves every PR. Option C for Phase 3+ once confidence in the system is established. Auto-merge should never be the default for a new system.
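
The three options can be compared as one small policy function. The sketch below is illustrative only: `MergePolicy`, `PipelineResult`, and `needs_human_approval` are hypothetical names, the confidence value is assumed to be on a 0-100 scale, and the rule that failing tests always require a human is an added assumption rather than something the options state.

```python
from dataclasses import dataclass
from enum import Enum


class MergePolicy(Enum):
    ALWAYS_APPROVE = "always_approve"        # Option A
    CONFIDENCE_THRESHOLD = "threshold"       # Option B
    PER_PRIORITY = "per_priority"            # Option C


@dataclass(frozen=True)
class PipelineResult:
    tests_passed: bool
    review_confidence: int   # assumed 0-100 scale
    fr_priority: str         # assumed values: "high" or "low"


def needs_human_approval(
    result: PipelineResult,
    policy: MergePolicy = MergePolicy.ALWAYS_APPROVE,
    threshold: int = 90,
) -> bool:
    """Return True when a human must approve the PR before merge."""
    if policy is MergePolicy.ALWAYS_APPROVE:
        return True
    # Assumption: a failing test run is never eligible for auto-merge.
    if not result.tests_passed:
        return True
    if policy is MergePolicy.PER_PRIORITY:
        return result.fr_priority == "high"
    # CONFIDENCE_THRESHOLD: auto-merge only at or above the threshold.
    return result.review_confidence < threshold
```

Defaulting `policy` to `ALWAYS_APPROVE` mirrors the recommendation that auto-merge should be opt-in.
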
Decision:
2. Polling Strategy
Question: How should the orchestrator discover new work?
| Option | Description |
|---|---|
| A) Fixed interval polling | Check every N seconds. Simple and predictable, but wastes cycles when idle |
| B) Adaptive polling | Poll frequently when jobs are active, back off when idle. Efficient |
| C) Event-driven | File watcher on vault/10_features/03_planned/. Instant reaction, more complex |
Recommendation: Option B — adaptive polling is the right balance. Start at 30s, back off to 5 min when idle, return to 30s when jobs are detected.
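
A minimal sketch of the adaptive interval, using the 30 s and 5 min bounds from the recommendation. The class name `AdaptivePoller` and the doubling backoff factor are assumptions; the recommendation does not say how quickly the interval should grow.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptivePoller:
    min_interval: float = 30.0     # seconds, used while jobs are being found
    max_interval: float = 300.0    # seconds (5 min), ceiling when idle
    backoff_factor: float = 2.0    # assumption: double the wait on each idle poll
    interval: float = field(init=False)

    def __post_init__(self) -> None:
        self.interval = self.min_interval

    def next_interval(self, found_work: bool) -> float:
        """Return how long to sleep before the next poll."""
        if found_work:
            # Work detected: snap back to the fast interval.
            self.interval = self.min_interval
        else:
            # Idle: grow the interval, capped at the ceiling.
            self.interval = min(self.interval * self.backoff_factor, self.max_interval)
        return self.interval
```

Under these assumptions, consecutive idle polls wait 60 s, 120 s, 240 s, then stay at 300 s until work appears.
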
Decision:
3. Crash Recovery
Question: What happens when the orchestrator restarts after a crash?
| Option | Description |
|---|---|
| A) Restart all in-flight jobs | Simple but wastes completed work |
| B) Resume from last checkpoint | Each pipeline stage checkpoints progress. Resume from last completed stage |
| C) Mark as failed, let user decide | Conservative — no automatic recovery. User re-triggers manually |
Recommendation: Option B — the job registry (FR-046) tracks stage completion. On restart, query for running jobs and resume from their last checkpoint.
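
Checkpoint-based resume might look like the sketch below. `JobRecord`, its `status` values, and `recovery_plan` are hypothetical; the real registry schema belongs to FR-046. The stage order follows the plan → code → test → review pipeline from the summary.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stage order taken from the pipeline description.
STAGES = ("plan", "code", "test", "review")


@dataclass
class JobRecord:
    job_id: str
    status: str                                   # assumed: "running", "done", "failed"
    completed_stages: list[str] = field(default_factory=list)


def next_stage(job: JobRecord) -> Optional[str]:
    """Return the first stage not yet checkpointed, or None if all are done."""
    for stage in STAGES:
        if stage not in job.completed_stages:
            return stage
    return None


def recovery_plan(jobs: list[JobRecord]) -> dict[str, str]:
    """Map each in-flight job to the stage it should resume from."""
    plan: dict[str, str] = {}
    for job in jobs:
        if job.status != "running":
            continue  # finished or failed jobs are left alone
        stage = next_stage(job)
        if stage is not None:
            plan[job.job_id] = stage
    return plan
```

A job that crashed partway through a stage restarts that stage from its beginning, so stages would need to be safe to re-run.
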
Decision:
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Main loop — poll for planned FRs, dispatch single agent, wait for result | — |
| Phase 2 | Pipeline mode — plan → code → test → review → fix loop | — |
| Phase 3 | Multi-agent parallel execution, concurrent FRs | — |
| Phase 4 | Continuous operation (systemd, crash recovery, health monitoring) | — |
Phase 1: Single Agent Dispatch —
Prerequisite: Design doc designs/drafts/autonomous-coding-agency.md must be reviewed and moved to designs/approved/ before implementation starts.
Goal: Build the main loop that picks up a planned FR, dispatches a single coding agent, and reports the result.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/orchestrator/main.py | Entry point: initialize components, start main loop | opus | — |
| src/opus/orchestrator/loop.py | Main loop: poll vault for planned FRs, pick highest priority, dispatch | opus | — |
| src/opus/orchestrator/config.py | Configuration: polling interval, vault paths, concurrency limit (1 for Phase 1) | opus | — |
| FR discovery | Read vault/10_features/03_planned/ directory, parse frontmatter, filter by status | opus | — |
| Single agent dispatch | Create sandbox (FR-055), invoke coder agent, capture result | opus | — |
| Status updates | Update FR frontmatter with status: in-progress when dispatched, status: done when complete | opus | — |
| Result reporting | Log completion/failure, notify user (stdout for Phase 1) | mv | — |
| Unit tests | Mock vault, mock agent, test loop dispatches correctly | mv | — |
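
The FR discovery row could be prototyped with the standard library alone. This sketch makes several assumptions: frontmatter is a block of flat `key: value` lines between `---` fences, the fields are named `status` and `priority`, and a larger `priority` number means more urgent. A real implementation would likely use a proper YAML parser.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between leading `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # unterminated block: treat as no frontmatter


def discover_planned(planned_dir: Path) -> list[tuple[int, Path]]:
    """Return (priority, path) pairs for planned FRs, most urgent first."""
    found: list[tuple[int, Path]] = []
    for path in sorted(planned_dir.glob("*.md")):
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        if meta.get("status") != "planned":
            continue
        try:
            priority = int(meta.get("priority", "0"))
        except ValueError:
            priority = 0  # assumption: unparsable priority sorts last
        found.append((priority, path))
    # Highest priority first; file name breaks ties deterministically.
    found.sort(key=lambda item: (-item[0], item[1].name))
    return found
```

The main loop would call `discover_planned(Path("vault/10_features/03_planned"))` and dispatch the first entry, if any.
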
Phase 2: Pipeline Execution —
Goal: Implement the full agent pipeline: Planner → Coder → Tester → Reviewer with feedback loop.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/orchestrator/pipeline.py | Pipeline executor: sequence agent stages, pass context between stages | opus | — |
| Planner stage | Planner agent reads FR spec, outputs implementation plan | opus | — |
| Coder stage | Coder agent executes plan in sandboxed worktree | opus | — |
| Tester stage | Tester agent runs test suite, reports pass/fail with details | mv | — |
| Reviewer stage | Reviewer agent (FR-057) reviews diff, outputs structured feedback | mv | — |
| Feedback loop | On review failure: extract issues, send to coder, re-code, re-test, re-review (max 3 iterations) | mv | — |
| PR creation | On pipeline success: create PR via FR-058 git workflow | opus | — |
| Pipeline state | Track current stage per job in registry (FR-046) | opus | — |
| Integration tests | Full pipeline with mock agents: plan → code → test → review → PR | mv | — |
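
The feedback loop row can be sketched as a bounded retry around three callables. `Review`, `run_feedback_loop`, and the callable signatures are hypothetical, and treating a failed test run as feedback for the coder is an assumption; the table only describes the review-failure path.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ITERATIONS = 3  # limit taken from the feedback loop row


@dataclass
class Review:
    approved: bool
    issues: list[str] = field(default_factory=list)


def run_feedback_loop(
    code: Callable[[list[str]], str],
    run_tests: Callable[[str], bool],
    review: Callable[[str], Review],
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[bool, int]:
    """Return (succeeded, attempts_used) for a code, test, review cycle."""
    issues: list[str] = []
    for attempt in range(1, max_iterations + 1):
        diff = code(issues)            # coder sees issues from the previous round
        if not run_tests(diff):
            # Assumption: test failures are fed back to the coder like review issues.
            issues = ["tests failed"]
            continue
        result = review(diff)
        if result.approved:
            return True, attempt
        issues = result.issues         # extract issues for the next attempt
    return False, max_iterations
```

On `(True, n)` the pipeline would proceed to PR creation; on `(False, 3)` it would mark the job as failed.
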
Phase 3: Parallel Execution —
Goal: Run multiple FR pipelines concurrently with independent sandboxes.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/orchestrator/scheduler.py | Concurrent scheduler: manage multiple pipeline instances | opus | — |
| Concurrency control | Configurable max concurrent pipelines (default: 3) | opus | — |
| Resource management | Track sandbox count, LLM API usage, prevent resource exhaustion | opus | — |
| Priority preemption | Higher-priority job can pause lower-priority one (via FR-046 queue) | opus | — |
| Inter-pipeline isolation | Each pipeline has its own sandbox, no shared mutable state | opus | — |
| Progress dashboard | Real-time view of all running pipelines and their current stage | opus | — |
| Integration tests | Run 3 pipelines concurrently, verify isolation and completion | mv | — |
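
One way to enforce the configurable concurrency limit is an `asyncio.Semaphore`. The sketch assumes pipelines are coroutines; `run_pipelines` and `run_one` are hypothetical names, and the document does not say whether the scheduler will use asyncio, threads, or processes.

```python
import asyncio
from typing import Awaitable, Callable


async def run_pipelines(
    fr_ids: list[str],
    run_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,   # default from the concurrency control row
) -> dict[str, object]:
    """Run one pipeline per FR, never more than max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(fr_id: str) -> str:
        async with semaphore:  # blocks while max_concurrent pipelines are active
            return await run_one(fr_id)

    # return_exceptions=True keeps one failing pipeline from cancelling the rest.
    results = await asyncio.gather(
        *(guarded(fr_id) for fr_id in fr_ids), return_exceptions=True
    )
    return dict(zip(fr_ids, results))
```

Each value in the returned mapping is either the pipeline's result or the exception it raised, so failures stay isolated per FR.
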
Phase 4: Continuous Operation —
Goal: Run the orchestrator as a production service: crash recovery, health monitoring, notifications.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/orchestrator/service.py | Systemd-compatible service wrapper: start, stop, reload | opus | — |
| Crash recovery | On restart: query registry for running jobs, resume from last checkpoint | opus | — |
| Health monitoring | Periodic self-check: is the loop running? Are agents responsive? | opus | — |
| Notifications | Notify user on: pipeline completion, pipeline failure, stuck agent, crash recovery | mv | — |
| Metrics | Track: jobs/day, success rate, average pipeline duration, agent utilization | opus | — |
| Graceful shutdown | On SIGTERM: finish current pipeline stage, save state, exit cleanly | opus | — |
| Systemd unit file | opus-orchestrator.service for deployment on VPS (FR-019) | opus | — |
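
The graceful shutdown row could be implemented with a flag that the SIGTERM handler sets and the pipeline checks between stages. `ShutdownFlag`, `run_stages`, and the `save_state` callback are hypothetical names; how state is actually persisted depends on the FR-046 registry.

```python
import signal
import threading
from typing import Callable


class ShutdownFlag:
    """Records that SIGTERM arrived without interrupting the running stage."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def install(self) -> None:
        # Must be called from the main thread, as signal.signal requires.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame) -> None:
        self._event.set()

    def requested(self) -> bool:
        return self._event.is_set()


def run_stages(
    stages: list[tuple[str, Callable[[], None]]],
    flag: ShutdownFlag,
    save_state: Callable[[list[str]], None],
) -> list[str]:
    """Run stages in order; stop before the next one once shutdown is requested."""
    completed: list[str] = []
    for name, stage in stages:
        if flag.requested():
            break
        stage()                 # the current stage always runs to completion
        completed.append(name)
    save_state(completed)       # persist progress before exiting
    return completed
```

Because the flag is only checked between stages, systemd's stop timeout would need to be longer than the slowest stage.
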
Prerequisites / Gap Analysis
Requirements
| Requirement | Description |
|---|---|
| REQ-0 | Design doc reviewed and approved (designs/approved/autonomous-coding-agency.md) |
| REQ-1 | Python project scaffold (FR-009) — orchestrator is a Python module |
| REQ-2 | Custom agents (FR-043) — orchestrator dispatches to agents |
| REQ-3 | Job registry (FR-046) — tracks all dispatched work |
| REQ-4 | LLM provider (FR-054) — agents need LLM access |
| REQ-5 | Sandboxed execution (FR-055) — agents need isolated environments |
Current State
| Component | Status | Details |
|---|---|---|
| Python scaffold | — | FR-009 not started |
| Agent definitions | — | FR-043 not started |
| Job registry | — | FR-046 not started |
| LLM abstraction | — | FR-054 not started |
| Sandbox | — | FR-055 not started |
Gap (What’s missing?)
| Gap | Effort | Blocker? |
|---|---|---|
| Python scaffold (FR-009) | Med | Yes |
| Agent definitions (FR-043) | High | Yes — orchestrator dispatches to agents |
| Job registry (FR-046) | Med | Yes for Phase 2+ — Phase 1 can work without it |
| LLM abstraction (FR-054) | Med | Yes — agents need LLM calls |
| Sandbox (FR-055) | Med | Yes — agents need isolated environments |
| VPS (FR-019) | High | Only for Phase 4 (production deployment) |
Test
Manual tests
| Test | Expected | Owner | Actual | Last |
|---|---|---|---|---|
| Main loop discovers planned FR in vault | FR detected and selected for dispatch | opus | pending | - |
| Single agent dispatch creates sandbox and invokes agent | Agent runs in isolated worktree | opus | pending | - |
| FR status updated to in-progress on dispatch | Frontmatter updated | opus | pending | - |
| Pipeline sequences stages correctly | Plan → code → test → review in order | opus | pending | - |
| Review failure triggers feedback loop to coder | Coder receives issues, fixes, re-review | mv | pending | - |
| Feedback loop exits after max iterations | Pipeline fails gracefully after 3 attempts | opus | pending | - |
| PR created on successful pipeline completion | PR exists on GitHub with correct description | opus | pending | - |
| 3 concurrent pipelines run without interference | All complete independently | opus | pending | - |
| Crash recovery resumes in-flight pipeline | Resumes from last completed stage | opus | pending | - |
| Graceful shutdown saves state and exits | No data loss, clean exit | opus | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Central piece of autonomous coding agency architecture |
References
- FR-009 (Python Project Scaffold) — code infrastructure prerequisite
- FR-043 (Custom Agents) — agent definitions dispatched by the orchestrator
- FR-045 (Complexity Routing) — scoring determines execution strategy
- FR-046 (Job Registry & Priority Queue) — job tracking and prioritization
- FR-054 (LLM Provider Abstraction) — agents use this for LLM calls
- FR-055 (Sandboxed Code Execution) — agents run in isolated environments
- FR-057 (Code Review Pipeline) — reviewer stage in the pipeline
- FR-058 (Agent Git Workflow) — PR creation and branch management
- FR-019 (VPS Deployment) — production hosting for the orchestrator
- vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview