Define explicit boundaries for when the autonomous system must stop and ask a human, based on risk scoring, confidence thresholds, and action categories.
Problem / Motivation
FR-056 (Autonomous Coding Orchestrator) will run continuously, making decisions without human oversight.
Without explicit “stop and ask” rules, the system either asks about everything (useless) or acts on everything (dangerous).
Risk varies wildly: renaming a variable vs adding a new dependency vs calling an external API vs deploying to production.
FR-008 (Design Review) gates implementation on approved designs, but that’s one gate. The system needs gates throughout the pipeline.
Every serious autonomous coding system (Devin, SWE-agent, etc.) has learned this the hard way — uncontrolled autonomy leads to runaway costs, broken repos, and lost trust.
Proposed Solution
A risk classification system with escalation rules, implemented as a policy file the orchestrator reads. Each action type has a risk level and a corresponding escalation path (auto-proceed, log-and-proceed, ask-and-wait, block).
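To make this concrete, here is a minimal sketch of what the orchestrator's policy lookup could look like. The action names, risk labels, and table layout are illustrative assumptions, not a spec; in practice the table would be loaded from the policy file (e.g. YAML or JSON) rather than hardcoded.

```python
from enum import Enum

class Escalation(Enum):
    AUTO_PROCEED = "auto-proceed"
    LOG_AND_PROCEED = "log-and-proceed"
    ASK_AND_WAIT = "ask-and-wait"
    BLOCK = "block"

# Hypothetical policy table: action type -> (risk level, escalation path).
# In the real system this would be parsed from the policy file.
POLICY = {
    "rename_variable":   ("low",      Escalation.AUTO_PROCEED),
    "edit_source_file":  ("medium",   Escalation.LOG_AND_PROCEED),
    "add_dependency":    ("high",     Escalation.ASK_AND_WAIT),
    "call_external_api": ("high",     Escalation.ASK_AND_WAIT),
    "deploy_production": ("critical", Escalation.BLOCK),
}

def escalation_for(action_type: str) -> Escalation:
    """Look up the escalation path for an action.

    Unmatched actions fall back to ask-and-wait, per the ask-first
    default recommended in Open Question 1.
    """
    _, path = POLICY.get(action_type, ("unknown", Escalation.ASK_AND_WAIT))
    return path
```

The point of keeping this as a flat table is that reviewers can audit the escalation rules without reading orchestrator code.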
Open Questions
1. Default Autonomy Mode
Question: What should the default behavior be when no specific rule matches?
| Option | Description |
| --- | --- |
| A) Ask-first | Unknown actions require human approval. Safe but slower. |
| B) Act-first with rollback | Proceed but ensure reversibility. Fast but riskier. |
| C) Act-first with logging | Proceed and log for post-hoc review. Fastest, least safe. |
Recommendation: Option A for Phase 1-2. Move to B for well-tested action types in Phase 3.
Decision:
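The fallback behavior can be sketched as a single resolution function. The mode string and escalation-path names below are assumptions for illustration; the key property is that an unmatched action under Option A always routes to a human.

```python
# Phase 1-2 default per the recommendation above; a Phase 3 rollout could
# switch this to an act-first mode for well-tested action types.
DEFAULT_MODE = "ask-first"

def resolve(action_type: str, policy: dict) -> str:
    """Return the escalation path for an action, applying the default mode
    when no specific policy rule matches."""
    if action_type in policy:
        return policy[action_type]
    if DEFAULT_MODE == "ask-first":
        # Option A: unknown actions stop and wait for a human.
        return "ask-and-wait"
    # An act-first default (Options B/C) would instead proceed and record
    # the action for rollback or post-hoc review.
    return "log-and-proceed"
```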
2. Confidence Scoring
Question: Should the system self-assess confidence before acting?
| Option | Description |
| --- | --- |
| A) LLM self-rating + action type | Combine action risk level with the LLM's confidence estimate. |
| B) Action type only | Fixed rules per action category, no self-assessment. |
| C) Outcome prediction | Predict impact before acting, escalate if high-impact. |
Recommendation: Option A — simple to implement, LLM confidence is imperfect but useful as a signal.
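One simple way to combine the two signals under Option A is to treat low self-rated confidence as bumping the action up one risk level. The threshold and risk ordering below are illustrative assumptions, not tuned values.

```python
# Ordinal risk levels, lowest to highest.
RISK_BASE = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Escalation paths ordered from most to least autonomous.
PATHS = ["auto-proceed", "log-and-proceed", "ask-and-wait", "block"]

def decide(risk_level: str, llm_confidence: float) -> str:
    """Map (action risk, LLM self-rated confidence in [0, 1]) to an
    escalation path, escalating one step when confidence is low."""
    score = RISK_BASE[risk_level]
    if llm_confidence < 0.5:  # assumed threshold: low confidence raises risk
        score += 1
    return PATHS[min(score, len(PATHS) - 1)]
```

This keeps the LLM's self-assessment as a one-directional signal: it can only make the system more cautious, never less, which limits the damage when the confidence estimate is miscalibrated.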