Decisions

  • Pending: Default mode — ask-first or act-first with rollback?
  • Pending: Confidence threshold for auto-proceed
  • Pending: Escalation channel — CLI prompt, Telegram, GitHub issue?

User Tasks


Summary

Define explicit boundaries for when the autonomous system must stop and ask a human, based on risk scoring, confidence thresholds, and action categories.

Problem / Motivation

  • FR-056 (Autonomous Coding Orchestrator) will run continuously, making decisions without human oversight.
  • Without explicit “stop and ask” rules, the system either asks about everything (useless) or acts on everything (dangerous).
  • Risk varies wildly: renaming a variable vs adding a new dependency vs calling an external API vs deploying to production.
  • FR-008 (Design Review) gates implementation on approved designs, but that’s one gate. The system needs gates throughout the pipeline.
  • Every serious autonomous coding system (Devin, SWE-agent, etc.) has learned this the hard way — uncontrolled autonomy leads to runaway costs, broken repos, and lost trust.

Proposed Solution

A risk classification system with escalation rules, implemented as a policy file the orchestrator reads. Each action type has a risk level and a corresponding escalation path (auto-proceed, log-and-proceed, ask-and-wait, block).
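
As a sketch, the declarative rules file named in Phase 1 (`src/opus/policy/rules.yaml`) could take a shape like the following; the schema (`default`, `rules`, and the per-rule fields) is an assumed format, not a decided one:

```yaml
# Hypothetical schema for src/opus/policy/rules.yaml -- field names are assumptions.
default: ask                # fallback when no rule matches (see Open Question 1)
rules:
  - action: file-edit
    scope: sandbox
    risk: low
    escalation: auto        # auto | log | ask | block
  - action: dependency-add
    risk: medium
    escalation: ask
  - action: deploy
    risk: critical
    escalation: block       # manual only
```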


Open Questions

1. Default Autonomy Mode

Question: What should the default behavior be when no specific rule matches?

| Option | Description |
|---|---|
| A) Ask-first | Unknown actions require human approval. Safe but slower. |
| B) Act-first with rollback | Proceed but ensure reversibility. Fast but riskier. |
| C) Act-first with logging | Proceed and log for post-hoc review. Fastest, least safe. |

Recommendation: Option A for Phases 1–2. Move to B for well-tested action types in Phase 3.

Decision:

2. Confidence Scoring

Question: Should the system self-assess confidence before acting?

| Option | Description |
|---|---|
| A) LLM self-rating + action type | Combine action risk level with LLM’s confidence estimate |
| B) Action type only | Fixed rules per action category, no self-assessment |
| C) Outcome prediction | Predict impact before acting, escalate if high-impact |

Recommendation: Option A — simple to implement, LLM confidence is imperfect but useful as a signal.

Decision:

3. Escalation Channel

Question: How does the system reach the human?

| Option | Description |
|---|---|
| A) Multi-channel, priority-based | Critical → Telegram; medium → GitHub issue; low → log file |
| B) Single channel | Everything goes to one place |
| C) Queue-based | All escalations go to a review queue, processed in batch |

Recommendation: Option A eventually, but Option B (GitHub issues) for Phase 1.

Decision:


Phase Overview

| Phase | Description | Status |
|---|---|---|
| Phase 1 | Policy file + risk categories + hard-coded rules | |
| Phase 2 | Confidence scoring + escalation routing | |
| Phase 3 | Adaptive thresholds (learn from approval/rejection patterns) | |

Phase 1: Policy File & Risk Categories

Goal: A structured policy file that the orchestrator reads to decide ask vs act.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/policy/escalation.py | EscalationPolicy class, risk levels, action matching | opus | |
| src/opus/policy/rules.yaml | Declarative escalation rules | opus | |
| Risk categories | Define: file-edit, dependency-add, external-api, git-push, deploy, config-change, cost-threshold | opus | |
| Escalation levels | auto / log / ask / block | opus | |
| Orchestrator integration | FR-056 calls policy before every significant action | opus | |

Initial rules (starting point):

| Action | Risk | Escalation |
|---|---|---|
| Edit file in sandbox | low | auto |
| Add/remove dependency | medium | ask |
| Call external API | medium | ask |
| Push branch | low | log |
| Create PR | medium | log |
| Merge PR | high | ask |
| Deploy to production | critical | block (manual only) |
| Spend > $X in tokens | high | ask |
| Modify CLAUDE.md or settings | critical | ask |
| Delete files outside sandbox | critical | block |
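
A minimal sketch of how rule matching could work; only `EscalationPolicy` and the four escalation levels come from this FR, while the method name, rule keys, and the ask-first default are illustrative assumptions:

```python
# Minimal Phase-1 sketch: map action types to escalation levels, with an
# ask-first default for unmatched actions (Open Question 1, Option A).
ESCALATION_LEVELS = ("auto", "log", "ask", "block")


class EscalationPolicy:
    def __init__(self, rules: dict[str, str], default: str = "ask"):
        for level in list(rules.values()) + [default]:
            assert level in ESCALATION_LEVELS, f"unknown level: {level}"
        self.rules = rules
        self.default = default

    def check(self, action_type: str) -> str:
        """Return the escalation level for an action before it runs."""
        return self.rules.get(action_type, self.default)


# A few rows from the initial rules table above:
policy = EscalationPolicy({
    "file-edit-sandbox": "auto",
    "dependency-add": "ask",
    "git-push": "log",
    "deploy": "block",
})
print(policy.check("deploy"))        # block
print(policy.check("unknown-tool"))  # ask (no rule matched, ask-first default)
```

In this sketch the orchestrator (FR-056) would call `policy.check(...)` before every significant action and gate execution on the result.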

Phase 2: Confidence Scoring

Goal: Combine action risk with LLM self-assessed confidence to make smarter escalation decisions.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/policy/confidence.py | ConfidenceScorer: LLM rates its own certainty (0–1) | opus | |
| Decision matrix | risk × confidence → escalation level | opus | |
| Override rules | Some actions always escalate regardless of confidence | opus | |

Decision matrix example:

| | High confidence | Medium confidence | Low confidence |
|---|---|---|---|
| Low risk | auto | auto | log |
| Medium risk | log | ask | ask |
| High risk | ask | ask | block |
| Critical | ask | block | block |
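
The matrix can be encoded directly. A sketch follows, in which the confidence band cut-offs (0.8 / 0.5) are assumed values, not decided thresholds:

```python
# Risk x confidence -> escalation level, mirroring the decision matrix above.
# The band cut-offs (0.8 / 0.5) are illustrative assumptions.
MATRIX = {
    "low":      {"high": "auto", "medium": "auto",  "low": "log"},
    "medium":   {"high": "log",  "medium": "ask",   "low": "ask"},
    "high":     {"high": "ask",  "medium": "ask",   "low": "block"},
    "critical": {"high": "ask",  "medium": "block", "low": "block"},
}


def escalation(risk: str, confidence: float) -> str:
    band = "high" if confidence >= 0.8 else "medium" if confidence >= 0.5 else "low"
    return MATRIX[risk][band]


print(escalation("medium", 0.9))    # log
print(escalation("critical", 0.3))  # block
```

The override rules from the table above would be checked before consulting the matrix, so that always-escalate actions ignore confidence entirely.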

Phase 3: Adaptive Thresholds

Goal: Learn from human approval/rejection patterns to tune thresholds over time.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| Decision log | Record every escalation + human response | mv | |
| Pattern analysis | Detect actions that are always approved → lower threshold | mv | |
| Pattern analysis | Detect actions that are often rejected → raise threshold | opus | |
| Threshold tuning | Propose rule changes, require human approval to apply | mv | |
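
A sketch of the pattern analysis over the decision log; the log record shape and the 95% / 50% cut-offs are assumptions, and per the table above the output is only a proposal that a human must approve:

```python
# Phase-3 sketch: mine the decision log for always-approved / often-rejected
# actions. Record shape and cut-offs are illustrative assumptions.
from collections import defaultdict


def propose_changes(log, min_samples=20):
    """log: iterable of (action_type, response) with response in
    {'approved', 'rejected'}. Returns proposals; a human must apply them."""
    counts = defaultdict(lambda: [0, 0])  # action -> [approved, total]
    for action, response in log:
        counts[action][1] += 1
        counts[action][0] += response == "approved"
    proposals = []
    for action, (approved, total) in counts.items():
        if total < min_samples:
            continue  # not enough evidence to tune this rule yet
        rate = approved / total
        if rate >= 0.95:
            proposals.append((action, "lower threshold (e.g. ask -> log)"))
        elif rate <= 0.50:
            proposals.append((action, "raise threshold (e.g. ask -> block)"))
    return proposals


log = [("dependency-add", "approved")] * 19 + [("dependency-add", "rejected")]
print(propose_changes(log))  # [('dependency-add', 'lower threshold (e.g. ask -> log)')]
```

The `min_samples` floor keeps one lucky streak from loosening a rule; tuning only triggers once there is a meaningful history for that action type.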

Prerequisites / Gap Analysis

Requirements

| Requirement | Description |
|---|---|
| REQ-0 | Design doc reviewed and approved |
| REQ-1 | FR-056 (Orchestrator) — consumer of escalation policy |
| REQ-2 | FR-009 (Python scaffold) — code infrastructure |

Current State

| Component | Status | Details |
|---|---|---|
| Escalation rules | | Nothing exists |
| Claude Code permissions | done | Built-in allow/deny per tool, but not context-aware |
| FR-008 design gate | new | One specific gate, not a general framework |

Gap (What’s missing?)

| Gap | Effort | Blocker? |
|---|---|---|
| Policy file + parser | Medium | No |
| Risk classification | Medium | No |
| Orchestrator integration | Low | Depends on FR-056 |

Test

Manual tests

| Test | Expected | Actual | Last |
|---|---|---|---|
| pending | - | | |

AI-verified tests

| Scenario | Expected behavior | Verification method |
|---|---|---|

E2E tests

| Scenario | Assertion |
|---|---|

Integration tests

| Component | Coverage |
|---|---|

Unit tests

| Component | Tests | Coverage |
|---|---|---|

History

| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Identified as critical gap for autonomous operation |

References

  • FR-056 (Autonomous Coding Orchestrator) — primary consumer
  • FR-057 (Code Review Pipeline) — review severity feeds into escalation
  • FR-008 (Design Review Workflow) — specific gate, this FR generalizes it
  • FR-053 (Cost & Token Tracking) — cost threshold triggers
  • FR-005 (Protected Files System) — file-level protection, this FR adds action-level