Decisions

  • Pending: Default mode — ask-first or act-first with rollback?
  • Pending: Confidence threshold for auto-proceed
  • Pending: Escalation channel — CLI prompt, Telegram, GitHub issue?

User Tasks


Summary

Define explicit boundaries for when the autonomous system must stop and ask a human, based on risk scoring, confidence thresholds, and action categories.

Problem / Motivation

  • FR-056 (Autonomous Coding Orchestrator) will run continuously, making decisions without human oversight.
  • Without explicit “stop and ask” rules, the system either asks about everything (useless) or acts on everything (dangerous).
  • Risk varies wildly: renaming a variable vs adding a new dependency vs calling an external API vs deploying to production.
  • FR-008 (Design Review) gates implementation on approved designs, but that’s one gate. The system needs gates throughout the pipeline.
  • Every serious autonomous coding system (Devin, SWE-agent, etc.) has learned this the hard way — uncontrolled autonomy leads to runaway costs, broken repos, and lost trust.

Proposed Solution

A risk classification system with escalation rules, implemented as a policy file the orchestrator reads. Each action type has a risk level and a corresponding escalation path (auto-proceed, log-and-proceed, ask-and-wait, block).
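
As a sketch, the declarative rules file named in Phase 1 (`src/opus/policy/rules.yaml`) could take a shape like the following; the schema (`default`, `rules`, and the per-rule fields) is an assumed format, not a decided one:

```yaml
# Hypothetical schema for src/opus/policy/rules.yaml -- field names are assumptions.
default: ask                # fallback when no rule matches (see Open Question 1)
rules:
  - action: file-edit
    scope: sandbox
    risk: low
    escalation: auto        # auto | log | ask | block
  - action: dependency-add
    risk: medium
    escalation: ask
  - action: deploy
    risk: critical
    escalation: block       # manual only
```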


Open Questions

1. Default Autonomy Mode

Question: What should the default behavior be when no specific rule matches?

| Option | Description |
|---|---|
| A) Ask-first | Unknown actions require human approval. Safe but slower. |
| B) Act-first with rollback | Proceed but ensure reversibility. Fast but riskier. |
| C) Act-first with logging | Proceed and log for post-hoc review. Fastest, least safe. |

Recommendation: Option A for Phases 1–2. Move to B for well-tested action types in Phase 3.

Decision:

2. Confidence Scoring

Question: Should the system self-assess confidence before acting?

| Option | Description |
|---|---|
| A) LLM self-rating + action type | Combine action risk level with LLM’s confidence estimate |
| B) Action type only | Fixed rules per action category, no self-assessment |
| C) Outcome prediction | Predict impact before acting, escalate if high-impact |

Recommendation: Option A — simple to implement, LLM confidence is imperfect but useful as a signal.

Decision:

3. Escalation Channel

Question: How does the system reach the human?

| Option | Description |
|---|---|
| A) Multi-channel, priority-based | Critical → Telegram; medium → GitHub issue; low → log file |
| B) Single channel | Everything goes to one place |
| C) Queue-based | All escalations go to a review queue, processed in batch |

Recommendation: Option A eventually, but Option B (GitHub issues) for Phase 1.

Decision:


Phase Overview

| Phase | Description | Status |
|---|---|---|
| Phase 1 | Policy file + risk categories + hard-coded rules | |
| Phase 2 | Confidence scoring + escalation routing | |
| Phase 3 | Adaptive thresholds (learn from approval/rejection patterns) | |

Phase 1: Policy File & Risk Categories

Goal: A structured policy file that the orchestrator reads to decide ask vs act.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/policy/escalation.py | EscalationPolicy class, risk levels, action matching | opus | |
| src/opus/policy/rules.yaml | Declarative escalation rules | opus | |
| Risk categories | Define: file-edit, dependency-add, external-api, git-push, deploy, config-change, cost-threshold | opus | |
| Escalation levels | auto / log / ask / block | opus | |
| Orchestrator integration | FR-056 calls policy before every significant action | opus | |

Initial rules (starting point):

| Action | Risk | Escalation |
|---|---|---|
| Edit file in sandbox | low | auto |
| Add/remove dependency | medium | ask |
| Call external API | medium | ask |
| Push branch | low | log |
| Create PR | medium | log |
| Merge PR | high | ask |
| Deploy to production | critical | block (manual only) |
| Spend > $X in tokens | high | ask |
| Modify CLAUDE.md or settings | critical | ask |
| Delete files outside sandbox | critical | block |
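
A minimal sketch of how rule matching could work; only `EscalationPolicy` and the four escalation levels come from this FR, while the method name, rule keys, and the ask-first default are illustrative assumptions:

```python
# Minimal Phase-1 sketch: map action types to escalation levels, with an
# ask-first default for unmatched actions (Open Question 1, Option A).
ESCALATION_LEVELS = ("auto", "log", "ask", "block")


class EscalationPolicy:
    def __init__(self, rules: dict[str, str], default: str = "ask"):
        for level in list(rules.values()) + [default]:
            assert level in ESCALATION_LEVELS, f"unknown level: {level}"
        self.rules = rules
        self.default = default

    def check(self, action_type: str) -> str:
        """Return the escalation level for an action before it runs."""
        return self.rules.get(action_type, self.default)


# A few rows from the initial rules table above:
policy = EscalationPolicy({
    "file-edit-sandbox": "auto",
    "dependency-add": "ask",
    "git-push": "log",
    "deploy": "block",
})
print(policy.check("deploy"))        # block
print(policy.check("unknown-tool"))  # ask (no rule matched, ask-first default)
```

In this sketch the orchestrator (FR-056) would call `policy.check(...)` before every significant action and gate execution on the result.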

Phase 2: Confidence Scoring

Goal: Combine action risk with LLM self-assessed confidence to make smarter escalation decisions.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/policy/confidence.py | ConfidenceScorer: LLM rates its own certainty (0–1) | opus | |
| Decision matrix | risk × confidence → escalation level | opus | |
| Override rules | Some actions always escalate regardless of confidence | opus | |

Decision matrix example:

| | High confidence | Medium confidence | Low confidence |
|---|---|---|---|
| Low risk | auto | auto | log |
| Medium risk | log | ask | ask |
| High risk | ask | ask | block |
| Critical | ask | block | block |
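
The matrix can be encoded directly. A sketch follows, in which the confidence band cut-offs (0.8 / 0.5) are assumed values, not decided thresholds:

```python
# Risk x confidence -> escalation level, mirroring the decision matrix above.
# The band cut-offs (0.8 / 0.5) are illustrative assumptions.
MATRIX = {
    "low":      {"high": "auto", "medium": "auto",  "low": "log"},
    "medium":   {"high": "log",  "medium": "ask",   "low": "ask"},
    "high":     {"high": "ask",  "medium": "ask",   "low": "block"},
    "critical": {"high": "ask",  "medium": "block", "low": "block"},
}


def escalation(risk: str, confidence: float) -> str:
    band = "high" if confidence >= 0.8 else "medium" if confidence >= 0.5 else "low"
    return MATRIX[risk][band]


print(escalation("medium", 0.9))    # log
print(escalation("critical", 0.3))  # block
```

The override rules from the table above would be checked before consulting the matrix, so that always-escalate actions ignore confidence entirely.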

Phase 3: Adaptive Thresholds

Goal: Learn from human approval/rejection patterns to tune thresholds over time.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| Decision log | Record every escalation + human response | mv | |
| Pattern analysis | Detect actions that are always approved → lower threshold | mv | |
| Pattern analysis | Detect actions that are often rejected → raise threshold | opus | |
| Threshold tuning | Propose rule changes, require human approval to apply | mv | |
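
A sketch of the pattern analysis over the decision log; the log record shape and the 95% / 50% cut-offs are assumptions, and per the table above the output is only a proposal that a human must approve:

```python
# Phase-3 sketch: mine the decision log for always-approved / often-rejected
# actions. Record shape and cut-offs are illustrative assumptions.
from collections import defaultdict


def propose_changes(log, min_samples=20):
    """log: iterable of (action_type, response) with response in
    {'approved', 'rejected'}. Returns proposals; a human must apply them."""
    counts = defaultdict(lambda: [0, 0])  # action -> [approved, total]
    for action, response in log:
        counts[action][1] += 1
        counts[action][0] += response == "approved"
    proposals = []
    for action, (approved, total) in counts.items():
        if total < min_samples:
            continue  # not enough evidence to tune this rule yet
        rate = approved / total
        if rate >= 0.95:
            proposals.append((action, "lower threshold (e.g. ask -> log)"))
        elif rate <= 0.50:
            proposals.append((action, "raise threshold (e.g. ask -> block)"))
    return proposals


log = [("dependency-add", "approved")] * 19 + [("dependency-add", "rejected")]
print(propose_changes(log))  # [('dependency-add', 'lower threshold (e.g. ask -> log)')]
```

The `min_samples` floor keeps one lucky streak from loosening a rule; tuning only triggers once there is a meaningful history for that action type.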

Prerequisites / Gap Analysis

Requirements

| Requirement | Description |
|---|---|
| REQ-0 | Design doc reviewed and approved |
| REQ-1 | FR-056 (Orchestrator) — consumer of escalation policy |
| REQ-2 | FR-009 (Python scaffold) — code infrastructure |

Current State

| Component | Status | Details |
|---|---|---|
| Escalation rules | | Nothing exists |
| Claude Code permissions | done | Built-in allow/deny per tool, but not context-aware |
| FR-008 design gate | new | One specific gate, not a general framework |

Gap (What’s missing?)

| Gap | Effort | Blocker? |
|---|---|---|
| Policy file + parser | Medium | No |
| Risk classification | Medium | No |
| Orchestrator integration | Low | Depends on FR-056 |

Test

Manual tests

| Test | Expected | Actual | Last |
|---|---|---|---|
| pending | - | | |

AI-verified tests

| Scenario | Expected behavior | Verification method |
|---|---|---|

E2E tests

| Scenario | Assertion |
|---|---|

Integration tests

| Component | Coverage |
|---|---|

Unit tests

| Component | Tests | Coverage |
|---|---|---|

History

| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Identified as critical gap for autonomous operation |

References

  • FR-056 (Autonomous Coding Orchestrator) — primary consumer
  • FR-057 (Code Review Pipeline) — review severity feeds into escalation
  • FR-008 (Design Review Workflow) — specific gate, this FR generalizes it
  • FR-053 (Cost & Token Tracking) — cost threshold triggers
  • FR-005 (Protected Files System) — file-level protection, this FR adds action-level