Decisions

  • Pending: Fixed checklist vs dynamic criteria per FR?
  • Pending: Minimum test coverage threshold
  • Pending: Who signs off — orchestrator auto-close or human always?

User Tasks


Summary

A framework that defines when a task is truly complete, giving the orchestrator clear exit criteria to prevent infinite loops and half-shipped work.

Problem / Motivation

  • FR-056 (Orchestrator) runs a pipeline: Plan → Code → Test → Review. But when does it stop?
  • Without exit criteria, the system either loops forever (review keeps finding issues) or declares victory too early (tests pass but functionality is wrong).
  • The FR template has a “Test” section but it’s manually written and not machine-readable.
  • Different FR types need different criteria: a hook FR needs “hook fires correctly”, a Python FR needs “tests pass + coverage”, a vault FR needs “template compliance.”
  • The feedback loop in FR-057 (max 3 iterations) is a timeout, not a definition of done.

Proposed Solution

A done-criteria section in FR frontmatter (or a companion file) that defines machine-checkable acceptance criteria. The orchestrator evaluates these after each pipeline run. A default checklist applies when no FR-specific criteria exist.


Open Questions

1. Criteria Format

Question: How should acceptance criteria be expressed?

| Option | Description |
|---|---|
| A) Structured YAML in frontmatter | done-criteria: [tests-pass, coverage-80, no-critical-review, lint-clean] |
| B) Free-text in FR body | Human-readable but not machine-checkable |
| C) Separate criteria file | vault/10_features/criteria/FR-XXX.yaml |

Recommendation: Option A — keeps criteria with the FR, machine-parseable, minimal overhead.
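
A minimal sketch of how Option A tokens might be parsed once the frontmatter list has been loaded. The `Criterion` class, the `parse_criterion` and `parse_done_criteria` functions, and the rule that a trailing `-N` is a numeric threshold are assumptions for illustration, as is the alias that maps the short token `coverage-80` onto the `coverage_threshold` check. None of this is a settled format.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Criterion:
    check: str                # evaluator check name, e.g. "coverage_threshold"
    threshold: Optional[int]  # numeric suffix, or None when the token has none


# Assumed alias: the frontmatter token "coverage-N" selects coverage_threshold.
_ALIASES = {"coverage": "coverage_threshold"}

# Lowercase words joined by hyphens, optionally followed by "-<digits>".
_TOKEN = re.compile(r"^(?P<name>[a-z]+(?:-[a-z]+)*)(?:-(?P<num>\d+))?$")


def parse_criterion(token: str) -> Criterion:
    """Turn one frontmatter token such as 'coverage-80' into a Criterion."""
    match = _TOKEN.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised done-criteria token: {token!r}")
    name = match.group("name").replace("-", "_")
    num = match.group("num")
    return Criterion(_ALIASES.get(name, name), int(num) if num else None)


def parse_done_criteria(tokens: List[str]) -> List[Criterion]:
    """Parse the whole done-criteria list, preserving its order."""
    return [parse_criterion(token) for token in tokens]
```

Rejecting malformed tokens with an error, rather than skipping them, keeps a typo in the frontmatter from silently weakening the exit criteria.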

Decision:

2. Default Criteria

Question: What should apply when an FR doesn’t specify custom criteria?

| Option | Description |
|---|---|
| A) Type-based defaults | Python FR → tests + coverage + lint; vault FR → template compliance + frontmatter; skill FR → smoke test |
| B) Universal minimum | Same checklist for everything |
| C) No default | Only explicit criteria count |

Recommendation: Option A — different work types have different quality signals.

Decision:

3. Sign-off Authority

Question: Can the orchestrator auto-close an FR or does a human always sign off?

| Option | Description |
|---|---|
| A) Auto-close for low-risk, human for high-risk | Ties into FR-059 escalation policy |
| B) Human always | Every FR completion requires user confirmation |
| C) Auto-close always | Trust the criteria |

Recommendation: Option A — aligned with escalation policy. Low-risk FRs with all criteria met can auto-close.

Decision:


Phase Overview

| Phase | Description | Status |
|---|---|---|
| Phase 1 | Default criteria checklist + evaluator | |
| Phase 2 | FR-specific criteria in frontmatter + type-based defaults | |
| Phase 3 | Orchestrator integration + auto-close for qualifying FRs | |

Phase 1: Default Criteria & Evaluator

Goal: A checklist evaluator that can verify common done-criteria.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/quality/criteria.py | DoneCriteria model, Evaluator class | opus | |
| src/opus/quality/checks/ | Individual check implementations | opus | |
| Check: tests_pass | Run pytest, verify exit code 0 | mv | |
| Check: coverage_threshold | Parse coverage report, check minimum % | opus | |
| Check: lint_clean | Run ruff, verify no errors | mv | |
| Check: no_critical_review | Parse FR-057 review output, verify no critical findings | mv | |
| Check: template_compliant | Verify vault files match their template structure | mv | |
| Check: frontmatter_valid | Verify required frontmatter fields present and correct | mv | |
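
The command-based and report-based checks listed above could look roughly like the following. This is a sketch only: `CheckResult`, `command_check`, and the function signatures are hypothetical. The coverage check assumes the report is the JSON produced by coverage.py's `coverage json` command, whose `totals.percent_covered` field holds the overall percentage.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def command_check(name: str, argv: Sequence[str]) -> CheckResult:
    """Run a command; the check passes only when the exit code is 0."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True)
    except FileNotFoundError:
        # A missing tool is reported as a failed check, not as a crash.
        return CheckResult(name, False, f"command not found: {argv[0]}")
    return CheckResult(name, proc.returncode == 0, f"exit code {proc.returncode}")


def tests_pass() -> CheckResult:
    # Runs pytest with the current interpreter; exit code 0 means all tests passed.
    return command_check("tests_pass", [sys.executable, "-m", "pytest", "-q"])


def lint_clean() -> CheckResult:
    # `ruff check` exits non-zero when it reports violations.
    return command_check("lint_clean", ["ruff", "check", "."])


def coverage_threshold(report: Mapping, minimum: float = 80.0) -> CheckResult:
    """Compare the total of a coverage.py JSON report against a minimum percentage."""
    percent = float(report["totals"]["percent_covered"])
    detail = f"{percent:.1f}% covered, minimum {minimum:.1f}%"
    return CheckResult("coverage_threshold", percent >= minimum, detail)
```

Because `command_check` takes an arbitrary argument list, the same helper could also back the FR-specific test commands planned for Phase 2. Note that pytest exits with a non-zero code when it collects no tests, so under this sketch an FR without tests fails `tests_pass`.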

Default checklist (applies when no custom criteria specified):

| Check | Applies to | Threshold |
|---|---|---|
| tests_pass | Python code | all pass |
| coverage_threshold | Python code | 80% |
| lint_clean | Python code | 0 errors |
| no_critical_review | All code | 0 critical findings |
| template_compliant | Vault files | full compliance |
| frontmatter_valid | FR/vault files | all required fields |
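
One way the evaluator might select and run the default checklist is sketched below. The artifact-kind labels ("python", "other_code", "vault", "fr"), the `DefaultCheck` record, and the runner registry are all hypothetical. The real `Evaluator` class in src/opus/quality/criteria.py may be shaped quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set


@dataclass(frozen=True)
class DefaultCheck:
    name: str
    applies_to: FrozenSet[str]  # assumed artifact kinds this check covers
    threshold: str              # threshold text from the default checklist


DEFAULT_CHECKLIST = [
    DefaultCheck("tests_pass", frozenset({"python"}), "all pass"),
    DefaultCheck("coverage_threshold", frozenset({"python"}), "80%"),
    DefaultCheck("lint_clean", frozenset({"python"}), "0 errors"),
    DefaultCheck("no_critical_review", frozenset({"python", "other_code"}), "0 critical findings"),
    DefaultCheck("template_compliant", frozenset({"vault"}), "full compliance"),
    DefaultCheck("frontmatter_valid", frozenset({"fr", "vault"}), "all required fields"),
]


class Evaluator:
    """Runs every default check whose scope overlaps the touched artifact kinds."""

    def __init__(self, runners: Dict[str, Callable[[], bool]]):
        self._runners = runners

    def evaluate(self, touched: Set[str]) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        for check in DEFAULT_CHECKLIST:
            if check.applies_to & touched:
                runner = self._runners.get(check.name)
                # A check without a registered runner fails instead of passing silently.
                results[check.name] = bool(runner()) if runner else False
        return results


def is_done(results: Dict[str, bool]) -> bool:
    # An empty result set is not treated as done: nothing was verified.
    return bool(results) and all(results.values())
```

For example, a change that touches only vault files would be evaluated against `template_compliant` and `frontmatter_valid`, and none of the Python checks would run.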

Phase 2: FR-Specific Criteria

Goal: Allow individual FRs to define custom acceptance criteria.

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| Frontmatter field | done-criteria: [tests-pass, coverage-90, custom-check-name] | mv | |
| Type defaults | src/opus/quality/defaults.yaml — per-FR-type default criteria | opus | |
| Custom checks | Support FR-specific test commands as criteria | mv | |

Type defaults:

| FR Type | Default Criteria |
|---|---|
| Python feature | tests-pass, coverage-80, lint-clean, no-critical-review |
| Vault/template | template-compliant, frontmatter-valid |
| Skill | smoke-test-pass, no-critical-review |
| Hook | hook-fires, no-errors-in-log |
| Infrastructure | tests-pass, lint-clean, docs-updated |
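
The fallback from FR-specific criteria to type defaults could be resolved along these lines. The type keys, the `resolve_criteria` function, and the in-code dictionary are illustrative stand-ins for whatever src/opus/quality/defaults.yaml ends up containing. The sketch also assumes that explicit criteria replace the type defaults rather than extend them, which this FR does not yet specify.

```python
from typing import Dict, List, Mapping

# Hypothetical in-code mirror of the type-defaults table above.
TYPE_DEFAULTS: Dict[str, List[str]] = {
    "python-feature": ["tests-pass", "coverage-80", "lint-clean", "no-critical-review"],
    "vault-template": ["template-compliant", "frontmatter-valid"],
    "skill": ["smoke-test-pass", "no-critical-review"],
    "hook": ["hook-fires", "no-errors-in-log"],
    "infrastructure": ["tests-pass", "lint-clean", "docs-updated"],
}


def resolve_criteria(fr_type: str, frontmatter: Mapping[str, object]) -> List[str]:
    """Return the FR's own done-criteria, or the defaults for its type."""
    explicit = frontmatter.get("done-criteria")
    if explicit:
        # Assumption: an explicit list fully replaces the type defaults.
        return [str(token) for token in explicit]
    if fr_type not in TYPE_DEFAULTS:
        # No silent fallback: an unknown type should be fixed at the source.
        raise ValueError(f"no default criteria for FR type {fr_type!r}")
    return list(TYPE_DEFAULTS[fr_type])
```

Returning a copy of the default list keeps callers from mutating the shared defaults by accident.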

Phase 3: Orchestrator Integration

Goal: Orchestrator uses criteria to decide “done” vs “iterate” vs “escalate.”

| File / Feature | Details | Owner | Status |
|---|---|---|---|
| Pipeline exit logic | After review stage, run evaluator. All pass → done. Failures → iterate or escalate. | mv | |
| Auto-close integration | Low-risk FR + all criteria met → mark done, create PR | opus | |
| Escalation integration | High-risk FR or criteria failures after max iterations → escalate to human (FR-059) | mv | |
| FR status update | Auto-update FR frontmatter: status → done, result filled in | opus | |
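
The exit logic in the table above can be sketched as a single decision function. The `Outcome` enum and the `decide` signature are hypothetical, and the branch ordering encodes two assumptions: sign-off follows the Option A recommendation (still an open decision), and a high-risk FR with failing criteria keeps iterating until the limit instead of escalating immediately. The default limit of 3 comes from the FR-057 feedback loop.

```python
from enum import Enum
from typing import Mapping


class Outcome(Enum):
    DONE = "done"          # auto-close: mark done, create PR
    ITERATE = "iterate"    # run the pipeline again
    ESCALATE = "escalate"  # hand over to a human (FR-059)


def decide(
    results: Mapping[str, bool],
    iteration: int,
    high_risk: bool,
    max_iterations: int = 3,
) -> Outcome:
    """Decide what the orchestrator does after the review stage."""
    all_passed = bool(results) and all(results.values())
    if all_passed:
        # Assumed Option A: only low-risk FRs may auto-close.
        return Outcome.ESCALATE if high_risk else Outcome.DONE
    if iteration >= max_iterations:
        # Criteria still failing at the iteration limit: stop looping.
        return Outcome.ESCALATE
    return Outcome.ITERATE
```

Treating an empty result set as "not passed" means an FR with no applicable checks can never auto-close. It iterates and eventually escalates, which errs on the side of human review.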

Test

Manual tests

| Test | Expected | Actual | Last |
|---|---|---|---|
| | | pending | - |

AI-verified tests

| Scenario | Expected behavior | Verification method |
|---|---|---|

E2E tests

| Scenario | Assertion |
|---|---|

Integration tests

| Component | Coverage |
|---|---|

Unit tests

| Component | Tests | Coverage |
|---|---|---|

History

| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Identified as critical gap for autonomous operation |

References

  • FR-056 (Autonomous Coding Orchestrator) — primary consumer of done-criteria
  • FR-057 (Code Review Pipeline) — review results feed into no_critical_review check
  • FR-059 (Escalation Policy) — criteria failures can trigger escalation
  • FR-011 (Testing Infrastructure) — provides test/coverage tooling
  • FR-051 (System Integration Testing) — smoke tests for non-code FRs