Decisions
- Pending: Fixed checklist vs dynamic criteria per FR?
- Pending: Minimum test coverage threshold
- Pending: Who signs off — orchestrator auto-close or human always?
User Tasks
Summary
A framework that defines when a task is truly complete, giving the orchestrator clear exit criteria to prevent infinite loops and half-shipped work.
Problem / Motivation
- FR-056 (Orchestrator) runs a pipeline: Plan → Code → Test → Review. But when does it stop?
- Without exit criteria, the system either loops forever (review keeps finding issues) or declares victory too early (tests pass but functionality is wrong).
- The FR template has a “Test” section but it’s manually written and not machine-readable.
- Different FR types need different criteria: a hook FR needs “hook fires correctly”, a Python FR needs “tests pass + coverage”, a vault FR needs “template compliance.”
- The feedback loop in FR-057 (max 3 iterations) is a timeout, not a definition of done.
Proposed Solution
A done-criteria section in FR frontmatter (or a companion file) that defines machine-checkable acceptance criteria. The orchestrator evaluates these after each pipeline run. A default checklist applies when no FR-specific criteria exist.
Open Questions
1. Criteria Format
Question: How should acceptance criteria be expressed?
| Option | Description |
|---|---|
| A) Structured YAML in frontmatter | done-criteria: [tests-pass, coverage-80, no-critical-review, lint-clean] |
| B) Free-text in FR body | Human-readable but not machine-checkable |
| C) Separate criteria file | vault/10_features/criteria/FR-XXX.yaml |
Recommendation: Option A — keeps criteria with the FR, machine-parseable, minimal overhead.
Decision:
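To make Option A concrete, here is a minimal sketch of what an Option A frontmatter list might look like and how it could be parsed. The FR id, `type` field, and the hand-rolled parser are illustrative assumptions; a real implementation would use a YAML library.

```python
import re

# Hypothetical FR file with an Option A done-criteria list in its frontmatter.
FR_TEXT = """\
---
id: FR-0XX
type: python-feature
done-criteria: [tests-pass, coverage-80, no-critical-review, lint-clean]
---
Feature description follows...
"""

def parse_done_criteria(text: str) -> list[str]:
    """Extract the done-criteria list from YAML frontmatter.

    Deliberately tiny parser for the flow-style list shown above.
    """
    match = re.search(r"^done-criteria:\s*\[(.*?)\]", text, re.MULTILINE)
    if not match:
        return []  # caller falls back to the default checklist
    return [item.strip() for item in match.group(1).split(",") if item.strip()]

print(parse_done_criteria(FR_TEXT))
# → ['tests-pass', 'coverage-80', 'no-critical-review', 'lint-clean']
```

An empty result signals "no custom criteria", which is exactly the hook the default-criteria question below needs.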
2. Default Criteria
Question: What should apply when an FR doesn’t specify custom criteria?
| Option | Description |
|---|---|
| A) Type-based defaults | Python FR → tests + coverage + lint; vault FR → template compliance + frontmatter; skill FR → smoke test |
| B) Universal minimum | Same checklist for everything |
| C) No default | Only explicit criteria count |
Recommendation: Option A — different work types have different quality signals.
Decision:
3. Sign-off Authority
Question: Can the orchestrator auto-close an FR or does a human always sign off?
| Option | Description |
|---|---|
| A) Auto-close for low-risk, human for high-risk | Ties into FR-059 escalation policy |
| B) Human always | Every FR completion requires user confirmation |
| C) Auto-close always | Trust the criteria |
Recommendation: Option A — aligned with escalation policy. Low-risk FRs with all criteria met can auto-close.
Decision:
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Default criteria checklist + evaluator | — |
| Phase 2 | FR-specific criteria in frontmatter + type-based defaults | — |
| Phase 3 | Orchestrator integration + auto-close for qualifying FRs | — |
Phase 1: Default Criteria & Evaluator —
Goal: A checklist evaluator that can verify common done-criteria.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/quality/criteria.py | DoneCriteria model, Evaluator class | opus | — |
| src/opus/quality/checks/ | Individual check implementations | opus | — |
| Check: tests_pass | Run pytest, verify exit code 0 | mv | — |
| Check: coverage_threshold | Parse coverage report, check minimum % | opus | — |
| Check: lint_clean | Run ruff, verify no errors | mv | — |
| Check: no_critical_review | Parse FR-057 review output, verify no critical findings | mv | — |
| Check: template_compliant | Verify vault files match their template structure | mv | — |
| Check: frontmatter_valid | Verify required frontmatter fields present and correct | mv | — |
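A rough sketch of how criteria.py's Evaluator and two of the checks above might be shaped. The `CheckResult` dataclass and the exact pytest/ruff invocations are assumptions; only the check names come from the table.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""

def tests_pass() -> CheckResult:
    # Run pytest; exit code 0 means all tests passed.
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return CheckResult("tests_pass", proc.returncode == 0, proc.stdout[-200:])

def lint_clean() -> CheckResult:
    # Run ruff; exit code 0 means no lint errors.
    proc = subprocess.run(["ruff", "check", "."], capture_output=True, text=True)
    return CheckResult("lint_clean", proc.returncode == 0, proc.stdout[-200:])

@dataclass
class Evaluator:
    checks: dict[str, Callable[[], CheckResult]] = field(default_factory=dict)

    def evaluate(self, criteria: list[str]) -> list[CheckResult]:
        """Run each named check; unknown names fail loudly rather than silently."""
        results = []
        for name in criteria:
            check = self.checks.get(name)
            if check is None:
                results.append(CheckResult(name, False, "unknown check"))
            else:
                results.append(check())
        return results
```

Treating an unknown criterion name as a failure (rather than skipping it) keeps a typo in `done-criteria` from quietly passing an FR.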
Default checklist (applies when no custom criteria specified):
| Check | Applies to | Threshold |
|---|---|---|
| tests_pass | Python code | all pass |
| coverage_threshold | Python code | 80% |
| lint_clean | Python code | 0 errors |
| no_critical_review | All code | 0 critical findings |
| template_compliant | Vault files | full compliance |
| frontmatter_valid | FR/vault files | all required fields |
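The default checklist above could be encoded as plain data so the evaluator can select checks by file class. The structure, the class names, and the expansion of "All code" into explicit classes are illustrative assumptions.

```python
# Default checklist as data: each check lists the file classes it applies to
# and its threshold, mirroring the table above.
DEFAULT_CHECKLIST = {
    "tests_pass":         {"applies_to": ("python",), "threshold": "all pass"},
    "coverage_threshold": {"applies_to": ("python",), "threshold": "80%"},
    "lint_clean":         {"applies_to": ("python",), "threshold": "0 errors"},
    "no_critical_review": {"applies_to": ("python", "skill", "hook"), "threshold": "0 critical"},
    "template_compliant": {"applies_to": ("vault",), "threshold": "full compliance"},
    "frontmatter_valid":  {"applies_to": ("fr", "vault"), "threshold": "all required fields"},
}

def checks_for(file_class: str) -> list[str]:
    """Default checks that apply to a given file class ('python', 'vault', ...)."""
    return [name for name, spec in DEFAULT_CHECKLIST.items()
            if file_class in spec["applies_to"]]

print(checks_for("python"))
# → ['tests_pass', 'coverage_threshold', 'lint_clean', 'no_critical_review']
print(checks_for("vault"))
# → ['template_compliant', 'frontmatter_valid']
```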
Phase 2: FR-Specific Criteria —
Goal: Allow individual FRs to define custom acceptance criteria.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| Frontmatter field | done-criteria: [tests-pass, coverage-90, custom-check-name] | mv | — |
| Type defaults | src/opus/quality/defaults.yaml — per-FR-type default criteria | opus | — |
| Custom checks | Support FR-specific test commands as criteria | mv | — |
Type defaults:
| FR Type | Default Criteria |
|---|---|
| Python feature | tests-pass, coverage-80, lint-clean, no-critical-review |
| Vault/template | template-compliant, frontmatter-valid |
| Skill | smoke-test-pass, no-critical-review |
| Hook | hook-fires, no-errors-in-log |
| Infrastructure | tests-pass, lint-clean, docs-updated |
Phase 3: Orchestrator Integration —
Goal: Orchestrator uses criteria to decide “done” vs “iterate” vs “escalate.”
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| Pipeline exit logic | After review stage, run evaluator. All pass → done. Failures → iterate or escalate. | mv | — |
| Auto-close integration | Low-risk FR + all criteria met → mark done, create PR | opus | — |
| Escalation integration | High-risk FR or criteria failures after max iterations → escalate to human (FR-059) | mv | — |
| FR status update | Auto-update FR frontmatter: status → done, result filled in | opus | — |
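A sketch of the done/iterate/escalate decision, combining the FR-057 iteration cap with the Q3 recommendation that only low-risk FRs auto-close. Function and enum names are illustrative; routing high-risk FRs with all criteria met to escalation (i.e. human sign-off) is an assumption about how Q3 resolves.

```python
from enum import Enum

class Verdict(Enum):
    DONE = "done"          # all criteria met, eligible for auto-close
    ITERATE = "iterate"    # failures remain, iteration budget left
    ESCALATE = "escalate"  # hand off to a human per FR-059

MAX_ITERATIONS = 3  # mirrors the FR-057 feedback-loop cap

def pipeline_verdict(failed_checks: list[str], iteration: int,
                     risk: str = "low") -> Verdict:
    """Decide the orchestrator's next move after the review stage."""
    if not failed_checks:
        # High-risk FRs still need human sign-off even when criteria pass.
        return Verdict.DONE if risk == "low" else Verdict.ESCALATE
    if iteration >= MAX_ITERATIONS:
        return Verdict.ESCALATE
    return Verdict.ITERATE

print(pipeline_verdict([], 1))              # → Verdict.DONE
print(pipeline_verdict(["lint_clean"], 3))  # → Verdict.ESCALATE
```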
Test
Manual tests
| Test | Expected | Actual | Last run |
|---|---|---|---|
| … | … | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Identified as critical gap for autonomous operation |
References
- FR-056 (Autonomous Coding Orchestrator) — primary consumer of done-criteria
- FR-057 (Code Review Pipeline) — review results feed into the no_critical_review check
- FR-059 (Escalation Policy) — criteria failures can trigger escalation
- FR-011 (Testing Infrastructure) — provides test/coverage tooling
- FR-051 (System Integration Testing) — smoke tests for non-code FRs