Decisions
- Pending: Fixed checklist vs dynamic criteria per FR?
- Pending: Minimum test coverage threshold
- Pending: Who signs off — orchestrator auto-close or human always?
User Tasks
Summary
A framework that defines when a task is truly complete, giving the orchestrator clear exit criteria to prevent infinite loops and half-shipped work.
Problem / Motivation
- FR-056 (Orchestrator) runs a pipeline: Plan → Code → Test → Review. But when does it stop?
- Without exit criteria, the system either loops forever (review keeps finding issues) or declares victory too early (tests pass but functionality is wrong).
- The FR template has a “Test” section but it’s manually written and not machine-readable.
- Different FR types need different criteria: a hook FR needs “hook fires correctly”, a Python FR needs “tests pass + coverage”, a vault FR needs “template compliance.”
- The feedback loop in FR-057 (max 3 iterations) is a timeout, not a definition of done.
Proposed Solution
A done-criteria section in FR frontmatter (or a companion file) that defines machine-checkable acceptance criteria. The orchestrator evaluates these after each pipeline run. A default checklist applies when no FR-specific criteria exist.
Open Questions
1. Criteria Format
Question: How should acceptance criteria be expressed?
| Option | Description |
|---|---|
| A) Structured YAML in frontmatter | done-criteria: [tests-pass, coverage-80, no-critical-review, lint-clean] |
| B) Free-text in FR body | Human-readable but not machine-checkable |
| C) Separate criteria file | vault/10_features/criteria/FR-XXX.yaml |
Recommendation: Option A — keeps criteria with the FR, machine-parseable, minimal overhead.
Decision:
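To make Option A concrete, here is a minimal sketch of what an Option A frontmatter list might look like and how it could be parsed. The FR id, `type` field, and the hand-rolled parser are illustrative assumptions; a real implementation would use a YAML library.

```python
import re

# Hypothetical FR file with an Option A done-criteria list in its frontmatter.
FR_TEXT = """\
---
id: FR-0XX
type: python-feature
done-criteria: [tests-pass, coverage-80, no-critical-review, lint-clean]
---
Feature description follows...
"""

def parse_done_criteria(text: str) -> list[str]:
    """Extract the done-criteria list from YAML frontmatter.

    Deliberately tiny parser for the flow-style list shown above.
    """
    match = re.search(r"^done-criteria:\s*\[(.*?)\]", text, re.MULTILINE)
    if not match:
        return []  # caller falls back to the default checklist
    return [item.strip() for item in match.group(1).split(",") if item.strip()]

print(parse_done_criteria(FR_TEXT))
# → ['tests-pass', 'coverage-80', 'no-critical-review', 'lint-clean']
```

An empty result signals "no custom criteria", which is exactly the hook the default-criteria question below needs.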
2. Default Criteria
Question: What should apply when an FR doesn’t specify custom criteria?
| Option | Description |
|---|---|
| A) Type-based defaults | Python FR → tests + coverage + lint; vault FR → template compliance + frontmatter; skill FR → smoke test |
| B) Universal minimum | Same checklist for everything |
| C) No default | Only explicit criteria count |
Recommendation: Option A — different work types have different quality signals.
Decision:
3. Sign-off Authority
Question: Can the orchestrator auto-close an FR or does a human always sign off?
| Option | Description |
|---|---|
| A) Auto-close for low-risk, human for high-risk | Ties into FR-059 escalation policy |
| B) Human always | Every FR completion requires user confirmation |
| C) Auto-close always | Trust the criteria |
Recommendation: Option A — aligned with escalation policy. Low-risk FRs with all criteria met can auto-close.
Decision:
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Default criteria checklist + evaluator | — |
| Phase 2 | FR-specific criteria in frontmatter + type-based defaults | — |
| Phase 3 | Orchestrator integration + auto-close for qualifying FRs | — |
Phase 1: Default Criteria & Evaluator —
Goal: A checklist evaluator that can verify common done-criteria.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/quality/criteria.py | DoneCriteria model, Evaluator class | opus | — |
| src/opus/quality/checks/ | Individual check implementations | opus | — |
| Check: tests_pass | Run pytest, verify exit code 0 | mv | — |
| Check: coverage_threshold | Parse coverage report, check minimum % | opus | — |
| Check: lint_clean | Run ruff, verify no errors | mv | — |
| Check: no_critical_review | Parse FR-057 review output, verify no critical findings | mv | — |
| Check: template_compliant | Verify vault files match their template structure | mv | — |
| Check: frontmatter_valid | Verify required frontmatter fields present and correct | mv | — |
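A rough sketch of how criteria.py's Evaluator and two of the checks above might be shaped. The `CheckResult` dataclass and the exact pytest/ruff invocations are assumptions; only the check names come from the table.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""

def tests_pass() -> CheckResult:
    # Run pytest; exit code 0 means all tests passed.
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return CheckResult("tests_pass", proc.returncode == 0, proc.stdout[-200:])

def lint_clean() -> CheckResult:
    # Run ruff; exit code 0 means no lint errors.
    proc = subprocess.run(["ruff", "check", "."], capture_output=True, text=True)
    return CheckResult("lint_clean", proc.returncode == 0, proc.stdout[-200:])

@dataclass
class Evaluator:
    checks: dict[str, Callable[[], CheckResult]] = field(default_factory=dict)

    def evaluate(self, criteria: list[str]) -> list[CheckResult]:
        """Run each named check; unknown names fail loudly rather than silently."""
        results = []
        for name in criteria:
            check = self.checks.get(name)
            if check is None:
                results.append(CheckResult(name, False, "unknown check"))
            else:
                results.append(check())
        return results
```

Treating an unknown criterion name as a failure (rather than skipping it) keeps a typo in `done-criteria` from quietly passing an FR.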
Default checklist (applies when no custom criteria specified):
| Check | Applies to | Threshold |
|---|---|---|
| tests_pass | Python code | all pass |
| coverage_threshold | Python code | 80% |
| lint_clean | Python code | 0 errors |
| no_critical_review | All code | 0 critical findings |
| template_compliant | Vault files | full compliance |
| frontmatter_valid | FR/vault files | all required fields |
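The default checklist above could be encoded as plain data so the evaluator can select checks by file class. The structure, the class names, and the expansion of "All code" into explicit classes are illustrative assumptions.

```python
# Default checklist as data: each check lists the file classes it applies to
# and its threshold, mirroring the table above.
DEFAULT_CHECKLIST = {
    "tests_pass":         {"applies_to": ("python",), "threshold": "all pass"},
    "coverage_threshold": {"applies_to": ("python",), "threshold": "80%"},
    "lint_clean":         {"applies_to": ("python",), "threshold": "0 errors"},
    "no_critical_review": {"applies_to": ("python", "skill", "hook"), "threshold": "0 critical"},
    "template_compliant": {"applies_to": ("vault",), "threshold": "full compliance"},
    "frontmatter_valid":  {"applies_to": ("fr", "vault"), "threshold": "all required fields"},
}

def checks_for(file_class: str) -> list[str]:
    """Default checks that apply to a given file class ('python', 'vault', ...)."""
    return [name for name, spec in DEFAULT_CHECKLIST.items()
            if file_class in spec["applies_to"]]

print(checks_for("python"))
# → ['tests_pass', 'coverage_threshold', 'lint_clean', 'no_critical_review']
print(checks_for("vault"))
# → ['template_compliant', 'frontmatter_valid']
```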
Phase 2: FR-Specific Criteria —
Goal: Allow individual FRs to define custom acceptance criteria.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| Frontmatter field | done-criteria: [tests-pass, coverage-90, custom-check-name] | mv | — |
| Type defaults | src/opus/quality/defaults.yaml — per-FR-type default criteria | opus | — |
| Custom checks | Support FR-specific test commands as criteria | mv | — |
Type defaults:
| FR Type | Default Criteria |
|---|---|
| Python feature | tests-pass, coverage-80, lint-clean, no-critical-review |
| Vault/template | template-compliant, frontmatter-valid |
| Skill | smoke-test-pass, no-critical-review |
| Hook | hook-fires, no-errors-in-log |
| Infrastructure | tests-pass, lint-clean, docs-updated |
Phase 3: Orchestrator Integration —
Goal: Orchestrator uses criteria to decide “done” vs “iterate” vs “escalate.”
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| Pipeline exit logic | After review stage, run evaluator. All pass → done. Failures → iterate or escalate. | mv | — |
| Auto-close integration | Low-risk FR + all criteria met → mark done, create PR | opus | — |
| Escalation integration | High-risk FR or criteria failures after max iterations → escalate to human (FR-059) | mv | — |
| FR status update | Auto-update FR frontmatter: status → done, result filled in | opus | — |
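A sketch of the done/iterate/escalate decision, combining the FR-057 iteration cap with the Q3 recommendation that only low-risk FRs auto-close. Function and enum names are illustrative; routing high-risk FRs with all criteria met to escalation (i.e. human sign-off) is an assumption about how Q3 resolves.

```python
from enum import Enum

class Verdict(Enum):
    DONE = "done"          # all criteria met, eligible for auto-close
    ITERATE = "iterate"    # failures remain, iteration budget left
    ESCALATE = "escalate"  # hand off to a human per FR-059

MAX_ITERATIONS = 3  # mirrors the FR-057 feedback-loop cap

def pipeline_verdict(failed_checks: list[str], iteration: int,
                     risk: str = "low") -> Verdict:
    """Decide the orchestrator's next move after the review stage."""
    if not failed_checks:
        # High-risk FRs still need human sign-off even when criteria pass.
        return Verdict.DONE if risk == "low" else Verdict.ESCALATE
    if iteration >= MAX_ITERATIONS:
        return Verdict.ESCALATE
    return Verdict.ITERATE

print(pipeline_verdict([], 1))              # → Verdict.DONE
print(pipeline_verdict(["lint_clean"], 3))  # → Verdict.ESCALATE
```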
Test
Manual tests
| Test | Expected | Actual | Last run |
|---|---|---|---|
| … | … | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Identified as critical gap for autonomous operation |
References
- FR-056 (Autonomous Coding Orchestrator) — primary consumer of done-criteria
- FR-057 (Code Review Pipeline) — review results feed into the no_critical_review check
- FR-059 (Escalation Policy) — criteria failures can trigger escalation
- FR-011 (Testing Infrastructure) — provides test/coverage tooling
- FR-051 (System Integration Testing) — smoke tests for non-code FRs