Decisions
- Pending: Review granularity — file-level, hunk-level, or line-level?
- Pending: Maximum feedback loop iterations before escalation?
- Pending: Should review rules be strict (block on any issue) or advisory (block only on critical)?
User Tasks
Summary
Build an automated code review agent that reads diffs, produces structured feedback with severity levels, and drives a feedback loop back to the coder agent until the review passes.
Problem / Motivation
In the autonomous coding pipeline (FR-056), code goes from the coder agent directly to a PR. Without automated review:
- Obvious bugs, style violations, and anti-patterns slip through
- The human reviewer (who approves PRs) wastes time on issues that a machine could catch
- No feedback loop — the coder doesn’t learn from mistakes within a session
- Quality is inconsistent across different FR implementations
Automated review closes the loop: Reviewer catches issues → Coder fixes → Reviewer re-checks → only clean code reaches the PR.
Proposed Solution
A src/opus/review/ module with:
- Review Agent: Reads git diffs and produces structured feedback (issues list with severity, location, description, suggestion).
- Feedback Loop Controller: Routes review findings back to the coder agent, tracks iterations, enforces max retries.
- Review Rules Engine: Loads coding standards and patterns from the vault as review context, so the reviewer checks project-specific conventions.
Open Questions
1. Review Granularity
Question: At what level should the reviewer provide feedback?
| Option | Description |
|---|---|
| A) File-level | “This file has issues: …” Coarse, may be vague |
| B) Hunk-level | Feedback per diff hunk. Good balance of specificity and noise |
| C) Line-level | Feedback on specific lines with inline suggestions. Most actionable |
Recommendation: Option C — line-level feedback is most actionable for the coder agent. Include file path + line number + suggestion.
Decision:
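To make Option C concrete, here is one possible shape for a single line-level finding. The field names and values are illustrative only, not a settled schema:

```python
# Hypothetical line-level finding (field names and values are
# illustrative; the real schema is defined in Phase 1 models).
finding = {
    "file": "src/opus/review/reviewer.py",
    "line": 42,
    "severity": "error",
    "description": "Return value of the diff parser is not checked for None",
    "suggestion": "Guard the result before iterating over hunks",
}
```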
2. Max Feedback Loop Iterations
Question: How many review-fix cycles before giving up?
| Option | Description |
|---|---|
| A) 1 iteration | Reviewer flags issues once, coder fixes once, done. May leave issues |
| B) 3 iterations | Up to 3 rounds of review-fix. Balances quality with time cost |
| C) Until all issues resolved | Keep looping until clean. Risk of infinite loops on intractable issues |
Recommendation: Option B — 3 iterations catches most issues. If still failing after 3, escalate to human with the remaining issues listed.
Decision:
3. Review Strictness
Question: Should any review issue block the PR, or only critical ones?
| Option | Description |
|---|---|
| A) Strict — any issue blocks | Every finding must be resolved. High quality but slow |
| B) Severity-based — critical blocks, warnings are advisory | Critical/error severity blocks PR. Warnings included in PR description but don’t block |
| C) Advisory only — nothing blocks | Review is informational. Human decides what to fix |
Recommendation: Option B — critical issues (bugs, security, missing tests) block. Warnings (style, naming, minor improvements) are noted but don’t block.
Decision:
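A minimal sketch of the Option B verdict rule, assuming the severity strings listed in Phase 1: critical and error findings block the PR, while warnings and info are advisory.

```python
# Sketch of severity-based blocking (Option B). Severity strings match
# the ReviewFinding severities planned in Phase 1; names are assumptions.
BLOCKING_SEVERITIES = {"critical", "error"}

def verdict(findings):
    """Return "fail" if any finding is blocking, else "pass"."""
    if any(f["severity"] in BLOCKING_SEVERITIES for f in findings):
        return "fail"
    return "pass"
```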
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Review agent producing structured feedback | — |
| Phase 2 | Feedback loop — reviewer → coder → re-review | — |
| Phase 3 | Review rules from vault | — |
Phase 1: Review Agent —
Goal: Build a review agent that reads diffs and produces structured, actionable feedback.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/review/models.py | ReviewFinding dataclass: file, line, severity (critical/error/warning/info), description, suggestion | mv | — |
| src/opus/review/models.py | ReviewResult dataclass: findings list, overall verdict (pass/fail), summary | mv | — |
| src/opus/review/reviewer.py | ReviewAgent: takes a git diff, calls LLM (via FR-054), parses structured feedback | mv | — |
| src/opus/review/prompts.py | Review prompt templates: system prompt for code review, output format instructions | mv | — |
| Diff parsing | Parse unified diff format into structured hunks for targeted review | mv | — |
| Severity classification | Critical: bugs, security issues, data loss. Error: logic errors, missing error handling. Warning: style, naming. Info: suggestions | opus | — |
| Unit tests | Mock LLM, verify finding extraction, severity classification, verdict logic | mv | — |
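One possible shape for the Phase 1 models. The class and field names follow the table above; defaults and the enum are assumptions, not a final design:

```python
# Draft of the Phase 1 models (fields follow the table above; defaults
# and the Severity enum are assumptions, not a final design).
from dataclasses import dataclass, field
from enum import Enum

class Severity(Enum):
    CRITICAL = "critical"
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"

@dataclass
class ReviewFinding:
    file: str
    line: int
    severity: Severity
    description: str
    suggestion: str = ""

@dataclass
class ReviewResult:
    findings: list[ReviewFinding] = field(default_factory=list)
    verdict: str = "pass"   # "pass" | "fail"
    summary: str = ""
```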
Phase 2: Feedback Loop —
Goal: Wire the reviewer into the pipeline with a fix-review cycle.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/review/loop.py | FeedbackLoopController: manage review iterations, track findings across rounds | mv | — |
| Finding → fix instruction | Convert each ReviewFinding into a concrete fix instruction for the coder agent | mv | — |
| Iteration tracking | Track which findings are resolved vs still open across iterations | opus | — |
| Max iteration enforcement | After N iterations (default: 3), report remaining issues and escalate | opus | — |
| Escalation | On max iterations: create PR with review comments, notify human of unresolved issues | mv | — |
| Integration with FR-056 | Wire into orchestrator pipeline as the review stage | mv | — |
| Integration tests | Mock coder + reviewer: run feedback loop, verify convergence | mv | — |
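The loop above can be sketched as follows. This is a hedged outline, not the controller's real interface: `review` and `fix` stand in for the reviewer and coder agents, whose signatures are still undecided.

```python
# Sketch of the Phase 2 review-fix loop with a hard iteration cap
# (default 3, per the recommendation above). `review` and `fix` are
# stand-ins for the reviewer and coder agents.
def run_feedback_loop(diff, review, fix, max_iterations=3):
    """Return (verdict, rounds_used); "fail" means escalate to a human."""
    for round_no in range(1, max_iterations + 1):
        result = review(diff)
        if result["verdict"] == "pass":
            return "pass", round_no
        # Feed the open findings back to the coder as fix instructions.
        diff = fix(diff, result["findings"])
    return "fail", max_iterations

# Mock agents for illustration: the reviewer passes on the second round.
_rounds = iter([
    {"verdict": "fail", "findings": [{"line": 3, "severity": "error"}]},
    {"verdict": "pass", "findings": []},
])
outcome = run_feedback_loop("diff", lambda d: next(_rounds),
                            lambda d, f: d + " (fixed)")
```

The mock run converges in two rounds, which is what the Phase 2 integration tests would assert.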
Phase 3: Review Rules from Vault —
Goal: Load project-specific coding standards and patterns from the vault as review context.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/review/rules.py | RulesLoader: read markdown files from vault, extract review rules | mv | — |
| vault/00_system/docs/coding-standards.md | Project coding standards in structured markdown | opus | — |
| Rule injection | Append rules to review prompt as context, so reviewer checks project conventions | mv | — |
| Pattern library | Common anti-patterns and preferred patterns for the project | opus | — |
| Rule categories | Group rules by type: style, security, performance, testing, architecture | mv | — |
| Unit tests | Rules loading, prompt assembly with rules context | mv | — |
Prerequisites / Gap Analysis
Requirements
| Requirement | Description |
|---|---|
| REQ-0 | Coding agency design doc reviewed and approved |
| REQ-1 | Custom agents (FR-043) — reviewer is an agent definition |
| REQ-2 | LLM provider (FR-054) — reviewer needs LLM for analysis |
Current State
| Component | Status | Details |
|---|---|---|
| Agent definitions | — | FR-043 not started |
| LLM abstraction | — | FR-054 not started |
| Review process | — | No automated review exists |
Gap (What’s missing?)
| Gap | Effort | Blocker? |
|---|---|---|
| Agent definitions (FR-043) | High | Yes — reviewer is an agent |
| LLM abstraction (FR-054) | Med | Yes — reviewer needs LLM calls |
| Coding standards doc | Low | No — can start with generic rules |
Test
Manual tests
| Test | Expected | Actual | Last |
|---|---|---|---|
| Feedback loop: coder fixes issue, re-review passes | Issue resolved in round 2, verdict: pass | pending | - |
| Review rules from vault influence findings | Project-specific convention violation caught | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Part of autonomous coding agency architecture |
References
- FR-043 (Custom Agents) — reviewer is an agent definition
- FR-054 (LLM Provider Abstraction) — reviewer uses LLM for analysis
- FR-056 (Autonomous Coding Orchestrator) — review is a pipeline stage
- FR-058 (Agent Git Workflow) — review happens on the diff before PR creation
- vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview