Decisions

  • Pending: Review granularity — file-level, hunk-level, or line-level?
  • Pending: Maximum feedback loop iterations before escalation?
  • Pending: Should review rules be strict (block on any issue) or advisory (block only on critical)?

User Tasks


Summary

Build an automated code review agent that reads diffs, produces structured feedback with severity levels, and drives a feedback loop back to the coder agent until the review passes.

Problem / Motivation

In the autonomous coding pipeline (FR-056), code goes from the coder agent directly to a PR. Without automated review:

  • Obvious bugs, style violations, and anti-patterns slip through
  • The human reviewer (who approves PRs) wastes time on issues that a machine could catch
  • No feedback loop — the coder doesn’t learn from mistakes within a session
  • Quality is inconsistent across different FR implementations

Automated review closes the loop: Reviewer catches issues → Coder fixes → Reviewer re-checks → only clean code reaches the PR.

Proposed Solution

A src/opus/review/ module with:

  1. Review Agent: Reads git diffs and produces structured feedback (issues list with severity, location, description, suggestion).
  2. Feedback Loop Controller: Routes review findings back to the coder agent, tracks iterations, enforces max retries.
  3. Review Rules Engine: Loads coding standards and patterns from the vault as review context, so the reviewer checks project-specific conventions.

Open Questions

1. Review Granularity

Question: At what level should the reviewer provide feedback?

| Option | Description |
| --- | --- |
| A) File-level | “This file has issues: …” Coarse, may be vague |
| B) Hunk-level | Feedback per diff hunk. Good balance of specificity and noise |
| C) Line-level | Feedback on specific lines with inline suggestions. Most actionable |

Recommendation: Option C — line-level feedback is most actionable for the coder agent. Include file path + line number + suggestion.
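To make the recommendation concrete, here is a hypothetical example of one line-level finding; the field names mirror the ReviewFinding fields planned for Phase 1, but the file path, line number, and wording are illustrative only.

```python
# A single line-level finding as the reviewer might emit it.
# All values below are made up for illustration.
finding = {
    "file": "src/opus/review/reviewer.py",
    "line": 42,
    "severity": "error",
    "description": "Return value of the diff parser is not checked for None",
    "suggestion": "Guard the call and raise a descriptive error when parsing fails",
}
```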

Decision:

2. Max Feedback Loop Iterations

Question: How many review-fix cycles before giving up?

| Option | Description |
| --- | --- |
| A) 1 iteration | Reviewer flags issues once, coder fixes once, done. May leave issues |
| B) 3 iterations | Up to 3 rounds of review-fix. Balances quality with time cost |
| C) Until all issues resolved | Keep looping until clean. Risk of infinite loops on intractable issues |

Recommendation: Option B — 3 iterations catches most issues. If still failing after 3, escalate to human with the remaining issues listed.
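A minimal sketch of the recommended loop shape, assuming `review` and `fix` are injected callables (the real controller lands in Phase 2's loop.py; the return shape here is an assumption).

```python
MAX_ITERATIONS = 3  # Option B: three review-fix rounds before escalation

def run_review_loop(diff, review, fix, max_iterations=MAX_ITERATIONS):
    """Run review-fix cycles until clean or out of iterations.

    `review(diff)` returns a list of findings; `fix(diff, findings)`
    returns an updated diff. Both are hypothetical interfaces.
    """
    for attempt in range(1, max_iterations + 1):
        findings = review(diff)
        if not findings:
            return {"verdict": "pass", "iterations": attempt, "open": []}
        diff = fix(diff, findings)
    # Still failing after the cap: escalate with the remaining issues.
    return {"verdict": "escalate", "iterations": max_iterations, "open": review(diff)}
```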

Decision:

3. Review Strictness

Question: Should any review issue block the PR, or only critical ones?

| Option | Description |
| --- | --- |
| A) Strict — any issue blocks | Every finding must be resolved. High quality but slow |
| B) Severity-based — critical blocks, warnings are advisory | Critical/error severity blocks the PR. Warnings included in the PR description but don’t block |
| C) Advisory only — nothing blocks | Review is informational. Human decides what to fix |

Recommendation: Option B — critical issues (bugs, security, missing tests) block. Warnings (style, naming, minor improvements) are noted but don’t block.
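The severity-based verdict reduces to a small predicate; this sketch assumes dict-shaped findings with a `severity` key (the dataclass version belongs in models.py).

```python
# Option B: only critical/error findings block the PR.
BLOCKING = {"critical", "error"}

def verdict(findings):
    """Return "fail" if any finding has blocking severity, else "pass"."""
    return "fail" if any(f["severity"] in BLOCKING for f in findings) else "pass"
```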

Decision:


Phase Overview

| Phase | Description | Status |
| --- | --- | --- |
| Phase 1 | Review agent producing structured feedback | |
| Phase 2 | Feedback loop — reviewer → coder → re-review | |
| Phase 3 | Review rules from vault | |

Phase 1: Review Agent —

Goal: Build a review agent that reads diffs and produces structured, actionable feedback.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/review/models.py | ReviewFinding dataclass: file, line, severity (critical/error/warning/info), description, suggestion | mv | |
| src/opus/review/models.py | ReviewResult dataclass: findings list, overall verdict (pass/fail), summary | mv | |
| src/opus/review/reviewer.py | ReviewAgent: takes a git diff, calls LLM (via FR-054), parses structured feedback | mv | |
| src/opus/review/prompts.py | Review prompt templates: system prompt for code review, output format instructions | mv | |
| Diff parsing | Parse unified diff format into structured hunks for targeted review | mv | |
| Severity classification | Critical: bugs, security issues, data loss. Error: logic errors, missing error handling. Warning: style, naming. Info: suggestions | opus | |
| Unit tests | Mock LLM, verify finding extraction, severity classification, verdict logic | mv | |
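A sketch of the planned models.py dataclasses. The field names follow the table above; everything else (the Severity alias, the verdict property, the default values) is an assumption to be confirmed at implementation time.

```python
from dataclasses import dataclass, field
from typing import Literal

# Severity levels from the classification row above.
Severity = Literal["critical", "error", "warning", "info"]

@dataclass
class ReviewFinding:
    file: str
    line: int
    severity: Severity
    description: str
    suggestion: str

@dataclass
class ReviewResult:
    findings: list[ReviewFinding] = field(default_factory=list)
    summary: str = ""

    @property
    def verdict(self) -> str:
        # Severity-based verdict per Open Question 3, Option B:
        # only critical/error findings fail the review.
        blocking = {"critical", "error"}
        return "fail" if any(f.severity in blocking for f in self.findings) else "pass"
```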

Phase 2: Feedback Loop —

Goal: Wire the reviewer into the pipeline with a fix-review cycle.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/review/loop.py | FeedbackLoopController: manage review iterations, track findings across rounds | mv | |
| Finding → fix instruction | Convert each ReviewFinding into a concrete fix instruction for the coder agent | mv | |
| Iteration tracking | Track which findings are resolved vs still open across iterations | opus | |
| Max iteration enforcement | After N iterations (default: 3), report remaining issues and escalate | opus | |
| Escalation | On max iterations: create PR with review comments, notify human of unresolved issues | mv | |
| Integration with FR-056 | Wire into orchestrator pipeline as the review stage | mv | |
| Integration tests | Mock coder + reviewer: run feedback loop, verify convergence | mv | |
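The finding → fix instruction row above can be sketched as a simple formatter; the message template here is an assumption, not the final prompt format, and the dict shape stands in for ReviewFinding.

```python
def to_fix_instruction(finding: dict) -> str:
    """Render one finding as a fix instruction for the coder agent.

    Assumes dict keys matching the ReviewFinding fields; the exact
    wording and layout are placeholders.
    """
    return (
        f"[{finding['severity'].upper()}] {finding['file']}:{finding['line']} - "
        f"{finding['description']} Suggested fix: {finding['suggestion']}"
    )
```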

Phase 3: Review Rules from Vault —

Goal: Load project-specific coding standards and patterns from the vault as review context.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/review/rules.py | RulesLoader: read markdown files from vault, extract review rules | mv | |
| vault/00_system/docs/coding-standards.md | Project coding standards in structured markdown | opus | |
| Rule injection | Append rules to review prompt as context, so reviewer checks project conventions | mv | |
| Pattern library | Common anti-patterns and preferred patterns for the project | opus | |
| Rule categories | Group rules by type: style, security, performance, testing, architecture | mv | |
| Unit tests | Rules loading, prompt assembly with rules context | mv | |
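A minimal sketch of the rules loader. It assumes the standards doc uses `##` headings for categories and `- ` bullets for individual rules; the actual vault format for coding-standards.md is still to be written, so treat this parsing convention as a placeholder.

```python
from pathlib import Path

def load_rules(path: str) -> dict[str, list[str]]:
    """Parse a vault markdown file into {category: [rule, ...]}.

    Assumed format: "## Category" headings, "- rule text" bullets.
    Bullets before any heading land in a "general" category.
    """
    rules: dict[str, list[str]] = {}
    category = "general"
    for line in Path(path).read_text().splitlines():
        if line.startswith("## "):
            category = line[3:].strip().lower()
        elif line.lstrip().startswith("- "):
            rules.setdefault(category, []).append(line.lstrip()[2:].strip())
    return rules
```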

Prerequisites / Gap Analysis

Requirements

| Requirement | Description |
| --- | --- |
| REQ-0 | Coding agency design doc reviewed and approved |
| REQ-1 | Custom agents (FR-043) — reviewer is an agent definition |
| REQ-2 | LLM provider (FR-054) — reviewer needs LLM for analysis |

Current State

| Component | Status | Details |
| --- | --- | --- |
| Agent definitions | | FR-043 not started |
| LLM abstraction | | FR-054 not started |
| Review process | | No automated review exists |

Gap (What’s missing?)

| Gap | Effort | Blocker? |
| --- | --- | --- |
| Agent definitions (FR-043) | High | Yes — reviewer is an agent |
| LLM abstraction (FR-054) | Med | Yes — reviewer needs LLM calls |
| Coding standards doc | Low | No — can start with generic rules |

Test

Manual tests

| Test | Expected | Actual | Last |
| --- | --- | --- | --- |
| Feedback loop: coder fixes issue, re-review passes | Issue resolved in round 2, verdict: pass | pending | - |
| Review rules from vault influence findings | Project-specific convention violation caught | pending | - |

AI-verified tests

| Scenario | Expected behavior | Verification method |
| --- | --- | --- |

E2E tests

| Scenario | Assertion |
| --- | --- |

Integration tests

| Component | Coverage |
| --- | --- |

Unit tests

| Component | Tests | Coverage |
| --- | --- | --- |

History

| Date | Event | Details |
| --- | --- | --- |
| 2026-03-12 | Created | Part of autonomous coding agency architecture |

References

  • FR-043 (Custom Agents) — reviewer is an agent definition
  • FR-054 (LLM Provider Abstraction) — reviewer uses LLM for analysis
  • FR-056 (Autonomous Coding Orchestrator) — review is a pipeline stage
  • FR-058 (Agent Git Workflow) — review happens on the diff before PR creation
  • vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview