Decisions

  • Pending: Review granularity — file-level, hunk-level, or line-level?
  • Pending: Maximum feedback loop iterations before escalation?
  • Pending: Should review rules be strict (block on any issue) or advisory (block only on critical)?

User Tasks


Summary

Build an automated code review agent that reads diffs, produces structured feedback with severity levels, and drives a feedback loop back to the coder agent until the review passes.

Problem / Motivation

In the autonomous coding pipeline (FR-056), code goes from the coder agent directly to a PR. Without automated review:

  • Obvious bugs, style violations, and anti-patterns slip through
  • The human reviewer (who approves PRs) wastes time on issues that a machine could catch
  • No feedback loop — the coder doesn’t learn from mistakes within a session
  • Quality is inconsistent across different FR implementations

Automated review closes the loop: Reviewer catches issues → Coder fixes → Reviewer re-checks → only clean code reaches the PR.

Proposed Solution

A src/opus/review/ module with:

  1. Review Agent: Reads git diffs and produces structured feedback (issues list with severity, location, description, suggestion).
  2. Feedback Loop Controller: Routes review findings back to the coder agent, tracks iterations, enforces max retries.
  3. Review Rules Engine: Loads coding standards and patterns from the vault as review context, so the reviewer checks project-specific conventions.

Open Questions

1. Review Granularity

Question: At what level should the reviewer provide feedback?

| Option | Description |
| --- | --- |
| A) File-level | “This file has issues: …” Coarse, may be vague |
| B) Hunk-level | Feedback per diff hunk. Good balance of specificity and noise |
| C) Line-level | Feedback on specific lines with inline suggestions. Most actionable |

Recommendation: Option C — line-level feedback is most actionable for the coder agent. Include file path + line number + suggestion.
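To make the recommendation concrete, here is a hypothetical example of one line-level finding; the field names mirror the ReviewFinding fields planned for Phase 1, but the file path, line number, and wording are illustrative only.

```python
# A single line-level finding as the reviewer might emit it.
# All values below are made up for illustration.
finding = {
    "file": "src/opus/review/reviewer.py",
    "line": 42,
    "severity": "error",
    "description": "Return value of the diff parser is not checked for None",
    "suggestion": "Guard the call and raise a descriptive error when parsing fails",
}
```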

Decision:

2. Max Feedback Loop Iterations

Question: How many review-fix cycles before giving up?

| Option | Description |
| --- | --- |
| A) 1 iteration | Reviewer flags issues once, coder fixes once, done. May leave issues |
| B) 3 iterations | Up to 3 rounds of review-fix. Balances quality with time cost |
| C) Until all issues resolved | Keep looping until clean. Risk of infinite loops on intractable issues |

Recommendation: Option B — 3 iterations catches most issues. If still failing after 3, escalate to human with the remaining issues listed.
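A minimal sketch of the recommended loop shape, assuming `review` and `fix` are injected callables (the real controller lands in Phase 2's loop.py; the return shape here is an assumption).

```python
MAX_ITERATIONS = 3  # Option B: three review-fix rounds before escalation

def run_review_loop(diff, review, fix, max_iterations=MAX_ITERATIONS):
    """Run review-fix cycles until clean or out of iterations.

    `review(diff)` returns a list of findings; `fix(diff, findings)`
    returns an updated diff. Both are hypothetical interfaces.
    """
    for attempt in range(1, max_iterations + 1):
        findings = review(diff)
        if not findings:
            return {"verdict": "pass", "iterations": attempt, "open": []}
        diff = fix(diff, findings)
    # Still failing after the cap: escalate with the remaining issues.
    return {"verdict": "escalate", "iterations": max_iterations, "open": review(diff)}
```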

Decision:

3. Review Strictness

Question: Should any review issue block the PR, or only critical ones?

| Option | Description |
| --- | --- |
| A) Strict — any issue blocks | Every finding must be resolved. High quality but slow |
| B) Severity-based — critical blocks, warnings are advisory | Critical/error severity blocks the PR. Warnings included in the PR description but don’t block |
| C) Advisory only — nothing blocks | Review is informational. Human decides what to fix |

Recommendation: Option B — critical issues (bugs, security, missing tests) block. Warnings (style, naming, minor improvements) are noted but don’t block.
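The severity-based verdict reduces to a small predicate; this sketch assumes dict-shaped findings with a `severity` key (the dataclass version belongs in models.py).

```python
# Option B: only critical/error findings block the PR.
BLOCKING = {"critical", "error"}

def verdict(findings):
    """Return "fail" if any finding has blocking severity, else "pass"."""
    return "fail" if any(f["severity"] in BLOCKING for f in findings) else "pass"
```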

Decision:


Phase Overview

| Phase | Description | Status |
| --- | --- | --- |
| Phase 1 | Review agent producing structured feedback | |
| Phase 2 | Feedback loop — reviewer → coder → re-review | |
| Phase 3 | Review rules from vault | |

Phase 1: Review Agent —

Goal: Build a review agent that reads diffs and produces structured, actionable feedback.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/review/models.py | ReviewFinding dataclass: file, line, severity (critical/error/warning/info), description, suggestion | mv | |
| src/opus/review/models.py | ReviewResult dataclass: findings list, overall verdict (pass/fail), summary | mv | |
| src/opus/review/reviewer.py | ReviewAgent: takes a git diff, calls LLM (via FR-054), parses structured feedback | mv | |
| src/opus/review/prompts.py | Review prompt templates: system prompt for code review, output format instructions | mv | |
| Diff parsing | Parse unified diff format into structured hunks for targeted review | mv | |
| Severity classification | Critical: bugs, security issues, data loss. Error: logic errors, missing error handling. Warning: style, naming. Info: suggestions | opus | |
| Unit tests | Mock LLM, verify finding extraction, severity classification, verdict logic | mv | |
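A sketch of the planned models.py dataclasses. The field names follow the table above; everything else (the Severity alias, the verdict property, the default values) is an assumption to be confirmed at implementation time.

```python
from dataclasses import dataclass, field
from typing import Literal

# Severity levels from the classification row above.
Severity = Literal["critical", "error", "warning", "info"]

@dataclass
class ReviewFinding:
    file: str
    line: int
    severity: Severity
    description: str
    suggestion: str

@dataclass
class ReviewResult:
    findings: list[ReviewFinding] = field(default_factory=list)
    summary: str = ""

    @property
    def verdict(self) -> str:
        # Severity-based verdict per Open Question 3, Option B:
        # only critical/error findings fail the review.
        blocking = {"critical", "error"}
        return "fail" if any(f.severity in blocking for f in self.findings) else "pass"
```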

Phase 2: Feedback Loop —

Goal: Wire the reviewer into the pipeline with a fix-review cycle.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/review/loop.py | FeedbackLoopController: manage review iterations, track findings across rounds | mv | |
| Finding → fix instruction | Convert each ReviewFinding into a concrete fix instruction for the coder agent | mv | |
| Iteration tracking | Track which findings are resolved vs still open across iterations | opus | |
| Max iteration enforcement | After N iterations (default: 3), report remaining issues and escalate | opus | |
| Escalation | On max iterations: create PR with review comments, notify human of unresolved issues | mv | |
| Integration with FR-056 | Wire into orchestrator pipeline as the review stage | mv | |
| Integration tests | Mock coder + reviewer: run feedback loop, verify convergence | mv | |
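The finding → fix instruction row above can be sketched as a simple formatter; the message template here is an assumption, not the final prompt format, and the dict shape stands in for ReviewFinding.

```python
def to_fix_instruction(finding: dict) -> str:
    """Render one finding as a fix instruction for the coder agent.

    Assumes dict keys matching the ReviewFinding fields; the exact
    wording and layout are placeholders.
    """
    return (
        f"[{finding['severity'].upper()}] {finding['file']}:{finding['line']} - "
        f"{finding['description']} Suggested fix: {finding['suggestion']}"
    )
```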

Phase 3: Review Rules from Vault —

Goal: Load project-specific coding standards and patterns from the vault as review context.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/review/rules.py | RulesLoader: read markdown files from vault, extract review rules | mv | |
| vault/00_system/docs/coding-standards.md | Project coding standards in structured markdown | opus | |
| Rule injection | Append rules to review prompt as context, so reviewer checks project conventions | mv | |
| Pattern library | Common anti-patterns and preferred patterns for the project | opus | |
| Rule categories | Group rules by type: style, security, performance, testing, architecture | mv | |
| Unit tests | Rules loading, prompt assembly with rules context | mv | |
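A minimal sketch of the rules loader. It assumes the standards doc uses `##` headings for categories and `- ` bullets for individual rules; the actual vault format for coding-standards.md is still to be written, so treat this parsing convention as a placeholder.

```python
from pathlib import Path

def load_rules(path: str) -> dict[str, list[str]]:
    """Parse a vault markdown file into {category: [rule, ...]}.

    Assumed format: "## Category" headings, "- rule text" bullets.
    Bullets before any heading land in a "general" category.
    """
    rules: dict[str, list[str]] = {}
    category = "general"
    for line in Path(path).read_text().splitlines():
        if line.startswith("## "):
            category = line[3:].strip().lower()
        elif line.lstrip().startswith("- "):
            rules.setdefault(category, []).append(line.lstrip()[2:].strip())
    return rules
```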

Prerequisites / Gap Analysis

Requirements

| Requirement | Description |
| --- | --- |
| REQ-0 | Coding agency design doc reviewed and approved |
| REQ-1 | Custom agents (FR-043) — reviewer is an agent definition |
| REQ-2 | LLM provider (FR-054) — reviewer needs LLM for analysis |

Current State

| Component | Status | Details |
| --- | --- | --- |
| Agent definitions | | FR-043 not started |
| LLM abstraction | | FR-054 not started |
| Review process | | No automated review exists |

Gap (What’s missing?)

| Gap | Effort | Blocker? |
| --- | --- | --- |
| Agent definitions (FR-043) | High | Yes — reviewer is an agent |
| LLM abstraction (FR-054) | Med | Yes — reviewer needs LLM calls |
| Coding standards doc | Low | No — can start with generic rules |

Test

Manual tests

| Test | Expected | Actual | Last |
| --- | --- | --- | --- |
| Feedback loop: coder fixes issue, re-review passes | Issue resolved in round 2, verdict: pass | pending | - |
| Review rules from vault influence findings | Project-specific convention violation caught | pending | - |

AI-verified tests

| Scenario | Expected behavior | Verification method |
| --- | --- | --- |

E2E tests

| Scenario | Assertion |
| --- | --- |

Integration tests

| Component | Coverage |
| --- | --- |

Unit tests

| Component | Tests | Coverage |
| --- | --- | --- |

History

| Date | Event | Details |
| --- | --- | --- |
| 2026-03-12 | Created | Part of autonomous coding agency architecture |

References

  • FR-043 (Custom Agents) — reviewer is an agent definition
  • FR-054 (LLM Provider Abstraction) — reviewer uses LLM for analysis
  • FR-056 (Autonomous Coding Orchestrator) — review is a pipeline stage
  • FR-058 (Agent Git Workflow) — review happens on the diff before PR creation
  • vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview