Decisions
- Pending: Should scoring be heuristic-only, LLM-assisted, or hybrid?
- Pending: What are the exact threshold boundaries between tiers?
- Pending: Should the quality gate confidence threshold be configurable?
- Pending: How to handle disagreement between advocate and reviewer agents?
- Pending: What happens when an agent gets stuck — auto-escalate to user, retry, or hand off to another agent?
- Pending: Should `/go` require explicit user confirmation before dispatching, or fire immediately?
User Tasks
Summary
Auto-analyze FR complexity when `/go FR-XXX` or `/implement FR-XXX` is called and route to the optimal execution strategy — single agent, pipeline with quality gate, or agent team — based on a numeric complexity score.
Problem / Motivation
Currently, the user must manually decide how to implement each feature: which agent to use, whether to review first, whether tasks can be parallelized. This requires orchestration knowledge and creates cognitive overhead. As Opus grows with more agents and more complex features, this manual routing becomes a bottleneck.
Not every feature request requires the same execution approach. Simple tasks get over-engineered, complex tasks get under-resourced. Manual routing is unsustainable at scale.
Inspired by Nexie (Sven Hennig’s personal AI system), complexity routing removes this burden. The system reads a spec, scores it, picks the right strategy, and dispatches work — all from one command.
Proposed Solution
Build a three-part system:
Complexity Scoring
Analyze the FR spec and produce a numeric score (0-100) based on:
| Criterion | Weight | Description |
|---|---|---|
| Phase count | 20 | More phases = more complexity |
| Files affected | 20 | Count of files in phase tables |
| Cross-cutting concerns | 20 | Does the FR touch multiple systems? (skills + agents + vault + src) |
| Dependency count | 15 | Number of prerequisite FRs |
| New vs existing code | 15 | Creating new files/modules scores higher than modifying existing |
| Open questions count | 10 | More unresolved questions = more uncertainty |
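The weighted criteria above can be sketched as a heuristic scorer (Option A from the open questions). The weights come from the table; the normalization caps (e.g. 5 phases, 20 files) are illustrative assumptions, not decided values.

```python
# Heuristic complexity scorer sketch. Weights mirror the criteria table;
# the CAPS values are assumptions for normalizing raw counts into [0, 1].
WEIGHTS = {
    "phase_count": 20,
    "files_affected": 20,
    "cross_cutting": 20,
    "dependency_count": 15,
    "new_code_ratio": 15,
    "open_questions": 10,
}

# Illustrative caps: a factor at or above its cap contributes its full weight.
CAPS = {
    "phase_count": 5,
    "files_affected": 20,
    "cross_cutting": 4,      # systems touched (skills / agents / vault / src)
    "dependency_count": 5,
    "new_code_ratio": 1.0,   # already a 0-1 fraction of new vs existing code
    "open_questions": 5,
}

def complexity_score(factors: dict) -> int:
    """Return a 0-100 score: each factor normalized to [0, 1], then weighted."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        normalized = min(factors.get(name, 0) / CAPS[name], 1.0)
        total += weight * normalized
    return round(total)

# Example: 2 phases, 5 files, 2 systems, 2 deps, mostly new code, 1 open question
print(complexity_score({
    "phase_count": 2, "files_affected": 5, "cross_cutting": 2,
    "dependency_count": 2, "new_code_ratio": 0.8, "open_questions": 1,
}))  # 43, which lands in the medium tier
```

Because every factor is capped before weighting, the score is bounded at 100 and each criterion can contribute at most its table weight.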
Tier Routing
| Tier | Score Range | Strategy | Description |
|---|---|---|---|
| Simple | 0-24 | Single agent | One Claude Code session handles everything end-to-end |
| Medium | 25-49 | Pipeline + Quality Gate | Plan → Review (quality gate) → Implement. Advocate/reviewer agents must give confidence >= threshold |
| Complex | 50+ | Agent team | Lead agent coordinates Builder, Tester, and Reviewer agents working in parallel |
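The tier table maps directly to a small routing function (using the Option A thresholds, which are still pending decision):

```python
# Minimal tier router matching the score ranges in the table above.
def route(score: int) -> str:
    """Map a 0-100 complexity score to an execution strategy."""
    if score < 25:
        return "single"    # Simple: one agent handles everything end-to-end
    if score < 50:
        return "pipeline"  # Medium: plan -> review (gate) -> implement
    return "team"          # Complex: lead agent coordinates a team
```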
Quality Gate (Pipeline Mode)
For medium-complexity FRs, the pipeline works as:
- Plan: Generate detailed implementation plan from FR spec
- Review: Advocate agent reviews plan, scores confidence (0-100)
- Gate check: If confidence >= threshold (default 75), proceed to implementation. If below, surface concerns to user
- Implement: Execute the approved plan
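The gate check step can be sketched as follows. `review_plan` stands in for the advocate agent and is a hypothetical callable here; only the control flow is real.

```python
# Quality gate sketch. `review_plan` is a stand-in for the advocate agent,
# which is assumed to return a 0-100 confidence score for a plan.
GATE_THRESHOLD = 75  # spec default; whether this is configurable is undecided

def quality_gate(plan: str, review_plan) -> tuple[bool, int]:
    """Return (passed, confidence). Failing plans surface concerns to the user."""
    confidence = review_plan(plan)
    return confidence >= GATE_THRESHOLD, confidence

passed, conf = quality_gate("plan text", lambda plan: 80)
# passed is True at confidence 80; a confidence of 74 would block and escalate
```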
/go Skill
A single command that reads the FR, scores it, picks the strategy, dispatches work, and reports back on completion or blockers.
Override flags:
- `--team` — force multi-agent team execution regardless of complexity score
- `--single` / `--simple` — force single-agent execution
- `--phases 1,2` — limit implementation to specific phases only
- `--specialist <agent>` — route to a specific agent
- Default (no flag) — auto-route based on complexity score
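One possible shape for parsing these overrides, sketched with `argparse`. The flag names come from the spec; the parser structure and defaults are assumptions.

```python
# Sketch of the /go override flags. Flag names follow the spec; making
# --team and --single mutually exclusive is an assumed design choice.
import argparse

parser = argparse.ArgumentParser(prog="go")
parser.add_argument("fr_id")  # e.g. FR-045
group = parser.add_mutually_exclusive_group()
group.add_argument("--team", action="store_true",
                   help="force agent-team execution")
group.add_argument("--single", "--simple", dest="single", action="store_true",
                   help="force single-agent execution")
parser.add_argument("--phases", help="comma-separated phase numbers, e.g. 1,2")
parser.add_argument("--specialist", metavar="AGENT",
                    help="route to a specific agent")

args = parser.parse_args(["FR-045", "--phases", "1,2"])
phases = [int(p) for p in args.phases.split(",")] if args.phases else None
# phases == [1, 2]; with no flags set, the router falls back to the score
```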
Duration estimates: The system tracks execution times per complexity tier and reports estimated duration when dispatching:
- Simple: ~5-10 min
- Medium: ~15-20 min
- Complex: ~20-30 min

These estimates are refined over time based on actual execution data.
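Refining the estimates could be as simple as a per-tier running mean that falls back to the seeded ranges until real data exists. This is a sketch, not a decided design:

```python
# Duration-estimate refinement sketch: seeded ranges from the spec,
# replaced by a running mean of actual execution minutes per tier.
from collections import defaultdict

SEED_ESTIMATES = {"simple": (5, 10), "medium": (15, 20), "complex": (20, 30)}
history: dict[str, list[float]] = defaultdict(list)

def record(tier: str, minutes: float) -> None:
    history[tier].append(minutes)

def estimate(tier: str) -> str:
    runs = history[tier]
    if not runs:
        lo, hi = SEED_ESTIMATES[tier]
        return f"~{lo}-{hi} min (seed)"
    return f"~{sum(runs) / len(runs):.0f} min (from {len(runs)} runs)"

record("medium", 18)
record("medium", 22)
print(estimate("medium"))  # ~20 min (from 2 runs)
```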
Open Questions
1. Scoring Method
Question: Should complexity scoring be heuristic-only, LLM-assisted, or hybrid?
| Option | Description |
|---|---|
| A) Heuristic-only | Count phases, files, dependencies algorithmically. Fast, deterministic, transparent |
| B) LLM-assisted | Have Claude read the spec and output a score with reasoning. More nuanced but slower |
| C) Hybrid | Heuristic as base score, LLM adjusts for nuance (e.g., “this looks simple but has hidden complexity”) |
Recommendation: Option A for Phase 1 — fast and deterministic. Option C for Phase 2 — LLM refinement for edge cases.
Decision:
2. Tier Thresholds
Question: What score ranges should map to each tier?
| Option | Description |
|---|---|
| A) Fixed: 0-24 / 25-49 / 50+ | Simple, predictable, easy to understand |
| B) Narrower middle: 0-19 / 20-39 / 40+ | More tasks route to pipeline/team |
| C) Configurable | User sets thresholds in config file |
Recommendation: Option A — start with wide, predictable ranges. Narrow based on experience.
Decision:
3. Quality Gate Threshold
Question: What confidence score should the quality gate require?
| Option | Description |
|---|---|
| A) 60 — lenient | Most plans pass. Risk of low-quality implementations |
| B) 75 — balanced | Good plans pass, questionable ones flagged for user review |
| C) 90 — strict | Only high-confidence plans pass. May bottleneck on quality gate |
Recommendation: Option B — 75 balances throughput with quality. Can be adjusted per-project.
Decision:
4. Agent Stuck Handling
Question: What happens when a dispatched agent hits a blocker?
| Option | Description |
|---|---|
| A) Escalate to user | Pause work, notify user with context, wait for input |
| B) Auto-retry with different approach | Agent retries up to N times with adjusted prompt |
| C) Hand off to lead agent | A lead agent triages the blocker and reassigns |
Recommendation: Option A for Phase 2 — keep the user in control. Option C is the end goal for Phase 3.
Decision:
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Complexity scoring formula + manual tier assignment | — |
| Phase 2 | Auto-scoring + routing via /go and /implement | — |
| Phase 3 | Full auto-routing with pipeline and team modes | — |
Phase 1: Complexity Scoring Formula —
Goal: Define and implement the scoring algorithm that analyzes an FR spec and outputs a complexity score with breakdown.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/routing/complexity.py` | Complexity scorer — parse FR spec, count factors, apply weights, return score | opus | — |
| `src/opus/routing/criteria.py` | Scoring criteria definitions: phase count, file count, dependency count, etc. | opus | — |
| Score output format | Return score + breakdown as structured data (e.g., `{total: 42, phases: 12, files: 10, ...}`) | opus | — |
| Manual tier assignment | Given a score, output the tier label (simple/medium/complex) | mv | — |
| Calibration | Score all existing FRs, verify distribution makes sense, adjust weights if needed | mv | — |
| Unit tests | Test scoring against sample FRs with known expected scores | mv | — |
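The structured score output from Phase 1 could be shaped as a small dataclass. Field names follow the example breakdown in the table above; they are illustrative, not a fixed schema.

```python
# Sketch of the Phase 1 score output. Field names mirror the example
# breakdown {total: 42, phases: 12, files: 10, ...}; per-factor fields
# hold each criterion's weighted contribution and sum to the total.
from dataclasses import dataclass, asdict

@dataclass
class ScoreBreakdown:
    total: int
    phases: int
    files: int
    cross_cutting: int
    dependencies: int
    new_code: int
    open_questions: int

    @property
    def tier(self) -> str:
        """Tier label per the proposed 0-24 / 25-49 / 50+ ranges."""
        if self.total < 25:
            return "simple"
        return "medium" if self.total < 50 else "complex"

score = ScoreBreakdown(total=42, phases=12, files=10, cross_cutting=10,
                       dependencies=6, new_code=2, open_questions=2)
print(score.tier)   # medium
print(asdict(score))  # plain dict, ready to write into frontmatter or logs
```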
Phase 2: Auto-Scoring + Routing via /go and /implement —
Goal: Integrate complexity scoring with /go and /implement skills so scoring and routing happen automatically.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/routing/router.py` | Strategy selection logic — score thresholds to strategy mapping | opus | — |
| `src/opus/routing/dispatcher.py` | Dispatch engine — invoke single agent, pipeline, or agent team based on strategy | opus | — |
| `.claude/skills/go/SKILL.md` | Skill definition for `/go` command | opus | — |
| `/implement` integration | Call complexity scorer when `/implement FR-XXX` is invoked | opus | — |
| Score display | Show complexity score and recommended tier to user before proceeding | mv | — |
| Override flags | Support --simple, --team, --phases, --specialist to bypass auto-routing | opus | — |
| Frontmatter update | Write complexity-score to FR frontmatter after scoring | opus | — |
| Blocker handling | Detect when agent is stuck, escalate to user | mv | — |
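For the frontmatter update row, a minimal sketch of writing `complexity-score` into an FR file's YAML frontmatter. The line-based parsing here is deliberately naive (assumes a `---`-delimited block at the top) to avoid a YAML dependency; a real implementation might use a proper YAML library.

```python
# Naive frontmatter writer sketch: replaces any existing complexity-score
# line, or appends one just before the closing --- delimiter.
def write_score(text: str, score: int) -> str:
    lines = text.splitlines()
    if lines and lines[0] == "---":
        end = lines.index("---", 1)
        body = [l for l in lines[1:end]
                if not l.startswith("complexity-score:")]
        lines[1:end] = body + [f"complexity-score: {score}"]
    return "\n".join(lines)

doc = "---\ntitle: FR-045\n---\n# Spec"
print(write_score(doc, 42))
```

Re-running the function with a new score replaces the old line rather than duplicating it, so re-scoring an FR stays idempotent.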
Phase 3: Full Auto-Routing with Pipeline and Team Modes —
Goal: Implement the pipeline (quality gate) and agent team execution strategies, wired end-to-end.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/routing/pipeline.py` | Pipeline executor: Plan → Review → Gate check → Implement | mv | — |
| `src/opus/routing/team.py` | Team executor: Lead agent coordinates Builder, Tester, Reviewer | mv | — |
| Quality gate | Advocate/reviewer agent scores plan confidence, gate check at threshold | mv | — |
| Gate failure handling | If confidence < threshold, surface concerns to user with specific issues | mv | — |
| Agent coordination | Lead agent breaks work into tasks, assigns to Builder/Tester/Reviewer | mv | — |
| Progress tracking | Track pipeline/team progress, report status to user | mv | — |
| End-to-end flow | Read spec → score → route → dispatch → report | opus | — |
| Status reporting | Report progress, completion, or blockers back to user | mv | — |
| Integration tests | Test full flow: score → route → execute for each tier | mv | — |
Prerequisites / Gap Analysis
Requirements
| Requirement | Description |
|---|---|
| REQ-1 | Python project scaffold (FR-009) — scoring and routing are Python modules |
| REQ-2 | /implement skill (FR-029) — routing is triggered by /implement |
| REQ-3 | Custom agents (FR-043) — pipeline and team modes need agent definitions |
Current State
| Component | Status | Details |
|---|---|---|
| Python scaffold | — | FR-009 not started |
| /implement skill | — | FR-029 not started |
| Custom agents | — | FR-043 not started |
| Skill system | done | .claude/skills/ works |
Gap (What’s missing?)
| Gap | Effort | Blocker? |
|---|---|---|
| Python scaffold (FR-009) | Med | Yes — scoring is Python code |
| /implement skill (FR-029) | Med | Yes — routing is triggered by /implement |
| Custom agents (FR-043) | High | Only for Phase 3 |
Test
Manual tests
| Test | Expected | Owner | Actual | Last |
|---|---|---|---|---|
| Simple FR (1 phase, 2 files, 0 deps) scores < 25 | Tier: simple | opus | pending | - |
| Medium FR (2 phases, 5 files, 2 deps) scores 25-49 | Tier: medium | opus | pending | - |
| Complex FR (4 phases, 10+ files, 4 deps, cross-cutting) scores 50+ | Tier: complex | opus | pending | - |
| Router selects correct strategy for each score range | Strategies match thresholds | opus | pending | - |
| `--simple` override forces single agent for complex FR | Single agent dispatched | opus | pending | - |
| `--team` override forces agent team for simple FR | Agent team dispatched | opus | pending | - |
| Pipeline quality gate blocks plan with confidence < 75 | Plan flagged, user notified | opus | pending | - |
| Pipeline quality gate passes plan with confidence >= 75 | Implementation proceeds | opus | pending | - |
| Score written to FR frontmatter after scoring | complexity-score field present | opus | pending | - |
| `/go FR-XXX` executes end-to-end | Work dispatched and completed or blocker reported | opus | pending | - |
| Agent stuck detection triggers escalation | User notified with context | opus | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-04 | Created | Inspired by Nexie’s complexity routing and go-command pattern |
| 2026-03-04 | Updated | Added override flags and duration estimates (inspired by Nexie) |
| 2026-03-12 | Merged | FR-045 merged into FR-045 — eliminated duplicate |
References
- FR-029 (Approve & Implement Skills) — /implement triggers complexity routing
- FR-043 (Custom Agents) — agents are execution targets for pipeline and team modes
- FR-009 (Python Project Scaffold) — code infrastructure prerequisite
- FR-046 (Job Registry & Priority Queue) — dispatched/routed jobs feed into the queue
- FR-031 (Workflow State Machine) — state transitions triggered by `/go` completion
- `vault/10_features/01_ideas/opus/complexity-based-routing.md` — original idea file
- Nexie (Sven Hennig) — original inspiration for complexity-based routing