Decisions

  • Pending: Should scoring be heuristic-only, LLM-assisted, or hybrid?
  • Pending: What are the exact threshold boundaries between tiers?
  • Pending: Should the quality gate confidence threshold be configurable?
  • Pending: How to handle disagreement between advocate and reviewer agents?
  • Pending: What happens when an agent gets stuck — auto-escalate to user, retry, or hand off to another agent?
  • Pending: Should /go require explicit user confirmation before dispatching, or fire immediately?

User Tasks


Summary

Auto-analyze FR complexity when /go FR-XXX or /implement FR-XXX is called and route to the optimal execution strategy — single agent, pipeline with quality gate, or agent team — based on a numeric complexity score.

Problem / Motivation

Currently, the user must manually decide how to implement each feature: which agent to use, whether to review first, whether tasks can be parallelized. This requires orchestration knowledge and creates cognitive overhead. As Opus grows with more agents and more complex features, this manual routing becomes a bottleneck.

Not every feature request requires the same execution approach. Simple tasks get over-engineered; complex tasks get under-resourced. Manual routing is unsustainable at scale.

Inspired by Nexie (Sven Hennig’s personal AI system), complexity routing removes this burden. The system reads a spec, scores it, picks the right strategy, and dispatches work — all from one command.

Proposed Solution

Build a three-part system:

Complexity Scoring

Analyze the FR spec and produce a numeric score (0-100) based on:

| Criterion | Weight | Description |
| --- | --- | --- |
| Phase count | 20 | More phases = more complexity |
| Files affected | 20 | Count of files in phase tables |
| Cross-cutting concerns | 20 | Does the FR touch multiple systems? (skills + agents + vault + src) |
| Dependency count | 15 | Number of prerequisite FRs |
| New vs existing code | 15 | Creating new files/modules scores higher than modifying existing |
| Open questions count | 10 | More unresolved questions = more uncertainty |
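The weighted criteria above could be combined roughly as follows. This is a hypothetical sketch, not the real `src/opus/routing/complexity.py` API: the normalization caps, field names, and `SpecFactors` type are all assumptions.

```python
from dataclasses import dataclass

# Weights from the criteria table (these sum to 100).
WEIGHTS = {
    "phases": 20,
    "files": 20,
    "cross_cutting": 20,
    "dependencies": 15,
    "new_code": 15,
    "open_questions": 10,
}

# Caps that normalize raw counts into a 0.0-1.0 factor (assumed values).
CAPS = {"phases": 4, "files": 10, "dependencies": 4, "open_questions": 5}

@dataclass
class SpecFactors:
    phases: int
    files: int
    cross_cutting: bool    # touches skills + agents + vault + src?
    dependencies: int
    new_code_ratio: float  # fraction of affected files that are new
    open_questions: int

def score(f: SpecFactors) -> dict:
    """Return total complexity score (0-100) plus a per-criterion breakdown."""
    factors = {
        "phases": min(f.phases / CAPS["phases"], 1.0),
        "files": min(f.files / CAPS["files"], 1.0),
        "cross_cutting": 1.0 if f.cross_cutting else 0.0,
        "dependencies": min(f.dependencies / CAPS["dependencies"], 1.0),
        "new_code": f.new_code_ratio,
        "open_questions": min(f.open_questions / CAPS["open_questions"], 1.0),
    }
    breakdown = {k: round(WEIGHTS[k] * v) for k, v in factors.items()}
    return {"total": sum(breakdown.values()), **breakdown}
```

A maximally complex spec (4+ phases, 10+ files, cross-cutting, 4+ deps, all-new code, 5+ open questions) saturates every factor and scores 100; a one-phase, two-file change with no dependencies lands well inside the simple tier.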

Tier Routing

| Tier | Score Range | Strategy | Description |
| --- | --- | --- | --- |
| Simple | 0-24 | Single agent | One Claude Code session handles everything end-to-end |
| Medium | 25-49 | Pipeline + quality gate | Plan → Review (quality gate) → Implement. Advocate/reviewer agents must give confidence >= threshold |
| Complex | 50+ | Agent team | Lead agent coordinates Builder, Tester, and Reviewer agents working in parallel |
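The tier boundaries above map to strategies with a trivial threshold check. A minimal sketch (function name illustrative, not the final `router.py` interface):

```python
def tier_for(score: int) -> str:
    """Map a 0-100 complexity score to an execution tier."""
    if score < 25:
        return "simple"   # single agent
    if score < 50:
        return "medium"   # pipeline + quality gate
    return "complex"      # agent team
```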

Quality Gate (Pipeline Mode)

For medium-complexity FRs, the pipeline works as:

  1. Plan: Generate detailed implementation plan from FR spec
  2. Review: Advocate agent reviews plan, scores confidence (0-100)
  3. Gate check: If confidence >= threshold (default 75), proceed to implementation. If below, surface concerns to user
  4. Implement: Execute the approved plan
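The gate check in step 3 might look like the following sketch; the function name and return shape are assumptions, and the default threshold of 75 is the one proposed in Open Question 3:

```python
DEFAULT_GATE_THRESHOLD = 75

def gate_check(confidence: int, concerns: list[str],
               threshold: int = DEFAULT_GATE_THRESHOLD) -> dict:
    """Decide whether a reviewed plan may proceed to implementation."""
    if confidence >= threshold:
        return {"proceed": True, "concerns": []}
    # Below threshold: block and surface the reviewer's concerns to the user.
    return {"proceed": False, "concerns": concerns}
```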

/go Skill

A single command that reads the FR, scores it, picks the strategy, dispatches work, and reports back on completion or blockers.

Override flags:

  • --team — force multi-agent team execution regardless of complexity score
  • --single / --simple — force single-agent execution
  • --phases 1,2 — limit implementation to specific phases only
  • --specialist <agent> — route to a specific agent
  • Default (no flag) — auto-route based on complexity score
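One plausible resolution order for these flags — explicit overrides beat auto-routing — is sketched below. The flag names come from the spec; the parsing and precedence details are assumptions:

```python
import argparse

def resolve_strategy(argv: list[str], auto_tier: str) -> str:
    """Return the execution strategy, letting override flags win over auto-routing."""
    p = argparse.ArgumentParser(prog="/go")
    p.add_argument("--team", action="store_true")
    p.add_argument("--single", "--simple", dest="single", action="store_true")
    p.add_argument("--phases")       # e.g. "1,2" — limit to specific phases
    p.add_argument("--specialist")   # route to a named agent
    args = p.parse_args(argv)
    if args.team:
        return "team"
    if args.single:
        return "single"
    if args.specialist:
        return f"specialist:{args.specialist}"
    return auto_tier  # default: complexity-based routing
```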

Duration estimates: The system tracks execution times per complexity tier and reports estimated duration when dispatching:

  • Simple: ~5-10 min
  • Medium: ~15-20 min
  • Complex: ~20-30 min

These estimates are refined over time based on actual execution data.
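One simple way to refine per-tier estimates from actual runs is an exponential moving average. Both the approach and the smoothing factor are assumptions, not the spec'd design:

```python
def refine_estimate(current_min: float, observed_min: float,
                    alpha: float = 0.3) -> float:
    """Blend a newly observed duration (minutes) into the running estimate."""
    return round((1 - alpha) * current_min + alpha * observed_min, 1)
```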

Open Questions

1. Scoring Method

Question: Should complexity scoring be heuristic-only, LLM-assisted, or hybrid?

| Option | Description |
| --- | --- |
| A) Heuristic-only | Count phases, files, dependencies algorithmically. Fast, deterministic, transparent |
| B) LLM-assisted | Have Claude read the spec and output a score with reasoning. More nuanced but slower |
| C) Hybrid | Heuristic as base score, LLM adjusts for nuance (e.g., “this looks simple but has hidden complexity”) |

Recommendation: Option A for Phase 1 — fast and deterministic. Option C for Phase 2 — LLM refinement for edge cases.

Decision:

2. Tier Thresholds

Question: What score ranges should map to each tier?

| Option | Description |
| --- | --- |
| A) Fixed: 0-24 / 25-49 / 50+ | Simple, predictable, easy to understand |
| B) Narrower middle: 0-19 / 20-39 / 40+ | More tasks route to pipeline/team |
| C) Configurable | User sets thresholds in config file |

Recommendation: Option A — start with wide, predictable ranges. Narrow based on experience.

Decision:

3. Quality Gate Threshold

Question: What confidence score should the quality gate require?

| Option | Description |
| --- | --- |
| A) 60 — lenient | Most plans pass. Risk of low-quality implementations |
| B) 75 — balanced | Good plans pass, questionable ones flagged for user review |
| C) 90 — strict | Only high-confidence plans pass. May bottleneck on quality gate |

Recommendation: Option B — 75 balances throughput with quality. Can be adjusted per-project.

Decision:

4. Agent Stuck Handling

Question: What happens when a dispatched agent hits a blocker?

| Option | Description |
| --- | --- |
| A) Escalate to user | Pause work, notify user with context, wait for input |
| B) Auto-retry with different approach | Agent retries up to N times with adjusted prompt |
| C) Hand off to lead agent | A lead agent triages the blocker and reassigns |

Recommendation: Option A for Phase 2 — keep the user in control. Option C is the end goal for Phase 3.

Decision:


Phase Overview

| Phase | Description | Status |
| --- | --- | --- |
| Phase 1 | Complexity scoring formula + manual tier assignment | |
| Phase 2 | Auto-scoring + routing via /go and /implement | |
| Phase 3 | Full auto-routing with pipeline and team modes | |

Phase 1: Complexity Scoring Formula —

Goal: Define and implement the scoring algorithm that analyzes an FR spec and outputs a complexity score with breakdown.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/routing/complexity.py | Complexity scorer — parse FR spec, count factors, apply weights, return score | opus | |
| src/opus/routing/criteria.py | Scoring criteria definitions: phase count, file count, dependency count, etc. | opus | |
| Score output format | Return score + breakdown as structured data (e.g., {total: 42, phases: 12, files: 10, ...}) | opus | |
| Manual tier assignment | Given a score, output the tier label (simple/medium/complex) | mv | |
| Calibration | Score all existing FRs, verify distribution makes sense, adjust weights if needed | mv | |
| Unit tests | Test scoring against sample FRs with known expected scores | mv | |

Phase 2: Auto-Scoring + Routing via /go and /implement —

Goal: Integrate complexity scoring with /go and /implement skills so scoring and routing happen automatically.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/routing/router.py | Strategy selection logic — score thresholds to strategy mapping | opus | |
| src/opus/routing/dispatcher.py | Dispatch engine — invoke single agent, pipeline, or agent team based on strategy | opus | |
| .claude/skills/go/SKILL.md | Skill definition for /go command | opus | |
| /implement integration | Call complexity scorer when /implement FR-XXX is invoked | opus | |
| Score display | Show complexity score and recommended tier to user before proceeding | mv | |
| Override flags | Support --simple, --team, --phases, --specialist to bypass auto-routing | opus | |
| Frontmatter update | Write complexity-score to FR frontmatter after scoring | opus | |
| Blocker handling | Detect when agent is stuck, escalate to user | mv | |
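The frontmatter update row above could be implemented along these lines. This is a sketch only: the function name is hypothetical, and it assumes the FR file opens with a standard `---`-delimited YAML frontmatter block:

```python
import re
from pathlib import Path

def write_score(fr_path: Path, score: int) -> None:
    """Insert or update the `complexity-score` key in an FR's frontmatter."""
    parts = fr_path.read_text().split("---\n", 2)
    head, body = parts[1], parts[2]
    if re.search(r"^complexity-score:", head, flags=re.M):
        head = re.sub(r"^complexity-score:.*$",
                      f"complexity-score: {score}", head, flags=re.M)
    else:
        head += f"complexity-score: {score}\n"
    fr_path.write_text(f"---\n{head}---\n{body}")
```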

Phase 3: Full Auto-Routing with Pipeline and Team Modes —

Goal: Implement the pipeline (quality gate) and agent team execution strategies, wired end-to-end.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/routing/pipeline.py | Pipeline executor: Plan → Review → Gate check → Implement | mv | |
| src/opus/routing/team.py | Team executor: Lead agent coordinates Builder, Tester, Reviewer | mv | |
| Quality gate | Advocate/reviewer agent scores plan confidence, gate check at threshold | mv | |
| Gate failure handling | If confidence < threshold, surface concerns to user with specific issues | mv | |
| Agent coordination | Lead agent breaks work into tasks, assigns to Builder/Tester/Reviewer | mv | |
| Progress tracking | Track pipeline/team progress, report status to user | mv | |
| End-to-end flow | Read spec → score → route → dispatch → report | opus | |
| Status reporting | Report progress, completion, or blockers back to user | mv | |
| Integration tests | Test full flow: score → route → execute for each tier | mv | |
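The pipeline executor's control flow (Plan → Review → Gate check → Implement) can be sketched with the agent invocations stubbed out as callables. All names and the return shape are illustrative assumptions, not the final `pipeline.py` design:

```python
from typing import Callable

def run_pipeline(spec: str,
                 plan: Callable[[str], str],
                 review: Callable[[str], tuple[int, list[str]]],
                 implement: Callable[[str], str],
                 threshold: int = 75) -> dict:
    """Plan the work, review it, gate on confidence, then implement."""
    p = plan(spec)
    confidence, concerns = review(p)
    if confidence < threshold:
        # Gate failure: surface the reviewer's concerns instead of implementing.
        return {"status": "blocked", "concerns": concerns}
    return {"status": "done", "result": implement(p)}
```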

Prerequisites / Gap Analysis

Requirements

| Requirement | Description |
| --- | --- |
| REQ-1 | Python project scaffold (FR-009) — scoring and routing are Python modules |
| REQ-2 | /implement skill (FR-029) — routing is triggered by /implement |
| REQ-3 | Custom agents (FR-043) — pipeline and team modes need agent definitions |

Current State

| Component | Status | Details |
| --- | --- | --- |
| Python scaffold | not started | FR-009 |
| /implement skill | not started | FR-029 |
| Custom agents | not started | FR-043 |
| Skill system | done | .claude/skills/ works |

Gap (What’s missing?)

| Gap | Effort | Blocker? |
| --- | --- | --- |
| Python scaffold (FR-009) | Med | Yes — scoring is Python code |
| /implement skill (FR-029) | Med | Yes — routing is triggered by /implement |
| Custom agents (FR-043) | High | Only for Phase 3 |

Test

Manual tests

| Test | Expected | Owner | Actual | Last |
| --- | --- | --- | --- | --- |
| Simple FR (1 phase, 2 files, 0 deps) scores < 25 | Tier: simple | opus | pending | - |
| Medium FR (2 phases, 5 files, 2 deps) scores 25-49 | Tier: medium | opus | pending | - |
| Complex FR (4 phases, 10+ files, 4 deps, cross-cutting) scores 50+ | Tier: complex | opus | pending | - |
| Router selects correct strategy for each score range | Strategies match thresholds | opus | pending | - |
| --simple override forces single agent for complex FR | Single agent dispatched | opus | pending | - |
| --team override forces agent team for simple FR | Agent team dispatched | opus | pending | - |
| Pipeline quality gate blocks plan with confidence < 75 | Plan flagged, user notified | opus | pending | - |
| Pipeline quality gate passes plan with confidence >= 75 | Implementation proceeds | opus | pending | - |
| Score written to FR frontmatter after scoring | complexity-score field present | opus | pending | - |
| /go FR-XXX executes end-to-end | Work dispatched and completed or blocker reported | opus | pending | - |
| Agent stuck detection triggers escalation | User notified with context | opus | pending | - |

AI-verified tests

| Scenario | Expected behavior | Verification method |
| --- | --- | --- |

E2E tests

| Scenario | Assertion |
| --- | --- |

Integration tests

| Component | Coverage |
| --- | --- |

Unit tests

| Component | Tests | Coverage |
| --- | --- | --- |

History

| Date | Event | Details |
| --- | --- | --- |
| 2026-03-04 | Created | Inspired by Nexie’s complexity routing and go-command pattern |
| 2026-03-04 | Updated | Added override flags and duration estimates (inspired by Nexie) |
| 2026-03-12 | Merged | FR-045 merged into FR-045 — eliminated duplicate |

References

  • FR-029 (Approve & Implement Skills) — /implement triggers complexity routing
  • FR-043 (Custom Agents) — agents are execution targets for pipeline and team modes
  • FR-009 (Python Project Scaffold) — code infrastructure prerequisite
  • FR-046 (Job Registry & Priority Queue) — dispatched/routed jobs feed into the queue
  • FR-031 (Workflow State Machine) — state transitions triggered by /go completion
  • vault/10_features/01_ideas/opus/complexity-based-routing.md — original idea file
  • Nexie (Sven Hennig) — original inspiration for complexity-based routing