Decisions
- Pending: Should scoring be heuristic-only, LLM-assisted, or hybrid?
- Pending: What are the exact threshold boundaries between tiers?
- Pending: Should the quality gate confidence threshold be configurable?
- Pending: How to handle disagreement between advocate and reviewer agents?
- Pending: What happens when an agent gets stuck — auto-escalate to user, retry, or hand off to another agent?
- Pending: Should `/go` require explicit user confirmation before dispatching, or fire immediately?
User Tasks
Summary
Auto-analyze FR complexity when `/go FR-XXX` or `/implement FR-XXX` is called and route to the optimal execution strategy — single agent, pipeline with quality gate, or agent team — based on a numeric complexity score.
Problem / Motivation
Currently, the user must manually decide how to implement each feature: which agent to use, whether to review first, whether tasks can be parallelized. This requires orchestration knowledge and creates cognitive overhead. As Opus grows with more agents and more complex features, this manual routing becomes a bottleneck.
Not every feature request requires the same execution approach. Simple tasks get over-engineered, complex tasks get under-resourced. Manual routing is unsustainable at scale.
Inspired by Nexie (Sven Hennig’s personal AI system), complexity routing removes this burden. The system reads a spec, scores it, picks the right strategy, and dispatches work — all from one command.
Proposed Solution
Build a three-part system:
Complexity Scoring
Analyze the FR spec and produce a numeric score (0-100) based on:
| Criterion | Weight | Description |
|---|---|---|
| Phase count | 20 | More phases = more complexity |
| Files affected | 20 | Count of files in phase tables |
| Cross-cutting concerns | 20 | Does the FR touch multiple systems? (skills + agents + vault + src) |
| Dependency count | 15 | Number of prerequisite FRs |
| New vs existing code | 15 | Creating new files/modules scores higher than modifying existing |
| Open questions count | 10 | More unresolved questions = more uncertainty |
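The weighted criteria above can be sketched as a heuristic scorer (Option A from the open questions). The weights come from the table; the normalization caps (e.g. 5 phases, 20 files) are illustrative assumptions, not decided values.

```python
# Heuristic complexity scorer sketch. Weights mirror the criteria table;
# the CAPS values are assumptions for normalizing raw counts into [0, 1].
WEIGHTS = {
    "phase_count": 20,
    "files_affected": 20,
    "cross_cutting": 20,
    "dependency_count": 15,
    "new_code_ratio": 15,
    "open_questions": 10,
}

# Illustrative caps: a factor at or above its cap contributes its full weight.
CAPS = {
    "phase_count": 5,
    "files_affected": 20,
    "cross_cutting": 4,      # systems touched (skills / agents / vault / src)
    "dependency_count": 5,
    "new_code_ratio": 1.0,   # already a 0-1 fraction of new vs existing code
    "open_questions": 5,
}

def complexity_score(factors: dict) -> int:
    """Return a 0-100 score: each factor normalized to [0, 1], then weighted."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        normalized = min(factors.get(name, 0) / CAPS[name], 1.0)
        total += weight * normalized
    return round(total)

# Example: 2 phases, 5 files, 2 systems, 2 deps, mostly new code, 1 open question
print(complexity_score({
    "phase_count": 2, "files_affected": 5, "cross_cutting": 2,
    "dependency_count": 2, "new_code_ratio": 0.8, "open_questions": 1,
}))  # 43, which lands in the medium tier
```

Because every factor is capped before weighting, the score is bounded at 100 and each criterion can contribute at most its table weight.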
Tier Routing
| Tier | Score Range | Strategy | Description |
|---|---|---|---|
| Simple | 0-24 | Single agent | One Claude Code session handles everything end-to-end |
| Medium | 25-49 | Pipeline + Quality Gate | Plan → Review (quality gate) → Implement. Advocate/reviewer agents must give confidence >= threshold |
| Complex | 50+ | Agent team | Lead agent coordinates Builder, Tester, and Reviewer agents working in parallel |
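The tier table maps directly to a small routing function (using the Option A thresholds, which are still pending decision):

```python
# Minimal tier router matching the score ranges in the table above.
def route(score: int) -> str:
    """Map a 0-100 complexity score to an execution strategy."""
    if score < 25:
        return "single"    # Simple: one agent handles everything end-to-end
    if score < 50:
        return "pipeline"  # Medium: plan -> review (gate) -> implement
    return "team"          # Complex: lead agent coordinates a team
```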
Quality Gate (Pipeline Mode)
For medium-complexity FRs, the pipeline works as:
- Plan: Generate detailed implementation plan from FR spec
- Review: Advocate agent reviews plan, scores confidence (0-100)
- Gate check: If confidence >= threshold (default 75), proceed to implementation. If below, surface concerns to user
- Implement: Execute the approved plan
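The gate check step can be sketched as follows. `review_plan` stands in for the advocate agent and is a hypothetical callable here; only the control flow is real.

```python
# Quality gate sketch. `review_plan` is a stand-in for the advocate agent,
# which is assumed to return a 0-100 confidence score for a plan.
GATE_THRESHOLD = 75  # spec default; whether this is configurable is undecided

def quality_gate(plan: str, review_plan) -> tuple[bool, int]:
    """Return (passed, confidence). Failing plans surface concerns to the user."""
    confidence = review_plan(plan)
    return confidence >= GATE_THRESHOLD, confidence

passed, conf = quality_gate("plan text", lambda plan: 80)
# passed is True at confidence 80; a confidence of 74 would block and escalate
```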
/go Skill
A single command that reads the FR, scores it, picks the strategy, dispatches work, and reports back on completion or blockers.
Override flags:
- `--team` — force multi-agent team execution regardless of complexity score
- `--single` / `--simple` — force single-agent execution
- `--phases 1,2` — limit implementation to specific phases only
- `--specialist <agent>` — route to a specific agent
- Default (no flag) — auto-route based on complexity score
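One possible shape for parsing these overrides, sketched with `argparse`. The flag names come from the spec; the parser structure and defaults are assumptions.

```python
# Sketch of the /go override flags. Flag names follow the spec; making
# --team and --single mutually exclusive is an assumed design choice.
import argparse

parser = argparse.ArgumentParser(prog="go")
parser.add_argument("fr_id")  # e.g. FR-045
group = parser.add_mutually_exclusive_group()
group.add_argument("--team", action="store_true",
                   help="force agent-team execution")
group.add_argument("--single", "--simple", dest="single", action="store_true",
                   help="force single-agent execution")
parser.add_argument("--phases", help="comma-separated phase numbers, e.g. 1,2")
parser.add_argument("--specialist", metavar="AGENT",
                    help="route to a specific agent")

args = parser.parse_args(["FR-045", "--phases", "1,2"])
phases = [int(p) for p in args.phases.split(",")] if args.phases else None
# phases == [1, 2]; with no flags set, the router falls back to the score
```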
Duration estimates: The system tracks execution times per complexity tier and reports estimated duration when dispatching:
- Simple: ~5-10 min
- Medium: ~15-20 min
- Complex: ~20-30 min

These estimates are refined over time based on actual execution data.
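Refining the estimates could be as simple as a per-tier running mean that falls back to the seeded ranges until real data exists. This is a sketch, not a decided design:

```python
# Duration-estimate refinement sketch: seeded ranges from the spec,
# replaced by a running mean of actual execution minutes per tier.
from collections import defaultdict

SEED_ESTIMATES = {"simple": (5, 10), "medium": (15, 20), "complex": (20, 30)}
history: dict[str, list[float]] = defaultdict(list)

def record(tier: str, minutes: float) -> None:
    history[tier].append(minutes)

def estimate(tier: str) -> str:
    runs = history[tier]
    if not runs:
        lo, hi = SEED_ESTIMATES[tier]
        return f"~{lo}-{hi} min (seed)"
    return f"~{sum(runs) / len(runs):.0f} min (from {len(runs)} runs)"

record("medium", 18)
record("medium", 22)
print(estimate("medium"))  # ~20 min (from 2 runs)
```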
Open Questions
1. Scoring Method
Question: Should complexity scoring be heuristic-only, LLM-assisted, or hybrid?
| Option | Description |
|---|---|
| A) Heuristic-only | Count phases, files, dependencies algorithmically. Fast, deterministic, transparent |
| B) LLM-assisted | Have Claude read the spec and output a score with reasoning. More nuanced but slower |
| C) Hybrid | Heuristic as base score, LLM adjusts for nuance (e.g., “this looks simple but has hidden complexity”) |
Recommendation: Option A for Phase 1 — fast and deterministic. Option C for Phase 2 — LLM refinement for edge cases.
Decision:
2. Tier Thresholds
Question: What score ranges should map to each tier?
| Option | Description |
|---|---|
| A) Fixed: 0-24 / 25-49 / 50+ | Simple, predictable, easy to understand |
| B) Narrower middle: 0-19 / 20-39 / 40+ | More tasks route to pipeline/team |
| C) Configurable | User sets thresholds in config file |
Recommendation: Option A — start with wide, predictable ranges. Narrow based on experience.
Decision:
3. Quality Gate Threshold
Question: What confidence score should the quality gate require?
| Option | Description |
|---|---|
| A) 60 — lenient | Most plans pass. Risk of low-quality implementations |
| B) 75 — balanced | Good plans pass, questionable ones flagged for user review |
| C) 90 — strict | Only high-confidence plans pass. May bottleneck on quality gate |
Recommendation: Option B — 75 balances throughput with quality. Can be adjusted per-project.
Decision:
4. Agent Stuck Handling
Question: What happens when a dispatched agent hits a blocker?
| Option | Description |
|---|---|
| A) Escalate to user | Pause work, notify user with context, wait for input |
| B) Auto-retry with different approach | Agent retries up to N times with adjusted prompt |
| C) Hand off to lead agent | A lead agent triages the blocker and reassigns |
Recommendation: Option A for Phase 2 — keep the user in control. Option C is the end goal for Phase 3.
Decision:
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Complexity scoring formula + manual tier assignment | — |
| Phase 2 | Auto-scoring + routing via /go and /implement | — |
| Phase 3 | Full auto-routing with pipeline and team modes | — |
Phase 1: Complexity Scoring Formula —
Goal: Define and implement the scoring algorithm that analyzes an FR spec and outputs a complexity score with breakdown.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/routing/complexity.py` | Complexity scorer — parse FR spec, count factors, apply weights, return score | opus | — |
| `src/opus/routing/criteria.py` | Scoring criteria definitions: phase count, file count, dependency count, etc. | opus | — |
| Score output format | Return score + breakdown as structured data (e.g., `{total: 42, phases: 12, files: 10, ...}`) | opus | — |
| Manual tier assignment | Given a score, output the tier label (simple/medium/complex) | mv | — |
| Calibration | Score all existing FRs, verify distribution makes sense, adjust weights if needed | mv | — |
| Unit tests | Test scoring against sample FRs with known expected scores | mv | — |
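The structured score output from Phase 1 could be shaped as a small dataclass. Field names follow the example breakdown in the table above; they are illustrative, not a fixed schema.

```python
# Sketch of the Phase 1 score output. Field names mirror the example
# breakdown {total: 42, phases: 12, files: 10, ...}; per-factor fields
# hold each criterion's weighted contribution and sum to the total.
from dataclasses import dataclass, asdict

@dataclass
class ScoreBreakdown:
    total: int
    phases: int
    files: int
    cross_cutting: int
    dependencies: int
    new_code: int
    open_questions: int

    @property
    def tier(self) -> str:
        """Tier label per the proposed 0-24 / 25-49 / 50+ ranges."""
        if self.total < 25:
            return "simple"
        return "medium" if self.total < 50 else "complex"

score = ScoreBreakdown(total=42, phases=12, files=10, cross_cutting=10,
                       dependencies=6, new_code=2, open_questions=2)
print(score.tier)   # medium
print(asdict(score))  # plain dict, ready to write into frontmatter or logs
```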
Phase 2: Auto-Scoring + Routing via /go and /implement —
Goal: Integrate complexity scoring with /go and /implement skills so scoring and routing happen automatically.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/routing/router.py` | Strategy selection logic — score thresholds to strategy mapping | opus | — |
| `src/opus/routing/dispatcher.py` | Dispatch engine — invoke single agent, pipeline, or agent team based on strategy | opus | — |
| `.claude/skills/go/SKILL.md` | Skill definition for `/go` command | opus | — |
| `/implement` integration | Call complexity scorer when `/implement FR-XXX` is invoked | opus | — |
| Score display | Show complexity score and recommended tier to user before proceeding | mv | — |
| Override flags | Support --simple, --team, --phases, --specialist to bypass auto-routing | opus | — |
| Frontmatter update | Write complexity-score to FR frontmatter after scoring | opus | — |
| Blocker handling | Detect when agent is stuck, escalate to user | mv | — |
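For the frontmatter update row, a minimal sketch of writing `complexity-score` into an FR file's YAML frontmatter. The line-based parsing here is deliberately naive (assumes a `---`-delimited block at the top) to avoid a YAML dependency; a real implementation might use a proper YAML library.

```python
# Naive frontmatter writer sketch: replaces any existing complexity-score
# line, or appends one just before the closing --- delimiter.
def write_score(text: str, score: int) -> str:
    lines = text.splitlines()
    if lines and lines[0] == "---":
        end = lines.index("---", 1)
        body = [l for l in lines[1:end]
                if not l.startswith("complexity-score:")]
        lines[1:end] = body + [f"complexity-score: {score}"]
    return "\n".join(lines)

doc = "---\ntitle: FR-045\n---\n# Spec"
print(write_score(doc, 42))
```

Re-running the function with a new score replaces the old line rather than duplicating it, so re-scoring an FR stays idempotent.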
Phase 3: Full Auto-Routing with Pipeline and Team Modes —
Goal: Implement the pipeline (quality gate) and agent team execution strategies, wired end-to-end.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/routing/pipeline.py` | Pipeline executor: Plan → Review → Gate check → Implement | mv | — |
| `src/opus/routing/team.py` | Team executor: Lead agent coordinates Builder, Tester, Reviewer | mv | — |
| Quality gate | Advocate/reviewer agent scores plan confidence, gate check at threshold | mv | — |
| Gate failure handling | If confidence < threshold, surface concerns to user with specific issues | mv | — |
| Agent coordination | Lead agent breaks work into tasks, assigns to Builder/Tester/Reviewer | mv | — |
| Progress tracking | Track pipeline/team progress, report status to user | mv | — |
| End-to-end flow | Read spec → score → route → dispatch → report | opus | — |
| Status reporting | Report progress, completion, or blockers back to user | mv | — |
| Integration tests | Test full flow: score → route → execute for each tier | mv | — |
Prerequisites / Gap Analysis
Requirements
| Requirement | Description |
|---|---|
| REQ-1 | Python project scaffold (FR-009) — scoring and routing are Python modules |
| REQ-2 | /implement skill (FR-029) — routing is triggered by /implement |
| REQ-3 | Custom agents (FR-043) — pipeline and team modes need agent definitions |
Current State
| Component | Status | Details |
|---|---|---|
| Python scaffold | — | FR-009 not started |
| /implement skill | — | FR-029 not started |
| Custom agents | — | FR-043 not started |
| Skill system | done | .claude/skills/ works |
Gap (What’s missing?)
| Gap | Effort | Blocker? |
|---|---|---|
| Python scaffold (FR-009) | Med | Yes — scoring is Python code |
| /implement skill (FR-029) | Med | Yes — routing is triggered by /implement |
| Custom agents (FR-043) | High | Only for Phase 3 |
Test
Manual tests
| Test | Expected | Owner | Actual | Last |
|---|---|---|---|---|
| Simple FR (1 phase, 2 files, 0 deps) scores < 25 | Tier: simple | opus | pending | - |
| Medium FR (2 phases, 5 files, 2 deps) scores 25-49 | Tier: medium | opus | pending | - |
| Complex FR (4 phases, 10+ files, 4 deps, cross-cutting) scores 50+ | Tier: complex | opus | pending | - |
| Router selects correct strategy for each score range | Strategies match thresholds | opus | pending | - |
| `--simple` override forces single agent for complex FR | Single agent dispatched | opus | pending | - |
| `--team` override forces agent team for simple FR | Agent team dispatched | opus | pending | - |
| Pipeline quality gate blocks plan with confidence < 75 | Plan flagged, user notified | opus | pending | - |
| Pipeline quality gate passes plan with confidence >= 75 | Implementation proceeds | opus | pending | - |
| Score written to FR frontmatter after scoring | complexity-score field present | opus | pending | - |
| `/go FR-XXX` executes end-to-end | Work dispatched and completed or blocker reported | opus | pending | - |
| Agent stuck detection triggers escalation | User notified with context | opus | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-04 | Created | Inspired by Nexie’s complexity routing and go-command pattern |
| 2026-03-04 | Updated | Added override flags and duration estimates (inspired by Nexie) |
| 2026-03-12 | Merged | FR-045 merged into FR-045 — eliminated duplicate |
References
- FR-029 (Approve & Implement Skills) — /implement triggers complexity routing
- FR-043 (Custom Agents) — agents are execution targets for pipeline and team modes
- FR-009 (Python Project Scaffold) — code infrastructure prerequisite
- FR-046 (Job Registry & Priority Queue) — dispatched/routed jobs feed into the queue
- FR-031 (Workflow State Machine) — state transitions triggered by `/go` completion
- `vault/10_features/01_ideas/opus/complexity-based-routing.md` — original idea file
- Nexie (Sven Hennig) — original inspiration for complexity-based routing