Decisions

  • Pending: Storage backend — JSON file, SQLite, or in-memory?
  • Pending: Maximum concurrent agents?
  • Pending: Priority formula — fixed categories or weighted scoring?
  • Pending: Starvation prevention threshold — how quickly should low-priority jobs escalate?

User Tasks


Summary

Build a central job registry that tracks all work in the system and a priority queue that dispatches jobs to available agents based on urgency, dependencies, and resource availability.

Problem / Motivation

Opus currently works one task at a time in a single session. As the system grows with multiple agents (FR-043), complexity routing (FR-045), and automated triggers (FR-050), tasks will pile up and need centralized management. Without a job registry:

  • No visibility into what is queued, running, or completed
  • No way to prioritize urgent work over background tasks
  • No dependency tracking between jobs
  • No concurrent execution — agents sit idle while work waits
  • User messages compete with automated tasks without priority differentiation

A job registry and priority queue solve this by becoming the central nervous system for all work in Opus.

Proposed Solution

Build a Python module with two core components:

Job Registry — A persistent store tracking every piece of work:

  • Job ID, type, source (user, automated, agent-spawned)
  • Status: queued, running, completed, failed, blocked
  • Assigned agent (if running)
  • Created, started, completed timestamps
  • Dependencies (job B can’t start until job A finishes)
  • Result/output summary
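The registry fields above can be sketched as a dataclass. This is a minimal illustration, not the final Phase 1 data model; field names and the `JobStatus` enum are assumptions drawn from the bullets above:

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class JobStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    BLOCKED = "blocked"

@dataclass
class Job:
    id: str
    type: str
    source: str                         # "user" | "automated" | "agent-spawned"
    status: JobStatus = JobStatus.QUEUED
    agent: Optional[str] = None         # assigned agent, if running
    created_at: datetime = field(default_factory=datetime.utcnow)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    depends_on: list[str] = field(default_factory=list)  # jobs that must finish first
    result: Optional[str] = None        # output summary
```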

Priority Queue — Intelligent job ordering:

  • Human messages always jump to the front — interactive requests take absolute priority
  • Priority tiers: critical (user-interactive) > urgent (bug fixes) > normal (features) > low (maintenance, cleanup)
  • Dependency-aware: jobs with unsatisfied dependencies stay blocked regardless of priority
  • Starvation prevention: low-priority jobs gradually increase in priority over time so they eventually get processed
  • Concurrency management: dispatch jobs to idle agents, respect max-concurrent limits
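The ordering rules above (tiers first, FIFO within a tier, blocked jobs skipped) could be sketched with a heap. Tier names and values are assumptions; a real implementation would live in `src/opus/jobs/queue.py`:

```python
import heapq
import itertools

# Lower value = dispatched first (tier names/values are assumptions).
TIERS = {"critical": 0, "urgent": 1, "normal": 2, "low": 3}

class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-break within a tier

    def enqueue(self, job_id, tier, depends_on=()):
        entry = (TIERS[tier], next(self._counter), job_id, set(depends_on))
        heapq.heappush(self._heap, entry)

    def dequeue(self, completed):
        """Pop the highest-priority job whose dependencies are all completed."""
        blocked = []
        result = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[3] <= completed:      # all dependencies satisfied
                result = entry[2]
                break
            blocked.append(entry)          # blocked: stays in the queue
        for entry in blocked:
            heapq.heappush(self._heap, entry)
        return result
```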

What This Enables:

  • Kick off multiple features before bed, wake up to all completed
  • While one agent implements FR-040, another reviews FR-050, and a third researches FR-039
  • The system self-manages workload without manual scheduling
  • Full audit trail of all work performed

Open Questions

1. Storage Backend

Question: Where should job data be persisted?

| Option | Description |
| --- | --- |
| A) JSON file | Simple, human-readable, easy to debug. May have concurrency issues with multiple writers |
| B) SQLite | Built into Python, proper querying, handles concurrent reads, single-file storage |
| C) In-memory only | Fastest, but data lost on restart. Only suitable if jobs are short-lived |

Recommendation: Option B — SQLite is the sweet spot. Built-in, fast, queryable, persists across restarts, handles concurrent access.
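If Option B is chosen, a minimal `sqlite3` storage adapter might start from a schema like this. Column names mirror the registry fields under Proposed Solution and are assumptions, not a settled design:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id           TEXT PRIMARY KEY,
    type         TEXT NOT NULL,
    source       TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'queued',
    agent        TEXT,
    created_at   TEXT NOT NULL,
    started_at   TEXT,
    completed_at TEXT,
    depends_on   TEXT,               -- JSON array of job IDs
    result       TEXT
);
CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs (status);
"""

def connect(path=":memory:"):
    """Open (or create) the job database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```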

Decision:

2. Maximum Concurrent Agents

Question: How many agents should run simultaneously?

| Option | Description |
| --- | --- |
| A) 1 (sequential) | Simplest, no coordination needed. Defeats the purpose for Phase 3 |
| B) 3-4 concurrent | Good parallelism without overwhelming system resources |
| C) Unlimited | Maximum throughput, but could exhaust API limits or system resources |

Recommendation: Option B — start with 3, make it configurable. Respect API rate limits.
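A configurable concurrency cap can be enforced with a semaphore. A sketch assuming async agent invocations (the default of 3 follows the recommendation above; function names are illustrative):

```python
import asyncio

MAX_CONCURRENT = 3  # assumed default, configurable

async def run_with_limit(jobs, worker, limit=MAX_CONCURRENT):
    """Run worker(job) for each job, never more than `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:
            return await worker(job)

    return await asyncio.gather(*(guarded(j) for j in jobs))
```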

Decision:

3. Priority Formula

Question: How should job priority be calculated?

| Option | Description |
| --- | --- |
| A) Tier-based with aging | Fixed tiers (critical/urgent/normal/low) + time-based priority boost. Simple and predictable |
| B) Weighted scoring | Multi-factor score (urgency * impact * age * dependencies). More nuanced but harder to debug |
| C) User-assigned only | User manually sets priority for each job. Maximum control, minimum automation |

Recommendation: Option A — tier-based is transparent and easy to reason about. Aging prevents starvation naturally.
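Option A's aging can be expressed as a one-line score: a queued job climbs one tier per interval of wait time, so old low-priority work eventually outranks fresh normal work. The one-hour boost interval is an assumption, not a decided threshold (that's Open Question 4 under Decisions):

```python
def effective_priority(tier_value, age_seconds, boost_interval=3600):
    """Lower score = dispatched sooner. A queued job gains one
    tier-equivalent per boost_interval of wait time (interval assumed)."""
    return tier_value - age_seconds // boost_interval
```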

Decision:


Phase Overview

| Phase | Description | Status |
| --- | --- | --- |
| Phase 1 | Job data model + registry | |
| Phase 2 | Priority scoring + queue | |
| Phase 3 | Concurrent dispatch | |
| Phase 4 | Monitoring dashboard | |

Phase 1: Job Data Model + Registry —

Goal: Define the job data model and build a persistent registry for tracking all work.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/jobs/models.py | Job dataclass: id, type, source, status, agent, timestamps, dependencies, result | opus | |
| src/opus/jobs/registry.py | Registry class: create, update, query, list jobs. Persistent storage backend | opus | |
| src/opus/jobs/storage.py | Storage adapter (SQLite or JSON, based on decision) | opus | |
| Job status transitions | Define valid status transitions: queued → running → completed/failed, queued → blocked → queued | opus | |
| Unit tests | CRUD operations, status transitions, persistence across restarts | mv | |
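The valid-transition rules in the table can be captured as a small lookup, so every status change goes through one guard. A sketch; the transition map matches the table row above:

```python
VALID_TRANSITIONS = {
    "queued":    {"running", "blocked"},
    "blocked":   {"queued"},
    "running":   {"completed", "failed"},
    "completed": set(),   # terminal
    "failed":    set(),   # terminal
}

def transition(current, new):
    """Return the new status, or raise if the change is not allowed."""
    if new not in VALID_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```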

Phase 2: Priority Scoring + Queue —

Goal: Implement priority-based job ordering with starvation prevention.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/jobs/priority.py | Priority tiers, scoring logic, aging algorithm | opus | |
| src/opus/jobs/queue.py | Priority queue: enqueue, dequeue (highest-priority non-blocked job), peek | opus | |
| Dependency tracking | Jobs blocked by incomplete dependencies stay in queue but are not dispatched | opus | |
| Starvation prevention | Low-priority jobs gain priority points over time | opus | |
| Human message priority | Interactive user messages always get critical priority | mv | |
| Unit tests | Priority ordering, aging, dependency blocking, human message preemption | mv | |

Phase 3: Concurrent Dispatch —

Goal: Dispatch jobs to multiple agents in parallel, managing agent availability and resource limits.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/jobs/dispatcher.py | Dispatch engine: assign jobs to idle agents, track agent availability | opus | |
| Agent pool management | Track which agents are busy/idle, respect max-concurrent limit | opus | |
| Job completion handling | On agent completion: update registry, check if blocked jobs are now unblocked | opus | |
| Failure handling | On agent failure: mark job failed, retry policy, escalation to user | mv | |
| Integration with FR-043 agents | Wire dispatcher to actual agent invocations | opus | |
| Integration tests | Multi-job dispatch, concurrent execution, dependency resolution | mv | |
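The dispatch step itself reduces to pairing the highest-priority runnable jobs with idle agents, capped at the concurrency limit. A self-contained sketch (the `(priority, job_id, depends_on)` tuple shape and agent-name strings are assumptions for illustration):

```python
def dispatch(queued_jobs, idle_agents, completed_ids, max_concurrent=3):
    """Assign runnable jobs to idle agents, best priority first.

    queued_jobs: iterable of (priority, job_id, depends_on) tuples,
    lower priority value = more urgent. Returns (agent, job_id) pairs.
    """
    assignments = []
    agents = list(idle_agents)
    for priority, job_id, deps in sorted(queued_jobs):
        if not agents or len(assignments) >= max_concurrent:
            break
        if set(deps) <= completed_ids:      # dependencies satisfied
            assignments.append((agents.pop(0), job_id))
        # else: job stays blocked, regardless of priority
    return assignments
```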

Phase 4: Monitoring Dashboard —

Goal: Provide visibility into the job system state for the user.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| /status jobs query | Show current queue state: running, queued, completed, failed | opus | |
| vault/00_system/dashboards/job-dashboard.md | Auto-generated dashboard: recent jobs, agent utilization, throughput stats | opus | |
| Job history | Query completed jobs by date range, agent, type | opus | |
| Performance metrics | Average job duration, success rate, queue wait time | opus | |

Prerequisites / Gap Analysis

Requirements

| Requirement | Description |
| --- | --- |
| REQ-1 | Python project scaffold (FR-009) — jobs module is Python code |
| REQ-2 | Custom agents (FR-043) — dispatcher dispatches to agents |

Current State

| Component | Status | Details |
| --- | --- | --- |
| Python scaffold | | FR-009 not started |
| Agent system | | FR-043 not started |
| Job tracking | | No centralized job management exists |

Gap (What’s missing?)

| Gap | Effort | Blocker? |
| --- | --- | --- |
| Python scaffold (FR-009) | Med | Yes — code needs a home |
| Agent system (FR-043) | High | Yes for Phase 3 — dispatch needs agents |
| Storage backend decision | Low | No — Phase 1 can start with either option |

Test

Manual tests

| Test | Expected | Actual | Last |
| --- | --- | --- | --- |
| Human message preempts queued jobs | User message jumps to front | pending | - |

AI-verified tests

| Scenario | Expected behavior | Verification method |
| --- | --- | --- |

E2E tests

| Scenario | Assertion |
| --- | --- |

Integration tests

| Component | Coverage |
| --- | --- |

Unit tests

| Component | Tests | Coverage |
| --- | --- | --- |

History

| Date | Event | Details |
| --- | --- | --- |
| 2026-03-04 | Created | Inspired by Nexie’s job registry and priority queue architecture |

References

  • FR-009 (Python Project Scaffold) — code infrastructure prerequisite
  • FR-043 (Custom Agents) — agents are the workers that execute dispatched jobs
  • FR-045 (Complexity Routing) — routing feeds scored jobs into the queue
  • FR-031 (Workflow State Machine) — state machine triggers job creation on transitions
  • FR-050 (Proactive Monitoring) — monitoring creates jobs when issues detected
  • Nexie (Sven Hennig) — original inspiration for centralized job management