Decisions
- Pending: Storage backend — JSON file, SQLite, or in-memory?
- Pending: Maximum concurrent agents?
- Pending: Priority formula — fixed categories or weighted scoring?
- Pending: Starvation prevention threshold — how quickly should low-priority jobs escalate?
User Tasks
Summary
Build a central job registry that tracks all work in the system and a priority queue that dispatches jobs to available agents based on urgency, dependencies, and resource availability.
Problem / Motivation
Opus currently works one task at a time in a single session. As the system grows with multiple agents (FR-043), complexity routing (FR-045), and automated triggers (FR-050), tasks will pile up and need centralized management. Without a job registry:
- No visibility into what is queued, running, or completed
- No way to prioritize urgent work over background tasks
- No dependency tracking between jobs
- No concurrent execution — agents sit idle while work waits
- User messages compete with automated tasks without priority differentiation
A job registry and priority queue solve this by becoming the central nervous system for all work in Opus.
Proposed Solution
Build a Python module with two core components:
Job Registry — A persistent store tracking every piece of work:
- Job ID, type, source (user, automated, agent-spawned)
- Status: queued, running, completed, failed, blocked
- Assigned agent (if running)
- Created, started, completed timestamps
- Dependencies (job B can’t start until job A finishes)
- Result/output summary
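The registry fields above can be sketched as a dataclass. Names and types here are illustrative, not a final schema:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class JobStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    BLOCKED = "blocked"


@dataclass
class Job:
    id: str
    type: str
    source: str                          # "user", "automated", or "agent-spawned"
    status: JobStatus = JobStatus.QUEUED
    agent: Optional[str] = None          # assigned agent while running
    created_at: Optional[float] = None
    started_at: Optional[float] = None
    completed_at: Optional[float] = None
    dependencies: List[str] = field(default_factory=list)  # job IDs that must finish first
    result: Optional[str] = None         # output summary
```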
Priority Queue — Intelligent job ordering:
- Human messages always jump to the front — interactive requests take absolute priority
- Priority tiers: critical (user-interactive) > urgent (bug fixes) > normal (features) > low (maintenance, cleanup)
- Dependency-aware: jobs with unsatisfied dependencies stay blocked regardless of priority
- Starvation prevention: low-priority jobs gradually increase in priority over time so they eventually get processed
- Concurrency management: dispatch jobs to idle agents, respect max-concurrent limits
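The ordering rules above (tiers, aging, dependency blocking) can be sketched as a selection function. The tier weights and aging rate are assumed values for illustration, not decided ones:

```python
import time

# Assumed tier weights; lower number = higher priority.
TIERS = {"critical": 0, "urgent": 1, "normal": 2, "low": 3}
AGING_RATE = 1 / 600  # assumed: one tier of boost per 10 minutes of waiting


def effective_priority(tier: str, enqueued_at: float, now: float) -> float:
    """Lower is better. Waiting jobs slowly climb toward higher tiers."""
    return TIERS[tier] - (now - enqueued_at) * AGING_RATE


def next_job(jobs, completed_ids, now=None):
    """Pick the highest-priority job whose dependencies are all satisfied."""
    now = now if now is not None else time.time()
    ready = [j for j in jobs if set(j["deps"]) <= completed_ids]
    if not ready:
        return None
    return min(ready, key=lambda j: effective_priority(j["tier"], j["enqueued_at"], now))
```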
What This Enables:
- Kick off multiple features before bed and wake up to find them all completed
- While one agent implements FR-040, another reviews FR-050, and a third researches FR-039
- The system self-manages workload without manual scheduling
- Full audit trail of all work performed
Open Questions
1. Storage Backend
Question: Where should job data be persisted?
| Option | Description |
|---|---|
| A) JSON file | Simple, human-readable, easy to debug. May have concurrency issues with multiple writers |
| B) SQLite | Built into Python, proper querying, handles concurrent reads, single-file storage |
| C) In-memory only | Fastest, but data lost on restart. Only suitable if jobs are short-lived |
Recommendation: Option B — SQLite is the sweet spot. Built-in, fast, queryable, persists across restarts, handles concurrent access.
Decision:
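If Option B is adopted, a minimal schema might look like the following. Column names mirror the registry fields from the Proposed Solution and are illustrative only:

```python
import sqlite3

# Hypothetical schema, assuming the SQLite option is chosen.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id           TEXT PRIMARY KEY,
    type         TEXT NOT NULL,
    source       TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'queued',
    agent        TEXT,
    created_at   REAL,
    started_at   REAL,
    completed_at REAL,
    dependencies TEXT,   -- JSON-encoded list of job IDs
    result       TEXT
);
CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs (status);
"""

conn = sqlite3.connect(":memory:")  # a file path in production
conn.executescript(SCHEMA)
```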
2. Maximum Concurrent Agents
Question: How many agents should run simultaneously?
| Option | Description |
|---|---|
| A) 1 (sequential) | Simplest, no coordination needed. Defeats the purpose for Phase 3 |
| B) 3-4 concurrent | Good parallelism without overwhelming system resources |
| C) Unlimited | Maximum throughput, but could exhaust API limits or system resources |
Recommendation: Option B — start with 3, make it configurable. Respect API rate limits.
Decision:
3. Priority Formula
Question: How should job priority be calculated?
| Option | Description |
|---|---|
| A) Tier-based with aging | Fixed tiers (critical/urgent/normal/low) + time-based priority boost. Simple and predictable |
| B) Weighted scoring | Multi-factor score (urgency * impact * age * dependencies). More nuanced but harder to debug |
| C) User-assigned only | User manually sets priority for each job. Maximum control, minimum automation |
Recommendation: Option A — tier-based is transparent and easy to reason about. Aging prevents starvation naturally.
Decision:
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Job data model + registry | — |
| Phase 2 | Priority scoring + queue | — |
| Phase 3 | Concurrent dispatch | — |
| Phase 4 | Monitoring dashboard | — |
Phase 1: Job Data Model + Registry —
Goal: Define the job data model and build a persistent registry for tracking all work.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/jobs/models.py | Job dataclass: id, type, source, status, agent, timestamps, dependencies, result | opus | — |
| src/opus/jobs/registry.py | Registry class: create, update, query, list jobs. Persistent storage backend | opus | — |
| src/opus/jobs/storage.py | Storage adapter (SQLite or JSON, based on decision) | opus | — |
| Job status transitions | Define valid status transitions: queued → running → completed/failed, queued → blocked → queued | opus | — |
| Unit tests | CRUD operations, status transitions, persistence across restarts | mv | — |
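The transition rule in the table above can be captured with a small lookup; the guard function itself is a sketch:

```python
# Valid edges taken from the Phase 1 table:
# queued -> running -> completed/failed, queued -> blocked -> queued
VALID_TRANSITIONS = {
    "queued":    {"running", "blocked"},
    "blocked":   {"queued"},
    "running":   {"completed", "failed"},
    "completed": set(),
    "failed":    set(),
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the edge is not allowed."""
    if target not in VALID_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```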
Phase 2: Priority Scoring + Queue —
Goal: Implement priority-based job ordering with starvation prevention.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/jobs/priority.py | Priority tiers, scoring logic, aging algorithm | opus | — |
| src/opus/jobs/queue.py | Priority queue: enqueue, dequeue (highest priority non-blocked job), peek | opus | — |
| Dependency tracking | Jobs blocked by incomplete dependencies stay in queue but are not dispatched | opus | — |
| Starvation prevention | Low-priority jobs gain priority points over time | opus | — |
| Human message priority | Interactive user messages always get critical priority | mv | — |
| Unit tests | Priority ordering, aging, dependency blocking, human message preemption | mv | — |
Phase 3: Concurrent Dispatch —
Goal: Dispatch jobs to multiple agents in parallel, managing agent availability and resource limits.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/jobs/dispatcher.py | Dispatch engine: assign jobs to idle agents, track agent availability | opus | — |
| Agent pool management | Track which agents are busy/idle, respect max-concurrent limit | opus | — |
| Job completion handling | On agent completion: update registry, check if blocked jobs are now unblocked | opus | — |
| Failure handling | On agent failure: mark job failed, retry policy, escalation to user | mv | — |
| Integration with FR-043 agents | Wire dispatcher to actual agent invocations | opus | — |
| Integration tests | Multi-job dispatch, concurrent execution, dependency resolution | mv | — |
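A minimal version of the dispatch loop, assuming the recommended max-concurrent limit of 3. The queue and agent structures here are placeholders for whatever Phases 1-2 actually produce:

```python
MAX_CONCURRENT = 3  # assumed, per the "Maximum Concurrent Agents" recommendation


def dispatch(queue, busy_agents, idle_agents):
    """Assign queued jobs to idle agents until the pool, queue, or limit runs dry."""
    assigned = []
    while idle_agents and queue and len(busy_agents) < MAX_CONCURRENT:
        job = queue.pop(0)        # highest-priority ready job, per Phase 2 ordering
        agent = idle_agents.pop()
        busy_agents[agent] = job  # mark the agent busy
        assigned.append((agent, job))
    return assigned
```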
Phase 4: Monitoring Dashboard —
Goal: Provide visibility into the job system state for the user.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| /status jobs query | Show current queue state: running, queued, completed, failed | opus | — |
| vault/00_system/dashboards/job-dashboard.md | Auto-generated dashboard: recent jobs, agent utilization, throughput stats | opus | — |
| Job history | Query completed jobs by date range, agent, type | opus | — |
| Performance metrics | Average job duration, success rate, queue wait time | opus | — |
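Assuming the SQLite backend, the performance-metrics row above could be computed with plain SQL; the table layout here repeats the hypothetical Phase 1 schema in reduced form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id TEXT PRIMARY KEY, status TEXT,
    created_at REAL, started_at REAL, completed_at REAL)""")
# Sample rows for illustration only.
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [("j1", "completed", 0.0, 1.0, 5.0),
     ("j2", "completed", 0.0, 2.0, 4.0),
     ("j3", "failed",    0.0, 3.0, 6.0)],
)

# Average job duration over completed jobs.
avg_duration, = conn.execute(
    "SELECT AVG(completed_at - started_at) FROM jobs WHERE status = 'completed'"
).fetchone()
# Success rate: SQLite comparisons yield 0/1, so AVG gives the fraction.
success_rate, = conn.execute(
    "SELECT AVG(status = 'completed') FROM jobs"
).fetchone()
```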
Prerequisites / Gap Analysis
Requirements
| Requirement | Description |
|---|---|
| REQ-1 | Python project scaffold (FR-009) — jobs module is Python code |
| REQ-2 | Custom agents (FR-043) — dispatcher dispatches to agents |
Current State
| Component | Status | Details |
|---|---|---|
| Python scaffold | — | FR-009 not started |
| Agent system | — | FR-043 not started |
| Job tracking | — | No centralized job management exists |
Gap (What’s missing?)
| Gap | Effort | Blocker? |
|---|---|---|
| Python scaffold (FR-009) | Med | Yes — code needs a home |
| Agent system (FR-043) | High | Yes for Phase 3 — dispatch needs agents |
| Storage backend decision | Low | No — Phase 1 can start with either option |
Test
Manual tests
| Test | Expected | Actual | Last run |
|---|---|---|---|
| Human message preempts queued jobs | User message jumps to front | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-04 | Created | Inspired by Nexie’s job registry and priority queue architecture |
References
- FR-009 (Python Project Scaffold) — code infrastructure prerequisite
- FR-043 (Custom Agents) — agents are the workers that execute dispatched jobs
- FR-045 (Complexity Routing) — routing feeds scored jobs into the queue
- FR-031 (Workflow State Machine) — state machine triggers job creation on transitions
- FR-050 (Proactive Monitoring) — monitoring creates jobs when issues detected
- Nexie (Sven Hennig) — original inspiration for centralized job management