Decisions

  • Pending: Docker required from Phase 1 or only Phase 2?
  • Pending: How to handle shared dependencies (node_modules, venvs) across worktrees?
  • Pending: Timeout enforcement — hard kill or graceful shutdown?

User Tasks


Summary

Build a sandboxed execution environment where coding agents can safely run code, git operations, and tests without affecting the main working tree or other agents.

Problem / Motivation

When multiple agents work concurrently on different FRs, they need isolated environments. Without sandboxing:

  • Agents write to the same files, causing conflicts and corruption
  • A runaway test suite or infinite loop can hang the entire system
  • Malicious or buggy generated code can damage the repository or host system
  • There is no way to enforce resource limits (CPU, memory, time)
  • Agent A’s half-finished changes pollute Agent B’s working directory

Git worktrees provide lightweight isolation for file changes. Docker provides process-level isolation for untrusted execution.

Proposed Solution

A src/opus/sandbox/ module with layered isolation:

  1. Worktree Manager: Creates/destroys git worktrees for each agent task. Each agent gets its own directory with its own branch.
  2. Docker Runner (Phase 2): Wraps code execution in a Docker container with resource limits, network restrictions, and a mounted worktree.
  3. Lifecycle Manager: Handles creation, timeout enforcement, cleanup, and resource reclamation.

Open Questions

1. Docker Requirement Timing

Question: Should Docker be required from Phase 1 or introduced in Phase 2?

Option | Description
A) Worktrees only in Phase 1 | Git worktrees provide file isolation. Simpler to implement, no Docker dependency. Sufficient for trusted self-generated code
B) Docker from Phase 1 | Full isolation immediately. Higher setup cost, but safer from day one
C) No Docker ever | Rely on worktrees + OS-level process limits only. Simpler but less secure

Recommendation: Option A — worktrees are sufficient for Phase 1 where agents run trusted code in a controlled environment. Docker adds security for untrusted or external code later.

Decision:

2. Shared Dependencies

Question: How to handle large dependency directories (node_modules, .venv) across worktrees?

Option | Description
A) Copy per worktree | Simple but wastes disk space and time
B) Symlink shared deps | Worktrees share a common dependency cache via symlinks. Fast, space-efficient
C) Mount read-only in Docker | Dependencies mounted read-only from a shared volume

Recommendation: Option B for Phase 1 (worktree mode). Option C when Docker is introduced.
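
As an illustration of Option B, the sketch below symlinks dependency directories from a shared cache into a worktree. The function name link_shared_deps, the cache_dir layout, and the default directory names are assumptions made for this example, not part of the design. Note that a symlinked directory stays writable, so an agent that installs a package changes it for every worktree sharing the cache.

```python
from __future__ import annotations

from pathlib import Path


def link_shared_deps(
    cache_dir: Path,
    worktree: Path,
    names: tuple[str, ...] = ("node_modules", ".venv"),
) -> list[Path]:
    """Symlink each dependency directory found in cache_dir into worktree.

    Hypothetical helper: directories missing from the cache, or already
    present in the worktree, are skipped rather than overwritten.
    """
    linked: list[Path] = []
    for name in names:
        source = cache_dir / name
        target = worktree / name
        if not source.is_dir():
            continue  # nothing cached under this name
        if target.exists() or target.is_symlink():
            continue  # never clobber what the worktree already has
        target.symlink_to(source.resolve(), target_is_directory=True)
        linked.append(target)
    return linked
```

Calling link_shared_deps(Path("deps-cache"), Path(".opus-worktrees/job-1")) would return the list of links it created; a second call returns an empty list because the links already exist.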

Decision:

3. Timeout Enforcement

Question: How to handle processes that exceed the time limit?

Option | Description
A) Graceful shutdown (SIGTERM → wait → SIGKILL) | Gives process time to clean up. Risk of hanging during grace period
B) Hard kill after timeout | Immediate SIGKILL after timeout. No cleanup but guaranteed termination
C) Configurable per task | Let the job specify graceful or hard timeout

Recommendation: Option B for safety — a stuck agent should not block the system. Cleanup is handled by the lifecycle manager after kill.
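
A minimal sketch of Option B for the non-Docker case, assuming a POSIX host. The function name run_with_hard_timeout and its return shape are hypothetical. The child is started in its own session so that SIGKILL can be sent to the whole process group, which also covers subprocesses spawned by a test runner.

```python
from __future__ import annotations

import os
import signal
import subprocess


def run_with_hard_timeout(
    cmd: list[str], timeout_s: float, cwd: str | None = None
) -> tuple[int, bool]:
    """Run cmd; SIGKILL its process group if it exceeds timeout_s.

    Returns (return_code, timed_out). POSIX only: relies on sessions
    and process groups.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group id equals its pid.
    proc = subprocess.Popen(cmd, cwd=cwd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between the timeout and the kill
        proc.wait()  # reap the child so no zombie is left behind
        return proc.returncode, True
```

A killed process reports a negative return code equal to the signal number, which the lifecycle manager could use to distinguish a timeout kill from an ordinary failure.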

Decision:


Phase Overview

Phase | Description | Status
Phase 1 | Git worktree isolation |
Phase 2 | Docker-based sandboxing |
Phase 3 | Resource limits, timeout enforcement, cleanup |

Phase 1: Git Worktree Isolation

Goal: Each agent task gets its own git worktree with a dedicated branch. Full file isolation without Docker.

File / Feature | Details | Owner | Status
src/opus/sandbox/worktree.py | WorktreeManager: create, list, destroy worktrees. Maps agent/job ID to worktree path | opus |
src/opus/sandbox/models.py | Sandbox dataclass: id, worktree path, branch name, agent id, created timestamp, status | opus |
Branch naming | Convention: agent/<agent-id>/<fr-id> (e.g., agent/coder/FR-054) | opus |
Worktree directory | .opus-worktrees/<job-id>/ in a configurable location outside the main repo | opus |
Cleanup on completion | Destroy worktree and delete branch after job completion or failure | opus |
Unit tests | Create worktree, verify isolation, destroy, verify cleanup | mv |
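
The Phase 1 rows above could be sketched roughly as follows. This is not the planned WorktreeManager: it uses plain functions for brevity, and the names create_sandbox, destroy_sandbox, and the root parameter, as well as the field types and status values, are assumptions. Only the git worktree add, git worktree remove, and git branch -D commands are standard Git.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


def branch_name(agent_id: str, fr_id: str) -> str:
    """Branch convention from the table: agent/<agent-id>/<fr-id>."""
    return f"agent/{agent_id}/{fr_id}"


@dataclass
class Sandbox:
    # Fields follow the models.py row; types and status values are assumed.
    id: str
    worktree_path: Path
    branch: str
    agent_id: str
    status: str = "active"
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def _git(repo: Path, *args: str) -> None:
    """Run a git command inside repo, raising CalledProcessError on failure."""
    subprocess.run(["git", "-C", str(repo), *args], check=True, capture_output=True)


def create_sandbox(
    repo: Path, root: Path, job_id: str, agent_id: str, fr_id: str
) -> Sandbox:
    """Create .opus-worktrees/<job-id>/ under root on a dedicated branch."""
    path = (root / ".opus-worktrees" / job_id).resolve()
    branch = branch_name(agent_id, fr_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creates the branch from the current HEAD and checks it out at path.
    _git(repo, "worktree", "add", "-b", branch, str(path))
    return Sandbox(id=job_id, worktree_path=path, branch=branch, agent_id=agent_id)


def destroy_sandbox(repo: Path, sandbox: Sandbox) -> None:
    """Remove the worktree and delete its branch (cleanup on completion)."""
    _git(repo, "worktree", "remove", "--force", str(sandbox.worktree_path))
    _git(repo, "branch", "-D", sandbox.branch)
    sandbox.status = "destroyed"
```

The --force flag discards uncommitted changes in the worktree, which matches cleanup after failure; a real implementation might want to archive the branch before deleting it.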

Phase 2: Docker-Based Sandboxing

Goal: Wrap code execution in Docker containers for process-level isolation and security.

File / Feature | Details | Owner | Status
src/opus/sandbox/docker.py | DockerRunner: build/pull image, run command in container, capture output | opus |
src/opus/sandbox/Dockerfile | Base image with Python, Node, common tools. Minimal attack surface | opus |
Worktree mount | Mount agent's worktree as a volume in the container | opus |
Network isolation | No network access by default. Opt-in for dependency installation | opus |
Output capture | Stdout/stderr captured and returned to the agent | opus |
Integration tests | Run code in container, verify isolation, verify output capture | mv |
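
One way the Docker rows above might translate into a docker run invocation is sketched here. The function names, the /workspace mount point, the default limits, and the image name used in the usage note are all hypothetical; the flags themselves (--rm, --network none, --memory, --cpus, --volume, --workdir) are standard docker run options.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def docker_run_argv(
    image: str,
    worktree: Path,
    command: list[str],
    *,
    network: bool = False,
    memory: str = "512m",   # assumed default
    cpus: str = "1.0",      # assumed default
    workdir: str = "/workspace",  # assumed mount point
) -> list[str]:
    """Build the argv for running command in a container over the worktree."""
    argv = [
        "docker", "run", "--rm",
        "--memory", memory,
        "--cpus", cpus,
        "--volume", f"{worktree.resolve()}:{workdir}",
        "--workdir", workdir,
    ]
    if not network:
        # No network access by default; callers opt in explicitly.
        argv += ["--network", "none"]
    argv.append(image)
    argv.extend(command)
    return argv


def run_in_container(
    image: str, worktree: Path, command: list[str]
) -> subprocess.CompletedProcess:
    """Run command in the container and capture stdout/stderr as text."""
    argv = docker_run_argv(image, worktree, command)
    return subprocess.run(argv, capture_output=True, text=True)
```

For example, run_in_container("opus-sandbox:latest", Path(".opus-worktrees/job-1"), ["pytest", "-q"]) would return a CompletedProcess whose stdout and stderr can be handed back to the agent. Keeping argv construction separate from execution lets the unit tests check the flags without a Docker daemon.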

Phase 3: Resource Limits + Lifecycle

Goal: Enforce CPU, memory, and time limits. Automatic cleanup of stale sandboxes.

File / Feature | Details | Owner | Status
src/opus/sandbox/limits.py | Resource limit configuration: max CPU, max memory, max execution time | opus |
src/opus/sandbox/lifecycle.py | Lifecycle manager: track active sandboxes, enforce timeouts, periodic cleanup of orphans | opus |
Timeout enforcement | Kill process/container after configured timeout (default: 10 min) | opus |
Memory limits | Docker --memory flag, or OS-level ulimit for non-Docker mode | opus |
Stale sandbox cleanup | Background task: destroy worktrees/containers older than threshold (default: 1 hour) | opus |
Health check | Verify sandbox is responsive, kill if unresponsive | mv |
Unit tests | Timeout triggers kill, stale cleanup works, resource limits enforced | mv |
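
A small sketch of how the limits configuration and stale-sandbox selection could look. The 10-minute timeout and 1-hour staleness threshold come from the table above; the class name ResourceLimits, the field names, the CPU and memory defaults, and the find_stale helper are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ResourceLimits:
    max_cpus: float = 1.0         # assumed default, not specified in the table
    max_memory_mb: int = 512      # assumed default, not specified in the table
    max_execution_s: int = 600    # 10 min, per the timeout enforcement row
    stale_after_s: int = 3600     # 1 hour, per the stale sandbox cleanup row


def find_stale(
    created_by_id: dict[str, datetime], now: datetime, limits: ResourceLimits
) -> list[str]:
    """Return ids of sandboxes created before the staleness cutoff, sorted."""
    cutoff = now - timedelta(seconds=limits.stale_after_s)
    return sorted(
        sandbox_id
        for sandbox_id, created in created_by_id.items()
        if created < cutoff
    )
```

Passing now as an argument, instead of reading the clock inside the function, keeps the selection logic deterministic for the unit tests; the background task would call it periodically and destroy whatever it returns.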

Prerequisites / Gap Analysis

Requirements

Requirement | Description
REQ-0 | Coding agency design doc reviewed and approved
REQ-1 | Python project scaffold (FR-009); this is a Python module
REQ-2 | Git installed and accessible from Python (subprocess)
REQ-3 | Docker installed on host (Phase 2 only)

Current State

Component | Status | Details
Python scaffold | | FR-009 not started
Git worktrees | available | Git supports worktrees natively
Docker | available | Installed on dev machine, needed on VPS for production

Gap (What’s missing?)

Gap | Effort | Blocker?
Python scaffold (FR-009) | Med | Yes: code needs a home
Docker on VPS (FR-019) | Med | Only for Phase 2 production use

Test

Manual tests

Test | Expected | Actual | Last
pending | -

AI-verified tests

Scenario | Expected behavior | Verification method

E2E tests

Scenario | Assertion

Integration tests

Component | Coverage

Unit tests

Component | Tests | Coverage

History

Date | Event | Details
2026-03-12 | Created | Part of autonomous coding agency architecture

References

  • FR-009 (Python Project Scaffold) — code infrastructure prerequisite
  • FR-043 (Custom Agents) — agents are the primary users of sandboxes
  • FR-056 (Autonomous Coding Orchestrator) — orchestrator creates sandboxes for dispatched jobs
  • FR-058 (Agent Git Workflow) — git operations happen inside sandboxed worktrees
  • FR-019 (VPS Deployment) — production sandboxing requires Docker on the server
  • vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview