Decisions
- Pending: Is Docker required from Phase 1, or only from Phase 2?
- Pending: How to handle shared dependencies (node_modules, venvs) across worktrees?
- Pending: Timeout enforcement — hard kill or graceful shutdown?
User Tasks
Summary
Build a sandboxed execution environment where coding agents can safely run code, git operations, and tests without affecting the main working tree or other agents.
Problem / Motivation
When multiple agents work concurrently on different FRs, they need isolated environments. Without sandboxing:
- Agents write to the same files, causing conflicts and corruption
- A runaway test suite or infinite loop can hang the entire system
- Malicious or buggy generated code can damage the repository or host system
- No way to enforce resource limits (CPU, memory, time)
- Agent A’s half-finished changes pollute Agent B’s working directory
Git worktrees provide lightweight isolation for file changes. Docker provides process-level isolation for untrusted execution.
Proposed Solution
A src/opus/sandbox/ module with layered isolation:
- Worktree Manager: Creates/destroys git worktrees for each agent task. Each agent gets its own directory with its own branch.
- Docker Runner (Phase 2): Wraps code execution in a Docker container with resource limits, network restrictions, and a mounted worktree.
- Lifecycle Manager: Handles creation, timeout enforcement, cleanup, and resource reclamation.
Open Questions
1. Docker Requirement Timing
Question: Should Docker be required from Phase 1 or introduced in Phase 2?
| Option | Description |
|---|---|
| A) Worktrees only in Phase 1 | Git worktrees provide file isolation. Simpler to implement, no Docker dependency. Sufficient for trusted self-generated code |
| B) Docker from Phase 1 | Full isolation immediately. Higher setup cost, but safer from day one |
| C) No Docker ever | Rely on worktrees + OS-level process limits only. Simpler but less secure |
Recommendation: Option A — worktrees are sufficient for Phase 1 where agents run trusted code in a controlled environment. Docker adds security for untrusted or external code later.
Decision:
2. Shared Dependencies
Question: How to handle large dependency directories (node_modules, .venv) across worktrees?
| Option | Description |
|---|---|
| A) Copy per worktree | Simple but wastes disk space and time |
| B) Symlink shared deps | Worktrees share a common dependency cache via symlinks. Fast, space-efficient |
| C) Mount read-only in Docker | Dependencies mounted read-only from a shared volume |
Recommendation: Option B for Phase 1 (worktree mode). Option C when Docker is introduced.
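
Option B could be sketched roughly as follows. This is a minimal illustration, not a settled design: the function name `link_shared_deps`, the `SHARED_DEP_DIRS` tuple, and the idea of a single cache directory holding one copy of each dependency folder are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical list: the document names node_modules and .venv as examples only.
SHARED_DEP_DIRS = ("node_modules", ".venv")


def link_shared_deps(cache_root: Path, worktree: Path, names=SHARED_DEP_DIRS) -> list:
    """Symlink each cached dependency directory into the worktree.

    Returns the links that were created. Names missing from the cache, or
    already present in the worktree, are skipped rather than overwritten.
    """
    created = []
    for name in names:
        source = cache_root / name
        link = worktree / name
        if not source.is_dir() or link.exists() or link.is_symlink():
            continue
        # Point the worktree entry at the single shared copy.
        link.symlink_to(source.resolve(), target_is_directory=True)
        created.append(link)
    return created
```

One caveat to weigh before deciding: a symlinked directory is writable from every worktree, so an install run by one agent changes the dependencies seen by all of them.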
Decision:
3. Timeout Enforcement
Question: How to handle processes that exceed the time limit?
| Option | Description |
|---|---|
| A) Graceful shutdown (SIGTERM → wait → SIGKILL) | Gives process time to clean up. Risk of hanging during grace period |
| B) Hard kill after timeout | Immediate SIGKILL after timeout. No cleanup but guaranteed termination |
| C) Configurable per task | Let the job specify graceful or hard timeout |
Recommendation: Option B for safety — a stuck agent should not block the system. Cleanup is handled by the lifecycle manager after kill.
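
A minimal sketch of Option B for non-Docker mode, assuming a POSIX host. The function name `run_with_hard_timeout` and its return shape are hypothetical. The child is started in its own session so that the kill reaches any subprocesses it spawned, not just the top-level process.

```python
import os
import signal
import subprocess


def run_with_hard_timeout(cmd: list, cwd: str, timeout_s: float) -> tuple:
    """Run cmd and SIGKILL its whole process group on timeout.

    Returns (returncode, combined_output, timed_out). POSIX only.
    """
    proc = subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # child leads its own process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        try:
            # No grace period: kill the group immediately (Option B).
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between the timeout and the kill
        out, _ = proc.communicate()  # reap the child, keep partial output
        return proc.returncode, out, True
```

Because the process gets no chance to clean up after itself, any temporary files or locks it leaves behind would have to be handled by the lifecycle manager, as the recommendation notes.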
Decision:
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Git worktree isolation | — |
| Phase 2 | Docker-based sandboxing | — |
| Phase 3 | Resource limits, timeout enforcement, cleanup | — |
Phase 1: Git Worktree Isolation —
Goal: Each agent task gets its own git worktree with a dedicated branch. Full file isolation without Docker.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/sandbox/worktree.py | WorktreeManager: create, list, destroy worktrees. Maps agent/job ID to worktree path | opus | — |
| src/opus/sandbox/models.py | Sandbox dataclass: id, worktree path, branch name, agent id, created timestamp, status | opus | — |
| Branch naming | Convention: agent/<agent-id>/<fr-id> (e.g., agent/coder/FR-054) | opus | — |
| Worktree directory | .opus-worktrees/<job-id>/ in a configurable location outside the main repo | opus | — |
| Cleanup on completion | Destroy worktree and delete branch after job completion or failure | opus | — |
| Unit tests | Create worktree, verify isolation, destroy, verify cleanup | mv | — |
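
The Phase 1 pieces above might fit together roughly as in the sketch below. It assumes git is on the PATH and shells out to `git worktree add`, `git worktree remove`, and `git branch -D`. Field names, the status values, and the in-memory `sandboxes` registry are illustrative assumptions, not a fixed design.

```python
import subprocess
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


def branch_name(agent_id: str, fr_id: str) -> str:
    """Branch convention from the table: agent/<agent-id>/<fr-id>."""
    return f"agent/{agent_id}/{fr_id}"


@dataclass
class Sandbox:
    id: str                  # job ID, also used as the worktree directory name
    worktree_path: Path
    branch: str
    agent_id: str
    status: str = "active"   # hypothetical status values
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class WorktreeManager:
    def __init__(self, repo: Path, root: Path):
        self.repo = repo     # main repository
        self.root = root     # worktree parent directory, outside the main repo
        self.sandboxes = {}  # job ID -> Sandbox

    def _git(self, *args: str) -> None:
        subprocess.run(["git", "-C", str(self.repo), *args],
                       check=True, capture_output=True)

    def create(self, job_id: str, agent_id: str, fr_id: str) -> Sandbox:
        path = (self.root / job_id).resolve()
        branch = branch_name(agent_id, fr_id)
        # Creates the branch and checks it out in a new worktree at `path`.
        self._git("worktree", "add", "-b", branch, str(path))
        sandbox = Sandbox(job_id, path, branch, agent_id)
        self.sandboxes[job_id] = sandbox
        return sandbox

    def destroy(self, job_id: str) -> None:
        sandbox = self.sandboxes.pop(job_id)
        # --force discards uncommitted changes left in the worktree.
        self._git("worktree", "remove", "--force", str(sandbox.worktree_path))
        self._git("branch", "-D", sandbox.branch)
        sandbox.status = "destroyed"
```

Note that `destroy` deletes the branch unconditionally, so in this sketch any commits that should survive the job would need to be merged or pushed before cleanup runs.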
Phase 2: Docker-Based Sandboxing —
Goal: Wrap code execution in Docker containers for process-level isolation and security.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/sandbox/docker.py | DockerRunner: build/pull image, run command in container, capture output | opus | — |
| src/opus/sandbox/Dockerfile | Base image with Python, Node, common tools. Minimal attack surface | opus | — |
| Worktree mount | Mount agent’s worktree as a volume in the container | opus | — |
| Network isolation | No network access by default. Opt-in for dependency installation | opus | — |
| Output capture | Stdout/stderr captured and returned to the agent | opus | — |
| Integration tests | Run code in container, verify isolation, verify output capture | mv | — |
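
As a rough illustration of the worktree mount, network isolation, and output capture rows, the runner could assemble a `docker run` invocation along these lines. The helper names, the `/workspace` mount point, and the image tag used in the usage note are assumptions for the example. The Docker CLI flags themselves (`--rm`, `--volume`, `--workdir`, `--network none`) are standard.

```python
import subprocess
from pathlib import Path


def docker_argv(image: str, worktree: Path, command: list,
                allow_network: bool = False) -> list:
    """Build the argument list for running `command` inside a container."""
    argv = [
        "docker", "run", "--rm",                          # remove container on exit
        "--volume", f"{worktree.resolve()}:/workspace",   # mount the agent's worktree
        "--workdir", "/workspace",
    ]
    if not allow_network:
        argv += ["--network", "none"]                     # no network by default
    return argv + [image, *command]


def run_in_container(image: str, worktree: Path, command: list,
                     allow_network: bool = False) -> subprocess.CompletedProcess:
    """Run the command and capture stdout/stderr for the calling agent."""
    return subprocess.run(
        docker_argv(image, worktree, command, allow_network),
        capture_output=True,
        text=True,
    )
```

A call such as `run_in_container("opus-sandbox:latest", sandbox_path, ["pytest", "-q"])` would return a result whose `stdout`, `stderr`, and `returncode` can be passed back to the agent. The image tag here is a placeholder.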
Phase 3: Resource Limits + Lifecycle —
Goal: Enforce CPU, memory, and time limits. Automatic cleanup of stale sandboxes.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/sandbox/limits.py | Resource limit configuration: max CPU, max memory, max execution time | opus | — |
| src/opus/sandbox/lifecycle.py | Lifecycle manager: track active sandboxes, enforce timeouts, periodic cleanup of orphans | opus | — |
| Timeout enforcement | Kill process/container after configured timeout (default: 10 min) | opus | — |
| Memory limits | Docker --memory flag, or OS-level ulimit for non-Docker mode | opus | — |
| Stale sandbox cleanup | Background task: destroy worktrees/containers older than threshold (default: 1 hour) | opus | — |
| Health check | Verify sandbox is responsive, kill if unresponsive | mv | — |
| Unit tests | Timeout triggers kill, stale cleanup works, resource limits enforced | mv | — |
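
The limit configuration and the stale-sandbox check could look something like the sketch below. The class and function names are hypothetical, and the CPU and memory defaults are placeholders. Only the 10 minute timeout and 1 hour staleness threshold come from the table above. `--cpus` and `--memory` are standard `docker run` flags.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ResourceLimits:
    max_cpus: float = 1.0       # placeholder default
    max_memory_mb: int = 512    # placeholder default
    max_seconds: int = 600      # 10 min, the default timeout in the table

    def docker_flags(self) -> list:
        """Translate the CPU and memory limits into `docker run` flags."""
        return ["--cpus", str(self.max_cpus), "--memory", f"{self.max_memory_mb}m"]


def find_stale(created_at: dict, now: datetime,
               max_age: timedelta = timedelta(hours=1)) -> list:
    """Return IDs of sandboxes older than max_age, sorted for stable output.

    `created_at` maps sandbox ID -> creation time (timezone-aware).
    """
    return sorted(sid for sid, t in created_at.items() if now - t > max_age)
```

A background task would call `find_stale` periodically and destroy whatever it returns. In non-Docker mode the CPU and memory limits would need an OS-level mechanism such as the ulimit mentioned in the table, which is not sketched here.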
Prerequisites / Gap Analysis
Requirements
| Requirement | Description |
|---|---|
| REQ-0 | Coding agency design doc reviewed and approved |
| REQ-1 | Python project scaffold (FR-009) — this is a Python module |
| REQ-2 | Git installed and accessible from Python (subprocess) |
| REQ-3 | Docker installed on host (Phase 2 only) |
Current State
| Component | Status | Details |
|---|---|---|
| Python scaffold | — | FR-009 not started |
| Git worktrees | available | Git supports worktrees natively |
| Docker | available | Installed on dev machine, needed on VPS for production |
Gap (What’s missing?)
| Gap | Effort | Blocker? |
|---|---|---|
| Python scaffold (FR-009) | Med | Yes — code needs a home |
| Docker on VPS (FR-019) | Med | Only for Phase 2 production use |
Test
Manual tests
| Test | Expected | Actual | Last |
|---|---|---|---|
| … | … | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Part of autonomous coding agency architecture |
References
- FR-009 (Python Project Scaffold) — code infrastructure prerequisite
- FR-043 (Custom Agents) — agents are the primary users of sandboxes
- FR-056 (Autonomous Coding Orchestrator) — orchestrator creates sandboxes for dispatched jobs
- FR-058 (Agent Git Workflow) — git operations happen inside sandboxed worktrees
- FR-019 (VPS Deployment) — production sandboxing requires Docker on the server
- vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview