Sandboxed Code Execution

Decisions

Pending: Docker required from Phase 1 or only Phase 2?
Pending: How to handle shared dependencies (node_modules, venvs) across worktrees?
Pending: Timeout enforcement — hard kill or graceful shutdown?

User Tasks

Summary

Build a sandboxed execution environment where coding agents can safely run code, git operations, and tests without affecting the main working tree or other agents.

Problem / Motivation

When multiple agents work concurrently on different feature requests (FRs), they need isolated environments. Without sandboxing:

Agents write to the same files, causing conflicts and corruption
A runaway test suite or infinite loop can hang the entire system
Malicious or buggy generated code can damage the repository or host system
There is no way to enforce resource limits (CPU, memory, time)
Agent A’s half-finished changes pollute Agent B’s working directory

Git worktrees provide lightweight isolation for file changes. Docker provides process-level isolation for untrusted execution.

Proposed Solution

A src/opus/sandbox/ module with layered isolation:

Worktree Manager: Creates/destroys git worktrees for each agent task. Each agent gets its own directory with its own branch.
Docker Runner (Phase 2): Wraps code execution in a Docker container with resource limits, network restrictions, and a mounted worktree.
Lifecycle Manager: Handles creation, timeout enforcement, cleanup, and resource reclamation.

Open Questions

1. Docker Requirement Timing

Question: Should Docker be required from Phase 1 or introduced in Phase 2?

Option	Description
A) Worktrees only in Phase 1	Git worktrees provide file isolation. Simpler to implement, no Docker dependency. Sufficient for trusted, self-generated code
B) Docker from Phase 1	Full isolation immediately. Higher setup cost, but safer from day one
C) No Docker ever	Rely on worktrees + OS-level process limits only. Simpler but less secure

Recommendation: Option A — worktrees are sufficient for Phase 1 where agents run trusted code in a controlled environment. Docker adds security for untrusted or external code later.

Decision:

2. Shared Dependencies

Question: How to handle large dependency directories (node_modules, .venv) across worktrees?

Option	Description
A) Copy per worktree	Simple but wastes disk space and time
B) Symlink shared deps	Worktrees share a common dependency cache via symlinks. Fast, space-efficient
C) Mount read-only in Docker	Dependencies mounted read-only from a shared volume

Recommendation: Option B for Phase 1 (worktree mode). Option C when Docker is introduced.
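
A minimal sketch of Option B, assuming a shared cache directory that already holds installed `node_modules` and `.venv` directories. The function name `link_shared_deps` and the cache layout are hypothetical, not part of the current design.

```python
import os
from pathlib import Path
from typing import List, Sequence


def link_shared_deps(
    worktree: Path,
    shared_cache: Path,
    names: Sequence[str] = ("node_modules", ".venv"),
) -> List[Path]:
    """Symlink each dependency directory found in shared_cache into the worktree.

    Hypothetical helper: the cache layout (one directory per dependency name)
    is an assumption made for this sketch.
    """
    linked = []
    for name in names:
        source = shared_cache / name
        target = worktree / name
        # Skip dependencies that are not cached; never overwrite an existing path.
        if not source.is_dir() or target.exists() or target.is_symlink():
            continue
        os.symlink(source.resolve(), target, target_is_directory=True)
        linked.append(target)
    return linked
```

Because every worktree points at the same directory, an install run inside one worktree changes the dependencies seen by all of them. Keeping the cache read-only for agents is one way to preserve isolation.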

Decision:

3. Timeout Enforcement

Question: How to handle processes that exceed the time limit?

Option	Description
A) Graceful shutdown (SIGTERM → wait → SIGKILL)	Gives process time to clean up. Termination is delayed by up to the grace period
B) Hard kill after timeout	Immediate SIGKILL after timeout. No cleanup but guaranteed termination
C) Configurable per task	Let the job specify graceful or hard timeout

Recommendation: Option B for safety — a stuck agent should not block the system. Cleanup is handled by the lifecycle manager after kill.
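
Option B could look roughly like the following on a POSIX host. This is a sketch under stated assumptions: the function name `run_with_hard_timeout` and its return shape are hypothetical, and the process is started in its own session so that child processes are killed together with it.

```python
import os
import signal
import subprocess
from typing import List, Optional, Tuple


def run_with_hard_timeout(
    cmd: List[str], timeout_s: float, cwd: Optional[str] = None
) -> Tuple[int, str, str, bool]:
    """Run cmd and SIGKILL its whole process group if it exceeds timeout_s.

    Returns (returncode, stdout, stderr, timed_out). POSIX only.
    """
    proc = subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # new process group, so children are killed too
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
        return proc.returncode, out, err, False
    except subprocess.TimeoutExpired:
        # No grace period: kill the group immediately (Option B).
        os.killpg(proc.pid, signal.SIGKILL)
        # Reap the process and collect whatever output it produced.
        out, err = proc.communicate()
        return proc.returncode, out, err, True
```

Cleanup of the worktree is deliberately left out here, since the recommendation assigns it to the lifecycle manager after the kill.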

Decision:

Phase Overview

Phase	Description	Status
Phase 1	Git worktree isolation	—
Phase 2	Docker-based sandboxing	—
Phase 3	Resource limits, timeout enforcement, cleanup	—

Phase 1: Git Worktree Isolation —

Goal: Each agent task gets its own git worktree with a dedicated branch. Full file isolation without Docker.

File / Feature	Details	Owner	Status
`src/opus/sandbox/worktree.py`	WorktreeManager: create, list, destroy worktrees. Maps agent/job ID to worktree path	opus	—
`src/opus/sandbox/models.py`	`Sandbox` dataclass: id, worktree path, branch name, agent id, created timestamp, status	opus	—
Branch naming	Convention: `agent/<agent-id>/<fr-id>` (e.g., `agent/coder/FR-054`)	opus	—
Worktree directory	`.opus-worktrees/<job-id>/` in a configurable location outside the main repo	opus	—
Cleanup on completion	Destroy worktree and delete branch after job completion or failure	opus	—
Unit tests	Create worktree, verify isolation, destroy, verify cleanup	mv	—
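
The create/destroy flow in the table above can be sketched as follows. This is an illustration, not the planned implementation: field names follow the `Sandbox` row, the constructor arguments and the in-memory `sandboxes` registry are assumptions, and listing is omitted for brevity.

```python
import subprocess
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


@dataclass
class Sandbox:
    id: str  # job ID
    worktree_path: Path
    branch_name: str
    agent_id: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    status: str = "active"


class WorktreeManager:
    """Maps job IDs to git worktrees (sketch; registry is in-memory only)."""

    def __init__(self, repo: Path, root: Path):
        self.repo = repo  # main repository
        self.root = root  # e.g. a `.opus-worktrees/` directory outside the repo
        self.sandboxes: Dict[str, Sandbox] = {}

    def _git(self, *args: str) -> None:
        subprocess.run(
            ["git", "-C", str(self.repo), *args], check=True, capture_output=True
        )

    def create(self, job_id: str, agent_id: str, fr_id: str) -> Sandbox:
        branch = f"agent/{agent_id}/{fr_id}"
        path = self.root / job_id
        self.root.mkdir(parents=True, exist_ok=True)
        # Creates the directory and a new branch starting at the current HEAD.
        self._git("worktree", "add", "-b", branch, str(path))
        sandbox = Sandbox(job_id, path, branch, agent_id)
        self.sandboxes[job_id] = sandbox
        return sandbox

    def destroy(self, job_id: str) -> None:
        sandbox = self.sandboxes.pop(job_id)
        # --force also removes worktrees with uncommitted or untracked files.
        self._git("worktree", "remove", "--force", str(sandbox.worktree_path))
        self._git("branch", "-D", sandbox.branch_name)
        sandbox.status = "destroyed"
```

Note that `git branch -D` discards any commits on the agent branch that have not been merged or pushed elsewhere, so `destroy` should only run once the job's results have been collected.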

Phase 2: Docker-Based Sandboxing —

Goal: Wrap code execution in Docker containers for process-level isolation and security.

File / Feature	Details	Owner	Status
`src/opus/sandbox/docker.py`	DockerRunner: build/pull image, run command in container, capture output	opus	—
`src/opus/sandbox/Dockerfile`	Base image with Python, Node, common tools. Minimal attack surface	opus	—
Worktree mount	Mount agent’s worktree as a volume in the container	opus	—
Network isolation	No network access by default. Opt-in for dependency installation	opus	—
Output capture	Stdout/stderr captured and returned to the agent	opus	—
Integration tests	Run code in container, verify isolation, verify output capture	mv	—
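
One possible shape for the DockerRunner is to build a `docker run` argument list and execute it with `subprocess`. The sketch below assumes the worktree is mounted at `/workspace`; that mount point, the function names, and the image name in the usage note are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_docker_command(
    image: str,
    worktree: Path,
    command: List[str],
    allow_network: bool = False,
    memory: Optional[str] = None,
    cpus: Optional[float] = None,
) -> List[str]:
    """Build the `docker run` argument list for one sandboxed command."""
    args = [
        "docker", "run", "--rm",
        "--volume", f"{worktree}:/workspace",  # mount the agent's worktree
        "--workdir", "/workspace",
    ]
    if not allow_network:
        args += ["--network", "none"]  # no network access by default
    if memory:
        args += ["--memory", memory]
    if cpus:
        args += ["--cpus", str(cpus)]
    return args + [image, *command]


def run_in_container(args: List[str]) -> subprocess.CompletedProcess:
    """Run the container and capture stdout/stderr for the agent."""
    return subprocess.run(args, capture_output=True, text=True)
```

For example, `build_docker_command("opus-sandbox:latest", worktree, ["pytest", "-q"], memory="512m")` yields a command that runs the tests without network access. Dependency installation would pass `allow_network=True` explicitly.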

Phase 3: Resource Limits + Lifecycle —

Goal: Enforce CPU, memory, and time limits. Automatic cleanup of stale sandboxes.

File / Feature	Details	Owner	Status
`src/opus/sandbox/limits.py`	Resource limit configuration: max CPU, max memory, max execution time	opus	—
`src/opus/sandbox/lifecycle.py`	Lifecycle manager: track active sandboxes, enforce timeouts, periodic cleanup of orphans	opus	—
Timeout enforcement	Kill process/container after configured timeout (default: 10 min)	opus	—
Memory limits	Docker `--memory` flag, or OS-level `ulimit` (`setrlimit`) for non-Docker mode	opus	—
Stale sandbox cleanup	Background task: destroy worktrees/containers older than threshold (default: 1 hour)	opus	—
Health check	Verify sandbox is responsive, kill if unresponsive	mv	—
Unit tests	Timeout triggers kill, stale cleanup works, resource limits enforced	mv	—
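
A rough sketch of the limit configuration and stale-sandbox detection for non-Docker mode. The 10-minute timeout and 1-hour staleness threshold come from the table above; the memory and CPU defaults, and all names, are assumptions. The `resource` module is Unix-only.

```python
import resource
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ResourceLimits:
    max_memory_bytes: int = 1024 ** 3  # assumed default, not specified in the plan
    max_cpu_seconds: int = 600  # assumed default
    max_wall_seconds: int = 600  # 10 min timeout from the table
    stale_after: timedelta = timedelta(hours=1)


def make_preexec(limits: ResourceLimits) -> Callable[[], None]:
    """Return a function for subprocess `preexec_fn` that applies OS-level limits."""

    def apply() -> None:
        # Runs in the child process just before exec.
        resource.setrlimit(
            resource.RLIMIT_AS, (limits.max_memory_bytes, limits.max_memory_bytes)
        )
        resource.setrlimit(
            resource.RLIMIT_CPU, (limits.max_cpu_seconds, limits.max_cpu_seconds)
        )

    return apply


def find_stale(
    created_at: Dict[str, datetime], now: datetime, limits: ResourceLimits
) -> List[str]:
    """Return IDs of sandboxes older than the staleness threshold."""
    return sorted(
        sid for sid, created in created_at.items() if now - created > limits.stale_after
    )
```

The background cleanup task would call `find_stale` periodically and destroy each returned sandbox. `RLIMIT_CPU` bounds CPU time only, so wall-clock timeouts still need separate enforcement.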

Prerequisites / Gap Analysis

Requirements

Requirement	Description
REQ-0	Coding agency design doc reviewed and approved
REQ-1	Python project scaffold (FR-009) — this is a Python module
REQ-2	Git installed and accessible from Python (subprocess)
REQ-3	Docker installed on host (Phase 2 only)

Current State

Component	Status	Details
Python scaffold	—	FR-009 not started
Git worktrees	available	Git supports worktrees natively
Docker	available	Installed on dev machine, needed on VPS for production

Gap (What’s missing?)

Gap	Effort	Blocker?
Python scaffold (FR-009)	Med	Yes — code needs a home
Docker on VPS (FR-019)	Med	Only for Phase 2 production use

Test

Manual tests

Test	Expected	Actual	Last
…	…	pending	-

AI-verified tests

Scenario	Expected behavior	Verification method
…	…	…

E2E tests

Scenario	Assertion
…	…

Integration tests

Component	Coverage
…	…

Unit tests

Component	Tests	Coverage
…	…	…

History

Date	Event	Details
2026-03-12	Created	Part of autonomous coding agency architecture

References

FR-009 (Python Project Scaffold) — code infrastructure prerequisite
FR-043 (Custom Agents) — agents are the primary users of sandboxes
FR-056 (Autonomous Coding Orchestrator) — orchestrator creates sandboxes for dispatched jobs
FR-058 (Agent Git Workflow) — git operations happen inside sandboxed worktrees
FR-019 (VPS Deployment) — production sandboxing requires Docker on the server
vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview
