Decisions
- Pending: Sync vs async interface?
- Pending: How to handle tool/function calling across providers?
- Pending: Should streaming be mandatory or optional per call?
User Tasks
Summary
Build a thin abstraction layer over LLM provider APIs so all Opus agents call a unified interface. Provider (Claude, OpenAI, local models) is selected via configuration, not hardcoded.
Problem / Motivation
Per CLAUDE.md’s LLM Portability rule, Opus must be model-agnostic. Currently there is no Python code at all, so this is the right time to establish the abstraction before any agent code is written. Without it:
- Every agent would import provider-specific SDKs directly
- Switching providers means rewriting every agent
- Testing agents requires live API calls (no mock provider)
- Cost optimization (routing cheap tasks to cheaper models) is impossible
Proposed Solution
A src/opus/llm/ module with three layers:
- Interface (`base.py`): Abstract base class defining `complete()`, `stream()`, `tool_call()` methods and a universal `LLMResponse` model.
- Providers (`providers/claude.py`, `providers/openai.py`, etc.): Concrete implementations wrapping each SDK.
- Factory (`factory.py`): Reads config, returns the correct provider instance. Agents never instantiate providers directly.
Key design constraints:
- Response model is provider-agnostic (text, tool calls, usage stats)
- Tool/function calling uses a common schema, translated per provider
- Streaming uses a common async iterator pattern
- Provider config lives in a YAML/TOML file, not in code
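The interface and response layers above could be sketched as follows. All names (`LLMProvider`, `LLMResponse`, `Usage`) follow the module layout proposed here, but the exact fields and signatures are assumptions until Phase 1 lands:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Usage:
    # Token accounting, later fed to cost tracking (FR-053).
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class LLMResponse:
    # Provider-agnostic response: text, tool calls, usage stats.
    text: str
    tool_calls: list = field(default_factory=list)
    usage: Usage = field(default_factory=Usage)


class LLMProvider(ABC):
    """Abstract interface every concrete provider implements (sketch)."""

    @abstractmethod
    async def complete(self, prompt: str, **options) -> LLMResponse:
        """Return the full response for a prompt."""
```

Because agents only depend on `LLMProvider` and `LLMResponse`, a mock subclass is enough to test them without live API calls.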
Open Questions
1. Sync vs Async Interface
Question: Should the base interface be synchronous, asynchronous, or both?
| Option | Description |
|---|---|
| A) Sync only | Simpler, works everywhere. Blocks during API calls |
| B) Async only | Better for concurrent agents. Requires async runtime everywhere |
| C) Async with sync wrapper | Async-native with a complete_sync() convenience method for simple scripts |
Recommendation: Option C — async is needed for the orchestrator (FR-056) running multiple agents, but sync wrappers keep simple usage easy.
Decision:
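A minimal sketch of Option C, assuming `asyncio` and a placeholder `complete()` coroutine in place of a real SDK call:

```python
import asyncio


class LLMProvider:
    async def complete(self, prompt: str) -> str:
        # Placeholder: a real provider awaits its SDK here.
        return f"echo: {prompt}"

    def complete_sync(self, prompt: str) -> str:
        # Convenience wrapper for simple scripts with no event loop.
        return asyncio.run(self.complete(prompt))


print(LLMProvider().complete_sync("hello"))  # echo: hello
```

One caveat of this pattern: `asyncio.run()` raises if called from inside an already-running event loop, so the sync wrapper is strictly for non-async entry points.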
2. Tool/Function Calling Abstraction
Question: How to handle the different tool calling formats across providers?
| Option | Description |
|---|---|
| A) Common tool schema | Define tools in a provider-agnostic JSON schema, translate per provider at call time |
| B) Provider-native schemas | Each provider uses its own tool format, agents must know the provider |
| C) No tool support initially | Skip tool calling in Phase 1, add later |
Recommendation: Option A — tool calling is critical for coding agents. A common schema keeps agents portable.
Decision:
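One way Option A could look: a provider-agnostic `ToolDefinition` translated at call time into each provider's wire format (Anthropic nests the JSON schema under `input_schema`; OpenAI wraps it in a `function` envelope). The dataclass fields here are a sketch, not a finalized schema:

```python
from dataclasses import dataclass


@dataclass
class ToolDefinition:
    # Provider-agnostic tool: name, description, JSON-schema parameters.
    name: str
    description: str
    parameters: dict


def to_anthropic(tool: ToolDefinition) -> dict:
    # Anthropic's tool_use format expects the schema as "input_schema".
    return {
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.parameters,
    }


def to_openai(tool: ToolDefinition) -> dict:
    # OpenAI's tools format wraps the same fields in a "function" object.
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.parameters,
        },
    }
```

Agents would define tools once as `ToolDefinition` objects; the per-provider translators live next to each provider implementation.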
3. Streaming Behavior
Question: Should streaming be the default, optional, or provider-dependent?
| Option | Description |
|---|---|
| A) Always stream | Consistent behavior, but some providers may not support it well |
| B) Optional per call | Caller decides: complete() for full response, stream() for incremental |
| C) Provider decides | Stream if supported, buffer if not. Transparent to caller |
Recommendation: Option B — explicit is better than implicit. Callers that need streaming use stream().
Decision:
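Under Option B the caller chooses explicitly between the two methods. A sketch with a mock provider (names and chunking are illustrative):

```python
import asyncio
from typing import AsyncIterator


class MockProvider:
    async def complete(self, prompt: str) -> str:
        # One await, full response.
        return "hello world"

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # Caller iterates chunks as they arrive.
        for chunk in ("hello", " ", "world"):
            yield chunk


async def demo() -> tuple:
    p = MockProvider()
    full = await p.complete("hi")
    streamed = "".join([c async for c in p.stream("hi")])
    return full, streamed
```

Both paths yield the same content; the only difference is whether the caller sees it incrementally.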
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Interface definition + Claude API implementation | — |
| Phase 2 | Response streaming, tool/function calling support | — |
| Phase 3 | Provider switching (OpenAI, local models) | — |
Phase 1: Interface Definition + Claude Provider —
Goal: Define the universal LLM interface and implement the first provider (Claude/Anthropic).
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/llm/base.py` | Abstract base class: `LLMProvider` with `complete()`, `stream()`, `tool_call()` | opus | — |
| `src/opus/llm/models.py` | `LLMRequest`, `LLMResponse`, `ToolDefinition`, `ToolCall`, `Usage` dataclasses | opus | — |
| `src/opus/llm/providers/claude.py` | Anthropic SDK wrapper implementing `LLMProvider` | opus | — |
| `src/opus/llm/factory.py` | Provider factory: read config, return provider instance | opus | — |
| `src/opus/llm/config.py` | Config schema: provider name, model, API key ref, temperature, max tokens | opus | — |
| Unit tests | Mock provider for testing, test request/response serialization | mv | — |
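The factory row above could start as a simple registry keyed by the configured provider name. The placeholder classes and the `get_provider` signature are assumptions for illustration:

```python
class ClaudeProvider:
    """Placeholder; the real class wraps the Anthropic SDK."""


class OpenAIProvider:
    """Placeholder; the real class wraps the OpenAI SDK."""


_PROVIDERS = {"claude": ClaudeProvider, "openai": OpenAIProvider}


def get_provider(config: dict):
    # Agents call the factory; they never instantiate providers directly.
    name = config.get("provider", "claude")
    if name not in _PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    return _PROVIDERS[name]()
```

Switching providers then means editing one config key, which is exactly the Phase 3 goal.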
Phase 2: Streaming + Tool Calling —
Goal: Add streaming responses and tool/function calling with provider-agnostic schemas.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/llm/base.py` | Add `stream()` returning `AsyncIterator[LLMChunk]` | opus | — |
| `src/opus/llm/tools.py` | Common tool schema definition + per-provider translators | opus | — |
| `src/opus/llm/providers/claude.py` | Implement streaming + Anthropic `tool_use` format translation | opus | — |
| Retry logic | Exponential backoff on rate limits and transient errors | opus | — |
| Usage tracking | Track tokens per call, aggregate per agent session | opus | — |
| Unit tests | Streaming mock, tool call round-trip tests | mv | — |
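The retry-logic row above might be implemented as a small helper with exponential backoff and jitter. The exception type, attempt count, and delays are placeholders, not a spec:

```python
import asyncio
import random


class RateLimitError(Exception):
    """Placeholder for a provider's rate-limit / transient error."""


async def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5):
    # Retry transient failures with exponential backoff plus jitter.
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

A real implementation would also honor provider `Retry-After` hints where available, rather than relying on fixed backoff alone.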
Phase 3: Multi-Provider Support —
Goal: Add OpenAI and local model providers, enable runtime provider switching.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/llm/providers/openai.py` | OpenAI SDK wrapper implementing `LLMProvider` | opus | — |
| `src/opus/llm/providers/local.py` | Local model wrapper (ollama, llama.cpp HTTP API) | opus | — |
| Provider switching | Config-driven: change provider in config file, agents work unchanged | opus | — |
| Model routing | Route simple tasks to cheaper/faster models, complex to capable ones | opus | — |
| Integration tests | Same test suite runs against all providers | mv | — |
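Model routing could start as a static table mapping task complexity to a model tier; the tier names and model IDs below are assumptions and would live in config, not code:

```python
# Hypothetical routing table; real model IDs belong in the config file.
ROUTING = {
    "simple": "claude-haiku",
    "complex": "claude-opus",
}


def pick_model(task_complexity: str, default: str = "claude-sonnet") -> str:
    # Route cheap tasks to cheaper models; fall back to a default tier.
    return ROUTING.get(task_complexity, default)
```

Even this trivial version delivers the cost-optimization goal from the Motivation section, and can later be replaced by smarter heuristics without touching agent code.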
Prerequisites / Gap Analysis
Requirements
| Requirement | Description |
|---|---|
| REQ-0 | Coding agency design doc reviewed and approved |
| REQ-1 | Python project scaffold (FR-009) — this is a Python module |
| REQ-2 | Secrets management (FR-015) — API keys must not be hardcoded |
Current State
| Component | Status | Details |
|---|---|---|
| Python scaffold | — | FR-009 not started |
| LLM integration | — | No Python LLM code exists |
Gap (What’s missing?)
| Gap | Effort | Blocker? |
|---|---|---|
| Python scaffold (FR-009) | Med | Yes — code needs a home |
| API key management | Low | No — can use env vars initially |
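The env-var stopgap for API keys could look like this; the `{PROVIDER}_API_KEY` naming convention is an assumption, though it matches what the Anthropic and OpenAI SDKs expect by default:

```python
import os


def api_key(provider: str) -> str:
    # e.g. ANTHROPIC_API_KEY, OPENAI_API_KEY; keys are never hardcoded.
    env_var = f"{provider.upper()}_API_KEY"
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} before running agents")
    return key
```

Once FR-015 lands, this lookup becomes the single place to swap in proper secrets management.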
Test
Manual tests
| Test | Expected | Actual | Last |
|---|---|---|---|
| … | … | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Part of autonomous coding agency architecture |
References
- FR-009 (Python Project Scaffold) — code infrastructure prerequisite
- FR-043 (Custom Agents) — agents are the primary consumers of this abstraction
- FR-056 (Autonomous Coding Orchestrator) — orchestrator uses LLM for planning decisions
- FR-053 (Cost & Token Tracking) — usage data from this module feeds cost tracking
- `vault/00_system/designs/drafts/autonomous-coding-agency.md` — architecture overview