Decisions

  • Pending: Sync vs async interface?
  • Pending: How to handle tool/function calling across providers?
  • Pending: Should streaming be mandatory or optional per call?

User Tasks


Summary

Build a thin abstraction layer over LLM provider APIs so all Opus agents call a unified interface. Provider (Claude, OpenAI, local models) is selected via configuration, not hardcoded.

Problem / Motivation

Per CLAUDE.md’s LLM Portability rule, Opus must be model-agnostic. Currently there is no Python code at all, so this is the right time to establish the abstraction before any agent code is written. Without it:

  • Every agent would import provider-specific SDKs directly
  • Switching providers means rewriting every agent
  • Testing agents requires live API calls (no mock provider)
  • Cost optimization (routing cheap tasks to cheaper models) is impossible

Proposed Solution

A src/opus/llm/ module with three layers:

  1. Interface (base.py): Abstract base class defining complete(), stream(), tool_call() methods and a universal LLMResponse model.
  2. Providers (providers/claude.py, providers/openai.py, etc.): Concrete implementations wrapping each SDK.
  3. Factory (factory.py): Reads config, returns the correct provider instance. Agents never instantiate providers directly.
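A minimal sketch of the `base.py` layer described above (method names come from this document; the `LLMResponse` fields and chunk type are simplified assumptions, not the final models):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import AsyncIterator

@dataclass
class LLMResponse:
    """Provider-agnostic response: text, tool calls, and usage stats.

    Fields here are illustrative; the real model lives in models.py.
    """
    text: str
    tool_calls: list = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0

class LLMProvider(ABC):
    """Abstract interface every provider (Claude, OpenAI, local) implements."""

    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> LLMResponse:
        """Return the full response for a prompt."""

    @abstractmethod
    def stream(self, prompt: str, **kwargs) -> AsyncIterator[str]:
        """Yield response chunks incrementally (Phase 2 uses LLMChunk)."""

    @abstractmethod
    async def tool_call(self, prompt: str, tools: list, **kwargs) -> LLMResponse:
        """Complete with provider-agnostic tool definitions available."""
```

Agents would depend only on this ABC; the factory decides which concrete subclass they receive.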

Key design constraints:

  • Response model is provider-agnostic (text, tool calls, usage stats)
  • Tool/function calling uses a common schema, translated per provider
  • Streaming uses a common async iterator pattern
  • Provider config lives in a YAML/TOML file, not in code
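The config constraint above might look like this in YAML (all key names and values are illustrative, not final):

```yaml
# Hypothetical provider config -- field list taken from the config.py row below
llm:
  provider: claude            # or: openai, local
  model: claude-sonnet        # placeholder model id
  api_key_env: ANTHROPIC_API_KEY  # env-var reference, never the key itself
  temperature: 0.2
  max_tokens: 4096
```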

Open Questions

1. Sync vs Async Interface

Question: Should the base interface be synchronous, asynchronous, or both?

| Option | Description |
| --- | --- |
| A) Sync only | Simpler, works everywhere. Blocks during API calls |
| B) Async only | Better for concurrent agents. Requires async runtime everywhere |
| C) Async with sync wrapper | Async-native with a complete_sync() convenience method for simple scripts |

Recommendation: Option C — async is needed for the orchestrator (FR-056) running multiple agents, but sync wrappers keep simple usage easy.

Decision:
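Option C's sync wrapper could be as small as a mixin around `asyncio.run` (a sketch; it assumes the class defines `async def complete(...)` per the base interface):

```python
import asyncio

class SyncWrapperMixin:
    """Adds a blocking convenience method on top of the async interface."""

    def complete_sync(self, prompt: str, **kwargs):
        # Run the async call to completion on a fresh event loop.
        # Callers already inside an event loop should `await complete()` instead.
        return asyncio.run(self.complete(prompt, **kwargs))
```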

2. Tool/Function Calling Abstraction

Question: How to handle the different tool calling formats across providers?

| Option | Description |
| --- | --- |
| A) Common tool schema | Define tools in a provider-agnostic JSON schema, translate per provider at call time |
| B) Provider-native schemas | Each provider uses its own tool format, agents must know the provider |
| C) No tool support initially | Skip tool calling in Phase 1, add later |

Recommendation: Option A — tool calling is critical for coding agents. A common schema keeps agents portable.

Decision:
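Option A's translation step can be sketched against the tool formats the two SDKs accept (`ToolDefinition` is the dataclass proposed for models.py; the translator function names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class ToolDefinition:
    """Provider-agnostic tool description."""
    name: str
    description: str
    parameters: dict  # JSON Schema for the tool's arguments

def to_anthropic(tool: ToolDefinition) -> dict:
    # Anthropic's Messages API expects the schema under "input_schema".
    return {"name": tool.name,
            "description": tool.description,
            "input_schema": tool.parameters}

def to_openai(tool: ToolDefinition) -> dict:
    # OpenAI's chat completions API nests everything under "function".
    return {"type": "function",
            "function": {"name": tool.name,
                         "description": tool.description,
                         "parameters": tool.parameters}}
```

Agents define tools once as `ToolDefinition`s; only the provider wrappers know the wire formats.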

3. Streaming Behavior

Question: Should streaming be the default, optional, or provider-dependent?

| Option | Description |
| --- | --- |
| A) Always stream | Consistent behavior, but some providers may not support it well |
| B) Optional per call | Caller decides: complete() for full response, stream() for incremental |
| C) Provider decides | Stream if supported, buffer if not. Transparent to caller |

Recommendation: Option B — explicit is better than implicit. Callers that need streaming use stream().

Decision:
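Option B's two explicit call styles, shown with a stand-in provider (`FakeProvider` is hypothetical, used only to illustrate the caller's choice):

```python
import asyncio
from typing import AsyncIterator

class FakeProvider:
    """Stand-in provider demonstrating the two call styles of Option B."""

    async def complete(self, prompt: str) -> str:
        # Caller wants the whole response at once.
        return f"full answer to: {prompt}"

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # Caller wants incremental chunks.
        for chunk in ("partial ", "answer to: ", prompt):
            yield chunk

async def main() -> None:
    p = FakeProvider()
    full = await p.complete("hi")
    parts = [c async for c in p.stream("hi")]
    print(full)
    print("".join(parts))

asyncio.run(main())
```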


Phase Overview

| Phase | Description | Status |
| --- | --- | --- |
| Phase 1 | Interface definition + Claude API implementation | |
| Phase 2 | Response streaming, tool/function calling support | |
| Phase 3 | Provider switching (OpenAI, local models) | |

Phase 1: Interface Definition + Claude Provider

Goal: Define the universal LLM interface and implement the first provider (Claude/Anthropic).

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/llm/base.py | Abstract base class: LLMProvider with complete(), stream(), tool_call() | opus | |
| src/opus/llm/models.py | LLMRequest, LLMResponse, ToolDefinition, ToolCall, Usage dataclasses | opus | |
| src/opus/llm/providers/claude.py | Anthropic SDK wrapper implementing LLMProvider | opus | |
| src/opus/llm/factory.py | Provider factory: read config, return provider instance | opus | |
| src/opus/llm/config.py | Config schema: provider name, model, API key ref, temperature, max tokens | opus | |
| Unit tests | Mock provider for testing, test request/response serialization | mv | |
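The mock-provider row above could start as simply as this (a sketch; the real mock would subclass the `LLMProvider` ABC from base.py and return `LLMResponse` objects):

```python
class MockProvider:
    """In-memory provider for unit tests.

    Replays canned responses in order, records every request,
    and never touches the network.
    """

    def __init__(self, responses: list[str]):
        self.responses = list(responses)
        self.requests: list[str] = []

    async def complete(self, prompt: str, **kwargs) -> str:
        self.requests.append(prompt)
        return self.responses.pop(0)
```

Agent tests can then assert on `requests` to verify exactly what was sent to the model.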

Phase 2: Streaming + Tool Calling

Goal: Add streaming responses and tool/function calling with provider-agnostic schemas.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/llm/base.py | Add stream() returning AsyncIterator[LLMChunk] | opus | |
| src/opus/llm/tools.py | Common tool schema definition + per-provider translators | opus | |
| src/opus/llm/providers/claude.py | Implement streaming + Anthropic tool_use format translation | opus | |
| Retry logic | Exponential backoff on rate limits and transient errors | opus | |
| Usage tracking | Track tokens per call, aggregate per agent session | opus | |
| Unit tests | Streaming mock, tool call round-trip tests | mv | |
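The retry-logic row can be sketched as a generic async helper (names are assumptions; a real version would also match provider-specific rate-limit exceptions rather than only `TimeoutError`):

```python
import asyncio
import random

async def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0,
                       retryable: tuple = (TimeoutError,)):
    """Retry an async callable with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # 1s, 2s, 4s, ... plus jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```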

Phase 3: Multi-Provider Support

Goal: Add OpenAI and local model providers, enable runtime provider switching.

| File / Feature | Details | Owner | Status |
| --- | --- | --- | --- |
| src/opus/llm/providers/openai.py | OpenAI SDK wrapper implementing LLMProvider | opus | |
| src/opus/llm/providers/local.py | Local model wrapper (ollama, llama.cpp HTTP API) | opus | |
| Provider switching | Config-driven: change provider in config file, agents work unchanged | opus | |
| Model routing | Route simple tasks to cheaper/faster models, complex to capable ones | opus | |
| Integration tests | Same test suite runs against all providers | mv | |
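Model routing could start as a plain config lookup (tier labels and model names below are illustrative, not prescribed by this document):

```python
def pick_model(task_complexity: str, routing: dict) -> str:
    """Map a task-complexity label to a configured model name.

    `routing` would come from the provider config file; unknown
    labels fall back to the default tier.
    """
    return routing.get(task_complexity, routing["default"])

# Hypothetical routing table, e.g. loaded from YAML config
routing = {
    "simple": "cheap-fast-model",
    "complex": "capable-model",
    "default": "capable-model",
}
```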

Prerequisites / Gap Analysis

Requirements

| Requirement | Description |
| --- | --- |
| REQ-0 | Coding agency design doc reviewed and approved |
| REQ-1 | Python project scaffold (FR-009) — this is a Python module |
| REQ-2 | Secrets management (FR-015) — API keys must not be hardcoded |

Current State

| Component | Status | Details |
| --- | --- | --- |
| Python scaffold | | FR-009 not started |
| LLM integration | | No Python LLM code exists |

Gap (What’s missing?)

| Gap | Effort | Blocker? |
| --- | --- | --- |
| Python scaffold (FR-009) | Med | Yes — code needs a home |
| API key management | Low | No — can use env vars initially |

Test

Manual tests

| Test | Expected | Actual | Last |
| --- | --- | --- | --- |
| pending | - | | |

AI-verified tests

| Scenario | Expected behavior | Verification method |
| --- | --- | --- |

E2E tests

| Scenario | Assertion |
| --- | --- |

Integration tests

| Component | Coverage |
| --- | --- |

Unit tests

| Component | Tests | Coverage |
| --- | --- | --- |

History

| Date | Event | Details |
| --- | --- | --- |
| 2026-03-12 | Created | Part of autonomous coding agency architecture |

References

  • FR-009 (Python Project Scaffold) — code infrastructure prerequisite
  • FR-043 (Custom Agents) — agents are the primary consumers of this abstraction
  • FR-056 (Autonomous Coding Orchestrator) — orchestrator uses LLM for planning decisions
  • FR-053 (Cost & Token Tracking) — usage data from this module feeds cost tracking
  • vault/00_system/designs/drafts/autonomous-coding-agency.md — architecture overview