Decisions
- Pending: Sync vs async interface?
- Pending: How to handle tool/function calling across providers?
- Pending: Should streaming be mandatory or optional per call?
User Tasks
Summary
Build a thin abstraction layer over LLM provider APIs so all Opus agents call a unified interface. Provider (Claude, OpenAI, local models) is selected via configuration, not hardcoded.
Problem / Motivation
Per CLAUDE.md’s LLM Portability rule, Opus must be model-agnostic. Currently there is no Python code at all, so this is the right time to establish the abstraction before any agent code is written. Without it:
- Every agent would import provider-specific SDKs directly
- Switching providers means rewriting every agent
- Testing agents requires live API calls (no mock provider)
- Cost optimization (routing cheap tasks to cheaper models) is impossible
Proposed Solution
A src/opus/llm/ module with three layers:
- Interface (`base.py`): Abstract base class defining `complete()`, `stream()`, `tool_call()` methods and a universal `LLMResponse` model.
- Providers (`providers/claude.py`, `providers/openai.py`, etc.): Concrete implementations wrapping each SDK.
- Factory (`factory.py`): Reads config, returns the correct provider instance. Agents never instantiate providers directly.
Key design constraints:
- Response model is provider-agnostic (text, tool calls, usage stats)
- Tool/function calling uses a common schema, translated per provider
- Streaming uses a common async iterator pattern
- Provider config lives in a YAML/TOML file, not in code
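The interface and response layers above could be sketched as follows. All names (`LLMProvider`, `LLMResponse`, `Usage`) follow the module layout proposed here, but the exact fields and signatures are assumptions until Phase 1 lands:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Usage:
    # Token accounting, later fed to cost tracking (FR-053).
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class LLMResponse:
    # Provider-agnostic response: text, tool calls, usage stats.
    text: str
    tool_calls: list = field(default_factory=list)
    usage: Usage = field(default_factory=Usage)


class LLMProvider(ABC):
    """Abstract interface every concrete provider implements (sketch)."""

    @abstractmethod
    async def complete(self, prompt: str, **options) -> LLMResponse:
        """Return the full response for a prompt."""
```

Because agents only depend on `LLMProvider` and `LLMResponse`, a mock subclass is enough to test them without live API calls.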
Open Questions
1. Sync vs Async Interface
Question: Should the base interface be synchronous, asynchronous, or both?
| Option | Description |
|---|---|
| A) Sync only | Simpler, works everywhere. Blocks during API calls |
| B) Async only | Better for concurrent agents. Requires async runtime everywhere |
| C) Async with sync wrapper | Async-native with a complete_sync() convenience method for simple scripts |
Recommendation: Option C — async is needed for the orchestrator (FR-056) running multiple agents, but sync wrappers keep simple usage easy.
Decision:
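A minimal sketch of Option C, assuming `asyncio` and a placeholder `complete()` coroutine in place of a real SDK call:

```python
import asyncio


class LLMProvider:
    async def complete(self, prompt: str) -> str:
        # Placeholder: a real provider awaits its SDK here.
        return f"echo: {prompt}"

    def complete_sync(self, prompt: str) -> str:
        # Convenience wrapper for simple scripts with no event loop.
        return asyncio.run(self.complete(prompt))


print(LLMProvider().complete_sync("hello"))  # echo: hello
```

One caveat of this pattern: `asyncio.run()` raises if called from inside an already-running event loop, so the sync wrapper is strictly for non-async entry points.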
2. Tool/Function Calling Abstraction
Question: How to handle the different tool calling formats across providers?
| Option | Description |
|---|---|
| A) Common tool schema | Define tools in a provider-agnostic JSON schema, translate per provider at call time |
| B) Provider-native schemas | Each provider uses its own tool format, agents must know the provider |
| C) No tool support initially | Skip tool calling in Phase 1, add later |
Recommendation: Option A — tool calling is critical for coding agents. A common schema keeps agents portable.
Decision:
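One way Option A could look: a provider-agnostic `ToolDefinition` translated at call time into each provider's wire format (Anthropic nests the JSON schema under `input_schema`; OpenAI wraps it in a `function` envelope). The dataclass fields here are a sketch, not a finalized schema:

```python
from dataclasses import dataclass


@dataclass
class ToolDefinition:
    # Provider-agnostic tool: name, description, JSON-schema parameters.
    name: str
    description: str
    parameters: dict


def to_anthropic(tool: ToolDefinition) -> dict:
    # Anthropic's tool_use format expects the schema as "input_schema".
    return {
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.parameters,
    }


def to_openai(tool: ToolDefinition) -> dict:
    # OpenAI's tools format wraps the same fields in a "function" object.
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.parameters,
        },
    }
```

Agents would define tools once as `ToolDefinition` objects; the per-provider translators live next to each provider implementation.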
3. Streaming Behavior
Question: Should streaming be the default, optional, or provider-dependent?
| Option | Description |
|---|---|
| A) Always stream | Consistent behavior, but some providers may not support it well |
| B) Optional per call | Caller decides: complete() for full response, stream() for incremental |
| C) Provider decides | Stream if supported, buffer if not. Transparent to caller |
Recommendation: Option B — explicit is better than implicit. Callers that need streaming use stream().
Decision:
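Under Option B the caller chooses explicitly between the two methods. A sketch with a mock provider (names and chunking are illustrative):

```python
import asyncio
from typing import AsyncIterator


class MockProvider:
    async def complete(self, prompt: str) -> str:
        # One await, full response.
        return "hello world"

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # Caller iterates chunks as they arrive.
        for chunk in ("hello", " ", "world"):
            yield chunk


async def demo() -> tuple:
    p = MockProvider()
    full = await p.complete("hi")
    streamed = "".join([c async for c in p.stream("hi")])
    return full, streamed
```

Both paths yield the same content; the only difference is whether the caller sees it incrementally.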
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Interface definition + Claude API implementation | — |
| Phase 2 | Response streaming, tool/function calling support | — |
| Phase 3 | Provider switching (OpenAI, local models) | — |
Phase 1: Interface Definition + Claude Provider —
Goal: Define the universal LLM interface and implement the first provider (Claude/Anthropic).
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/llm/base.py` | Abstract base class: `LLMProvider` with `complete()`, `stream()`, `tool_call()` | opus | — |
| `src/opus/llm/models.py` | `LLMRequest`, `LLMResponse`, `ToolDefinition`, `ToolCall`, `Usage` dataclasses | opus | — |
| `src/opus/llm/providers/claude.py` | Anthropic SDK wrapper implementing `LLMProvider` | opus | — |
| `src/opus/llm/factory.py` | Provider factory: read config, return provider instance | opus | — |
| `src/opus/llm/config.py` | Config schema: provider name, model, API key ref, temperature, max tokens | opus | — |
| Unit tests | Mock provider for testing, test request/response serialization | mv | — |
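The factory row above could start as a simple registry keyed by the configured provider name. The placeholder classes and the `get_provider` signature are assumptions for illustration:

```python
class ClaudeProvider:
    """Placeholder; the real class wraps the Anthropic SDK."""


class OpenAIProvider:
    """Placeholder; the real class wraps the OpenAI SDK."""


_PROVIDERS = {"claude": ClaudeProvider, "openai": OpenAIProvider}


def get_provider(config: dict):
    # Agents call the factory; they never instantiate providers directly.
    name = config.get("provider", "claude")
    if name not in _PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    return _PROVIDERS[name]()
```

Switching providers then means editing one config key, which is exactly the Phase 3 goal.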
Phase 2: Streaming + Tool Calling —
Goal: Add streaming responses and tool/function calling with provider-agnostic schemas.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/llm/base.py` | Add `stream()` returning `AsyncIterator[LLMChunk]` | opus | — |
| `src/opus/llm/tools.py` | Common tool schema definition + per-provider translators | opus | — |
| `src/opus/llm/providers/claude.py` | Implement streaming + Anthropic `tool_use` format translation | opus | — |
| Retry logic | Exponential backoff on rate limits and transient errors | opus | — |
| Usage tracking | Track tokens per call, aggregate per agent session | opus | — |
| Unit tests | Streaming mock, tool call round-trip tests | mv | — |
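The retry-logic row above might be implemented as a small helper with exponential backoff and jitter. The exception type, attempt count, and delays are placeholders, not a spec:

```python
import asyncio
import random


class RateLimitError(Exception):
    """Placeholder for a provider's rate-limit / transient error."""


async def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5):
    # Retry transient failures with exponential backoff plus jitter.
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

A real implementation would also honor provider `Retry-After` hints where available, rather than relying on fixed backoff alone.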
Phase 3: Multi-Provider Support —
Goal: Add OpenAI and local model providers, enable runtime provider switching.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| `src/opus/llm/providers/openai.py` | OpenAI SDK wrapper implementing `LLMProvider` | opus | — |
| `src/opus/llm/providers/local.py` | Local model wrapper (ollama, llama.cpp HTTP API) | opus | — |
| Provider switching | Config-driven: change provider in config file, agents work unchanged | opus | — |
| Model routing | Route simple tasks to cheaper/faster models, complex to capable ones | opus | — |
| Integration tests | Same test suite runs against all providers | mv | — |
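Model routing could start as a static table mapping task complexity to a model tier; the tier names and model IDs below are assumptions and would live in config, not code:

```python
# Hypothetical routing table; real model IDs belong in the config file.
ROUTING = {
    "simple": "claude-haiku",
    "complex": "claude-opus",
}


def pick_model(task_complexity: str, default: str = "claude-sonnet") -> str:
    # Route cheap tasks to cheaper models; fall back to a default tier.
    return ROUTING.get(task_complexity, default)
```

Even this trivial version delivers the cost-optimization goal from the Motivation section, and can later be replaced by smarter heuristics without touching agent code.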
Prerequisites / Gap Analysis
Requirements
| Requirement | Description |
|---|---|
| REQ-0 | Coding agency design doc reviewed and approved |
| REQ-1 | Python project scaffold (FR-009) — this is a Python module |
| REQ-2 | Secrets management (FR-015) — API keys must not be hardcoded |
Current State
| Component | Status | Details |
|---|---|---|
| Python scaffold | — | FR-009 not started |
| LLM integration | — | No Python LLM code exists |
Gap (What’s missing?)
| Gap | Effort | Blocker? |
|---|---|---|
| Python scaffold (FR-009) | Med | Yes — code needs a home |
| API key management | Low | No — can use env vars initially |
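The env-var stopgap for API keys could look like this; the `{PROVIDER}_API_KEY` naming convention is an assumption, though it matches what the Anthropic and OpenAI SDKs expect by default:

```python
import os


def api_key(provider: str) -> str:
    # e.g. ANTHROPIC_API_KEY, OPENAI_API_KEY; keys are never hardcoded.
    env_var = f"{provider.upper()}_API_KEY"
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} before running agents")
    return key
```

Once FR-015 lands, this lookup becomes the single place to swap in proper secrets management.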
Test
Manual tests
| Test | Expected | Actual | Last |
|---|---|---|---|
| … | … | pending | - |
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-12 | Created | Part of autonomous coding agency architecture |
References
- FR-009 (Python Project Scaffold) — code infrastructure prerequisite
- FR-043 (Custom Agents) — agents are the primary consumers of this abstraction
- FR-056 (Autonomous Coding Orchestrator) — orchestrator uses LLM for planning decisions
- FR-053 (Cost & Token Tracking) — usage data from this module feeds cost tracking
- `vault/00_system/designs/drafts/autonomous-coding-agency.md` — architecture overview