AD-011: AI Providers
Summary
The framework integrates LLM capabilities through an LLMProvider interface
in providers/ai/ (package aiprovider), with built-in implementations for
Anthropic, OpenAI, Azure OpenAI, Ollama, and Gemini (plus an offline demo
provider) and an optional StreamingLLMProvider extension for live thinking
progress. AI tools call providers directly; throughput comes from
config-driven batching and bounded concurrency inside the tool, not from a
separate worker-pool subsystem. A ChatStructured method with JSON Schema
enables reliable batch translation and other structured-output tasks.
Context
Modern LLMs are capable translators, reviewers, and terminology extractors. Treating them as a separate service loses the composability of the streaming pipeline: AI tools should sit alongside TM leverage, term enforcement, and QA in the same flow.
AI APIs come with practical constraints: rate limits, cost per token, transient failures, and variable latency. The framework's answer is to keep the provider interface thin and let the calling tool decide how much work to batch into a single request and how many requests to run in parallel. Workspace-scale orchestration — async job queues, multi-tenant quotas — belongs to a platform layer, not to the framework primitives.
Providers also differ in their structured-output mechanism: OpenAI and
Azure OpenAI use response_format: json_schema, Anthropic uses tool-use
with input_schema, Ollama uses format: json, and Gemini uses
response schema hints. A single interface must paper over these details
while giving tools a predictable contract.
Decision
LLMProvider interface
type LLMProvider interface {
Name() ProviderID
Translate(ctx context.Context, req TranslateRequest) (*TranslateResponse, error)
Chat(ctx context.Context, messages []Message) (*ChatResponse, error)
ChatStructured(ctx context.Context, messages []Message,
schema JSONSchema) (*ChatResponse, error)
Close() error
}
ChatStructured extends Chat with a JSON Schema constraint that forces
the provider to return structured output. The JSONSchema type includes
Name, Description, Schema (the JSON Schema definition), and a
Strict flag for providers that support strict validation.
Provider configuration is schema-driven: fields in AI tool configs generate
CLI flags automatically via schema.FromStruct(), removing the need for
manual flag registration.
Built-in providers
| Provider | File | Default Model | Notes |
|---|---|---|---|
| Anthropic | providers/ai/anthropic.go | claude-sonnet-4-20250514 | Extended thinking support |
| OpenAI | providers/ai/openai.go | gpt-4o | response_format JSON schema |
| Azure OpenAI | providers/ai/azureopenai.go | deployment-specific | Managed Identity via TokenProvider |
| Ollama | providers/ai/ollama.go | llama3 | Local models, format: json |
| Google Gemini | providers/ai/gemini.go | gemini-3-flash-preview | SSE streaming with includeThoughts |
Two non-network providers round out the registry: a mock provider
(providers/ai/mock.go) for deterministic tests, and a demo provider
(providers/ai/demo.go) registered as demo that returns illustrative
output so the browser playground can run AI commands with no API keys. The
provider list is generated from the registry in providers/ai/provider.go
(Providers()), not hardcoded — the live set surfaces as the provider
option in the ai-translate reference.
Each provider takes a Config struct with API key, base URL, model name,
and generation parameters (temperature, max tokens, etc.). Azure OpenAI
additionally accepts a TokenProvider function, enabling passwordless
access via Azure Managed Identity.
StreamingLLMProvider
An optional extension interface surfaces live progress events for providers that support them:
type StreamingLLMProvider interface {
LLMProvider
ChatStream(ctx context.Context, messages []Message,
onEvent func(ChatStreamEvent)) (*ChatResponse, error)
ChatStructuredStream(ctx context.Context, messages []Message,
schema JSONSchema, onEvent func(ChatStreamEvent)) (*ChatResponse, error)
}
type ChatStreamEvent struct {
Type StreamEventType // StreamEventThinking | StreamEventContent | StreamEventDone
Content string // text chunk (thinking summary or output content)
Usage TokenUsage // cumulative usage; populated on StreamEventDone
Model string // model name; populated on StreamEventDone
}
The streaming methods deliver progress events through an onEvent
callback and return the final aggregated *ChatResponse, rather than
exposing a channel directly.
UIs and CLI tools display live thinking progress from providers that
support it (Anthropic extended thinking, Gemini includeThoughts). A
provider that does not implement StreamingLLMProvider can still be
used — callers that need streaming check for the extension with a type
assertion.
Concurrency model
AI tools call the provider directly — provider.Translate() for a single
block, provider.ChatStructured() for a batch. There is no intervening
worker pool, rate limiter, or circuit breaker in the framework. Throughput is
a property of the tool's own configuration, illustrated by ai-translate
(core/ai/tools/translate.go):
const (
DefaultBatchSize = 100
DefaultBatchConcurrency = 1
)
AITranslateConfig exposes BatchSize and BatchConcurrency as schema
fields, so they surface as CLI flags and flow config like any other tool
option. The tool's Process method chooses a path from those values:
- Block-by-block (
batchSize <= 1andconcurrency <= 1) — the defaultBaseTool.Processdrives oneprovider.Translate()call per translatable Block. Under a session it uses the simplest sequential skip/hydrate path (sessionHandleBlock):GetOverlayto skip already-translated Blocks,PutOverlayto write the result back. The batched path also honours session overlay caching, viaprocessBatchedWithSession, which pre-filters cached Blocks and writes overlays on the way out. - Batched (
processBatched) — drains all input Parts into a slice, selects the translatable Blocks (skipping already-translated ones whenSkipMatchedis set), groups them into batches ofbatchSize, and translates each batch in a singleChatStructured()call. Batches run under achan struct{}semaphore sized toBatchConcurrency, so at most that many LLM calls are in flight at once. All Parts are then written downstream in their original order; entries missing from the structured response fall back to individual per-blocktranslate()calls (oneprovider.Translate()per missing Block).
Streaming mode is orthogonal: when the provider implements
StreamingLLMProvider and an OnProgress callback is supplied, the tool
routes calls through ChatStream / ChatStructuredStream to surface live
thinking summaries (see below). Transient-failure handling (retry, backoff)
is left to the individual provider implementations and the underlying SDK;
the framework does not impose a uniform retry policy.
This in-tool batching is distinct from the ParallelBlockTool concurrency in
AD-004: Processing Engine, which parallelizes Part
dispatch across the pipeline rather than grouping Blocks into a single API
call.
AI tools
AI capabilities reach the pipeline as ordinary Tools (AD-006: Tool System):
| Tool | Purpose |
|---|---|
ai-translate | Translate untranslated Blocks using an LLM |
ai-qa | Check translations for fluency, accuracy, terminology |
ai-terminology | Extract terminology candidates from source Blocks |
ai-review | Review translations with explanations |
ai-entity-extract | Extract entities and term candidates (hybrid LLM + NER) |
Because AI tools are ordinary Tools, they compose naturally:
Terminology-aware prompts
AI tools receive terminology context from upstream stages:
- Term annotations — when
term-lookuphas run, matched terms and their preferred translations appear in the prompt. - Entity annotations — when
ai-entity-extracthas run, identified entities (with DNT flags and locale formatting hints) appear in the prompt context. - Glossary constraints — a dedicated glossary section lists preferred and forbidden terms applicable to the current Block's domain, product, and market.
Terminology enforcement is not just a post-translation validation step; it actively guides AI translation quality from the start.
Structured batch output
The batched ai-translate path relies on ChatStructured() to make a
multi-block response unambiguous. The tool sends a numbered prompt
([1] …, [2] …) and constrains the response to a JSON Schema that returns
{ translations: [{ index, text }] } with additionalProperties: false and
strict: true. Index-text pairs eliminate the text-parsing ambiguity of
free-form output and let the tool re-associate each translation with its
source Block. Blocks whose source carries inline codes are rendered as
placeholder-tagged text before the call and reconstructed from the response
via ParseRunsPlaceholderText, so inline markup survives the round trip.
Prompt templates
Prompt templates live in core/ai/prompt/ as versioned Go files
using text/template:
translate.go— translation prompts with glossary and context (single and batched)qa.go— quality assurance check prompts
Tool-specific prompts that have not been factored into the shared prompt
package (e.g. the review prompt) are built inline in their tool, such as
core/ai/tools/review.go.
Templates are context-aware: they include surrounding Blocks for document context, glossary constraints from term lookup, TM matches from leveraging, and format metadata (HTML tag handling instructions, CDATA boundaries, etc.).
Credential resolution
AI providers read credentials at runtime from one of three sources:
- The CLI credential store (AD-013: Kapi CLI) — provider configs as JSON, API keys in the OS keychain.
- Environment variables —
ANTHROPIC_API_KEY,OPENAI_API_KEY, etc. - Explicit
--api-keyflag on CLI invocation.
Flag overrides store overrides environment. API keys never appear in project files.
Scope boundary
The framework's responsibility ends at the provider interface and the pipeline tools that call it. Server-side asynchronous job queues, multi-tenant quota enforcement, rate-limit budgets, and workspace-scale translation orchestration are a platform layer's concern, built on top of these framework primitives.
Consequences
- AI translation is a pipeline tool, not a separate system. It composes with all other tools without special orchestration.
- Ordering is meaningful: TM leverage before AI translation avoids re-translating exact matches, reducing cost.
- Terminology context flows through the pipeline via annotations, enabling AI tools to produce terminology-consistent translations from the start.
- Throughput tuning lives on the tool, not in a hidden subsystem: a
caller raises
BatchSizeto cut API call count andBatchConcurrencyto run batches in parallel, with no separate worker pool to configure. - Structured batch output gives the tool a reliable index-text contract, so large documents translate in far fewer calls without parsing ambiguity.
- Provider abstraction enables cost optimization: local Ollama for development, Claude or OpenAI for production.
- Prompt templates are centralized and testable. The mock provider enables deterministic tests without API calls.
- Azure Managed Identity eliminates API key management for production Azure deployments while the same interface continues to support key-based auth elsewhere.
ChatStructuredgives tools a reliable JSON contract across providers with very different structured-output mechanisms.
Related
- AD-002: Content Model — annotations on Blocks
- AD-004: Processing Engine — flow execution
and
ParallelBlockTool - AD-006: Tool System — Tool pattern
- AD-009: Translation Memory —
tm-leveragefeeds context to AI tools - AD-010: Terminology — term annotations feed context to AI tools
- AD-012: MT Providers — complementary external MT services
- AD-013: Kapi CLI — credential store