AD-006: Tool System
Summary
A Tool is a single stage in a processing pipeline. It reads Parts from an
input channel and writes Parts to an output channel. Tools compose into
Flows; Flows are executed by the pipeline engine
(AD-004: Processing Engine). The BaseTool struct has optional handler
fields: a capability-typed block handler (Annotate / Translate / Transform)
plus the untyped HandleDataFn and HandleMediaFn for other Part types. Most
tools therefore implement only the handler for the Part type they care
about; everything else passes through unchanged. The block handler a tool
sets also declares what it may write (see "Content immutability by
capability" below). Tools
declare parameter schemas via SchemaProvider, which drives CLI flag
generation, flow-editor config panels, and validation. An IO contract on
ToolMeta declares locale cardinality, the stand-off layers a tool produces,
and side effects so the runner can infer locale iteration and the flow editor
can show data flow.
Context
Most tools only care about one or two Part types. A translation tool
processes Blocks; a word counter reads Blocks; a binary extractor handles
Media. Requiring every tool to implement the full Process(ctx, in, out)
method with a type switch over all Part types produces repetitive
boilerplate and creates risk of accidentally dropping Parts.
Beyond structural dispatch, a tool system needs to answer several questions uniformly for CLI, flow editor, and plugin consumers:
- What parameters does this tool accept, and what are their types?
- How many locales does it operate on? Which ones?
- What stand-off layers does it produce? Which does it consume?
- What external systems does it touch (TM, termbase, APIs)?
Decision
Tool interface and BaseTool dispatch
The core interface is minimal:
type Tool interface {
    Process(ctx context.Context, in <-chan *Part, out chan<- *Part) error
}
BaseTool provides a standard dispatch shell. The block handler is one of
three capability-typed fields — the tool sets exactly one, and the parameter
type bounds what it may write:
type BaseTool struct {
    Annotate  func(BlockView) error             // read-only: overlays/annotations/properties
    Translate func(TargetView) error            // writes target
    Transform func(BlockView) (EditPlan, error) // read-only producer; the applier rewrites source

    HandleDataFn  func(ctx context.Context, data *Data) (*Data, error)
    HandleMediaFn func(ctx context.Context, media *Media) (*Media, error)
    SchemaFn      func() *schema.ComponentSchema
}
BaseTool.Process reads Parts from the input channel, dispatches Blocks to
whichever capability-typed handler is set (and other Part types to their
Handle*Fn), and passes unhandled Part types through unchanged. Concrete tools
embed BaseTool and set only the handlers they need. A tool that needs the full
stream — batching, 1→N fan-out, cross-block state (e.g. the batch collector, the
concurrent ai-translate path) — overrides Process directly; it may reuse a
typed handler over a held block via tool.NewBlockView/NewTargetView.
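The dispatch-and-pass-through behaviour described above can be sketched in a few lines. This is a simplified illustration, not the actual BaseTool code: Part, Block, Data, miniTool, and runTool are hypothetical stand-ins, and the three capability-typed handlers are collapsed into a single HandleBlock field.

```go
package main

import (
    "context"
    "fmt"
    "strings"
)

// Hypothetical, simplified stand-ins for the real Part types.
type Block struct {
    Text      string
    WordCount int // stands in for a property written by an annotator
}

type Data struct{ Name string }

// Part carries at most one payload; a nil field means "not this kind".
type Part struct {
    Block *Block
    Data  *Data
}

// miniTool mirrors the shape of BaseTool with one block handler.
// Assumption: HandleBlock stands in for Annotate/Translate/Transform.
type miniTool struct {
    HandleBlock func(*Block) error
}

// Process dispatches Blocks to the handler and forwards every Part,
// handled or not, so nothing is dropped.
func (t *miniTool) Process(ctx context.Context, in <-chan *Part, out chan<- *Part) error {
    for p := range in {
        if p.Block != nil && t.HandleBlock != nil {
            if err := t.HandleBlock(p.Block); err != nil {
                return err
            }
        }
        select {
        case out <- p:
        case <-ctx.Done():
            return ctx.Err()
        }
    }
    return nil
}

// runTool feeds parts through a tool and collects what comes out.
func runTool(t *miniTool, parts []*Part) ([]*Part, error) {
    in := make(chan *Part, len(parts))
    out := make(chan *Part, len(parts))
    for _, p := range parts {
        in <- p
    }
    close(in)
    err := t.Process(context.Background(), in, out)
    close(out)
    var got []*Part
    for p := range out {
        got = append(got, p)
    }
    return got, err
}

func main() {
    counter := &miniTool{HandleBlock: func(b *Block) error {
        b.WordCount = len(strings.Fields(b.Text))
        return nil
    }}
    got, err := runTool(counter, []*Part{
        {Block: &Block{Text: "hello world"}},
        {Data: &Data{Name: "meta"}}, // no handler: passes through untouched
    })
    fmt.Println(len(got), got[0].Block.WordCount, got[1].Data.Name, err)
}
```

The point of the sketch is the unconditional forward after the handler call: a tool that sets no handler for a Part type still emits that Part.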
SessionTool extension
The channel-based Tool.Process is a forward-only transform. Some tools
need random access to the project's block state — lookup by content hash,
reading prior overlays (TM matches, QA findings, previously-produced
targets) to skip work that's already done, or writing annotations that
downstream tools in the same or a later run will consult. Those tools
opt into the SessionTool interface alongside Tool:
type SessionTool interface {
    Tool
    SessionProcess(
        ctx context.Context,
        sess blockstore.Session,
        in <-chan *Part,
        out chan<- *Part,
    ) error
}
Lifecycle (owned by the executor, not the tool):
- At flow start the executor opens a blockstore.Session against the project's declared store backend (memory, cache, remote; see AD-008: Kapi Project Model).
- For each tool the executor calls SessionProcess when the tool implements SessionTool, otherwise the plain streaming Process. Hybrid implementations are allowed: SessionProcess can read from in, enrich via the session, and emit to out.
- The executor commits the session on success or rolls back on error. Tools MUST NOT call Commit/Rollback themselves.
SessionTool is additive — every SessionTool also implements Tool so flow composition (chaining steps that may or may not use the session) keeps working. See the SessionTool authoring guide for idiomatic patterns (skip-if-cached, overlay conventions, provider selection).
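The executor-owned lifecycle can be illustrated with a small sketch. Everything here is an assumption made for illustration: Session is a two-method stand-in for blockstore.Session, runFlow runs tools one after another over a slice (a real executor streams and takes a context from its caller), and memSession, passTool, and sessTool exist only for the demo.

```go
package main

import (
    "context"
    "errors"
    "fmt"
)

type Part struct{ ID string }

// Session is a hypothetical stand-in for blockstore.Session.
type Session interface {
    Commit() error
    Rollback() error
}

type Tool interface {
    Process(ctx context.Context, in <-chan *Part, out chan<- *Part) error
}

type SessionTool interface {
    Tool
    SessionProcess(ctx context.Context, sess Session, in <-chan *Part, out chan<- *Part) error
}

// runStep picks SessionProcess when the tool opts in, otherwise Process.
func runStep(ctx context.Context, t Tool, sess Session, in <-chan *Part, out chan<- *Part) error {
    if st, ok := t.(SessionTool); ok {
        return st.SessionProcess(ctx, sess, in, out)
    }
    return t.Process(ctx, in, out)
}

// runFlow runs each tool to completion, then commits; any error rolls back.
// Assumption: tools are 1:1, so a buffer of len(parts) never blocks.
func runFlow(sess Session, tools []Tool, parts []*Part) ([]*Part, error) {
    ctx := context.Background()
    for _, t := range tools {
        in := make(chan *Part, len(parts))
        out := make(chan *Part, len(parts))
        for _, p := range parts {
            in <- p
        }
        close(in)
        if err := runStep(ctx, t, sess, in, out); err != nil {
            // The executor, never the tool, ends the session.
            if rbErr := sess.Rollback(); rbErr != nil {
                return nil, fmt.Errorf("%w (rollback also failed: %v)", err, rbErr)
            }
            return nil, err
        }
        close(out)
        next := make([]*Part, 0, len(parts))
        for p := range out {
            next = append(next, p)
        }
        parts = next
    }
    return parts, sess.Commit()
}

// memSession records which lifecycle call the executor made.
type memSession struct{ committed, rolledBack bool }

func (s *memSession) Commit() error   { s.committed = true; return nil }
func (s *memSession) Rollback() error { s.rolledBack = true; return nil }

// passTool is a plain streaming tool; fail makes it return an error.
type passTool struct{ fail bool }

func (t passTool) Process(ctx context.Context, in <-chan *Part, out chan<- *Part) error {
    if t.fail {
        return errors.New("tool failed")
    }
    for p := range in {
        out <- p
    }
    return nil
}

// sessTool opts into the session and records that SessionProcess ran.
type sessTool struct {
    passTool
    usedSession *bool
}

func (t sessTool) SessionProcess(ctx context.Context, sess Session, in <-chan *Part, out chan<- *Part) error {
    *t.usedSession = true
    return t.Process(ctx, in, out)
}

func main() {
    used := false
    sess := &memSession{}
    out, err := runFlow(sess, []Tool{passTool{}, sessTool{usedSession: &used}}, []*Part{{ID: "b1"}})
    fmt.Println(len(out), err, used, sess.committed, sess.rolledBack)
}
```

Because sessTool embeds passTool it still satisfies Tool, which is what lets session-aware and plain steps share one chain.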
Tool categories
Tools fall into four categories that set expectations for idempotency and ordering:
| Category | Responsibility | Examples |
|---|---|---|
| Transform | Modify content in place | case change, search/replace, redaction |
| Enrich | Add metadata or overlays | segmentation, TM leveraging, AI translation, terminology lookup |
| Validate | Check quality without modifying | QA checks, word count, character count |
| Convert | Transform representations | Encoding conversion, line-break normalization |
IO model
Each tool declares an IO contract in its ToolMeta (package core/schema).
The contract is expressed over IOPorts — typed stand-off layers of a Block
(AD-002) — not over coarse part-type names: Consumes
lists the ports a tool reads upstream and Produces the ports it writes. An
IOPort's Type names an overlay type (term, qa, …), a block-annotation
type (brand-voice, …), or a pseudo-port (PortTarget / PortSource); its
Side says which side it pertains to; and Optional marks a consumed port as
degradable (graceful degradation) rather than required.
// core/schema/schema.go
type IOPort struct {
    Type     string     // overlay type, annotation type, or "target"/"source"
    Side     model.Side // source | target
    Optional bool       // consumed: degrades without it, does more with it
    Layer    string     // segmentation granularity; LayerPrimary = primary
}
// PortTarget is the committed Target; PortSource is a rewritten source.
const (
    PortTarget = "target"
    PortSource = "source"
)
type ToolMeta struct {
    ID          string
    Category    string // "translate","validate","enrich","convert","transform","pipeline"
    DisplayName string
    Description string
    Tags        []string
    // Requires declares external resources the tool needs at runtime.
    Requires []string // "target-language","tm","termbase","credentials",…
    // Cardinality declares how many locales the tool operates on per execution.
    Cardinality LocaleCardinality
    // DefaultLocale is an optional default for monolingual and bilingual tools.
    DefaultLocale model.LocaleID
    // Consumes / Produces are the IO contract. Non-Optional consumed
    // ports are hard requirements the flow validator enforces.
    Consumes []IOPort
    Produces []IOPort
    // SideEffects lists external systems this tool reads from or writes to.
    SideEffects []SideEffect
    // Recoverable marks a transformer that vaults the originals it removes
    // and restores them later (redaction); the placement pass holds it to
    // the remote-egress rule.
    Recoverable           bool
    WritesOutput          bool     // CLI adds -o/--output when true
    DefaultParallelBlocks int      // concurrency for IO-bound tools
    Aliases               []string // alternative CLI command names
}
For example tm-leverage optionally consumes source segmentation and produces
tm-match, alt-translation and target; qa-check requires a target and
produces qa. The flow loader uses these contracts for data-flow validation —
a flow whose tool needs a port that no upstream tool or the source binding
supplies is rejected at build (AD-026).
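A minimal sketch of that data-flow check, under stated assumptions: IOPort is reduced to Type and Optional (Side and Layer are ignored), Step and validateFlow are hypothetical names, and the source binding is modelled as a plain list of port names it supplies.

```go
package main

import "fmt"

// IOPort is a reduced copy of the contract type; Side and Layer are omitted.
type IOPort struct {
    Type     string
    Optional bool
}

// Step is a hypothetical pairing of a tool name with its contract.
type Step struct {
    Tool     string
    Consumes []IOPort
    Produces []IOPort
}

// validateFlow returns one message per required port that neither the
// source binding nor an upstream step supplies.
func validateFlow(sourcePorts []string, steps []Step) []string {
    available := map[string]bool{}
    for _, p := range sourcePorts {
        available[p] = true
    }
    var problems []string
    for _, s := range steps {
        for _, c := range s.Consumes {
            if !c.Optional && !available[c.Type] {
                problems = append(problems,
                    fmt.Sprintf("%s: required port %q has no upstream producer", s.Tool, c.Type))
            }
        }
        // Ports become available only to steps that come later.
        for _, p := range s.Produces {
            available[p.Type] = true
        }
    }
    return problems
}

func main() {
    tmLeverage := Step{
        Tool:     "tm-leverage",
        Consumes: []IOPort{{Type: "segmentation", Optional: true}},
        Produces: []IOPort{{Type: "tm-match"}, {Type: "alt-translation"}, {Type: "target"}},
    }
    qaCheck := Step{
        Tool:     "qa-check",
        Consumes: []IOPort{{Type: "target"}},
        Produces: []IOPort{{Type: "qa"}},
    }
    fmt.Println(validateFlow(nil, []Step{tmLeverage, qaCheck})) // valid order
    fmt.Println(validateFlow(nil, []Step{qaCheck, tmLeverage})) // qa-check runs before any target exists
}
```

An optional port never produces a problem here, which is the graceful-degradation behaviour the Optional flag describes.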
Locale cardinality
Tools declare how many locales they operate on per execution:
type LocaleCardinality string
const (
    // Monolingual: operates on a single locale.
    // Examples: word-count (source), pseudo-translate (target),
    // encoding-detect (source).
    Monolingual LocaleCardinality = "monolingual"
    // Bilingual: operates on exactly two locales, provided as a pair.
    // Examples: ai-translate (source→target), qa-check (source vs target).
    Bilingual LocaleCardinality = "bilingual"
    // Multilingual: operates on N locales simultaneously.
    // Examples: translation-comparison, cross-locale QA.
    Multilingual LocaleCardinality = "multilingual"
)
Cardinality describes how many locales a tool needs. Which locales it receives is decided at runtime by the runner or flow configuration and is never hardcoded in the tool.
Uniform locale access
Blocks carry one source locale and N target locales. The source locale is structurally distinct because it anchors the document skeleton and inline code positions, but tools should not need to know whether a locale is "source" or "target" — they just need text for a given locale:
// Text returns the plain text for a locale: the source text if the
// locale matches the Block's source locale, otherwise the target text.
func (b *Block) Text(locale LocaleID) string
// SetText writes text for a locale (source if it matches the source
// locale, otherwise a target).
func (b *Block) SetText(locale LocaleID, text string)
// HasLocale reports whether the Block has content for the locale.
func (b *Block) HasLocale(locale LocaleID) bool
A bilingual tool comparing [fr, de] calls block.Text("fr") and
block.Text("de") — identical code whether fr is source or target.
SourceText() and TargetText(locale) remain available when a tool
specifically needs the source-anchored skeleton.
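One way the three accessors could behave, as a sketch only: the Block below is a hypothetical flattened model that stores plain strings rather than runs, and treating the source locale as always present in HasLocale is an assumption.

```go
package main

import "fmt"

type LocaleID string

// Block is a hypothetical flattened model: plain strings instead of runs.
type Block struct {
    SourceLocale LocaleID
    Source       string
    Targets      map[LocaleID]string
}

// Text returns the source text for the source locale, otherwise the target.
func (b *Block) Text(locale LocaleID) string {
    if locale == b.SourceLocale {
        return b.Source
    }
    return b.Targets[locale] // "" when absent; reading a nil map is safe
}

// SetText writes the source for the source locale, otherwise a target.
func (b *Block) SetText(locale LocaleID, text string) {
    if locale == b.SourceLocale {
        b.Source = text
        return
    }
    if b.Targets == nil {
        b.Targets = map[LocaleID]string{}
    }
    b.Targets[locale] = text
}

// HasLocale reports whether the Block has content for the locale.
// Assumption: the source locale always counts as present.
func (b *Block) HasLocale(locale LocaleID) bool {
    if locale == b.SourceLocale {
        return true
    }
    _, ok := b.Targets[locale]
    return ok
}

// sameLength is a toy bilingual check that never asks which side is source.
func sameLength(b *Block, first, second LocaleID) bool {
    return len([]rune(b.Text(first))) == len([]rune(b.Text(second)))
}

func main() {
    b := &Block{SourceLocale: "en", Source: "dog"}
    b.SetText("fr", "chien")
    b.SetText("de", "Hund")
    fmt.Println(b.Text("fr"), b.Text("de"), b.HasLocale("ja"), sameLength(b, "fr", "de"))
}
```

sameLength would run unchanged if fr were the source locale, which is the property the uniform accessors are meant to give bilingual tools.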
Stand-off types and the payload registry
The stand-off types a tool consumes and produces are typed string constants
(AD-002). Positional, run-anchored layers use the
OverlayType constants (OverlaySegmentation, OverlayTerm, OverlayEntity,
OverlayQA, OverlayAlignment, OverlayTermCandidate); block-scoped metadata
uses the annotation-key constants (AnnoNote, AnnoAltTranslation,
AnnoTMMatch, AnnoWordCount, …). Both an overlay span's Value and an
annotation value are typed payloads; the framework registers the well-known
content payloads, and formats and plugins register additional types and their
constructors via one payload registry (model.RegisterPayload / NewPayload):
// Positional layers (Block.Overlays) — core/model/overlay.go
const (
    OverlaySegmentation  OverlayType = "segmentation"
    OverlayTerm          OverlayType = "term"
    OverlayEntity        OverlayType = "entity"
    OverlayQA            OverlayType = "qa"
    OverlayAlignment     OverlayType = "alignment"
    OverlayTermCandidate OverlayType = "term-candidate"
)
// Block-scoped metadata (Block.Annotations) — core/model/annotation_access.go
const (
    AnnoNote           = "note"
    AnnoAltTranslation = "alt-translation"
    AnnoTMMatch        = "tm-match"
    AnnoWordCount      = "word-count"
    // …char-count, seg-count, comparison, repetition, brand-voice, …
)
The IO contract also uses two pseudo-ports — PortTarget ("target", the
committed Target) and PortSource ("source", a rewritten source) — which name
produced/consumed outputs that participate in data-flow validation but are not
stored as stand-off layers.
Every checker — terminology, do-not-translate, placeholder, QA, brand
voice — writes the same qa overlay (a core/check.FindingsAnnotation payload
carrying a []check.Finding plus a rolled-up score), so one scoring,
annotation, and governance path serves them all.
A tool's Consumes/Produces name these overlay and annotation types (or a
pseudo-port), so the same registry that discriminates a payload's concrete type
on the wire is the vocabulary the flow validator checks the IO contract against.
Side effects
Side effects are a closed set of known external interactions:
type SideEffect string
const (
    SideEffectTMRead        SideEffect = "tm-read"
    SideEffectTMWrite       SideEffect = "tm-write"
    SideEffectTermbaseRead  SideEffect = "termbase-read"
    SideEffectTermbaseWrite SideEffect = "termbase-write"
    SideEffectAPICall       SideEffect = "api-call"
    SideEffectAnalytics     SideEffect = "analytics"
    // RemoteSourceEgress marks a tool that sends source content to a remote
    // system. It is deliberately distinct from APICall: a local detector or TM
    // lookup must not carry it, every cloud-provider call must.
    SideEffectRemoteSourceEgress SideEffect = "remote-source-egress"
)
Most side-effect declarations are informational metadata for the flow editor
and documentation. They are not enforced at runtime — a tool with
SideEffects: [SideEffectTMWrite] still runs normally even if no TM is
configured (it simply skips the write). This keeps the tool interface
simple while giving the UI enough information to warn meaningfully. The one
exception is RemoteSourceEgress: the transformer placement pass (below) keys
a hard build/load error off it, and a tool whose remoteness depends on
configuration (an AI tool pointed at a local Ollama or the offline demo
provider) refines it away through its contract resolver.
Flow locale inference
The runner inspects the tool chain's cardinality declarations to determine which locales to process:
func ResolveFlowLocales(
    spec *StepsSpec,
    toolInfos map[registry.ToolID]registry.ToolInfo,
    sourceLocale string,
    projectTargets []string,
) [][]string
The runner passes the flow's *StepsSpec plus a map from registry.ToolID to
registry.ToolInfo (which carries each tool's cardinality and default-locale
metadata), not a []ToolMeta slice.
Resolution returns a slice of locale sets — one set per execution pass. Examples:
| Flow | Tools | Passes |
|---|---|---|
| word-count | [word-count(mono)] | [[en]] |
| pseudo-translate | [pseudo-translate(bi, default:qps)] | [[en, qps]] |
| translate | [ai-translate(bi)] | [[en, de], [en, fr], [en, ja], ...] |
| translate+qa | [ai-translate(bi), qa-check(bi)] | [[en, de], [en, fr], ...] |
| compare de vs fr | [comparison(bi)] with config [de, fr] | [[de, fr]] |
| cross-locale QA | [consistency-check(multi)] | [[en, de, fr, ja, nb, ar]] |
| translate + pseudo | [ai-translate(bi), pseudo(bi, default:qps)] | [[en, de], [en, fr], ..., [en, qps]] |
Mixed flows resolve to the union of all needed passes.
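The pass-union idea can be sketched as follows. This is not ResolveFlowLocales itself: resolvePasses and toolInfo are hypothetical, per-step locale config (as in the "compare de vs fr" row) is not modelled, and the rule that a monolingual tool contributes a single-locale pass of its own is an assumption.

```go
package main

import (
    "fmt"
    "strings"
)

type Cardinality string

const (
    Monolingual  Cardinality = "monolingual"
    Bilingual    Cardinality = "bilingual"
    Multilingual Cardinality = "multilingual"
)

// toolInfo is a hypothetical reduction of registry.ToolInfo.
type toolInfo struct {
    Cardinality   Cardinality
    DefaultLocale string
}

// resolvePasses returns the de-duplicated union of the passes each tool needs,
// in first-seen order.
func resolvePasses(tools []toolInfo, source string, targets []string) [][]string {
    var passes [][]string
    seen := map[string]bool{}
    add := func(locales ...string) {
        key := strings.Join(locales, "\x00")
        if !seen[key] {
            seen[key] = true
            passes = append(passes, locales)
        }
    }
    for _, t := range tools {
        switch t.Cardinality {
        case Monolingual:
            // One locale: the declared default, else the source.
            if t.DefaultLocale != "" {
                add(t.DefaultLocale)
            } else {
                add(source)
            }
        case Bilingual:
            // A default locale pins the pair; otherwise iterate project targets.
            if t.DefaultLocale != "" {
                add(source, t.DefaultLocale)
                continue
            }
            for _, tgt := range targets {
                add(source, tgt)
            }
        case Multilingual:
            // All locales in a single pass.
            add(append([]string{source}, targets...)...)
        }
    }
    return passes
}

func main() {
    translate := toolInfo{Cardinality: Bilingual}
    pseudo := toolInfo{Cardinality: Bilingual, DefaultLocale: "qps"}
    fmt.Println(resolvePasses([]toolInfo{translate, pseudo}, "en", []string{"de", "fr"}))
}
```

De-duplication is what makes a translate+qa flow resolve to the same passes as translate alone: both bilingual tools ask for identical pairs.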
Parameter schemas
Tools declare parameter schemas via the tool.SchemaProvider interface
with ComponentSchema in the core/schema/ package:
type SchemaProvider interface {
    Schema() *schema.ComponentSchema
}
type ComponentSchema struct {
    ID          string                    // "$id"
    Version     string                    // "$version"
    Title       string
    Description string
    Type        string                    // "object"
    ToolMeta    *ToolMeta                 // tool identity (see above)
    Groups      []ParameterGroup          // UI groupings ("ui:groups")
    StepMeta    *StepMeta                 // Okapi-bridge step metadata, when applicable
    Properties  map[string]PropertySchema // parameter definitions
    RawJSON     json.RawMessage           // full schema access
}
schema.FromStruct(cfg, meta) generates a ComponentSchema by reflecting
on a Go struct. It supports struct tags for additional metadata:
type PseudoConfig struct {
    ExpansionPercent int    `schema:"description=Text expansion percentage,min=0,max=200"`
    Prefix           string `schema:"description=Prefix for pseudo text"`
    Suffix           string `schema:"description=Suffix for pseudo text"`
    InternalField    string `schema:"-"` // excluded from schema
}
schema.ApplyConfig() bridges map[string]any configuration (from flow
YAML) to a typed struct via JSON round-trip.
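The JSON round-trip technique itself is easy to show with the standard library. In this sketch applyConfig is a hypothetical stand-in for schema.ApplyConfig (whose real signature is not shown here), and the json tags on PseudoConfig are an assumption about how property names map to fields.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// PseudoConfig with json tags; the tag names are assumed for this sketch.
type PseudoConfig struct {
    ExpansionPercent int    `json:"expansionPercent"`
    Prefix           string `json:"prefix"`
    Suffix           string `json:"suffix"`
}

// applyConfig marshals the loosely typed map to JSON and unmarshals it into
// dst. Keys absent from the map leave dst's existing values alone, so
// defaults set before the call survive.
func applyConfig(raw map[string]any, dst any) error {
    data, err := json.Marshal(raw)
    if err != nil {
        return fmt.Errorf("encode config: %w", err)
    }
    if err := json.Unmarshal(data, dst); err != nil {
        return fmt.Errorf("decode config: %w", err)
    }
    return nil
}

func main() {
    cfg := PseudoConfig{Prefix: "[", Suffix: "]"} // defaults
    err := applyConfig(map[string]any{"expansionPercent": 30, "prefix": "<<"}, &cfg)
    fmt.Printf("%+v %v\n", cfg, err)

    // A value of the wrong type surfaces as a decode error.
    fmt.Println(applyConfig(map[string]any{"expansionPercent": "lots"}, &cfg))
}
```

The round-trip trades a little speed for not having to write per-field conversion code, and it reports type mismatches through the decoder's own errors.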
The ToolRegistry stores schemas alongside factories via
RegisterWithSchema(name, factory, schema). All built-in tools register
auto-generated schemas.
Schema-driven features:
- CLI flags: cli.RegisterSchemaFlags() auto-generates Cobra flags from the schema, mapping camelCase properties to kebab-case flags.
- Flow editor: schema-driven config panels for tool nodes, reusing the same FilterConfigEditor component that drives format filter configuration.
- Validation: ComponentSchema.Validate() checks parameter values against the schema.
- JSON export: kapi tools schema <name> prints the schema for any tool.
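The camelCase to kebab-case mapping used for CLI flags could look like the sketch below. kebabFlag is a hypothetical helper, not the code inside cli.RegisterSchemaFlags, and its treatment of acronym runs (keeping them together as one word) is an assumption.

```go
package main

import (
    "fmt"
    "strings"
    "unicode"
)

// kebabFlag converts a camelCase property name to a kebab-case flag name.
// A hyphen goes before an upper-case letter when it starts a new word:
// after a lower-case letter or digit, or at the end of an acronym run.
func kebabFlag(prop string) string {
    runes := []rune(prop)
    var sb strings.Builder
    for i, r := range runes {
        if unicode.IsUpper(r) && i > 0 {
            prev := runes[i-1]
            nextLower := i+1 < len(runes) && unicode.IsLower(runes[i+1])
            if unicode.IsLower(prev) || unicode.IsDigit(prev) || (unicode.IsUpper(prev) && nextLower) {
                sb.WriteByte('-')
            }
        }
        sb.WriteRune(unicode.ToLower(r))
    }
    return sb.String()
}

func main() {
    for _, name := range []string{"fuzzyThreshold", "expansionPercent", "APIKey", "provider"} {
        fmt.Printf("%s -> --%s\n", name, kebabFlag(name))
    }
}
```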
AI tool schemas include provider fields (Provider, APIKey, Model with enum support for provider selection), so AI-tool CLI flags are generated the same way as any other tool's.
Registration
Tools register into a ToolRegistry with a name, factory function, and
optional parameter schema:
reg.RegisterWithSchema("pseudo-translate",
    func() tool.Tool {
        return NewPseudoTranslateTool(&PseudoConfig{Prefix: "▒ ", Suffix: " ▒", TargetLocale: "qps"})
    },
    toolSchema(
        &PseudoConfig{Prefix: "▒ ", Suffix: " ▒"},
        toolMeta("pseudo-translate", "Pseudo Translate", schema.CategoryTranslation, ...),
    ),
)
The factory is a zero-argument func() tool.Tool (registry.ToolFactory); it
returns a tool built from a default config, with no error return. A separate
config factory (SetConfigFactory) builds the tool from a config map when flow
YAML overrides the defaults.
RegisterAll(reg) in core/tools/register.go auto-registers all built-in
tools. AI and MT tools are auto-registered separately by aitools.RegisterAll
and mttools.RegisterAll (core/ai/tools, core/mt/tools), called alongside
the built-ins during App init (cli/app.go). Each registers with a default
offline factory (the mock LLM provider for AI tools, the demo MT provider for
the <provider>-translate tools) plus a config factory (SetConfigFactory);
the real provider is resolved from the credential-bearing config map at
tool-creation time, not at registration time.
Plugin tools (AD-007: Plugin System and Okapi Bridge)
use the same Tool interface via gRPC translation, so plugin-provided
tools and built-in tools are interchangeable from the pipeline's
perspective.
Annotation-based communication
Tools communicate through annotations on Blocks. A typical pipeline:
1. ai-entity-extract adds EntityAnnotation with named entities.
2. term-lookup adds TermAnnotation with matched terminology.
3. tm-leverage reads entity annotations for generalized matching, adds AltTranslation.
4. ai-translate reads term and entity annotations for context-aware translation.
5. term-enforce validates terminology consistency in targets.
6. qa-check validates translation quality.
Each tool reads the annotations it cares about and adds its own, keeping tools loosely coupled through a shared data model rather than direct dependencies.
Built-in tool inventory
All built-in tools register via RegisterAll() in core/tools/register.go.
Transform tools — modify content in place:
| Tool | Description |
|---|---|
| pseudo-translate | Generate pseudo-translations with accent marks and prefix/suffix wrapping |
| search-replace | Regex-based search and replace in content |
| case-transform | Transform case of source and/or target text |
| create-target | Create a target for blocks, optionally copying the source runs |
| remove-target | Remove a locale's target (or all targets) from blocks |
| inline-codes-remove | Strip inline-code runs to produce clean plain text |
| properties-set | Set or modify block properties programmatically |
| whitespace-correct | Normalize and fix whitespace issues in translations |
| span-classify | Reclassify code:markup spans into semantic vocabulary types |
| tag-protect | Identify and mark tags and placeholders for protection |
| xslt-transform | Apply regex-based tag/text transformations to block text |
| redact | Replace sensitive spans with placeholders pre-translation (recoverable transformer) |
| unredact | Restore redacted spans from the vault post-translation |
Enrich tools — add metadata or overlays via annotations:
| Tool | Description |
|---|---|
| segmentation | Annotate blocks with a sentence-segmentation overlay (SRX-like rules) |
| tm-leverage | Pre-fill translations from Sievepen TM |
| diff-leverage | Compare against previous version, preserve translations for unchanged text |
| repetition-analysis | Analyze source text repetitions across blocks in the pipeline |
Validate tools — check quality without modifying:
| Tool | Description |
|---|---|
| word-count | Count words per block |
| char-count | Count characters per block |
| segment-count | Count source and target segments in blocks |
| qa-check | Rule-based quality checks (missing translations, whitespace, numbers, span constraints) |
| dnt-check | Flag do-not-translate spans that were translated in the target (alias dnt) |
| placeholder-check | Verify placeholders/variables are preserved between source and target |
| brand-vocab-check | Check target text against brand vocabulary / preferred-term rules |
| term-check | Verify terminology usage in translations against a glossary |
| inconsistency-check | Check for translation inconsistencies across blocks |
| length-check | Verify translation length constraints |
| chars-check | Check for invalid or unexpected characters in translations |
| pattern-check | Validate regex patterns in translations (placeholders, variables) |
| translation-comparison | Compare translations across two target locales and report differences |
| xml-validation | Validate XML well-formedness of block text |
| chars-listing | List all unique characters used in content (for font subsetting) |
| scoping-report | Classify blocks into scoping categories based on repetition and match status |
Convert tools — transform representations:
| Tool | Description |
|---|---|
| encoding-convert | Convert character encoding of text content |
| encoding-detect | Detect encoding characteristics of block text |
| linebreak-convert | Normalize line endings in source and/or target text |
| bom-convert | Add or remove the Unicode BOM marker on document layers |
| fullwidth-convert | Convert between half-width and full-width characters |
| uri-convert | Encode or decode URI escape sequences in text |
Pipeline tools — operate on the part stream:
| Tool | Description |
|---|---|
| layer-processor | Apply format-specific tool chains to child layers |
| external-command | Execute an external command on block text |
| script | Run user-provided JavaScript (ES5 via goja) on each part |
| batch | Collect blocks into configurable batches for downstream batch processing |
AI, MT, and terminology tools
AI and MT tools are registered at startup like the other built-ins, so they
appear in kapi tools and resolve in flows. Their distinguishing trait is
provider injection: the registry holds a default offline-provider factory, and
the real LLM/MT provider (with credentials) is supplied on demand via the
config factory when the tool is instantiated. They use the same Tool
interface and work identically in flows.
AI tools (core/ai/tools/):
| Tool | Description |
|---|---|
| ai-translate | Translate blocks using an LLM provider (batch + concurrent) |
| ai-qa | Check translation quality using an LLM provider |
| ai-review | Review translations with explanations using an LLM |
| ai-terminology | Extract terminology from blocks using an LLM |
| ai-entity-extract | Extract named entities and term candidates using AI + optional NER |
MT tools (core/mt/tools/):
| Tool | Description |
|---|---|
| {provider}-translate | Translate blocks using an MT provider (DeepL, Google, Microsoft, ModernMT, MyMemory) |
Terminology tools (termbase/):
| Tool | Description |
|---|---|
| term-lookup | Annotate blocks with matching terms from a TermBase |
| term-enforce | Verify correct terminology usage in translations |
TM tools (sievepen/):
| Tool | Description |
|---|---|
| tm-leverage | Content-aware TM leverage with generalized, structural, and plain matching |
Flow steps format
A flow's source and sink are context-resolved bindings (AD-026: Flow I/O Binding), not fields of the flow document; the steps carry only the composition.
Flows are authored as a YAML step list (compiled to the internal graph by the executor, see AD-004: Processing Engine):
apiVersion: v1
kind: FlowDefinition
metadata:
  name: Production Pipeline
spec:
  steps:
    - tool: tm-leverage
      config:
        fuzzyThreshold: 75
    - tool: ai-translate
      config:
        provider: anthropic
    - tool: qa-check
    - parallel:
        - tool: word-count
        - tool: char-count
Steps are sequential by default; parallel: blocks provide fan-out. The
script step lets authors drop in custom JavaScript when no existing tool
fits.
Mutable streaming model
Tools modify Blocks in place as they flow through channels. This is a deliberate trade-off:
- Performance — no copying or delta accumulation for high-volume streaming; zero allocation per tool for pass-through Part types.
- Simplicity — tools read and write fields on the same Block. No immutable builders, lenses, or patch application.
- Proven pattern — Okapi Framework uses the same mutable-event model across thousands of localization workflows.
Document-level immutability is achieved by external storage layers that version entire Block states. Within a single pipeline execution, mutable streaming is the right trade-off.
Content immutability by capability
Mutable-in-place does not mean anything goes. A tool's write surface is a
compile-time property: it declares what it may write by which process-named
block handler it sets on BaseTool, and the handler's parameter type makes the
wrong writes unrepresentable.
| Handler | View | May write |
|---|---|---|
| Annotate(BlockView) | source + target read-only | overlays, annotations, properties |
| Translate(TargetView) | source read-only | target content (+ the above) |
| Transform (edit producer) | source + target read-only | an edit plan the framework applies to source |
- Analysis / annotation tools (qa-check, word-count, term-lookup, entity-extract, the segmenter) set Annotate. BlockView exposes no source/target setter, so they cannot mutate content; they emit overlays, annotations, and properties.
- Translation tools (ai-translate, the MT tools, tm-leverage, create-target) set Translate and write Block.Targets; source stays read-only.
- Transformers (redaction, normalization, case/encoding conversion) are the only tools that rewrite Block.Source, and they never do so directly. A transformer is a read-only edit producer: it inspects the block and returns an edit plan, which is a set of structured model.RunEdits (a span→replacement map), any originals to vault (recoverable transformers such as redaction), or an opaque whole-block replacement for rewrites with no derivable mapping (LLM simplification). A single framework-owned applier is the one place that mutates the block: it applies the edits, rebases the surviving run-anchored overlays once (model.RemapOverlays) so segmentation, terms, and entities (see AD-002) follow the rewrite, vaults any secrets, and bounds-checks the result, all atomically. Because tool code holds no source setter, a transformer cannot corrupt run-anchoring or leak a secret; an opaque whole-block replacement drops the overlays it cannot rebase. Recoverable transformers (redaction) keep the original in a block annotation or a sidecar vault and restore it on the way out.
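The applier's apply-then-rebase step can be sketched on a flat string. This is an illustration under assumptions, not model.RemapOverlays: Span, Edit, Overlay, and applyEdits are hypothetical, offsets are byte offsets into a single string rather than runs, vaulting is not modelled, and an overlay that intersects an edited span is simply dropped.

```go
package main

import (
    "fmt"
    "sort"
)

// Span is a half-open byte range [Start, End) into the source text.
type Span struct{ Start, End int }

// Edit replaces the text under Span with Replacement.
type Edit struct {
    Span        Span
    Replacement string
}

// Overlay is a stand-off annotation anchored to a span of the source.
type Overlay struct {
    Type string
    Span Span
}

// applyEdits applies non-overlapping edits to src and rebases overlays once.
// Assumption: overlays that intersect an edited span are dropped.
func applyEdits(src string, edits []Edit, overlays []Overlay) (string, []Overlay, error) {
    sorted := append([]Edit(nil), edits...)
    sort.Slice(sorted, func(i, j int) bool { return sorted[i].Span.Start < sorted[j].Span.Start })

    var out []byte
    cursor := 0
    for _, e := range sorted {
        if e.Span.Start < cursor || e.Span.End < e.Span.Start || e.Span.End > len(src) {
            return "", nil, fmt.Errorf("invalid or overlapping edit %v", e.Span)
        }
        out = append(out, src[cursor:e.Span.Start]...)
        out = append(out, e.Replacement...)
        cursor = e.Span.End
    }
    out = append(out, src[cursor:]...)

    var rebased []Overlay
    for _, o := range overlays {
        shift, keep := 0, true
        for _, e := range sorted {
            switch {
            case e.Span.End <= o.Span.Start: // edit lies before the overlay: shift it
                shift += len(e.Replacement) - (e.Span.End - e.Span.Start)
            case e.Span.Start >= o.Span.End: // edit lies after the overlay: no effect
            default: // edit touches the overlay: cannot rebase
                keep = false
            }
        }
        if !keep {
            continue
        }
        moved := Span{o.Span.Start + shift, o.Span.End + shift}
        // Bounds check: a surviving overlay must still anchor inside the result.
        if moved.Start < 0 || moved.End > len(out) {
            return "", nil, fmt.Errorf("overlay %q out of bounds after rebase: %v", o.Type, moved)
        }
        rebased = append(rebased, Overlay{o.Type, moved})
    }
    return string(out), rebased, nil
}

func main() {
    src := "Call Alice at 555-0100 today"
    edits := []Edit{{Span{5, 10}, "[NAME]"}}
    overlays := []Overlay{{"term", Span{0, 4}}, {"entity", Span{5, 10}}, {"term", Span{23, 28}}}
    out, ov, err := applyEdits(src, edits, overlays)
    fmt.Println(out, ov, err)
}
```

Keeping this logic in one function is the reason tool code can stay read-only: the producer only describes spans and replacements, and offset arithmetic happens in a single place.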
The read views hand back the block's live run slices, which Go cannot make
deeply immutable without copying. So a dev/test backstop in
BaseTool.handleBlock content-hashes source and targets around each handler and
errors if a handler edited a surface its tier forbids (catching in-place edits
through those aliased slices). The applier likewise asserts that every surviving
source overlay span still anchors in-bounds against the rewritten runs
(Block.SourceOverlaysInBounds), so a rebase that left an overlay dangling is
rejected. The backstop is gated by tool.EnforceImmutability (on by default). A
tool that genuinely needs the maximal surface — script, which runs arbitrary
JavaScript — overrides Process instead and self-gates source mutation behind its
allowSourceMutation flag.
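The hash-around-the-handler backstop can be sketched as below. The names are hypothetical (guarded, Tier, hashRuns, and enforceImmutability only mirror the roles of BaseTool.handleBlock and tool.EnforceImmutability), runs are simplified to strings, the block has a single target, and SHA-256 is an arbitrary choice of content hash.

```go
package main

import (
    "crypto/sha256"
    "errors"
    "fmt"
)

// Block is simplified: runs are plain strings and there is one target.
type Block struct {
    Source []string
    Target []string
}

type Tier int

const (
    TierAnnotate  Tier = iota // may write neither source nor target
    TierTranslate             // may write target, never source
)

// enforceImmutability is an assumed stand-in for tool.EnforceImmutability.
var enforceImmutability = true

var ErrForbiddenWrite = errors.New("handler wrote a surface its tier forbids")

// hashRuns hashes run contents; the length prefix makes run boundaries count.
func hashRuns(runs []string) [32]byte {
    h := sha256.New()
    for _, r := range runs {
        fmt.Fprintf(h, "%d:%s", len(r), r)
    }
    var sum [32]byte
    copy(sum[:], h.Sum(nil))
    return sum
}

// guarded runs the handler and compares content hashes taken before and after.
func guarded(tier Tier, b *Block, handler func(*Block) error) error {
    if !enforceImmutability {
        return handler(b)
    }
    srcBefore, tgtBefore := hashRuns(b.Source), hashRuns(b.Target)
    if err := handler(b); err != nil {
        return err
    }
    if hashRuns(b.Source) != srcBefore {
        return fmt.Errorf("%w: source", ErrForbiddenWrite)
    }
    if tier == TierAnnotate && hashRuns(b.Target) != tgtBefore {
        return fmt.Errorf("%w: target", ErrForbiddenWrite)
    }
    return nil
}

func main() {
    b := &Block{Source: []string{"Hello"}, Target: []string{"Bonjour"}}
    // An "annotator" that edits the target through the aliased slice.
    err := guarded(TierAnnotate, b, func(x *Block) error {
        x.Target[0] = "Salut"
        return nil
    })
    fmt.Println(err)
}
```

The check catches exactly the case a read-only view type cannot: an in-place write through a slice the view handed out.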
Transformer placement
Transformers and analyzers are ordinary steps in one ordered tool list; there is
no separate structural stage. Because the applier mutates inline and in order,
each transformer settles the source before later steps observe it, so analysis
that depends on a transform — segmentation over normalized text, an annotator
feeding a redactor (ai-entity-extract → redact) — sees the applied result.
Ordering safety is a placement pass that runs beside the data-flow contract,
using the Capability and SideEffects a tool already declares:
| Severity | Rule | Rationale |
|---|---|---|
| Error | a transformer must not follow a step that produces a committed target — unless it produces the target port itself (unredact rewrites both sides coherently) | rewriting source orphans the targets, which anchor to it |
| Error | a recoverable (redacting) transformer must run before any step that egresses source to a remote sink — except the step(s) producing an input its config-resolved contract requires | otherwise unprotected source leaks before redaction applies; a cloud NER feeding entity-driven redaction is the documented detection trade-off (AD-020) |
| Warning | a transformer placed later than its earliest valid slot (after its last required input) | every overlay present at apply time must be rebased; an earlier slot avoids the work |
The remote-egress rule keys off a remote source egress side-effect
(schema.SideEffectRemoteSourceEgress), distinct from a plain API call, so a
local detector or termbase lookup does not trip it while a cloud-provider call
does. The effect itself is config-refined: an AI tool pointed at a local
provider (Ollama, the offline demo) carries no remote egress. Tools — including
plugins — contribute their own placement diagnostics through the same
config-derived contract hook that resolves a tool's required inputs from its
configuration (e.g. redaction requires an upstream entity overlay only when
entity detection is enabled — and only a required input exempts its producer
from the egress rule, so a rules-only redact placed after a cloud NER step is
still rejected).
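The two error rules can be sketched as a single pass over the step list. Everything here is a simplification for illustration: Step and checkPlacement are hypothetical, the flags stand in for the declared Capability and SideEffects after config refinement, the warning rule is omitted, and unredact is assumed not to carry the Recoverable flag.

```go
package main

import "fmt"

// Step is a hypothetical reduction of a tool's declared contract.
type Step struct {
    Tool         string
    Transformer  bool     // rewrites source through the applier
    Recoverable  bool     // redaction-style transformer
    RemoteEgress bool     // carries remote-source-egress after config refinement
    Requires     []string // required input ports after config resolution
    Produces     []string // produced ports; "target" is the committed target
}

func produces(s Step, port string) bool {
    for _, p := range s.Produces {
        if p == port {
            return true
        }
    }
    return false
}

// feeds reports whether up produces a port that down requires.
func feeds(up, down Step) bool {
    for _, r := range down.Requires {
        if produces(up, r) {
            return true
        }
    }
    return false
}

// checkPlacement returns one message per violated error rule.
func checkPlacement(steps []Step) []string {
    var errs []string
    for i, s := range steps {
        if !s.Transformer {
            continue
        }
        for _, up := range steps[:i] {
            // Rule 1: no source rewrite after a committed target, unless the
            // transformer produces the target port itself.
            if produces(up, "target") && !produces(s, "target") {
                errs = append(errs, fmt.Sprintf("%s follows %s, which commits a target", s.Tool, up.Tool))
            }
            // Rule 2: a recoverable transformer runs before remote egress,
            // except for steps producing an input it requires.
            if s.Recoverable && up.RemoteEgress && !feeds(up, s) {
                errs = append(errs, fmt.Sprintf("%s runs after %s, which egresses source", s.Tool, up.Tool))
            }
        }
    }
    return errs
}

func main() {
    ner := Step{Tool: "ai-entity-extract", RemoteEgress: true, Produces: []string{"entity"}}
    redact := Step{Tool: "redact", Transformer: true, Recoverable: true, Requires: []string{"entity"}}
    rulesOnly := Step{Tool: "redact", Transformer: true, Recoverable: true}
    translate := Step{Tool: "ai-translate", RemoteEgress: true, Produces: []string{"target"}}
    unredact := Step{Tool: "unredact", Transformer: true, Produces: []string{"target", "source"}}

    fmt.Println(checkPlacement([]Step{ner, redact, translate, unredact})) // accepted
    fmt.Println(checkPlacement([]Step{ner, rulesOnly}))                   // rejected by rule 2
}
```

The second call shows the exemption depending on the resolved contract rather than on the tool name: the same redact tool is rejected once it no longer requires the entity overlay.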
Consequences
- Implementing a new tool requires only embedding BaseTool and setting one handler function field.
- Unhandled Part types pass through automatically; no risk of accidentally dropping Parts.
- Plugin tools use the same interface via gRPC translation, so the pipeline treats all tools uniformly (AD-007: Plugin System and Okapi Bridge).
- Schema-driven CLI flags, flow editor config panels, and validation all share one schema representation — changes to a tool's config propagate automatically.
- IO contracts enable flow-level locale inference: the runner figures out whether to iterate project targets, run once on source, or run for a specific locale set based on declared cardinality.
- Annotation-based inter-tool communication keeps tools loosely coupled through shared data, not direct dependencies.
- Typed constants for AnnotationType, SideEffect, and LocaleCardinality catch typos at compile time and enable IDE autocomplete.
- Mixed-cardinality flows resolve cleanly through pass union; tool authors do not coordinate locale iteration.
Related
- AD-002: Content Model — Blocks, Annotations, and Fragment projections
- AD-004: Processing Engine — how Tools compose into Flows
- AD-005: Format System — readers and writers bracket the tool chain
- AD-026: Flow I/O Binding — a flow is composition only; tool = unit, binding = the ends
- AD-007: Plugin System and Okapi Bridge — plugin tools