AD-008: Kapi Project Model
Summary
A kapi project is a folder containing a {name}.kapi YAML recipe at its root
and a sibling .kapi/ state directory. The recipe captures the user's
declarative intent — identity, content collections, flows, store selection,
plus any platform extensions (such as a server: block) when a platform layer
is in use — while .kapi/ holds working state, with all regenerable caches under
.kapi/cache/. A ProjectContext resolves the recipe into a runtime
configuration, and a block store interface (core/blockstore.Store) with pluggable
providers gives tools random-access storage beyond the streaming pipeline.
Context
Localization workflows need to persist more than an in-flight stream of parts:
- Translators add targets over time, per locale.
- Multiple tools (term lookup, TM leverage, QA) contribute independent annotation layers.
- Re-running a flow must not re-translate blocks whose source has not changed.
- Content collections group heterogeneous source files with different formats, writer outputs, and language targets.
The channel-based Part → Tool → Part model (AD-004: Processing Engine)
is a forward-only transform. It does not cover random-access reads,
incremental work, or parallel tools writing independent annotation layers.
A declarative project file captures the user's intent (which plugins, which
collections, which flows). A local block store captures the working state.
The project folder is the day-to-day working unit and the default unit users
share, back up, and commit. Tools find it with a git-style upward walk, so it is
normally not named on a command. Sharing a project usually means sharing the folder (via git). When
a folder cannot travel — emailing a translator, an archive, an air-gapped transfer
— the single-file .klz parcel carries the same state losslessly
(AD-025 §7), and a task-scoped bilingual .klz is what
goes to a translator or reviewer (AD-017). The
.klz is a parcel for boundaries, not a competing working model: you open it into
a project, work in the folder, and pack to ship.
Decision
Project layout
Four ownership zones at the project root:
my-app/
├── my-app.kapi ← RECIPE (user edits, click-to-open)
├── .kapi/ ← WORKING STATE (kapi maintains)
│ ├── manifest.yaml ← bookkeeping: block counts, fingerprints, timestamps
│ ├── tm.db ← project translation memory (AD-009) — authoritative
│ ├── termbase.db ← project termbase — authoritative
│ ├── flows/ ← optional file-per-flow definitions (authored)
│ │ └── <flow>.yaml
│ └── cache/ ← all regenerable caches under one roof
│ ├── blocks.db ← block store (SQLite, was `.kapi/cache.db`)
│ ├── sync-cache.json ← kapi push/pull state (only with server: block)
│ ├── extractions/ ← per-extract batch state (AD-017)
│ │ └── <batch-id>/
│ │ ├── manifest.yaml ← source→output pairs, leverage, hashes
│ │ ├── skel-<src-hash>.bin ← per-source skeleton for merge
│ │ └── suggestions.jsonl ← sub-threshold TM matches
│ └── collections/ ← overlay layers per collection
│ └── ui/
│ ├── targets/{fr,de}.json
│ ├── annotations/{terms,tm-matches,qa}.json
│ └── skeletons/
├── src/ ← authored sources (user-owned)
│ └── **/*.tsx
└── i18n/ ← generated translations (format writer output)
└── {de,fr}.json
Ownership:
- {name}.kapi: the user's. Hand-edited YAML and the click-to-open handle for kapi-desktop. Committed to git.
- .kapi/: kapi's. Authoritative state (tm.db, termbase.db) and the manifest.yaml bookkeeping file sit at the top level; all regenerable caches live under .kapi/cache/ so users can blow them away without losing translation work. Gitignored by default; opt in to committing .kapi/tm.db and .kapi/termbase.db when cross-clone reproducibility matters.
- src/**: user-authored content. Referenced by the recipe; never moved into .kapi/.
- Writer outputs (e.g. i18n/{lang}.json): produced by format writers the recipe declares. The runtime consumes these; kapi does not.
The name pair mirrors git: .gitignore file plus .git/ folder at the same
root.
Recipe schema
The recipe is a YAML document parsed into core/project.KapiProject:
# my-app.kapi
version: v1
name: My App Localization
content:
  - name: ui
    items:
      - path: "src/**/*.{tsx,jsx}"
        format:
          name: exec
          config:
            command: "vp kapi-react extract --stream"
        target: "i18n/{lang}.json"
      - path: "src/i18n/en/*.json"
        format: json
        target: "i18n/{lang}.json"
plugins:
  okapi: "^1.47.0"
flows:
  translate:
    steps:
      - tool: ai-translate
        config:
          provider: anthropic
      - tool: qa-check
  full-pipeline:
    steps:
      - tool: tm-leverage
        config:
          fuzzy_threshold: 75
      - tool: ai-translate
      - tool: qa-check
defaults:
  source_language: en-US
  target_languages: [fr-FR, de-DE, ja-JP]
  concurrency: 4
  parallel_blocks: 3
  encoding: utf-8
  tools:
    redact:
      detectors: [rules]
      rules:
        - term: Acme Corp
          category: org
defaults.tools holds project-level tool presets: per-tool config defaults
applied wherever that tool runs in a project flow. A flow step's own config
overrides the preset per key (the step wins), so a project pins, say, its
redaction rules or a pseudo-translation prefix once while an individual flow
refines them. Resolution happens at tool construction, and the data-flow and
placement gates (AD-006) validate against the same
merged config the runtime uses — a preset that enables redact's entity
detection makes the upstream entity port required exactly as an inline
config would. The flow editor badges preset-backed steps and shows the
inherited values with override indicators in the step's config panel.
Required fields: version: v1 (must equal the current schema version) and, for
each content item, a non-empty path. Every flow must contain at least one step
with a non-empty tool (unless the step uses parallel, in which case the
parallel branches carry the tools). name is recommended as a project label
but is optional and not validated.
The recipe holds provider names only — API keys live in the OS keychain (see AD-013: Kapi CLI) or environment. Nothing in the recipe is secret; it is safe to commit.
Discovery is git-style: kapi tools walk up from the current directory until
they find a *.kapi file. Multiple recipes at the same directory level
require an explicit -p <path> flag.
Content paths
Each content item's path is a doublestar
glob — ** matches across directories and {a,b,c} matches alternatives, so a
single glob (input/store/*.{json,yaml,html} or input/**/*) covers a directory
of mixed content with the format auto-detected per file. The optional base
(set per item, or once on a collection and inherited) is the directory a matched
file's path is made relative to; it defaults to the glob's fixed prefix.
target is a path template expanded per file and language. The common case is
directory-mirror: when the target names a directory (it ends with /, or its
last segment has no extension and no token), the source path relative to base
is reproduced under it, so target: output/{lang} mirrors the tree —
input/docs/api.md → output/fr-FR/docs/api.md. For custom layouts, tokens
({lang}, {relpath}, {path}, {dir}, {filename}, {name}/{basename},
{ext}) reshape the path explicitly. The resolver is core/project.ResolveTargetPath;
the full token reference lives in the project file reference.
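The directory-mirror rule can be sketched as below. resolveTarget is a hypothetical stand-in for core/project.ResolveTargetPath that handles only {lang}. One assumption needs flagging: read literally, the rule would treat the token-only segment {lang} as a file name, yet the example output/{lang} mirrors the tree, so this sketch substitutes {lang} before applying the directory check.

```go
package main

import (
	"path"
	"strings"
)

// resolveTarget is a hypothetical, cut-down resolver: it expands {lang},
// then either returns the file path or mirrors the source under a directory.
func resolveTarget(tmpl, base, file, lang string) string {
	// Assumption: {lang} is expanded before deciding file vs directory.
	expanded := strings.ReplaceAll(tmpl, "{lang}", lang)
	last := path.Base(expanded)
	isDir := strings.HasSuffix(expanded, "/") ||
		(path.Ext(last) == "" && !strings.Contains(last, "{"))
	if !isDir {
		return path.Clean(expanded)
	}
	// Directory-mirror: reproduce the source path relative to base.
	rel := strings.TrimPrefix(path.Clean(file), path.Clean(base)+"/")
	return path.Join(expanded, rel)
}
```

With base defaulting to the glob's fixed prefix, input/**/* gives base input, so resolveTarget("output/{lang}", "input", "input/docs/api.md", "fr-FR") yields output/fr-FR/docs/api.md, matching the example above.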
Recipe extension mechanism
The framework recipe (KapiProject) carries an Extras map[string]yaml.Node
field with yaml:",inline" on KapiProject, Defaults, ContentCollection,
and ContentItem. Unknown top-level YAML keys are captured as raw nodes;
platform layers declare their own typed schema and decode
from Extras at load time. The framework knows nothing about platform-
specific extensions and round-trips them verbatim.
A platform package registers schemas at init():
coreproj.RegisterExtensionGroup("myplugin", []coreproj.Extension{
{Name: "server", Scope: coreproj.ScopeProject, Decoder: serverDecoder},
{Name: "hooks", Scope: coreproj.ScopeProject, Decoder: hooksDecoder},
// ...
})
Scope distinguishes which Extras map a key belongs to: ScopeProject,
ScopeDefaults, ScopeCollection, or ScopeItem. Each (Scope, Name)
binds to one decoder. KapiProject.Validate() walks every Extras map and
runs the matching decoder; unknown keys (no decoder registered) round-
trip without error so binaries with different sets of plugins linked in
remain forward-compatible.
Recipes can declare a hard dependency via requires:, a map of plugin name to
version constraint (use "*" for any version; semver forms such as ^1.0 are
also accepted). Validation fails when no extension under the named group has
been registered:
version: v1
requires:
  myplugin: "*"
server:
  url: https://platform.example.com/team/proj
A binary that doesn't link the myplugin extensions rejects this recipe
with a clear "binary not built with myplugin linked in" message. A recipe
without requires: loads in any binary; the extras pass through.
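The requires: check reduces to a set lookup against the extension groups linked into the binary. A sketch with hypothetical names (linkedGroups, checkRequires); it accepts constraint strings without evaluating them, which the real validator may well do.

```go
package main

import (
	"fmt"
	"sort"
)

// linkedGroups records which extension groups registered at init().
// Hypothetical; the real registry lives in core/project.
var linkedGroups = map[string]bool{}

// checkRequires fails when requires: names a group no linked plugin
// registered. Constraints ("*", "^1.0") are not evaluated in this sketch.
func checkRequires(requires map[string]string) error {
	names := make([]string, 0, len(requires))
	for name := range requires {
		names = append(names, name)
	}
	sort.Strings(names) // report the first missing group deterministically
	for _, name := range names {
		if !linkedGroups[name] {
			return fmt.Errorf("binary not built with %s linked in", name)
		}
	}
	return nil
}
```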
Implementation details, including the Scope enum, decoder helpers, and
a worked example, live in Note: Plugin model.
Example: a platform "connected project" extension
The framework has no built-in notion of a server, sync, or connection — those
are not recipe fields. A platform builds a "connected project" on top of the
generic mechanism above: it registers a ScopeProject extension (say,
server:) and gates it with requires:, so a recipe carrying the key is
meaningful only when that plugin is installed.
version: v1
requires:
  myplugin: "*"
server:
  url: https://platform.example.com/my-team/abc123
A recipe with no such key is a pure local project; kapi tools that don't recognize the key tolerate it and round-trip it verbatim. The key's schema, the commands that act on it, and any credential handling are the platform's concern, documented in that platform's own docs — not here.
Content collections
A ContentCollection lists the source patterns kapi extracts from and the
format reader used for each. Extracted blocks flow through the project's
flow executor; persistent block state (hashes, per-locale targets,
annotations) lives in the project's block store.
For subprocess-based extractors (JSX via kapi-react, bespoke DSL walkers), the
format is exec:
items:
  - path: "src/**/*.tsx"
    format:
      name: exec
      config:
        command: "vp kapi-react extract --stream"
Kapi runs the declared command once per collection with every matched file
path streamed on stdin (NUL-separated) and reads NDJSON block records from
stdout. The developer picks the package manager (vp, pnpm, npm, yarn,
or a direct binary path) — kapi runs whatever the command says verbatim.
Generated translations land wherever the recipe's writers point — typically
outside .kapi/.
State manifest
.kapi/manifest.yaml is kapi's bookkeeping: block counts, per-source SHA-256
fingerprints for staleness detection, generator identity, and last-updated
timestamps. Users do not hand-edit it. Deleting it is safe — it rebuilds from
cache/blocks.db; nothing authoritative lives only in the manifest.
Extraction manifests
.kapi/cache/extractions/<batch-id>/manifest.yaml records each kapi extract
run (see AD-017): the emitted
source→output pairs, per-file source SHA-256, TM leverage counts, the
XLIFF / PO version, and skeleton filenames. The batch id is stamped in
each emitted bilingual file so kapi merge can resolve a returning
file back to the right extraction without guessing from the filename.
Stale segments on merge are detected by comparing the manifest's
recorded source hash against the current source content.
The Defaults.Merge section of the recipe (conflict_policy) governs
how merge applies a translator's target when an on-disk target or TM
TU already exists. The Defaults.TM section (fuzzy_threshold,
read) governs TM pre-fill on extract. The Defaults.Segmentation
section (source, srx) toggles the SRX segmentation overlay — block
identity is stable across toggles, so a project can change these
fields between extractions safely.
Store interface
Flows and tools read and write blocks and overlays through the Store and
Session interfaces (package core/blockstore), not through raw channels.
The streaming contract is preserved as one capability among several.
type Store interface {
Begin(ctx context.Context) (Session, error)
Capabilities() Capabilities
Close() error
}
type Session interface {
Capabilities() Capabilities
Blocks(filter BlockFilter) iter.Seq2[*Block, error]
GetBlock(hash string) (*Block, error)
PutBlock(collection string, b *Block) error
GetOverlay(kind, blockHash string) (Overlay, error)
PutOverlay(s Overlay) error
ListOverlays(kind string) iter.Seq2[Overlay, error]
Commit() error
Rollback() error
Close() error
}
type Capabilities struct {
RandomAccess bool
Concurrent bool
Remote bool
Writable bool
Persistent bool
}
Block store providers
| Provider | Backing | Use case |
|---|---|---|
memory | Go maps | ephemeral flows, tests, ad-hoc CLI invocations |
cache | SQLite at .kapi/cache/blocks.db | default for kapi projects, long-lived local work |
Tools never open cache/blocks.db directly; they operate on a session. Swapping
providers therefore needs no tool changes, because core/blockstore defines the interface.
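To make the provider split concrete, here is a cut-down sketch of a memory provider with transaction-like sessions: writes are staged per session, visible to that session, and applied to the shared map only on Commit. Block, memStore and memSession are hypothetical simplifications (no context argument, overlays, or iterators), not the core/blockstore types.

```go
package main

import (
	"errors"
	"sync"
)

// Block is a hypothetical, cut-down block.
type Block struct {
	Hash       string
	Collection string
	Source     string
}

// memStore mimics the memory provider: committed blocks live in a Go map.
type memStore struct {
	mu     sync.Mutex
	blocks map[string]Block
}

func newMemStore() *memStore { return &memStore{blocks: map[string]Block{}} }

// memSession stages writes until Commit, so Rollback can discard them.
type memSession struct {
	store   *memStore
	pending map[string]Block
	done    bool
}

var errSessionDone = errors.New("session already committed or rolled back")

func (s *memStore) Begin() *memSession {
	return &memSession{store: s, pending: map[string]Block{}}
}

func (s *memSession) PutBlock(collection string, b Block) error {
	if s.done {
		return errSessionDone
	}
	b.Collection = collection
	s.pending[b.Hash] = b
	return nil
}

// GetBlock sees the session's own uncommitted writes first.
func (s *memSession) GetBlock(hash string) (Block, bool) {
	if b, ok := s.pending[hash]; ok {
		return b, true
	}
	s.store.mu.Lock()
	defer s.store.mu.Unlock()
	b, ok := s.store.blocks[hash]
	return b, ok
}

func (s *memSession) Commit() error {
	if s.done {
		return errSessionDone
	}
	s.store.mu.Lock()
	defer s.store.mu.Unlock()
	for h, b := range s.pending {
		s.store.blocks[h] = b
	}
	s.done = true
	return nil
}

func (s *memSession) Rollback() error {
	if s.done {
		return errSessionDone
	}
	s.pending = nil
	s.done = true
	return nil
}
```

Commit here is last-writer-wins with no isolation between concurrent sessions; per the Consequences section, the cache provider relies on an SQLite transaction instead.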
Flow executor operates on a Session
Tools keep the channel-based Tool.Process(ctx, in, out) contract
(AD-006: Tool System). A tool that needs the store
implements the optional SessionTool extension (in core/tool/session.go),
which adds a session handle alongside the same streaming channels:
type SessionTool interface {
Tool
SessionProcess(
ctx context.Context,
sess blockstore.Session,
in <-chan *model.Part,
out chan<- *model.Part,
) error
}
The executor opens one session per run, dispatches each stage through
SessionProcess when the tool implements SessionTool (otherwise plain
Process), and owns the transaction boundary — tools must not call
Commit/Rollback themselves:
session, err := store.Begin(ctx)
if err != nil {
	return err
}
defer session.Close()
// per stage, wired to in/out channels:
if st, ok := t.(tool.SessionTool); ok {
	err = st.SessionProcess(ctx, session, in, out)
} else {
	err = t.Process(ctx, in, out)
}
if err != nil {
	// Roll back, then report the stage error (errors.Join, Go 1.20+);
	// returning Rollback() alone would turn a failed run into a nil result.
	return errors.Join(err, session.Rollback())
}
return session.Commit()
SessionTool is the path for tools that want random access — term
enforcement, multi-pass statistics, QA across the whole store.
ProjectContext
A ProjectContext (package core/project) bridges the static recipe and the
live runtime. Every consumer that runs in project mode constructs one:
type ProjectContext struct {
Project *KapiProject
ProjectDir string
SourceLocale model.LocaleID
TargetLocales []model.LocaleID
AllowedSources []string
Encoding string
Concurrency int
ParallelBlocks int
LocaleFormat string
FormatDefaults map[string]FormatDefaults
}
func NewProjectContext(proj *KapiProject, projectPath string) *ProjectContext
AllowedSources derives from the plugins section. It always includes
"built-in" plus each declared plugin name. A project without a plugins
section sees built-in formats only.
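The AllowedSources rule is small enough to state as code. allowedSources is hypothetical, and the sorted order is a choice of this sketch; the record does not say how the list is ordered.

```go
package main

import "sort"

// allowedSources always starts with "built-in", then lists every plugin
// named in the recipe's plugins: section (name -> version constraint).
func allowedSources(plugins map[string]string) []string {
	names := make([]string, 0, len(plugins))
	for name := range plugins {
		if name != "built-in" { // avoid a duplicate entry
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return append([]string{"built-in"}, names...)
}
```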
Project-scoped format detection
func (ctx *ProjectContext) DetectFormat(
reg *registry.FormatRegistry, path string,
) string
DetectFormat delegates to FormatRegistry.DetectFileWithPriorities(path, ctx.AllowedSources, overrides), which is content-aware (the file head disambiguates a shared extension)
and takes per-call priority overrides. When a plugin (say okapi-bridge) is installed
globally but the project does not declare it, plugin formats at higher priority
are excluded and built-in formats are used instead. Explicitly declared formats
in content items (format: okf_json) bypass detection entirely and are always
honored.
overrides come from defaults.formats.<format>.priority: when an extension is
claimed by several formats at equal priority (e.g. .srt by both okf_vtt and
okf_regex), a recipe steers detection by bumping the preferred engine:
defaults:
  formats:
    okf_vtt:
      priority: 110 # win .srt over okf_regex
This lets a single wildcard content item (path: "input/*", no format:)
auto-detect the right engine per file instead of pinning a format on one item
per extension. The override is applied per detection call, not by mutating the
registry, so concurrently open projects with different priorities don't race.
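Priority-based detection with per-call overrides can be sketched as follows. candidate, detect and claims are hypothetical; content sniffing of the file head is omitted, and on a tie the earlier-registered format wins, which is exactly the ambiguity a priority override resolves. Because overrides live in a per-call map and the registry slice is only read, concurrent calls with different overrides cannot interfere.

```go
package main

import "path/filepath"

// candidate is a hypothetical registry entry.
type candidate struct {
	Format   string
	Source   string   // "built-in" or a plugin name
	Exts     []string // extensions the format claims
	Priority int
}

func claims(c candidate, ext string) bool {
	for _, e := range c.Exts {
		if e == ext {
			return true
		}
	}
	return false
}

// detect returns the highest-priority allowed format claiming path's
// extension. overrides (format -> priority) apply to this call only.
func detect(reg []candidate, path string, allowed []string, overrides map[string]int) string {
	ok := make(map[string]bool, len(allowed))
	for _, s := range allowed {
		ok[s] = true
	}
	ext := filepath.Ext(path)
	best, bestPri := "", -1
	for _, c := range reg {
		if !ok[c.Source] || !claims(c, ext) {
			continue // undeclared plugin, or extension not claimed
		}
		pri := c.Priority
		if p, found := overrides[c.Format]; found {
			pri = p
		}
		if pri > bestPri { // strict: earlier registration wins ties
			best, bestPri = c.Format, pri
		}
	}
	return best
}
```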
Content resolution
func (ctx *ProjectContext) ResolveContent(
reg *registry.FormatRegistry,
) ([]ResolvedFile, error)
type ResolvedFile struct {
Path string
Relative string
Format string
Collection string
Pattern string
Item *ContentItem
}
Matches content patterns against the filesystem, applies ignore rules, detects formats using project-scoped detection, and returns the resolved file list. Both the CLI and kapi-desktop use this single implementation.
Reader and writer configuration
func (ctx *ProjectContext) ConfigureReader(
reader Configurable, formatName string,
) error
func (ctx *ProjectContext) ConfigureWriter(writer format.DataFormatWriter)
ConfigureReader applies a format's FormatDefaults.Config overrides (via
cfg.ApplyMap) from defaults.formats.<format> onto the reader's config; it
takes the Configurable interface — any component exposing
Config() format.DataFormatConfig, which a DataFormatReader satisfies — and
is a no-op when the project declares no defaults for that format or the
component has no config. ConfigureWriter takes only the writer (no
formatName, no return) and sets its encoding from the project defaults.
Preset selection (e.g. defaults.formats.okf_html.preset: strict-extraction)
is resolved separately — not by ConfigureReader — through
resolver.ResolveFormatConfig (see cli/flow.go), which merges the named
preset's config before the reader is opened.
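A sketch of the ConfigureReader contract: apply the defaults.formats.<format> overrides through ApplyMap, and do nothing when there are no defaults or no config. DataFormatConfig, Configurable and FormatDefaults are simplified, hypothetical mirrors of the real types, and kvConfig and fakeReader exist only to exercise the function. Note the Go pitfall handled in fakeReader: returning a typed nil pointer as an interface would defeat the nil check.

```go
package main

import "fmt"

// DataFormatConfig models only the ApplyMap call named above.
type DataFormatConfig interface {
	ApplyMap(m map[string]any) error
}

// Configurable is anything exposing Config().
type Configurable interface {
	Config() DataFormatConfig
}

// FormatDefaults holds one format's defaults.formats.<format> overrides.
type FormatDefaults struct {
	Config map[string]any
}

// configureReader applies overrides, or is a no-op when none apply.
func configureReader(defaults map[string]FormatDefaults, r Configurable, format string) error {
	fd, ok := defaults[format]
	if !ok || len(fd.Config) == 0 {
		return nil // no project defaults for this format
	}
	cfg := r.Config()
	if cfg == nil {
		return nil // component has no config
	}
	if err := cfg.ApplyMap(fd.Config); err != nil {
		return fmt.Errorf("defaults.formats.%s: %w", format, err)
	}
	return nil
}

// kvConfig is a toy config that records applied keys.
type kvConfig struct{ vals map[string]any }

func (c *kvConfig) ApplyMap(m map[string]any) error {
	for k, v := range m {
		c.vals[k] = v
	}
	return nil
}

// fakeReader returns a nil interface (not a typed nil) when it has no config.
type fakeReader struct{ cfg *kvConfig }

func (r fakeReader) Config() DataFormatConfig {
	if r.cfg == nil {
		return nil
	}
	return r.cfg
}
```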
Flow execution settings
The executor's flow.ResourceContext carries the resource-resolution context
for a single run:
type ResourceContext struct {
ProjectDir string
OutputDir string
SourceLocale string
TargetLocale string
ToolName string
}
Project-scoped execution defaults (Concurrency, ParallelBlocks, Encoding,
FormatDefaults) live on core/project.ProjectContext, not on
ResourceContext.
CLI flags and desktop UI settings override project defaults when explicitly set. The project provides defaults, not mandates.
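One way to implement "override only when explicitly set" with Go's standard flag package is FlagSet.Visit, which visits only flags that were actually set. The -concurrency flag and effectiveConcurrency are hypothetical; this record does not list kapi's real flag names.

```go
package main

import "flag"

// effectiveConcurrency returns the flag value only when the user set it;
// otherwise the project default (defaults.concurrency) applies.
func effectiveConcurrency(args []string, projectDefault int) (int, error) {
	fs := flag.NewFlagSet("kapi", flag.ContinueOnError)
	n := fs.Int("concurrency", 0, "worker count (overrides defaults.concurrency)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	set := false
	// Visit walks only the flags that were set on this command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "concurrency" {
			set = true
		}
	})
	if set {
		return *n, nil
	}
	return projectDefault, nil
}
```

This distinguishes an explicit -concurrency=0 from an untouched flag, which a plain zero-value check cannot.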
Plugin scoping
AllowedSources generalizes beyond format detection:
- Tool scoping: AllowedTools() filters the tool registry to tools from declared plugins plus built-ins. The flow editor lists only available tools.
- Preset scoping: framework presets from undeclared plugins are excluded from preset selectors.
- Flow validation: flows referencing tools from undeclared plugins produce warnings during project validation.
Sharing and CLI integration
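A project is a folder, and sharing usually means sharing the folder: git, tarball, rsync. Kapi prescribes no bundling format for day-to-day work; the .klz parcel (AD-025 §7) is reserved for boundaries where a folder cannot travel.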
The kapi CLI (AD-013: Kapi CLI) uses projects via the -p
flag or through kapi init:
kapi init # scaffold {name}.kapi + .kapi/
kapi run translate -p my-app.kapi # run a declared flow
kapi ai-translate -p my-app.kapi # tool runs against the project
kapi pseudo-translate file.json # tool runs ad-hoc, no project
kapi-desktop (AD-014: Kapi Desktop) opens .kapi files
as documents and operates on the project folder.
Consequences
- Incremental work: re-running a flow translates only blocks whose source hash
is not already in targets/<locale>.
- Concurrent tools: term match and TM lookup run in parallel, each writing an independent overlay layer.
- Multi-pass tools: compute statistics across the whole store, then use them in a second pass.
- Transaction semantics vary per provider (an SQLite transaction for cache).
- Tools that call GetBlock once per block are slow against remote stores.
- The project file is always free of credentials, so it is safe to commit and share.
Related
- AD-002: Content Model — Block, Run, Overlay
- AD-004: Processing Engine — flow execution
- AD-006: Tool System — Tool and SessionTool interfaces
- AD-013: Kapi CLI — CLI use of projects
- AD-014: Kapi Desktop — desktop app use of projects
- Flow Steps Format — shared flow syntax
- .kapi Project File — schema reference