
The KLF family & the .klz package

KLF is the native serialization of one content atom — the translatable block. A project owns more than blocks, though: a translation memory and a termbase, plus stand-off annotations and media. The KLF family gives each of those atoms a native, deterministic, lossless format, and the .klz package bundles them into one portable artifact.

Two tiers: native vs interchange

For every atom there are two serializations, playing the same two roles KLF and XLIFF play for blocks (see KLF vs XLIFF):

Atom                    Native (lossless: pack, cache, hash)    Interchange (lossy: industry handoff)
blocks + targets        KLF (.klf)                              XLIFF / PO
stand-off annotations   KLF annotations (.klfl)                 -
translation memory      KLF-TM (kapi-tm-format)                 TMX
termbase                KLF-TB (kapi-termbase-format)           TBX

The native forms round-trip every field of the internal model. The interchange forms map only the standard, industry-portable subset — which is exactly why they cannot be used to pack a project losslessly:

  • TMX drops the entity mappings (including the conceptId cross-link from a TM entity to a termbase concept), provenance origins and import sessions, per-entry properties, and notes.
  • TBX drops the term source (terminology vs brand_vocabulary), the competitorTerm flag, and the extensible properties.

KLF-TM and KLF-TB preserve all of these. They share the KLF discipline: a kind magic string, a MAJOR.MINOR schemaVersion with the same forward-compatibility contract as KLF, and a deterministic serializer (sorted records, no HTML escaping, trailing newline) so the bytes are stable for content hashing and git diffing. KLF-TM even reuses the same Run model as KLF blocks for its variant content, so inline codes, placeholders, and plural/select constructs serialize identically.
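The three serializer rules (sorted records, no HTML escaping, trailing newline) can be illustrated with a minimal Python sketch. It is not the project's serializer: the JSON layout, the field names kind, schemaVersion, and records, the per-record id sort key, and the value "example-kind" are all assumptions made for illustration.

```python
import json


def serialize_deterministic(kind: str, schema_version: str, records: list[dict]) -> bytes:
    """Serialize records so that equal content always yields equal bytes.

    Hypothetical layout: the "kind", "schemaVersion", and "records" field
    names and the per-record "id" sort key are assumptions, not the real format.
    """
    document = {
        "kind": kind,
        "schemaVersion": schema_version,
        # Sort records by a stable key so input order never affects the output.
        "records": sorted(records, key=lambda record: record["id"]),
    }
    text = json.dumps(
        document,
        sort_keys=True,      # stable key order inside every object
        ensure_ascii=False,  # keep non-ASCII text readable in diffs
        indent=2,
    )
    # Python's json module does not HTML-escape "<", ">", or "&", which matches
    # the "no HTML escaping" rule. A single trailing newline ends the file.
    return (text + "\n").encode("utf-8")


first = serialize_deterministic(
    "example-kind", "1.0",
    [{"id": "b", "text": "<b>Hi</b>"}, {"id": "a", "text": "é"}],
)
second = serialize_deterministic(
    "example-kind", "1.0",
    [{"id": "a", "text": "é"}, {"id": "b", "text": "<b>Hi</b>"}],
)
```

Both calls produce the same bytes even though the records arrive in a different order, which is the property that makes content hashing and git diffing stable.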

The .klz package

A .klz is a deterministic zip bundling a project's authoritative content, one member per content type — the same content-type set the sync protocol moves over the wire:

project.klz
├── manifest.json            inventory: per-member sha256 + Merkle rootHash
├── blocks/<id>.klf          KLF (blocks + targets)
├── annotations/<id>.klfl    KLF annotations (stand-off overlays)
├── tm.klftm                 KLF-TM (translation memory)
├── termbase.klftb           KLF-TB (terminology)
├── media/<name>             opaque blobs
│
│   ── working-state members (AD-025 §5) ──
├── source/<name>            ingested source document(s)
├── overlays.klfo            in-progress overlays (targets, annotations, …)
└── history.jsonl            advisory provenance log (opt-in; excluded from rootHash)

A .klz also carries the project recipe in manifest.json: flows, plugins, defaults, and content. The recipe is kept out of the rootHash, so the package is a runnable project in a file. Side-effecting recipe entries such as server: and hooks: travel inert.
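What "deterministic zip" usually takes in practice can be sketched with Python's standard zipfile module. The specific choices below (sorted member names, a fixed timestamp, fixed permissions, stored entries) are one common way to get byte-stable archives; they are assumptions for illustration, not the rules the .klz writer necessarily follows.

```python
import io
import zipfile

# The zip format cannot store dates before 1980, so this is a common fixed value.
FIXED_TIMESTAMP = (1980, 1, 1, 0, 0, 0)


def write_deterministic_zip(members: dict[str, bytes]) -> bytes:
    """Build a zip whose bytes depend only on member names and contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(members):  # fixed member order
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIMESTAMP)
            info.compress_type = zipfile.ZIP_STORED  # no compressor-dependent bytes
            info.create_system = 3                   # same value on every platform
            info.external_attr = 0o644 << 16         # fixed file permissions
            archive.writestr(info, members[name])
    return buffer.getvalue()


package_a = write_deterministic_zip({"tm.klftm": b"tm\n", "blocks/a.klf": b"block\n"})
package_b = write_deterministic_zip({"blocks/a.klf": b"block\n", "tm.klftm": b"tm\n"})
```

Here package_a and package_b are byte-identical, because nothing about the build time, the insertion order, or the host platform leaks into the archive.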

The same container serves two profiles, set by the manifest kind:

  • kind: kapi-project — the whole project (all locales, full recipe, TM, termbase, overlays, source identity + skeletons). The snapshot / transport parcel, moved by pack / unpack.
  • kind: kapi-interchange — a task-scoped bilingual slice for one locale pair (blocks, inline codes, segmentation, skeleton, TM/term context). This is neokapi's interchange format, sent to a translator/reviewer by extract and ingested by merge (see KLF vs XLIFF).

Both are parcels, not workspaces: day-to-day work happens in the ambient .kapi project (AD-025 §7).

Because every member is a native lossless format, the whole package is lossless: unpacking can seed a fresh translation memory, termbase, and block store, and the project's regenerable caches (the block-store cache, sync hashes) rebuild faithfully from it. The manifest's per-member SHA-256 and Merkle rootHash give the package — and each member — a stable content identity; unpacking verifies both before trusting the contents.
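The verification step can be sketched as follows. The source does not specify how the Merkle rootHash is constructed, so the pairwise binary tree below, the leaf encoding, and the manifest field names members and rootHash are all hypothetical; only the use of per-member SHA-256 and the exclusion of manifest.json's recipe and history.jsonl from the root come from the text.

```python
import hashlib

# Assumption: these members contribute no leaf. The text says the recipe in
# manifest.json and the history.jsonl log are kept out of the rootHash.
EXCLUDED_FROM_ROOT = {"manifest.json", "history.jsonl"}


def member_hashes(members: dict[str, bytes]) -> dict[str, str]:
    """Per-member SHA-256 digests, as a manifest inventory might record them."""
    return {
        name: hashlib.sha256(data).hexdigest()
        for name, data in members.items()
        if name not in EXCLUDED_FROM_ROOT
    }


def root_hash(hashes: dict[str, str]) -> str:
    """Fold name-bound leaves pairwise into one root (a simple binary Merkle tree)."""
    level = [
        hashlib.sha256(name.encode("utf-8") + b"\0" + bytes.fromhex(digest)).digest()
        for name, digest in sorted(hashes.items())
    ]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def verify_package(members: dict[str, bytes], manifest: dict) -> bool:
    """Check both the per-member digests and the root before trusting contents."""
    hashes = member_hashes(members)
    return hashes == manifest["members"] and root_hash(hashes) == manifest["rootHash"]
```

An unpacker following this shape would refuse the package when verify_package returns False, before seeding any translation memory, termbase, or block store from it.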

What a package deliberately excludes

A .klz packs the source of truth, not the regenerable caches. It excludes the block-store cache (blocks.db) and the sync hash cache, which rebuild on demand, and it excludes secrets (the sync claim token). This is what makes the package the at-rest twin of the sync wire format: packing is the sync converters writing files instead of protobuf, and unpacking is the inverse.

Raw source bytes are also excluded by default. A .klz carries each source's identity (path, format, content hash) plus its round-trip skeleton, which is enough to merge, but not the originals, so it does not duplicate git-tracked source. Pass --with-source to kapi extract or kapi pack to embed the raw bytes too (needed to re-extract offline). kapi pack also refuses to write a content-less .klz (a project with nothing extracted or translated yet), much as git bundle refuses to create an empty bundle; share the .kapi recipe via git instead.

Working state: hand-off and resume

A .klz is not only an at-rest snapshot of finished content — it can also carry in-progress working state, so work can stop, move between machines, and resume where it left off. There are two routes, and neither needs per-step CLI verbs:

  • .klz as an ad-hoc workspace (extract / transform / merge). No project required. The .klz is a portable bundle; the runtime is a persistent shadow cache keyed to the file, so transforms are fast and incremental and the .klz is rewritten only when you pack (or pass --pack). extract ingests sources (+ a recipe), running a tool/flow on the .klz transforms it in place, and merge emits the finished files. info shows whether the cache is dirty.

    kapi extract src/*.json -o work.klz --target-lang fr,qps # ingest
    kapi ai-translate work.klz --target-lang fr # transform (cache)
    kapi info work.klz # dirty?
    kapi pack work.klz # eject for sharing
    kapi merge work.klz -o l10n/ # emit → l10n/<lang>/<name>

    A small recipe (target locales + output layout) travels with the file, so merge needs no flags. Transforming reuses work already done instead of recomputing it, and each document's work stays isolated.

  • Project snapshot (pack / unpack). Inside a .kapi project the working state also includes the project TM and termbase. kapi pack snapshots all of it; kapi unpack rehydrates it into another machine's .kapi/ state dir. Within a project, re-running a flow also reuses the project's persistent cache (.kapi/cache/blocks.db), so resume there is simply running again.

    kapi pack -o snapshot.klz # snapshot the project's working state
    kapi unpack snapshot.klz # rehydrate elsewhere, then run to resume

The working-state members are source/<name> (the ingested documents) and overlays.klfo (the in-progress overlays — targets/<locale>, annotations/<name>, segmentation, … keyed by (kind, blockHash)). Overlays in a multi-document package are scoped per source, since block ids are only unique within one document. A recipe block in the manifest (not a content member, so out of the rootHash) carries the workspace's target locales and output layout.

Progress is derived from content, not a journal

Because the block store is append-only and content-addressed, "has step X run?" is a pure function of the content: does X's overlay exist for the current block hashes? That is what makes cached resume correct — re-running is a no-op where the overlay is already present, and a source change re-hashes its block so only the affected work recomputes. There is no authoritative progress journal — it would be a second source of truth that could drift from the content (the dual-state footgun this codebase avoids).
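The "pure function of the content" idea can be made concrete with a small sketch. The data shapes are assumptions: blocks are modeled as an id-to-source-text mapping, the block hash as a SHA-256 of that text, and overlays as a set of (kind, blockHash) pairs, following the keying described earlier.

```python
import hashlib


def block_hash(source_text: str) -> str:
    """Content address of a block (hypothetical: SHA-256 of its source text)."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def pending_blocks(
    blocks: dict[str, str],
    overlays: set[tuple[str, str]],
    kind: str,
) -> list[str]:
    """Return ids of blocks whose current hash has no overlay of this kind.

    No journal is consulted: the answer depends only on the blocks and the
    overlays that exist right now.
    """
    return [
        block_id
        for block_id, text in blocks.items()
        if (kind, block_hash(text)) not in overlays
    ]


blocks = {"b1": "Hello", "b2": "World"}
overlays = {
    ("targets/fr", block_hash("Hello")),
    ("targets/fr", block_hash("World")),
}

nothing_to_do = pending_blocks(blocks, overlays, "targets/fr")

# Editing one source string re-hashes that block, so only it needs new work.
blocks["b2"] = "World!"
after_edit = pending_blocks(blocks, overlays, "targets/fr")
```

In this sketch nothing_to_do is empty and after_edit contains only b2: the stale overlay for the old hash is simply never matched, so there is no separate state to invalidate.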

The one optional log is advisory provenance: kapi pack --log stamps a hash-chained line into history.jsonl recording the pack (tamper-evident custody for hand-off; unpack verifies the chain and warns if broken). It is strictly subordinate to content — excluded from the package rootHash, never read to decide anything, and safe to delete with no loss of work. A default pack (no --log) is byte-deterministic.
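A hash-chained log of this kind can be sketched in a few lines. The entry layout (prev, event, and hash fields, one JSON object per line) is hypothetical and is not the actual history.jsonl schema; the sketch only shows why editing or dropping an earlier line is detectable.

```python
import hashlib
import json


def _digest(prev: str, event: dict) -> str:
    """Hash an entry body together with the previous entry's hash."""
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(lines: list[str], event: dict) -> list[str]:
    """Return a new log with one more line chained to the last one."""
    prev = json.loads(lines[-1])["hash"] if lines else ""
    entry = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    return lines + [json.dumps(entry, sort_keys=True)]


def chain_is_intact(lines: list[str]) -> bool:
    """Recompute every link; any edited, reordered, or removed line breaks it."""
    prev = ""
    for line in lines:
        entry = json.loads(line)
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log = append_entry([], {"action": "pack", "by": "alice"})
log = append_entry(log, {"action": "pack", "by": "bob"})
tampered = [log[0].replace("alice", "mallory"), log[1]]
```

Because such a check only warns, a broken chain in this model says something about custody and nothing about the content, which stays governed by the member hashes and the rootHash.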

When to use it

Use a .klz to move a whole project losslessly: backup and archival, seeding a fresh server or an offline desktop working copy, transferring a project between machines without a server, or building a deterministic test fixture. Use the interchange formats (XLIFF/PO, TMX, TBX) instead when content crosses into the wider localization industry — a translation vendor, an external CAT tool, or a TMS — and the lossy, standards-based subset is what the other side expects.

See also