The KLF family & the .klz package
KLF is the native serialization of one content atom — the translatable
block. A project owns more than blocks, though: a
translation memory and a termbase, plus stand-off annotations and media. The
KLF family gives each of those atoms a native, deterministic, lossless
format, and the .klz package bundles them into one portable artifact.
Two tiers: native vs interchange
For every atom there are two serializations, playing the same two roles KLF and XLIFF play for blocks (see KLF vs XLIFF):
| Atom | Native (lossless — pack, cache, hash) | Interchange (lossy — industry handoff) |
|---|---|---|
| blocks + targets | KLF — .klf | XLIFF / PO |
| stand-off annotations | KLF annotations — .klfl | — |
| translation memory | KLF-TM — kapi-tm-format | TMX |
| termbase | KLF-TB — kapi-termbase-format | TBX |
The native forms round-trip every field of the internal model. The interchange forms map only the standard, industry-portable subset — which is exactly why they cannot be used to pack a project losslessly:
- TMX drops the entity mappings (including the conceptId cross-link from a TM entity to a termbase concept), provenance origins and import sessions, per-entry properties, and notes.
- TBX drops the term source (terminology vs brand_vocabulary), the competitorTerm flag, and the extensible properties.
KLF-TM and KLF-TB preserve all of these. They share the KLF discipline: a kind
magic string, a MAJOR.MINOR schemaVersion with the same forward-compatibility
contract as KLF, and a deterministic serializer
(sorted records, no HTML escaping, trailing newline) so the bytes are stable for
content hashing and git diffing. KLF-TM even reuses the same Run model as KLF
blocks for its variant content, so inline codes, placeholders, and plural/select
constructs serialize identically.
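
The deterministic-serializer discipline can be illustrated with a minimal sketch. It assumes a JSON encoding and uses hypothetical field names (id, text) and a placeholder kind string; the real KLF-TM and KLF-TB schemas are defined by their own specifications, not by this example.

```python
import hashlib
import json


def serialize_records(kind: str, schema_version: str, records: list[dict]) -> bytes:
    """Serialize records to stable bytes: sorted records, sorted keys,
    literal (unescaped) text, and a trailing newline."""
    document = {
        "kind": kind,                     # magic string identifying the format
        "schemaVersion": schema_version,  # MAJOR.MINOR
        # Sort records by a stable key so input order never affects the bytes.
        # "id" is a hypothetical sort key chosen for this sketch.
        "records": sorted(records, key=lambda record: record["id"]),
    }
    text = json.dumps(
        document,
        sort_keys=True,         # deterministic key order inside every object
        ensure_ascii=False,     # keep non-ASCII text literal; Python's json
                                # never HTML-escapes "<" or "&" in any case
        separators=(",", ":"),  # no incidental whitespace
    )
    return (text + "\n").encode("utf-8")


records = [{"id": "b", "text": "<b>Hi</b>"}, {"id": "a", "text": "Olá"}]
first = serialize_records("example-kind", "1.0", records)
second = serialize_records("example-kind", "1.0", list(reversed(records)))

# Same content in a different input order yields the same bytes and hash.
print(first == second)
print(hashlib.sha256(first).hexdigest())
```

Because the output depends only on the content, the SHA-256 of the bytes can serve as a content identity, and a git diff shows only real changes.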
The .klz package
A .klz is a deterministic zip bundling a project's authoritative content,
one member per content type — the same content-type set the
sync protocol moves over the wire:
project.klz
├── manifest.json inventory: per-member sha256 + Merkle rootHash
├── blocks/<id>.klf KLF (blocks + targets)
├── annotations/<id>.klfl KLF annotations (stand-off overlays)
├── tm.klftm KLF-TM (translation memory)
├── termbase.klftb KLF-TB (terminology)
├── media/<name> opaque blobs
│
│ ── working-state members (AD-025 §5) ──
├── source/<name> ingested source document(s)
├── overlays.klfo in-progress overlays (targets, annotations, …)
└── history.jsonl advisory provenance log (opt-in; excluded from rootHash)
(A .klz also carries the project recipe in manifest.json: flows, plugins,
defaults, content. The recipe is kept out of the rootHash, so the package is a
runnable project in a file. Side-effecting recipe sections such as server: and
hooks: travel inert.)
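
What "deterministic zip" means in practice can be sketched as follows. This is an illustration with Python's standard zipfile module, not the klz reference implementation; the fixed timestamp, stored (uncompressed) members, and permission bits are assumptions chosen to make the bytes reproducible.

```python
import io
import zipfile

# Earliest timestamp the zip format can represent; any fixed value works.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_deterministic_zip(members: dict[str, bytes]) -> bytes:
    """Write members in sorted order with fixed metadata, so the same
    content always produces the same archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(members):                  # fixed member order
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_STORED   # no compressor-dependent bytes
            info.create_system = 3                    # same value on every platform
            info.external_attr = 0o644 << 16          # fixed permission bits
            archive.writestr(info, members[name])
    return buffer.getvalue()


package_a = build_deterministic_zip({"tm.klftm": b"tm\n", "manifest.json": b"{}\n"})
package_b = build_deterministic_zip({"manifest.json": b"{}\n", "tm.klftm": b"tm\n"})
print(package_a == package_b)
```

The sources of nondeterminism removed here are member order, modification times, and platform-dependent header fields.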
The same container serves two profiles, set by the manifest kind:
- kind: kapi-project is the whole project (all locales, full recipe, TM, termbase, overlays, source identity + skeletons). It is the snapshot / transport parcel, moved by pack/unpack.
- kind: kapi-interchange is a task-scoped bilingual slice for one locale pair (blocks, inline codes, segmentation, skeleton, TM/term context). This is neokapi's interchange format, sent to a translator/reviewer by extract and ingested by merge (see KLF vs XLIFF).
Both are parcels, not workspaces: day-to-day work happens in the ambient .kapi
project (AD-025 §7).
Because every member is a native lossless format, the whole package is lossless:
unpacking can seed a fresh translation memory, termbase, and block store, and the
project's regenerable caches (the block-store cache, sync hashes) rebuild
faithfully from it. The manifest's per-member SHA-256 and Merkle rootHash give
the package — and each member — a stable content identity; unpacking verifies
both before trusting the contents.
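
The verification step can be sketched as below. The manifest layout (members, rootHash keys) and the root construction are assumptions: this sketch hashes the sorted list of member hashes, a flat two-level tree, whereas the actual Merkle shape is defined by the klz implementation. The exclusion of history.jsonl from the root follows the member listing above.

```python
import hashlib

# Members that are advisory and do not contribute to the root hash.
EXCLUDED_FROM_ROOT = {"history.jsonl"}


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def root_hash(member_hashes: dict[str, str]) -> str:
    """Assumed construction: hash the sorted (name, hash) pairs."""
    digest = hashlib.sha256()
    for name in sorted(member_hashes):
        if name in EXCLUDED_FROM_ROOT:
            continue
        digest.update(f"{name}\0{member_hashes[name]}\n".encode("utf-8"))
    return digest.hexdigest()


def build_manifest(members: dict[str, bytes]) -> dict:
    hashes = {name: sha256_hex(data) for name, data in sorted(members.items())}
    return {"members": hashes, "rootHash": root_hash(hashes)}


def verify(manifest: dict, members: dict[str, bytes]) -> list[str]:
    """Return a list of problems; an empty list means the package checks out."""
    problems = []
    for name, expected in manifest["members"].items():
        data = members.get(name)
        if data is None:
            problems.append(f"missing member: {name}")
        elif sha256_hex(data) != expected:
            problems.append(f"hash mismatch: {name}")
    if root_hash(manifest["members"]) != manifest["rootHash"]:
        problems.append("rootHash mismatch")
    return problems


members = {"tm.klftm": b"tm\n", "termbase.klftb": b"tb\n", "history.jsonl": b"log\n"}
manifest = build_manifest(members)
print(verify(manifest, members))
```

A tampered member fails its own per-member check, while a tampered manifest entry fails the root check, so both levels are needed.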
What a package deliberately excludes
A .klz packs the source of truth, not the regenerable caches. It excludes
the block-store cache (blocks.db) and the sync hash cache, which rebuild on
demand, and it excludes secrets (the sync claim token). This is what makes the
package the at-rest twin of the sync wire format: packing is the sync converters
writing files instead of protobuf, and unpacking is the inverse.
Raw source bytes are also excluded by default: a .klz carries each source's
identity (path, format, content hash) + round-trip skeleton — enough to merge —
but not the originals, so it doesn't duplicate git-tracked source. Pass
--with-source to kapi extract / kapi pack to embed the raw bytes too (needed
to re-extract offline). And kapi pack refuses to write a content-less .klz
(a project with nothing extracted/translated yet) — like git bundle refusing an
empty bundle; share the .kapi recipe via git instead.
Working state: hand-off and resume
A .klz is not only an at-rest snapshot of finished content — it can also carry
in-progress working state, so work can stop, move between machines, and resume
where it left off. There are two routes, and neither needs per-step CLI verbs:
- .klz as an ad-hoc workspace (extract / transform / merge). No project required. The .klz is a portable bundle; the runtime is a persistent shadow cache keyed to the file, so transforms are fast and incremental and the .klz is rewritten only when you pack (or pass --pack). extract ingests sources (+ a recipe), running a tool/flow on the .klz transforms it in place, and merge emits the finished files. info shows whether the cache is dirty.

      kapi extract src/*.json -o work.klz --target-lang fr,qps   # ingest
      kapi ai-translate work.klz --target-lang fr                # transform (cache)
      kapi info work.klz                                         # dirty?
      kapi pack work.klz                                         # eject for sharing
      kapi merge work.klz -o l10n/                               # emit → l10n/<lang>/<name>

  A small recipe (target locales + output layout) travels with the file, so merge needs no flags. Transforming reuses work already done instead of recomputing it, and each document's work stays isolated.
- Project snapshot (pack / unpack). Inside a .kapi project the working state also includes the project TM and termbase. kapi pack snapshots all of it; kapi unpack rehydrates it into another machine's .kapi/ state dir. Within a project, re-running a flow also reuses the project's persistent cache (.kapi/cache/blocks.db), so resume there is simply running again.

      kapi pack -o snapshot.klz   # snapshot the project's working state
      kapi unpack snapshot.klz    # rehydrate elsewhere, then run to resume
The working-state members are source/<name> (the ingested documents) and
overlays.klfo (the in-progress overlays — targets/<locale>,
annotations/<name>, segmentation, … keyed by (kind, blockHash)). Overlays in a
multi-document package are scoped per source, since block ids are only unique
within one document. A recipe block in the manifest (not a content member, so
out of the rootHash) carries the workspace's target locales and output layout.
Progress is derived from content, not a journal
Because the block store is append-only and content-addressed, "has step X run?" is a pure function of the content: does X's overlay exist for the current block hashes? That is what makes cached resume correct — re-running is a no-op where the overlay is already present, and a source change re-hashes its block so only the affected work recomputes. There is no authoritative progress journal — it would be a second source of truth that could drift from the content (the dual-state footgun this codebase avoids).
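
The "pure function of the content" idea can be made concrete with a small sketch. The names here (block_hash, pending, run_step) and the in-memory dictionary are hypothetical stand-ins for the real block store; only the keying by (kind, blockHash) comes from the description above.

```python
import hashlib
from collections.abc import Callable


def block_hash(source_text: str) -> str:
    """Content address of a block: a hash of its source text."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def pending(blocks: list[str], overlays: dict[tuple[str, str], str], kind: str) -> list[str]:
    """Blocks whose current hash has no overlay of this kind yet."""
    return [text for text in blocks if (kind, block_hash(text)) not in overlays]


def run_step(
    blocks: list[str],
    overlays: dict[tuple[str, str], str],
    kind: str,
    transform: Callable[[str], str],
) -> int:
    """Compute missing overlays only; return how many blocks were processed."""
    todo = pending(blocks, overlays, kind)
    for text in todo:
        overlays[(kind, block_hash(text))] = transform(text)  # append-only write
    return len(todo)


overlays: dict[tuple[str, str], str] = {}
blocks = ["Hello", "Goodbye"]
first_run = run_step(blocks, overlays, "targets/fr", str.upper)   # processes both blocks
second_run = run_step(blocks, overlays, "targets/fr", str.upper)  # nothing left to do
blocks[1] = "Farewell"                                            # a source edit re-hashes one block
third_run = run_step(blocks, overlays, "targets/fr", str.upper)   # only the edited block
print(first_run, second_run, third_run)
```

No separate "step done" flag exists anywhere in the sketch: the presence of the overlay for the current hash is the only record of progress, so it cannot drift from the content.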
The one optional log is advisory provenance: kapi pack --log stamps a
hash-chained line into history.jsonl recording the pack (tamper-evident custody
for hand-off; unpack verifies the chain and warns if broken). It is strictly
subordinate to content — excluded from the package rootHash, never read to
decide anything, and safe to delete with no loss of work. A default pack (no
--log) is byte-deterministic.
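
A hash-chained log of this kind can be sketched as follows. The entry fields (prev, event, hash) and the all-zero genesis value are assumptions made for illustration; the actual history.jsonl line format is not specified here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the first entry


def entry_digest(prev: str, event: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(lines: list[str], event: dict) -> None:
    """Stamp one JSON line that links back to the previous line's hash."""
    prev = json.loads(lines[-1])["hash"] if lines else GENESIS
    entry = {"prev": prev, "event": event, "hash": entry_digest(prev, event)}
    lines.append(json.dumps(entry, sort_keys=True))


def chain_is_intact(lines: list[str]) -> bool:
    """Recompute every link; any edit, removal, or reordering breaks it."""
    prev = GENESIS
    for line in lines:
        entry = json.loads(line)
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_digest(entry["prev"], entry["event"]):
            return False
        prev = entry["hash"]
    return True


log: list[str] = []
append_entry(log, {"action": "pack", "by": "alice"})
append_entry(log, {"action": "pack", "by": "bob"})
print(chain_is_intact(log))
```

Since each pack stamps a new line, a package with a log differs from one without, which is consistent with the log being opt-in and kept out of the rootHash.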
When to use it
Use a .klz to move a whole project losslessly: backup and archival, seeding a
fresh server or an offline desktop working copy, transferring a project between
machines without a server, or building a deterministic test fixture. Use the
interchange formats (XLIFF/PO, TMX, TBX) instead when content crosses into
the wider localization industry — a translation vendor, an external CAT tool, or
a TMS — and the lossy, standards-based subset is what the other side expects.
See also
- AD-025: KLF Family and the .klz Package — the decision and rationale.
- Specification — the KLF block format the family is built around.
- KLF vs XLIFF — the native-vs-interchange split for blocks.
- Reference implementations: sievepen/klftm (KLF-TM), termbase/klftb (KLF-TB), klz (the package container).