Pre-translate with TM + termbase
You can run this pipeline ad hoc against a loose ./tm.db or a named TM, passing
the source and target languages as flags. A .kapi project is more convenient:
the recipe carries the languages and the project TM is resolved automatically, so
each release reuses the previous release's translations without repeating flags. Both forms
are shown below; see
Where TM and termbases live for which store
each command uses.
Goal: do the cheap, deterministic work before any machine or human translator touches the content. You're about to translate a batch of UI strings; your team has a translation memory from previous releases and an approved glossary. Pre-translation reuses what you already have, fills the rest with a visible placeholder, and pre-flags terminology problems — in seconds, with no API key.
The scenario
Your project has:
- messages_en.json: the English source strings.
- project.tmx: a translation memory from previous releases.
- glossary.csv: an approved bilingual glossary maintained by your terminology team.
The pipeline is three deterministic steps:
TM leverage fills exact and fuzzy matches from previous releases;
pseudo-translation covers everything the TM didn't, so no segment is silently
left blank; term-check pre-flags any terminology problems before content goes
out.
Step 1 — load your TM
Import the translation memory so leverage can read it. With no flag the import
lands in a local tm.db file (or the project's TM when run inside a project);
Where TM and termbases live covers selecting
a named TM with --name or an explicit file with --file.
kapi tm import project.tmx -s en -t fr
Verify it loaded:
kapi tm stats
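Before importing, it can help to confirm that the TMX actually contains units for the language pair you are about to pass with -s and -t. The sketch below is independent of kapi: it uses only Python's standard library to count translation units per target language, assuming a TMX 1.4 style layout (tu / tuv / seg elements, with the language in xml:lang or the older lang attribute). The function name count_tmx_pairs and the sample data are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_tmx_pairs(tmx_data, source_lang="en"):
    """Count translation units per target language for one source language.

    tmx_data may be str or bytes. Region subtags are dropped, so "fr-FR"
    is counted as "fr" (a simplification made for this sketch).
    """
    root = ET.fromstring(tmx_data)
    pairs = Counter()
    for tu in root.iter("tu"):
        langs = set()
        for tuv in tu.findall("tuv"):
            code = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            # Ignore variants whose segment is missing or empty.
            if code and seg is not None and "".join(seg.itertext()).strip():
                langs.add(code.lower().split("-")[0])
        if source_lang in langs:
            for lang in langs - {source_lang}:
                pairs[lang] += 1
    return pairs


# Made-up sample: two usable en/fr units and one with an empty target.
SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="demo" creationtoolversion="1" o-tmf="demo"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Save</seg></tuv>
        <tuv xml:lang="fr"><seg>Enregistrer</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Cancel</seg></tuv>
        <tuv xml:lang="fr-FR"><seg>Annuler</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Open</seg></tuv>
        <tuv xml:lang="fr"><seg></seg></tuv></tu>
  </body>
</tmx>"""

print(count_tmx_pairs(SAMPLE))  # Counter({'fr': 2})
```

For a real file, something like count_tmx_pairs(Path("project.tmx").read_bytes()) (with pathlib imported) should work. A count of zero for your target language suggests the import would add nothing useful for that pair.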
Step 2 — leverage the TM
Fill segments that have an exact or fuzzy match in the TM. With no flag,
tm-leverage reads the same TM you just imported into; pass --tm <name|path>
to leverage a different one. Exact matches are applied automatically; fuzzy
matches (70% similarity by default) are filled too, with the score recorded so
reviewers know which to double-check:
kapi tm-leverage messages_en.json \
  -o step1_tm.json \
  --source-lang en \
  --target-lang fr
Tune the thresholds with --fuzzy-threshold and --fill-target-threshold.
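To make the exact/fuzzy distinction concrete, here is a minimal sketch of threshold-based TM lookup. It is not kapi's matching algorithm: it assumes a plain character-similarity ratio from Python's difflib, and the names best_match and fuzzy_threshold are illustrative. Only the 70% default comes from the text above.

```python
from difflib import SequenceMatcher


def best_match(source, tm, fuzzy_threshold=0.70):
    """Return (target, score) for the closest TM entry.

    The target is None when even the best score is below the threshold,
    meaning the segment is a miss and stays untranslated.
    """
    best_target, best_score = None, 0.0
    for tm_source, tm_target in tm.items():
        if tm_source == source:
            score = 1.0  # exact match
        else:
            score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_target, best_score = tm_target, score
    if best_score < fuzzy_threshold:
        return None, best_score
    return best_target, best_score


# A tiny, made-up translation memory (source -> target).
tm = {
    "Save file": "Enregistrer le fichier",
    "Delete account": "Supprimer le compte",
}

for text in ("Save file", "Save files", "Reset password"):
    target, score = best_match(text, tm)
    print(f"{text!r}: target={target!r}, score={score:.2f}")
```

In this sketch "Save files" reuses the translation of "Save file" with a score below 1.0, which is exactly the kind of segment a reviewer should double-check, while "Reset password" falls under the threshold and is left for the next step.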
Step 3 — pseudo-translate the misses
Everything the TM didn't cover gets a locale-shaped placeholder, so untranslated strings are visible rather than silently empty:
kapi pseudo-translate step1_tm.json -o step2_translated.json
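The exact placeholder format kapi produces isn't shown on this page, so the following is only an illustration of the general pseudo-translation technique: swap letters for accented look-alikes, leave {placeholders} untouched, and bracket the result so truncation is easy to spot. The function name and the character mapping are assumptions made for the sketch.

```python
import re

# Map plain vowels to accented look-alikes so the text stays readable.
ACCENTS = str.maketrans("aeiouAEIOU", "àéîõüÀÉÎÕÜ")
# Format placeholders such as {name} must survive unchanged.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_translate(text):
    """Accent the letters, keep {placeholders} intact, and bracket the result."""
    parts = PLACEHOLDER.split(text)
    # With one capturing group, odd indexes hold the placeholders themselves.
    out = [p if i % 2 else p.translate(ACCENTS) for i, p in enumerate(parts)]
    return "[" + "".join(out) + "]"


print(pseudo_translate("Hello {name}"))  # [Héllõ {name}]
print(pseudo_translate("Save"))          # [Sàvé]
```

Because the output is obviously not real French, any string that reaches a build in this form is immediately recognizable as untranslated.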
Step 4 — pre-flag terminology
Run term-check over the result so terminology problems surface now, not after
a vendor returns the file:
kapi term-check step2_translated.json \
  --source-lang en \
  --target-lang fr \
  --termbase product-terms
(Import the glossary into product-terms first — see
Enforce terminology.)
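As a rough illustration of what a terminology check looks for, the sketch below flags segments whose source contains a glossary term but whose target lacks the approved translation. It is a simplification under stated assumptions, not kapi's term-check logic: it uses case-insensitive substring matching only, and the names check_terms, glossary, and segments are hypothetical.

```python
def check_terms(segments, glossary):
    """Return (segment_id, source_term, expected_target) for each missed term.

    segments maps an id to a (source, target) pair; glossary maps a source
    term to its approved target term.
    """
    issues = []
    for seg_id, (source, target) in segments.items():
        for src_term, tgt_term in glossary.items():
            in_source = src_term.lower() in source.lower()
            in_target = tgt_term.lower() in target.lower()
            if in_source and not in_target:
                issues.append((seg_id, src_term, tgt_term))
    return issues


# Made-up data for the example.
glossary = {"dashboard": "tableau de bord"}
segments = {
    "home.title": ("Open the dashboard", "Ouvrir le tableau de bord"),
    "home.hint": ("Your dashboard is empty", "Votre panneau est vide"),
}

print(check_terms(segments, glossary))
# [('home.hint', 'dashboard', 'tableau de bord')]
```

A production checker would also need to handle inflected forms, word boundaries, and forbidden terms, which plain substring matching does not.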
In a project
The ad-hoc steps above repeat --source-lang and --target-lang, plus any TM or
termbase selection, on the commands that need them. In a .kapi project you
declare those once and run the pipeline as a named flow:
defaults:
  source_language: en
  target_languages: [fr]
flows:
  pretranslate:
    steps:
      - tool: tm-leverage
      - tool: pseudo-translate
      - tool: term-check
kapi run pretranslate
kapi run discovers the nearest recipe, resolves the project TM and termbase
automatically, and applies the declared languages — no per-command flags. See
Create your first project.
Try it
The in-browser build pre-seeds a TM and a termbase. The rail tops up the TM from
project.tmx, then runs the three-step pipeline. Open the intermediate
step1_tm.json / step2_translated.json files in the pane to watch each step's
output feed the next.
The interactive embed and the recorded video both come from
walkthroughs/kapi-terminology-pretranslation.scene.yaml. Change the scene spec and regenerate; don't edit the generated output by hand.
Going further: AI for the gaps
Pseudo-translation makes the untranslated segments visible; it isn't a real
translation. To machine-translate the TM misses instead, swap step 3 for
kapi ai-translate (which needs a provider
and credentials), then keep the term-check step as your guardrail. See
AI Translation for provider setup.
When this approach fits
- An established glossary: even 50–100 key terms noticeably improve consistency.
- Recurring content: UI strings, docs, and copy that get re-translated each release, where TM leverage recycles the most.
- Multiple target languages: a single concept-oriented termbase keeps every language aligned, so the same pipeline runs per locale.
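For the multi-language case, one way to drive the ad-hoc commands per locale is to generate them from a list. The sketch below only builds the argument lists, using the subcommands and flags shown in the steps above; the per-locale output file names are hypothetical, and it assumes a TM for each language pair has already been imported.

```python
import shlex


def pretranslate_commands(source_file, locales, source_lang="en",
                          termbase="product-terms"):
    """Build the three pipeline commands for each target locale."""
    commands = []
    for locale in locales:
        # Hypothetical naming scheme: one pair of intermediate files per locale.
        leveraged = f"step1_tm.{locale}.json"
        translated = f"step2_translated.{locale}.json"
        commands += [
            ["kapi", "tm-leverage", source_file, "-o", leveraged,
             "--source-lang", source_lang, "--target-lang", locale],
            ["kapi", "pseudo-translate", leveraged, "-o", translated],
            ["kapi", "term-check", translated,
             "--source-lang", source_lang, "--target-lang", locale,
             "--termbase", termbase],
        ]
    return commands


# Print the commands for review instead of running them.
for cmd in pretranslate_commands("messages_en.json", ["fr", "de"]):
    print(shlex.join(cmd))
```

To execute rather than print, each list can be passed to subprocess.run(cmd, check=True) so the loop stops at the first failing step. Inside a .kapi project the named flow makes this wrapper unnecessary.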
Next
- kapi tm and kapi tm-leverage: command references.
- Where TM and termbases live: loose file, named store, or project store, and the flags that select each.
- Translation Memory: how TM matching works.
- Enforce terminology: the validation side of the same termbase.