
Pre-translate with TM + termbase

Works ad-hoc, optimized for a project

This pipeline runs ad-hoc against a loose ./tm.db or a named TM, with the source and target languages passed as flags. It works better inside a .kapi project: the recipe carries the languages and the project TM is resolved automatically, so each release reuses the previous release's translations without repeating flags. Both forms are shown below; see Where TM and termbases live for which store each command uses.

Goal: do the cheap, deterministic work before any machine or human translator touches the content. You're about to translate a batch of UI strings; your team has a translation memory from previous releases and an approved glossary. Pre-translation reuses what you already have, fills the rest with a visible placeholder, and pre-flags terminology problems — in seconds, with no API key.

The scenario

Your project has:

  • messages_en.json — the English source strings.
  • project.tmx — a translation memory from previous releases.
  • glossary.csv — an approved bilingual glossary maintained by your terminology team.

The pipeline is three deterministic steps:

TM leverage → pseudo-translate the misses → term-check

TM leverage fills exact and fuzzy matches from previous releases; pseudo-translation covers everything the TM didn't, so no segment is silently left blank; term-check pre-flags any terminology problems before content goes out.

Step 1 — load your TM

Import the translation memory so leverage can read it. With no flag the import lands in a local tm.db file (or the project's TM when run inside a project); Where TM and termbases live covers selecting a named TM with --name or an explicit file with --file.

kapi tm import project.tmx -s en -t fr

Verify it loaded:

kapi tm stats
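For orientation, a TMX file is XML: each translation unit (`tu`) holds one `tuv` variant per language, and each variant wraps its text in a `seg`. The sketch below is not how kapi imports a TM; it is a minimal standard-library reading of an inline sample. The sample strings and the `read_tmx_pairs` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; a real project.tmx would carry many more units.
SAMPLE_TMX = """<?xml version="1.0"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save changes</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrer les modifications</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Cancel</seg></tuv>
      <tuv xml:lang="fr"><seg>Annuler</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return {source segment: target segment} for one language pair."""
    root = ET.fromstring(tmx_text)
    pairs = {}
    for tu in root.iter("tu"):
        # Collect the segment text of each language variant in this unit.
        by_lang = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[tuv.get(XML_LANG, "").lower()] = "".join(seg.itertext())
        # Keep only units that carry both languages of the requested pair.
        if source_lang in by_lang and target_lang in by_lang:
            pairs[by_lang[source_lang]] = by_lang[target_lang]
    return pairs


pairs = read_tmx_pairs(SAMPLE_TMX, "en", "fr")
print(pairs)
```

Units that lack either language are skipped, which is why the same file can feed several language pairs.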

Step 2 — leverage the TM

Fill segments that have an exact or fuzzy match in the TM. With no flag, tm-leverage reads the same TM you just imported into; pass --tm <name|path> to leverage a different one. Exact matches are applied automatically; fuzzy matches (70% similarity by default) are filled too, with the score recorded so reviewers know which to double-check:

kapi tm-leverage messages_en.json \
  -o step1_tm.json \
  --source-lang en \
  --target-lang fr

Tune the thresholds with --fuzzy-threshold and --fill-target-threshold.
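To make the exact/fuzzy distinction concrete, here is a minimal sketch of threshold-based TM lookup. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; this page does not say how kapi scores matches, so treat the scoring, the `leverage` helper, and the sample TM as assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory TM: source segment -> previously approved target.
tm = {
    "Save changes": "Enregistrer les modifications",
    "Cancel": "Annuler",
}


def leverage(source, tm, fuzzy_threshold=0.70):
    """Return (target, score); target is None when nothing reaches the threshold."""
    # An exact match is applied as-is with a perfect score.
    if source in tm:
        return tm[source], 1.0
    # Otherwise keep the most similar TM entry and its score.
    best_target, best_score = None, 0.0
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_target, best_score = tm_target, score
    if best_score >= fuzzy_threshold:
        return best_target, best_score
    return None, best_score


exact = leverage("Cancel", tm)
fuzzy = leverage("Save your changes", tm)
miss = leverage("Delete account", tm)
```

Returning the score alongside the target mirrors the idea of recording it for reviewers: a fuzzy fill is a suggestion to double-check, not a finished translation.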

Step 3 — pseudo-translate the misses

Everything the TM didn't cover gets a locale-shaped placeholder, so untranslated strings are visible rather than silently empty:

kapi pseudo-translate step1_tm.json -o step2_translated.json
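As an illustration of what a placeholder of this kind can look like, the sketch below swaps some ASCII letters for accented look-alikes, wraps the result in brackets, leaves `{placeholders}` untouched, and fills only entries that have no target yet. The character map, the bracket style, and the entry shape are assumptions for the example; kapi's actual output may differ.

```python
import re

# Illustrative look-alike map; not kapi's actual character table.
ACCENTS = str.maketrans("aeiouAEIOUcn", "àéîöûÀÉÎÖÛçñ")
# Capturing group so re.split() keeps the placeholders in the result.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo(text):
    """Return a bracketed, accented copy of text with {placeholders} intact."""
    parts = PLACEHOLDER.split(text)
    # With a capturing group, placeholders land at the odd indices.
    out = [p if i % 2 else p.translate(ACCENTS) for i, p in enumerate(parts)]
    return "[" + "".join(out) + "]"


def fill_misses(entries):
    """Pseudo-translate only the entries whose target is still empty."""
    return [
        e if e["target"] else {**e, "target": pseudo(e["source"])}
        for e in entries
    ]


entries = [
    {"source": "Cancel", "target": "Annuler"},      # already filled from the TM
    {"source": "Hello {name}", "target": ""},       # TM miss
]
filled = fill_misses(entries)
```

The brackets make truncation easy to spot in a UI, and the accents make any string that skipped translation stand out at a glance.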

Step 4 — pre-flag terminology

Run term-check over the result so terminology problems surface now, not after a vendor returns the file:

kapi term-check step2_translated.json \
  --source-lang en \
  --target-lang fr \
  --termbase product-terms

(Import the glossary into product-terms first — see Enforce terminology.)
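The idea behind a terminology check can be sketched in a few lines: when a glossary source term appears in the source string, the approved target term should appear in the translation. The version below uses plain case-insensitive substring matching and a made-up glossary; it says nothing about the rules kapi applies, and a real checker would typically also deal with inflected forms.

```python
def check_terms(source, target, glossary):
    """Return (source_term, expected_target_term) pairs missing from target."""
    src, tgt = source.casefold(), target.casefold()
    return [
        (term, expected)
        for term, expected in glossary.items()
        # Flag only terms that occur in the source but not in the translation.
        if term.casefold() in src and expected.casefold() not in tgt
    ]


# Hypothetical glossary entries: English term -> approved French term.
glossary = {
    "dashboard": "tableau de bord",
    "workspace": "espace de travail",
}

issues = check_terms("Open the dashboard", "Ouvrir le panneau", glossary)
clean = check_terms("Open the dashboard", "Ouvrir le tableau de bord", glossary)
```

Here `issues` reports the missing approved term, while `clean` is empty because the translation uses it.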

In a project

The ad-hoc steps above repeat --source-lang and --target-lang, and name a TM or termbase, on each command that needs them. In a .kapi project you declare those once and run the pipeline as a named flow:

my-app.kapi
defaults:
  source_language: en
  target_languages: [fr]
flows:
  pretranslate:
    steps:
      - tool: tm-leverage
      - tool: pseudo-translate
      - tool: term-check

kapi run pretranslate

kapi run discovers the nearest recipe, resolves the project TM and termbase automatically, and applies the declared languages — no per-command flags. See Create your first project.

Try it

The in-browser build pre-seeds a TM and a termbase. The rail tops up the TM from project.tmx, then runs the three-step pipeline. Open the intermediate step1_tm.json / step2_translated.json files in the pane to watch each step's output feed the next.


The interactive embed and the recorded video both come from walkthroughs/kapi-terminology-pretranslation.scene.yaml. Change the scene spec and regenerate; don't edit the generated output by hand.

Going further: AI for the gaps

Pseudo-translation makes the untranslated segments visible; it isn't a real translation. To machine-translate the TM misses instead, swap step 3 for kapi ai-translate (which needs a provider and credentials), then keep the term-check step as your guardrail. See AI Translation for provider setup.

When this approach fits

  • An established glossary: even 50–100 key terms noticeably improve consistency.
  • Recurring content: UI strings, docs, and copy that get re-translated each release, where TM leverage recycles the most.
  • Multiple target languages: a single concept-oriented termbase keeps every language aligned, so the same pipeline runs per locale.
