Working with Inline Formatting
When documents are processed through the pipeline, neokapi preserves inline formatting like bold, italic, links, and embedded values like variables and placeholders. This is handled through the Run model: a block's content is a flat []Run sequence in which inline markup becomes typed inline-code runs, normalizing format-specific markup into a format-independent representation.
Seeing inline codes in the content model
When kapi reads a file with inline markup, the markup does not stay in the text.
Each inline element becomes an inline-code run — a PcOpen/PcClose pair
for paired tags, a Ph for self-closing tokens — sitting in the run sequence
between text runs. Below, kapi parses an HTML page; in the parsed source text the
inline <strong> and <a> elements appear as inline-code runs (rendered here as
chips) rather than as literal tags:
This is the same representation regardless of the source format: the inline-code
runs mark where inline formatting sits, while the original markup is held in the
run's Data field so the writer can reconstruct it exactly.
How It Works
neokapi extracts inline formatting from any supported format and represents it as inline-code runs carrying semantic metadata. This means <b> (HTML), ** (Markdown), and <w:b/> (DOCX) all produce a PcOpen/PcClose pair with the same semantic type (fmt:bold), enabling format-independent processing.
Each inline-code run carries metadata that editors can use to guide translators:
- Display text (
Disp): Human-readable labels (e.g.,[B]for bold,[/B]for bold close) - Constraints (
Constraints): Whether a code is deletable, cloneable, or reorderable - Equivalent text (
Equiv): Plain text fallback (e.g.,\nfor line breaks)
See Vocabularies for the full semantic type system.
Constraint Rules
Each inline code carries constraint metadata that defines what translators may do with it:
Flexible tags (like bold, italic, underline):
- Deletable: can be removed if the target language doesn't need them
- Cloneable: can be duplicated to apply formatting to more text
- Reorderable: can be rearranged in the sentence
Required elements (like line breaks, variables, placeholders):
- Non-deletable: must appear in the translation
- Non-cloneable: cannot be duplicated
- Often non-reorderable: position relative to other codes is fixed
Variables and placeholders (like {userName} or {count}):
- Non-deletable and non-cloneable: must be preserved exactly
- Reorderable: can be moved to match target language grammar
Editors can use these constraints to prevent invalid changes and provide real-time validation (e.g., warning about missing required tags or duplicate non-cloneable tags).
Editing text that carries inline codes
The constraints are not only advice for a human translator — they govern what
automated edits may do to a code when the surrounding text changes. A tool that
rewrites a block's text (a find/replace such as ksed, a
normalization or redaction transform) works on the text-only flattening of the
runs, in which inline codes have no width. It then re-anchors the codes that
survive. The rule, applied per code, is:
- A paired code (
PcOpen/PcClose) is treated as a span over the text. After the edit its endpoints are re-anchored. If the span still covers text it is kept, balanced — editing a word inside a bold span keeps the span around the new word. If the edit empties the span, a deletable span (bold, italic, a link) is removed rather than left as an empty<b></b>, and a non-deletable span is kept (empty) so it is never silently dropped. - A standalone code (
Ph,Sub) that sits in text the edit removes is dropped when deletable and kept (re-anchored to the edit boundary) when not — so a line break, a variable, or a subblock reference survives an edit that deletes the text around it. - Codes outside the edited range are unaffected.
Because the deletability of each code is resolved from the same vocabulary the QA
checks consult, an edit and a translation are held to one definition of which
codes are required. The framework primitive is model.ApplyTextEdits; it mirrors
the Okapi Framework, where each Code carries a deleteable flag and only the
deleteable ones may be dropped.
Format-Independent Processing
HTML Files
HTML inline elements (<b>, <a href="...">, <br/>) are extracted as inline-code runs — paired tags as PcOpen/PcClose, void elements as Ph. Block-level elements form Block boundaries.
Example:
Click here to visit our website for more information.
Markdown Files
Markdown emphasis (**, *, backticks, []()) maps to the same semantic types as HTML. **bold** and <b>bold</b> both resolve to fmt:bold.
Example:
Run
kapi initto set up your project. See the documentation for details.
JSON/YAML Localization Files
i18n variables like {userName} or {count} become Ph runs marked as non-deletable. They can be rearranged to match target language word order:
Hello {userName}, you have {count} new messages. Bonjour {userName}, vous avez {count} nouveaux messages.
XLIFF Exchange Files
XLIFF <pc> maps to a PcOpen/PcClose pair, <ph> to a Ph run, and <sc>/<ec> to a PcOpen/PcClose pair — the same run model, enabling consistent processing regardless of exchange format.