Moses Text format
Moses InlineText is the line-oriented format used by the Moses statistical
machine-translation system. Each line in the file is one text unit, which
makes the format a natural fit for line-aligned source/target corpora.
Inline codes are represented with XLIFF-flavoured markup: <g> paired tags,
<x/> placeholders, <bx/>/<ex/> paired open/close codes (all carrying
numeric ids), and <lb/> for line breaks within an entry.
neokapi reads and writes Moses InlineText. The reader walks the input
line by line — carriage return, line feed, and CRLF all act as line
separators — and emits each non-empty line as one translatable Block with
whitespace preserved; empty lines flow through as non-translatable data. It
also decodes the Moses inline markup, parsing <g>/<x/>/<bx/>/<ex/>
into inline codes and <lb/> into line breaks so the surrounding text
reaches the Block as translatable content while the markup survives as
opaque codes.
This format has no configurable parameters, and this reference lists no JSON schema for it.
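The line handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not neokapi's implementation: the `Unit` type and the `read_units` function are hypothetical names, "empty" is taken to mean a zero-length line, and a trailing separator at the end of the input is assumed not to produce an extra empty unit.

```python
import re
from dataclasses import dataclass

# Only CR, LF, and CRLF count as separators. CRLF is listed first so it is
# consumed as a single separator rather than as CR followed by LF.
_LINE_SEP = re.compile(r"\r\n|\r|\n")


@dataclass
class Unit:
    """Hypothetical stand-in for a Block (translatable) or a data part."""
    text: str
    translatable: bool


def read_units(text: str) -> list[Unit]:
    lines = _LINE_SEP.split(text)
    # Assumption: a trailing separator ends the last line and does not
    # start a new, empty one, so the final empty piece is dropped.
    if lines and lines[-1] == "":
        lines.pop()
    # Non-empty lines become translatable units with whitespace untouched;
    # zero-length lines pass through as non-translatable data.
    return [Unit(line, translatable=(line != "")) for line in lines]
```

`str.splitlines()` is deliberately avoided here because it also splits on characters such as form feed and U+2028, which the description above does not list as separators.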
Processing notes
One translatable Block per non-empty line; CR, LF, and CRLF all act as line separators and whitespace is preserved.
Decodes Moses inline markup: <g>, <x/>, <bx/>, <ex/> into inline codes and <lb/> into line breaks.
Empty lines flow through as non-translatable data.
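As a rough illustration of the inline-markup decoding noted above, the following Python sketch splits one line into text runs and codes. It is not neokapi's parser. The `tokenize` function and the token kinds are hypothetical, and the sketch assumes the numeric id is written as an `id="N"` attribute, which this reference does not spell out.

```python
import re

# Assumed tag shapes: <g id="1">, </g>, <x id="2"/>, <bx id="3"/>,
# <ex id="3"/>, and <lb/>. The id="N" attribute syntax is an assumption.
_TAG = re.compile(
    r'<(?P<close>/)?(?P<name>g|x|bx|ex|lb)'
    r'(?:\s+id="(?P<id>\d+)")?\s*(?P<empty>/)?>'
)


def tokenize(line):
    """Return (kind, value) pairs; kind is one of the hypothetical labels
    'text', 'open', 'close', 'placeholder', or 'linebreak'."""
    tokens = []
    pos = 0
    for m in _TAG.finditer(line):
        if m.start() > pos:
            # Text between codes stays translatable content.
            tokens.append(("text", line[pos:m.start()]))
        name = m.group("name")
        code_id = int(m.group("id")) if m.group("id") else None
        if name == "lb":
            tokens.append(("linebreak", None))
        elif name == "x":
            tokens.append(("placeholder", code_id))
        elif name == "ex" or m.group("close"):
            # <ex/> and </g> both close a previously opened code.
            tokens.append(("close", code_id))
        else:
            # <g ...> and <bx .../> both open a code.
            tokens.append(("open", code_id))
        pos = m.end()
    if pos < len(line):
        tokens.append(("text", line[pos:]))
    return tokens
```

Anything that does not match one of the assumed tag shapes, such as `<great>`, is left in the text run unchanged.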
Limitations
Each line is treated as an independent text unit; there is no document-level structure beyond line ordering.
Empty lines carry no translatable content and pass through as data.