
Moses Text format

Moses InlineText is the line-oriented format used by the Moses statistical machine-translation system. Each line in the file is one text unit, which makes the format a natural fit for line-aligned source/target corpora. Inline codes are represented with XLIFF-flavoured markup: <g> paired tags, <x/> placeholders, <bx/>/<ex/> paired open/close codes (all carrying numeric ids), and <lb/> for line breaks within an entry.
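As a rough illustration of the markup described above, the following Python sketch splits one line into text runs and inline codes. It is not neokapi's implementation: the `tokenize` function, the tuple layout, and the `id="N"` attribute syntax (borrowed from XLIFF conventions) are assumptions made for the example.

```python
import re

# Hypothetical pattern for the inline codes named above. Attribute handling is
# simplified to an optional id="N"; real files may carry other attributes.
INLINE_CODE = re.compile(r'<(/?)(g|x|bx|ex|lb)(?:\s+id="(\d+)")?\s*(/?)>')


def tokenize(line):
    """Split one line into ('text', str) and ('code', kind, name, id) tuples."""
    tokens = []
    pos = 0
    for match in INLINE_CODE.finditer(line):
        # Text between the previous code and this one stays translatable.
        if match.start() > pos:
            tokens.append(("text", line[pos:match.start()]))
        closing, name, code_id, _self_closing = match.groups()
        if name == "lb":
            kind = "linebreak"
        elif name == "x":
            kind = "placeholder"
        elif name == "bx":
            kind = "open"
        elif name == "ex":
            kind = "close"
        else:
            # <g> is the only tag with a separate closing form, </g>.
            kind = "close" if closing else "open"
        tokens.append(("code", kind, name, int(code_id) if code_id else None))
        pos = match.end()
    # Trailing text after the last code, if any.
    if pos < len(line):
        tokens.append(("text", line[pos:]))
    return tokens


if __name__ == "__main__":
    for token in tokenize('Click <g id="1">here</g> to continue.<lb/>'):
        print(token)
```

Keeping text and codes as separate tokens mirrors the idea that the text is translatable while the markup travels along as opaque codes.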

neokapi reads and writes Moses InlineText. The reader walks the input line by line, treating carriage return, line feed, and CRLF as line separators. It emits each non-empty line as one translatable Block with whitespace preserved; empty lines flow through as non-translatable data. It also decodes the Moses inline markup: <g>, <x/>, <bx/>, and <ex/> are parsed into inline codes and <lb/> into line breaks, so the surrounding text reaches the Block as translatable content while the markup survives as opaque codes. This reference defines no configurable JSON schema for the format.
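The line-splitting behaviour can be sketched in a few lines of Python. This is only an approximation of what the reader is described as doing: the `read_units` name and the dictionary layout are hypothetical, and two details are assumptions, namely that only zero-length lines count as empty and that a trailing separator does not open an extra empty line.

```python
import re

# CRLF is listed first so a "\r\n" pair counts as one separator, not two.
LINE_SEPARATOR = re.compile(r"\r\n|\r|\n")


def read_units(text):
    """Return one unit per line; empty lines are marked non-translatable."""
    lines = LINE_SEPARATOR.split(text)
    # Assumption: a final separator terminates the last line rather than
    # starting a new, empty one.
    if lines and lines[-1] == "":
        lines.pop()
    # Whitespace inside each line is kept exactly as it appears.
    return [{"translatable": line != "", "text": line} for line in lines]


if __name__ == "__main__":
    for unit in read_units("First line \r\n\rThird line\n"):
        print(unit)
```

An explicit pattern is used instead of `str.splitlines()` because the latter also splits on characters such as form feed and U+2028, which the description above does not list as separators.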

ID: mosestext
Source: Built-in
MIME Types: text/x-mosestext
Capabilities: Read + Write

This format has no configurable parameters.

Processing notes

  • One translatable Block per non-empty line; CR, LF, and CRLF all act as line separators and whitespace is preserved.

  • Decodes Moses inline markup: <g>, <x/>, <bx/>, and <ex/> become inline codes, and <lb/> becomes a line break.

  • Empty lines flow through as non-translatable data.

Limitations

  • Each line is treated as an independent text unit; there is no document-level structure beyond line ordering.

  • Empty lines carry no translatable content and pass through as data.
