Segmentation tool

The Segmentation tool splits a block's text into sentence-level segments. Segmentation determines the unit of translation and of translation-memory matching, so consistent segmentation is important for leverage and review. By default the tool segments source text using built-in SRX-style rules that handle common sentence boundaries and abbreviations; rules can also be loaded from an SRX file.

The number of segments produced is recorded on each block. Already-segmented text is left alone unless re-segmentation is requested. Target text can be segmented independently, with its own rules file.

ID: segmentation
Source: Built-in
Category: text-processing
Cardinality: monolingual
Tags: text-processing

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| credential | string | | Stored credential name for the llm engine |
| engine | string | | Segmenter backend: srx (rule-based; default), uax29 (Unicode baseline), llm (semantic chunks), or sat (ML model) |
| instruction | string | | Optional guidance for the llm engine |
| layer | string | | Segmentation overlay layer name; empty uses the engine's natural layer |
| model | string | | Model name for the llm or sat engine |
| overwriteSegmentation | boolean | false | Re-segment already-segmented blocks, replacing the previous segmentation |
| provider | string | | AI provider id for the llm engine |
| renumberCodes | boolean | false | Renumber inline code IDs when materializing segments to a bilingual format |
| satModel | string | | SaT model for the sat engine (e.g. sat-3l-sm) |
| segmentSource | boolean | true | Segment the source text |
| segmentTarget | boolean | false | Segment existing target text |
| sourceSrxPath | string | | Path to an SRX 2.0 rules file for source text (srx engine) |
| targetSrxPath | string | | Path to an SRX 2.0 rules file for target text (srx engine) |
| threshold | number | | Boundary probability threshold for the sat engine (0 = model default) |
| treatIsolatedCodesAsWhitespace | boolean | false | Treat isolated inline codes as whitespace during segmentation |
| trimLeadingWhitespace | boolean | true | Exclude leading whitespace from each segment span |
| trimTrailingWhitespace | boolean | true | Exclude trailing whitespace from each segment span |

These parameters can be configured interactively, and the resulting flow-step YAML copied, on the Tool Reference page.

Examples

Segment source text with default rules

Split source into sentences using the built-in rules.

segmentSource: true
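
Segment source text with the sat engine

The engine-specific parameters in the table can be combined in the same way as the SRX ones. The following is a minimal sketch, not a verified configuration: it assumes the sat engine is available in your installation and uses sat-3l-sm only because the parameter table gives it as an example model name. A threshold of 0 keeps the model's default boundary probability.

```yaml
# Sketch only: values are taken from the parameter table, not from a tested setup.
segmentSource: true
engine: sat            # ML-model backend instead of the default srx rules
satModel: sat-3l-sm    # example model name from the parameter table
threshold: 0           # 0 = use the model's default boundary threshold
```

The SRX path parameters are documented for the srx engine only, so they are left out here.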

Re-segment with a custom SRX file

Replace existing segmentation using project-specific rules.

segmentSource: true
overwriteSegmentation: true
sourceSrxPath: ./rules/segmentation.srx
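
Segment existing target text with its own rules

Target text can be segmented independently of the source, with a separate rules file. The following is a minimal sketch of such a step, assuming the blocks already contain target text and that the default srx engine is in use. The path ./rules/segmentation-target.srx is a hypothetical placeholder, not a file shipped with the tool.

```yaml
# Sketch only: the rules file path below is a hypothetical placeholder.
segmentSource: false   # leave source segmentation as it is
segmentTarget: true    # segment the existing target text
targetSrxPath: ./rules/segmentation-target.srx
```

If the target text was segmented by an earlier step, overwriteSegmentation: true would presumably also be needed, since already-segmented text is otherwise left alone.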

Processing notes

  • Operates on translatable blocks only; non-translatable blocks pass through unchanged.

  • The resulting segment count is written to a block property.

Limitations

  • The built-in rule set targets common Latin-script sentence boundaries; non-Latin scripts may need a custom SRX file.
