Segmentation tool

The Segmentation tool splits a block's text into sentence-level segments. Segmentation determines the unit of translation and of translation-memory matching, so consistent segmentation is important for leverage and review. By default the tool segments source text using built-in SRX-style rules that handle common sentence boundaries and abbreviations; rules can also be loaded from an SRX file.

The number of segments produced is recorded on each block. Already-segmented text is left alone unless re-segmentation is requested. Target text can be segmented independently, with its own rules file.

ID: segmentation

Source: Built-in

Category: text-processing

Cardinality: monolingual

Tags: text-processing

Parameters

Parameter	Type	Default	Description
`credential`	string		Stored credential name for the llm engine
`engine`	string		Segmenter backend: srx (rule-based; default), uax29 (Unicode baseline), llm (semantic chunks), or sat (ML model)
`instruction`	string		Optional guidance for the llm engine
`layer`	string		Segmentation overlay layer name; empty uses the engine's natural layer
`model`	string		Model name for the llm or sat engine
`overwriteSegmentation`	boolean	false	Re-segment already-segmented blocks, replacing the previous segmentation
`provider`	string		AI provider id for the llm engine
`renumberCodes`	boolean	false	Renumber inline code IDs when materializing segments to a bilingual format
`satModel`	string		SaT model for the sat engine (e.g. sat-3l-sm)
`segmentSource`	boolean	true	Segment the source text
`segmentTarget`	boolean	false	Segment existing target text
`sourceSrxPath`	string		Path to an SRX 2.0 rules file for source text (srx engine)
`targetSrxPath`	string		Path to an SRX 2.0 rules file for target text (srx engine)
`threshold`	number		Boundary probability threshold for the sat engine (0 = model default)
`treatIsolatedCodesAsWhitespace`	boolean	false	Treat isolated inline codes as whitespace during segmentation
`trimLeadingWhitespace`	boolean	true	Exclude leading whitespace from each segment span
`trimTrailingWhitespace`	boolean	true	Exclude trailing whitespace from each segment span

Configure these parameters interactively and copy the flow-step YAML on the Tool Reference.

Examples

Segment source text with default rules

Split source into sentences using the built-in rules.

segmentSource: true
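
Segment source text with the sat engine

A hedged sketch of switching from the default rule-based engine to the ML-based sat backend, using only parameters listed in the table above. The model name sat-3l-sm is the example value given for satModel, and a threshold of 0 keeps the model's default boundary probability. Any additional setup the sat engine may need is not described here.

```yaml
# Sketch only: select the sat engine instead of the default srx rules
engine: sat
segmentSource: true
satModel: sat-3l-sm   # example model name from the parameter table
threshold: 0          # 0 = use the model's default boundary threshold
```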

Re-segment with a custom SRX file

Replace existing segmentation using project-specific rules.

segmentSource: true
overwriteSegmentation: true
sourceSrxPath: ./rules/segmentation.srx
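
Segment existing target text with its own rules

A minimal sketch of segmenting target text independently of the source, as described in the overview. The path ./rules/target-segmentation.srx is hypothetical, and setting segmentSource to false is an assumption for the case where existing source segmentation should be left as it is.

```yaml
# Sketch only: segment the target side with a separate SRX file
segmentSource: false    # assumption: leave the source untouched
segmentTarget: true
targetSrxPath: ./rules/target-segmentation.srx   # hypothetical path
```

If the target blocks are already segmented, overwriteSegmentation: true would presumably be needed as well, since already-segmented text is otherwise left alone.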

Processing notes

Operates on translatable blocks only; non-translatable blocks pass through unchanged.
The resulting segment count is written to a block property.

Limitations

The built-in rule set targets common Latin-script sentence boundaries; non-Latin scripts may need a custom SRX file.
