
Repetition Analysis tool

The Repetition Analysis tool tracks source text as blocks stream through the pipeline and tags each block according to whether its source has been seen before. The first time a given source text appears it is marked as a first occurrence; subsequent identical occurrences are marked as repetitions. Each block also records a group key linking equal segments, the running count of occurrences, and its 1-based index within the group.

Source text is trimmed of surrounding whitespace before comparison. The output feeds scoping and pricing: repeated segments can be translated once and reused, so identifying them quantifies the leverage available from repetition.
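The tagging described above can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: `RepetitionTag`, `tag_repetitions`, and the field names are hypothetical, the group key is simply the trimmed text (the real tool may use a hash or another identifier), and the running count is assumed to equal the block's index at the moment it is tagged.

```python
from dataclasses import dataclass


@dataclass
class RepetitionTag:
    # All field names are hypothetical, chosen for illustration only.
    group_key: str   # links blocks whose trimmed source text is equal
    status: str      # "first" or "repetition"
    count: int       # running count of occurrences seen so far
    index: int       # 1-based position of this block within its group


def tag_repetitions(sources):
    """Tag each source text in stream order (case-sensitive comparison)."""
    seen = {}   # trimmed text -> number of occurrences so far
    tags = []
    for text in sources:
        key = text.strip()                 # trim surrounding whitespace
        seen[key] = seen.get(key, 0) + 1
        n = seen[key]
        status = "first" if n == 1 else "repetition"
        tags.append(RepetitionTag(group_key=key, status=status, count=n, index=n))
    return tags


tags = tag_repetitions(["Save", " Save ", "Open", "Save"])
for tag in tags:
    print(tag)
```

In this sketch the second and fourth blocks are tagged as repetitions of the first, because trimming makes `" Save "` equal to `"Save"`, while `"Open"` starts its own group.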

ID: repetition-analysis
Source: Built-in
Category: analysis
Cardinality: monolingual
Tags: analysis

Parameters

Parameter | Type | Default | Description
caseSensitive | boolean | true | Whether comparison is case-sensitive

Configure these parameters interactively and copy the flow-step YAML on the Tool Reference.

Examples

Case-insensitive repetition

Treat segments that differ only in case as repetitions.

caseSensitive: false
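To illustrate the effect of this setting, here is a small sketch of how a comparison key might be derived. The helper `comparison_key` is hypothetical, and using `str.casefold()` for the case-insensitive path is an assumption; the tool may normalize case differently.

```python
def comparison_key(source, case_sensitive=True):
    """Return the text used for equality checks (hypothetical helper)."""
    trimmed = source.strip()          # surrounding whitespace never matters
    if case_sensitive:
        return trimmed
    return trimmed.casefold()         # assumption: case folding ignores case


a, b = "Save Changes", "  SAVE CHANGES "

# With caseSensitive: true the two segments stay in separate groups.
same_when_sensitive = comparison_key(a, True) == comparison_key(b, True)

# With caseSensitive: false they fall into the same group.
same_when_insensitive = comparison_key(a, False) == comparison_key(b, False)

print(same_when_sensitive, same_when_insensitive)
```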

Processing notes

  • Operates on translatable blocks; non-translatable structure passes through unchanged.

  • Counts the source side only.

Limitations

  • Matches exact (trimmed) source text only; near-duplicates and fuzzy matches are not detected here.

  • Repetition state is tracked across the whole run; which occurrence is marked as the first depends on the order in which blocks pass through the pipeline.
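The run-wide state noted above can be sketched as a tracker that outlives any single document. This is an illustration under assumptions: `RunTracker` and `process` are hypothetical names, and the sketch assumes a run may feed more than one document through the same tool instance.

```python
class RunTracker:
    """Keeps occurrence counts for the lifetime of one run (hypothetical)."""

    def __init__(self):
        self._counts = {}   # trimmed source text -> occurrences so far

    def process(self, document):
        """Return a status per source text, in the order received."""
        statuses = []
        for source in document:
            key = source.strip()
            self._counts[key] = self._counts.get(key, 0) + 1
            statuses.append("first" if self._counts[key] == 1 else "repetition")
        return statuses


tracker = RunTracker()
doc_a = tracker.process(["OK", "Cancel"])
doc_b = tracker.process(["Cancel", "Help"])   # "Cancel" was already seen in doc_a
print(doc_a, doc_b)
```

If the two documents were processed in the opposite order, the "Cancel" in the other document would be the one marked as the first occurrence.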
