Encoding Convert tool

The Encoding Convert tool validates and normalizes block text against a target character encoding. It round-trips the text — encoding to the target charset and decoding back to UTF-8 — which surfaces and replaces characters the target cannot represent. The chosen encoding name is recorded in a block property so a downstream writer can emit the document in that encoding.
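The round trip described above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function name `round_trip` is hypothetical, and the `?` substitute comes from Python's `errors="replace"` handler, which may differ from the replacement the tool uses.

```python
def round_trip(text: str, encoding: str) -> str:
    """Encode to the target charset and decode back to a Unicode string."""
    # errors="replace" substitutes "?" for characters the codec cannot encode
    # (Python's behavior; the tool's replacement character is not documented here).
    encoded = text.encode(encoding, errors="replace")
    return encoded.decode(encoding)


original = "Prix : 5 € pour un café"
normalized = round_trip(original, "iso-8859-1")

print(normalized)              # Prix : 5 ? pour un café
print(normalized != original)  # True: the euro sign is not in ISO-8859-1
```

Comparing the normalized text with the original is a cheap way to tell whether anything was lost; the accented `é` survives because ISO-8859-1 can represent it.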

Before converting, the tool can decode escape sequences found in the input (numeric character references, HTML character entity references, and Java-style \uXXXX escapes) so that the real characters, rather than their escaped forms, are evaluated against the target encoding. The targetEncoding parameter is required; targetLocale is also required when applyTarget is enabled.
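As a rough illustration of the unescaping step, the sketch below decodes the three escape styles with one switch each, loosely mirroring unescapeNCR, unescapeCER, and unescapeJava. The function `unescape_input` and its keyword arguments are hypothetical names, and the decoding order is an arbitrary choice made for this example, not something the tool documents.

```python
import re
from html.entities import name2codepoint

# Patterns for the three escape styles named above.
_NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")
_CER = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
_JAVA = re.compile(r"\\u([0-9a-fA-F]{4})")


def _decode_ncr(match):
    code = int(match.group(1), 16) if match.group(1) else int(match.group(2))
    # Leave out-of-range references untouched rather than raising.
    return chr(code) if code <= 0x10FFFF else match.group(0)


def _decode_cer(match):
    code = name2codepoint.get(match.group(1))
    # Unknown entity names are left as written.
    return chr(code) if code is not None else match.group(0)


def _decode_java(match):
    # Note: surrogate pairs written as two \uXXXX escapes are not combined here.
    return chr(int(match.group(1), 16))


def unescape_input(text, ncr=True, cer=True, java=True):
    """Decode each enabled escape style in a single pass."""
    if ncr:
        text = _NCR.sub(_decode_ncr, text)
    if cer:
        text = _CER.sub(_decode_cer, text)
    if java:
        text = _JAVA.sub(_decode_java, text)
    return text


print(unescape_input(r"caf&eacute; &#225; &#xE1; \u00e9"))  # café á á é
```

Decoding first matters for validation: the literal text `&#8364;` is plain ASCII and fits any encoding, while the euro sign it stands for does not fit ISO-8859-1.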

ID: encoding-convert
Source: Built-in
Category: text-processing
Cardinality: monolingual

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| applySource | boolean | false | Apply encoding conversion to source text |
| applyTarget | boolean | true | Apply encoding conversion to target text |
| escapeAll | boolean | false | Escape all extended (non-ASCII) characters in output |
| reportUnsupported | boolean | true | Report characters not supported by the target encoding |
| targetEncoding | string | | Target encoding name (e.g. utf-8, iso-8859-1, or shift-jis) |
| targetLocale | string | | Target locale for processing |
| unescapeCER | boolean | true | Unescape HTML character entity references (e.g. &aacute;) when reading input |
| unescapeJava | boolean | true | Unescape Java-style \uXXXX escape sequences when reading input |
| unescapeNCR | boolean | true | Unescape numeric character references (e.g. &#225;) when reading input |

Configure these parameters interactively and copy the flow-step YAML on the Tool Reference.
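For readers who generate configurations programmatically, the defaults and the two requirement rules can be modeled roughly as follows. The class `EncodingConvertParams` and its `validate` method are hypothetical and exist only for this sketch; checking the encoding name with Python's `codecs.lookup` is also an assumption, since the tool may recognize a different set of encoding names.

```python
import codecs
from dataclasses import dataclass


@dataclass
class EncodingConvertParams:
    # Defaults copied from the parameter table; snake_case names are illustrative.
    target_encoding: str = ""
    target_locale: str = ""
    apply_source: bool = False
    apply_target: bool = True
    escape_all: bool = False
    report_unsupported: bool = True
    unescape_cer: bool = True
    unescape_java: bool = True
    unescape_ncr: bool = True

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look usable."""
        problems = []
        if not self.target_encoding:
            problems.append("targetEncoding is required")
        else:
            try:
                # Assumption: Python's codec registry stands in for the tool's own lookup.
                codecs.lookup(self.target_encoding)
            except LookupError:
                problems.append(f"unknown encoding: {self.target_encoding}")
        if self.apply_target and not self.target_locale:
            problems.append("targetLocale is required when applyTarget is true")
        return problems


print(EncodingConvertParams(target_encoding="iso-8859-1", target_locale="fr-FR").validate())  # []
```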

Examples

Normalize targets to ISO-8859-1

Validate target text against Latin-1 before writing.

targetEncoding: iso-8859-1
targetLocale: fr-FR

Decode entities, then normalize to Shift-JIS

Unescape HTML entities in input and convert to a Japanese encoding.

targetEncoding: shift-jis
targetLocale: ja-JP
unescapeCER: true

Processing notes

  • Operates on translatable blocks only; non-translatable blocks pass through unchanged.

  • The target encoding name is written to a block property for downstream writers.

Limitations

  • Conversion validates representability by round-tripping through the encoding; characters the encoding cannot represent are replaced.
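Because replacement is lossy, it can help to find the affected characters before running the tool. The following check is a minimal sketch, assuming Python's codecs match the target encoding closely enough; `find_unrepresentable` is a hypothetical helper and not part of the tool.

```python
def find_unrepresentable(text: str, encoding: str) -> list[tuple[int, str, str]]:
    """List (index, character, code point) for characters the encoding rejects."""
    hits = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)  # strict mode raises on unsupported characters
        except UnicodeEncodeError:
            hits.append((index, ch, f"U+{ord(ch):04X}"))
    return hits


for index, ch, code_point in find_unrepresentable("naïve 日本語", "iso-8859-1"):
    print(index, ch, code_point)
```

An empty result means the text should survive the round trip unchanged under that codec.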
