HTML format (.html, .htm, .xhtml)

The HTML format reads HTML documents, extracts translatable text and localizable attributes, and writes the translations back while preserving the surrounding markup. Inline elements (such as b, i, a) become inline codes within a block, so formatting and links survive translation.

The reader ships with sensible defaults for which elements hold translatable text, which are inline, and which attributes (such as alt and title) are localizable. The elements and attributes maps let you override or extend those rules per element and per attribute, mirroring the okf_html bridge configuration. Parser behaviour (whitespace handling) is grouped under parser.

ID: html
Source: Built-in
Extensions: .html, .htm, .xhtml
MIME types: text/html, application/xhtml+xml
Capabilities: Read + Write

How kapi reads it

Parameters

  • attributes (object): Global attribute extraction rules. Maps attribute names to their rule configuration (ruleTypes, allElementsExcept, onlyTheseElements, conditions).

  • codeFinderRules (array): Regex patterns that match inline codes within translatable text.

  • elements (object): Element extraction rules. Maps element names to their rule configuration (ruleTypes, conditions, idAttributes, translatableAttributes).

  • parser (object): Settings that control how the HTML parser reads input.

  • useCodeFinder (boolean, default false): Enables regex-based detection of inline codes (placeholders, variables, tags) within translatable text.

You can configure these parameters interactively and copy the generated YAML from the Format Reference.

Examples

Preserve whitespace

Keep significant whitespace in text nodes instead of collapsing it.

parser:
  preserveWhitespace: true
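
Protect placeholders as inline codes

A minimal sketch that turns on the code finder so numbered placeholders such as {0} and printf-style %s or %d are kept as inline codes instead of translatable text. Because useCodeFinder defaults to false, the sketch enables it alongside the rules. It assumes each codeFinderRules entry is a plain regex string; the exact entry format is not specified on this page, so verify it against the Format Reference.

# Assumption: each rule is a bare regex string (single-quoted so backslashes stay literal)
useCodeFinder: true
codeFinderRules:
  - '\{\d+\}'
  - '%[sd]'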

Make an additional element translatable

Add a TEXTUNIT rule so that the text of <summary> elements (the visible caption of an HTML <details> disclosure widget) is extracted.

elements:
  summary:
    ruleTypes:
      - TEXTUNIT
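
Translate an attribute on a specific element

A sketch that extracts the aria-label attribute of <button> elements as translatable text while the button itself stays inline. The INLINE rule type name is borrowed from the okf_html configuration this format mirrors; treat it, and the choice of aria-label, as assumptions to check against the Format Reference.

elements:
  button:
    ruleTypes:
      - INLINE          # assumption: okf_html rule type name
    translatableAttributes:
      - aria-label      # illustrative attribute to extract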

Processing notes

  • Inline elements become inline codes within blocks; block-level elements form the surrounding structure.

  • Localizable attributes (such as alt and title) are extracted as their own translatable units.
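
For an attribute the defaults do not cover, an entry in the attributes map can add one. This sketch extracts a hypothetical data-tooltip attribute, but only on a and span elements. ATTRIBUTE_TRANS is the okf_html rule type for translatable attributes; that kapi accepts the same name is an assumption.

attributes:
  data-tooltip:         # hypothetical custom attribute
    ruleTypes:
      - ATTRIBUTE_TRANS # assumption: okf_html rule type name
    onlyTheseElements:
      - a
      - span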

Limitations

  • The reader applies built-in element/attribute defaults; the elements and attributes maps adjust them rather than replacing the entire rule set.