Vocabularies

A vocabulary is the semantic type system that gives meaning to inline codes. When a reader lifts an inline element out of the text into a span, it assigns that span a semantic type from a vocabulary — fmt:bold, link:hyperlink, code:variable, and so on. The vocabulary entry says what the type means, how it should be rendered and labeled, and what a translator is allowed to do with it. This is the layer that makes inline handling format-independent: <b> (HTML), ** (Markdown), and <w:b/> (DOCX) all resolve to the same fmt:bold type, so everything downstream treats them identically.
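As a minimal sketch of that resolution step, a reader could keep a lookup from its native constructs to shared type identifiers. The table and the `resolve_type` function are hypothetical names used for illustration only, not the framework's API; only the type identifiers and native constructs come from the text above.

```python
# Hypothetical lookup table: (format, native construct) -> semantic type.
# The names NATIVE_TO_TYPE and resolve_type are illustrative assumptions.
NATIVE_TO_TYPE = {
    ("html", "<b>"): "fmt:bold",
    ("markdown", "**"): "fmt:bold",
    ("docx", "<w:b/>"): "fmt:bold",
}


def resolve_type(fmt: str, construct: str) -> str:
    """Return the semantic type registered for a native inline construct."""
    try:
        return NATIVE_TO_TYPE[(fmt, construct)]
    except KeyError:
        raise ValueError(
            f"no semantic type registered for {construct!r} in {fmt}"
        ) from None


# Three different native constructs collapse to a single semantic type.
resolved = {
    resolve_type("html", "<b>"),
    resolve_type("markdown", "**"),
    resolve_type("docx", "<w:b/>"),
}
print(resolved)  # {'fmt:bold'}
```

Downstream code only ever sees `fmt:bold`, which is what lets it ignore the source format.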

What a semantic type carries

Each type maps a span to a consistent set of metadata:

| Layer | What it provides | Example |
|---|---|---|
| Category | Logical grouping | formatting, code, structure |
| Label | Human-readable name | Bold, Variable |
| HTML rendering | Preview output | <b>, </b> |
| Display text | Editor chip label | [B], [/B] |
| Color scheme | Visual styling | Blue for bold, orange for variables |
| Constraints | Editing rules | Deletable, cloneable, reorderable |
| Text equivalent | Plain text fallback | \n for line breaks |
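One way to picture a vocabulary entry is as a small immutable record with one field per layer. The sketch below is an assumption about shape only: the class names, field names, and helper functions are hypothetical and are not taken from the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    deletable: bool
    cloneable: bool
    reorderable: bool


@dataclass(frozen=True)
class SemanticType:
    """Hypothetical record holding the metadata layers for one type."""
    type_id: str
    category: str
    label: str
    html_open: str
    html_close: str
    display_open: str
    display_close: str
    color: str
    constraints: Constraints
    text_equivalent: str = ""  # plain-text fallback, empty if none


BOLD = SemanticType(
    type_id="fmt:bold",
    category="formatting",
    label="Bold",
    html_open="<b>",
    html_close="</b>",
    display_open="[B]",
    display_close="[/B]",
    color="blue",
    # Deletable and cloneable follow the text; reorderable=True is an
    # assumption made for this example.
    constraints=Constraints(deletable=True, cloneable=True, reorderable=True),
)


def render_preview(entry: SemanticType, text: str) -> str:
    """Wrap text in the entry's HTML rendering for a preview."""
    return f"{entry.html_open}{text}{entry.html_close}"


def render_chip(entry: SemanticType, text: str) -> str:
    """Wrap text in the entry's editor chip labels."""
    return f"{entry.display_open}{text}{entry.display_close}"


print(render_preview(BOLD, "Save"))  # <b>Save</b>
print(render_chip(BOLD, "Save"))     # [B]Save[/B]
```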

The constraints are the part that matters most for correctness. They encode what a translator may do with a code:

  • Deletable — may the code be removed? Formatting like bold is deletable; required elements like line breaks, variables, and placeholders are not.
  • Cloneable — may the code be duplicated? Bold can be applied to more text; a variable must not be repeated.
  • Reorderable — may the code move relative to others? A variable can move to match target word order; a fixed structural code may not.

Editors and QA checks read these constraints to prevent invalid changes — blocking deletion of a required tag, flagging a duplicated variable, or warning about a missing code — without knowing anything about the source file format.
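A QA check of this kind can be sketched by comparing how often each semantic type occurs in the source and in the target. Everything below is illustrative: the `CONSTRAINTS` table, the `check_codes` function, and the strict fallback for unknown types are assumptions, and the reorderable rule is left out because checking it needs the identity of individual codes, not just their types.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    deletable: bool
    cloneable: bool
    reorderable: bool


# Hypothetical constraint table keyed by semantic type.
CONSTRAINTS = {
    "fmt:bold": Constraints(deletable=True, cloneable=True, reorderable=True),
    "code:variable": Constraints(deletable=False, cloneable=False, reorderable=True),
}

# Assumption: unknown types get the strictest rules.
STRICT = Constraints(deletable=False, cloneable=False, reorderable=False)


def check_codes(source: list[str], target: list[str]) -> list[str]:
    """Report deletions and duplications that the constraints forbid."""
    src, tgt = Counter(source), Counter(target)
    issues = []
    for type_id in sorted(set(src) | set(tgt)):
        rules = CONSTRAINTS.get(type_id, STRICT)
        if tgt[type_id] < src[type_id] and not rules.deletable:
            issues.append(f"missing required code: {type_id}")
        if tgt[type_id] > src[type_id] and not rules.cloneable:
            issues.append(f"duplicated code: {type_id}")
    return issues


source = ["fmt:bold", "code:variable"]
print(check_codes(source, ["code:variable"]))  # [] (dropping bold is allowed)
print(check_codes(source, ["fmt:bold"]))       # ['missing required code: code:variable']
print(check_codes(source, ["code:variable", "code:variable"]))
# ['duplicated code: code:variable']
```

The check never looks at native markup, so the same function would apply to codes lifted from any source format.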

Layered vocabularies

Vocabularies are layered: a vocabulary can extend another, inheriting its types and adding or overriding its own. The framework ships a base vocabulary of the types common to all formats (bold, italic, underline, code, hyperlink, image, line break) plus extensions for HTML-rich content (strikethrough, sub/superscript, highlight, ruby, footnotes) and for code tokens and i18n placeholders (variables, placeholders, functions, generic markup). A format reader maps its native constructs to these shared types; an application can layer a domain-specific vocabulary on top when it needs types the built-ins do not cover.
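The layering can be sketched as a chain of lookups in which each vocabulary consults its own types first and then falls back to the one it extends. The `Vocabulary` class, the vocabulary names, the `app:glossary-term` type, and the overridden label are all hypothetical; the framework's actual extension mechanism may differ.

```python
from typing import Optional


class Vocabulary:
    """Hypothetical layered vocabulary: own types first, then the parent's."""

    def __init__(self, name: str, types: dict[str, dict],
                 parent: Optional["Vocabulary"] = None) -> None:
        self.name = name
        self._types = dict(types)
        self.parent = parent

    def lookup(self, type_id: str) -> dict:
        """Return the nearest definition of type_id along the chain."""
        vocab: Optional[Vocabulary] = self
        while vocab is not None:
            if type_id in vocab._types:
                return vocab._types[type_id]
            vocab = vocab.parent
        raise KeyError(type_id)


# Base layer: types common to all formats.
base = Vocabulary("base", {
    "fmt:bold": {"label": "Bold"},
    "link:hyperlink": {"label": "Hyperlink"},
})

# Extension layer: code tokens and i18n placeholders.
code_tokens = Vocabulary("code-tokens", {
    "code:variable": {"label": "Variable"},
}, parent=base)

# Application layer: adds one type and overrides an inherited one.
app = Vocabulary("app", {
    "app:glossary-term": {"label": "Glossary term"},  # made-up type
    "fmt:bold": {"label": "Strong emphasis"},         # override
}, parent=code_tokens)

print(app.lookup("code:variable")["label"])  # Variable
print(app.lookup("fmt:bold")["label"])       # Strong emphasis
print(base.lookup("fmt:bold")["label"])      # Bold
```

Overriding in the application layer leaves the base definition untouched, so other consumers of the base vocabulary are unaffected.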

Why this enables format-independent reuse

Because every format reduces inline codes to the same semantic types, content becomes comparable across formats. The same vocabulary that drives editor rendering also feeds translation-memory matching: an entry created from an HTML source can match a Markdown source because both reduce to the same structural projection. One classification serves preview, validation, and reuse.
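The text does not define the structural projection precisely, so the following is only one possible reading: keep the text and the semantic type of each code, and discard the native markup. The `Part` tuple and `projection` function are hypothetical names for this sketch.

```python
from typing import NamedTuple


class Part(NamedTuple):
    kind: str         # "text", "open", or "close"
    value: str        # text content, or the semantic type id for a code
    native: str = ""  # original markup; format-specific, ignored below


def projection(parts: list[Part]) -> str:
    """Serialize a segment using semantic types only, dropping native markup."""
    out = []
    for part in parts:
        if part.kind == "text":
            out.append(part.value)
        elif part.kind == "open":
            out.append(f"<{part.value}>")
        else:
            out.append(f"</{part.value}>")
    return "".join(out)


# The same sentence as it might be lifted from HTML and from Markdown.
html_segment = [
    Part("text", "Click "),
    Part("open", "fmt:bold", "<b>"),
    Part("text", "here"),
    Part("close", "fmt:bold", "</b>"),
]
markdown_segment = [
    Part("text", "Click "),
    Part("open", "fmt:bold", "**"),
    Part("text", "here"),
    Part("close", "fmt:bold", "**"),
]

print(projection(html_segment))  # Click <fmt:bold>here</fmt:bold>
print(projection(html_segment) == projection(markdown_segment))  # True
```

Under this reading, a translation-memory entry keyed on the projection would match both segments even though their native markup differs.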