Vocabularies
A vocabulary is the semantic type system that gives meaning to inline codes.
When a format reader lifts an inline element out of the text into a
span, it assigns that span a semantic type from a
vocabulary, such as fmt:bold, link:hyperlink, or code:variable. The
vocabulary entry says what the type means, how it should be rendered and labeled,
and what a translator is allowed to do with it. This is the layer that makes
inline handling format-independent: <b> (HTML), ** (Markdown), and <w:b/>
(DOCX) all resolve to the same fmt:bold type, so everything downstream treats
them identically.
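The resolution step can be sketched as a simple lookup. This is a minimal illustration, not the framework's API: the table `NATIVE_TO_TYPE` and the function `resolve_type` are hypothetical names, and only the three bold mappings come from the text above.

```python
# Hypothetical lookup table: (format, native construct) -> semantic type.
# Only the fmt:bold mappings are taken from the text; the structure is assumed.
NATIVE_TO_TYPE = {
    ("html", "<b>"): "fmt:bold",
    ("markdown", "**"): "fmt:bold",
    ("docx", "<w:b/>"): "fmt:bold",
}


def resolve_type(fmt: str, native: str) -> str:
    """Return the semantic type for a native inline construct."""
    try:
        return NATIVE_TO_TYPE[(fmt, native)]
    except KeyError:
        raise ValueError(f"no vocabulary mapping for {native!r} in {fmt}") from None


# All three native constructs collapse to a single semantic type.
types = {resolve_type(fmt, native) for fmt, native in NATIVE_TO_TYPE}
print(types)  # {'fmt:bold'}
```

Downstream code only ever sees `fmt:bold`, which is why it does not need a branch per source format.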
What a semantic type carries
Each type maps a span to a consistent set of metadata:
| Layer | What it provides | Example |
|---|---|---|
| Category | Logical grouping | formatting, code, structure |
| Label | Human-readable name | Bold, Variable |
| HTML rendering | Preview output | <b>, </b> |
| Display text | Editor chip label | [B], [/B] |
| Color scheme | Visual styling | Blue for bold, orange for variables |
| Constraints | Editing rules | Deletable, cloneable, reorderable |
| Text equivalent | Plain text fallback | \n for line breaks |
The constraints are the part that matters most for correctness. They encode what a translator may do with a code:
- Deletable — may the code be removed? Formatting like bold is deletable; required elements like line breaks, variables, and placeholders are not.
- Cloneable — may the code be duplicated? Bold can be applied to more text; a variable must not be repeated.
- Reorderable — may the code move relative to others? A variable can move to match target word order; a fixed structural code may not.
Editors and QA checks read these constraints to prevent invalid changes — blocking deletion of a required tag, flagging a duplicated variable, or warning about a missing code — without knowing anything about the source file format.
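A QA check of this kind might look like the sketch below. Everything here is assumed for illustration: the `Constraints` class, the `CONSTRAINTS` table, and `check_codes` are hypothetical names, and the reorderable value for bold is a guess because the text does not state it. Reorder checking is left out to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    deletable: bool
    cloneable: bool
    reorderable: bool


# Hypothetical constraint table. Deletable/cloneable values follow the
# examples in the text; reorderable for fmt:bold is an assumption.
CONSTRAINTS = {
    "fmt:bold": Constraints(deletable=True, cloneable=True, reorderable=True),
    "code:variable": Constraints(deletable=False, cloneable=False, reorderable=True),
}


def check_codes(source: list[str], target: list[str]) -> list[str]:
    """Compare code counts in source and target and report constraint violations."""
    issues = []
    src, tgt = Counter(source), Counter(target)
    for code in sorted(src.keys() | tgt.keys()):
        rules = CONSTRAINTS[code]
        if tgt[code] < src[code] and not rules.deletable:
            issues.append(f"missing required code: {code}")
        if tgt[code] > src[code] and not rules.cloneable:
            issues.append(f"duplicated code: {code}")
    return issues


source = ["fmt:bold", "code:variable"]
print(check_codes(source, ["code:variable"]))                   # []
print(check_codes(source, ["fmt:bold"]))                        # ['missing required code: code:variable']
print(check_codes(source, ["code:variable", "code:variable"]))  # ['duplicated code: code:variable']
```

The check works only on semantic types and their constraints, so the same function applies whether the segment came from HTML, Markdown, or DOCX.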
Layered vocabularies
Vocabularies are layered: a vocabulary can extend another, inheriting its
types, adding new ones, and overriding inherited entries. The framework ships a base vocabulary of
the types common to all formats (bold, italic, underline, code, hyperlink,
image, line break) plus extensions for HTML-rich content (strikethrough,
sub/superscript, highlight, ruby, footnotes) and for code tokens and i18n
placeholders (variables, placeholders, functions, generic markup). A format
reader maps its native constructs to these shared types; an application can layer
a domain-specific vocabulary on top when it needs types the built-ins do not
cover.
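One way to picture the layering is a chain of lookups from the most specific vocabulary down to the base. The sketch below is an assumption about how such a chain could behave, not the framework's implementation: the `Vocabulary` class, its `extends` field, and the type ids `fmt:strikethrough` and `legal:clause-ref` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Vocabulary:
    name: str
    types: dict[str, dict]
    extends: Vocabulary | None = None  # parent vocabulary, if any

    def lookup(self, type_id: str) -> dict:
        """Return the nearest definition of type_id, walking up the chain."""
        vocab: Vocabulary | None = self
        while vocab is not None:
            if type_id in vocab.types:
                return vocab.types[type_id]
            vocab = vocab.extends
        raise KeyError(type_id)


# Base layer: types common to all formats.
base = Vocabulary("base", {
    "fmt:bold": {"label": "Bold"},
    "link:hyperlink": {"label": "Hyperlink"},
})
# Extension layer: adds an HTML-rich type (id is hypothetical).
html_rich = Vocabulary("html-rich", {
    "fmt:strikethrough": {"label": "Strikethrough"},
}, extends=base)
# Application layer: overrides one inherited entry and adds a domain type.
app = Vocabulary("app", {
    "fmt:bold": {"label": "Strong emphasis"},
    "legal:clause-ref": {"label": "Clause reference"},
}, extends=html_rich)

print(app.lookup("fmt:bold")["label"])        # Strong emphasis
print(app.lookup("link:hyperlink")["label"])  # Hyperlink
print(html_rich.lookup("fmt:bold")["label"])  # Bold
```

The override in the application layer affects only lookups that start from that layer; the base entry is untouched for everyone else.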
Why this enables format-independent reuse
Because every format reduces inline codes to the same semantic types, content becomes comparable across formats. The same vocabulary that drives editor rendering also feeds translation-memory matching: an entry created from an HTML source can match a Markdown source because both reduce to the same structural projection. One classification serves preview, validation, and reuse.
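As a toy illustration of cross-format matching, the sketch below assumes the structural projection is the segment text with each inline code replaced by its semantic type. The regex patterns, the placeholder syntax, and the `project` function are all hypothetical; a real reader would produce spans from a parsed document, not from regular expressions.

```python
import re

# Hypothetical per-format patterns that replace native bold markers
# with a placeholder naming the semantic type.
PATTERNS = {
    "html": [(re.compile(r"</?b>"), "{fmt:bold}")],
    "markdown": [(re.compile(r"\*\*"), "{fmt:bold}")],
}


def project(text: str, fmt: str) -> str:
    """Reduce a segment to text plus semantic-type placeholders."""
    for pattern, placeholder in PATTERNS[fmt]:
        text = pattern.sub(placeholder, text)
    return text


html_key = project("Click <b>Save</b> to continue.", "html")
md_key = project("Click **Save** to continue.", "markdown")

# A translation-memory entry stored from the HTML source...
memory = {html_key: "entry created from HTML"}
# ...is found again when the same sentence arrives as Markdown.
print(html_key == md_key)  # True
print(memory.get(md_key))  # entry created from HTML
```

Because the key contains semantic types and no native markup, the source format of the original entry no longer affects whether a match is found.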
Related reading
- Inline Formatting — how spans appear in the content model.
- Content Model — where spans and fragments live.
- Authoring Vocabularies — the JSON file format, mapping native elements, and creating a custom vocabulary.