Terminology
neokapi manages terminology with a concept-oriented model inspired by the TBX
(TermBase eXchange) standard: language-neutral concepts group multi-locale
terms, each carrying a lifecycle status and optional grammatical metadata. The
same model backs the kapi termbase commands, the term-lookup and
term-enforce pipeline tools, and the termbase/ Go library.
Concept-oriented model
A concept is a language-neutral knowledge unit. It carries a domain and a definition, and groups terms across locales. Each term has a lifecycle status, and a locale may hold several terms (a preferred form plus admitted variants).
Concept (e.g., "cloud storage")
├── Domain: "infrastructure"
├── Definition: "Remote file storage accessed via internet"
├── Term: "cloud storage" (en, preferred)
├── Term: "stockage cloud" (fr, preferred)
├── Term: "stockage en nuage" (fr, admitted)
├── Term: "Cloud-Speicher" (de, preferred)
└── Term: "クラウドストレージ" (ja, preferred)
This differs from a flat glossary (source→target pairs) and is what enables multiple terms per locale, status-driven enforcement, and rich metadata attached to a single language-neutral concept.
Term lifecycle statuses
| Status | Meaning | Usage |
|---|---|---|
preferred | The recommended term | Always suggest to translators |
approved | Accepted for use | Valid alternative |
admitted | Allowed but not recommended | Show with lower priority |
deprecated | Being phased out | Warn when found in translations |
proposed | Under review, not yet approved | Show as suggestion with caveat |
forbidden | Must not be used | Flag as error in QA |
Concept relations
Concepts are not islands. A termbase persists typed, directed relations between concepts, so a renamed product points at its replacement and a deprecated term points at the one to use instead. The relation vocabulary is aligned with SKOS:
| Category | Labels | Meaning |
|---|---|---|
| Hierarchy | broader, narrower | A parent/child concept relationship |
| Composition | part-of, has-part | A whole/component relationship |
| Association | related | A non-hierarchical association |
| Succession | replaced-by | A concept superseded by another |
| Guidance | use-instead | A discouraged term points at a preferred one |
| Cross-scheme | exact-match, close-match | Equivalence across schemes |
| Stance | competitor | A competitor's term |
A relation is a first-class record with an ID, a source and target concept, a type from the vocabulary above, an optional note, and an optional validity (below). The termbase validates that the type is known and that both concepts exist before persisting an edge.
Relation and term validity
A relation, and an individual term, may carry a validity: a half-open time
interval [valid-from, valid-to) plus a set of free-form tags. A query supplies
a scope — a point in time and a set of tags — and only edges and terms whose
validity matches the scope are returned. A nil validity is unbounded (it matches
every scope); a nil scope applies no filtering.
This makes the same termbase answer scope-dependent questions: which terms were
preferred as of last quarter, or which relations hold within a given market.
Tags are open-ended (the framework assigns them no meaning); a caller chooses a
tag vocabulary — for example a market key — and uses it consistently. A nil
validity matches every scope; a nil scope filters nothing.
Status transitions
A term's status changes over its lifetime. ValidateTransition(from, to)
accepts any transition between known statuses — history is the guard, not a
trap — while IsGovernedTransition(from, to) reports whether a change is
consequential enough to deserve review: any transition to forbidden or
preferred, or any transition from forbidden. The framework only
classifies; a platform built on it decides what governance a governed transition
requires.
Storage backends
Two backends ship in the termbase/ package, both thread-safe
(RWMutex-protected) and implementing the full TermBase interface:
- In-memory (
termbase.NewInMemoryTermBase) — fast and ephemeral, used for session-scoped batch processing. - SQLite (
termbase.NewSQLiteTermBase) — persistent file-based storage for CLI workflows, with fuzzy matching via SQL-based Levenshtein distance.
The TermBase interface also accommodates server-side backends for multi-user
deployments with project scoping, terminology streams, and workspace isolation.
CLI usage
Resource location
All termbase commands (except list) accept these mutually exclusive flags:
| Flag | Resolves to | Example |
|---|---|---|
--name <n> | ~/.config/kapi/termbases/<n>.db | --name project-terms |
--local | ./termbase.db (current directory) | --local |
--file <path> | Explicit file path | --file /shared/glossary.db |
| (no flag) | Same as --local |
Databases are created on demand if they don't exist.
# Import terms (CSV or JSON)
kapi termbase import terms.csv --name project-terms --format csv -s en -t fr
kapi termbase import terms.json --format json
# Export terms
kapi termbase export --name project-terms --format csv -o terms.csv -s en -t fr
# Look up a term (exact, or --fuzzy)
kapi termbase lookup "encryption" --name project-terms -s en -t fr
kapi termbase lookup "authenticating users" -s en -t fr --fuzzy
# Search concepts, view statistics, list named termbases
kapi termbase search "auth" -s en --limit 50
kapi termbase stats --name project-terms
kapi termbase list
The kapi termbase commands cover import, export, lookup, search,
statistics, and listing. Concept relations are not edited from the
command line: they are authored visually. Kapi Desktop opens a per-concept
dashboard — the @neokapi/concept-ui component, which shows a concept's
terms, geography, constraints, a local relations widget, and a timeline —
over a local termbase, where an editor adds, retypes, scopes, and removes
edges directly. The relation data this produces is the same
ConceptRelation records persisted by the termbase and read through the Go
API below.
Pipeline integration
Two pipeline tools bring terminology into the translation flow:
term-lookupscans each Block's source text and attaches matched terminology asTermAnnotationentries (source term, target suggestions, positions, status). It can also power per-block suggestions in an editor.term-enforcechecks that translated blocks use the expected terminology. Violations are reported as block properties (term-enforce-errors,term-enforce-violations) and as annotations with expected-vs-actual detail.
Go library
Interface
type TermBase interface {
AddConcept(concept Concept) error
GetConcept(id string) (Concept, bool)
DeleteConcept(id string) error
Lookup(sourceText string, opts LookupOptions) []TermMatch
LookupAll(sourceText string, opts LookupOptions) []TermMatch
Search(query string, sourceLocale, targetLocale model.LocaleID, offset, limit int) ([]Concept, int)
// Relations between concepts, optionally validity-scoped.
AddRelation(rel ConceptRelation) error
DeleteRelation(id string) error
RelationsOf(conceptID string, scope *graph.Scope) []ConceptRelation // both directions
ListRelations(scope *graph.Scope) []ConceptRelation
Count() int
Concepts() []Concept
Close() error
}
(Methods take a context.Context in the real interface; it is elided here for
readability.)
Lookup finds the best match for a single term. LookupAll scans running text
and returns every term occurrence with positions — this is what powers the
term-lookup tool and editor suggestions. By default LookupAll matches
case-insensitively (terminology should be recognized regardless of
capitalization); set CaseSensitive to override.
Key types
type Concept struct {
ID string
Domain string // subject area (security, ui, marketing)
Definition string // language-neutral description
Terms []Term
Properties map[string]string // extensible metadata
CreatedAt time.Time
UpdatedAt time.Time
}
type Term struct {
Text string
Locale model.LocaleID
Status model.TermStatus // preferred, approved, admitted, deprecated, proposed, forbidden
PartOfSpeech string
Gender string
Note string
Validity *graph.Validity // optional time + tag scope (nil = unbounded)
}
type ConceptRelation struct {
ID string
SourceID string
TargetID string
RelationType string // a SKOS-aligned label: broader, use-instead, replaced-by, …
Note string
Validity *graph.Validity // optional time + tag scope (nil = unbounded)
CreatedAt time.Time
}
type TermMatch struct {
Concept Concept
Term Term // the matched source term
Score float64 // 0.0-1.0
MatchType model.MatchStrategy // exact, normalized, fuzzy
Position model.TextRange // position in source text
}
type LookupOptions struct {
SourceLocale model.LocaleID
TargetLocale model.LocaleID
CaseSensitive bool
MinScore float64 // minimum fuzzy score (default 0.8)
MatchModes []model.MatchStrategy
Domains []string // restrict to specific domains
StatusFilter []model.TermStatus // only return terms with these statuses
}
Concept helpers: SourceTerm(locale), TargetTerms(locale),
PreferredTerm(locale).
Example
package main
import (
"fmt"
"github.com/neokapi/neokapi/core/model"
"github.com/neokapi/neokapi/termbase"
)
func main() {
tb := termbase.NewInMemoryTermBase()
defer tb.Close()
tb.AddConcept(termbase.Concept{
ID: "c1",
Domain: "security",
Definition: "Process of encoding information",
Terms: []termbase.Term{
{Text: "encryption", Locale: "en", Status: model.TermPreferred},
{Text: "chiffrement", Locale: "fr", Status: model.TermPreferred},
},
})
matches := tb.LookupAll(
"The encryption module handles end-to-end encryption",
termbase.LookupOptions{SourceLocale: "en", TargetLocale: "fr"},
)
for _, m := range matches {
fmt.Printf("Found %q at [%d:%d] → %s (%s)\n",
m.Term.Text, m.Position.Start, m.Position.End,
m.Concept.TargetTerms("fr")[0].Text, m.Term.Status)
}
}
Import / export
// JSON preserves the full concept-oriented structure
count, err := termbase.ImportJSON(tb, reader)
err = termbase.ExportJSON(tb, writer, "My Termbase")
// CSV is a flat source/target form with optional metadata
opts := termbase.CSVImportOptions{
SourceLocale: "en", TargetLocale: "fr", Domain: "general", HasHeader: true,
}
count, err = termbase.ImportCSV(tb, reader, opts)
err = termbase.ExportCSV(tb, writer, "en", "fr", true)
CSV columns are source,target,domain (domain optional). JSON carries the full
concept structure:
{
"name": "Project Terms",
"version": "1.0",
"concepts": [
{
"id": "c1",
"domain": "security",
"definition": "Encryption where only endpoints can decrypt",
"terms": [
{ "text": "end-to-end encryption", "locale": "en", "status": "preferred" },
{ "text": "chiffrement de bout en bout", "locale": "fr", "status": "preferred" }
]
}
]
}
Terminology and translation memory
Terminology and translation memory are deliberately separate systems because they answer different questions:
- TM — "How was this sentence translated before?" (segment pairs).
- Terminology — "What is the correct term for this concept?" (multi-locale knowledge units).
They share the Block annotation system as their integration point, so both
TM matches and term matches are available to any downstream tool or editor.
Terminology and segmentation are run-anchored overlays produced in the content-preparation pass that readies a source before translation.