neokapi: Architecture
neokapi is an open-source localization engine built in Go. It provides
format-aware document parsing, composable processing tools, and a concurrent
streaming pipeline for translation workflows. The kapi CLI, the desktop app,
and Kapi React are surfaces built on top of this engine, but the content model,
format readers and writers, tools, and pipeline are equally a Go library you
can import and drive directly. If you want to start with running code, jump to
the Go quickstart; for the reasoning behind each major design choice, see the
Architecture Decisions.
Processing Pipeline
The edges of a flow are its source and sink, the bindings that decide where
content enters and leaves. The default is the file binding: a reader turns
source files of any format into a stream of Parts, and a writer turns the
stream back into translated files. The same flow can instead bind to the
project store, a .klz workspace, or an interchange file, with no reader or
writer involved (see Flows: source and sink).
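To make the binding idea concrete, here is a minimal sketch in plain Go. `Part`, `Source`, `Sink`, `lineSource`, `storeSource`, `memSink`, and `runFlow` are hypothetical names, not neokapi's API, and slices stand in for the channels the real pipeline streams over. It illustrates one point: the flow body only ever sees Parts, so swapping a file-style binding (which needs a reader) for a store-style one (which does not) leaves the tools untouched.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Part is a hypothetical stand-in for the content model's Part.
type Part struct {
	ID   string
	Text string
}

// Source and Sink are hypothetical binding interfaces. A flow sees only
// Parts and never knows which binding produced or consumes them.
type Source interface{ Parts() []Part }

type Sink interface{ Write(p Part) }

// lineSource imitates a file binding: a toy "reader" parses raw text,
// one segment per line, into Parts.
type lineSource struct{ raw string }

func (s lineSource) Parts() []Part {
	var parts []Part
	sc := bufio.NewScanner(strings.NewReader(s.raw))
	for i := 1; sc.Scan(); i++ {
		parts = append(parts, Part{ID: fmt.Sprintf("line-%d", i), Text: sc.Text()})
	}
	return parts
}

// storeSource imitates a store binding: Parts already exist as
// records, so no reader is involved.
type storeSource struct{ records []Part }

func (s storeSource) Parts() []Part { return s.records }

// memSink collects Parts in memory; a file binding would run a writer here.
type memSink struct{ out []Part }

func (s *memSink) Write(p Part) { s.out = append(s.out, p) }

// runFlow is the flow body: it applies one tool to every Part,
// whatever the binding on either side.
func runFlow(src Source, sink Sink, tool func(Part) Part) {
	for _, p := range src.Parts() {
		sink.Write(tool(p))
	}
}

// upper is a trivial stand-in tool.
func upper(p Part) Part {
	p.Text = strings.ToUpper(p.Text)
	return p
}

func main() {
	fromFile := &memSink{}
	runFlow(lineSource{raw: "hello\nworld"}, fromFile, upper)

	fromStore := &memSink{}
	runFlow(storeSource{records: []Part{{ID: "seg-7", Text: "hi"}}}, fromStore, upper)

	fmt.Println(fromFile.out, fromStore.out)
}
```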
Between the edges runs a serial chain of tools connected by buffered channels
of Parts. The tools divide by capability: annotators attach stand-off overlays
and annotations (segmentation, terminology, entities, QA findings, analysis
results), translators fill in targets, and QA tools check and enforce.
Translation memory and the termbase feed the relevant stages.
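The chain-of-stages shape can be sketched with nothing but goroutines and buffered channels. This is an illustration under assumptions, not neokapi's Tool interface: `Part`, `Tool`, `stage`, `chain`, `runAll`, and the three toy tools are hypothetical, and the "translator" is a pseudo-translation that wraps the uppercased source in brackets.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Part is a hypothetical, heavily simplified Part: source text, an
// optional target, and stand-off annotations as plain labels.
type Part struct {
	Source      string
	Target      string
	Annotations []string
}

// Tool is a hypothetical stage signature: read Parts from in, write them to
// out, and close out when done so the next stage sees end-of-stream.
type Tool func(ctx context.Context, in <-chan Part, out chan<- Part)

// stage wraps a per-Part function as a Tool that honors cancellation.
func stage(f func(Part) Part) Tool {
	return func(ctx context.Context, in <-chan Part, out chan<- Part) {
		defer close(out)
		for p := range in {
			select {
			case out <- f(p):
			case <-ctx.Done():
				return
			}
		}
	}
}

// chain connects tools serially with buffered channels, starts each stage
// in its own goroutine, and returns the final output channel.
func chain(ctx context.Context, src <-chan Part, buf int, tools ...Tool) <-chan Part {
	in := src
	for _, t := range tools {
		out := make(chan Part, buf)
		go t(ctx, in, out)
		in = out
	}
	return in
}

var (
	// Annotator: attach a stand-off terminology finding.
	annotate = stage(func(p Part) Part {
		if strings.Contains(p.Source, "kapi") {
			p.Annotations = append(p.Annotations, "term:kapi")
		}
		return p
	})
	// Translator: fill the target (a pseudo-translation stands in for MT or AI).
	translate = stage(func(p Part) Part {
		p.Target = "[" + strings.ToUpper(p.Source) + "]"
		return p
	})
	// QA: flag Parts whose target is missing or identical to the source.
	check = stage(func(p Part) Part {
		if p.Target == "" || p.Target == p.Source {
			p.Annotations = append(p.Annotations, "qa:untranslated")
		}
		return p
	})
)

// runAll feeds sources through annotate -> translate -> check and collects
// the results, which arrive in input order because every stage is serial.
func runAll(sources []string) []Part {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	src := make(chan Part)
	go func() {
		defer close(src)
		for _, s := range sources {
			select {
			case src <- Part{Source: s}:
			case <-ctx.Done():
				return
			}
		}
	}()
	var out []Part
	for p := range chain(ctx, src, 4, annotate, translate, check) {
		out = append(out, p)
	}
	return out
}

func main() {
	for _, p := range runAll([]string{"Open kapi", "Save"}) {
		fmt.Printf("%q -> %q %v\n", p.Source, p.Target, p.Annotations)
	}
}
```

Each stage closes its output channel once its input is exhausted, which is how end-of-stream reaches the last stage; a full buffer blocks the sending stage, which is where backpressure comes from in this sketch.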
Concurrency runs at three levels at once. Each stage is its own goroutine,
joined to the next by channels that apply backpressure automatically; a
block-handling stage such as AI translation can fan out across N goroutines
with an ordered fan-in; and the executor runs many documents in parallel,
bounded by MaxConcurrency. Context cancellation propagates to every stage.
Readers, writers, and tools can also come from plugins, which run as
subprocesses dispatched over gRPC: the Java Okapi Bridge, the kapi-sat
segmenter, the kapi-pdfium PDF reader, or any remote plugin. See AD-001 and
AD-004.
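The second and third levels are standard Go patterns, sketched here generically rather than as neokapi's executor. `item`, `fanOut`, and `runDocs` are hypothetical names, `fn` stands in for a slow per-block step such as an AI translation call, and the per-document fan-out width of 3 is arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"sync"
)

// item tags a block with its input position so order can be restored.
type item struct {
	seq  int
	text string
}

// fanOut applies fn to each block from in on n worker goroutines and
// emits results in input order. fn stands in for a slow per-block call.
func fanOut(ctx context.Context, in <-chan string, n int, fn func(string) string) <-chan string {
	jobs := make(chan item)
	results := make(chan item, n)
	go func() { // number the incoming blocks
		defer close(jobs)
		seq := 0
		for s := range in {
			select {
			case jobs <- item{seq, s}:
				seq++
			case <-ctx.Done():
				return
			}
		}
	}()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ { // workers finish in arbitrary order
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				select {
				case results <- item{j.seq, fn(j.text)}:
				case <-ctx.Done():
					return
				}
			}
		}()
	}
	go func() { wg.Wait(); close(results) }()
	out := make(chan string)
	go func() { // ordered fan-in: hold early results until their turn
		defer close(out)
		pending := map[int]string{}
		next := 0
		for r := range results {
			pending[r.seq] = r.text
			for s, ok := pending[next]; ok; s, ok = pending[next] {
				delete(pending, next)
				select {
				case out <- s:
				case <-ctx.Done():
					return
				}
				next++
			}
		}
	}()
	return out
}

// runDocs runs each document through fanOut, at most maxConcurrency
// documents at a time; the buffered channel sem is a counting semaphore.
func runDocs(docs [][]string, maxConcurrency int, fn func(string) string) [][]string {
	out := make([][]string, len(docs))
	sem := make(chan struct{}, maxConcurrency)
	var wg sync.WaitGroup
	for i, doc := range docs {
		sem <- struct{}{} // blocks while maxConcurrency documents are in flight
		wg.Add(1)
		go func(i int, doc []string) {
			defer wg.Done()
			defer func() { <-sem }()
			in := make(chan string)
			go func() {
				defer close(in)
				for _, s := range doc {
					in <- s
				}
			}()
			for s := range fanOut(context.Background(), in, 3, fn) {
				out[i] = append(out[i], s) // each goroutine writes only its own slot
			}
		}(i, doc)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(runDocs([][]string{{"a", "b", "c", "d"}, {"e", "f"}}, 2, strings.ToUpper)) // [[A B C D] [E F]]
}
```

The fan-in keeps results that arrive early in a map and releases them only when the next expected sequence number is present, so output order matches input order however the workers finish. The semaphore bounds documents in flight, not the total number of goroutines.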
Package Layout
neokapi/
├── go.mod # module github.com/neokapi/neokapi
├── go.work # coordinates the framework + CLI + app modules
│
├── core/ # Platform-agnostic framework packages
│ ├── model/ # Part, Block, Layer, Run, Target, Overlay, Data, Media
│ ├── format/ # DataFormatReader/Writer interfaces, detection
│ ├── tool/ # Tool interface, BaseTool dispatch
│ ├── flow/ # Executor, Builder, FlowDefinition
│ ├── registry/ # FormatRegistry, ToolRegistry
│ ├── encoding/ # Text encoding utilities
│ ├── locale/ # BCP-47 locale handling
│ ├── editor/ # Block index serialization and preview generation
│ ├── version/ # Build version info
│ ├── formats/ # Built-in format implementations
│ │ └── … # one package each (reader.go, writer.go, config.go)
│ ├── ai/ # AI pipeline tools, NER, prompt assembly
│ ├── mt/ # Machine-translation pipeline tools
│ ├── brand/ # Brand voice profiles, scoring, starter packs
│ ├── tools/ # Utility tools (wordcount, pseudo, segmentation, …)
│ ├── storage/ # Shared SQLite infrastructure (Open, Migrate)
│ ├── project/ # .kapi project file format (Load, Save, Validate)
│ ├── plugin/ # Plugin system (gRPC, loader, bridge, registry)
│ └── testutil/ # Shared test helpers
│
├── sievepen/ # Translation memory (interface, in-memory, SQLite)
├── termbase/ # Terminology (interface, in-memory, SQLite)
├── providers/
│ ├── ai/ # package aiprovider — LLM backends
│ └── mt/ # package mtprovider — MT backends
│
├── cli/ # Shared CLI base (module: …/cli)
├── kapi/ # Kapi standalone CLI (module: …/kapi)
├── apps/kapi-desktop/ # Kapi Desktop (Wails v3; module: …/kapi-desktop)
├── packages/
│ ├── ui/ # @neokapi/ui-primitives — shared shadcn/ui primitives
│ └── flow-editor/ # @neokapi/flow-editor — shared React flow editor
└── docs/ # Architecture decisions, notes
The framework module (the repo root) stays platform-agnostic. sievepen/,
termbase/, and providers/ are top-level framework packages, not nested under
core/. Front-ends such as the CLI and the desktop app, like any other consumer,
attach through the plugin and extension registries; the framework never
imports them, so it never depends on a particular platform.
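The dependency direction can be illustrated with a small name-keyed registry. This is a hypothetical sketch, not the actual FormatRegistry in core/registry: `Reader`, `Registry`, `Register`, `Lookup`, `Names`, `extract`, and `commaReader` are illustrative names. In a real layout the two halves would live in separate modules; here they share one file, with comments marking the boundary.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// ---- framework side ----

// Reader is a hypothetical, minimal reader interface owned by the framework.
type Reader interface {
	Read(src string) []string
}

// Registry is a hypothetical name-keyed registry, safe for concurrent use.
type Registry struct {
	mu      sync.RWMutex
	readers map[string]Reader
}

func NewRegistry() *Registry { return &Registry{readers: map[string]Reader{}} }

// Register adds a reader under a format name and rejects duplicates.
func (r *Registry) Register(name string, rd Reader) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.readers[name]; dup {
		return fmt.Errorf("format %q already registered", name)
	}
	r.readers[name] = rd
	return nil
}

// Lookup resolves a reader by name; callers only see the interface.
func (r *Registry) Lookup(name string) (Reader, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	rd, ok := r.readers[name]
	return rd, ok
}

// Names lists the registered formats in sorted order.
func (r *Registry) Names() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.readers))
	for n := range r.readers {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// extract is framework code: it depends on the registry and the
// interface, never on a concrete front-end type.
func extract(reg *Registry, format, src string) ([]string, error) {
	rd, ok := reg.Lookup(format)
	if !ok {
		return nil, fmt.Errorf("no reader for format %q", format)
	}
	return rd.Read(src), nil
}

// ---- consumer side (a separate module in a real layout) ----

// commaReader is a toy reader a front-end or plugin might contribute.
type commaReader struct{}

func (commaReader) Read(src string) []string { return strings.Split(src, ",") }

func main() {
	reg := NewRegistry()
	if err := reg.Register("comma", commaReader{}); err != nil {
		panic(err)
	}
	parts, err := extract(reg, "comma", "a,b,c")
	fmt.Println(parts, err, reg.Names())
}
```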
The framework concepts
To see the framework's concepts working together in a few lines of Go (register the formats, read a file into the content model, run a tool, and write the result), start with the Go quickstart. The framework rests on a few concepts, each with its own page:
- Content Model — the format-independent representation. A document becomes a stream of Parts carrying layers, blocks, fragments, spans, data, and media. Embedded content (HTML inside JSON, CDATA in XML) is modeled as nested layers, each with its own format.
- Formats — paired readers and writers that produce and consume the content model. The neokapi analogue of an Okapi filter.
- Tools — the processing units. Each reads Parts from a channel, transforms them, and writes them out. The analogue of an Okapi step.
- Flows — named, ordered compositions of tools. The analogue of an Okapi pipeline.
- Pipeline — the concurrent executor that runs a flow: goroutines, buffered channels, and context-driven cancellation. The analogue of the Okapi PipelineDriver.
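Nested layers are easiest to picture as a tree. The types below are a hypothetical, heavily simplified stand-in for the content model (the real types live in core/model and carry much more): a JSON layer holds blocks, and a block whose value is HTML points to a child layer parsed with its own format. Attaching the child layer to the embedding block is an assumption of this sketch, and `Layer`, `Block`, `Embedded`, and `walk` are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// Layer is a hypothetical, simplified layer: the blocks parsed by one format.
type Layer struct {
	Format string // e.g. "json" or "html"
	Blocks []*Block
}

// Block is a hypothetical translatable unit. Embedded is non-nil when the
// block's value is itself a document in another format (assumption: the
// child layer hangs off the block that embeds it).
type Block struct {
	ID       string
	Text     string
	Embedded *Layer
}

// walk visits translatable blocks depth-first, reporting the format of the
// layer that owns each one. A block with embedded content is not visited
// itself; its translatable text lives in the child layer.
func walk(l *Layer, depth int, visit func(format string, depth int, b *Block)) {
	for _, b := range l.Blocks {
		if b.Embedded != nil {
			walk(b.Embedded, depth+1, visit)
			continue
		}
		visit(l.Format, depth, b)
	}
}

func main() {
	// Models {"title": "Welcome", "body": "<p>Hello <b>world</b></p>"}.
	// Inline markup such as <b> is left out of this sketch.
	doc := &Layer{Format: "json", Blocks: []*Block{
		{ID: "title", Text: "Welcome"},
		{ID: "body", Embedded: &Layer{Format: "html", Blocks: []*Block{
			{ID: "body/p1", Text: "Hello world"},
		}}},
	}}
	walk(doc, 0, func(format string, depth int, b *Block) {
		fmt.Printf("%s%s [%s] %q\n", strings.Repeat("  ", depth), b.ID, format, b.Text)
	})
}
```

Running it prints the JSON title at depth 0 and the HTML paragraph at depth 1, each tagged with the format of the layer that owns it.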
For the concrete Go interfaces and method signatures behind these concepts, see the Interface Reference. For the design rationale, see the Architecture Decisions.