Gå til hovedinnhold

neokapi: Architecture

neokapi is an open-source localization engine built in Go. It provides format-aware document parsing, composable processing tools, and a concurrent streaming pipeline for translation workflows. The kapi CLI and desktop app and Kapi React are surfaces built on top of this engine — but the content model, format readers and writers, tools, and pipeline are equally a Go library you can import and drive directly. If you want to start with running code, jump to the Go quickstart; for the reasoning behind each major design choice, see the Architecture Decisions.

Processing Pipeline

neokapi processing architectureA streaming pipeline: format readers and writers at the edges, a serial chain of annotate, translate and QA tools in the middle with the translate stage fanning out across parallel goroutines, translation-memory and termbase resources feeding it from above, and a gRPC plugin band (Okapi bridge, kapi-sat segmenter, remote plugins) feeding it from below. Each stage is a goroutine joined by Part channels, and the whole pipeline runs over many documents in parallel.documents in parallel · MaxConcurrencyTermbaseterm-lookupTranslation Memorytm-leverageSourcesapp.jsonpage.htmlguide.docxstrings.xmlReaderDataFormatchanAnnotatesegment · terms · NERchanfan-out · N goroutinesordered fan-inAI Translategoroutine 1AI Translategoroutine 2AI Translategoroutine 3AI Translategoroutine 4chanQAqa-check · enforcechanWriterDataFormatTargetsapp.fr.jsonapp.de.jsonapp.ja.jsonPlugin system · gRPC subprocess bridgeOkapi BridgeJava filterskapi-satML segmentationRemote / native plugincustom tool · format

The edges are the flow's source and sink — bindings that decide where content enters and leaves. The default, shown above, is the file binding: a reader turns source files of any format into a stream of Parts and a writer turns the stream back into translated files. The same flow can instead bind to the project store, a .klz workspace, or an interchange file — with no reader or writer (flows: source and sink). Between the edges runs a flow: a serial chain of tools connected by buffered channels of Parts. The tools divide by capability — annotators attach stand-off overlays and annotations (segmentation, terminology, entities, QA findings, analysis results), translators fill in targets, and QA tools check and enforce — while translation memory and the termbase feed the relevant stages.

Concurrency runs at three levels at once: each stage is its own goroutine joined by channels with automatic backpressure; a block-handling stage such as AI translation can fan out across N goroutines with an ordered fan-in; and the executor runs many documents in parallel, bounded by MaxConcurrency. Context cancellation propagates to every stage. Readers, writers, and tools can be supplied by plugins — the Java Okapi Bridge, the kapi-sat segmenter, the kapi-pdfium PDF reader, or any remote plugin — dispatched as subprocesses over gRPC. See AD-001 and AD-004.

Package Layout

neokapi/
├── go.mod # module github.com/neokapi/neokapi
├── go.work # coordinates the framework + CLI + app modules

├── core/ # Platform-agnostic framework packages
│ ├── model/ # Part, Block, Layer, Run, Target, Overlay, Data, Media
│ ├── format/ # DataFormatReader/Writer interfaces, detection
│ ├── tool/ # Tool interface, BaseTool dispatch
│ ├── flow/ # Executor, Builder, FlowDefinition
│ ├── registry/ # FormatRegistry, ToolRegistry
│ ├── encoding/ # Text encoding utilities
│ ├── locale/ # BCP-47 locale handling
│ ├── editor/ # Block index serialization and preview generation
│ ├── version/ # Build version info
│ ├── formats/ # Built-in format implementations
│ │ └── … # one package each (reader.go, writer.go, config.go)
│ ├── ai/ # AI pipeline tools, NER, prompt assembly
│ ├── mt/ # Machine-translation pipeline tools
│ ├── brand/ # Brand voice profiles, scoring, starter packs
│ ├── tools/ # Utility tools (wordcount, pseudo, segmentation, …)
│ ├── storage/ # Shared SQLite infrastructure (Open, Migrate)
│ ├── project/ # .kapi project file format (Load, Save, Validate)
│ ├── plugin/ # Plugin system (gRPC, loader, bridge, registry)
│ └── testutil/ # Shared test helpers

├── sievepen/ # Translation memory (interface, in-memory, SQLite)
├── termbase/ # Terminology (interface, in-memory, SQLite)
├── providers/
│ ├── ai/ # package aiprovider — LLM backends
│ └── mt/ # package mtprovider — MT backends

├── cli/ # Shared CLI base (module: …/cli)
├── kapi/ # Kapi standalone CLI (module: …/kapi)
├── apps/kapi-desktop/ # Kapi Desktop (Wails v3; module: …/kapi-desktop)
├── packages/
│ ├── ui/ # @neokapi/ui-primitives — shared shadcn/ui primitives
│ └── flow-editor/ # @neokapi/flow-editor — shared React flow editor
└── docs/ # Architecture decisions, notes

The framework module (repo root) stays platform-agnostic. sievepen/, termbase/, and providers/ are top-level framework packages — not nested under core/. Front-ends such as the CLI and the desktop app, and any other consumer, attach through the plugin and extension registries rather than by direct imports, so the framework never depends on a particular platform.

The framework concepts

To see these concepts working together in a few lines of Go — register the formats, read a file into the content model, run a tool, and write the result — start with the Go quickstart. The framework rests on a few concepts, each with its own page:

  • Content Model — the format-independent representation. A document becomes a stream of Parts carrying layers, blocks, fragments, spans, data, and media. Embedded content (HTML inside JSON, CDATA in XML) is modeled as nested layers, each with its own format.
  • Formats — paired readers and writers that produce and consume the content model. The neokapi analogue of an Okapi filter.
  • Tools — the processing units. Each reads Parts from a channel, transforms them, and writes them out. The analogue of an Okapi step.
  • Flows — named, ordered compositions of tools. The analogue of an Okapi pipeline.
  • Pipeline — the concurrent executor that runs a flow: goroutines, buffered channels, and context-driven cancellation. The analogue of the Okapi PipelineDriver.

For the concrete Go interfaces and method signatures behind these concepts, see the Interface Reference. For the design rationale, see the Architecture Decisions.