Segmentation Lab
The Lab's segmentation lesson shows sentence segmentation as a stand-off overlay; this lab compares the engines that produce it. Switch between three engines: the pure-Go SRX rules; the raw UAX-29 Unicode baseline (ICU4X, a companion WebAssembly module); and the Hybrid, which refines ICU4X breaks with SRX exceptions and is how neokapi segments natively. The SaT ML segmenter is a native plugin (kapi-sat); it is listed here but disabled in the browser. Watch how each engine treats abbreviations, decimals, and quotes.