Vision Lab

Drop in an image and the Lab runs the real kapi-vision models right here in your browser. PP-OCRv5 detects and recognizes text, and PP-DocLayoutV3 identifies document layout (headings, paragraphs, tables, figures). These are the same ONNX models the native plugin runs, executed here via onnxruntime-web. The OCR models (~21 MB) load on first use; the layout model (~132 MB) downloads only when you ask for it.

You can also drop in a .docx: the embedded image is extracted directly from the document and run through the same models. Turn on handwriting fallback to re-read low-confidence lines with TrOCR, which is loaded on demand. PP-OCR handles clean text quickly, and TrOCR recovers the hard lines.

Nothing is mocked; the only difference from the native plugin is the runtime.
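
The on-demand loading and confidence-gated fallback described above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the Lab's actual code: `OcrLine`, `lazy`, `lowConfidenceIndices`, `applyFallback`, and the 0.6 confidence threshold are all hypothetical, and `reread` stands in for running TrOCR on a line's image crop. In a browser, the factory passed to `lazy` could call onnxruntime-web's `InferenceSession.create(modelUrl)`; the sketch itself has no dependencies so it runs anywhere.

```typescript
// Hypothetical sketch: these names are NOT the kapi-vision API.

/** One recognized text line (assumed shape). */
interface OcrLine {
  text: string;
  confidence: number; // recognizer score, assumed to lie in [0, 1]
  source: "ppocr" | "trocr"; // which model produced `text`
}

/** Result of re-reading one line with the fallback model. */
interface Reread {
  text: string;
  confidence: number;
}

/**
 * Wrap an expensive factory so it runs at most once. If the factory returns
 * a Promise (e.g. a model download), every caller receives that same Promise,
 * so concurrent requests trigger a single download.
 */
function lazy<T>(factory: () => T): () => T {
  let cache: { value: T } | undefined;
  return () => {
    if (cache === undefined) {
      cache = { value: factory() };
    }
    return cache.value;
  };
}

/** Indices of lines scoring below `threshold`, i.e. fallback candidates. */
function lowConfidenceIndices(lines: readonly OcrLine[], threshold: number): number[] {
  const out: number[] = [];
  lines.forEach((line, i) => {
    if (line.confidence < threshold) out.push(i);
  });
  return out;
}

/**
 * Re-read only the low-confidence lines with `reread` (standing in for TrOCR)
 * and put the new readings back in place. High-confidence lines are returned
 * unchanged, so the slower model runs only where it is needed.
 * The input array is not mutated.
 */
async function applyFallback(
  lines: readonly OcrLine[],
  reread: (line: OcrLine) => Promise<Reread>,
  threshold = 0.6, // assumed cut-off; the Lab's real value is not documented
): Promise<OcrLine[]> {
  const result = lines.slice();
  for (const i of lowConfidenceIndices(lines, threshold)) {
    const r = await reread(lines[i]);
    result[i] = { text: r.text, confidence: r.confidence, source: "trocr" };
  }
  return result;
}
```

Caching the Promise rather than the resolved session means two parts of the page asking for the layout model at the same moment share one ~132 MB download. A production loader would probably also clear the cache when a download fails, so the user can retry.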
