Multimodal Showcase

A guided tour of how kapi localizes images, audio, and video: it translates the text inside each asset and renders the result. This showcase is pre-recorded. The extraction step (OCR, speech recognition, demuxing) is baked in, so it plays anywhere, instantly, with no model download and no ffmpeg. To run the real engines in your browser, see the live Vision Lab and the audio and video labs.
