Moshi for Mere Mortals

By TongKe Xue*, Rohit Swamy, Charles Niu | June 16, 2026

* Primary contributor

In 2024, Kyutai released Moshi, a full-duplex voice-to-voice model with 200 ms practical latency. This note documents the diagrams we drew while studying Moshi and Mimi, with pseudocode and equations for readers who want the extra detail.

Read the paper 14 pages • PDF
