Moshi for Mere Mortals

By TongKe Xue*, Rohit Swamy, Charles Niu | June 16, 2026

* Primary contributor

In 2024, Kyutai released Moshi, a full-duplex voice-to-voice model with a practical latency of 200 ms. This note collects the diagrams we drew while studying Moshi and Mimi, its streaming neural audio codec, along with pseudocode and equations for readers who want the extra detail.