Moshi for Mere Mortals
* Primary contributor
In 2024, Kyutai released Moshi, a full-duplex voice-to-voice model with a practical latency of about 200 ms. This note collects the diagrams we drew while studying Moshi and its neural audio codec, Mimi. It also includes pseudocode and equations for readers who want more detail.