Closed
Description
With the first Sesame CSM model now openly available, we should implement a local example similar to their online research demo. The released CSM model appears to use Kyutai's Mimi audio codec, which we will need to implement in a similar way to how we implemented the WavTokenizer. Next, we can modify the talk-llama example to support audio generation with the CSM. This way, we will be able to plug in any LLM for text response generation and use Sesame for speech input/output.
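To make the "plug in any LLM" idea concrete, here is a rough sketch of how one conversation turn could be wired together. Every name in it (`talk_pipeline`, `run_turn`, `pcm_f32`, `codec_tokens`) is hypothetical and does not exist in llama.cpp or whisper.cpp. It also assumes, for simplicity, that the codec tokens are a flat list of integers, which is likely a simplification of what Mimi actually produces, and it leaves open how the speech input stage is implemented.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration only, not part of any existing API.
using pcm_f32      = std::vector<float>;    // mono PCM samples
using codec_tokens = std::vector<int32_t>;  // discrete audio codec tokens (flattened)

// Each stage is a swappable callback, so the LLM, the speech model and the
// codec decoder can be replaced independently of each other.
struct talk_pipeline {
    std::function<std::string(const pcm_f32 &)>      transcribe; // speech -> text
    std::function<std::string(const std::string &)>  respond;    // text -> text (any LLM)
    std::function<codec_tokens(const std::string &)> speak;      // text -> audio tokens (e.g. CSM)
    std::function<pcm_f32(const codec_tokens &)>     decode;     // audio tokens -> PCM (e.g. Mimi decoder)
};

// Run a single turn: microphone audio in, synthesized reply audio out.
pcm_f32 run_turn(const talk_pipeline & p, const pcm_f32 & mic_input) {
    const std::string  heard  = p.transcribe(mic_input);
    const std::string  reply  = p.respond(heard);
    const codec_tokens tokens = p.speak(reply);
    return p.decode(tokens);
}
```

With this shape, swapping the LLM only means assigning a different `respond` callback. The speech and codec stages stay untouched.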