Add Note2audio model #544
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Super cool! Note that we should add the vocoder in the very last step (it'll require some TF graph/ONNX hacking).
It will require the conversion from TF's SoundStream 😅 I will focus on the T5v1.1-style encoder-decoder now. BTW, tell me if the file where I am putting the model is correct or if it needs changing!
Very cool! Let me know if you need any help with T5X and weight conversion.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing in favor of #1044
The note2audio model is fairly complex: it uses a T5-style encoder-decoder. During the diffusion process, conditioning is given to the encoder in two forms: a MIDI file and the previous spectrogram. Two separate networks process these inputs and their outputs are concatenated; the Spectrogram Decoder then generates a spectrogram.
Finally, SoundStream is used as a vocoder to convert the mel spectrogram to raw audio. Only the decoder part of SoundStream is needed.
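
To make the data flow above concrete, here is a toy sketch in plain Python. It is not the code from this PR and not any library's API: every function name, dimension, and the "denoising" rule below is a hypothetical stand-in, chosen only to show the order of operations (encode the two conditioning inputs, concatenate, run the decoder iteratively, then vocode).

```python
import random

random.seed(0)

# Hypothetical toy sizes; the real model's dimensions are not given in this thread.
D_MODEL, N_MELS, HOP = 4, 6, 2


def _matrix(rows, cols):
    """Random (rows x cols) matrix as a list of lists."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def _matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Stand-in "weights", created once so repeated calls are consistent.
NOTE_TABLE = _matrix(128, D_MODEL)      # one row per MIDI-like token id
CONTEXT_PROJ = _matrix(N_MELS, D_MODEL)
OUT_PROJ = _matrix(D_MODEL, N_MELS)


def encode_notes(note_tokens):
    """Stand-in for the MIDI encoder: token ids -> (n_tokens x D_MODEL)."""
    return [NOTE_TABLE[tok] for tok in note_tokens]


def encode_context(prev_spec):
    """Stand-in for the previous-spectrogram encoder: (frames x N_MELS) -> (frames x D_MODEL)."""
    return _matmul(prev_spec, CONTEXT_PROJ)


def decode_step(noisy_spec, conditioning, t):
    """Stand-in for the Spectrogram Decoder: one toy update toward a conditioning-derived target."""
    pooled = [sum(col) / len(conditioning) for col in zip(*conditioning)]
    guide = _matmul([pooled], OUT_PROJ)[0]
    return [[x + (g - x) / (t + 1) for x, g in zip(frame, guide)] for frame in noisy_spec]


def vocode(spec):
    """Stand-in for the SoundStream decoder: each frame becomes HOP audio samples."""
    audio = []
    for frame in spec:
        audio.extend([sum(frame) / len(frame)] * HOP)
    return audio


def generate_segment(note_tokens, prev_spec, n_frames=5, n_steps=3):
    # Concatenate both conditioning sequences along the sequence axis.
    conditioning = encode_notes(note_tokens) + encode_context(prev_spec)
    spec = _matrix(n_frames, N_MELS)          # start from noise
    for t in reversed(range(n_steps)):        # iterative refinement loop
        spec = decode_step(spec, conditioning, t)
    return spec, vocode(spec)
```

For example, `generate_segment([60, 64, 67], [[0.0] * N_MELS for _ in range(5)])` returns a 5-frame toy spectrogram and the corresponding list of audio samples. In the real model each stand-in would be a trained network, and the previous segment's generated spectrogram would be fed back in as `prev_spec` for the next segment.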