@dg845
Collaborator

@dg845 dg845 commented Jan 6, 2026

What does this PR do?

This PR adds pipelines for the LTX 2.0 video generation model (code, weights). LTX 2.0 is an audio-video foundation model that generates videos with synced audio; it supports generation tasks such as text-to-video (T2V), text-image-to-video (TI2V), and more.

You can try out T2V generation as follows:

python scripts/ltx2_test_full_pipeline.py \
    --model_id Lightricks/LTX-2 \
    --revision refs/pr/3 \
    --cpu_offload

Note that LTX 2.0 video generation is memory-intensive: CPU offloading is necessary even on an A100 with 80 GB of VRAM (assuming bf16 inference is the only other memory optimization in use).

Similarly, you can try out I2V generation as follows:

python scripts/ltx2_test_full_pipeline_i2v.py \
    --model_id Lightricks/LTX-2 \
    --revision refs/pr/3 \
    --image_path https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg \
    --cpu_offload

Here is an I2V sample from the above:

ltx2_i2v_sample.mp4

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yiyixuxu
@sayakpaul
@ofirbb

dg845 and others added 30 commits December 12, 2025 07:52
LTX 2.0 Vocoder Implementation
LTX 2.0 Video VAE Implementation
@sayakpaul
Member

Cc: @matanby if you want to test this PR on your end. We will shortly be adding the upsampling pipeline as well.

@bghira
Contributor

bghira commented Jan 6, 2026

no audio encode?

@dg845
Collaborator Author

dg845 commented Jan 6, 2026

@bghira, just so I understand correctly: is the request for an analogue of diffusers.pipelines.ltx2.export_utils.encode_video that only encodes the audio? encode_video should already be able to create videos with audio.

@bghira
Contributor

bghira commented Jan 6, 2026

the audio autoencoder is missing the encode() function, which exists in the LTX-2 repo from Lightricks; ComfyUI has audio encoding as well

@dg845
Collaborator Author

dg845 commented Jan 6, 2026

@bghira thanks for the clarification! We will support the audio VAE encoder.

Member

@sayakpaul sayakpaul left a comment

Some small comments.

num_rope_elems = num_pos_dims * 2

# 3. Create a 1D grid of frequencies for RoPE
freqs_dtype = torch.float64 if self.double_precision else torch.float32
Member

(nit): we could set self.freqs_dtype once in __init__ to avoid recomputing it on every call.
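
A minimal sketch of that suggestion, assuming a simplified stand-in class: `RopeFreqsSketch` and its `forward` signature are hypothetical and not the PR's actual module, and plain strings stand in for `torch.float64` / `torch.float32` so the example runs without torch.

```python
class RopeFreqsSketch:
    """Hypothetical stand-in for the RoPE module under review (not the PR's class)."""

    def __init__(self, double_precision: bool = True):
        self.double_precision = double_precision
        # Resolve the dtype once here instead of on every forward call.
        # Strings stand in for torch.float64 / torch.float32 (assumption for this sketch).
        self.freqs_dtype = "float64" if double_precision else "float32"

    def forward(self, num_pos_dims: int) -> dict:
        num_rope_elems = num_pos_dims * 2
        # Reuse the cached attribute rather than re-evaluating the conditional.
        return {"num_rope_elems": num_rope_elems, "dtype": self.freqs_dtype}


rope = RopeFreqsSketch(double_precision=False)
result = rope.forward(num_pos_dims=3)
```

The saving is small, since the conditional is cheap, but it keeps the precision choice in one place.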

Comment on lines +1187 to +1190
video_cross_attn_rotary_emb = self.cross_attn_rope(video_coords[:, 0:1, :], device=hidden_states.device)
audio_cross_attn_rotary_emb = self.cross_attn_audio_rope(
    audio_coords[:, 0:1, :], device=audio_hidden_states.device
)
Member

(nit): would be nice to have a comment about the small indexing going on there.
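
For readers unfamiliar with the pattern, here is a rough illustration of what a `0:1` slice does compared with a plain `0` index, using nested lists in place of tensors. The shape `(batch=2, pos_dims=3, tokens=4)` and the `shape` helper are assumptions made for this sketch; what axis 1 actually represents in the PR is not stated in this thread.

```python
def shape(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical coordinates with shape (batch=2, pos_dims=3, tokens=4).
coords = [[[b * 100 + d * 10 + t for t in range(4)] for d in range(3)] for b in range(2)]

# List equivalent of coords[:, 0:1, :]: a slice keeps axis 1, with length 1.
sliced = [batch[0:1] for batch in coords]

# List equivalent of coords[:, 0, :]: an integer index drops axis 1.
indexed = [batch[0] for batch in coords]

sliced_shape = shape(sliced)
indexed_shape = shape(indexed)
```

Both forms select the same values; the slice only preserves the number of dimensions, which matters when downstream code expects a three-dimensional input.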

@yiyixuxu yiyixuxu mentioned this pull request Jan 7, 2026
@sayakpaul sayakpaul requested a review from yiyixuxu January 7, 2026 12:13
sayakpaul and others added 3 commits January 7, 2026 15:46
* Initial implementation of LTX 2.0 latent upsampling pipeline

* Add new LTX 2.0 spatial latent upsampler logic

* Add test script for LTX 2.0 latent upsampling

* Add option to enable VAE tiling in upsampling test script

* Get latent upsampler working with video latents

* Fix typo in BlurDownsample

* Add latent upsample pipeline docstring and example

* Remove deprecated pipeline VAE slicing/tiling methods

* make style and make quality

* When returning latents, return unpacked and denormalized latents for T2V and I2V

* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <[email protected]>

7 participants