Changes from all commits (43 commits)
de01058  performance optimization (hadipash, Oct 9, 2024)
37fd542  linting (hadipash, Oct 9, 2024)
8d6d25b  fix (hadipash, Oct 10, 2024)
39a15fc  linting (hadipash, Oct 10, 2024)
4a83b34  Merge branch 'master' into perf_op (hadipash, Feb 26, 2025)
5c01279  Merge branch 'master' into perf_op (hadipash, Mar 7, 2025)
8bffac6  Merge branch 'master' into perf_op (hadipash, Mar 17, 2025)
74f91c6  add test_hunyuan_vae.py (HaoyangLee, Mar 25, 2025)
2be0fd3  init infer_vae_v2.py (HaoyangLee, Mar 27, 2025)
7945126  update infer_vae_v2.py (HaoyangLee, Mar 27, 2025)
ede97cb  add shell script (HaoyangLee, Mar 27, 2025)
ff6a0f8  update infer_vae_v2.py (HaoyangLee, Mar 27, 2025)
f863571  update infer_vae_v2.py (HaoyangLee, Mar 27, 2025)
2941d85  add cond for infer_vae_v2.py (HaoyangLee, Mar 28, 2025)
be85b48  update (HaoyangLee, Mar 28, 2025)
33ddb2c  encoder -> encode (HaoyangLee, Apr 9, 2025)
b08b6a3  save latent mean and std instead of latent itself (HaoyangLee, Apr 9, 2025)
3c9177d  add num_frames (HaoyangLee, Apr 9, 2025)
f16cb53  set vae_micro_batch_size to None by default (HaoyangLee, Apr 9, 2025)
09c900e  add test_hunyuan_vae.py (HaoyangLee, Mar 25, 2025)
ce83fcf  add training (hadipash, Mar 26, 2025)
1def1d3  add sequence parallel (hadipash, Mar 31, 2025)
fd88e28  fix graph mode (hadipash, Apr 9, 2025)
0b961dc  update configs (hadipash, Apr 10, 2025)
77b2bc1  add SP for inference (hadipash, Apr 10, 2025)
72ce279  fix weights loading, add lazy inline (hadipash, Apr 11, 2025)
54b3537  Merge remote-tracking branch 'Haoyang/opensora2_0' into osv2.0_train (hadipash, Apr 15, 2025)
371f221  update mindone to installable (hadipash, Apr 17, 2025)
ac1f0e8  refactor inference pipelines (hadipash, Apr 17, 2025)
60c6fb4  support independent text embeddings drop (hadipash, Apr 22, 2025)
cd34937  add I/V2V training support (hadipash, Apr 22, 2025)
843e2bc  fix VAE (hadipash, Apr 29, 2025)
41507f0  refactor VAE inference (hadipash, May 13, 2025)
eab5d10  cache RoPE (hadipash, May 15, 2025)
c3703fd  generate VAE embeddings during training in Pynative (hadipash, May 23, 2025)
b58be68  add TP to VAE (hadipash, May 27, 2025)
8ca7322  update docs and add scripts (hadipash, Jun 11, 2025)
7e5b4ef  drop custom `repeat_interleave` (hadipash, Jun 12, 2025)
86866d4  Merge pull request #15 from hadipash/perf_op (hadipash, Jun 12, 2025)
e9cd952  small fix (hadipash, Jun 19, 2025)
ea055b1  small fix (hadipash, Jun 24, 2025)
714d948  Merge branch 'master' into osv2.0_train (hadipash, Jul 14, 2025)
8c9436d  fix repo linting (hadipash, Jul 14, 2025)
47 changes: 36 additions & 11 deletions examples/opensora_hpcai/README.md
@@ -193,7 +193,13 @@ Other useful documents and links are listed below.
install [CANN 8.0.0.beta1](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1)
as recommended by the official installation website.

-2. Install requirements
+2. Install MindONE
+
+```shell
+pip install -e .[training]
+```
+
+3. Install requirements

```shell
pip install -r requirements.txt
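# Optional: verify the installation (assumes the `mindspore` package installed in the steps above)
python -c "import mindspore; mindspore.run_check()"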
@@ -366,16 +372,16 @@ First, you will need to generate text embeddings with:
```shell
# CLIP-Large
-TRANSFORMERS_OFFLINE=1 python scripts/v2.0/text_embedding.py \
-    --model.from_pretrained="DeepFloyd/t5-v1_1-xxl" \
-    --model.max_length=512 \
-    --prompts_file=YOUR_PROMPTS.txt \
-    --output_path=assets/texts/t5_512
-# T5
TRANSFORMERS_OFFLINE=1 python scripts/v2.0/text_embedding.py \
    --model.from_pretrained="openai/clip-vit-large-patch14" \
    --model.max_length=77 \
    --prompts_file=YOUR_PROMPTS.txt \
    --output_path=assets/texts/clip_77
+# T5
+TRANSFORMERS_OFFLINE=1 python scripts/v2.0/text_embedding.py \
+    --model.from_pretrained="DeepFloyd/t5-v1_1-xxl" \
+    --model.max_length=512 \
+    --prompts_file=YOUR_PROMPTS.txt \
+    --output_path=assets/texts/t5_512
```

Repeat the same for negative prompts.
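
The negative-prompt embeddings used below (`assets/texts/t5_512_neg` and `assets/texts/clip_77_neg`) can be produced with the same script; a sketch, assuming the negative prompts live in a separate text file (the file name here is illustrative):

```shell
# Negative prompts: same models and max lengths, separate output folders
TRANSFORMERS_OFFLINE=1 python scripts/v2.0/text_embedding.py \
    --model.from_pretrained="openai/clip-vit-large-patch14" \
    --model.max_length=77 \
    --prompts_file=YOUR_NEG_PROMPTS.txt \
    --output_path=assets/texts/clip_77_neg
TRANSFORMERS_OFFLINE=1 python scripts/v2.0/text_embedding.py \
    --model.from_pretrained="DeepFloyd/t5-v1_1-xxl" \
    --model.max_length=512 \
    --prompts_file=YOUR_NEG_PROMPTS.txt \
    --output_path=assets/texts/t5_512_neg
```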
@@ -384,10 +390,10 @@ Then, you can generate videos by running the following command:

```shell
python scripts/v2.0/inference_v2.py --config=configs/opensora-v2-0/inference/256px.yaml \
-text_emb.t5_dir=assets/texts/t5_512 \
-text_emb.neg_t5_dir=assets/texts/t5_512_neg \
-text_emb.clip_dir=assets/texts/clip_77 \
-text_emb.neg_clip_dir=assets/texts/clip_77_neg
+prompts.t5_dir=assets/texts/t5_512 \
+prompts.neg_t5_dir=assets/texts/t5_512_neg \
+prompts.clip_dir=assets/texts/clip_77 \
+prompts.neg_clip_dir=assets/texts/clip_77_neg
```
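
If the inference script accepts dotted-key overrides for other config sections the way it does for `prompts.*` (an assumption, not verified here), sampling and saving options could be adjusted inline as well; illustrative values only:

```shell
python scripts/v2.0/inference_v2.py --config=configs/opensora-v2-0/inference/256px.yaml \
    prompts.t5_dir=assets/texts/t5_512 \
    prompts.clip_dir=assets/texts/clip_77 \
    sampling_option.motion_score=4 \
    saving_option.fps=24
```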

#### Inference Performance
@@ -616,8 +622,26 @@ video_embed_folder

## Training

### Open-Sora 2.0

Once the data is prepared in a CSV file, training can be started by running the appropriate bash script from the
`scripts/v2.0/run` directory (see the illustrative command below).
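
For example, a launch might look like the following; the script name is hypothetical, since the exact files under `scripts/v2.0/run` are not listed in this diff:

```shell
# Illustrative only: substitute the stage script that matches your config
bash scripts/v2.0/run/run_train_stage1.sh
```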

#### Training Performance

| Model name | Cards | Batch size | Mode | JIT level | Method | Resolution | Frames | Sequence parallel | ZeRO stage | VAE cache | Text cache | Step time (s) | Recipe |
|:----------:|:-----:|:----------:|:--------------------------:|:---------:|:------:|:----------:|:------:|:-----------------:|:----------:|:---------:|:----------:|:-------------:|:------------------------------------------------------:|
| 11B | 8 | 1 | Graph | O1 | t2v | 256x256 | 129 | - | 3 | Yes | Yes | 4.58 | [yaml](configs/opensora-v2-0/train/stage1_latent.yaml) |
| 11B | 8 | 1 | Pynative | - | t2v | 256x256 | 129 | - | 3 | Yes | Yes | 6.57 | [yaml](configs/opensora-v2-0/train/stage1_latent.yaml) |
| 11B | 8 | 2 | MMDiT Graph + VAE Pynative | O1 | t2v | 256x256 | 129 | - | 3 | No | Yes | 11.8 | [yaml](configs/opensora-v2-0/train/stage1.yaml) |
| 11B | 8 | 1 | Graph | O1 | t2v | 768x768 | 129 | 8 | 3 | Yes | Yes | 16.5 | [yaml](configs/opensora-v2-0/train/stage2_latent.yaml) |
| 11B | 8 | 1 | Pynative | - | t2v | 768x768 | 129 | 8 | 3 | Yes | Yes | 18.9 | [yaml](configs/opensora-v2-0/train/stage2_latent.yaml) |

### Open-Sora 1.2

<details>
<summary>Instructions</summary>

Once you have prepared the data in a CSV file, you may run the following commands to launch training on a single card.

```shell
@@ -675,6 +699,7 @@ More details on the bucket configuration can be found in [Multi-resolution Train

The instructions for launching the dynamic training task are similar to those in the previous section. An example running script is `scripts/run/run_train_os1.2_stage2.sh`.

</details>

### Open-Sora 1.1

@@ -0,0 +1,8 @@
from_pretrained: hpcai-tech/Open-Sora-v2/hunyuan_vae.safetensors
in_channels: 3
out_channels: 3
layers_per_block: 2
latent_channels: 16
use_spatial_tiling: True
use_temporal_tiling: False
dtype: bf16
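
This appears to be the shared VAE config that the other configs below pull in as `ae: ../ae/hunyuan_vae.yaml`. Its `from_pretrained` entry points at `hpcai-tech/Open-Sora-v2/hunyuan_vae.safetensors`; if the weights need to be fetched manually, one option (an assumption, not taken from this diff) is the Hugging Face CLI:

```shell
# Downloads hunyuan_vae.safetensors from the hpcai-tech/Open-Sora-v2 repo; the target directory is illustrative
huggingface-cli download hpcai-tech/Open-Sora-v2 hunyuan_vae.safetensors --local-dir ./checkpoints
```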
@@ -22,17 +22,9 @@ model:
cond_embed: True
dtype: bf16

-ae:
-from_pretrained: hpcai-tech/Open-Sora-v2/hunyuan_vae.safetensors
-in_channels: 3
-out_channels: 3
-layers_per_block: 2
-latent_channels: 16
-use_spatial_tiling: True
-use_temporal_tiling: False
-dtype: bf16
+ae: ../ae/hunyuan_vae.yaml

-text_emb:
+prompts:
prompts:
neg_prompts:
t5_dir:
@@ -64,12 +56,11 @@ sampling_option:
image_osci: True # enable image guidance oscillation
scale_temporal_osci: True
method: i2v
motion_score: "4" # motion score for video generation
motion_score: 4 # motion score for video generation
batch_size: 1
cond_type: "t2v"
cond_type: t2v

saving_option:
-output_path: ../../../samples # save directory
fps: 24 # fps for video generation and saving

# T2I. TODO: separate config
@@ -119,6 +110,6 @@ sampling_option_t2i:
image_osci: True # enable image guidance oscillation
scale_temporal_osci: True
method: distill
motion_score: "4" # motion score for video generation
motion_score: 4 # motion score for video generation
batch_size: 1
cond_type: "t2v"
cond_type: t2v
@@ -2,6 +2,8 @@ env:
mode: 1
debug: False

+enable_sequence_parallel: True

model:
from_pretrained: hpcai-tech/Open-Sora-v2/Open_Sora_v2.safetensors
guidance_embed: False
@@ -22,17 +24,9 @@ model:
cond_embed: True
dtype: bf16

-ae:
-from_pretrained: hpcai-tech/Open-Sora-v2/hunyuan_vae.safetensors
-in_channels: 3
-out_channels: 3
-layers_per_block: 2
-latent_channels: 16
-use_spatial_tiling: True
-use_temporal_tiling: False
-dtype: bf16
+ae: ../ae/hunyuan_vae.yaml

-text_emb:
+prompts:
prompts:
neg_prompts:
t5_dir:
@@ -64,12 +58,11 @@ sampling_option:
image_osci: True # enable image guidance oscillation
scale_temporal_osci: True
method: i2v
motion_score: "4" # motion score for video generation
motion_score: 4 # motion score for video generation
batch_size: 1
cond_type: "t2v"
cond_type: t2v

saving_option:
-output_path: ../../../samples # save directory
fps: 24 # fps for video generation and saving

# T2I. TODO: separate config
@@ -119,6 +112,6 @@ sampling_option_t2i:
image_osci: True # enable image guidance oscillation
scale_temporal_osci: True
method: distill
motion_score: "4" # motion score for video generation
motion_score: 4 # motion score for video generation
batch_size: 1
cond_type: "t2v"
cond_type: t2v
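
Since this variant sets `enable_sequence_parallel: True`, inference is expected to run across multiple devices. A minimal launch sketch using MindSpore's `msrun` launcher, assuming 8 devices; the config path, log directory, and embedding folders are placeholders:

```shell
msrun --worker_num=8 --local_worker_num=8 --log_dir=logs/sp_inference --join=True \
    python scripts/v2.0/inference_v2.py --config=PATH_TO_SP_CONFIG.yaml \
    prompts.t5_dir=assets/texts/t5_512 \
    prompts.clip_dir=assets/texts/clip_77
```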
106 changes: 106 additions & 0 deletions examples/opensora_hpcai/configs/opensora-v2-0/train/image.yaml
@@ -0,0 +1,106 @@
env:
mode: 0
jit_level: O1
max_device_memory: 59GB
seed: 42
distributed: False
debug: False

model:
from_pretrained:
guidance_embed: False
fused_qkv: False
use_liger_rope: True
# model architecture
in_channels: 64
vec_in_dim: 768
context_in_dim: 4096
hidden_size: 3072
mlp_ratio: 4.0
num_heads: 24
depth: 19
depth_single_blocks: 38
axes_dim: [ 16, 56, 56 ]
theta: 10_000
qkv_bias: True
cond_embed: False
recompute_every_nth_block: 1
dtype: bf16

ae: ../ae/hunyuan_vae.yaml

dataset:
v2_pipeline: True
sample_n_frames: 5
csv_path: CSV_PATH
video_folder: VIDEO_FOLDER
text_emb_folder:
t5: UL2_FOLDER
clip: BYT5_FOLDER
empty_text_emb:
t5: EMPTY_TEXT_EMB
clip: EMPTY_TEXT_EMB
text_drop_prob:
t5: 0.31622777
clip: 0.31622777
vae_scale_factor: 0.476986
vae_shift_factor: 0
apply_transforms_dataset: True
output_columns: [ "video", "video_ids", "t5_caption", "txt_ids", "clip_caption", "shift_alpha" ]

bucket_config:
init_args:
bucket_config:
256px:
1: [ 1.0, 50 ]
768px:
1: [ 0.5, 11 ]
1024px:
1: [ 0.5, 7 ]

dataloader:
shuffle: True
num_workers_dataset: 4

train:
pipeline:
is_causal_vae: True

sequence_parallel:
shards: 1 # 1 == no SP

options:
steps: 20000

lr_scheduler:
name: constant
lr: 1e-5
warmup_steps: 0

optimizer:
name: adamw_bf16
eps: 1e-15
betas: [ 0.9, 0.999 ]
weight_decay: 0

loss_scaler:
class_path: mindspore.nn.FixedLossScaleUpdateCell # or DynamicLossScaleUpdateCell in FP16
init_args:
loss_scale_value: 1

settings:
zero_stage: 2
gradient_accumulation_steps: 1
clip_grad: True
clip_norm: 1.0

save:
ckpt_save_policy: latest_k
ckpt_save_interval: 500
ckpt_max_keep: 10
log_interval: 1
save_ema_only: False
record_lr: False

save:
output_path: ../../../output/image # the path is relative to this config
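
A possible way to launch training with this config is sketched below; the entry-point script, the use of `msrun`, and the dotted-key overrides for the placeholder fields (`CSV_PATH`, `VIDEO_FOLDER`, the embedding folders) are assumptions rather than commands taken from this diff:

```shell
# Hypothetical entry point and overrides; prefer the provided scripts under scripts/v2.0/run
msrun --worker_num=8 --local_worker_num=8 --log_dir=logs/train_image --join=True \
    python scripts/v2.0/train_v2.py --config=configs/opensora-v2-0/train/image.yaml \
    env.distributed=True \
    dataset.csv_path=/path/to/data.csv \
    dataset.video_folder=/path/to/videos \
    dataset.text_emb_folder.t5=/path/to/t5_embeddings \
    dataset.text_emb_folder.clip=/path/to/clip_embeddings
```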