Description
Model/Pipeline/Scheduler description
ConsistencyModelPipeline
In the `diffusers/examples/research_projects/consistency_training/` example, launching with multi-GPU fails with the following error (the same worker traceback is printed once per rank):

```
Traceback (most recent call last):
  File "/kaggle/working/train_cm_ct_unconditional.py", line 1438, in <module>
    main(args)
  File "/kaggle/working/train_cm_ct_unconditional.py", line 1198, in main
    args.huber_c = 0.00054 * args.resolution * math.sqrt(unet.config.in_channels)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DistributedDataParallel' object has no attribute 'config'
[2024-06-11 19:37:38,530] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 149) of binary: /opt/conda/bin/python3.10
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1073, in launch_command
    multi_gpu_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 718, in multi_gpu_launcher
    distrib_run.run(args)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
```
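The failure seems to come from reading `unet.config` after the model has been wrapped in `DistributedDataParallel` for multi-GPU training: DDP only proxies `nn.Module` attributes, and custom attributes such as `config` live on the wrapped model, exposed as `.module`. A common workaround is to unwrap the model before reading its config. Below is a minimal sketch of that pattern; `DummyUNet`/`DummyDDP` are hypothetical stand-ins (the real DDP wrapper needs a distributed process group), and in the training script itself `accelerator.unwrap_model(unet)` would serve the same purpose:

```python
import math

class DummyConfig:
    in_channels = 3  # stand-in for the UNet's config

class DummyUNet:
    config = DummyConfig()

class DummyDDP:
    """Hypothetical stand-in for torch.nn.parallel.DistributedDataParallel,
    which exposes the wrapped model as `.module`."""
    def __init__(self, module):
        self.module = module

def unwrap_model(model):
    # Attributes like `.config` live on the wrapped model, not the DDP wrapper.
    return model.module if hasattr(model, "module") else model

unet = DummyDDP(DummyUNet())
resolution = 32
# Mirrors the failing line, but reads config from the unwrapped model:
huber_c = 0.00054 * resolution * math.sqrt(unwrap_model(unet).config.in_channels)
```

The same `unwrap_model` call is a no-op on a bare (single-GPU) model, so the line works in both launch modes.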
Open source status
- [x] The model implementation is available.
- [x] The model weights are available (only relevant if the addition is not a scheduler).
Provide useful links for the implementation
@vanakema
https://github.com/huggingface/diffusers/blob/main/examples/research_projects/consistency_training/train_cm_ct_unconditional.py
```shell
!accelerate launch train_cm_ct_unconditional.py \
  --dataset_name="cifar10" \
  --dataset_image_column_name="img" \
  --output_dir="/kaggle/working/" \
  --mixed_precision="no" \
  --resolution=32 \
  --max_train_steps=1000 \
  --max_train_samples=10000 \
  --dataloader_num_workers=4 \
  --noise_precond_type="cm" \
  --input_precond_type="cm" \
  --train_batch_size=4 \
  --learning_rate=1e-04 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --use_8bit_adam \
  --use_ema \
  --validation_steps=100 \
  --eval_batch_size=4 \
  --checkpointing_steps=10000 \
  --checkpoints_total_limit=10 \
  --class_conditional \
  --num_classes=10
```
@dg845