@carmocca

@carmocca carmocca commented Nov 10, 2021

What does this PR do?

Kudos to @SeanNaren for the idea.

This adds ~2 minutes to the special test runtime. The overhead comes from running pytest with --collect-only one additional time for each test.
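
To make the per-test collection cost concrete, here is a minimal sketch of what one such extra pytest run could look like. It is not the script from this PR: the helper names (`collect_command`, `parse_collected_ids`, `collect_parametrizations`) are hypothetical, and it assumes that `pytest --collect-only -q` prints one node ID per line followed by a summary line.

```python
from __future__ import annotations

import subprocess
import sys


def collect_command(node_id: str) -> list[str]:
    """Build a collection-only pytest invocation for one test function."""
    # --collect-only lists tests without running them; -q prints one node ID per line.
    return [sys.executable, "-m", "pytest", node_id, "--collect-only", "-q"]


def parse_collected_ids(output: str) -> list[str]:
    """Extract node IDs from quiet collection output, skipping the summary line."""
    ids = []
    for line in output.splitlines():
        line = line.strip()
        # Node IDs look like "path/to/test_file.py::test_name[param]".
        if "::" in line and not line.startswith(("=", "ERROR")):
            ids.append(line)
    return ids


def collect_parametrizations(node_id: str) -> list[str]:
    """Run one pytest collection for `node_id`; this is the per-test overhead."""
    result = subprocess.run(
        collect_command(node_id), capture_output=True, text=True, check=False
    )
    return parse_collected_ids(result.stdout)
```

Calling `collect_parametrizations` once per entry in the list below starts a fresh Python interpreter and pytest session each time, which is where a fixed cost per test would add up.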

I have rolled back some of the parameterizations. If you know of any others, feel free to push a change to this PR.

The current list of tests is:

================================================================================
Ran	tests/callbacks/test_pruning.py:163::test_pruning_callback_ddp
Ran	tests/callbacks/test_stochastic_weight_avg.py:141::test_swa_callback_ddp
Ran	tests/callbacks/test_tqdm_progress_bar.py:524::test_progress_bar_max_val_check_interval
Ran	tests/core/test_metric_result_integration.py:485::test_result_collection_reload_2_gpus
Ran	tests/utilities/test_deepspeed_collate_checkpoint.py:25::test_deepspeed_collate_checkpoint
Ran	tests/utilities/test_all_gather_grad.py:50::test_all_gather_collection
Ran	tests/utilities/test_all_gather_grad.py:101::test_all_gather_sync_grads
Ran	tests/accelerators/test_accelerator_connector.py:326::test_accelerator_choice_ddp_cpu_and_plugin
Ran	tests/accelerators/test_multi_nodes_gpu.py:34::test_logging_sync_dist_true_ddp
Ran	tests/accelerators/test_multi_nodes_gpu.py:71::test__validation_step__log
Ran	tests/accelerators/test_ddp.py:111::test_ddp_wrapper
Ran	tests/checkpointing/test_checkpoint_callback_frequency.py:90::test_top_k_ddp
Ran	tests/trainer/test_trainer.py:1455::test_trainer_predict_special
Ran	tests/trainer/test_trainer.py:1889::test_ddp_terminate_when_deadlock_is_detected
Ran	tests/trainer/logging_/test_train_loop_logging.py:436::test_logging_sync_dist_true_ddp
Ran	tests/trainer/optimization/test_manual_optimization.py:843::test_step_with_optimizer_closure_with_different_frequencies_ddp
Ran	tests/trainer/optimization/test_manual_optimization.py:913::test_step_with_optimizer_closure_with_different_frequencies_ddp_with_toggle_model
Ran	tests/trainer/optimization/test_optimizers.py:540::test_optimizer_state_on_device
Ran	tests/lite/test_lite.py:383::test_deepspeed_multiple_models
Ran	tests/lite/test_parity.py:193::test_boring_lite_model_ddp
Ran	tests/profiler/test_profiler.py:295::test_pytorch_profiler_trainer_ddp
Skipped	tests/profiler/test_profiler.py:428::test_pytorch_profiler_nested_emit_nvtx
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:29::test_ddp_fp16_compress_comm_hook
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:49::test_ddp_sgd_comm_hook
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:73::test_ddp_fp16_compress_wrap_sgd_comm_hook
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:98::test_ddp_spawn_fp16_compress_comm_hook
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:115::test_ddp_post_local_sgd_comm_hook
Ran	tests/plugins/test_ddp_plugin.py:36::test_ddp_with_2_gpus
Ran	tests/plugins/test_ddp_plugin.py:67::test_ddp_barrier_non_consecutive_device_ids
Ran	tests/plugins/test_ddp_fully_sharded_with_full_state_dict.py:92::test_fully_sharded_plugin_checkpoint
Ran	tests/plugins/test_ddp_fully_sharded_with_full_state_dict.py:101::test_fully_sharded_plugin_checkpoint_multi_gpus
Ran	tests/plugins/test_ddp_fully_sharded_with_full_state_dict.py:139::test_fsdp_gradient_clipping_raises
Ran	tests/plugins/test_amp_plugins.py:193::test_amp_apex_ddp_fit
Ran	tests/plugins/test_deepspeed_plugin.py:205::test_warn_deepspeed_ignored
Ran	tests/plugins/test_deepspeed_plugin.py:261::test_deepspeed_run_configure_optimizers
Ran	tests/plugins/test_deepspeed_plugin.py:298::test_deepspeed_config
Ran	tests/plugins/test_deepspeed_plugin.py:326::test_deepspeed_custom_precision_params
Ran	tests/plugins/test_deepspeed_plugin.py:388::test_deepspeed_multigpu
Ran	tests/plugins/test_deepspeed_plugin.py:404::test_deepspeed_fp32_works
Ran	tests/plugins/test_deepspeed_plugin.py:411::test_deepspeed_stage_3_save_warning
Ran	tests/plugins/test_deepspeed_plugin.py:431::test_deepspeed_multigpu_single_file
Ran	tests/plugins/test_deepspeed_plugin.py:540::test_deepspeed_multigpu_stage_3
Ran	tests/plugins/test_deepspeed_plugin.py:553::test_deepspeed_multigpu_stage_3_manual_optimization
Ran	tests/plugins/test_deepspeed_plugin.py:602::test_deepspeed_multigpu_stage_3_checkpointing
Ran	tests/plugins/test_deepspeed_plugin.py:609::test_deepspeed_multigpu_stage_3_warns_resume_training
Ran	tests/plugins/test_deepspeed_plugin.py:636::test_deepspeed_multigpu_stage_3_resume_training
Ran	tests/plugins/test_deepspeed_plugin.py:690::test_deepspeed_multigpu_stage_3_checkpointing_full_weights_manual
Ran	tests/plugins/test_deepspeed_plugin.py:697::test_deepspeed_multigpu_stage_2_accumulated_grad_batches
Ran	tests/plugins/test_deepspeed_plugin.py:702::test_deepspeed_multigpu_stage_2_accumulated_grad_batches_offload_optimizer
Ran	tests/plugins/test_deepspeed_plugin.py:743::test_deepspeed_multigpu_test
Ran	tests/plugins/test_deepspeed_plugin.py:753::test_deepspeed_multigpu_partial_partition_parameters
Ran	tests/plugins/test_deepspeed_plugin.py:780::test_deepspeed_multigpu_test_rnn
Ran	tests/plugins/test_deepspeed_plugin.py:851::test_deepspeed_multigpu_no_schedulers
Ran	tests/plugins/test_deepspeed_plugin.py:863::test_deepspeed_skip_backward_raises
Ran	tests/plugins/test_deepspeed_plugin.py:875::test_deepspeed_warn_train_dataloader_called
Ran	tests/plugins/test_deepspeed_plugin.py:890::test_deepspeed_setup_train_dataloader
Ran	tests/plugins/test_deepspeed_plugin.py:927::test_deepspeed_scheduler_step_count
Ran	tests/plugins/test_deepspeed_plugin.py:935::test_deepspeed_scheduler_step_count_epoch
Ran	tests/plugins/test_deepspeed_plugin.py:970::test_deepspeed_configure_gradient_clipping
Ran	tests/plugins/test_deepspeed_plugin.py:991::test_deepspeed_gradient_clip_by_value
Ran	tests/plugins/test_deepspeed_plugin.py:1005::test_different_accumulate_grad_batches_fails
Ran	tests/plugins/test_deepspeed_plugin.py:1015::test_specific_gpu_device_id
Ran	tests/plugins/test_deepspeed_plugin.py:1052::test_deepspeed_with_meta_device
Ran	tests/plugins/test_sharded_plugin.py:178::test_ddp_sharded_plugin_test_multigpu
Ran	tests/plugins/test_sharded_plugin.py:204::test_ddp_sharded_plugin_manual_optimization_spawn
Ran	tests/plugins/test_sharded_plugin.py:212::test_ddp_sharded_plugin_manual_optimization
Ran	tests/models/test_sync_batchnorm.py:70::test_sync_batchnorm_ddp
Ran	tests/models/test_hooks.py:170::test_transfer_batch_hook_ddp
Ran	tests/models/test_hooks.py:425::test_trainer_model_hook_system_fit_deepspeed
Ran	tests/utilities/test_warnings.py
Ran	manual ddp launch test
================================================================================

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • [n/a] Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@carmocca carmocca added the ci Continuous Integration label Nov 10, 2021
@carmocca carmocca self-assigned this Nov 10, 2021
@carmocca carmocca changed the title from "[WIP] Support special test parametrizations" to "Support special test parametrizations" Nov 12, 2021
This reverts commit b01f26d.
@carmocca carmocca added this to the 1.6 milestone Nov 12, 2021
@carmocca carmocca marked this pull request as ready for review November 12, 2021 16:08
@github-actions

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

@tchaton

tchaton commented Nov 15, 2021

Hey @carmocca,

This is really dope! However, do we believe it is worth adding an extra 2 min to the CI to isolate the parametrization? I believe the overhead is coming from DDP being instantiated and processes being created for each combination.

Even if this is awesome, I think we shouldn't merge it, as the overhead will only grow in the future.

@carmocca

That's totally sensible. This PR was exploratory.

I have one more idea to try: collecting only once, instead of as many times as we have special tests, with the optimization of first filtering the list of files to collect with grep.

That might add only a few seconds of extra runtime.

@carmocca carmocca closed this Nov 15, 2021
@carmocca carmocca deleted the ci/support-special-parametrizations branch November 15, 2021 12:47
@carmocca carmocca restored the ci/support-special-parametrizations branch November 16, 2021 17:33
@carmocca carmocca deleted the ci/support-special-parametrizations branch November 16, 2021 17:38