Support special test parametrizations #10462

carmocca · 2021-11-10T17:42:27Z

What does this PR do?

Kudos to @SeanNaren for the idea.

This adds ~2 minutes to the special test runtime. The overhead comes from running pytest with --collect-only one extra time for each test.
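To make the source of that overhead concrete, here is a minimal sketch of the per-test collection pattern described above. The contents of tests/special_tests.sh are not shown in this thread, so the helper name `per_test_collect_commands` and the exact command layout are assumptions for illustration; only the `--collect-only` and `-q` pytest flags are taken as given.

```python
from typing import List


def per_test_collect_commands(node_ids: List[str]) -> List[List[str]]:
    """Build one collection-only pytest command per special test.

    Hypothetical helper: each command starts a fresh pytest process whose only
    job is to list the parametrized variants of a single test, which is why
    the startup cost is paid once per test.
    """
    return [
        ["python", "-m", "pytest", node_id, "--collect-only", "-q"]
        for node_id in node_ids
    ]
```

With N special tests this yields N extra pytest startups before any test actually runs, so the added time grows with the number of special tests.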

I have rolled back some of the parametrizations. If you know of any others, feel free to push a change to this PR.

The current list of tests is:

================================================================================
Ran	tests/callbacks/test_pruning.py:163::test_pruning_callback_ddp
Ran	tests/callbacks/test_stochastic_weight_avg.py:141::test_swa_callback_ddp
Ran	tests/callbacks/test_tqdm_progress_bar.py:524::test_progress_bar_max_val_check_interval
Ran	tests/core/test_metric_result_integration.py:485::test_result_collection_reload_2_gpus
Ran	tests/utilities/test_deepspeed_collate_checkpoint.py:25::test_deepspeed_collate_checkpoint
Ran	tests/utilities/test_all_gather_grad.py:50::test_all_gather_collection
Ran	tests/utilities/test_all_gather_grad.py:101::test_all_gather_sync_grads
Ran	tests/accelerators/test_accelerator_connector.py:326::test_accelerator_choice_ddp_cpu_and_plugin
Ran	tests/accelerators/test_multi_nodes_gpu.py:34::test_logging_sync_dist_true_ddp
Ran	tests/accelerators/test_multi_nodes_gpu.py:71::test__validation_step__log
Ran	tests/accelerators/test_ddp.py:111::test_ddp_wrapper
Ran	tests/checkpointing/test_checkpoint_callback_frequency.py:90::test_top_k_ddp
Ran	tests/trainer/test_trainer.py:1455::test_trainer_predict_special
Ran	tests/trainer/test_trainer.py:1889::test_ddp_terminate_when_deadlock_is_detected
Ran	tests/trainer/logging_/test_train_loop_logging.py:436::test_logging_sync_dist_true_ddp
Ran	tests/trainer/optimization/test_manual_optimization.py:843::test_step_with_optimizer_closure_with_different_frequencies_ddp
Ran	tests/trainer/optimization/test_manual_optimization.py:913::test_step_with_optimizer_closure_with_different_frequencies_ddp_with_toggle_model
Ran	tests/trainer/optimization/test_optimizers.py:540::test_optimizer_state_on_device
Ran	tests/lite/test_lite.py:383::test_deepspeed_multiple_models
Ran	tests/lite/test_parity.py:193::test_boring_lite_model_ddp
Ran	tests/profiler/test_profiler.py:295::test_pytorch_profiler_trainer_ddp
Skipped	tests/profiler/test_profiler.py:428::test_pytorch_profiler_nested_emit_nvtx
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:29::test_ddp_fp16_compress_comm_hook
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:49::test_ddp_sgd_comm_hook
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:73::test_ddp_fp16_compress_wrap_sgd_comm_hook
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:98::test_ddp_spawn_fp16_compress_comm_hook
Ran	tests/plugins/test_ddp_plugin_with_comm_hook.py:115::test_ddp_post_local_sgd_comm_hook
Ran	tests/plugins/test_ddp_plugin.py:36::test_ddp_with_2_gpus
Ran	tests/plugins/test_ddp_plugin.py:67::test_ddp_barrier_non_consecutive_device_ids
Ran	tests/plugins/test_ddp_fully_sharded_with_full_state_dict.py:92::test_fully_sharded_plugin_checkpoint
Ran	tests/plugins/test_ddp_fully_sharded_with_full_state_dict.py:101::test_fully_sharded_plugin_checkpoint_multi_gpus
Ran	tests/plugins/test_ddp_fully_sharded_with_full_state_dict.py:139::test_fsdp_gradient_clipping_raises
Ran	tests/plugins/test_amp_plugins.py:193::test_amp_apex_ddp_fit
Ran	tests/plugins/test_deepspeed_plugin.py:205::test_warn_deepspeed_ignored
Ran	tests/plugins/test_deepspeed_plugin.py:261::test_deepspeed_run_configure_optimizers
Ran	tests/plugins/test_deepspeed_plugin.py:298::test_deepspeed_config
Ran	tests/plugins/test_deepspeed_plugin.py:326::test_deepspeed_custom_precision_params
Ran	tests/plugins/test_deepspeed_plugin.py:388::test_deepspeed_multigpu
Ran	tests/plugins/test_deepspeed_plugin.py:404::test_deepspeed_fp32_works
Ran	tests/plugins/test_deepspeed_plugin.py:411::test_deepspeed_stage_3_save_warning
Ran	tests/plugins/test_deepspeed_plugin.py:431::test_deepspeed_multigpu_single_file
Ran	tests/plugins/test_deepspeed_plugin.py:540::test_deepspeed_multigpu_stage_3
Ran	tests/plugins/test_deepspeed_plugin.py:553::test_deepspeed_multigpu_stage_3_manual_optimization
Ran	tests/plugins/test_deepspeed_plugin.py:602::test_deepspeed_multigpu_stage_3_checkpointing
Ran	tests/plugins/test_deepspeed_plugin.py:609::test_deepspeed_multigpu_stage_3_warns_resume_training
Ran	tests/plugins/test_deepspeed_plugin.py:636::test_deepspeed_multigpu_stage_3_resume_training
Ran	tests/plugins/test_deepspeed_plugin.py:690::test_deepspeed_multigpu_stage_3_checkpointing_full_weights_manual
Ran	tests/plugins/test_deepspeed_plugin.py:697::test_deepspeed_multigpu_stage_2_accumulated_grad_batches
Ran	tests/plugins/test_deepspeed_plugin.py:702::test_deepspeed_multigpu_stage_2_accumulated_grad_batches_offload_optimizer
Ran	tests/plugins/test_deepspeed_plugin.py:743::test_deepspeed_multigpu_test
Ran	tests/plugins/test_deepspeed_plugin.py:753::test_deepspeed_multigpu_partial_partition_parameters
Ran	tests/plugins/test_deepspeed_plugin.py:780::test_deepspeed_multigpu_test_rnn
Ran	tests/plugins/test_deepspeed_plugin.py:851::test_deepspeed_multigpu_no_schedulers
Ran	tests/plugins/test_deepspeed_plugin.py:863::test_deepspeed_skip_backward_raises
Ran	tests/plugins/test_deepspeed_plugin.py:875::test_deepspeed_warn_train_dataloader_called
Ran	tests/plugins/test_deepspeed_plugin.py:890::test_deepspeed_setup_train_dataloader
Ran	tests/plugins/test_deepspeed_plugin.py:927::test_deepspeed_scheduler_step_count
Ran	tests/plugins/test_deepspeed_plugin.py:935::test_deepspeed_scheduler_step_count_epoch
Ran	tests/plugins/test_deepspeed_plugin.py:970::test_deepspeed_configure_gradient_clipping
Ran	tests/plugins/test_deepspeed_plugin.py:991::test_deepspeed_gradient_clip_by_value
Ran	tests/plugins/test_deepspeed_plugin.py:1005::test_different_accumulate_grad_batches_fails
Ran	tests/plugins/test_deepspeed_plugin.py:1015::test_specific_gpu_device_id
Ran	tests/plugins/test_deepspeed_plugin.py:1052::test_deepspeed_with_meta_device
Ran	tests/plugins/test_sharded_plugin.py:178::test_ddp_sharded_plugin_test_multigpu
Ran	tests/plugins/test_sharded_plugin.py:204::test_ddp_sharded_plugin_manual_optimization_spawn
Ran	tests/plugins/test_sharded_plugin.py:212::test_ddp_sharded_plugin_manual_optimization
Ran	tests/models/test_sync_batchnorm.py:70::test_sync_batchnorm_ddp
Ran	tests/models/test_hooks.py:170::test_transfer_batch_hook_ddp
Ran	tests/models/test_hooks.py:425::test_trainer_model_hook_system_fit_deepspeed
Ran	tests/utilities/test_warnings.py
Ran	manual ddp launch test
================================================================================

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
[n/a] Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
[n/a] Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

github-actions · 2021-11-12T16:08:50Z

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

github-actions · 2021-11-12T16:10:35Z

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

github-actions · 2021-11-12T16:10:51Z

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

tchaton · 2021-11-15T12:26:23Z

Hey @carmocca,

This is really dope! However, do we believe it is worth adding an extra 2 min to the CI to isolate the parametrizations? I believe the overhead comes from DDP being instantiated and processes being created for each combination.

IMO, even if this is awesome, I think we shouldn't merge it as the overhead will only grow in the future.

carmocca · 2021-11-15T12:46:32Z

That's totally sensible. This PR was exploratory.

I have one more idea to try: collecting only once, instead of as many times as we have special tests, with the optimization of first filtering the list of files to collect with grep.

That might add just a few seconds of extra runtime.

carmocca added 2 commits November 10, 2021 18:40

Support special test parametrizations

fd9c962

Debug

b01f26d

carmocca added the ci Continuous Integration label Nov 10, 2021

carmocca self-assigned this Nov 10, 2021

carmocca added 8 commits November 10, 2021 18:49

Fix

9a23b52

Fix typo and update test

8e1a6d2

Improve visibility

52360a3

Update a few tests

8488cb8

Fix

9a27671

Fix

f157d0e

Add comment

9fc293e

Undo change

190dc45

carmocca commented Nov 12, 2021

tests/special_tests.sh (outdated)

carmocca and others added 2 commits November 12, 2021 16:36

Update tests/special_tests.sh

3dd01fe

[pre-commit.ci] auto fixes from pre-commit.com hooks

37395ec

for more information, see https://pre-commit.ci

carmocca changed the title ~~[WIP] Support special test parametrizations~~ Support special test parametrizations Nov 12, 2021

Revert "Debug"

5286153

This reverts commit b01f26d.

carmocca added this to the 1.6 milestone Nov 12, 2021

carmocca marked this pull request as ready for review November 12, 2021 16:08

carmocca requested review from Borda, SeanNaren, awaelchli, justusschock, kaushikb11, rohitgr7, tchaton and williamFalcon as code owners November 12, 2021 16:08

carmocca closed this Nov 15, 2021

carmocca deleted the ci/support-special-parametrizations branch November 15, 2021 12:47

carmocca restored the ci/support-special-parametrizations branch November 16, 2021 17:33

carmocca deleted the ci/support-special-parametrizations branch November 16, 2021 17:38

carmocca mentioned this pull request Nov 16, 2021

Support special test parametrizations #10569

Merged

10 tasks
