
[RFC] Introduce strategy flag to Trainer #9053

@kaushikb11

Description

🚀 Feature

Motivation

The motivation is to have a separate strategy flag that supports passing training type aliases (ddp, ddp_spawn, etc.) as well as custom TrainingTypePlugin objects.

Trainer(strategy="ddp", accelerator="gpu", devices=4)
Trainer(strategy=DDPPlugin(find_unused_parameters=False), accelerator="gpu", devices=4)
Trainer(strategy="ddp_spawn", accelerator="cpu", devices=4)
Trainer(strategy="ddp_spawn", accelerator="tpu", devices=4)


Background

At the moment, there is a single flag, accelerator, that is tied to both Accelerators and Training Type plugins. We wish to decouple them and would like to add a separate accelerator_strategy flag for Training Type plugins!

trainer = Trainer(accelerator=GPUAccelerator(...))
trainer = Trainer(accelerator='ddp_spawn')

Alternate flags to set Training Types

  • accelerator
    • type: Optional[Union[str, Accelerator]] = None
    • Supports training type aliases and Accelerator objects
  • distributed_backend
    • type: Optional[str] = None
    • Deprecated, should use accelerator instead
  • plugins
    • type: Optional[Union[List[Union[Plugin, ClusterEnvironment, str]], Plugin, ClusterEnvironment, str]] = None
    • Supports custom Lightning plugins & cluster environments

What's the difference between passing training type to accelerator, distributed_backend, or plugins?

  • accelerator and distributed_backend only support DistributedType aliases (ddp, ddp_spawn, etc.), whereas plugins also accepts custom training types (DDPPlugin(), ddp_find_unused_parameters_false, etc.); see the sketch below.
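For illustration, a minimal sketch of how the same training type can be selected through each of these flags today (assuming Lightning 1.4-era import paths; gpus is used here purely as the device-count flag):

from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# Training type alias via the accelerator flag (current behaviour)
trainer = Trainer(accelerator="ddp_spawn", gpus=2)

# Same alias via the deprecated distributed_backend flag
trainer = Trainer(distributed_backend="ddp_spawn", gpus=2)

# Custom training type plugin, or a registered alias, via plugins
trainer = Trainer(plugins=DDPPlugin(find_unused_parameters=False), gpus=2)
trainer = Trainer(plugins="ddp_find_unused_parameters_false", gpus=2)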


Proposed Solution

  • Introduce strategy flag to Trainer.
  • Support the exceptions and deprecations mentioned below

Exceptions (the following conflicting combinations should raise an error; a rough sketch of such a check follows the list):

  • Trainer(distributed_backend="ddp_cpu", strategy="ddp_spawn")
  • Trainer(accelerator="ddp", strategy="ddp_spawn")
  • Trainer(plugins="ddp_find_unused_parameters_false", strategy="ddp_spawn")
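As a rough, non-authoritative sketch of what this validation could look like (the helper name and the exact set of checks below are hypothetical; MisconfigurationException already exists in pytorch_lightning.utilities.exceptions):

from pytorch_lightning.utilities.exceptions import MisconfigurationException

def _validate_strategy_flags(strategy, accelerator=None, distributed_backend=None, plugins=None):
    # Hypothetical helper: the training type may only be set once.
    training_type_aliases = ("ddp", "ddp_spawn", "ddp_cpu")
    conflicting = (
        accelerator in training_type_aliases
        or distributed_backend is not None
        or isinstance(plugins, str)
    )
    if strategy is not None and conflicting:
        raise MisconfigurationException(
            "The training type was set through strategy and also through "
            "accelerator/distributed_backend/plugins. Please set it only via strategy."
        )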

Deprecations (deprecated in v1.5, to be removed in v1.6; a short before/after example follows the list):

  • Passing training type to accelerator flag
  • Passing training type to plugins flag
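For reference, a sketch of the migration this implies (the new-style calls mirror the examples from the Motivation section):

from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# Deprecated in v1.5: training type passed to accelerator or plugins
trainer = Trainer(accelerator="ddp_spawn", gpus=2)
trainer = Trainer(plugins=DDPPlugin(find_unused_parameters=False), gpus=2)

# Preferred going forward: training type passed to strategy
trainer = Trainer(strategy="ddp_spawn", accelerator="gpu", devices=2)
trainer = Trainer(strategy=DDPPlugin(find_unused_parameters=False), accelerator="gpu", devices=2)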


Related PR: #8597
Related Issue: #6090

If you agree with this change, react with 🎉; if not, react with 🙅🏽 and leave a comment.

Alternatives

  • Only deprecate passing the TrainingTypePlugin to the plugins argument, not the accelerator argument.
  • Use simpler strategy argument instead of accelerator_strategy.

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA deep learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.

    Labels

    design (Includes a design discussion), feature (Is an improvement or enhancement), help wanted (Open to be worked on)
