Conversation

@danieltudosiu
Contributor

Description

Added a basic gradient clipping draft for the SupervisedTrainer. The implementation choices still need to be discussed.
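
For reference, the core of the draft boils down to the standard PyTorch clipping call. A minimal, self-contained sketch of that pattern is below; the model, optimizer, and max_norm value are placeholders for illustration, not the draft's actual code:

```python
import torch
from torch import nn

# Placeholder model/optimizer purely to illustrate where clipping sits.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

inputs, targets = torch.randn(4, 10), torch.randn(4, 1)

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()
# Clip the global gradient norm after backward() and before the optimizer step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```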

Status

Work in progress

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Added a basic gradient clipping draft for the SupervisedTrainer.

Signed-off-by: Petru-Daniel Tudosiu <[email protected]>
@danieltudosiu changed the title from "[WIP] Gradient clipping draft" to "[WIP] Gradient clipping" on Apr 7, 2021
@danieltudosiu
Contributor Author

@wyli / @Nic-Ma I need your opinion about how to attack this one.

In my mind, each Trainer class must have its own gradient clipping function since the number of optimizers and scalers can vary.
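To show why the scaler matters: when AMP is enabled, the gradients have to be unscaled before the norm is measured, so the clipping code cannot be written independently of the trainer's scaler. A rough sketch of that pattern (assumes a CUDA device; the model, optimizer, and max_norm are illustrative placeholders, not the draft's code):

```python
import torch

model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
loss_fn = torch.nn.MSELoss()

inputs = torch.randn(4, 10, device="cuda")
targets = torch.randn(4, 1, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    loss = loss_fn(model(inputs), targets)
scaler.scale(loss).backward()
# Unscale before clipping, otherwise the norm is computed on the scaled
# gradients and the clipping threshold is meaningless.
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)
scaler.update()
```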

What do you think of the draft logic? Should I write it for the GAN trainer as well? (I noticed the GAN trainer is not on par with the SupervisedTrainer.)

I think the following solutions do not work:

  • backward_hook
    • The "inf" norm would not be usable, since computing it requires access to all of the model's parameters at once.
  • Ignite Handler
    • It would have to be created after the engine so that it can access the trainer's scaler and amp attributes (see the sketch after this list).
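
To illustrate the second point, a handler-based approach would look roughly like the sketch below, and it can only be attached once the trainer already exists. The import path, the BACKWARD_COMPLETED event, and the amp/scaler/optimizer/network attribute names are assumptions about MONAI's internals, not a confirmed API:

```python
import torch

from monai.engines import IterationEvents  # assumed import path for MONAI's custom events


def attach_grad_clipping(trainer, max_norm: float = 1.0):
    """Hypothetical helper that bolts gradient clipping onto an existing trainer.

    It can only be attached after the trainer is constructed, because it reads
    the trainer's ``amp``/``scaler``/``optimizer``/``network`` attributes.
    """

    def _clip(engine):
        scaler = getattr(engine, "scaler", None)
        if getattr(engine, "amp", False) and scaler is not None:
            # Unscale first so the norm is computed on the real gradients.
            scaler.unscale_(engine.optimizer)
        torch.nn.utils.clip_grad_norm_(engine.network.parameters(), max_norm)

    # Clip right after backward(), before the optimizer step runs.
    trainer.add_event_handler(IterationEvents.BACKWARD_COMPLETED, _clip)
```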

@wyli
Contributor

wyli commented Mar 15, 2022

closing this, as preferred solutions are described in #3892 (reply in thread)

@wyli closed this on Mar 15, 2022