Maru-mee commented Sep 29, 2025

Hello.

This Pull Request addresses an instability issue during the initial steps of training.

Problem

The current initialization of exp_avg_res_row and exp_avg_res_col to zero causes an abnormally large spike in res_approx at step 0 (e.g., res_approx.mean().item() = 100000), which results in excessively large parameter updates.

This large initial value leads to two problems:

  1. Model Instability
    Certain layers may be damaged, resulting in, e.g., unexpected artifacts or unnatural color shifts in the output.
  2. Persistent Abnormality
    If the user sets a high beta[2] value (e.g., > 0.99), this abnormal initial value persists, severely hindering the optimizer's ability to return to a normal state.
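Why a high beta[2] makes the problem persist can be seen from the exponential moving average alone: after n steps, the starting value still carries weight beta[2]**n. The sketch below uses hypothetical helper names (init_weight_after, steps_until_init_weight_below, zero_init_inflation). It computes how many steps it takes for that weight to fall below 1%. Assuming a constant residual and a res_approx that scales as the reciprocal square root of the running average, it also computes how much a zero-initialized average still inflates res_approx. Both of these are simplifying assumptions, not behavior measured in this PR.

```python
import math


def init_weight_after(beta3: float, steps: int) -> float:
    """Weight the starting value still carries in an EMA after `steps` updates."""
    return beta3 ** steps


def steps_until_init_weight_below(beta3: float, threshold: float) -> int:
    """Smallest n with beta3**n < threshold, i.e. n > ln(threshold) / ln(beta3)."""
    return math.floor(math.log(threshold) / math.log(beta3)) + 1


def zero_init_inflation(beta3: float, steps: int) -> float:
    """Overshoot of rsqrt(EMA) versus its steady value for a zero-initialized EMA.

    Assumes a constant observed residual r, so EMA_n = (1 - beta3**n) * r and
    rsqrt(EMA_n) / rsqrt(r) = 1 / sqrt(1 - beta3**n).
    """
    return 1.0 / math.sqrt(1.0 - beta3 ** steps)


for beta3 in (0.99, 0.999, 0.9999):
    n = steps_until_init_weight_below(beta3, 0.01)
    print(f"beta[2]={beta3}: init weight < 1% after {n} steps; "
          f"zero-init inflation after 100 steps = {zero_init_inflation(beta3, 100):.2f}")
```

With beta[2] = 0.9999, the starting value keeps more than 1% of the weight for about 46,000 steps. Whatever state the average starts in therefore shapes res_approx for a large part of a typical run.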

Changes

A non-zero initial value is required to prevent this immediate instability. This PR sets the initial value of the res-related states (exp_avg_res_row and exp_avg_res_col) to 1.0.

This change ensures that training starts from a neutral, stable state, effectively preventing catastrophic model failure at the start of the run.
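The following is a minimal pure-Python sketch that compares res_approx after the first step when the res states start at 0.0 versus 1.0. It assumes a CAME-style update in which res_approx is rebuilt from exp_avg_res_row and exp_avg_res_col as rsqrt(row / mean(row)) outer-multiplied with rsqrt(col), and in which exp_avg itself starts at zero. The helper names (approx_res, first_step_res_approx, matrix_mean), the example update matrix, and the defaults beta1 = 0.9, beta3 = 0.9999 (standing in for beta[2]) and eps = 1e-16 are assumptions for illustration, not taken from this PR.

```python
import math


def approx_res(row, col):
    """Assumed factored reconstruction: rsqrt(row / mean(row)) x rsqrt(col)."""
    row_mean = sum(row) / len(row)
    r_factor = [1.0 / math.sqrt(r / row_mean) for r in row]
    c_factor = [1.0 / math.sqrt(c) for c in col]
    return [[r * c for c in c_factor] for r in r_factor]


def first_step_res_approx(update, exp_avg, init, beta3=0.9999, eps=1e-16):
    """res_approx after one EMA step of the row/col res states starting at `init`."""
    n_rows, n_cols = len(update), len(update[0])
    # Instability residual: squared gap between the update and its EMA, plus eps.
    res = [[(update[i][j] - exp_avg[i][j]) ** 2 + eps for j in range(n_cols)]
           for i in range(n_rows)]
    res_row_mean = [sum(res[i]) / n_cols for i in range(n_rows)]
    res_col_mean = [sum(res[i][j] for i in range(n_rows)) / n_rows
                    for j in range(n_cols)]
    # One EMA step: state = beta3 * state + (1 - beta3) * observation.
    row = [init * beta3 + (1.0 - beta3) * m for m in res_row_mean]
    col = [init * beta3 + (1.0 - beta3) * m for m in res_col_mean]
    return approx_res(row, col)


def matrix_mean(m):
    return sum(sum(r) for r in m) / (len(m) * len(m[0]))


beta1 = 0.9
update = [[0.01, 0.02], [0.03, 0.04]]                     # hypothetical first update
exp_avg = [[(1 - beta1) * u for u in r] for r in update]  # EMA that started at zero

zero_init = matrix_mean(first_step_res_approx(update, exp_avg, init=0.0))
one_init = matrix_mean(first_step_res_approx(update, exp_avg, init=1.0))
print(f"init=0.0: res_approx mean = {zero_init:.1f}")
print(f"init=1.0: res_approx mean = {one_init:.4f}")
```

With zero initialization, each running average holds only the (1 - beta[2])-scaled residual after the first step. The reciprocal square root on the column side then blows up, and the mean lands in the thousands for this input. Starting from 1.0 keeps res_approx close to 1.0. Under the assumed CAME-style rule update = res_approx * exp_avg, this leaves the first step roughly equal to exp_avg.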

Fix res related states init to 1.0 against res_approx abnormal spike