Describe the feature and the current behavior/state.

The data-dependent initialization in `WeightNormalization` currently computes the moments of `x_init` on each replica independently:

```python
m_init, v_init = tf.nn.moments(x_init, data_norm_axes)
```

Under a `tf.distribute` strategy each replica only sees its own shard of the batch, so the replicas can end up with different initializations. I propose aggregating the moments across replicas instead:

```python
ctx = tf.distribute.get_replica_context()
n = ctx.num_replicas_in_sync
# Per-replica mean and mean of squares.
m_init = tf.reduce_mean(x_init, axis=data_norm_axes)
v_init = tf.reduce_mean(tf.square(x_init), axis=data_norm_axes)
# Sum across replicas, then divide by the replica count.
m_init, v_init = ctx.all_reduce(
    tf.distribute.ReduceOp.SUM,
    [m_init, v_init]
)
m_init = m_init / n
v_init = v_init / n
```

(If the variance is needed rather than the mean of squares, it can be recovered as `v_init - tf.square(m_init)`.)
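To illustrate why the sum-then-divide reduction is correct, here is a minimal NumPy sketch (not the `tf.distribute` API) that simulates the proposed reduction. It assumes all replicas receive equally sized shards, in which case the average of per-replica moments equals the global moments:

```python
import numpy as np

# Simulate 4 "replicas", each holding an equal shard of the global batch.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # full (global) batch
shards = np.split(x, 4, axis=0)      # one shard per replica
n = len(shards)

# Each replica computes its local mean and mean of squares; the all-reduce
# SUM followed by division by n reproduces the global moments.
m = sum(s.mean(axis=0) for s in shards) / n
v = sum((s ** 2).mean(axis=0) for s in shards) / n

assert np.allclose(m, x.mean(axis=0))
assert np.allclose(v, (x ** 2).mean(axis=0))
```

Note that this equality relies on equal shard sizes; with uneven shards the reduction would need to weight each replica's contribution by its local batch size.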
Relevant information
- Are you willing to contribute it (yes/no): yes
- Are you willing to maintain it going forward? (yes/no): yes
- Is there a relevant academic paper? (if so, where): no
- Is there already an implementation in another framework? (if so, where): no
- Was it part of tf.contrib? (if so, where): no
Which API type would this fall under (layer, metric, optimizer, etc.)
Layer: the `WeightNormalization` wrapper.
https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/layers/wrappers.py#L191
Who will benefit from this feature?
Anyone training a large model with a `tf.distribute` strategy.
Any other info.
The data-dependent initialization is great. Thanks!