Weight Normalization for multi-gpu #1276

@MokkeMeguru

Description

Describe the feature and the current behavior/state.

In a multi-GPU (tf.distribute) setting, the data-dependent initialization in WeightNormalization currently computes its statistics only from the local replica's shard of the batch. Change

m_init, v_init = tf.nn.moments(x_init, data_norm_axes)

to

ctx = tf.distribute.get_replica_context()
n = ctx.num_replicas_in_sync

# Per-replica first and second moments.
m_init = tf.reduce_mean(x_init, axis=data_norm_axes)
v_init = tf.reduce_mean(tf.square(x_init), axis=data_norm_axes)

# Sum the moments across all replicas, then average.
m_init, v_init = ctx.all_reduce(
    tf.distribute.ReduceOp.SUM,
    [m_init, v_init]
)

m_init = m_init / n
# variance = E[x^2] - E[x]^2
v_init = v_init / n - tf.square(m_init)

Relevant information

  • Are you willing to contribute it (yes/no): yes
  • Are you willing to maintain it going forward? (yes/no): yes
  • Is there a relevant academic paper? (if so, where): no
  • Is there already an implementation in another framework? (if so, where): no
  • Was it part of tf.contrib? (if so, where): no

Which API type would this fall under (layer, metric, optimizer, etc.)
Weight Normalization layer
https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/layers/wrappers.py#L191

Who will benefit with this feature?
Anyone training a large model across multiple GPUs.

Any other info.
The data-dependent initialization is great, thanks!
