Describe the feature and the current behavior/state.

The data-dependent initialization in `WeightNormalization` currently computes the moments of `x_init` on each replica independently:

```python
m_init, v_init = tf.nn.moments(x_init, data_norm_axes)
```

Under a `tf.distribute` strategy each replica only sees its own shard of the batch, so the replicas can end up with different initializations. I propose aggregating the moments across replicas instead:

```python
ctx = tf.distribute.get_replica_context()
n = ctx.num_replicas_in_sync
# Per-replica mean and mean of squares.
m_init = tf.reduce_mean(x_init, axis=data_norm_axes)
v_init = tf.reduce_mean(tf.square(x_init), axis=data_norm_axes)
# Sum across replicas, then divide by the replica count.
m_init, v_init = ctx.all_reduce(
    tf.distribute.ReduceOp.SUM,
    [m_init, v_init]
)
m_init = m_init / n
v_init = v_init / n
```

(If the variance is needed rather than the mean of squares, it can be recovered as `v_init - tf.square(m_init)`.)
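To illustrate why the sum-then-divide reduction is correct, here is a minimal NumPy sketch (not the `tf.distribute` API) that simulates the proposed reduction. It assumes all replicas receive equally sized shards, in which case the average of per-replica moments equals the global moments:

```python
import numpy as np

# Simulate 4 "replicas", each holding an equal shard of the global batch.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # full (global) batch
shards = np.split(x, 4, axis=0)      # one shard per replica
n = len(shards)

# Each replica computes its local mean and mean of squares; the all-reduce
# SUM followed by division by n reproduces the global moments.
m = sum(s.mean(axis=0) for s in shards) / n
v = sum((s ** 2).mean(axis=0) for s in shards) / n

assert np.allclose(m, x.mean(axis=0))
assert np.allclose(v, (x ** 2).mean(axis=0))
```

Note that this equality relies on equal shard sizes; with uneven shards the reduction would need to weight each replica's contribution by its local batch size.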
Relevant information
- Are you willing to contribute it (yes/no): yes
- Are you willing to maintain it going forward? (yes/no): yes
- Is there a relevant academic paper? (if so, where): no
- Is there already an implementation in another framework? (if so, where): no
- Was it part of tf.contrib? (if so, where): no
Which API type would this fall under (layer, metric, optimizer, etc.)
Layer: the `WeightNormalization` wrapper.
https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/layers/wrappers.py#L191
Who will benefit from this feature?
Anyone training a large model with a `tf.distribute` strategy.
Any other info.
The data-dependent initialization is great. Thanks!