Description
This issue was moved from the TensorFlow repo (here), and here is the pending PR I've sent there. Per the reviewer's suggestion, I should probably add the module here first.
Describe the feature and the current behavior/state.
Mixout is a module proposed here. In short, it resembles dropout, but rather than setting randomly selected weights to zero, it replaces them with the corresponding weights from the pre-trained model. Doing so improves stability in downstream fine-tuning tasks (see the sketch below).
Will this change the current api? How?
Yes, it would require a new API such as tf.nn.mixout, with a signature similar to tf.nn.dropout.
Who will benefit with this feature?
People who want to use BERT for downstream tasks with small datasets. This feature (as claimed in the paper) improves fine-tuning stability.
Any Other info.
A PyTorch version has been provided by the author.
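For illustration, a minimal sketch of what such an op might look like, assuming a tf.nn.dropout-like signature; the names `mixout`, `weights`, `pretrained_weights`, and `rate` are placeholders, not a settled API:

```python
import tensorflow as tf


def mixout(weights, pretrained_weights, rate=0.5, training=True):
    """Hypothetical sketch of the proposed op (names and signature are placeholders).

    With probability `rate`, each weight is swapped for its pre-trained value,
    and the result is rescaled so its expectation equals `weights`, mirroring
    how dropout rescales by 1 / (1 - rate).
    """
    if not training or rate == 0.0:
        return weights
    # 1 where the pre-trained weight is used, 0 where the current weight is kept.
    mask = tf.cast(tf.random.uniform(tf.shape(weights)) < rate, weights.dtype)
    mixed = mask * pretrained_weights + (1.0 - mask) * weights
    # Rescaling keeps E[output] == weights, following the paper's formulation.
    return (mixed - rate * pretrained_weights) / (1.0 - rate)
```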
Relevant information
- Are you willing to contribute it: yes
- Are you willing to maintain it going forward? yes
- Is there a relevant academic paper? yes, here
- Is there already an implementation in another framework? There is a PyTorch version provided by the author, but I don't think it has been merged into the framework.
- Was it part of tf.contrib? (if so, where): no
Which API type would this fall under (layer, metric, optimizer, etc.)
custom_ops (since it's categorized under tensorflow/python/ops/nn_ops), yet I'm not sure which folder I should add it to (among activation/layer/image/seq2seq/text).