
Add Mixout module #1960

@crystina-z

Description


This issue was moved over from the tf repo (here), and here is the pending PR I sent there. Per the reviewer's suggestion, I should probably add the module here first.

Describe the feature and the current behavior/state.
Mixout is a module proposed here. In short, it resembles dropout, but rather than setting the randomly selected weights to zero, it replaces them with the corresponding weights from the pre-trained model. Doing so improves stability when fine-tuning on downstream tasks. A sketch of the core operation follows.
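To make the idea concrete, here is a minimal sketch of the elementwise operation, assuming the formulation in the paper: with probability rate each weight is swapped for its pre-trained counterpart, then the result is rescaled so its expectation equals the current weights (analogous to inverted dropout). The function name and signature are my assumptions, not a settled API.

```python
import tensorflow as tf


def mixout(weights, pretrained, rate, training=True):
    """Hypothetical mixout sketch: with probability `rate`, replace each
    weight with its pre-trained counterpart, then rescale so that the
    expected output equals `weights` (as inverted dropout does)."""
    if not training or rate == 0.0:
        return weights
    # Bernoulli mask: 1.0 where we keep the fine-tuned weight.
    keep = tf.cast(tf.random.uniform(tf.shape(weights)) >= rate, weights.dtype)
    mixed = keep * weights + (1.0 - keep) * pretrained
    # E[mixed] = (1 - rate) * weights + rate * pretrained, so this
    # rescaling keeps E[output] == weights for any rate in [0, 1).
    return (mixed - rate * pretrained) / (1.0 - rate)
```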

Will this change the current API? How?
Yes, it would require a new API such as tf.nn.mixout, with a signature similar to that of tf.nn.dropout.
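For illustration, a hypothetical call site reusing the sketch above; the variable names and the rate value are assumptions, chosen only to mirror how tf.nn.dropout is typically used.

```python
# `pretrained_kernel` is a frozen copy of the layer's kernel, captured
# before fine-tuning begins.
dense = tf.keras.layers.Dense(768)
dense.build((None, 768))
pretrained_kernel = tf.identity(dense.kernel)

# During each training step, mix the current kernel with the pre-trained one.
mixed_kernel = mixout(dense.kernel, pretrained_kernel, rate=0.7)
```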

Who will benefit from this feature?
People who want to use BERT on downstream tasks with small datasets. As claimed in the paper, this feature improves fine-tuning stability.

Any other info.
A PyTorch version has been provided by the author.

Relevant information

  • Are you willing to contribute it: yes

  • Are you willing to maintain it going forward? yes

  • Is there a relevant academic paper? yes, here

  • Is there already an implementation in another framework? There is a PyTorch version provided by the author, but I don't think it has been merged into the framework itself.

  • Was it part of tf.contrib? (if so, where): no

Which API type would this fall under (layer, metric, optimizer, etc.)
custom_ops (since it's categorized under tensorflow/python/ops/nn_ops), though I'm not sure which folder I should add it to (among activation/layer/image/seq2seq/text).
