Parent (i.e. non-leaf) Modules w/ Parameters Not Properly Handled By Finetuning Callback #7930

## 🐛 Bug

Parent (i.e. non-leaf) modules that hold parameters themselves (rather than only through their nested submodules) are not properly handled by several finetuning callback methods. The parent module's parameters are ignored by all `BaseFinetuning` methods that depend on `BaseFinetuning.flatten_modules`. I'll open a PR to address the issue shortly, but am including a test below that reproduces it. As a practical example, I first encountered this bug when using `BaseFinetuning` with DeBERTa, specifically with the [DisentangledSelfAttention](https://github.com/huggingface/transformers/blob/f8bd8c6c7eef0a7280cd58f89016ede5b77f142f/src/transformers/models/deberta/modeling_deberta.py#L502) parent module.
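To make the failure mode concrete, here is a minimal sketch in plain PyTorch. It assumes that flattening keeps only modules with no children; `flatten_leaf_only` and the `Parent` class are hypothetical stand-ins for illustration, not Lightning's actual code.

```python
import torch
from torch import nn


class Parent(nn.Module):
    """Hypothetical parent module that owns a parameter directly."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.parent_param = nn.Parameter(torch.zeros(1))


def flatten_leaf_only(module):
    # Assumed behaviour: keep only modules that have no children.
    return [m for m in module.modules() if not list(m.children())]


model = Parent()
leaves = flatten_leaf_only(model)

# Freeze every parameter reachable through the flattened list.
for m in leaves:
    for p in m.parameters(recurse=False):
        p.requires_grad = False

# Map each parameter name to whether it ended up frozen.
frozen = {name: not p.requires_grad for name, p in model.named_parameters()}
```

Under that assumption, `Parent` is dropped from the flattened list because it has a child, so `parent_param` is never visited and stays trainable while the `linear` parameters are frozen.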

#### New test to reproduce (tests/callbacks/test_finetuning_callback.py):
```python
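from collections import OrderedDict

import torch
from torch import nn

from pytorch_lightning.callbacks import BaseFinetuning


def test_parent_module_w_param_model():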
    """Test flattening, freezing, and thawing of models which contain parent (non-leaf) modules with parameters
    directly themselves rather than exclusively their submodules containing parameters.
    """
    class ConvBlock(nn.Module):

        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, 3)
            self.act = nn.ReLU()
            self.bn = nn.BatchNorm2d(out_channels)

        def forward(self, x):
            x = self.conv(x)
            x = self.act(x)
            return self.bn(x)

    class ConvBlockParam(nn.Module):

        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, 3)
            self.act = nn.ReLU()
            # add trivial test parameter to conv block to validate parent (non-leaf) module parameter handling
            self.parent_param = nn.Parameter(torch.zeros((1), dtype=torch.float))
            self.bn = nn.BatchNorm2d(out_channels)

        def forward(self, x):
            x = self.conv(x)
            x = self.act(x)
            return self.bn(x)

    model = nn.Sequential(
        OrderedDict([
            ("encoder", nn.Sequential(ConvBlockParam(3, 64), ConvBlock(64, 128))),
            ("decoder", ConvBlock(128, 10)),
        ])
    )

    # There are 10 leaf modules or parent modules w/ parameters in the test model
    assert len(BaseFinetuning.flatten_modules(model)) == 10

    BaseFinetuning.freeze(model.encoder, train_bn=True)
    assert not model.encoder[0].conv.weight.requires_grad  # Validate a leaf module parameter is frozen
    assert not model.encoder[0].parent_param.requires_grad  # Validate the parent module parameter is frozen
    assert model.encoder[0].bn.weight.requires_grad

    BaseFinetuning.make_trainable(model)
    encoder_params = list(BaseFinetuning.filter_params(model.encoder, train_bn=True))
    # The 9 parameters of the encoder are:
    # conv0.weight, conv0.bias, bn0.weight, bn0.bias, parent_param
    # conv1.weight, conv1.bias, bn1.weight, bn1.bias
    assert len(encoder_params) == 9
```
### Expected behavior

`parent_param` in the example model above, which has nested modules, should be handled by `BaseFinetuning.flatten_modules`, `freeze`, `make_trainable`, and `filter_params` rather than omitted.
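The expected behavior could be sketched roughly as follows, again in plain PyTorch. The helpers `flatten_with_parent_params` and `freeze_sketch` and the `ParentWithParam` class are hypothetical names used only for illustration. The sketch assumes a module should be kept if it is a leaf or owns parameters directly, which it checks with `parameters(recurse=False)`.

```python
import torch
from torch import nn


class ParentWithParam(nn.Module):
    """Hypothetical non-leaf module that also owns a parameter directly."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.parent_param = nn.Parameter(torch.zeros(1))


def flatten_with_parent_params(module):
    # Keep leaf modules plus any parent module that owns parameters directly.
    return [
        m
        for m in module.modules()
        if not list(m.children()) or list(m.parameters(recurse=False))
    ]


def freeze_sketch(module, train_bn=True):
    for m in flatten_with_parent_params(module):
        if train_bn and isinstance(m, nn.BatchNorm2d):
            continue  # leave batch norm layers trainable
        # recurse=False so a parent only touches its own parameters,
        # not those of children such as the batch norm layer.
        for p in m.parameters(recurse=False):
            p.requires_grad = False


block = ParentWithParam()
flat = flatten_with_parent_params(block)
freeze_sketch(block, train_bn=True)
```

Using `recurse=False` when freezing matters here: without it, visiting the parent module would also freeze the batch norm parameters that `train_bn=True` is meant to leave trainable.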

### Environment
* CUDA:
        - GPU:
                - GeForce RTX 2070 SUPER
                - GeForce RTX 2070
        - available:         True
        - version:           11.1
* Packages:
        - numpy:             1.20.2
        - pyTorch_debug:     False
        - pyTorch_version:   1.8.1
        - pytorch-lightning: 1.4.0dev
        - tqdm:              4.61.0
* System:
        - OS:                Linux
        - architecture:
                - 64bit
                - ELF
        - processor:         x86_64
        - python:            3.8.10
        - version:           #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021

### Additional context

I've got a fix ready and will be submitting a PR shortly. Thanks for all the great work on this awesome framework!

