Description
Hi. I'm currently implementing a self-supervised depth estimation network, similar to Monodepth2, and would like to point out a possible performance concern.
Taking the gradient for the first time takes a very long time: around 5 minutes on CPU and ~10-11 minutes on GPU.
Subsequent iterations are much faster (~8 seconds on CPU, ~0.5 seconds on GPU), but the initial compilation latency makes iterating on the code quite painful.
The model has a UNet-like architecture, with EfficientNet-B0 as the encoder and a decoder that emits outputs (disparities) at every upsampling level. The loss function is then computed over these outputs. I can post the code a bit later, once it is in a more presentable form, but architecturally it is similar to Segmentation.jl.
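In the meantime, here is a rough Flux sketch of the general shape (not the actual code): the encoder below is a stand-in for EfficientNet-B0, and `DecoderBlock`, `forward`, and `dummy_loss` are made-up names and sizes for illustration only.

```julia
using Flux

# Stand-in for the EfficientNet-B0 encoder (layer sizes are made up).
encoder = Chain(
    Conv((3, 3), 3 => 32, relu; pad=1, stride=2),
    Conv((3, 3), 32 => 64, relu; pad=1, stride=2))

# One decoder level: upsample, refine, and emit a disparity map at this scale.
struct DecoderBlock
    up
    conv
    disp_head
end
Flux.@functor DecoderBlock

DecoderBlock(in_ch, out_ch) = DecoderBlock(
    Upsample(:nearest; scale=2),
    Conv((3, 3), in_ch => out_ch, relu; pad=1),
    Conv((3, 3), out_ch => 1, sigmoid; pad=1))

function (b::DecoderBlock)(x)
    x = b.conv(b.up(x))
    x, b.disp_head(x)   # features for the next level + disparity at this scale
end

decoder = (DecoderBlock(64, 32), DecoderBlock(32, 16))

# Forward pass that collects one disparity output per upsampling level.
function forward(x)
    f1 = encoder(x)
    f2, d1 = decoder[1](f1)
    _,  d2 = decoder[2](f2)
    (d1, d2)
end

# Dummy stand-in for the real loss, computed over all per-level outputs.
dummy_loss(x) = sum(map(d -> sum(abs2, d), forward(x)))
```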
To get the timings I simply timed the gradient calculation:

```julia
@time ∇ = gradient(θ) do
    train_loss(model, x, train_parameters)
end
```

Timings for taking the first gradient:
- CPU: `330.033668 seconds (401.83 M allocations: 24.610 GiB, 2.25% gc time, 97.00% compilation time)`
- GPU: `675.810394 seconds (602.73 M allocations: 30.976 GiB, 2.96% gc time, 88.58% compilation time)`
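Running the same measurement on the dummy sketch above should show the same pattern in miniature (the first call pays the compilation cost, the second does not). This uses Zygote's implicit-parameter `gradient(f, ::Params)` form, the same form as in my real code; `Flux.params` over the sketch's layers is just an assumption for this example.

```julia
using Flux, Zygote

x = rand(Float32, 128, 128, 3, 4)      # dummy batch (W, H, C, N)
θ = Flux.params(encoder, decoder...)   # collect parameters from the sketch above

# First call: dominated by compilation of the backward pass.
@time gradient(() -> dummy_loss(x), θ)

# Second call: compiled code is reused, so this is the real per-step cost.
@time gradient(() -> dummy_loss(x), θ)
```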