
Very slow first-time gradient calculation #1119

@pxl-th

Description


Hi. I'm currently implementing a self-supervised depth-estimation network, similar to Monodepth2, and would like to point out a possible performance concern.

Taking the gradient for the first time takes a very long time: around 5 minutes on CPU and ~10-11 minutes on GPU.
Further iterations are much faster (~8 seconds on CPU, ~0.5 seconds on GPU), but the slow first call makes iterating on the code quite painful.

The model has a UNet-like architecture, with an EfficientNet-B0 encoder and a decoder that emits outputs (disparities) at every upsampling level. The loss function is then calculated over these outputs. I can post the code a bit later, once it is in a more presentable form, but the architecture is similar to Segmentation.jl.

To get the timings I simply timed the gradient calculation:

@time gradient(θ) do
    train_loss(model, x, train_parameters)
end

Timings for the first gradient call:

  • CPU
330.033668 seconds (401.83 M allocations: 24.610 GiB, 2.25% gc time, 97.00% compilation time)
  • GPU
675.810394 seconds (602.73 M allocations: 30.976 GiB, 2.96% gc time, 88.58% compilation time)
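For reference, the cold-versus-warm behavior can be reproduced on any model. Below is a minimal sketch using a stand-in linear model and loss (hypothetical, not the actual UNet/EfficientNet-B0 setup): the first `gradient` call includes Zygote's one-time compilation of the pullback, while the second call reuses the compiled code.

```julia
using Zygote

# Stand-in model and loss (hypothetical; the real model is a UNet with an
# EfficientNet-B0 encoder and a loss over multi-scale disparity outputs).
W = rand(3, 3)
x = rand(3)
loss(W) = sum(W * x)

# First call: pays the one-time compilation cost of the pullback.
t_first = @elapsed gradient(loss, W)

# Warm call: reuses the already-compiled pullback and is much faster.
t_warm = @elapsed gradient(loss, W)

println("first = $(t_first)s, warm = $(t_warm)s")
```

The gap is small for a toy function but grows with model complexity, since Zygote compiles a specialized pullback for every distinct function it differentiates.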
