Description
Hi. I'm currently implementing a self-supervised depth estimation network, similar to Monodepth2, and would like to point out a possible performance concern.
Taking the gradient for the first time takes a very long time: around 5 minutes on CPU and ~10-11 minutes on GPU.
Subsequent iterations are much faster (~8 seconds on CPU, ~0.5 seconds on GPU), but the initial delay makes iterating on the code quite painful.
The model has a UNet-like architecture, with EfficientNet-B0 as the encoder and a decoder that emits outputs (disparities) at every upsampling level. The loss function is then calculated over these outputs. I can post the code a bit later, once it is in a more presentable form, but the architecture is similar to Segmentation.jl.
To get the timings, I simply timed the gradient calculation:
@time ∇ = gradient(θ) do
    train_loss(model, x, train_parameters)
end
Timings for taking the first gradient:
- CPU
330.033668 seconds (401.83 M allocations: 24.610 GiB, 2.25% gc time, 97.00% compilation time)
- GPU
675.810394 seconds (602.73 M allocations: 30.976 GiB, 2.96% gc time, 88.58% compilation time)
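For anyone who wants to reproduce the measurement pattern without the model, here is a minimal sketch that uses only Base Julia. The names `toy_loss`, `w`, and `x` are hypothetical stand-ins, not the actual model, loss, or data from this report, and no gradient is taken here; the sketch only illustrates how timing the same call twice separates compilation time from run time.

```julia
# Hypothetical stand-in workload; NOT the model or loss from this report.
toy_loss(w, x) = sum(abs2, w * x)

w = rand(Float32, 8, 8)
x = rand(Float32, 8, 4)

# The first call includes JIT compilation of toy_loss for these argument types.
t_first = @elapsed toy_loss(w, x)

# The second call reuses the compiled code, so it mostly measures run time.
t_second = @elapsed toy_loss(w, x)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

Applying the same two-call pattern to the `gradient` call above should give the first-iteration and later-iteration numbers separately.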