Inconsistent Behaviour in Broadcasting #1036

@willtebbutt

As of Zygote version 0.6.15, I get the following:

julia> Zygote.gradient(^, 0.0, 0.9)
(0.0, 0.0)

julia> Zygote.gradient((x, y) -> sum(x .^ y), zeros(3), fill(0.9, 3))
([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])

julia> f(x, y) = x ^ y;

julia> Zygote.gradient((x, y) -> sum(f.(x, y)), zeros(3), fill(0.9, 3))
([Inf, Inf, Inf], [NaN, NaN, NaN])

The same code in version 0.6.14:

julia> Zygote.gradient(^, 0.0, 0.9)
(0.0, 0.0)

julia> Zygote.gradient((x, y) -> sum(x .^ y), zeros(3), fill(0.9, 3))
([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])

julia> f(x, y) = x ^ y;

julia> Zygote.gradient((x, y) -> sum(f.(x, y)), zeros(3), fill(0.9, 3))
([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])

The crucial difference is in the last case of each: broadcasting the user-defined f returns zeros in 0.6.14 but Inf and NaN in 0.6.15.

If you do this with ForwardDiff, you get

julia> ForwardDiff.gradient(x -> x[1]^x[2], [0.0, 0.9])
2-element Vector{Float64}:
  Inf
 NaN

so this smells to me like ForwardDiff is being invoked for f, whereas individual chain rules are used for sum(x .^ y) because Zygote un-fuses broadcasting where it can (it can't un-fuse f).
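
For reference, the Inf / NaN pattern is what falls out of evaluating the textbook partial derivatives of x^y directly at x = 0 with 0 < y < 1, with no special-casing of the zero base. The sketch below uses plain floating-point arithmetic only; that a dual-number implementation ends up computing essentially these expressions is an assumption on my part, not something verified against the ForwardDiff source.

```julia
# Naive partial derivatives of x^y at the point used in this issue.
x, y = 0.0, 0.9

# d/dx x^y = y * x^(y - 1); here 0.9 * 0.0^(-0.1) = 0.9 * Inf
dx = y * x^(y - 1)

# d/dy x^y = x^y * log(x); here 0.0 * log(0.0) = 0.0 * -Inf
dy = x^y * log(x)

@show dx dy
```

This gives dx = Inf and dy = NaN, matching the ForwardDiff result above, whereas the (0.0, 0.0) returned by Zygote.gradient(^, 0.0, 0.9) indicates that the scalar rule handles the zero base differently.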

@oxinabox @mcabbott @DhairyaLGandhi I'm assuming that this is related to our recent efficiency upgrades?

This is breaking the build in KernelFunctions.jl. We can of course allow the tests to fail for the time being, but it essentially means that we can't use Zygote with one of our kernels.
