Improvements to numeric exponent rules #224
LGTM. Just needs version bump.
Thanks! This function's pretty important, so I'll hold off on merging for a few days for others to review.
It's also not clear to me whether this should be considered a breaking change.
Hmm yeah, it depends on whether or not we call this a bug. I want to call it a bug because the previous behaviour is a bit odd and is inconsistent with what we're generally aiming for. However, it might cause some people's code to change behaviour.
Yeah, the main (only?) place a user could see a change would be if they passed the exponent as an int to an entry point of AD, e.g.:
julia> Zygote.gradient((x, p) -> x^real(p), -10.0, 2) # real's pullback drops the imaginary part
on master: (-20.0, 230.25850929940458)
this pr: (-20.0, 0.0)
But realistically, the gradient on master cannot be used for anything like gradient descent, because if the 2 is perturbed by a non-integer value, then the gradient will raise a
Sounds like the old behaviour was a bug.
I think so too. If there are no objections, I'll merge on Tuesday (after bumping the version number).
This PR makes a few improvements to the rules for `^(::Number, ::Number)`.

For the general rule, it slightly improves efficiency by removing the second `^` call.

It also adds a new real rule that avoids unnecessarily complexifying the tangents and cotangents. The general rule embeds the numbers in the complex plane for complex differentiation. However, for a real negative base, exponentiation is undefined (it literally throws an error) unless the exponent is exactly an integer. So for a negative base, the derivative with respect to the exponent is actually undefined (hence we can't even run finite differences on it). The new rule adopts the subgradient convention when the base is negative.
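
The difference between the two conventions can be sketched in plain Base Julia, without any AD package. The helper names `dp_complex` and `dp_real` are hypothetical and are not the functions defined in this PR; the sketch only assumes that the derivative with respect to the exponent is `x^p * log(x)` and that the real rule returns zero for a negative base.

```julia
# Hypothetical helpers for illustration only; not the ChainRules implementation.

# A real negative base with a non-integer real exponent throws an error.
threw = try
    (-10.0)^2.5
    false
catch err
    err isa DomainError
end

# Complex embedding: d/dp x^p = x^p * log(x), with x promoted to a complex number.
dp_complex(x::Real, p::Real) = x^p * log(complex(x))

# Real-valued convention: log(x) for a positive base, zero otherwise.
# (x == 0 is lumped in with zero here purely for brevity.)
dp_real(x::Real, p::Real) = x > 0 ? x^p * log(x) : zero(float(x))

println(threw)                         # whether the non-integer power threw
println(real(dp_complex(-10.0, 2)))    # real part of the complex derivative
println(dp_real(-10.0, 2))             # real-rule convention for a negative base
```

Dropping the imaginary part of the complex result leaves `100 * log(10)`, which is the value the conversation reports on master, whereas the real-valued convention yields zero.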
Oh, and since the rules are defined in `fastmath.jl`, I moved the test to the corresponding test file.

Here are a few examples:
Example 1: `frule` with positive real base, real exponent

Only the type has changed. Instead of getting a purely real complex tangent, we just get the equivalent real tangent. The `rrule` has the same behavior.

on master:
this pr:
Example 2: `frule` with negative real base, integer exponent

We get the same tangent as we would have gotten had the input tangent on `p` been `Zero()`.
on master:
this pr:
Example 3: `rrule` with negative real base, integer exponent

The cotangent on `p` is 0.

on master:
this pr:
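
Taken together, the three examples describe conventions that can be sketched in plain Julia without either branch checked out. The following is a minimal, dependency-free illustration, not the code from this PR: the names `pow_dx`, `pow_dp`, `pow_pushforward`, and `pow_pullback` are made up for this sketch, and it assumes the tangents and cotangents are plain real numbers.

```julia
# Hypothetical sketch; these functions are not part of ChainRules.

# Partial derivatives of y = x^p for real x and p.
pow_dx(x, p) = p * x^(p - 1)
# Convention for the exponent: y * log(x) for a positive base, zero otherwise.
pow_dp(x, p, y) = x > 0 ? y * log(x) : zero(y)

# Forward mode: push the input tangents (dx, dp) through y = x^p.
function pow_pushforward(x::Real, p::Real, dx::Real, dp::Real)
    y = x^p
    return y, pow_dx(x, p) * dx + pow_dp(x, p, y) * dp
end

# Reverse mode: return y and a closure mapping the output cotangent
# to the cotangents on x and p.
function pow_pullback(x::Real, p::Real)
    y = x^p
    back(ybar::Real) = (ybar * pow_dx(x, p), ybar * pow_dp(x, p, y))
    return y, back
end

# Positive base, real exponent: the tangent stays real.
y1, t1 = pow_pushforward(2.0, 3.0, 1.0, 1.0)

# Negative base, integer exponent: the tangent on p contributes nothing.
y2, t2 = pow_pushforward(-10.0, 2, 1.0, 1.0)
_, t2_nop = pow_pushforward(-10.0, 2, 1.0, 0.0)

# Negative base, integer exponent: the cotangent on p is zero.
y3, back = pow_pullback(-10.0, 2)
xbar, pbar = back(1.0)
```

With these definitions, `back(1.0)` gives the same pair as the `Zygote.gradient` result quoted for this PR in the conversation, `(-20.0, 0.0)`.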