OK, so this is pretty speculative as the reverse differentiation packages are not there yet, but let's dream for a moment. It would be awesome to be able to just use reverse-mode differentiation on code like
using NLsolve

function G(α)
    F(x) = x - α
    x = nlsolve(F, zeros(length(α))).zero  # solution of F(x) = 0
    H(x) = sum(x)
    H(x)
end
and take the gradient of G wrt α. Of course, both F and H are just examples and can be arbitrary functions.
So how do we get the gradient of G? One can of course forward-diff through G, which is not too hard to support from the perspective of nlsolve (although I haven't tried). But that's pretty inefficient if α is high-dimensional. One can try reverse-diffing through G, but that's pretty heavy since it essentially has to record all the solver iterations. A better idea is to exploit the mathematical structure of the problem, in particular the relationship dx/dα = -(∂F/∂x)^-1 ∂F/∂α (obtained by differentiating F(x(α), α) = 0 wrt α), assuming nlsolve has converged perfectly. Reverse-mode autodiff requires the user to compute (dx/dα)^T δx, which is -(∂F/∂α)^T (∂F/∂x)^-T δx. If the Jacobian is not available (as with Broyden or Anderson), this can be done with an iterative solver such as GMRES, where the individual matvecs with (∂F/∂x)^T are performed with reverse diff.
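
To make that concrete, here is a minimal sketch of the adjoint computation, assuming the parameter dependence is written explicitly as F(x, α) and that the Jacobians are small enough to form densely with ForwardDiff; in a real matrix-free setting the transposed solve would instead be done with GMRES and reverse-diff matvecs. The helper name grad_G is hypothetical.

using NLsolve, ForwardDiff, LinearAlgebra

F(x, α) = x .- α          # F(x, α) = 0 defines x(α) implicitly
H(x) = sum(x)             # scalar function of the solution

function grad_G(α)
    # Solve the nonlinear system at the given parameters.
    x = nlsolve(y -> F(y, α), zeros(length(α))).zero

    # Dense Jacobians of F at the solution (matrix-free in a real implementation).
    Jx = ForwardDiff.jacobian(y -> F(y, α), x)   # ∂F/∂x
    Jα = ForwardDiff.jacobian(a -> F(x, a), α)   # ∂F/∂α

    # dG/dα = (dx/dα)^T ∂H/∂x = -(∂F/∂α)^T (∂F/∂x)^-T ∂H/∂x
    δx = ForwardDiff.gradient(H, x)
    λ  = Jx' \ δx
    return -(Jα' * λ)
end

grad_G([1.0, 2.0, 3.0])   # x(α) = α here, so the gradient is ones(3)

For this toy F the exact answer is a vector of ones, which makes it easy to check; the point is that the cost is one extra transposed linear solve rather than differentiating through all the solver iterations.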
The action point for nlsolve here is to write a reverse ChainRule (https://github.com/JuliaDiff/ChainRules.jl) for nlsolve. This might be tricky because nlsolve takes a function as an argument, but we might get by with just calling a diff function on F recursively. CC @jrevels to check this isn't a completely stupid idea. Of course, this isn't necessarily specific to nlsolve; the same ideas apply to Optim (writing ∇F = 0) and DiffEq (adjoint equations), for instance.
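
For the ChainRules route, a rough sketch of what such a reverse rule could look like follows. It sidesteps the closure problem by exposing the parameters explicitly (the solve_implicit helper is hypothetical, not part of NLsolve), and again uses dense ForwardDiff Jacobians where a real implementation would use reverse-diff matvecs plus GMRES.

using ChainRulesCore, NLsolve, ForwardDiff, LinearAlgebra

# Hypothetical helper: solve F(x, p) = 0 for x with the parameters p explicit,
# so the rule does not have to differentiate through a closure.
solve_implicit(F, x0, p) = nlsolve(x -> F(x, p), x0).zero

function ChainRulesCore.rrule(::typeof(solve_implicit), F, x0, p)
    x = solve_implicit(F, x0, p)
    function solve_implicit_pullback(δx)
        δx = unthunk(δx)
        Jx = ForwardDiff.jacobian(y -> F(y, p), x)   # ∂F/∂x at the solution
        Jp = ForwardDiff.jacobian(q -> F(x, q), p)   # ∂F/∂p at the solution
        λ  = Jx' \ δx                                # (∂F/∂x)^-T δx
        δp = -(Jp' * λ)                              # (dx/dp)^T δx
        # No cotangent for F itself or for the initial guess
        # (the converged solution does not depend on x0).
        return NoTangent(), NoTangent(), ZeroTangent(), δp
    end
    return x, solve_implicit_pullback
end

With something like that in place, a reverse-mode package built on ChainRules could differentiate G(α) = H(solve_implicit(F, zeros(length(α)), α)) without unrolling the solver iterations.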