Summarised the geometric meaning of push-forwards and pullbacks #135
Awesome, thanks. I think it should be in its own file.
Is the PR preview not working because the branch is on a different fork?
Yes, I think so.
Thanks a lot for this, @aisopous! Just a couple of nit-picky things, but this is great already, so to me it would be OK to merge anyway. Keep in mind that a lot of my knowledge in this area is self-taught, so please don't hesitate to correct me if I'm wrong on some of these things.
###### Some terminology/conventions.
Let ``p`` be an element of type M, which is defined by some assignment of numbers ``x_1,...,x_m``,
I don't quite get what you want to say here. Do you mean how a type is represented in memory? Wouldn't we also want to require some kind of "smoothness", so that we can do calculus on it?
You're right in saying that manifold should imply smoothness.
I wish to interpret things geometrically, which is to say I am interested in some geometric structure beyond the "set" of elements. Sometimes the word "space" is used. Smoothness isn't necessarily something we want to think about: Push-forwards and pull-backs can be defined without it.
Let ``p`` be an element of type M, which is defined by some assignment of numbers ``x_1,...,x_m``,
say ``(x_1,...,x_m) = (a_1,...,a_m)``
A _function_ ``f:M -> K`` on ``M`` is (for simplicity) a polynomial ``K[x_1, ... x_m]``
Does this really make things simpler? I would probably just require that `f` is analytic.
I'm not sure what the right set of functions in the AD context is. I don't think "smooth" or even "analytic" are exactly right. I feel these two kind of imply an a priori definition of symbolic derivatives, infinite limits or some sort of finite differences.
I think we are basically combining rational functions with look-ups (add, subtract, multiply, divide, lookup table)?
A _function_ ``f:M -> K`` on ``M`` is (for simplicity) a polynomial ``K[x_1, ... x_m]``

The tangent space ``T_pM`` of ``M`` at ``p`` is the ``K``-vector space spanned by the derivations ``d/dx_i``.
The tangent space acts linearly on the space of functions. They act as usual on functions. Our starting point is
I might add here that they map a curve through `p` to the derivative of `f` in that direction.
Curves are a nice interpretation of tangent vectors, but I'm not convinced they are the right one for AD, since I don't think they necessarily add anything.
For reference, we can define an isomorphism from maps out of the first-order neighborhood of 0 in `K` (infinitesimal curves) to `T_pM` by taking the derivation in the direction of the curve.
The collection of tangent spaces ``{T_pM}`` for ``p\in M`` is called the _tangent bundle_ of ``M``.
Let ``df`` denote the first order information of ``f`` at each point. This is called the differential of ``f``.
"First order information" sounds a bit vague to me. Can't we define `df` as an element of the tangent space `T_{f(p)}K`?
The distinction between tangents and functions as duals to one another is crucial here.
For example, Griewank defines (IIRC) `df` at any `p` as the linear function, vanishing at `p`, which approximates `f` to first order. Notice that then, abusing terminology, we may write `f = f(p) + df_p`.
The technical definition should be as follows:
Let `m` be the set of functions vanishing at `p`. Then there is a natural map from functions on `M` (rational functions with lookups, say; let's denote them by `K(M)`) to functions vanishing at `p`, taken to first order, i.e. `d: K(M) -> m/m^2`, defined by `f \mapsto f - f(p)` modulo `m^2`. This is precisely enough to algebraically specify rules of differentiation, and to give us parametrisations of tangent and cotangent spaces.
I'm not committed to this "algebraic" view of things, I am simply keen to see if this would be clearer than the "smooth" view.
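As a small worked instance of the map `d: K(M) -> m/m^2` described above (a sketch only; the choice `f = x_1 x_2` on a two-coordinate type is an assumption made for illustration), expanding around `p = (a_1, a_2)` recovers the product rule:

```math
\begin{aligned}
f - f(p) &= x_1 x_2 - a_1 a_2 \\
         &= a_2 (x_1 - a_1) + a_1 (x_2 - a_2) + (x_1 - a_1)(x_2 - a_2), \\
df_p     &\equiv a_2 (x_1 - a_1) + a_1 (x_2 - a_2) \pmod{m^2}.
\end{aligned}
```

The quadratic term lies in `m^2` and is discarded, so no limits or finite differences are needed to read off the coefficients `a_2` and `a_1`.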
Example (maps into non-smooth spaces)

    struct PointOnDegenerateConic
        x::Float64
        y::Float64
        # Inner constructor: only accept points on the conic x*y == 0.
        function PointOnDegenerateConic(x, y)
            @assert x * y == 0
            new(x, y)
        end
    end
At any point away from the origin, everything should look just like on a real line. With the non-smooth definitions, we can also make sense of the origin for free. At the origin, we have a two-dimensional tangent space: the vector space dual to linear functions of `x` and `y`. The path interpretation is not really helpful here, unless we are okay with infinitesimal paths that lead nowhere.
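
    function ProjectToConic(x, y)
        # Points on the diagonals |x| == |y| are sent to the origin.
        if abs(x) == abs(y)
            return (0, 0)
        end
        # Otherwise keep the coordinate with the larger absolute value.
        return abs(x) > abs(y) ? (x, 0) : (0, y)
    end
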
Again we have a non-smooth mapping, but pull-backs of linearised functions on the conic, and push-forwards of vectors in `R^2` to vectors on the conic, should be pretty interpretable. Crucially, they can also be computed without any fuss!
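To make "computed without any fuss" concrete, here is a minimal sketch of a hand-written push-forward and pullback for the projection above. The function names are hypothetical, and the rule used on the diagonals `|x| == |y|` (differentiate the constant branch that was actually taken, giving zero) is an assumption, not something settled in this thread.

```julia
# Same branching as the projection above, written with `abs`.
function project_to_conic(x, y)
    abs(x) == abs(y) && return (0.0, 0.0)
    return abs(x) > abs(y) ? (x, 0.0) : (0.0, y)
end

# Push-forward of a tangent vector (dx, dy) at (x, y):
# differentiate whichever branch the primal computation takes.
function project_to_conic_pushforward(x, y, dx, dy)
    abs(x) == abs(y) && return (0.0, 0.0)   # assumed: constant branch, zero derivative
    return abs(x) > abs(y) ? (dx, 0.0) : (0.0, dy)
end

# Pullback of a covector a*dx + b*dy on the conic, given as (a, b):
# the adjoint of the linear map used in the push-forward.
function project_to_conic_pullback(x, y, a, b)
    abs(x) == abs(y) && return (0.0, 0.0)
    return abs(x) > abs(y) ? (a, 0.0) : (0.0, b)
end
```

Away from the diagonals both maps are just coordinate projections, so the pairing of a pulled-back covector with a vector equals the pairing of the covector with the pushed-forward vector.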
Let ``N`` be another type, defined by numbers ``y_1,...,y_n``, and let ``g:M -> N`` be a _map_, that is,
an ``n``-dimensional vector ``(g_1, ..., g_n)`` of functions on ``M``.
We define the _push-forward_ ``g_*:TM -> TN`` between tangent bundles by ``g_*(X)(h) = X(h\circ g)`` for any tangent vector ``X`` and function ``h`` on ``N``.
Suggested change:
We define the _push-forward_ ``g_*:TM -> TN`` between tangent bundles by ``g_*(X)(h) = X(h\circ g)`` for any tangent vector ``X`` and smooth, real-valued function ``h``.
``g^*(dy_j) = d(g_j)``. Notice that this is a covector, and we could have defined the pullback by its action on vectors by
``g^*(dh)(X) = dh(g_*(X)) = X(h\circ g)`` for any function ``h`` on ``N`` and ``X\in TM``. In particular,
``g^*(dy_j)(d/dx_i) = d(g_j)/dx_i``. If you work out the action in a basis of the cotangent space, you see that it acts
by the adjoint of the Jacobian.
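In coordinates, the statement above can be sketched as follows. The map `g`, its hand-written Jacobian, and the helper names `pushforward` and `pullback` are all hypothetical and chosen only for illustration; they are not the API of any package.

```julia
using LinearAlgebra

# Hypothetical example map g: R^2 -> R^3.
g(x) = [x[1] * x[2], x[1]^2, x[2]]

# Hand-written Jacobian of g at x: rows index outputs y_j, columns index inputs x_i.
jacobian_g(x) = [x[2] x[1]; 2x[1] 0.0; 0.0 1.0]

# Push-forward: tangent vector at x -> tangent vector at g(x).
pushforward(x, v) = jacobian_g(x) * v

# Pullback: covector at g(x) -> covector at x, via the adjoint of the Jacobian.
pullback(x, w) = jacobian_g(x)' * w

x = [2.0, 3.0]
v = [1.0, -1.0]          # tangent vector at x
w = [0.5, 1.0, 2.0]      # covector at g(x)

# Defining property: the pairing is preserved, (g^* w)(v) == w(g_* v).
@assert dot(pullback(x, w), v) ≈ dot(w, pushforward(x, v))
```

Neither function needs an inner product on `M` or `N`; `dot` is used here only to evaluate a covector on a vector in coordinates.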
I might add:
Note that for complex functions, how you define the adjoint of the Jacobian depends on the basis you choose as covectors, for example ``dRe(z), dIm(z)`` or ``dz, dz̅``
Right, the tangent space as I've defined it would in fact be only `n`-dimensional, whereas complexifying the tangent space of the underlying real manifold leads to a `2n`-dimensional tangent space of holomorphic and anti-holomorphic vectors. I've only defined push-forwards (and pull-backs) of holomorphic (co)-tangents.
This works in a coordinate invariant way, and works without the notion of a metric.
Recall that _gradients_ are vectors, yet they should contain the same information as the differential ``df``.
Assuming we use the standard Euclidean metric, we can identify ``df`` and ``\nabla f`` as vectors.
But pulling back gradients still should not be a thing.
I'm not really sure what you mean here.
What I should explain here is how pull-backs are applied to gradients via the identification of linear functions and vectors by `v <--> (w \mapsto <v, w>)` (the inner product with a fixed vector is a linear function on `V`), and why thinking about pulling back co-vectors may be a better idea. Off the top of my head, I am not sure why. There is the whole story of not choosing an inner product just to do AD, of course, which might be useful if we want to do things that are not gradient descent.
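One way to see the difference, as a sketch: suppose the inner products on `M` and `N` are given in coordinates by symmetric positive-definite matrices `G_M` and `G_N` (these symbols, and `J` for the Jacobian of `g` at `p`, are assumptions introduced here). The covector rule involves only `J`, while the corresponding gradient rule picks up both metrics:

```math
\begin{aligned}
g^*(df) &= d(f \circ g), \qquad \text{in components: } J^{\top} \, [df], \\
\nabla (f \circ g)(p) &= G_M^{-1} \, J^{\top} \, G_N \, \nabla f(g(p)).
\end{aligned}
```

With the standard Euclidean inner products, `G_M` and `G_N` are identity matrices and the two formulas coincide, which is why the distinction is easy to overlook.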
Co-Authored-By: simeonschaub <[email protected]>
moved to JuliaDiff/ChainRulesCore.jl#147
Summarised the content of an earlier, slightly imprecise discussion on what pushforwards and pullbacks are. It should probably be rewritten, but I want to know which parts are readable and which aren't.