Summarised the geometric meaning of push-forwards and pullbacks #135
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -80,9 +80,6 @@ Almost always the _pushforward_/_pullback_ will be declared locally within the ` | |||||
The **pushforward** of ``f`` takes the _sensitivity_ of the input of ``f`` to a quantity, and gives the _sensitivity_ of the output of ``f`` to that quantity | ||||||
The **pullback** of ``f`` takes the _sensitivity_ of a quantity to the output of ``f``, and gives the _sensitivity_ of that quantity to the input of ``f``. | ||||||
|
||||||
#### Math | ||||||
This is all a bit simplified by talking in 1D. | ||||||
|
||||||
##### Lighter Math | ||||||
For a chain of expressions: | ||||||
``` | ||||||
|
@@ -118,6 +115,68 @@ then I can use the pushforward to find ``\dfrac{∂f}{∂x}`` | |||||
|
||||||
``\dfrac{∂f}{∂x}=\mathrm{pushforward}_{h(b)|b=g(x)}\left(\left.\dfrac{∂g}{∂a}\right|_{a=x}\right)`` | ||||||
|
||||||
##### Geometric interpretation of reverse and forwards mode AD | ||||||
|
||||||
Let us think of our types geometrically. In other words, elements of a type form a _manifold_. | ||||||
This document will explain this point of view in some detail. | ||||||
|
||||||
###### Some terminology/conventions. | ||||||
|
||||||
Let ``p`` be an element of type ``M``, which is defined by some assignment of numbers ``x_1,...,x_m``, | ||||||
say ``(x_1,...,x_m) = (a_1,...,a_m)``. | ||||||
|
||||||
A _function_ ``f:M -> K`` on ``M`` is (for simplicity) a polynomial in ``K[x_1, ..., x_m]``. | ||||||
Does this really make things simpler? I would probably just require that
I'm not sure what the right set of functions in the AD context is. I don't think "smooth" or even "analytic" are exactly right. I feel these two kind of imply an a priori definition of symbolic derivatives, infinite limits or some sort of finite differences. I think we are basically combining rational functions with look-ups? (add, subtract, multiply, divide, lookup table) |
||||||
|
||||||
The tangent space ``T_pM`` of ``M`` at ``p`` is the ``K``-vector space spanned by the derivations ``d/dx_1, ..., d/dx_m``. | ||||||
aisopous marked this conversation as resolved.
|
||||||
Tangent vectors act linearly on the space of functions, in the usual way as derivations. Our starting point is | ||||||
I might add here, that they map a curve through
Curves are a nice interpretation of tangent vectors, but I'm not convinced they are the right one for AD, since I don't think they necessarily add anything. For reference, we can define an isomorphism |
||||||
that we know how to write down ``d/dx(f) = df/dx``. | ||||||
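As a small worked example of a tangent vector acting on a function (the polynomial, the vector and the point are illustrative choices, not taken from the PR), take ``f = x_1^2 x_2``, the tangent vector ``X = 3\,d/dx_1 - d/dx_2`` and the point ``p = (1, 2)``:

```math
X(f)\big|_p
  = 3\,\frac{\partial f}{\partial x_1}\Big|_p - \frac{\partial f}{\partial x_2}\Big|_p
  = 3\,(2 x_1 x_2)\big|_p - (x_1^2)\big|_p
  = 3 \cdot 4 - 1
  = 11
```

The result is a single element of ``K``: the vector ``X`` only uses the first order information of ``f`` at ``p``.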
|
||||||
The collection of tangent spaces ``{T_pM}`` for ``p\in M`` is called the _tangent bundle_ of ``M``. | ||||||
|
||||||
Let ``df`` denote the first order information of ``f`` at each point. This is called the differential of ``f``. | ||||||
"first order information" sounds a bit vague to me. Can't we define
The distinction between tangents and functions as duals to one another is crucial here. For example, Griewank defines (IIRC) The technical definition should be as follows: I'm not committed to this "algebraic" view of things, I am simply keen to see if this would be clearer than the "smooth" view.

Example (maps into non-smooth spaces)
```julia
# A point on the degenerate conic x*y == 0 (the union of the two coordinate axes).
struct PointOnDegenerateConic
    x::Float64
    y::Float64
    function PointOnDegenerateConic(x, y)
        @assert x * y == 0
        new(x, y)
    end
end
```
At any point away from the origin, everything should look just like on a real line. With the non-smooth definitions, we can also make sense of the origin for free. At the origin, we have a two-dimensional tangent space -- the vector space dual to linear functions of
```julia
# Project a point of the plane onto the conic by keeping the larger coordinate.
function ProjectToConic(x, y)
    if abs(x) == abs(y)
        return (0, 0)
    end
    return abs(x) > abs(y) ? (x, 0) : (0, y)
end
```
Again we have a non-smooth mapping, but pull-backs of linearised functions on the conic, and push-forwards of vectors in |
||||||
If the derivatives of ``f`` and ``g`` agree at ``p``, we say that ``df`` and ``dg`` represent the same cotangent at ``p``. | ||||||
The covectors ``dx_1, ..., dx_m`` form a basis of the cotangent space ``T^*_pM`` at ``p``. Notice that this vector space is | ||||||
dual to ``T_pM``. | ||||||
|
||||||
The collection of cotangent spaces ``{T^*_pM}`` for ``p\in M`` is called the _cotangent bundle_ of ``M``. | ||||||
|
||||||
###### Push-forwards and pullbacks | ||||||
|
||||||
Let ``N`` be another type, defined by numbers ``y_1,...,y_n``, and let ``g:M -> N`` be a _map_, that is, | ||||||
an ``n``-dimensional vector ``(g_1, ..., g_n)`` of functions on ``M``. | ||||||
|
||||||
We define the _push-forward_ ``g_*:TM -> TN`` between tangent bundles by ``g_*(X)(h) = X(g\circ h)`` for any tangent vector ``X`` on ``M`` and function ``h`` on ``N``, where ``g\circ h`` means first ``g``, then ``h``. | ||||||
|
||||||
We have ``g_*(d/dx_i)(y_j) = dg_j/dx_i``, so the push-forward is equal to the Jacobian when written in coordinates. | ||||||
aisopous marked this conversation as resolved.
|
||||||
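To make the coordinate statement concrete, here is a minimal Julia sketch. The map `g`, its hand-written Jacobian `jacobian_g` and the helper `pushforward_g` are hypothetical names chosen for illustration; they are not part of the PR or of any package API.

```julia
using LinearAlgebra  # standard library

# Hypothetical example map g: R^2 -> R^3 (an assumption, not from the PR).
g(x) = [x[1] * x[2], x[1]^2, x[1] + x[2]]

# Hand-written Jacobian of g at x; entry (j, i) is dg_j/dx_i.
jacobian_g(x) = [x[2] x[1]; 2x[1] 0.0; 1.0 1.0]

# Push-forward of a tangent vector X at p: a Jacobian-vector product.
pushforward_g(p, X) = jacobian_g(p) * X

p = [2.0, 3.0]
e1 = [1.0, 0.0]               # the coordinate vector d/dx_1
col1 = pushforward_g(p, e1)   # first column of the Jacobian at p

# Sanity check against a central finite difference along e1.
δ = 1e-6
fd = (g(p .+ δ .* e1) .- g(p .- δ .* e1)) ./ (2δ)
```

Pushing forward the coordinate vector ``d/dx_1`` picks out one column of the Jacobian, which is why forward mode needs one pass per input direction.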
|
||||||
Similarly, the pullback of the differential ``df`` of a function ``f`` on ``N`` is defined by | ||||||
``g^*(df) = d(g\circ f)``. So for a coordinate differential ``dy_j``, we have | ||||||
``g^*(dy_j) = d(g_j)``. Notice that this is a covector, and we could have defined the pullback by its action on vectors by | ||||||
``g^*(dh)(X) = dh(g_*(X)) = X(g\circ h)`` for any function ``h`` on ``N`` and ``X\in TM``. In particular, | ||||||
``g^*(dy_j)(d/dx_i) = d(g_j)/dx_i``. If you work out the action in a basis of the cotangent space, you see that it acts | ||||||
by the adjoint (transpose) of the Jacobian. | ||||||
I might add:
Right, the tangent space as I've defined it would in fact be only |
||||||
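The duality between the pullback and the push-forward can be checked numerically. The sketch below assumes a hypothetical map from ``R^2`` to ``R^3`` given only through its hand-written Jacobian; `jacobian_g`, `pushforward_g` and `pullback_g` are illustrative names, not an existing API.

```julia
using LinearAlgebra  # standard library, for dot

# Jacobian of a hypothetical map g(x) = (x1*x2, x1^2, x1 + x2); entry (j, i) is dg_j/dx_i.
jacobian_g(x) = [x[2] x[1]; 2x[1] 0.0; 1.0 1.0]

# Push-forward: tangent vector on M -> tangent vector on N.
pushforward_g(p, X) = jacobian_g(p) * X
# Pullback: covector on N -> covector on M, via the transposed Jacobian.
pullback_g(p, ω) = jacobian_g(p)' * ω

p = [2.0, 3.0]
X = [1.0, -1.0]         # a tangent vector at p
ω = [0.5, 2.0, -1.0]    # a covector at g(p), e.g. the components of some dh

# Defining property of the pullback: g^*(ω)(X) == ω(g_*(X)).
lhs = dot(pullback_g(p, ω), X)
rhs = dot(ω, pushforward_g(p, X))

# Pulling back the coordinate differential dy_2 gives d(g_2) = (2*x1, 0).
dg2 = pullback_g(p, [0.0, 1.0, 0.0])
```

Both pairings give the same number, and pulling back ``dy_2`` returns a row of the Jacobian, in contrast to the push-forward, which returns columns.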
|
||||||
Notice that the pullback of a differential and the pushforward of a vector have very different meanings, and this should | ||||||
be reflected in how they are used in code. | ||||||
|
||||||
The information contained in the push-forward map is exactly _what does my function do to tangent vectors_. | ||||||
Pullbacks act on differentials: they take the first order information of a function on ``N`` and return the first order information of its composite with ``g`` on ``M``. | ||||||
aisopous marked this conversation as resolved.
|
||||||
This works in a coordinate-invariant way, and works without the notion of a metric. | ||||||
Recall that _gradients_ are vectors, yet they should contain the same information as the differential ``df``. | ||||||
Assuming we use the standard Euclidean metric, we can identify ``df`` with the vector ``\nabla f``. | ||||||
Even so, the pullback is defined on differentials, not on gradients: a gradient can only be pulled back through this identification. | ||||||
I'm not really sure, what you mean here.
What I should explain here, is how pull-backs are applied to gradients via the identification of linear functions and vectors by |
||||||
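A rough sketch of why the differential and the gradient should be kept apart: the components of ``df`` do not depend on a metric, while the gradient does. The function, the helper `gradient` and both metrics below are assumptions made up for illustration.

```julia
using LinearAlgebra  # standard library

# Hypothetical example: f(x) = x1^2 + 3*x2 on R^2.
df(x) = [2x[1], 3.0]    # components of the differential (a covector)

# A metric is a symmetric positive-definite matrix G; the gradient is G^{-1} * df.
gradient(x, G) = G \ df(x)

p = [1.0, 2.0]
euclidean = [1.0 0.0; 0.0 1.0]
stretched = [4.0 0.0; 0.0 1.0]   # an assumed non-Euclidean metric

grad_euclidean = gradient(p, euclidean)   # same components as df(p)
grad_stretched = gradient(p, stretched)   # different vector, same df
```

Only under the Euclidean metric do the gradient and the differential have the same components, which is the identification the text relies on.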
|
||||||
If the goal is to evaluate the gradient of a function ``f=g\circ h:M -> N -> K``, where ``g`` is a map and ``h`` is a function, | ||||||
we have two obvious options: | ||||||
First, we may push forward a basis of ``T_pM`` to ``TK``, which we identify with ``K`` itself. | ||||||
This results in ``m`` scalars, representing the components of the gradient. | ||||||
Step-by-step in coordinates: | ||||||
1. Compute the push-forward of the basis of ``T_pM`` by ``g``, i.e. just the columns of the Jacobian ``dg_i/dx_j``. | ||||||
2. Compute the push-forward by the function ``h`` (consider it as a map, ``K`` is also a manifold!) to get ``h_*(g_*(d/dx_j)) = \sum_i (dh/dy_i)(dg_i/dx_j)``. | ||||||
|
||||||
Second, we pull back the differential ``dh``: | ||||||
1. Compute ``dh = (dh/dy_1,...,dh/dy_n)`` in coordinates. | ||||||
2. Pull back by (in coordinates) multiplying with the adjoint of the Jacobian, resulting in ``g^*(dh) = \sum_i(dg_i/dx_j)(dh/dy_i)``. | ||||||
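The two options can be compared side by side in a minimal Julia sketch. The map `g`, the function `h` and their hand-written derivatives are hypothetical examples; nothing here is taken from the PR or from a package API.

```julia
using LinearAlgebra  # standard library

# Hypothetical example (not from the PR): g: R^2 -> R^3 and h: R^3 -> R.
g(x) = [x[1] * x[2], x[1]^2, x[1] + x[2]]
h(y) = y[1] * y[3] + y[2]

jacobian_g(x) = [x[2] x[1]; 2x[1] 0.0; 1.0 1.0]   # entries dg_i/dx_j
dh(y) = [y[3], 1.0, y[1]]                          # components dh/dy_i

p = [2.0, 3.0]
q = g(p)

# Option 1 (forward mode): push each basis vector d/dx_j through g, then through h.
basis = ([1.0, 0.0], [0.0, 1.0])
grad_forward = [sum(dh(q) .* (jacobian_g(p) * e)) for e in basis]

# Option 2 (reverse mode): pull dh back once with the transposed Jacobian.
grad_reverse = jacobian_g(p)' * dh(q)
```

Both routes give the same ``m`` numbers. The first needs one push-forward per basis vector of ``T_pM``, while the second needs a single pullback of ``dh``, which is the usual reason for preferring reverse mode when the output is a scalar.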
|
||||||
|
||||||
#### The anatomy of pushforward and pullback | ||||||
|
||||||
|
I don't quite get what you want to say here, do you mean how a type is represented in memory? Wouldn't we also want to require some type of "smoothness", so we can do calculus on it?
You're right in saying that manifold should imply smoothness.
I wish to interpret things geometrically, which is to say I am interested in some geometric structure beyond the "set" of elements. Sometimes the word "space" is used. Smoothness isn't necessarily something we want to think about: Push-forwards and pull-backs can be defined without it.