@aisopous aisopous commented Dec 7, 2019

Summarised the content of an earlier, slightly imprecise discussion on what pushforwards and pullbacks are. Probably should be rewritten, but I want to know which parts are readable and which aren't.

@oxinabox
Member

oxinabox commented Dec 7, 2019

Awesome, thanks.
I will try and review this week

I think it should be in its own file.

@simeonschaub
Member

Is the reason the PR preview isn't working that the branch is on a different fork?

@oxinabox
Member

oxinabox commented Dec 9, 2019

> Is the reason the PR preview isn't working that the branch is on a different fork?

Yes, I think so.
No deploy key.

Member

@simeonschaub simeonschaub left a comment

Thanks a lot for this, @aisopous! Just a couple of nit-picky things, but this is great already, so to me it would be ok to merge anyways. Keep in mind that a lot of my knowledge in this area is self-taught, so please don't hesitate to correct me if I'm wrong on some of these things.


###### Some terminology/conventions.

Let ``p`` be an element of type M, which is defined by some assignment of numbers ``x_1,...,x_m``,
Member

I don't quite get what you want to say here, do you mean how a type is represented in memory? Wouldn't we also want to require some type of "smoothness", so we can do calculus on it?

Author

You're right in saying that manifold should imply smoothness.

I wish to interpret things geometrically, which is to say I am interested in some geometric structure beyond the "set" of elements. Sometimes the word "space" is used. Smoothness isn't necessarily something we want to think about: Push-forwards and pull-backs can be defined without it.

Let ``p`` be an element of type M, which is defined by some assignment of numbers ``x_1,...,x_m``,
say ``(x_1,...,x_m) = (a_1,...,a_m)``

A _function_ ``f:M -> K`` on ``M`` is (for simplicity) a polynomial in ``K[x_1, ..., x_m]``
Member

Does this really make things simpler? I would probably just require that f is analytic.

Author

I'm not sure what the right set of functions in the AD context is. I don't think "smooth" or even "analytic" are exactly right. I feel these two kind of imply an a priori definition of symbolic derivatives, infinite limits or some sort of finite differences.

I think we are basically combining rational functions with look-ups? (add, subtract, multiply, divide, lookup table)

A _function_ ``f:M -> K`` on ``M`` is (for simplicity) a polynomial in ``K[x_1, ..., x_m]``

The tangent space ``T_pM`` of ``M`` at ``p`` is the ``K``-vector space spanned by derivations ``d/dx_i``.
The tangent space acts linearly on the space of functions. They act as usual on functions. Our starting point is
Member

I might add here, that they map a curve through p to the derivative of f in that direction.

Author

@aisopous aisopous Dec 17, 2019

Curves are a nice interpretation of tangent vectors, but I'm not convinced they are the right one for AD, since I don't think they necessarily add anything.

For reference, we can define an isomorphism
{maps from the first-order neighborhood of 0 in K into M (infinitesimal curves)} -> T_pM
by taking the derivation in the direction of the curve.


The collection of tangent spaces ``{T_pM}`` for ``p\in M`` is called the _tangent bundle_ of ``M``.

Let ``df`` denote the first order information of ``f`` at each point. This is called the differential of ``f``.
Member

"first order information" sounds a bit vague to me. Can't we define df as an element of the tangent space T_{f(p)}K?

Author

@aisopous aisopous Dec 17, 2019

The distinction between tangents and functions as duals to one another is crucial here.

For example, Griewank defines (IIRC) df at any p as the linear function, vanishing at p, which approximates f to first order. Notice that then, abusing terminology, we may write f = f(p) + df_p

The technical definition should be as follows:
Let m be the set of functions vanishing at p. Then there is a natural map from functions on M (rational functions with lookups, say, and let's denote it by K(M)) to functions vanishing to order 1, i.e. d: K(M) -> m/m^2, defined by f \mapsto f - f(p) modulo m^2. This is precisely enough to algebraically specify rules of differentiation, and give us parametrisations of tangent and cotangent spaces.
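
As a worked check of the claim that this is enough to specify the rules of differentiation, the product rule can be read off the definition directly. This is only a sketch: it writes $\mathfrak{m}$ for the set called m above and assumes f and g are any two functions in K(M).

```latex
\begin{aligned}
fg - f(p)\,g(p) &= f(p)\,\bigl(g - g(p)\bigr) + g(p)\,\bigl(f - f(p)\bigr)
                  + \bigl(f - f(p)\bigr)\bigl(g - g(p)\bigr) \\
\Longrightarrow\quad d(fg) &= f(p)\,dg + g(p)\,df \pmod{\mathfrak{m}^2}
\end{aligned}
```

The last term on the first line is a product of two functions vanishing at p, so it lies in $\mathfrak{m}^2$ and drops out of the quotient.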

I'm not committed to this "algebraic" view of things, I am simply keen to see if this would be clearer than the "smooth" view.

Example (maps into non-smooth spaces)

struct PointOnDegenerateConic
    x::Float64
    y::Float64

    # Inner constructor: accept only points on the degenerate conic x*y == 0.
    function PointOnDegenerateConic(x, y)
        @assert x * y == 0
        new(x, y)
    end
end

At any point away from the origin, everything should look just like on a real line. With the non-smooth definitions, we can also make sense of the origin for free. At the origin, we have a two-dimensional tangent space -- the vector space dual to linear functions of x and y. Path interpretation is not really helpful here, unless we are okay with infinitesimal paths that lead nowhere.

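function ProjectToConic(x, y)
    # Ties (|x| == |y|) are sent to the origin.
    if abs(x) == abs(y)
        return (0, 0)
    end
    # Otherwise keep the coordinate of larger magnitude and zero out the other.
    return abs(x) > abs(y) ? (x, 0) : (0, y)
end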

Again we have a non-smooth mapping, but pull-backs of linearised functions on the conic, and push-forwards of vectors in R^2 to vectors on the conic should be pretty interpretable. Crucially, they can also be computed without any fuss!
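
To illustrate what "computed without any fuss" could look like, here is a minimal sketch in Julia. The names `project_to_conic`, `pushforward_project` and `pullback_project` are hypothetical and not part of the PR. The sketch only covers points with |x| != |y|, where the projection is locally a coordinate projection; what should happen on the tie set is deliberately left open.

```julia
# Projection from the plane onto the degenerate conic x*y == 0 (same rule as above).
function project_to_conic(x, y)
    abs(x) == abs(y) && return (zero(x), zero(y))
    return abs(x) > abs(y) ? (x, zero(y)) : (zero(x), y)
end

# Hypothetical push-forward of a tangent vector (dx, dy) at the point (x, y).
# Away from the tie set the map is (x, y) -> (x, 0) or (x, y) -> (0, y),
# so the push-forward keeps the matching component and zeroes the other.
function pushforward_project(x, y, dx, dy)
    abs(x) == abs(y) && error("tie set |x| == |y| is not handled in this sketch")
    return abs(x) > abs(y) ? (dx, zero(dy)) : (zero(dx), dy)
end

# Hypothetical pull-back of a covector a*dx + b*dy on the conic to the plane.
# This is the transpose of the linear map used in the push-forward.
function pullback_project(x, y, a, b)
    abs(x) == abs(y) && error("tie set |x| == |y| is not handled in this sketch")
    return abs(x) > abs(y) ? (a, zero(b)) : (zero(a), b)
end

point = project_to_conic(2.0, 1.0)
tangent = pushforward_project(2.0, 1.0, 0.3, 0.7)
covector = pullback_project(-1.0, 3.0, 5.0, 6.0)
```

Neither function needs a limit or a finite difference; each one only selects which coordinate survives.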

Let ``N`` be another type, defined by numbers ``y_1,...,y_n``, and let ``g:M -> N`` be a _map_, that is,
an ``n``-dimensional vector ``(g_1, ..., g_n)`` of functions on ``M``.

We define the _push-forward_ ``g_*:TM -> TN`` between tangent bundles by ``g_*(X)(h) = X(h\circ g)`` for any tangent vector ``X`` and function ``h`` on ``N``.
Member

Suggested change
- We define the _push-forward_ ``g_*:TM -> TN`` between tangent bundles by ``g_*(X)(h) = X(g\circ h)`` for any tangent vector ``X`` and function ``f``.
+ We define the _push-forward_ ``g_*:TM -> TN`` between tangent bundles by ``g_*(X)(h) = X(h\circ g)`` for any tangent vector ``X`` and smooth, real-valued function ``h``.

``g^*(dy_j) = d(g_j)``. Notice that this is a covector, and we could have defined the pullback by its action on vectors by
``g^*(dh)(X) = g_*(X)(h) = X(h\circ g)`` for any function ``h`` on ``N`` and ``X\in TM``. In particular,
``g^*(dy_j)(d/dx_i) = d(g_j)/dx_i``. If you work out the action in a basis of the cotangent space, you see that it acts
by the adjoint of the Jacobian.
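
The relation between the two operations can be checked numerically. The following is a minimal sketch, assuming a made-up map `g` from R^2 to R^2 with a hand-written Jacobian. The names `g`, `jacobian_g`, `pushforward` and `pullback` are illustrative only and are not the ChainRules API.

```julia
using LinearAlgebra

# Hypothetical example map g: R^2 -> R^2, g(x1, x2) = (x1 * x2, x1 + x2^2).
g(x) = [x[1] * x[2], x[1] + x[2]^2]

# Hand-written Jacobian of g: entry (j, i) is d(g_j)/dx_i.
jacobian_g(x) = [x[2] x[1]; 1.0 2*x[2]]

# Push-forward: tangent vector at x on M -> tangent vector at g(x) on N.
pushforward(x, v) = jacobian_g(x) * v

# Pull-back: covector at g(x) on N -> covector at x on M (adjoint of the Jacobian).
pullback(x, w) = jacobian_g(x)' * w

x = [2.0, 3.0]
v = [1.0, -1.0]   # tangent vector on M
w = [0.5, 4.0]    # covector on N, stored by its coefficients in dy_1, dy_2

# Defining duality: pairing w with g_*(v) equals pairing g^*(w) with v.
lhs = dot(w, pushforward(x, v))
rhs = dot(pullback(x, w), v)
```

Here `lhs` and `rhs` agree, which is the matrix form of ``g^*(dh)(X) = g_*(X)(h)``.
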
Member

I might add:

Note that for complex functions, how you define the adjoint of the Jacobian depends on the basis you choose as covectors, for example ``dRe(z), dIm(z)`` or ``dz, dz̅``

Author

Right, the tangent space as I've defined it would in fact be only n dimensional, whereas complexifying the tangent space of the underlying real manifold leads to a 2n dimensional tangent space of holomorphic and anti-holomorphic vectors. I've only defined push-forwards (and pull-backs) of holomorphic (co)-tangents.

This works in a coordinate invariant way, and works without the notion of a metric.
_Gradients_, recall, are vectors, yet they should contain the same information as the differential ``df``.
Assuming we use the standard Euclidean metric, we can identify ``df`` and ``\nabla f`` as vectors.
But pulling back gradients still should not be a thing.
Member

I'm not really sure what you mean here.

Author

What I should explain here is how pull-backs are applied to gradients via the identification of linear functions and vectors by
v <--> (w \mapsto <v, w>) (the inner product with a fixed vector is a linear function on V, i.e. an element of the dual of V), and why thinking about pulling back co-vectors may be a better idea. Off the top of my head, I am not sure why. There is the whole story of not choosing an inner product just to do AD, of course, which might be useful if we want to do things that are not gradient descent.
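
One way to make the comparison concrete, offered as a sketch and not as a settled answer: assume inner products on the tangent spaces of M and N, represented by symmetric positive-definite matrices $G_M$ and $G_N$ (hypothetical names), let $J$ be the Jacobian of g at p, and write covectors as columns of coefficients.

```latex
\begin{aligned}
g^*(dh) &= J^{\mathsf T}\, dh
  && \text{(covector pull-back, no metric involved)} \\
\nabla h &= G_N^{-1}\, dh
  && \text{(gradient from differential, needs } G_N \text{)} \\
\nabla (h \circ g) &= G_M^{-1} J^{\mathsf T} G_N\, \nabla h
  && \text{(moving a gradient back needs both metrics)}
\end{aligned}
```

With the standard Euclidean metric both matrices are the identity and the third line collapses to the first, which is why the distinction is invisible in that setting.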

@nickrobinson251
Contributor

aisopous and others added 3 commits March 20, 2020 10:31
Co-Authored-By: simeonschaub <[email protected]>
Co-Authored-By: simeonschaub <[email protected]>
Co-Authored-By: simeonschaub <[email protected]>
@oxinabox
Member

moved to JuliaDiff/ChainRulesCore.jl#147

@oxinabox oxinabox closed this Apr 10, 2020