Add HSGP Latent GP approximation #6458


Merged: 40 commits, Mar 14, 2023

Conversation

bwengals
Contributor

@bwengals bwengals commented Jan 18, 2023

This PR adds the basics required for the HSGP GP approximation. It replaces #6036; many thanks to @ferrine for rebasing and cleaning that PR up! This PR also extends the covariance function classes with an additional method, power_spectral_density, which is needed for HSGP (or random Fourier features). Any covariance function that defines a power_spectral_density method will work. As with the other GP implementations, one can also add covariances, e.g. eta1**2 * pm.gp.cov.ExpQuad(2, ls=ls1) + eta2**2 * pm.gp.cov.Matern52(2, ls=ls2). The implementation also works for any number of input dimensions.
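As a side note for readers, the role of power_spectral_density can be sketched numerically. The following NumPy-only check is illustrative (the helper name `expquad_psd` is hypothetical, not the PyMC method): it verifies that the closed-form 1-D spectral density of `ExpQuad` integrates back to the kernel variance k(0) = 1, as Bochner's theorem requires.

```python
import numpy as np

# Sketch only: the closed-form 1-D spectral density of the ExpQuad
# (squared exponential) kernel, S(w) = sqrt(2*pi) * ls * exp(-0.5 * ls**2 * w**2).
def expquad_psd(w, ls):
    return np.sqrt(2.0 * np.pi) * ls * np.exp(-0.5 * ls**2 * w**2)

# By Bochner's theorem, integrating S over frequencies (divided by 2*pi)
# recovers the kernel variance k(0) = 1.
w = np.linspace(-50.0, 50.0, 200_001)
dw = w[1] - w[0]
k0 = expquad_psd(w, ls=1.0).sum() * dw / (2.0 * np.pi)
```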

Not part of this PR, but planned for later:

  • support for Periodic covariance
  • warnings for bad choices of m and L or c.
  • refactoring the internals to allow advanced users to bypass the GP api and work directly with the basis and coefficients for multi-GP models
  • include more utility functions for checking the accuracy of the approximation
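
The basis-and-coefficients internals mentioned in the list above can be sketched in plain NumPy for the 1-D `ExpQuad` case, following the Solin & Särkkä eigenfunction expansion that HSGP is built on (function names here are illustrative, not the PyMC API):

```python
import numpy as np

def hsgp_basis_1d(x, m, L):
    """Laplacian eigenfunctions on [-L, L]: phi_j(x) = sqrt(1/L) * sin(pi*j*(x+L)/(2L))."""
    j = np.arange(1, m + 1)
    sqrt_eigvals = np.pi * j / (2.0 * L)  # sqrt(lambda_j)
    phi = np.sqrt(1.0 / L) * np.sin(sqrt_eigvals * (x[:, None] + L))
    return phi, sqrt_eigvals

def expquad_psd(w, ls):
    """1-D spectral density of the ExpQuad kernel with lengthscale `ls`."""
    return np.sqrt(2.0 * np.pi) * ls * np.exp(-0.5 * ls**2 * w**2)

# The approximation: k(x, x') ~= sum_j S(sqrt(lambda_j)) * phi_j(x) * phi_j(x')
x = np.linspace(-1.0, 1.0, 50)
phi, sqrt_eigvals = hsgp_basis_1d(x, m=100, L=5.0)
K_approx = phi @ np.diag(expquad_psd(sqrt_eigvals, ls=1.0)) @ phi.T
K_true = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
```

A GP draw then only needs m standard-normal coefficients: `f = phi @ (np.sqrt(expquad_psd(sqrt_eigvals, 1.0)) * beta)` with `beta ~ N(0, I_m)`, which is what makes multi-GP models cheap to build on the shared basis.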

Checklist

Major / Breaking Changes

  • None

New features

  • HSGPs

Bugfixes

  • None

Documentation

  • ...

Maintenance

  • ...

@codecov

codecov bot commented Jan 18, 2023

Codecov Report

Merging #6458 (ba7859f) into main (9836d00) will increase coverage by 0.01%.
The diff coverage is 95.04%.


@@            Coverage Diff             @@
##             main    #6458      +/-   ##
==========================================
+ Coverage   92.02%   92.04%   +0.01%     
==========================================
  Files          92       93       +1     
  Lines       15563    15719     +156     
==========================================
+ Hits        14322    14468     +146     
- Misses       1241     1251      +10     
Impacted Files Coverage Δ
pymc/gp/hsgp_approx.py 92.30% <92.30%> (ø)
pymc/gp/cov.py 97.84% <97.93%> (-0.25%) ⬇️
pymc/gp/__init__.py 100.00% <100.00%> (ø)

Member

@ferrine ferrine left a comment

I've managed to check this PR in action and it was smooth.

@twiecki twiecki requested a review from ricardoV94 January 21, 2023 03:36
Member

@michaelosthege michaelosthege left a comment

Some more type hints would be good. I'm not sure if gp/cov.py is passing mypy already, but if not, it would be good to check the output of `python scripts/run_mypy.py --verbose` and fix any errors that might appear in the changed lines.

@bwengals
Contributor Author

I have a type hint question: what's the best type for something like X? It can be a union of np.ndarray, TensorConstant, or a tensor of PyMC variables; really, anything tensor-like should work. That seems a bit verbose though. What do you think @michaelosthege or @fonnesbeck?

@michaelosthege
Member

> I have a type hint question: what's the best type for something like X? It can be a union of np.ndarray, TensorConstant, or a tensor of PyMC variables; really, anything tensor-like should work. That seems a bit verbose though. What do you think @michaelosthege or @fonnesbeck?

It's true that we don't type hint "tensor-like" in most places. We should probably add a convenient type alias somewhere.
TensorConstant inherits from TensorVariable, so Union[np.ndarray, pt.TensorVariable] should work here.

Even if you decide not to type hint the tensor-like, type hints for the other kwargs in the signature should be added, because IIRC mypy automatically skips functions that don't have any type hints in their signature.
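
A minimal sketch of what such a convenience alias could look like (hypothetical name and placement; the string forward reference keeps pytensor out of this example):

```python
from typing import Union

import numpy as np

# Hypothetical "tensor-like" alias. Because TensorConstant subclasses
# TensorVariable, the union covers ndarrays, constants, and symbolic tensors.
# The string is a forward reference, so this sketch does not import pytensor.
TensorLike = Union[np.ndarray, "pytensor.tensor.TensorVariable"]
```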

@bwengals
Contributor Author

bwengals commented Jan 26, 2023

Ah, thanks @michaelosthege, that makes sense, and explains why I couldn't find that type anywhere else in the codebase. Going with your suggestion, Union[np.ndarray, pt.TensorVariable].

@michaelosthege
Member

@bwengals where do you see this PR w.r.t. the finish 🏁 line?
The three remaining threads should be a low bar to resolve (or ignore), but from your last commit it looks like you might have more changes in mind?

@bwengals
Contributor Author

bwengals commented Feb 1, 2023

Pretty close! I'd thought things were wrapping up, but it's taking a few iterations to settle on the exact API. Other than your suggestions (thank you, btw), I'd like to improve the tests for HSGP a bit and that should pretty much be it. I have some thoughts on next steps but will try to save them for a future PR.

@michaelosthege
Member

> Pretty close! I'd thought things were wrapping up, but it's taking a few iterations to settle on the exact API. Other than your suggestions (thank you, btw), I'd like to improve the tests for HSGP a bit and that should pretty much be it. I have some thoughts on next steps but will try to save them for a future PR.

Re the API, I'd be interested in drawing posterior predictive samples at a high-resolution Xnew, or even better, drawing a callable like here: #6475

Maybe you already have one, but a test for this use case would be great.

Member

@michaelosthege michaelosthege left a comment

> A bit stuck now on mypy errors. I'm not sure how to fix the remaining ones and would definitely appreciate any clues?

I commented a few explainers of the remaining mypy problems. Let me know if you have questions.

@bwengals
Contributor Author

bwengals commented Mar 6, 2023

Thanks a ton for your help on that, @michaelosthege. I refactored a bit and tried to take your suggestions. I think handling either one of c or L being given was making things difficult, so hopefully it's a bit clearer now.

Member

@michaelosthege michaelosthege left a comment

The HSGP module must be included in docs/source/api/gp.rst, otherwise it won't be rendered in the docs.

I commented a few (nitpicky) things about docstring formatting. Let me know if you want me to help out taking care of these (this weekend).

pymc/gp/cov.py Outdated
Comment on lines 58 to 61
```python
elif np.asarray(value).squeeze().shape == ():
    return np.squeeze(value)
elif isinstance(value, numbers.Real):
    return value
```
Member

Is the third branch not test-covered because np.asarray(value).squeeze().shape == () applies to numbers already?

Contributor Author

Yep, looks like it. I'll remove it.

pymc/gp/hsgp.py Outdated
```python
elif self._parameterization == "centered":
    return self.mean_func(Xnew) + phi[:, i:] @ beta

def conditional(self, name: str, Xnew: TensorVariable, *args, **kwargs):
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:

```diff
-def conditional(self, name: str, Xnew: TensorVariable, *args, **kwargs):
+def conditional(self, name: str, Xnew: TensorVariable, **kwargs):
```

The args are not used within the function!

pymc/gp/hsgp.py Outdated
```python
    Optional arguments such as `dims`.
    """
    fnew = self._build_conditional(Xnew)
    return pm.Deterministic(name, fnew, dims=kwargs.get("dims"))
```
Member

Suggested change:

```diff
-return pm.Deterministic(name, fnew, dims=kwargs.get("dims"))
+return pm.Deterministic(name, fnew, **kwargs)
```

This way other kwargs will be forwarded too. If you don't want that, I suggest having only dims in the signature and not **kwargs.

pymc/gp/hsgp.py Outdated
Comment on lines 289 to 301
```python
def prior(self, name: str, X: TensorVariable, *args, **kwargs):
    R"""
    Returns the (approximate) GP prior distribution evaluated over the input locations `X`.

    Parameters
    ----------
    name: string
        Name of the random variable
    X: array-like
        Function input values.
    dims: None
        Dimension name for the GP random variable.
    """
```
Member

See similar comments below:

  • args/kwargs getting dropped silently
  • Parameters section formatting
  • Docstring doesn't match signature

Contributor Author

@bwengals bwengals Mar 11, 2023

Right, I added the args/kwargs issues you're pointing to in order to make mypy pass. The code originally just passed dims, like the docstring says; it's also the only input used. Without doing this, mypy complains that the HSGP function signatures don't match the signatures of the base class. Is there a third option? I could have HSGP not subclass the base GP class, which seems weird to me because it is a subclass.

Is it actually a bad thing if args/kwargs get dropped silently? At least with kwargs, isn't that somewhat expected? The docstring says it just takes dims, so what else should a user expect?

Member

I'd expect it to take any of the kwargs of a typical PyMC distribution, for example dims, initval...

Can you make the signature of the base class more specific? (Specifying just dims, not *args, **kwargs.)

Otherwise you can go with a # type: ignore comment, but be aware that this violates the Liskov substitution principle. Best check https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides
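
To see the override problem in isolation, here is a toy sketch (not the actual PyMC `Base`) of a subclass narrowing a base-class signature; mypy reports an incompatible override for `Narrowed.prior`, and the `# type: ignore[override]` comment silences it at the cost of Liskov substitution:

```python
class Base:
    # Wide signature: accepts arbitrary extras.
    def prior(self, name: str, X, *args, **kwargs):
        raise NotImplementedError

class Narrowed(Base):
    # Dropping *args/**kwargs narrows the accepted calls: a call like
    # base.prior("f", X, extra=1) is valid for a Base-typed reference but
    # fails here, which is why mypy flags the override as incompatible.
    def prior(self, name: str, X, dims=None):  # type: ignore[override]
        return (name, dims)
```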

Contributor Author

Great, added the ignores. Maybe in a later PR I can refactor the GP module to not violate the Liskov substitution principle.

Contributor Author

The args/kwargs aren't getting dropped silently now, and the docstrings match the signature. I also added a Returns section to the docstring for prior_linearized.

I held off on adding Returns sections to the docstrings for the rest of the methods because the rest of the GP module methods don't have them either. I think it's good to have, of course, but out of scope to add for everything here. I'd also like to defer refactoring how Base is used so it doesn't violate the Liskov substitution principle you pointed out, because that also touches the entire GP submodule. It would be good to fix, but it's been that way for several years now without issue because users don't use Base. I agree, though, that structurally it could be better.

Member

@ricardoV94 ricardoV94 left a comment

LGTM, just a minor remark

pymc/gp/cov.py Outdated
```python
r"""
Base class for all kernels/covariance functions.

def _verify_scalar(value):
    if (
```
Member

Why not call at/np.squeeze and capture the errors that both emit when the inputs are not allowed to be squeezed?

pymc/gp/cov.py Outdated
```python
isinstance(value, pytensor.compile.SharedVariable)
and value.get_value().squeeze().shape == ()
):
    return at.squeeze(value)
```
Member

This is slightly incorrect. If I pass pt.shared(np.ones(1)) it will fail and not be captured by your error.

If I pass pt.shared((1), shape=(1,)) then it will work.

I suggest just calling squeeze directly. Also inputs could be constants (e.g., from pm.ConstantData) and would not meet either branch, no?

Contributor Author

This function is only used by exponentiated kernels now (which aren't usable by HSGP). I tried to use _verify_scalar for other HSGP stuff earlier in this PR (which I can't recall now), but it got refactored out.

Thanks for catching this potential regression. In practice, though, I think it's extremely unlikely anyone would do anything more sophisticated than

```python
cov_func = eta**2 * pm.gp.cov.ExpQuad(1, ls)**2  # or cubed or something
```

_verify_scalar is only handling the power, **2. I'm also OK with not checking/supporting type checking here. Usage-wise, there's really no chance the exponent is anything other than 2 or maybe 3. I've not seen this in the wild in models; it's more that it's just kind of neat that you can exponentiate kernels (because products of kernels are kernels), so why not support it.

How about I roll things back to how they are in master for this, which is:

```python
def __pow__(self, other):
    if (
        isinstance(other, pytensor.compile.SharedVariable)
        and other.get_value().squeeze().shape == ()
    ):
        other = at.squeeze(other)
        return Exponentiated(self, other)
    elif isinstance(other, Number):
        return Exponentiated(self, other)
    elif np.asarray(other).squeeze().shape == ():
        other = np.squeeze(other)
        return Exponentiated(self, other)

    raise ValueError("A covariance function can only be exponentiated by a scalar value")
```

The errors aren't captured, but users will see what happens when squeeze is attempted.

Member

No strong preference, but why is this not enough?

```python
def __pow__(self, other):
    other = as_tensor_variable(other).squeeze()
    if not other.ndim == 0:
        raise ValueError(...)
    return Exponentiated(self, other)
```

Contributor Author

Alright, that's why they pay you the big bucks. Changed it to this.

Member

@michaelosthege michaelosthege left a comment

@bwengals I think you can squash-merge 🥳

@bwengals bwengals merged commit bae121a into pymc-devs:main Mar 14, 2023
dehorsley added a commit to dehorsley/pymc that referenced this pull request Apr 18, 2023
Since pymc-devs#6458, Covariance is now the base class for kernels/covariance
functions with input_dim and active_dims, which does not include
WhiteNoise and Constant kernels.
ricardoV94 pushed a commit that referenced this pull request Apr 26, 2023
* fix WhiteNoise subclassing from Covariance (#6673)

Since #6458, Covariance is now the base class for kernels/covariance
functions with input_dim and active_dims, which does not include
WhiteNoise and Constant kernels.

* add regression test for #6673

* fix WhiteNoise input to marginal GP
6 participants