Prior predictive sampling with pm.Bound gives wrong results #4643


Closed
ricardoV94 opened this issue Apr 15, 2021 · 6 comments

ricardoV94 (Member) commented Apr 15, 2021

Using pm.Bound for distributions whose arguments include other model parameters (rather than constants) leads to a model that is sampled incorrectly by sample_prior_predictive():

import pymc3 as pm

with pm.Model() as m:
    pop = pm.Normal('pop', 2, 1)
    ind = pm.Bound(pm.Normal, lower=-2, upper=2)('ind', mu=pop, sigma=.5)

    prior = pm.sample_prior_predictive()
    trace = pm.sample()

The samples for the bounded distribution itself look fine (i.e., prior predictive sampling and sampling without data give the same results):

import seaborn as sns

sns.kdeplot(prior['ind'])
sns.kdeplot(trace['ind'])

[figure: overlapping KDE curves for prior['ind'] and trace['ind']]

But the samples for the hyper-parameters do not match:

sns.kdeplot(prior['pop'])
sns.kdeplot(trace['pop'])

[figure: mismatched KDE curves for prior['pop'] and trace['pop']]

What is going on? I think that, the way pm.Bound is implemented, it corresponds to adding an arbitrary factor to the model logp of the kind:

pm.Potential('err', pm.math.switch((ind >= -2) & (ind <= 2), 0, -np.inf))

which obviously is not (and cannot be) accounted for in predictive sampling.
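The distortion can be reproduced without PyMC at all. Here is a minimal numpy sketch (an illustration of the effect, not pymc3 code): treating the -inf Potential as a rejection of any joint draw where ind falls outside the bounds visibly shifts the marginal of pop, because values of pop near the bounds have their joint draws rejected far more often.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Joint prior draws from the model above: pop ~ N(2, 1), ind | pop ~ N(pop, 0.5)
pop = rng.normal(2.0, 1.0, size=n)
ind = rng.normal(pop, 0.5)

# The -inf Potential acts like rejecting every joint draw with ind outside [-2, 2]
accept = (ind >= -2) & (ind <= 2)

print(pop.mean())          # close to the prior mean of 2
print(pop[accept].mean())  # noticeably pulled below 2
```

The rejection step couples pop to the bounds through ind, which is exactly the mismatch between sample_prior_predictive (which ignores the Potential-like factor) and pm.sample (which respects it).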

OriolAbril (Member) commented:

This could be helpful in implementing a fix: https://mc-stan.org/docs/2_18/stan-users-guide/truncated-random-number-generation.html
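For reference, the recipe in that guide is inverse-CDF sampling: draw u uniformly between the CDF values at the bounds, then invert. A quick sketch using the standard library's NormalDist (illustrative only, not a pymc3 API):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
mu, sigma, lower, upper = 0.0, 0.5, -2.0, 2.0
dist = NormalDist(mu, sigma)

# Draw u uniformly between F(lower) and F(upper), then invert the CDF;
# every resulting draw is guaranteed to land inside the bounds.
u = rng.uniform(dist.cdf(lower), dist.cdf(upper), size=10_000)
x = np.array([dist.inv_cdf(ui) for ui in u])
```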

ricardoV94 (Member, Author) commented Apr 15, 2021

The thing is that Bound does not really correspond to a Truncated distribution. If it did, it would not affect the prior of the hyperparameters during (correctly implemented) prior predictive sampling.

ricardoV94 (Member, Author) commented Apr 15, 2021

TruncatedNormal, which is a properly truncated distribution, does not affect the prior predictive of its hyperparameters:

with pm.Model() as m:
    pop = pm.Normal('pop', 2, 1)
    ind = pm.TruncatedNormal('ind', lower=-2, upper=2, mu=pop, sigma=.5)
    prior = pm.sample_prior_predictive()
    trace = pm.sample()

The prior for the truncated distribution is different; note the smaller left tail:

sns.kdeplot(prior['ind'])
sns.kdeplot(trace['ind'])

[figure: KDE curves for prior['ind'] and trace['ind'] in the TruncatedNormal model]

The prior for the hyperparameter is not affected by the truncation downstream:

sns.kdeplot(prior['pop'])
sns.kdeplot(trace['pop'])

[figure: matching KDE curves for prior['pop'] and trace['pop']]
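This matches what a correct forward sampler does: each ind value is drawn from the truncated conditional given its own pop draw, and no pop draw is ever discarded, so the hyperparameter's marginal is untouched. A numpy/stdlib sketch of that per-draw truncation (an illustration, not the pymc3 implementation):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 20_000
lower, upper, sigma = -2.0, 2.0, 0.5

pop = rng.normal(2.0, 1.0, size=n)  # hyperparameter draws, never rejected

# Truncated conditional draw for each pop value via inverse-CDF sampling
ind = np.empty(n)
for i, mu in enumerate(pop):
    d = NormalDist(mu, sigma)
    u = rng.uniform(d.cdf(lower), d.cdf(upper))
    ind[i] = d.inv_cdf(u)
```

Every ind draw lands inside the bounds, while the retained pop samples are exactly the untruncated N(2, 1) prior.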

ricardoV94 (Member, Author) commented:

@OriolAbril your point made me recheck the Bound random method, and I am very unsure whether its logic is sound (although I couldn't confirm that it isn't, either):

https://github.com/pymc-devs/pymc3/blob/2dee9dafb3541043ae23b66fe9262e70fbd2769a/pymc3/distributions/bound.py#L90-L95

In particular, it ignores the order of the point values. It seems that if a point happens to include one parameter value from which it is easy to sample within the bound constraints and 99 values from which it is not, it will happily resample 99 times from the point of high bounded likelihood. I am not completely sure whether this violates the statistical behavior of Bound, though.
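To illustrate the concern, here is a minimal numpy sketch of that pack-and-refill pattern (a simplification for illustration, not the actual pymc3 code): accepted draws are packed contiguously into the output, ignoring which parameter slot generated them, so slots whose parameters make in-bound draws essentially impossible still get filled, with values generated under a different slot's parameter.

```python
import numpy as np

rng = np.random.default_rng(3)
lower, upper = -2.0, 2.0

mu = np.full(100, 50.0)  # 99 parameter values that essentially never satisfy the bounds
mu[0] = 0.0              # one value that satisfies them easily

samples = np.empty_like(mu)
i = 0
while i < len(samples):
    draw = rng.normal(mu, 0.5)                      # one draw per parameter slot
    keep = draw[(draw >= lower) & (draw <= upper)]  # filter, discarding slot identity
    take = keep[: len(samples) - i]
    samples[i:i + len(take)] = take                 # pack accepted values contiguously
    i += len(take)

# All 100 output slots are filled with values that could only have come from mu[0]
```

Under a correct conditional sampler, the 99 slots with mu = 50 could (almost) never yield an in-bound value, yet here every slot gets one.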

michaelosthege (Member) commented:

@ricardoV94 was this v3 or v4? Please set labels accordingly.

ricardoV94 (Member, Author) commented:

It's an issue in v3. Bound hasn't been refactored for v4 yet, but the behavior will be the same there. Anyway, there's no point in changing things for v3, so I'll set the label accordingly.
