Implement continuous distribution CDF methods #2688

domenzain · 2017-11-03T17:41:52Z

This completes the work started in #2048 and continued in #2073 and includes #2678, but rebased on top of a recent master and in more compact commits.

These can then be used to implement censored distributions more adequately, as described in #1867 and #1864.

domenzain · 2017-11-08T18:25:17Z

It seems the relevant tests pass for the last commit, but the build errors out because it exceeds the test time limit...

The continuous distributions still missing a CDF method make up 1/3 of all available continuous distributions:

Gumbel: straightforward, using the exponential.
Logistic: straightforward, using log1p and the exponential.
ExGaussian: straightforward, using the Normal CDF.
Gamma: requires a decent regularized upper/lower incomplete gamma function.
InverseGamma: requires a decent regularized upper/lower incomplete gamma function.
HalfStudentT: straightforward, using the StudentT CDF.
VonMises: no analytic form; seems complicated to implement and requires Bessel functions.
SkewNormal: requires Owen's T function. Maybe use a series?
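Two of the entries marked as straightforward can be illustrated with the standard math module alone. This is a scalar sketch, not the Theano code in this PR, and the function names and the student_t_cdf argument are hypothetical. The Gumbel log CDF is -exp(-(x - mu)/beta). A distribution folded at zero has CDF 2F(x/sigma) - 1 for x >= 0. The check uses nu = 1, where the Student-t reduces to the Cauchy distribution, whose CDF has a closed form.

```python
import math


def gumbel_logcdf(x, mu, beta):
    """log F(x) for Gumbel(mu, beta), where F(x) = exp(-exp(-(x - mu) / beta))."""
    return -math.exp(-(x - mu) / beta)


def half_student_t_cdf(x, sigma, student_t_cdf):
    """CDF of a Student-t folded at zero with scale sigma.

    student_t_cdf is a hypothetical callable returning the standard
    Student-t CDF for the desired nu; symmetry gives F_half = 2F - 1.
    """
    if x < 0:
        return 0.0
    return 2.0 * student_t_cdf(x / sigma) - 1.0


def cauchy_cdf(z):
    # Student-t with nu = 1 is the standard Cauchy distribution.
    return 0.5 + math.atan(z) / math.pi


print(gumbel_logcdf(0.0, 0.0, 1.0))              # -1.0
print(half_student_t_cdf(1.0, 1.0, cauchy_cdf))  # ~0.5
```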

It would be ideal to wait until everything is implemented, but that adds the work of keeping the commits up to date with master.

Seeing that a majority of the CDF methods are implemented, could we review this for inclusion into master and raise NotImplementedError for the missing ones? This would invite users who need them to implement them or to push for their implementation.
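A minimal sketch of that proposal, assuming a plain-Python class hierarchy. The class names are hypothetical and do not reproduce PyMC3's actual classes. The base class raises NotImplementedError from logcdf, and each distribution with a known log CDF overrides it.

```python
import math


class Continuous:
    """Hypothetical minimal base class standing in for a distribution base."""

    def logcdf(self, value):
        # Default: make the gap explicit instead of failing with AttributeError.
        raise NotImplementedError(
            "logcdf is not implemented for %s" % type(self).__name__
        )


class Exponential(Continuous):
    def __init__(self, lam):
        self.lam = lam

    def logcdf(self, value):
        # log(1 - exp(-lam * x)) computed as log(-expm1(-lam * x)).
        return math.log(-math.expm1(-self.lam * value))


class VonMises(Continuous):
    # No closed-form CDF yet, so it inherits the NotImplementedError placeholder.
    def __init__(self, mu, kappa):
        self.mu, self.kappa = mu, kappa
```

With this layout, Exponential(2.0).logcdf(1.0) returns log(1 - e^-2), while VonMises(0.0, 1.0).logcdf(0.5) raises NotImplementedError with a message naming the class, rather than the AttributeError a missing method would produce.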

Do you have comments or suggestions, @fonnesbeck, @twiecki, @aseyboldt ?
Thank you,

fonnesbeck · 2017-11-08T21:09:53Z

My instinct would be to wait until they are all done, so that users aren't confused when they don't exist for some distributions. OTOH, the early audience will probably be small, so perhaps not a big deal. The structure of the distributions code is pretty simple, so merging a large number of them at once shouldn't be a big deal.

domenzain · 2017-11-17T16:35:47Z

For the tricky ones (SkewNormal, the Gamma family and VonMises), user feedback would be a good guide.
Given that only five distributions are missing as of now, any confusion should be relatively small. And having them around as NotImplementedErrors will make the need for their implementation explicit in case someone else wishes to contribute. Issues on GitHub are not as visible.

Presumably more users will have a need when we get around to implementing censored distributions, but it will be difficult to judge if we do not get started there.

fonnesbeck · 2017-11-18T13:02:42Z

pymc3/distributions/continuous.py

+    based on Cephes library by Steve Moshier (incbet.c).
+    small: Choose element-wise which continued fraction expansion to use.
+    '''
+    big = tt.constant(4.503599627370496e15, dtype='float64')


Perhaps use CAPS for constants?
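The docstring above refers to choosing element-wise which continued fraction expansion to use. The sketch below illustrates that idea for scalars. It does not reproduce the Cephes incbet.c algorithm used in the PR; instead it swaps in the simpler modified Lentz scheme popularized by Numerical Recipes. It evaluates the regularized incomplete beta function I_x(a, b) and uses the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) to switch to the expansion that converges quickly. Module-level constants are in CAPS, as suggested in the review; their names and values are assumptions of this sketch.

```python
import math

MAX_ITER = 300
EPS = 1e-15        # relative convergence threshold
FP_MIN = 1e-300    # guards against division by zero in Lentz's method


def _beta_cf(a, b, x):
    """Continued fraction for I_x(a, b), evaluated with modified Lentz."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > FP_MIN else FP_MIN)
    h = d
    for m in range(1, MAX_ITER + 1):
        m2 = 2 * m
        # Even step of the fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > FP_MIN else FP_MIN)
        c = 1.0 + aa / c
        c = c if abs(c) > FP_MIN else FP_MIN
        h *= d * c
        # Odd step of the fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > FP_MIN else FP_MIN)
        c = 1.0 + aa / c
        c = c if abs(c) > FP_MIN else FP_MIN
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # log of x^a (1 - x)^b / B(a, b)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    # Flip to the expansion that converges quickly for this x.
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b
```

For example, betainc(2.0, 3.0, 0.4) agrees with the closed-form binomial sum for integer parameters, 0.5248.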

fonnesbeck · 2017-11-18T13:05:38Z

I'm fine with merging this once tests pass. My only suggestion is to use caps for constant names, which makes things easier to read.

domenzain · 2017-11-21T17:03:19Z

One of the builds consistently times out.
Should I reduce the number of points to test?

In the build that times out, the Logistic distribution test fails but I can't see the parameters that cause the failure.
Is there some documentation on how to set up the test environment locally?

I've compared the Logistic log CDF with SciPy's and with an exact form from Mathematica. I believe my implementation is more numerically stable.
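To illustrate what "more numerically stable" means here, the following scalar sketch compares a direct translation of log(1 / (1 + exp(-z))) with a rearranged form. It uses the standard math module and is not necessarily the exact expression used in the PR; the function names are hypothetical.

```python
import math


def logistic_logcdf_naive(x, mu, s):
    # Direct translation of log(1 / (1 + exp(-z))).
    z = (x - mu) / s
    return math.log(1.0 / (1.0 + math.exp(-z)))


def logistic_logcdf(x, mu, s):
    """log F(x) for Logistic(mu, s), avoiding overflow and rounding to 0."""
    z = (x - mu) / s
    if z >= 0:
        # exp(-z) <= 1, so log1p keeps precision when F(x) is close to 1.
        return -math.log1p(math.exp(-z))
    # For z < 0: -log(1 + e^-z) = z - log(1 + e^z), which avoids exp overflow.
    return z - math.log1p(math.exp(z))
```

At z = 40 the naive version returns exactly 0.0 because 1 + exp(-40) rounds to 1, while the rearranged version returns about -4.25e-18. At z = -1000 the naive version raises OverflowError from math.exp(1000), while the rearranged version returns -1000.0.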

junpenglao · 2017-11-21T17:14:40Z

The timeout is a bit difficult to deal with. One quick fix is to modify .travis.yml and move the test below to another, quicker build:
https://github.com/pymc-devs/pymc3/blob/master/.travis.yml#L26

junpenglao · 2017-11-28T21:52:14Z

I had a look today (I need to build a censoring model), and overall it looks amazing!
However, I am a bit confused about how to use it. For example, running the code below gives me a pretty complicated theano.grad error:

import pymc3 as pm

with pm.Model():
    nu = pm.HalfNormal('nu', 5)
    mu = pm.Normal('mu', 0, 1)
    sd = pm.HalfCauchy('sd', 2.5)
    left = pm.StudentT.dist(nu=nu, mu=mu, sd=sd).logcdf(6.)
    lcdf = pm.Potential('lcdf', left)
    pm.sample()
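In a censoring model, a right-censored observation contributes the survival function 1 - F(x). A log CDF therefore usually has to be turned into log(1 - exp(logcdf)). The scalar sketch below shows one stable way to do that, using the standard two-branch log1mexp rearrangement with a switch at -log 2. The function names are hypothetical and this is not PyMC3 API.

```python
import math


def log1mexp(a):
    """Compute log(1 - exp(a)) for a < 0 without catastrophic cancellation."""
    if a >= 0:
        raise ValueError("log1mexp requires a < 0")
    if a > -math.log(2.0):
        # exp(a) is close to 1: expm1 keeps the small difference accurate.
        return math.log(-math.expm1(a))
    # exp(a) is small: log1p is accurate here.
    return math.log1p(-math.exp(a))


def log_survival(logcdf_value):
    """log(1 - F(x)) given log F(x), e.g. for right-censored observations."""
    return log1mexp(logcdf_value)
```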

gBokiau · 2018-02-24T04:47:07Z

I've used truncated Gamma family distributions in many "waiting times" scenarios (broadly speaking), and they're not uncommon in that literature.
I think it would be a shame not to have a solution. Would this offer a way forward?
https://github.com/Theano/Theano_lgpl/blob/master/theano_lgpl/gamma.c

fonnesbeck · 2018-04-23T22:18:33Z

Is there a status update on this? Would be great to have these in for 3.5.

nickresnick · 2018-05-25T15:05:56Z

Hey all, I'm building a beta model on censored data, so I need the beta survival function for incomplete observations. I've copied the incomplete beta functions from this PR into my local environment, but I get this error when trying to fit the model:

AsTensorError                             Traceback (most recent call last)
<ipython-input-37-3fbed77b4ce8> in <module>()
      3     rate_rate = pm.HalfFlat('rate_rate')
      4     shape = pm.HalfFlat('shape')
----> 5     pm.DensityDist('obs', gamma_gamma_logp, observed=dict(t=data_gg, complete=complete, shape=shape, rate_shape=rate_shape, rate_rate=rate_rate))
      6     map = pm.find_MAP()

/usr/local/lib/python2.7/site-packages/pymc3/distributions/distribution.pyc in __new__(cls, name, *args, **kwargs)
     35             total_size = kwargs.pop('total_size', None)
     36             dist = cls.dist(*args, **kwargs)
---> 37             return model.Var(name, dist, data, total_size)
     38         else:
     39             raise TypeError("Name needs to be a string but got: {}".format(name))

/usr/local/lib/python2.7/site-packages/pymc3/model.pyc in Var(self, name, dist, data, total_size)
    769             with self:
    770                 var = MultiObservedRV(name=name, data=data, distribution=dist,
--> 771                                       total_size=total_size, model=self)
    772             self.observed_RVs.append(var)
    773             if var.missing_values:

/usr/local/lib/python2.7/site-packages/pymc3/model.pyc in __init__(self, name, data, distribution, total_size, model)
   1290         self.missing_values = [datum.missing_values for datum in self.data.values()
   1291                                if datum.missing_values is not None]
-> 1292         self.logp_elemwiset = distribution.logp(**self.data)
   1293         # The logp might need scaling in minibatches.
   1294         # This is done in `Factor`.

<ipython-input-36-503313329218> in gamma_gamma_logp(t, complete, shape, rate_shape, rate_rate)
      2 def gamma_gamma_logp(t, complete, shape, rate_shape, rate_rate):
      3     x = np.array(t / (t + rate_rate))
----> 4     return complete * (rate_shape * tt.log(rate_rate) + (shape - 1) * tt.log(t) - log_beta(shape, rate_shape) - (shape + rate_shape) * tt.log(rate_rate + t))  + (1 - complete) * incomplete_beta(shape, rate_shape, x)

<ipython-input-18-e540113f2e63> in incomplete_beta(a, b, value)
    151     w = one - value
    152 
--> 153     ps = incomplete_beta_ps(a, b, value)
    154 
    155     flip = tt.gt(value, (a / (a + b)))

<ipython-input-18-e540113f2e63> in incomplete_beta_ps(a, b, value)
    128             e for e in
    129             tt.cast((t, s),
--> 130                     'float64')
    131         ]
    132     )

/usr/local/lib/python2.7/site-packages/theano/tensor/basic.pyc in cast(x, dtype)
   1257         dtype = config.floatX
   1258 
-> 1259     _x = as_tensor_variable(x)
   1260     if _x.type.dtype == dtype:
   1261         return _x

/usr/local/lib/python2.7/site-packages/theano/tensor/basic.pyc in as_tensor_variable(x, name, ndim)
    198         except Exception:
    199             str_x = repr(x)
--> 200         raise AsTensorError("Cannot convert %s to TensorType" % str_x, type(x))
    201 
    202 # this has a different name, because _as_tensor_variable is the

AsTensorError: ('Cannot convert (Elemwise{mul,no_inplace}.0, TensorConstant{0.0}) to TensorType', <type 'tuple'>)

Thanks.

Update: I can confirm that the incomplete_beta here returns the desired result when scalars are passed. Looks like the issue is casting a tuple (t, s) to 'float64'. I also tried outputs_info=[e for e in (tt.cast(t, 'float64'), tt.cast(s, 'float64'))] but that failed with an assertion error in scan that I'm still investigating.

twiecki · 2018-06-26T07:53:38Z

I don't think we should wait until all CDFs are implemented; we should just merge what we have.

Seems like this branch needs a rebase, though. @domenzain Are you still interested in working on this? I think we could merge soon if there are no other blockers.

domenzain · 2018-06-26T08:06:14Z

Hi @twiecki,

I will see what it takes to rebase onto master today and come back to you on that.

arose13 · 2018-07-25T14:32:13Z

Is this still in development? Am I allowed to help here?

twiecki · 2018-07-25T14:34:26Z

Absolutely, help would be appreciated!


fonnesbeck · 2018-09-23T13:48:17Z

Can we merge these? This needs a rebase (or a merge with the current master) to resolve conflicts. cc @domenzain

domenzain · 2018-09-24T14:18:02Z

Hi @fonnesbeck,

I have rebased the commits onto the current master, I hope this helps.

twiecki · 2018-09-25T13:30:11Z

Seems like there are some test errors, e.g.:

    def check_logcdf(self, pymc3_dist, domain, paramdomains, scipy_logcdf, decimal=None):
        domains = paramdomains.copy()
        domains['value'] = domain
        if decimal is None:
            decimal = select_by_precision(float64=6, float32=3)
        for pt in product(domains, n_samples=100):
            params = dict(pt)
            scipy_cdf = scipy_logcdf(**params)
            value = params.pop('value')
            dist = pymc3_dist.dist(**params)
>           assert_almost_equal(dist.logcdf(value).tag.test_value, scipy_cdf,
                                decimal=decimal, err_msg=str(pt))
E           AttributeError: 'Beta' object has no attribute 'logcdf'
pymc3/tests/test_distributions.py:460: AttributeError

domenzain · 2018-09-25T14:39:39Z

I botched the rebase for the Beta distribution, it seems.
Here's a second try.

domenzain · 2018-09-27T07:56:02Z

Looks like these are working as expected.

For the future, which would be the most useful: the missing continuous log CDFs or the discrete log CDFs?

junpenglao · 2018-09-27T09:28:13Z

Thanks for the great work @domenzain!
Could you add a line in the release note?
Also, what do you think about @gBokiau's comment?

domenzain · 2018-09-27T10:08:52Z

Hi @junpenglao, it looks like the implementation of the upper/lower gamma function is solid and has clear references to go to in case of trouble.
If it is just a question of importing it, then we could have a working Gamma and InverseGamma log CDF right away.

I'll have a look.
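For reference, the sketch below shows the regularized incomplete gamma functions these two log CDFs need, as scalar Python. It uses the series / continued-fraction split from Numerical Recipes rather than the C code linked above, and the constants and function names are assumptions of this sketch. The Gamma CDF with rate beta is P(alpha, beta*x), and the InverseGamma CDF is Q(alpha, beta/x) = 1 - P(alpha, beta/x).

```python
import math

MAX_ITER = 500
EPS = 1e-15
FP_MIN = 1e-300


def _gamma_p_series(a, x):
    # P(a, x) = x^a e^-x / Gamma(a + 1) * sum_n x^n / ((a+1)...(a+n)); best for x < a + 1.
    term = total = 1.0 / a
    ap = a
    for _ in range(MAX_ITER):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * EPS:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_cf(a, x):
    # Continued fraction for Q(a, x) = 1 - P(a, x), modified Lentz; best for x >= a + 1.
    b = x + 1.0 - a
    c = 1.0 / FP_MIN
    d = 1.0 / b
    h = d
    for i in range(1, MAX_ITER + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = 1.0 / (d if abs(d) > FP_MIN else FP_MIN)
        c = b + an / c
        c = c if abs(c) > FP_MIN else FP_MIN
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gammainc_lower(a, x):
    """Regularized lower incomplete gamma P(a, x) for a > 0, x >= 0."""
    if x <= 0.0:
        return 0.0
    if x < a + 1.0:
        return _gamma_p_series(a, x)
    return 1.0 - _gamma_q_cf(a, x)


def gammainc_upper(a, x):
    """Regularized upper incomplete gamma Q(a, x) for a > 0, x >= 0."""
    if x <= 0.0:
        return 1.0
    if x < a + 1.0:
        return 1.0 - _gamma_p_series(a, x)
    return _gamma_q_cf(a, x)


def gamma_logcdf(x, alpha, beta):
    # Gamma(alpha, rate=beta): F(x) = P(alpha, beta * x).
    return math.log(gammainc_lower(alpha, beta * x))


def inverse_gamma_logcdf(x, alpha, beta):
    # InverseGamma(alpha, beta): F(x) = Q(alpha, beta / x).
    return math.log(gammainc_upper(alpha, beta / x))
```

Computing Q directly, instead of as 1 - P, avoids losing precision in the upper tail.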

domenzain · 2018-09-27T10:35:44Z

We need a bit of boilerplate wrapping to make the scalar C functions available as Theano elementwise functions, but it looks easier than implementing them directly, as was done for the incomplete_beta function here, and the speed will likely be much better.

As the theano_lgpl package is not distributed on pip, and it looks like it hasn't been touched in years, I'm not sure what the right way to handle those modifications is. Thoughts?

junpenglao · 2018-09-27T12:30:46Z

If there is no licence issue, to me it makes sense to put it in our code base.

twiecki · 2018-09-27T12:35:04Z

If it's licensed LGPL we can't :(. I suppose that's the reason for the fork.

domenzain · 2018-09-27T13:03:30Z

The original author has relicensed the code under MIT:
See apriori.zip math/doc in http://www.borgelt.net/apriori.html

And clearly the Theano authors would've preferred the license to be other than LGPL.

twiecki · 2018-09-27T13:19:07Z

Anything MIT/BSD/Apache v2 we can use. What specifically do you want to include? The c code?

twiecki · 2018-09-27T13:47:43Z

Without getting bogged down in these details, can we merge this @domenzain from your end?

domenzain · 2018-09-27T13:55:37Z

I think this is ready to merge as is. Additional log CDFs can come in separate PRs as they become available. Comments like the one above about needing the Gamma log CDF give useful pointers for future development.


twiecki · 2018-09-27T13:56:57Z

Congrats @domenzain, this is a solid piece of work!

erikbern · 2018-09-28T19:43:48Z

Nice – looking forward to trying this for something I'm working on with censored data!

As a side note I wasted hours trying to get the lower regularized gamma function working in Tensorflow: tensorflow/tensorflow#17995 (not sure what the situation is in Theano though). This is needed for the Gamma distribution (as mentioned further up in the thread).

domenzain · 2018-10-10T14:30:10Z

@erikbern, in https://github.com/domenzain/Theano_chi2sf I have taken the relicensed code from @gBokiau's link and wrapped most of the remaining Gamma functions.
I would like to have these available in theano.tensor to write the Gamma family log CDF functions, but you can try them out directly in the meantime.

domenzain · 2018-10-15T13:41:48Z

Theano/Theano#6648

aakhmetz · 2019-01-24T11:12:40Z

@domenzain I have a small obstacle: if I use the incomplete gamma function from Theano_chi2sf, I can't get NUTS sampling to work, because the gradient is not implemented (MethodNotDefined: ('grad', <class 'theano.scalar.basic_scipy.Chi2SF'>, 'Chi2SF')). Is there a workaround for that? (I am still struggling to implement the CDF of the gamma distribution in my code, such as here, so that I can run the NUTS sampler.) Thanks in advance!

domenzain · 2019-01-24T13:00:22Z

Hi @aakhmetz,

I forgot to add the log CDF of the Gamma distribution after the merge of the pull request in my previous comment... I have just created #3356 to fix that.

In your linked issue, you are using the scipy implementation. You should use the Theano tensor implementation:

import theano.tensor as tt

def logCDF(alpha, beta, x):
    # The Gamma(alpha, rate=beta) CDF is the regularized lower incomplete gamma P(alpha, beta*x)
    return tt.log(tt.gammainc(alpha, beta*x))

aakhmetz · 2019-01-24T14:20:58Z

Hi @domenzain,

Thank you! It looks like it's working. But I also found that my Theano did not contain the gammainc function (I assume it was the conda version, specifically: Theano-1.0.3+2.g3e47d39ac.dirty):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-08263549df00> in <module>
----> 1 pm.Gamma.dist(10,2).logcdf(.5).eval()

~/anaconda3/lib/python3.6/site-packages/pymc3-3.6-py3.6.egg/pymc3/distributions/continuous.py in logcdf(self, value)
   2377         beta = self.beta
   2378         return bound(
-> 2379             tt.log(tt.gammainc(alpha, beta * value)),
   2380             value >= 0,
   2381             alpha > 0,

AttributeError: module 'theano.tensor' has no attribute 'gammainc'

I installed the version from your GitHub, but then I ran into a similar problem to the one I had before:

MethodNotDefined: ('grad', <class 'theano.scalar.basic_scipy.GammaInc'>, 'GammaInc')

It looks like I came back to the same point :/

The code I am using looks like this:

import pymc3 as pm
import theano.tensor as tt
from theano import shared

# df, number_of_iterations, length_of_tunein and number_of_jobs are defined elsewhere
with pm.Model() as model_Gamma:
    dmean = pm.Uniform('dmean', 0, 35)
    dsd = pm.Uniform('dsd', 0, 35)

#     delay_p = tt.exp(pm.Gamma.dist(mu=dmean,sd=dsd).logcdf(shared(df.δt0.get_values())))/\
#                 tt.exp(pm.Gamma.dist(mu=dmean,sd=dsd).logcdf(shared(df.δt.get_values())))

    # Gamma CDFs with shape (dmean/dsd)**2 and rate dmean/dsd**2
    delay_p = tt.gammainc((dmean/dsd)**2, dmean/(dsd**2)*shared(df.δt0.get_values()))/\
                tt.gammainc((dmean/dsd)**2, dmean/(dsd**2)*shared(df.δt.get_values()))

    counts = pm.Poisson('counts',
                        mu=df.counts.get_values()*delay_p+0.001,
                        observed=df.confirmed0.get_values())

    trace_Gamma = pm.sample(draws=number_of_iterations, tune=length_of_tunein,
                            cores=number_of_jobs)

Both my Theano and PyMC3 are the latest versions from GitHub (PyMC3 includes your latest fix, of course).
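The active tt.gammainc calls in the model pass (dmean/dsd)**2 as the shape and dmean/dsd**2 as the rate. Those values follow from matching the Gamma mean alpha/beta and variance alpha/beta**2. A small sketch of that conversion (the helper name is hypothetical):

```python
def gamma_shape_rate(mean, sd):
    """Convert a Gamma mean and standard deviation to (shape alpha, rate beta).

    Gamma(alpha, beta) has mean alpha / beta and variance alpha / beta**2,
    so alpha = (mean / sd)**2 and beta = mean / sd**2.
    """
    alpha = (mean / sd) ** 2
    beta = mean / sd ** 2
    return alpha, beta


alpha, beta = gamma_shape_rate(10.0, 5.0)
print(alpha, beta)  # 4.0 0.4
```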

OK, I understand, more or less. @domenzain, thanks!

davipatti · 2020-09-03T15:28:53Z

I have also been running into:

MethodNotDefined: ('grad', <class 'theano.scalar.basic_scipy.GammaInc'>, 'GammaInc')

whilst trying to use tt.gammainc in a pymc3 model context.

Presumably this should be implemented in Theano rather than PyMC3, so should I submit a feature request to Theano?

domenzain · 2020-09-03T17:16:12Z

@davipatti, @aakhmetz,

I think the grad method would have to be implemented in Theano for the GammaInc function here for that to work.
How much of a challenge that is depends on which variable you need the gradient with respect to...
See here for another BinaryScalarOp where the gradient with respect to one of the two variables is easy and the other is hard.
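The asymmetry for P(a, x) = gammainc(a, x) is easy to see. The derivative with respect to x has the closed form x^(a-1) e^(-x) / Gamma(a), the Gamma(a, 1) density. The derivative with respect to a has no comparable closed form and usually needs a series or numerical approach. Below is a minimal scalar sketch of the easy half. The function names are hypothetical and this is not Theano's grad API. It checks the formula against central differences of the closed forms P(1, x) = 1 - e^(-x) and P(2, x) = 1 - e^(-x)(1 + x).

```python
import math


def gammainc_grad_x(a, x):
    """d/dx of the regularized lower incomplete gamma P(a, x), for a, x > 0."""
    # Computed in log space so large a or x do not overflow.
    return math.exp((a - 1.0) * math.log(x) - x - math.lgamma(a))


def central_difference(f, x, h=1e-6):
    # Numerical derivative used only to check the closed form.
    return (f(x + h) - f(x - h)) / (2.0 * h)


def p1(x):
    return 1.0 - math.exp(-x)                # P(1, x)


def p2(x):
    return 1.0 - math.exp(-x) * (1.0 + x)    # P(2, x)


for x in (0.5, 2.0, 7.0):
    assert abs(gammainc_grad_x(1.0, x) - central_difference(p1, x)) < 1e-8
    assert abs(gammainc_grad_x(2.0, x) - central_difference(p2, x)) < 1e-8
```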

davipatti · 2020-09-03T18:05:57Z

Sadly I don't think my maths is up to scratch :(

This would massively help my research if anyone is willing and able to help out!

aakhmetz · 2020-09-04T00:51:25Z

@davipatti This is probably a strange place to say it, but I've switched to Stan, partly because of that (some of my problems involved truncated likelihoods).

davipatti · 2020-09-04T14:51:27Z

Thanks for the suggestion - I've switched too now

domenzain force-pushed the continuous_cdf_methods branch 2 times, most recently from 3594af7 to c6b5ae5 November 6, 2017 13:54

domenzain force-pushed the continuous_cdf_methods branch from f5ef708 to 96cb41c November 17, 2017 15:48

fonnesbeck reviewed Nov 18, 2017


domenzain force-pushed the continuous_cdf_methods branch 2 times, most recently from ffccaac to 2470a5e November 21, 2017 11:27

domenzain mentioned this pull request Mar 23, 2018

Implement incomplete beta function #2678

Closed

junpenglao mentioned this pull request Jun 25, 2018

First implementation of TruncatedNormal distribution #3052

Merged

fonnesbeck mentioned this pull request Sep 23, 2018

Add CDF methods to continuous distributions #2073

Closed

domenzain force-pushed the continuous_cdf_methods branch from 2470a5e to bad522b September 24, 2018 14:15

domenzain force-pushed the continuous_cdf_methods branch from bad522b to a317160 September 25, 2018 14:37

domenzain force-pushed the continuous_cdf_methods branch from a317160 to acdcd0a September 26, 2018 12:28

junpenglao changed the title from "WIP: Implement continuous distribution CDF methods" to "Implement continuous distribution CDF methods" Sep 27, 2018

junpenglao added the request discussion label Sep 27, 2018

Mention changes in the release note

53e02b7

twiecki merged commit af637d2 into pymc-devs:master Sep 27, 2018

domenzain mentioned this pull request Oct 8, 2018

Design for bound, truncated and censored distributions #1864

Closed

domenzain deleted the continuous_cdf_methods branch October 16, 2018 12:54

domenzain mentioned this pull request Sep 3, 2020

Fixing typo in GammaIncC Op Theano/Theano#6751

Merged

twiecki mentioned this pull request Sep 4, 2020

Add grad of GammaInc aesara-devs/aesara#33

Closed
