-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Improve numerical precision of discrete uniform and geometric ICDFs #6671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #6671 +/- ##
==========================================
- Coverage 91.99% 88.84% -3.16%
==========================================
Files 95 95
Lines 16016 16024 +8
==========================================
- Hits 14734 14236 -498
- Misses 1282 1788 +506
|
329e78e
to
b95d06d
Compare
|
b95d06d
to
2feed3e
Compare
pymc/distributions/discrete.py
Outdated
res_1m = pt.maximum(res - 1, 0) | ||
dist = pm.Geometric.dist(p=p) | ||
value_1m = pt.exp(logcdf(dist, res_1m)) | ||
res = ifelse(value_1m >= value, res_1m, res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to use a switch
here, ifelse
will only work for scalar cases
Make sure you pass |
2feed3e
to
4a40ee6
Compare
Thanks, replaced Even with |
pymc/testing.py
Outdated
@@ -574,7 +574,7 @@ def check_icdf( | |||
# Test pymc and scipy distributions match for values and parameters | |||
# within the supported domain edges (excluding edges) | |||
domains = paramdomains.copy() | |||
domain = Domain([0, 0.1, 0.5, 0.75, 0.95, 0.99, 1]) # Values we test the icdf at | |||
domain = Domain([0, 0.1, 0.5, 0.75, 0.95, 0.99, 0.9999, 1]) # Values we test the icdf at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the old implementations fail with just 3 9's? I am wary of testing such an extreme value by default because it puts a high burden on future icdfs on being very accurate for extreme values. We know these things must always break at some point, specially for unbounded distributions.
domain = Domain([0, 0.1, 0.5, 0.75, 0.95, 0.99, 0.9999, 1]) # Values we test the icdf at | |
domain = Domain([0, 0.1, 0.5, 0.75, 0.95, 0.99, 0.999, 1]) # Values we test the icdf at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to cover the changes in this PR, you can add a small test where you just evaluate the icdf at the values that used to fail for the two distributions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to cover the changes in this PR, you can add a small test where you just evaluate the icdf at the values that used to fail for the two distributions
To clarify my understanding, you mean I add the specific test for those distributions that fail and not something general that applies to all future distributions right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the old implementations fail with just 3 9's? I am wary of testing such an extreme value by default because it puts a high burden on future icdfs on being very accurate for extreme values. We know these things must always break at some point, specially for unbounded distributions.
I added 0.9999
, since as outlined in #6670,
for a Geometric distribution with p=0.99
, and CDF value 0.9999
whose ICDF is expected to be 2
but it gives 3
as the result. It works fine where the ICDF of 0.99
is 1
. To summarize this in a table:
p=0.99
Value | CDF | expected ICDF of the CDF | returned ICDF of the CDF | Bug |
---|---|---|---|---|
1 | 0.99 | 1 | 1 | No |
2 | 0.9999 | 2 | 3 | Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify my understanding, you mean I add the specific test for those distributions that fail and not something general that applies to all future distributions right?
Yes that's what I suggest. Also note that instead of tweaking the icdf value we can tweak the domain of the distribution parameters that is tested. Probably testing more extreme Geometric p
values would also have shown the error?
Regardless, a separate test is also fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ricardoV94 .
I have decided to do this, let me know if it makes sense:
With my latest commit, I have removed the value 0.9999
in the test for ICDF to not burden all future ICDF code regarding high accuracy.
p=0.99
is already used, as defined by Unit
in testing.py
.
The ICDF tests will instead be covered by the self-consistency check for ICDF of discrete distributions, which I will add in this PR #6642 . (Note that the self-consistency check for ICDF of continuous distributions is already added in #6642, and the discrete self consistency check is blocked on the current PR because of this accuracy issue. Thus I will add it to #6642 after this current PR is closed.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
Perfect |
db0b98f
to
2348e4a
Compare
Is this ready to merge then? |
Yes, in my opinion. |
Awesome, great work! |
Thank you! :) |
What is this PR about?
Improve numerical precision issues for the ICDF function for the discrete geometric distribution.
Actually, this is an issue in general, wherever the ceiling function is used on a floating point number to get the ICDF value. Thus this fix for the precision issue applies to most discrete ICDF functions, and not only the Geometric distribution.
Details are outlined in this issue #6670 .
Closes #6670
Checklist
Major / Breaking Changes
New features
Bugfixes
Documentation
Maintenance
📚 Documentation preview 📚: https://pymc--6671.org.readthedocs.build/en/6671/