Skip to content

Improve numerical precision of discrete uniform and geometric ICDFs #6671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 27, 2023

Conversation

gokuld
Copy link
Contributor

@gokuld gokuld commented Apr 13, 2023

What is this PR about?
Improve numerical precision issues for the ICDF function for the discrete geometric distribution.

Actually, this is an issue in general, wherever the ceiling function is used on a floating point number to get the ICDF value. Thus this fix for the precision issue applies to most discrete ICDF functions, and not only the Geometric distribution.

Details are outlined in this issue #6670 .

Closes #6670

Checklist

Major / Breaking Changes

  • ...

New features

  • ...

Bugfixes

  • Fix numerical precision issues for the discrete ICDF functions.

Documentation

  • ...

Maintenance

  • ...

📚 Documentation preview 📚: https://pymc--6671.org.readthedocs.build/en/6671/

@codecov
Copy link

codecov bot commented Apr 13, 2023

Codecov Report

Merging #6671 (db0b98f) into main (9f01be2) will decrease coverage by 3.16%.
The diff coverage is 60.00%.

❗ Current head db0b98f differs from pull request most recent head 2348e4a. Consider uploading reports for the commit 2348e4a to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6671      +/-   ##
==========================================
- Coverage   91.99%   88.84%   -3.16%     
==========================================
  Files          95       95              
  Lines       16016    16024       +8     
==========================================
- Hits        14734    14236     -498     
- Misses       1282     1788     +506     
Impacted Files Coverage Δ
pymc/distributions/discrete.py 67.24% <55.55%> (-31.49%) ⬇️
pymc/testing.py 88.15% <100.00%> (-3.88%) ⬇️

... and 17 files with indirect coverage changes

@ricardoV94 ricardoV94 changed the title Bug fix: Discrete geometric ICDF. Fix numerical precision issues in Geometric ICDF. Apr 13, 2023
@ricardoV94 ricardoV94 changed the title Fix numerical precision issues in Geometric ICDF. Fix numerical precision issues in Geometric ICDF Apr 13, 2023
@gokuld gokuld force-pushed the bug_fix_geometric_icdf branch 2 times, most recently from 329e78e to b95d06d Compare April 13, 2023 19:18
@gokuld gokuld changed the title Fix numerical precision issues in Geometric ICDF Fix numerical precision issues in discrete ICDFs Apr 13, 2023
@gokuld
Copy link
Contributor Author

gokuld commented Apr 14, 2023

test_geometric and test_discrete_unif passed locally on my machine.
Let me check and get back on this.

@gokuld gokuld force-pushed the bug_fix_geometric_icdf branch from b95d06d to 2feed3e Compare April 14, 2023 05:51
res_1m = pt.maximum(res - 1, 0)
dist = pm.Geometric.dist(p=p)
value_1m = pt.exp(logcdf(dist, res_1m))
res = ifelse(value_1m >= value, res_1m, res)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to use a switch here, ifelse will only work for scalar cases

@ricardoV94
Copy link
Member

ricardoV94 commented Apr 14, 2023

Make sure you pass n_samples=-1 to check_logp and the other helpers so all combinations are tried and not just a random subset

@gokuld gokuld force-pushed the bug_fix_geometric_icdf branch from 2feed3e to 4a40ee6 Compare April 14, 2023 07:32
@gokuld
Copy link
Contributor Author

gokuld commented Apr 14, 2023

Thanks, replaced ifelse with switch.

Even with n_samples=-1. pytest -v tests/distributions/test_discrete.py -k 'test_geometric or test_discrete_unif' worked locally on my machine. Perhaps it has to do something with switch / ifelse

pymc/testing.py Outdated
@@ -574,7 +574,7 @@ def check_icdf(
# Test pymc and scipy distributions match for values and parameters
# within the supported domain edges (excluding edges)
domains = paramdomains.copy()
domain = Domain([0, 0.1, 0.5, 0.75, 0.95, 0.99, 1]) # Values we test the icdf at
domain = Domain([0, 0.1, 0.5, 0.75, 0.95, 0.99, 0.9999, 1]) # Values we test the icdf at
Copy link
Member

@ricardoV94 ricardoV94 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the old implementations fail with just 3 9's? I am wary of testing such an extreme value by default because it puts a high burden on future icdfs on being very accurate for extreme values. We know these things must always break at some point, specially for unbounded distributions.

Suggested change
domain = Domain([0, 0.1, 0.5, 0.75, 0.95, 0.99, 0.9999, 1]) # Values we test the icdf at
domain = Domain([0, 0.1, 0.5, 0.75, 0.95, 0.99, 0.999, 1]) # Values we test the icdf at

Copy link
Member

@ricardoV94 ricardoV94 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to cover the changes in this PR, you can add a small test where you just evaluate the icdf at the values that used to fail for the two distributions

Copy link
Contributor Author

@gokuld gokuld Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to cover the changes in this PR, you can add a small test where you just evaluate the icdf at the values that used to fail for the two distributions

To clarify my understanding, you mean I add the specific test for those distributions that fail and not something general that applies to all future distributions right?

Copy link
Contributor Author

@gokuld gokuld Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the old implementations fail with just 3 9's? I am wary of testing such an extreme value by default because it puts a high burden on future icdfs on being very accurate for extreme values. We know these things must always break at some point, specially for unbounded distributions.

I added 0.9999, since as outlined in #6670,
for a Geometric distribution with p=0.99, and CDF value 0.9999 whose ICDF is expected to be 2 but it gives 3 as the result. It works fine where the ICDF of 0.99 is 1. To summarize this in a table:

p=0.99

Value CDF expected ICDF of the CDF returned ICDF of the CDF Bug
1 0.99 1 1 No
2 0.9999 2 3 Yes

Copy link
Member

@ricardoV94 ricardoV94 Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify my understanding, you mean I add the specific test for those distributions that fail and not something general that applies to all future distributions right?

Yes that's what I suggest. Also note that instead of tweaking the icdf value we can tweak the domain of the distribution parameters that is tested. Probably testing more extreme Geometric p values would also have shown the error?

Regardless, a separate test is also fine

Copy link
Contributor Author

@gokuld gokuld Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ricardoV94 .

I have decided to do this, let me know if it makes sense:

With my latest commit, I have removed the value 0.9999 in the test for ICDF to not burden all future ICDF code regarding high accuracy.

p=0.99 is already used, as defined by Unit in testing.py.

The ICDF tests will instead be covered by the self-consistency check for ICDF of discrete distributions, which I will add in this PR #6642 . (Note that the self-consistency check for ICDF of continuous distributions is already added in #6642, and the discrete self consistency check is blocked on the current PR because of this accuracy issue. Thus I will add it to #6642 after this current PR is closed.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@ricardoV94
Copy link
Member

Thanks, replaced ifelse with switch.

Even with n_samples=-1. pytest -v tests/distributions/test_discrete.py -k 'test_geometric or test_discrete_unif' worked locally on my machine. Perhaps it has to do something with switch / ifelse

Perfect

@gokuld gokuld force-pushed the bug_fix_geometric_icdf branch from db0b98f to 2348e4a Compare April 27, 2023 10:44
@ricardoV94
Copy link
Member

Is this ready to merge then?

@ricardoV94 ricardoV94 changed the title Fix numerical precision issues in discrete ICDFs Improve numerical precision of discrete uniform and geometric ICDFs Apr 27, 2023
@gokuld
Copy link
Contributor Author

gokuld commented Apr 27, 2023

Is this ready to merge then?

Yes, in my opinion.

@ricardoV94
Copy link
Member

Awesome, great work!

@ricardoV94 ricardoV94 merged commit 5cf7efe into pymc-devs:main Apr 27, 2023
@gokuld
Copy link
Contributor Author

gokuld commented Apr 27, 2023

Thank you! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: ICDF implementation for the discrete geometric distribution fails some tests.
2 participants