math.log handles int but not 3rd party integer types #120950
Comments
Probably better to describe this as a feature request rather than a bug: there's never been any intention to support anything other than the builtin int here. That said, ...
There are several large groups of functions in the math module. If we make ...
I'm not sure about this. The problem for ...
I'm confused. Doesn't it do that already?
Oh, I overlooked this. Then this issue is more like a bug.
True - it's definitely inconsistent that ...
Handling of arbitrarily large int-like arguments is now consistent with the handling of arbitrarily large int arguments.
#121011 is a simple solution for this inconsistency. It is not optimal for large int-like arguments, because some work is done twice. Making it more efficient in this corner case needs inlining ... There are other functions in the ...
Agreed; I don't think we want to go this way at all. If the large int support for ...
Then, maybe it's a good opportunity to deprecate this support?
(History: 7852616)
@oscarbenjamin, I'm just curious whether you worry only about the inconsistency (wrt builtin ints) or had in mind some applications for this feature that couldn't be realized with the existing int methods?
I ran into a bug in SymPy where under certain conditions math.log is called with integers, but the arguments might be gmpy2.mpz or flint.fmpz, which would then overflow as above. What I realised then, though, is that under more typical conditions this implicitly converts mpz to float, meaning different behaviour for mpz vs int. It is not clear to me if the original authors of this code in SymPy knew that math.log had a special path for handling integers (the code perhaps predates that feature). The code in question could probably use ...

>>> print([int(math.log(float(10**(2*n)), float(10**(n)))) for n in range(10,100)])
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

Compared to the status quo I would prefer if either ... I would also be happier making use of a function with a different name whose stated purpose was to compute logs accurately with large integers and e.g. having a documented guarantee that ...

>>> integer_log(100, 10)
(2, True)
>>> integer_log(101, 10)
(2, False)

I assume this is not used in the code shown because the integers are expected to be small and ...
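(For illustration only: a minimal pure-integer sketch of a function with that kind of documented guarantee. The name integer_log and the exponential-then-binary search are my own choices for the example, not SymPy's actual implementation.)

def integer_log(a, b):
    """Return (n, exact): the largest n with b**n <= a, and whether b**n == a.

    Requires a >= 1 and b >= 2.  Pure integer arithmetic, so it cannot
    overflow for huge arguments, at the cost of being slower than a float log.
    """
    if a < 1 or b < 2:
        raise ValueError("integer_log needs a >= 1 and b >= 2")
    if a < b:
        return 0, a == 1
    # Exponential search for an exponent range [lo, hi) with b**lo <= a < b**hi...
    lo, hi = 1, 2
    while b ** hi <= a:
        lo, hi = hi, hi * 2
    # ...then binary search within that range.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if b ** mid <= a:
            lo = mid
        else:
            hi = mid
    return lo, b ** lo == a

A production version would cache the computed powers rather than recomputing b ** mid at each step.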
Apparently PyPy and CPython give different results here:

>>>> math.log(10**1000).hex()
'0x1.1fd2b914f1517p+11'

>>> math.log(10**1000).hex()
'0x1.1fd2b914f1516p+11'

Should that be considered a bug? CPython's behaviour I presume does not depend on libm here, so differences in the C math library are not the cause of the difference. It seems that PyPy also has:

>>>> math.log(10**100, 10**10000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too large to convert to float

So I guess it does not have this feature. If we were starting from scratch here I would say that ... Since there is apparently a feature that ...
I did a quick run of Diofant's test suite for the ntheory module with this line changed to do the float cast first. It passes. Yet, here is another place in SymPy (and Diofant): ... So, factor_.py's code clearly assumes CPython's math.log() feature. (On the other hand, I don't see failures in the mpmath test suite.)
That makes sense to me. And we already have isqrt(), so why not ilog()? This would be the place to land the current (mostly) code for integer values in log(). SymPy claims that it supports PyPy, but I believe its CI should fail on PyPy. The ilog() proposal will address this real issue. While support for ... CC @tim-one as the author.
Yes, it just has simple wrappers around libm: ... I don't think that we could consider this to be a PyPy bug, because it's not documented even as a CPython-specific feature.
That particular case is a good example of one where ...:

In [14]: b1 = int.bit_length
In [15]: b2 = lambda n: int(math.log2(n))+1
In [16]: all(b1(m) == b2(m) for m in range(1, 10000))
Out[16]: True

Looks like PyPy's log2 function can handle large integers correctly as well: ...
It seems that PyPy can handle arbitrarily large integers in ... To be able to depend on this capability, it would be better if it were not just an implementation detail of CPython or PyPy, though, and if any expectations about the handling of exact integer values were documented.
It would be nice to have an ... Turns out SymPy's ... I don't see anything like ... I guess it is hard to beat just using floats in this operation, but it is nicer if it is wrapped up in something that guarantees not to fail due to rounding, libm, etc. Maybe ...
This feature was originally introduced by @tim-one in 7852616. Unfortunately there is no reference to an issue number, so we do not know what discussion preceded it and what the reasons were to implement it. My guess is that it was used instead of ... to get the order of magnitude of the number, like ... Now we should decide what to do with this further. One direction -- extend this to support third-party integer types. But there are still many inconsistencies -- why ... The other way -- deprecate this undocumented behavior. @tim-one, what are your thoughts?
(@serhiy-storchaka, I worry whether a reply by Tim is safe here, given his "suspension". I won't make it eternal.)
This seems to be a reasonable way forward. I will try to find other projects that could be affected, beyond sympy/diofant.
The PSF has no control over my GitHub account. I can still do anything here that anyone else with a GitHub account can do. I cannot do things like commit changes, close issues, or add/remove labels on PRs, in the CPython project, for the duration.

There was no public discussion at the time. Little thought went into it. "The simplest thing that could possibly work." As I recall, Guido agreed it was a good idea - "if the implementation is easy to explain, it may be a good idea". So I added it.

The Python language never guarantees anything about float results - not even CPython's results. Too dependent on the platform C and ...

It cannot be deprecated, in my view. It's useful, and used. I personally never cared about base 2 in this context, but other people certainly do. Leave them in peace. Their code works fine.

The language docs should change to say that integers too big to convert to a float are fine to use, although it may need weasel-words to say that the very largest integer that's fine to use is implementation-defined.

This is not valuable enough to justify heroic (massive, complicated, involved) efforts to implement. Keep it simple, and just give up (raise ...) ...

I'm entirely in favor of doing whatever it takes so packages with other ideas of what "int" means can participate. There's no good reason to extend this to ... Likewise there's no good reason to change ...

Doesn't cover everything, but I have to get some sleep now. G'nite!
I would be astonished if they had any objection to my participating here. It's an Open Source project, after all, and I'm not hurting anyone in any conceivable way here. And I already told the Steering Council I intended to help Serhiy if I could (although I haven't yet heard back from them).
Hmm, so far I've failed to see more examples (which could be replaced with bit_length() workarounds) beyond sympy/diofant. What if we factor out this code to a dedicated function like ilog()? The use case you mention seems to be interested only in the integer part of the result.
In my opinion you didn't do anything like that before. So, I don't have a good mental model for the SC.
Very glad to hear it.
I was astonished by the suspension in the first place. In any case it's great that you're here so we can have the discussion, and I'll get back on topic... I also don't see why anything should be deprecated. The fact that ...
You'd need a million gigabytes of RAM to represent an integer that large. My machines have never had more than about 32 gigabytes, but I do often hear of machines with quite a bit more memory. Maybe it is not completely inconceivable in future... Something like ...

In [16]: math.log(3**100)/math.log(3)
Out[16]: 100.0
In [17]: (3**100).bit_length()/(3).bit_length()
Out[17]: 79.5

The bit length is quite significantly off. The first of these is a useful first step in trying to find an exact integer relationship that can be followed up with a few iterations up or down. It gives an approximation that might be out by a couple of bits, but we can try things like ... That is one thing that SymPy uses math.log (actually math.log10?) for. It is also the same approach that Flint uses for some similar exact integer operations and an equivalent of ... On the other hand I don't immediately know of examples where ...

def ilog(a, b):
    """n such that b**n == a or raise an error."""
    ...

def ilog(a, b):
    """largest n such that b**n <= a."""
    ...

Tim's original desire was to have ... The implementation of ... Serhiy's suggested alternative for ...

In [27]: log = lambda n: math.log(n * 0.5**n.bit_length()) + n.bit_length()*math.log(2)
In [28]: log(10**1000)
...
OverflowError: int too large to convert to float

It does not seem easy to replace ... having a function that computes the ... My suggestions are: ...
I won't talk about the ban here - this isn't an open discussion forum, and the Steering Council could (IMO) quite legitimately object to it being treated as if it were. I don't believe there's a legitimate objection to using this for its intended purpose (discussing technical issues related to the Python language). But if they object anyway (I still haven't heard back from them), then I'll have to stop (regardless of whether they can force me to stop - I'll honor their wishes).

Right, it's not just 1- and 2-arg ... Adding ...

Remain -1 on deprecation. Leave users in peace. It's not true that all uses want only the integer part. The result of ...

Reluctant to add ...

>>> n = str(2**82_589_933 - 1)
...
ValueError: Exceeds the limit (4300 digits) for integer string conversion; use sys.set_int_max_str_digits() to increase the limit

Why doesn't it say how many digits it would require? At one time (before it was released), it tried to. But Mark Dickinson gave up on trying to make it exact, so better to say nothing than tell a lie in some cases. It's in fact easy to give an exact result, by computing ... An efficient exact ... In any case, adding exact ...
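(Just to illustrate that last point - this is my own sketch, not whatever was tried for CPython at the time: an exact decimal digit count can be derived from bit_length() plus exact integer comparisons, with no string building.)

def decimal_digits(n):
    """Exact number of decimal digits of abs(n), for an int of any size."""
    n = abs(n)
    if n == 0:
        return 1
    # Float estimate from bit_length(); may be off by one in either direction.
    d = int(n.bit_length() * 0.30102999566398114)  # ~ log10(2)
    # Exact integer comparisons fix up the estimate.
    while 10 ** d <= n:
        d += 1
    while d > 1 and 10 ** (d - 1) > n:
        d -= 1
    return d

So the error message could in principle report the exact digit count, at the cost of computing one or two large powers of 10.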
BTW, in the absence of a compelling reason not to, when working with giant ints I favor matching what ... However, not all Python implementations can use C extensions. So it's of practical benefit when a piece of Python code can use native Python ints or ... They don't have a 2-argument ... Their ...
JFR, in gmpy2 the log*() functions are just wrappers for the appropriate MPFR/MPC functions.
That works in a slightly different way (and I'm not sure if the bigfloat package works like that). All gmpy2 functions accept giant ints, because they are converted exactly (not using current context settings, like precision, exponent bounds, etc.):

>>> from gmpy2 import *
>>> set_context(ieee(64))
>>> get_context()
context(precision=53, real_prec=Default, imag_prec=Default,
round=RoundToNearest, real_round=Default, imag_round=Default,
emax=1024, emin=-1073,
subnormalize=True,
trap_underflow=False, underflow=False,
trap_overflow=False, overflow=False,
trap_inexact=False, inexact=False,
trap_invalid=False, invalid=False,
trap_erange=False, erange=False,
trap_divzero=False, divzero=False,
allow_complex=False,
rational_division=False,
allow_release_gil=False)
>>> log(10**1000)
mpfr('2302.5850929940457')
>>> mpfr(10**1000)
mpfr('inf')
>>> # WAT? Here is what happens under the hood:
>>> mpfr(10**1000, precision=1)
mpfr('10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0',3322)
>>> log(_)
mpfr('2302.5850929940457')

In that sense gmpy2 has foolish consistency.
By "matching" I mean API much more than implementation, but your point stands. For example, |
@oscarbenjamin, I forgot to mention that you shouldn't take Serhiy's example literally. Rather, it could be like this:

def mylog(n):
    nbits = n.bit_length()
    # use int/int true division, rather than multiplying by a float:
    return math.log(n/(1<<nbits)) + nbits/math.log2(math.e)

This should work in the same range as the current math.log. I think that this fits another use case, mentioned by @tim-one:
From my quick test, the above simple version looks not much worse than the current math.log (stats taken for ...):

# a.py
import math, mpmath
from collections import defaultdict
from random import randint, seed
from math import ulp, floor
from sys import argv

ULP_SCALE = 2.0
bins1 = defaultdict(int)
bins2 = defaultdict(int)
name = 'log'
ref = mpmath.log
lib = math.log

def lib2(n):
    nbits = n.bit_length()
    return math.log(n/(1<<nbits)) + nbits/math.log2(math.e)

seed(1)
count = 1
nbits = eval(argv[1])
while count & 0xfff:
    arg = randint(2, 2**nbits)
    with mpmath.extraprec(100):
        try:
            got1 = lib(arg)
        except:
            continue
        expected = float(mpmath.autoprec(ref)(arg))
        got2 = lib2(arg)
    ULP = ulp(expected)
    diff1 = (got1 - expected)/ULP
    diff2 = (got2 - expected)/ULP
    diff1 = floor(float(diff1) * ULP_SCALE)
    diff2 = floor(float(diff2) * ULP_SCALE)
    bins1[diff1] += 1
    bins2[diff2] += 1
    count += 1

header = f"===== {name} for {nbits} ================="
print(f"{header:=<35}", flush=True)
print(f"{'math.log':<20} {'mylog':=<21}", flush=True)
for k in sorted(set(bins1) | set(bins2)):
    v1 = bins1[k]
    v2 = bins2[k]
    print(f"{k/ULP_SCALE:+5.2f} {v1:7} "
          f"{v1/count:6.2%} {v2:7} {v2/count:6.2%}",
          flush=True)

I think that with PEP 791 this issue might be postponed. The ilog() (like SymPy's integer_log()) might be a reasonable addition, but I doubt it can be easily implemented efficiently, e.g.: ...
Bug report
Bug description:
The math.log function has a special path for integers that only works for the builtin int type: see cpython/Modules/mathmodule.c, lines 2220 to 2223 (at ac61d58).
That means that it does not work for 3rd party integer types e.g.:
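(The original example did not survive formatting here; as a stand-in, this hedged sketch uses a hypothetical MyInt class playing the role of a third-party integer type such as gmpy2.mpz or flint.fmpz to show the failure mode being reported.)

>>> import math
>>> class MyInt:                      # stand-in for a third-party integer type
...     def __init__(self, value): self.value = value
...     def __index__(self): return self.value
...     def __float__(self): return float(self.value)
...
>>> math.log(10**1000) > 0            # builtin int: handled by the special path
True
>>> math.log(MyInt(10**1000))         # other integer types go through __float__
Traceback (most recent call last):
  ...
OverflowError: int too large to convert to float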
Maybe if there is a special integer handling path in math.log, it should check for __index__ before __float__. Related gh-106502 is about math.log with Decimal.
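(To make that suggestion concrete at the Python level, here is a rough sketch of my own, not a proposed patch; the real change would live in Modules/mathmodule.c.)

import math
import operator

def log_index_first(x, base=None):
    # Route int-like objects through __index__ so they reach the exact
    # big-integer handling; everything else keeps the normal float conversion.
    try:
        x = operator.index(x)
    except TypeError:
        pass
    return math.log(x) if base is None else math.log(x, base)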
CPython versions tested on:
3.13
Operating systems tested on:
No response