Skip to content

Correct return type of sum() builtin #1582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 5, 2017

Conversation

teepark
Copy link
Contributor

@teepark teepark commented Aug 29, 2017

sum([]) always returns the integer 0.

@JelleZijlstra
Copy link
Member

Not sure about this; we tend to avoid union return types.

@teepark
Copy link
Contributor Author

teepark commented Sep 25, 2017

Some context on why I'm changing this: I had code that would sum a list of decimal.Decimal, and then I attempted to .quantize() the returned value, which resulted in a production bug when the list was empty for the first time.

mypy is in a position to know that the return value could possibly be an integer. Can you clarify why union return types are avoided? It's certainly more accurate in this case.

@JukkaL
Copy link
Contributor

JukkaL commented Sep 26, 2017

One problem with union in this context is that some code may be careful to never pass an empty iterable to sum() and thus it will never return an int. In this case a union return type could generate false positives. I don't have a strong opinion either way, though.

@teepark
Copy link
Contributor Author

teepark commented Sep 26, 2017

Thanks for the response. That makes sense and I imagine this change would be more painful for people being careful to never pass an empty list.

But ultimately that's a dependency only on runtime data, which mypy isn't able to take into consideration. It's not really any different from a function that returns different types in the branches of an if statement -- mypy can only accurately model it as a union, even though code using the function might have more detailed knowledge based on how it crafted the arguments.

@gvanrossum
Copy link
Member

It's similar to pow(), where usually pow(x, y) with int arguments is expected to return an int, but if y is negative it will return a float. We solved that with a plugin because usually y is a compile-time constant expression. But before it just returned Any.

Maybe Any is a better return type for sum()? I'm not sure -- it seems there's no good answer here, and a plugin won't help because a plugin can only tell whether the list is empty at compile time, which is no use for a typical sum() call.

One obvious improvement is to create two overloads, sum(x) and sum(x, y). The latter does not return an int, it returns y.

@matthiaskramm
Copy link
Contributor

matthiaskramm commented Sep 27, 2017

For pytype, we would write

@overload
def sum(iterable: Iterable[nothing]) -> int: ...
@overload
def sum(iterable: Iterable[_T], start: _T = ...) -> _T: ...

(See e.g. the pytype definition of "reduce", https://github.com/google/pytype/blob/master/pytype/pytd/builtins/__builtin__.pytd#L92)

However, neither mypy nor PEP 484 has a concept of "nothing" so this doesn't work here.
(It also doesn't work because mypy doesn't stop, after the first matching @overload)

@gvanrossum
Copy link
Member

But how would you know whether a particular variable containing a list of integers can be empty or not? Otherwise this won't be very useful -- unlike pow(), sum() is rarely called with an empty list literal -- it's called with a variable that may or may not be guaranteed to be non-empty.

@teepark
Copy link
Contributor Author

teepark commented Sep 27, 2017

@gvanrossum I agree the overload is a definite improvement.

As a starting point, here's what I believe we'd need to accurately model all behavior of sum():

@overload
def mysum(iterable: Iterable[_T]) -> Union[_T, int]: ...

@overload
def mysum(iterable: Iterable[_T], start:_S) -> Union[_T, _S]: ...

In the two argument version we need a second type variable and the union because there's nothing outright requiring that the second argument be the same type as the items in the iterable, and promotion can happen either way:

sum([1, 2, 3], 4.5)  # float (start)
sum([1.0, 2.0, 3.5], 4)  # float (iterable)

Do we want to model all that behavior? I would have found it helpful for mypy to notify me of the problem with my blind sum(my_decimals).quantize(precision), but if we want to optimize for the (probably?) common case where the iterable can't possibly be empty that makes sense too.

@gvanrossum
Copy link
Member

gvanrossum commented Sep 27, 2017 via email

`sum([])` always returns the integer 0.
@JelleZijlstra JelleZijlstra merged commit 355f30c into python:master Oct 5, 2017
@teepark teepark deleted the sum-empty-int branch October 5, 2017 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants