Skip to content

implement __class_getitem__ for some classes in itertools module #112896

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
denballakh opened this issue Dec 9, 2023 · 6 comments
Closed

implement __class_getitem__ for some classes in itertools module #112896

denballakh opened this issue Dec 9, 2023 · 6 comments
Labels
topic-typing type-feature A feature request or enhancement

Comments

@denballakh
Copy link
Contributor

denballakh commented Dec 9, 2023

Feature or enhancement

Proposal:

Currently only itertools.chain can be subscripted at runtime. I think most of other iterators should be subscriptable too.
Most of these classes are generic in typeshed, so typecheckers thing that subscripting them is fine, which is inconsistent with behaviour at runtime.

With pyright:

import itertools
x = itertools.cycle[str]('abc')
reveal_type(x) # cycle[str]

At runtime:

>>> import itertools as it
>>> it.cycle[int]
  File "<stdin>", line 1, in <module>
TypeError: type 'itertools.cycle' is not subscriptable
>>> it.count[int]
TypeError: type 'itertools.repeat' is not subscriptable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'itertools.accumulate' is not subscriptable
>>> it.chain[int] # this is the only subscriptable class in itertools module
itertools.chain[int]

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

#108761 and #108761 (comment)

@denballakh denballakh added the type-feature A feature request or enhancement label Dec 9, 2023
@rhettinger
Copy link
Contributor

@JelleZijlstra Can you weigh in on this?

Isn't it the norm to use Iterable[some_input_type] rather than specify a precise implementation type such as reversed[some_input_type]? If I understand correctly, a type checker only needs to know which methods are guaranteed to be available (the iterator protocol) rather than semantic information to differentiate say a chained source versus a reversed source.

I would think it would be bad design for a function to dictate how its input is formed:

def multiply(source: chain[int]):    <-- Forces tight coupling

vs

def multiply(source: Iterable[int]):    <-- Loose coupling

@JelleZijlstra
Copy link
Member

I'm also a bit skeptical here, and you are right that in most cases users should not write annotations like chain[int]. I believe on typeshed we got a report about this from a user who was subclassing itertools.cycle, though I don't know why you'd need that either.

This is very similar to #108761, which asks to make builtins like map subscriptable. There I argued that we should wait for a resolution on PEP 718 (#108761 (comment)).

@AlexWaygood
Copy link
Member

In general, I agree with @rhettinger and @JelleZijlstra. It's generally an antipattern to specify the exact type of iterator you need, unless that iterator has some special attributes or methods that distinguish it in some way from other kinds of iterators.

Having said that, I think there is a good argument for adding __class_getitem__ to one of the itertools classes specifically: groupby. Compare and contrast two different ways of annotating an adapted version of the unique_justseen recipe from the itertools docs. (Doing it as a one-liner makes it harder for type checkers to infer the types here, so I split it into two lines to make it easier for them.)

import operator
from collections.abc import Callable, Iterable, Iterator
from itertools import groupby

def unique_justseen1[T](
    iterable: Iterable[T], key: Callable[[T], bool] | None = None
) -> Iterator[T]:
    g: groupby[T | bool, T] = groupby(iterable, key)
    return map(next, map(operator.itemgetter(1), g))


def unique_justseen2[T](
    iterable: Iterable[T], key: Callable[[T], bool] | None = None
) -> Iterator[T]:
    g: Iterator[tuple[Iterator[T | bool], T]] = groupby(iterable, key)
    return map(next, map(operator.itemgetter(1), g))

The former is obviously much more ergonomic.

@rhettinger
Copy link
Contributor

The former is obviously much more ergonomic.

It does look a little nicer but it is only reasonable in this specific context where the type annotation is not leaked outside the function. If the return type was a groupby object, we would still be better off with the second form; otherwise, we fall into the concrete type specific anti-pattern again.

And even with this self-contained example, it is still questionable. As a code reviewer trying to understand this small data pipeline, I still need to mentally unroll g: groupby[T | bool, T] into g: Iterator[tuple[Iterator[T | bool], T]] because that is the only way to digest what the map/itemgetter step is doing. And for me personally, that unrolling isn't obvious and I find the annotation to be confusing. As an experiment, try showing g: groupby[T | bool, T] to another programmer who is familiar with type annotations and see whether they can deduce what it means and whether it is correct.

@AlexWaygood
Copy link
Member

That's fair; those are all reasonable points

@rhettinger
Copy link
Contributor

I think this can be closed now. If new PEPs get approved that change the landscape, this can be reopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-typing type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants