Skip to content

Inconsistency over Container #397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
abarnert opened this issue Aug 17, 2014 · 5 comments
Closed

Inconsistency over Container #397

abarnert opened this issue Aug 17, 2014 · 5 comments

Comments

@abarnert
Copy link

In collections.abc, Sequence, Mapping, and Set all inherit from Container.

In typing, Sequence[T] inherits from Container[T], while Mapping[T] and AbstractSet[T] do not, and instead define __contains__, which means Container picks them up as a Protocol. Unless there's a reason these have to work differently, it seems like they should all inherit.

Also, Container doesn't annotate its __contains__ method at all, and Mapping and AbstractSet annotate it as taking an object. Shouldn't these all be requiring T?

@JukkaL
Copy link
Collaborator

JukkaL commented Aug 17, 2014

Container is still work in progress -- note that stubs/3.2/typing.py does not define Container at all, so the type checker won't let you use it. It's missing because this a somewhat tricky issue.

Note that Python often lets you use in with incompatible types, since equality generally works for any pairs of objects:

>>> 1 in ['x']
False  (no exception!)

However, this does not work for strings, which probably should be containers as well:

>>> 1 in 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not int

So we could reasonably either restrict the argument of __contains__ to be of type T (to be compatible with str and bytes) or we could allow it to be anything (to be compatible with list, etc.). However, there's more to consider!

I'd likeSequence[int] to be a subtype of Sequence[Union[int, str]]. However, if the argument of __contains__ has type T, this does not work, since Sequence would have to be invariant. The reasoning is this: suppose I assign Sequence[int] to a variable with type Sequence[Uniont[int, str]]. Now I should be able to call __contains__ with a str argument, but this would not be fine since str is not a subtype of int!

So there is no obvious right way to do this -- it seems that no matter what we choose, we can't type check some operations properly.

Here are some compromises:

  1. Use Any type in Container for the argument type, but allow subclasses such as str
    to specify a more precise type. Type checking would be precise for concrete container types but not
    for abstract types.
  2. Use object type in Container for the argument type and don't type check in operations precisely.
  3. Use object type in Container for the argument type but have a special type checking rule for in
    operators that warns if the left operand is incompatible, even though it could be valid
    for containers such as lists.

@abarnert
Copy link
Author

The fact that 1 in ['x'] doesn't raise an exception doesn't mean it can't be a static type error (assuming ['x'] is inferred as a Container[str] rather than just a Container or Container[Any], of course).

After all, part of the point of static type checking is not just that it can catch errors earlier, but that it can catch errors that are otherwise never caught unless you write specific unit tests for them. Consider this example: I'm writing some database-driven code, and I have a function that's supposed to take a new person's name and add a record to the Person table. If the caller passes me an int for the first name, then (depending on which RDBMS I'm using), nothing is going to raise anywhere. Which is exactly why I'd like to declare my function as taking a str in the first place. It's not wrong for the type checker to reject the caller that passes me an int just because it wouldn't raise an exception.

As for your Sequence[int] example, I think that runs afoul of the Liskov substitution rule even without __contains__. You're going to have the same problem with seq.count('abc') or seq.index('abc'). If you're going with the simple variance rule you described in another issue (and are implicitly assuming for __contains__, then Sequence[int] is not a subtype of Sequence[Union[int, str]].

@JukkaL
Copy link
Collaborator

JukkaL commented Aug 17, 2014

I agree that 1 in ['x'] is probably usually an error. There are cases where something like this would not be a bug, though. For example, the type of the left operand could be a supertype of the sequence item type, and this can have a true or false result:

class Animal: ...
class Cat(Animal): ...

def f(special_cats: Sequence[Cat], animal: Animal) -> ...:
    if animal in special_cats: ...  # Should be okay?

In your example, passing an int seems like an obvious bug.

As for count and index, typing.Sequence actually currently doesn't include them (though it should). Here the reasoning would be similar. Also, b'foo'.count(1) is not valid in Python 3.2, but it was fixed later, so that bytes can now be a subtype of Sequence[int] even if using the full interface with count and index, I think.

I'm still unsure about what's the best way to do this.

@abarnert
Copy link
Author

I don't think that should be okay. Unless you want to define the contains method as contravariant (which a few languages would let you do, but I don't think you or anyone else has suggested that for MyPy), this is a type error. So a type checker should catch it.

What do statically-typed languages do in this case? Let's look at C++:

#include <algorithm>
#include <set>
using namespace std;
class Animal {};
class Cat: public Animal {};
int main() {
  set<Cat> cats;
  Animal dog;
  auto it = cats.find(dog);
}

Compile this, and you'll get a slew of errors, starting with:

contra.cpp:13:18: error: no matching member function for call to 'find'
  auto it = cats.find(dog);
            ~~~~~^~~~
<include/c++/v1/set>:641:14: note:
      candidate function not viable: no known conversion from 'Animal' to
      'const key_type' (aka 'const Cat') for 1st argument
    iterator find(const key_type& __k)             {return __tree_.find(__k);}
             ^

OK, maybe I should have picked Java or Swift rather than C++, because C++ is mostly structurally-typed, so it's literally complaining about the lack of an implicit conversion from Animal to Cat (that is, that Animal can't compile-time-duck-type as a Cat) rather than the lack of an explicit subclass relationship. But it's the same problem, and every language will reject it.

Of course you could argue that people have written code that takes advantage of the weak typing of contains and you don't want to break it. But you're not breaking it, you're just flagging it as a static type error, which it is. You can still run the code in Python and it will "work", just like my example of inserting an int into a sqlite str column will work, but that doesn't mean it's not something a type-checking linter should complain about.

@JukkaL
Copy link
Collaborator

JukkaL commented Aug 23, 2014

I think the biggest issue is whether Sequence would support covariance. The question is whether precise type checking of these operations is more useful than covariance of Sequence types.

Note that in Java, the corresponding methods work with arbitrary objects (I'm talking about List<T>). Thus these methods can be used even via wildcard type references (wildcard types are the way Java implements covariance for collection types):

int indexOf(Object o)
boolean contains(Object o)

C# does not seem to have an immutable list/sequence type, and thus it only supports covariance with IEnumerable which is similar to Iterable in mypy. I'm not 100% sure about this, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants