Skip to content

Second opinions on a couple specific issues #41

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mikeshardmind opened this issue Feb 7, 2024 · 32 comments
Closed

Second opinions on a couple specific issues #41

mikeshardmind opened this issue Feb 7, 2024 · 32 comments

Comments

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Feb 7, 2024

So.. while working on something formal for #38, I'm left in a situation where there are a few corner cases I really don't like, and I think we need answers for.

  1. type[OrderedIntersection[T, U]] : The primary use case seems to imply that the class that satisfies this should be instantiable as T, but type is currently not type safe for __init__ (my displeasure at LSP exceptions for "pragmatic" reasons have already been expressed, and I dont think it's in scope to change that) so do users need to do
    OrderedIntersection[Callable[*expected_init_args, OrderedIntersection[T, U]], OrderedIntersection[T, U]] to actually have type safety here? This seems exceptionally verbose for a situation that shouldn't require this level of verbosity, especially since there's no way to just "copy" T.__init__'s signature, is there something we can do to improve this without fixing the fact that LSP is being ignored for constructors?

  2. What about intersections with members of the numeric tower? OrderedIntersection[HasDotHexMethod, float] allows fixing (one of) the type holes with it. Do we need to do anything else here? I'm leaning no and that we should, after this is accepted, look into fixing some things with the numeric tower and having protocols built with intersections that express common things people want to be able to allow for duck typed math while remaining type safe, but I think that doesn't require extra work here, but future work after, have I missed something that requires we consider it more ourselves?

@mikeshardmind mikeshardmind changed the title RFC on a few corner cases. Second opinions on a couple specific issues Feb 7, 2024
@mark-todd
Copy link
Collaborator

Thanks for working on this @mikeshardmind - good to know it's coming along!

Here are my thoughts on your points:

  1. So, my thought here is that because it is ordered, users have 2 choices (I'm assuming left-most takes priority like inheritance):
    a) type[OrderedIntersection[T, U]]: This implies the __init__ of T takes priority, and that means the type signature of that __init__ is the __init__ in the resulting type.
    a) type[OrderedIntersection[U, T]]: This implies the __init__ of U takes priority, and that means the type signature of that __init__ is the __init__ in the resulting type.

If the __init__ of T and U are different it doesn't matter that LSP is violated, only that we know which one takes priority in the result.

especially since there's no way to just "copy" T.__init__'s signature, is there something we can do to improve this without fixing the fact that LSP is being ignored for constructors?

I've noticed this comment - isn't there? Can't the type checker just work through elements of the intersection until it finds one that specifies the requested method?

  1. My thoughts on this return to: what does inheritance do?
class A:
    def hex(self):
        pass

class Test(A, float): # Base classes for class "Test" define method "hex" in incompatible way
    pass                # Return type mismatch: base method returns type "str", override returns type "None"

I think OrderedIntersection should use the same error message, or maybe "clash in intersection due to: ..."

@mark-todd
Copy link
Collaborator

mark-todd commented Feb 7, 2024

Made this quick runtime illustration of how I imagine the type checker for intersection might work:

from typing import Any, Optional

def check_compatibility(intersection_tup: tuple[Any,...]) -> Optional[str]:
    # Checks for any errors in combination of items
    # If errors returns str, else returns None
    # This can reuse the logic type checkers already use when mixing classes
    return None

def intersection_getattr(intersection_tup: tuple[Any,...], attr: str) -> Any:
    error = check_compatibility(intersection_tup)
    if error is not None:
        raise TypeError(f"Intersection {intersection_tup} incompatible, reason: {error}")
    else:
        for item in intersection_tup:
            if hasattr(item, attr):
                return getattr(item, attr)
        raise TypeError(f"Attribute {attr} not found in intersection {intersection_tup}")

Then if we apply this to case 1, it takes the __init__ from T

intersection_getattr((T,U), "__init__")

...and if we're in case 2, check_compatibility would fail (if mixing with float would fail) and we get the same error message. I suppose in reality check_compatibility would run at the point of Intersection creation, so:

check_compatibility((HasDotHexMethod,float))

...would then produce an error

Edit: The logic of check_compatibility would most likely be quite complex, but fortunately I think it would already exist. For pyright I think this is probably one of the related functions: here

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Feb 7, 2024

Ah, see that's not actually what I was getting at @mark-todd

import random

class T:
    def __init__(self, req_arg: int):
        ...

class U:
    def __init__(self, req_arg: int):
        ...

class LSPViolation(T, U): 
    def __init__(self, req_arg: int, extra_required_arg: int):  # type checkers allow this
        self._x = required_arg

type Intersected = OrderedIntersection[T, U]

def f(typ: type[Intersected]) -> Intersected:
    return typ(random.randint(1, 100))


f(LSPViolation)

You can check a simpler version of this to see that it's an existing problem) without intersections, but the implications of this are worse because this is explicitly a means of compositing types, so not being able to safely construct them afterward feels wrong.

Now this is fixable with intersections, where it isn't currently fixable:

import random
import typing

class T:
    def __init__(self, req_arg: int):
        ...

class U:
    def __init__(self, req_arg: int):
        ...

class LSPViolation(T, U): 
    def __init__(self, req_arg: int, extra_required_arg: int):  # type checkers allow this
        self._x = required_arg

type Intersected = OrderedIntersection[T, U]
type ConstructableAsIntersected[T, U] = Callable[[int], Intersected]
type IntersectedType = OrderedIntersection[ConstructableAsIntersected, type[Intersected]]

def f(typ: IntersectedType) -> Intersected:
    return typ(random.randint(1, 100))


f(LSPViolation)  # this would actually be detected now, this isn't a ConstructableAsIntersected

But you'll notice there's no way to just automatically do this. We can't extract the expected init signature automatically and convert those type aliases to

type ConstructableAsIntersected[*Ts] = Callable[??? , OrderedIntersection[*Ts]]
type IntersectedType[*Ts] = OrderedIntersection[ConstructableAsIntersected[*Ts], type[OrderedIntersection[*Ts]]

The triple question marks as standins for the missing puzzle piece to allow this to be type safe and not require a ton of per type boilerplate, but the question is really why should users need this boilerplate when this is a clear and obvious use.

Basically, we're running into an issue with a prior decision and no way to square this with that decision without adding an exception to the existing exception, removing the exception, or pairing this with some additional construct we haven't described here yet.

@mikeshardmind
Copy link
Collaborator Author

I think you also missed my intent with the numeric tower, which is on me for just assuming people were aware that the numeric tower in python is currently not type safe as it is. The question I had is very similar to the first one, is it enough that the type safety with the numeric tower is now expressable and avoidable, do we need to do more, or can this be fixed seperate to this without us needing to consider anything about it now?

Code sample in pyright playground

def f(x: float, /):
    hexrep = x.hex()
    ... # but actually use it as a numeric value here too, so protocol isn't "simply" enough


f(1)

This is allowed, again for "pragmatic" reasons, and there is a way to express what you want with intersections now:

class HasDotHex(Protocol):
    def hex(self) -> str:
        ...

def f(x: OrderedIntersection[HasDotHex, float]):
    ...


f(1)  # this fails the protocol now

Again, this is a situation where in theory we don't need to do anything, and we gave people tools to fix an existing type hole, but it feels like we're just kicking an existing problem further down the road here, so in doing so I want to see if anyone has a reason for anything more that's actually our problem to consider to make sure that in the process of not addressing this one we don't make it harder for it to be properly adressed in the future. "You already have this way of doing it, this isn't a priority" mixed with people doing the easiest thing... well, the correct thing not being ergonomic is going to lead to the correct thing not being done.

@mark-todd
Copy link
Collaborator

mark-todd commented Feb 7, 2024

Ah, see that's not actually what I was getting at @mark-todd

@mikeshardmind I think I see - that's pretty spicy!

So I'm pretty sure there are "spin-off" problems from this you could also generate from the fact some method combinations don't violate LSP I believe.

I propose a rule - if you overwrite an existing method from let's say OrderedIntersection[T, U] with a function signature that doesn't match, you're no longer of type OrderedIntersection[T, U].

import random
import typing

class T:
    def __init__(self, req_arg: int):
        ...

class U:
    def __init__(self, req_arg: int):
        ...

class NotFine(T, U): 
    def __init__(self, req_arg: int, extra_required_arg: int):  # type checkers allow this
        self._x = required_arg

class Fine(T,U):
    def __init__(self, req_arg: int):
        ...

def f(typ: type[OrderedIntersection[T, U]]) -> OrderedIntersection[T, U]:
    return typ(random.randint(1, 100))
    
f(Fine) # No issues
f(NotFine) # Not fine - method `__init__` clashes with T definition of `__init__`

To consult my earlier runtime algorithm, for this to work you would get the type of __init__ with:

intersection_getattr((T,U),'__init__')

And find that this type matches with Fine.__init__ but not with NotFine.__init__

@mikeshardmind
Copy link
Collaborator Author

I propose a rule - if you overwrite an existing method from let's say OrderedIntersection[T, U] with a function signature that doesn't match, you're no longer of type OrderedIntersection[T, U].

This would be the "Add an exception to the existing exception" option...

type checkers don't consider the __init__ signature with type[*Ts], so we'd be saying that if type is in an intersection, they have to consider it again for this to do anything. I have a feeling this level of "dont check this as an exception to lsp, except here, check it again here" is going to be a problem.

@mark-todd
Copy link
Collaborator

I have a feeling this level of "dont check this as an exception to lsp, except here, check it again here" is going to be a problem.

Right, I think I see, because e.g. with the function I shared earlier check_compatibility would behave differently for mixing vs intersections - could they just add a bool of "intersection_mode" maybe?

@mark-todd
Copy link
Collaborator

mark-todd commented Feb 7, 2024

class HasDotHex(Protocol):
def hex(self) -> str:
...

def f(x: OrderedIntersection[HasDotHex, float]):
...

f(1) # this fails the protocol now

@mikeshardmind
Wait a minute, you can't have an intersection of a Protocol and a non-Protocol right? Just like you can't mix a Protocol and Non-Protocol

@mikeshardmind
Copy link
Collaborator Author

It breaks congruency with what's allowed currently to fix this by going the exception route if we had that, then this would happen:

import random

class T:
    def __init__(self, req_arg: int):
        ...

class U:
    def __init__(self, req_arg: int):
        ...

class LSPViolation(T, U): 
    def __init__(self, req_arg: int, extra_required_arg: int):  # type checkers allow this
        self._x = required_arg

type Intersected = OrderedIntersection[T, U]

def f1(typ: type[Intersected]) -> Intersected:
    return typ(1)

def f2(typ: type[T]) -> T: 
    return typ(1)

f(LSPViolation)  # the exception to the exception prevents this from being allowed
f2(LSPViolation)  # allowed, the existing exception allows this, errors at runtime

Maybe we can't do anything about it from that end without being too disruptive. Other than modifying the original exception to fix it, I think we would need to figure out how to remove the question marks from this.

type ConstructableAsIntersected[*Ts] = Callable[??? , OrderedIntersection[*Ts]]
type IntersectedType[*Ts] = OrderedIntersection[ConstructableAsIntersected[*Ts], type[OrderedIntersection[*Ts]]

to at least allow this to be a way for people to patch over the type hole.

@mikeshardmind
Copy link
Collaborator Author

@mark-todd

Wait a minute, you can't have an intersection of a Protocol and a non-Protocol right? Just like you can't mix a Protocol and Non-Protocol

You can with the way the rules were described, and not allowing this would break a few of our class decorator cases.

It's not a big deal, we aren't assuming there is a concrete implementation at the point of the intersection, we're placing in an annotation context the expectation that there will be a value which satisfies the protocol and non protocol constraints simultaneously. There's not the same ambiguity with subclassing a protocol going on here.

@mark-todd
Copy link
Collaborator

It breaks congruency with what's allowed currently to fix this by going the exception route if we had that, then this would happen:

@mikeshardmind Yeah I see what you're getting at here - it's a bit strange but I think it's fine. I would expect the intersection to be more restrictive than saying must inherit from T.

@mark-todd
Copy link
Collaborator

mark-todd commented Feb 7, 2024

You can with the way the rules were described, and not allowing this would break a few of our class decorator cases.

It's not a big deal, we aren't assuming there is a concrete implementation at the point of the intersection, we're placing in an annotation context the expectation that there will be a value which satisfies the protocol and non protocol constraints simultaneously. There's not the same ambiguity with subclassing a protocol going on here.

@mikeshardmind Ooh interesting. I would argue Protocols and Non-Protocol shouldn't share an intersection, but I do see also why it might be useful, as you can tag on behaviour to an existing class... - any reason why you couldn't use Abstract classes instead of Protocols for that use case?

I think it's important that we follow the tenet of what is acceptable by inheritance - mixing them currently isn't so that's why I'm thinking that neither should it be to Intersection. At the same time though, pragmatically a non-useful Intersection helps nobody so it's difficult.

@mark-todd
Copy link
Collaborator

Oh similarly I would expect TypedDict can only be intersected with other TypedDict's

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Feb 7, 2024

any reason why you couldn't use Abstract classes instead of Protocols for that use case?

That the protocol is the correct tool. abstract classes (As in from abc) don't actually get checked properly by type checking in all contexts. That's part of why the numbers module is somewhat useless for type checking. You may want to look at this https://github.com/beartype/numerary for examples of protocols that can't be handled with abcs in the type system or this long discussion thread on it: https://discuss.python.org/t/numeric-generics-where-do-we-go-from-pep-3141-and-present-day-mypy/17155

Oh similarly I would expect TypedDict can only be intersected with other TypedDict's

I haven't gotten to TypedDicts yet, but I'm not sure I agree with that. I need to explore the rules we have for potential issues with not restricting it like that (FWIW I don't think there are any issues with allowing it) as I'm already aware of reasons people would want to not do this, such as with annotating sql wrapping objects that can behave like a typeddict for accessing values, but may have additional methods for getting metadata about the action that produced the object.

@mark-todd
Copy link
Collaborator

mark-todd commented Feb 8, 2024

You may want to look at this https://github.com/beartype/numerary for examples of protocols that can't be handled with abcs in the type system

@mikeshardmind Right, thanks, I think I follow why Protocol mixtures would be useful now. Also I've thought some more re-implementation and mixing Protocols and Non-Protocols should be fine as it's just the types that get merged. On which note, returning to your example:

class HasDotHex(Protocol):
    def hex(self) -> str:
        ...

def f(x: OrderedIntersection[HasDotHex, float]):
    ...


f(1)

See, I would say f(1) should succeed. Here's my reasoning - the function signature of hex in HasDotHex matches the function signature in float. This means the interface of OrderedIntersection[HasDotHex, float] matches that of float - in my mind I suppose the difference between Protocols and Non-Protocols in an OrderedIntersection, is that for Protocols we say the resulting intersection is also, effectively a Protocol:

OrderedIntersection[Protocol, Protocol] = Protocol
OrderedIntersection[Non-Protocol, Non-Protocol] = Non-Protocol
OrderedIntersection[Protocol, Non-Protocol] = Protocol
OrderedIntersection[Non-Protocol, Protocol] = Protocol

The point is that I think it has to be one or the other, and the result being a non-protocol is definitely out - I'm not sure we can leave it as a "mixed" type if that makes sense.

I haven't gotten to TypedDicts yet, but I'm not sure I agree with that.

Let's recircle to TypedDict perhaps - I think I can see some issues there but perhaps it's a problem for another thread :)

@mikeshardmind
Copy link
Collaborator Author

See, I would say f(1) should succeed. Here's my reasoning - the function signature of hex in HasDotHex matches the function signature in float. This means the interface of OrderedIntersection[HasDotHex, float] matches that of float - in my mind I suppose the difference between Protocols and Non-Protocols in an OrderedIntersection, is that for Protocols we say the resulting intersection is also, effectively a Protocol:

That would defeat the purpose of OrderedIntersection. The entire point is the ability to specify multiple things about something.

1 as a value is an instance of int, and doesn't have .hex()

OrderedIntersection[Protocol, Protocol] = Protocol
OrderedIntersection[Non-Protocol, Non-Protocol] = Non-Protocol
OrderedIntersection[Protocol, Non-Protocol] = Protocol
OrderedIntersection[Non-Protocol, Protocol] = Protocol

None of the above. As I described it, it's a typing special form and does not create a new type, but describes a set of types. Union does not turn a union of protocols and non protocols into only being protocols, it's a type system tool (special form) for describing which things are being considered.

The simple behavior:
At assignment, check in order: is the value consistent with all of these types?
At use (access of a member of it), check in order to find the first type providing a definition for that member, error if none provide one.

@mark-todd
Copy link
Collaborator

mark-todd commented Feb 8, 2024

1 as a value is an instance of int, and doesn't have .hex()

Ah gotcha, sorry I was thinking it was being used as a float. I guess my point was that f(1.0) should succeed.

Ok, I think the problem I'm wrestling with is a different one again - consider the following:

class A:
   def foo(self) -> int:
      return 1

class B(Protocol):
   def bar(self) -> int:
      ...

class C:
   def third(self) -> int:
      return 2

class D(A,C):
   pass

class E(A):
   def bar(self) -> int:
      return 3

d: OrderedIntersection[A,C] = D() # Must inherit from A & C in that order
e: OrderedIntersection[A,B] = E() # Must inherit from A and have the method `bar`
f: OrderedIntersection[A, B, C] = ?? # Now what? 

To make f, we could say, it must inherit from A & C in that order and have method bar, but that would require us to treat Protocols and Non-Protocols separately. My solution was to just say it becomes "Protocol-Like" - in other words we drop the inheritance condition.

Edit: I'm now thinking looking at your descriptions that maybe the inheritance condition isn't there even for non-protocol classes - is that correct?

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Feb 8, 2024

@mark-todd So, Maybe this is a good example of a reason to use a term that isn't OrderedIntersection, as the ordered part is only on one side of the interaction.

All of your examples there are fine and solvable with the current semantics. The ordering doesn't come into play at class definition.

You have:

d: OrderedIntersection[A,C] = D() # Must inherit from A & C in that order
e: OrderedIntersection[A,B] = E() # Must inherit from A and have the method `bar`
f: OrderedIntersection[A, B, C] = ?? # Now what? 

But the comments aren't quite right. line by line

"Must be an instance of A and C or their subtypes, attribute/method lookup at type time will be done by looking at A then C"
"Must be an instance A of a subtype of A and satisfy the protocol B attribute/method lookup will be done by first looking at A, and the B"
"Must be an instance of A and C or their subtypes and satisfy the protocol B, attribute/method lookup at type time will be done by looking at A then B, then C"

The reason this isn't a type-hole is that the limitations on what classes can be defined prevent any case with incompatible overlap from ever being created to then have an instance to pass in. The ordering being only at the end serves 2 purposes.

  1. With the use of protocols, you can place your protocol before the type being augmented. This claims the lower bound of functionality you specifically need, and allow the augmented type to do more or less without breaking. It creates very clear boundary conditions that are expressible by the programmer.
  2. By the same token with 1 for boundary conditions expressed by the programmer, the boundary with Any/untyped code becomes possible for the programmer to decide, rather than the type checker to have to make a decision on.

Some quick examples

Intentionally using the ordering as a mask for exported capability to rely on vs internal types (previously not possible without a lie in a type stub or a cast)

__all__ = ["X",]

class _X:
    def foo(self, _private_cache: dict[str, int] = {}) -> int:
        """
        The private cache being implemented this way prevents the issues of lru_cache on methods...
        """
        ...

class ProtoFoo:
    def foo(self) -> int:
        ...

# We've exported this without `_private_cache` being an allowed arg to pass in X().foo
# We can continue to export this as is, even if `_X` is later changed to use some other means of handling caching as we've declared that the definition for the method `foo` should be looked for by type checkers in ProtoFoo before _X
X: type[OrderedIntersection[ProtoFoo, _X]] = _X

Examples of behavior with Any:

class X:
    def foo(self) -> int:
        ...

def foox(x1: OrderedIntersection[X, Any], x2: OrderedIntersection[Any, X]):
    x1.foo()  # int    # This is the case that was more strict than *possible* with `Any` in option 5 from the prior set
    x1.bar()  # Any
    x2.foo()  # Any  # This is the case I didn't like with option 4 from the prior set
    x2.bar()  # Any

going back to your example

class A:
   def foo(self) -> int:
      return 1

class B(Protocol):
   def bar(self) -> int:
      ...

class C:
   def third(self) -> int:
      return 2

class D(A,C):
   pass

class E(A):
   def bar(self) -> int:
      return 3

class F(E, C):
    pass

d: OrderedIntersection[A,C] = D() 
e: OrderedIntersection[A,B] = E()
f: OrderedIntersection[A, B, C] = F()  # As there's no overlapping methods/attributes to ever be looked up, the ordered nature won't ever come into play

@DiscordLiz
Copy link
Collaborator

@mikeshardmind What you have there makes sense to me. If you haven't already, you probably need to explain how the rules prevent an issue in the following case:

# library.py
__all__ = ["X",]

class _X:
    def foo(self, _private_cache: dict[str, int] = {}) -> int:
        """
        The private cache being implemented this way prevents the issues of lru_cache on methods...
        """
        ...

class ProtoFoo:
    def foo(self) -> int:
        ...

# We've exported this without `_private_cache` being an allowed arg to pass in X().foo
# We can continue to export this as is, even if `_X` is later changed to use some other means of handling caching as we've declared that the definition for the method `foo` should be looked for by type checkers in ProtoFoo before _X
X: type[OrderedIntersection[ProtoFoo, _X]] = _X
# user.py

from library import X

class Y(X):
    def foo(self) -> int:
        ...

I think this can properly error for LSP issues with the definitions you chose.

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Feb 9, 2024

@DiscordLiz Thanks for spelling that out as an example after the quick chat we had earlier, I'm going to be taking a lot of these examples and creating test cases as part of the proposal, both as samples of what is and isn't expected, as well as things that can be used in the conformance suite. I gave it a bit more thought since we talked about it, I do have a note for it here after confirming it isn't a problematic case.

The current working definition requires this be detected as invalid subtyping (LSP conflict). It just requires extra clarity on in what contexts type checkers must consider consistency with all operands and when they only need to get the first definition.

Namely, consistency with all operands must be considered when determing if there is a valid type relationship between an intersection and another type¹. In all other cases, it is sufficient to use the first found definition. However, this occurs in multiple contexts.

  • Declaring subtypes (such as via subclassing)
  • Assigning a value to a variable with a type specification²
  • typing.assert_type

It may be worth getting a list of contexts where this relationship must be handled into the typing specification as part of the increase in clarification as part of this, as clarity around intersections would be improved for having such a reference.

¹ The word type and the idea of a type relationship are each somewhat overloaded and not fully specified in the specification. When considering annotations as specifications of allowed types, this is the level at which a type comparison must be taking place for this to apply.

² I believe the term "Type specification" to be a good way to express the difference between types as a runtime concept, and types which express expectations of the runtime types, and will be using this as a defined term in what I am writing. Should another term for this be adopted instead in the specification, it would be easy to revise this to use another term.

@mark-todd
Copy link
Collaborator

Putting this here because I had a slightly more general comment from this, and this one:

@mikeshardmind I think we really have three sensible choices:

  1. We allow anything to be intersected, as if inheritance can happen.
  2. We ban intersections between objects that can't be combined together via inheritance.
  3. We have a halfway house where some are allowed and some aren't.

A lot of our discussions come down to:

  • Combining TypedDict with other non-typeddict classes
  • Combining base types like float and int with other non-base-type classes
  • Combining Protocol with other non-protocol classes

It occurs to me that these may actually be the same issue. @CarliJoy made some good points about why combining TypedDict with other classes might be useful. @mikeshardmind made some good points here about why combining Protocol with non-protocol classes might be useful.

Personally I think the only option I am against is option 3, where we allow special subcases, because this makes the logic of intersection much more complex. Sure, we could say we want to allow TypedDict, and Protocol and go through every subcase we can think of, but we'd be bound to miss something.

There have been plenty of compelling discussions against option 2 - I think this option realistically won't succeed.

That leaves us with option 1 - anything can be intersected, but what does that mean?

Option 1 - Intersect anything

Let's take str and int.

We know obviously that these cannot be combined in a class - but the question I pose is, does that stop us forming an intersection? We've discussed plenty of other cases where this is allowed. If we consider the methods of a str, skipping dunder methods for now:

['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find',...]

And we consider the methods of int:

['as_integer_ratio', 'bit_count', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'is_integer',...]

These could be combined into some combination, defined by another class:

['as_integer_ratio', 'bit_count', 'bit_length', 'capitalize', 'casefold', 'center', 'conjugate',  'count', ...]

And these could be the methods of OrderedIntersection[str, int]

But wait! It must inherit from str and int now!

This is where it gets interesting. See, I realised on reflection that frequently in our discussions we say, "Well ok, we'll relax this condition in this subcase." In my mind, what holds for Protocol and TypedDict must also hold for str and int as the rule about inheritance still applies. That leaves two solutions:

a) We relax the inheritance restriction
b) We create an exclusions list.

Ok, so maybe I'm not so against option 3 after all.

Solution a) presents the issue that we wanted to be able to enforce that for concrete classes.

Solution b) means we just have to come up with a list of special classes to which this rule doesn't apply. So far we have:

exclusions = [Protocol, TypedDict, str, int, float, bytes, (other base types that can't be mixed)]

Does solution b) here resolve all the cases?

@mikeshardmind
Copy link
Collaborator Author

@mark-todd The option you're missing is the current option as proposed by OrderedIntersection: Don't ban anything in intersection, don't detect incompatibility in intersection, it will be detected when someone actually tries to make the type or an instance of the type if it is actually incompatible. OrderedIntersection is not a type and does not create a type It is a type system level way of expressing expectations of a type or instance of a type, and those expectations may be impossible to satisfy.

Here's how this plays out with two incompatible cases, 1 that is expressible by the type system, and one that's a runtime issue.

from typing import OrderedIntersection as OI

def takes_bad_runtime_intersect(x: OI[int, str]):
    ...

class UserAttempt(int, str):
    pass

User gets a runtime error at the class definition, and is told they can't do that. Note that type checkers do not error at that subclass.

from typing import OrderedIntersection as OI

class X:
    foo: int

class Y:
    foo: str

def takes_bad_intersect(x: OI[X, Y]):
    ...

class UserAttempt(X, Y):  # error here
    pass

This time the user will get a type checker error no matter how they try to construct such a class. They never even need to reach the point where assignment to the value annotated with the intersection happens.

@DiscordLiz
Copy link
Collaborator

@mark-todd I don't think what was proposed has any relaxed conditions or even special cases at all.

The simplest way I can express this while remaining accurate, is that a value expression is consistent with an ordered intersection as a type specification* if it is consistent with all of the operands of the ordered intersection as type specifications.

a) We relax the inheritance restriction

There isn't actually an inheritance restriction, there's a subtyping consistency restriction. The examples given for how someone could think about it included inheritence for nominal types, and structural compatability for structural types because both of those are the respective ways to check for subtyping consistency for those kinds of type specifications

* I'm re-using a definition from this comment #41 (comment) for "type specification", I don't know if I like the name of the term, but I don't care enough to bikeshed the name.

@mark-todd
Copy link
Collaborator

The option you're missing is the current option as proposed by OrderedIntersection: Don't ban anything in intersection, don't detect incompatibility in intersection, it will be detected when someone actually tries to make the type or an instance of the type if it is actually incompatible

@mikeshardmind Yeah sorry this is just what I meant by option 1, but the part I missed was a solution c). I see what you're getting at - just because you can make an intersection, doesn't mean that intersection can exist in practice. So we say effectively if you want to set your function interface as OI[int,str], sure go for it, but good luck using it! I'm trying to think if there is precedent for this one way or the other in existing python. I think in practice this will come up very little for the base types, so I don't see an issue with allowing OI[int,str].

I think in regards Protocol and TypedDict we'll have to consider these separately.

OrderedIntersection is not a type and does not create a type

I sense this is important but I'm not sure I follow you here - I'm imagining at runtime print(OrderedIntersection[A,B]) would generate something like typing.OrderedIntersection[A,B] - an instance of the class, and type of OrderedIntersection. This is how all types currently work in python - I don't really see how it could not be a type?

There isn't actually an inheritance restriction, there's a subtyping consistency restriction.

@DiscordLiz Ok, so if I'm understanding correctly I think being a subtype means, for non-protocol classes, this inherits from it, and for protocol classes, this follows the specified structure? Apologies if I missing something here. If this is correct, that resolves the Protocol case, OI[int,str], which just leaves us with TypedDict.

TypedDict is a non-protocol class, so that means OI[SomeTypedDict, A] where A is just a regular class, and must inherit from both A and SomeTypedDict, assuming I've understood the subtype definition correctly. The use cases that @CarliJoy described were using a cast to change the return type to this, so although we can't actually create a class that would satisfy this definition (which does somewhat bother me) I suppose this is still useful for the intended purpose?

I suppose one thing about this that perplexes me is we're effectively saying "Yes you can define an object to be of this type, but no that object can't actually exist, only via cast". I'm not sure this has ever been done before in python - typically if types are used incorrectly this is flagged, so it feels like we're breaking new ground here. That's not necessarily an argument against doing it, just an observation.

...unless TypedDict actually is a form of protocol class in some sense, in which case the inheritance rule doesn't apply and we're ok (ish).

@mikeshardmind
Copy link
Collaborator Author

TypedDict is a non-protocol class, so that means OI[SomeTypedDict, A] where A is just a regular class, and must inherit from both A and SomeTypedDict, assuming I've understood the subtype definition correctly.

It's a structural type similar to Protocols, but with different semantics to help with a few common cases involving heterogenous mappings. The bit about a cast shouldn't be needed (but there may be issues with typecheckers I haven't accounted for, just going off of what's specified here, if there's a type checker bug clouding this, we can raise that to their issue trackers)

@mikeshardmind
Copy link
Collaborator Author

I sense this is important but I'm not sure I follow you here - I'm imagining at runtime print(OrderedIntersection[A,B]) would generate something like typing.OrderedIntersection[A,B] - an instance of the class, and type of OrderedIntersection. This is how all types currently work in python - I don't really see how it could not be a type?

This may seem like it's needlessly semantic in nature, and it's collapsed as important, but tangential.

Quick preface

There are other issues relating to this. types have different meaning in an annotation and as a value, and some type system constructs in typing exist not to express a type, but to express expectations of a value from the context of an annotation.

This has come up in some other discussions, it's a known place where the type system's specifying language can be improved upon. Being able to differentiate between types as a runtime concept and types as a type system concept helps with making powerful generalizations that can remain consistent

Some types aren't types that you can just use to get a value

There are a lot of things in the type system like this where something is technically a type at runtime (due to all implementable runtime objects needing to be and python having runtime access to annotations and types), but the distinction of what it actually is to the type system differs from what it is at runtime, and matters for handling what rules apply to it and ensuring those rules compose appropriately. Most of these things are for expressing ways to compose simpler typing concepts.

  • typing.Union isn't a type system type, it's multiple types wrapped in a construct for the type system
  • typing.Concatenate isn't a type system type, it's a construct which has semantic meaning in the type system.
  • typing.Protocol is something we directly subclass to declare protocol types, but it isn't a type we expect to have instances of
  • Annotated isn't a type, but a means of attaching other data to a type
  • Never is a type, but you can't ever have a value at runtime of this type, and if your annotations are all correct, any code which infers a type of Never for an in-scope variable indicates unreachable code.
  • (And so on)

typing.OrderedIntersection should it happen will be similar. It will have runtime representation in typing, but this representation isn't something you expect to deal with except in runtime typing code, it's a way for the type system to be introspectable and work at runtime.

As a value, rather than as an annotation, these are typing special forms.

Taking it further though, Even basic types usually express consistency, not a specific type

If we take a look at the below:

variable: annotation = value_expression

typing.OrderedIntersection[T, U] in the place of an annotation doesn't specify an equivalent runtime type, it specifies expectations about the types that things assigned to the variable as a value expression should be consistent with.

typing.OrderedIntersection[T, U] as a value expression is clearly a runtime type, but with a completely different structure than what it expresses as a typing concept.

This is also consistent with simpler cases, but this notion of consistency is often substituted for the specific consistency check which applies to the current example.

var: list[int] = UserList()

while list[int] is a runtime type, as an annotation, it only specifies expectations that the values assigned to var must be consistent with the type list[int], types that are consistent with list[int] include infinitely many possible list subclasses.

@mark-todd
Copy link
Collaborator

It's a structural type similar to Protocols, but with different semantics to help with a few common cases involving heterogenous mappings.

@mikeshardmind Yeah that makes sense to me

The bit about a cast shouldn't be needed

Just wanted to elaborate on what I meant about this, because I think it's important to what I was describing about the drawbacks of the approach.

class A(TypedDict):
    x: int

class B:
    def foo(self):
        pass

x: OI[A,B]

What value can x be set to? My point was that there is (currently) no value that can be used here without use of a cast.

class C(A,B):
    pass

The above is banned, because you cannot inherit from A and B

class C(B):
    def __getitem__(self, key:str):
        if key == 'x':
            return 1
        else:
            raise KeyError

x: OI[A,B] = C()

Although technically correct, the above would also fail type checks I believe. Perhaps this is what you meant about the bug in type checkers? So that means we need to say:

x: OI[A,B] = cast(OI[A,B], C())

@CarliJoy
Copy link
Owner

CarliJoy commented Feb 13, 2024

Answer to @mark-todd

I suppose one thing about this that perplexes me is we're effectively saying "Yes you can define an object to be of this type, but no that object can't actually exist, only via cast". I'm not sure this has ever been done before in python - typically if types are used incorrectly this is flagged, so it feels like we're breaking new ground here. That's not necessarily an argument against doing it, just an observation.

Well that isn't accurate. This object exists with the exact signature that I will cast there. There is only no way at the moment to express this in Python typing at the moment.
As you can wrap other languages in Python and monkeypatch things, this happens a lot. And this is exactly on reason we need Intersections for.

Note: I really never will create a discussion group in github, it suck so much, that there aren't even threads here...

Btw: why do we discuss typed dicts again hre and not in #42, that is confusing...

@mark-todd
Copy link
Collaborator

This object exists with the exact signature that I will cast there.

@CarliJoy I see what you're getting at, but I'm not sure this is strictly true - for it to have exactly the same signature it would have to also have methods for .values .items and all the other things that come with a dict. Even if we included all those, the object could still fail a check of isinstance(x, dict) which isn't good. I'm wondering if some of this could be solved by inheriting from dict.

There is only no way at the moment to express this in Python typing at the moment.

I still think the Unpack[SomeTypedDict] as the return type of __getitem__ is really what you're looking for (or some similar syntax), but appreciate that doesn't exist at the moment.

As you can wrap other languages in Python and monkeypatch things, this happens a lot. And this is exactly on reason we need Intersections for.

It's interesting, I hadn't really thought about it this way, but I take the point. I suppose this is the view that Intersection is a good stopgap for other missing language features (like the one I describe above) or perhaps adding some typing to poorly typed libraries like pandas.

Btw: why do we discuss typed dicts again hre and not in #42, that is confusing...

Apologies, that's on me - I had a more general comment, that then developed into a specific discussion again - I find it quite difficult to separate these topics sometimes, but happy to comment in the other thread if that's easier.

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Feb 14, 2024

I'd personally find it easier if we didn't try to split every single line of discussion to another issue in the first place, there's not really a good way to do that with github being used for this, and a lot of discussion seems to be split off while being relevant to the initial topic and not something that needs to be brought back and forth.

I'm wondering if some of this could be solved by inheriting from dict.

TypedDict doesn't guarantee inhereiting from dict as it exists in the specification, and you aren't allowed to use methods like .pop It's a structural type, so the isinstance check against dict doesn't actually say if it's a TypedDict or consistent with specific TypedDict

The more we discuss this, the more I think we actually do need a section on TypedDict to point out that prior to intersections, TypedDict was assumed by developers to only ever actually be a dict, but that it is possible for other things to be consistent with the structure it provides.

@mark-todd
Copy link
Collaborator

TypedDict doesn't guarantee inhereiting from dict as it exists in the specification, and you aren't allowed to use methods like .pop It's a structural type, so the isinstance check against dict doesn't actually say if it's a TypedDict or consistent with specific TypedDict

@mikeshardmind Oh interesting - ok in that case I think it's all fine. Are there any other structural types in python other than Protocol and TypedDict?

The more we discuss this, the more I think we actually do need a section on TypedDict to point out that prior to intersections, TypedDict was assumed by developers to only ever actually be a dict, but that it is possible for other things to be consistent with the structure it provides.

Yeah I agree - this was not clear to me at all previously (partially from the name)

@mikeshardmind
Copy link
Collaborator Author

I'm gonna close this, I think we're enough in consensus on the points here, that any further clarity can be handled when I finish the draft (hopefully this weekend, time permitting). Going through all of this and ensuring we don't need special cases, but are aware of anything that may seem strange and whether or not it causes problems for either type safety, consistency of the system, and composability with future type system additions is mostly done on my end now, and was able to result in much simpler PEP language after confirming TypedDicts shouldn't need special treatment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants