Skip to content

Type inference for union types is surprising #10108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
odersky opened this issue Oct 28, 2020 · 14 comments · Fixed by #10112
Closed

Type inference for union types is surprising #10108

odersky opened this issue Oct 28, 2020 · 14 comments · Fixed by #10112
Milestone

Comments

@odersky
Copy link
Contributor

odersky commented Oct 28, 2020

Minimized example

object Test1:
  val x: Int | String = 1
  val y = x
  val z: Int | String = y

Output

4 |  val z: Int | String = y
  |                        ^
  |                        Found:    (Test1.y : Any)
  |                        Required: Int | String

The problem is that a union type is widened at each inference step. Hence the declared type Int | String of x gets lost in the type of y.

This widening can lead to mysterious behavior in the type checker. Here's an example:

object Test2:
  type Sig = Int | String
  def consistent(x: Sig, y: Sig): Boolean = x == y

  def consistentLists(xs: List[Sig], ys: List[Sig]): Boolean =
       xs.corresponds(ys)(consistent)        // OK
    || xs.corresponds(ys)(consistent(_, _))  // error, found: Any, required: Int | String

The first corresponds tests succeeds, but if we eta-expand the argument to consistent(_, _) it fails.

Expectation

I hope we can come up with a more predictable type inference algorithm.

@sjrd
Copy link
Member

sjrd commented Oct 28, 2020

IMO the best strategy we've thought about in the past years is the widen-if-synthesized strategy: at the points where we currently widen stuff, widen only compiler-synthesized union types. If a union type was written down by the developer, it is not synthesized, and does not get widened. Whether a union is user-written or synthesized is stored in the UnionType (it does not need to be stored in TASTy, however: any union type read from TASTy can be considered user-written without loss of generality). When a user-written union type is carried around and used as the inferred type for something (like a val), we naturally reuse the same instance of UnionType and therefore that property is retained. This requires some infrastructure (an additional flag in UnionType, which must be taken into account for uniqueness but not for subtyping), but it would then also be applicable to singleton types.

In the example snippet above, all the union types come from a Sig and hence from the user-written Int | String. They would therefore never be widened.

At the last meeting we discussed another strategy, which is never to widen union types, and instead joining the types of the branches of ifs and matches. We would also probably have to widen when inferring type parameters with several constraints (such as several formal parameters of a type parameter T, when the actual arguments have different types A and B. With that strategy, the challenge is that If and Match nodes need to store their result type (in memory and in TASTy), otherwise we would need to specify the join algorithm in the specification of TASTy, and that is something that TASTy should not be concerned with (it is a type inference-related concern).

@odersky
Copy link
Contributor Author

odersky commented Oct 28, 2020

Here are some alternatives to the current behavior:

  1. Never widen union types. There are at least two problems with this approach:

    First, we might get very large union types that way. For instance, if we have a match with a 100 cases, each of which returns a different type. Large union types are difficult to read and can be very bad for the performance of type inferencing.

    Second, we deviate significantly from current Scala type inference. For instance ArrayBuffer(None, Some(1)) would have type ArrayBuffer[None | Some[Int]], which is incompatible with the currently inferred ArrayBuffer[Option[Int]].

  2. Widen union types if they are the result of a match or if-then-else or they are the element type of a sequence literal (i.e. the same places where we currently harmonize primitive types of constants). This solves the first problem above but not the second.

  3. Distinguish between "hard" union types A | B and "soft" union types. I'm using A \/ B for the latter, but assume it's strictly internal; that notation is not exposed to users. Hard union types arise from explicit type declarations and soft union types arise from lubs. We have A \/ B <: A | B. Only soft union types will be widened in inference steps.

If we go with (3), one tricky question is what to do with combinations of hard and soft unions. E.g. assuming A, B, C are all subclasses of D, what is the widened type of

(A | B) \/ C

? The most obvious answer is D, i.e. hard unions are dissolved when they are a part of a soft union. But there's precedent against that. If B is Null, we treat it specially (/cc @noti0na1 @olhotak)

 widen[ (A | Null) \/ B ]  =  widen[ A \/ B ] | Null  =  D | Null

So, if we do not want to treat Null specially here, how do we generalize this? One possibility is to treat A | B in an asymmteric way. If a union A | B_1 | ... | B_n is part of of a soft union, only A forms part of the join, and B_1, ..., B_n are pushed out, just like Null is being pushed out now. So that would mean that

widen[ (A | C) \/ (B | C) ]  =  widen[ A \/ B ] | C | C  =  D | C | C  =  D | C

I am assuming here that all types that get pushed out of the join are again combined with a lub, but one that yields a hard union, not a soft one.

At first, treating unions asymmetrically looks weird. But consider that these are all declared types, so programmerts have fine-grained control what behavior they intend. For instance, one could force a union type to be never widened by declaring it
as Nothing | A | B instead of just A | B.

@odersky
Copy link
Contributor Author

odersky commented Oct 28, 2020

@sjrd Our comments were written in parallel, but it seems they come to similar conclusions.

@olhotak
Copy link
Contributor

olhotak commented Oct 28, 2020

For unions with Null, I see two possibilities:

  1. Null is special and never widened, even in a soft union. The rationale is that T|Null widens to Any, which loses too much information. You might argue that String|Int also widens to Any. But a counterargument is that None|Some[Int] widens to Option[Int], which does not lose information. Furthermore, in the following example, I think I want the type of x to be inferred as String|Null rather than Any:
val x = if(???) "foo" else null

An analogue of None|Some[Int] widening to Option[Int] would be to have a type Nullable and define T|Null to widen to Nullable[T]. But I think we wanted the elegance of union types rather than introducing a new marker.

  1. We treat unions with Null the same as other unions, and in particular dissolve hard unions inside soft unions. Then we see what breaks and try to fix specific cases. The current decision to not widen Null was made before the idea of hard vs. soft unions. Maybe distinguishing hard vs. soft is sufficient or almost sufficient to make it possible to reverse that decision: maybe we can widen unions with Null when they are inside a soft union.

@soronpo
Copy link
Contributor

soronpo commented Oct 28, 2020

Scala's eager widening also hurts singleton values. Is there a common solution to both (singleton widening / union widening) problems or are they mutually exclusive? One thing I can see is that since union types are new, there is no harm in creating special rules for them, but a common solution with singleton types still needs to take into account backward compatibility or minimize unexpected change in behavior.

@LPTK
Copy link
Contributor

LPTK commented Oct 29, 2020

@olhotak This seems connected to the concept of super traits (see: #9028).

We should never widen to a super trait (which includes the traits like Product, Serializable, and IterableLike) since these traits are only used for "shaping" the inheritance hierarchy, but not generally used as the types of values and parameters.

Dotty already avoids widening to super traits: when it sees something like Product & Serializable, it treats it as Object & Product & Serializable and removes the super traits from that intersection, widening it to Object.

It would be great if Any and Object were also somehow avoided during widening. (Conceptually, consider them super traits too?) The rule would be: if widening leads to Any or Object, which means that there was no other LUB than Any or Object and super traits, then simply do not widen, and use a union instead.

I think in the most common cases, the typing of if-then-else and pattern-matching should not be affected, as branches usually have types like Some[...] and None, which would widen as before (to Option[...]). So I hope this would not create too many problems.

@odersky
Copy link
Contributor Author

odersky commented Oct 29, 2020

I fear not widening to Any would break too much code. There is perfectly acceptable code that works with Any-typed expressions.

odersky added a commit to dotty-staging/dotty that referenced this issue Oct 29, 2020
It turned out that it's generally clear whether an OrType should be soft or hard, so
this is a good first step to solve the problem.
@olhotak
Copy link
Contributor

olhotak commented Oct 29, 2020

For Null, a big part of the problem is that there is no "non-super trait" expressing that something is nullable. The only supertype of String | Null is Any. I didn't like the idea of Nullable[T] at first, but if unions are widened in many places, maybe having Nullable[T] is a good idea.

The question comes down to what we want the canonical notation to be: if we want people to generally write String | Null, then we should find ways to not widen the unions; if we decide we want people to generally write Nullable[String], as they do Option[String], then it's fine to widen much more often. That's a question of style: how often do we want union types in general to be written in Scala 3 code as opposed to named types like Nullable?

Then, I wonder if this could generalize to other union types. Union types would be soft by default, but type aliases of union types would be hard (maybe this is already the existing behaviour; maybe it removes the need for hard unions):

type IntOrString = Int | String

val x: Int | String = 1
val y: IntOrString = 2

val wx = x // wx: Any
val wy = y // wy: IntOrString

But then the recommended style must be to name your union types.

@sjrd
Copy link
Member

sjrd commented Oct 29, 2020

I would suggest to special-case Null on the grounds that it is special for erasure. Joining A | Null with B gives a more specific erasure if we special case Null so that it becomes C | Null instead of Any.

I'm afraid of the potential consequences of making unions asymmetric. The suggested Nothing | A | B thing seems weird to me, because I would expect that to be simplified to A | B at any unpredictable point, so I could not actually rely on that union not be widened.

@neko-kai
Copy link
Contributor

Hmm, would there possibly be an option to skip widening unions entirely, e.g. using a language import, or an -Xsource: flag?

There is perfectly acceptable code that works with Any-typed expressions.

I agree, and would like to add that there's perfectly acceptable code that works with fully inferred unions - and that may even work better, with less type annotations required. Having an option to disable widening would allow the community to experiment and refine the libraries and best practices to support and benefit from inferred union types.

odersky added a commit to dotty-staging/dotty that referenced this issue Oct 30, 2020
It turned out that it's generally clear whether an OrType should be soft or hard, so
this is a good first step to solve the problem.
odersky added a commit that referenced this issue Oct 31, 2020
Fix #10108: Distinguish between soft and hard union types
@odersky
Copy link
Contributor Author

odersky commented Oct 31, 2020

@neko-kai I think enabling changed inference behavior for union types under a flag would require a lot more experimentation. Without knowing whether the new behavior would be at least half-way reasonable, we should not add mode-switches like that to the compiler. So to get anywhere someone would have to fork the compiler, do the experiments, and, fi successful, convince us that it's worth it. But I can't see it getting in if it does not at least compile the standard library without errors.

@SethTisue
Copy link
Member

SethTisue commented Oct 31, 2020

I agree strongly that it's worth standing firm on mode-switches, as per scala/scala-dev#430

@smarter
Copy link
Member

smarter commented Nov 2, 2020

I agree, and would like to add that there's perfectly acceptable code that works with fully inferred unions - and that may even work better, with less type annotations required.

The major difficulty you'll run into is the interaction with implicit search: before implicit search, we instantiate all previously constrained type variables referred to in the type we're doing a search on (if we don't, we easily get too many results and thus ambiguous implicits), if we don't widen unions at that point, then we might end up doing a search for Monad[[X] =>> Left[X] | Right[X]] instead of Monad[Either], this won't succeed since Monad is invariant in its type parameter. Here's a minimal example illustrating the problem: https://github.com/lampepfl/dotty/blob/09461212179531cbe6dfe319137102c42cfd3d43/tests/pos/i7829.scala#L24-L25

@neko-kai
Copy link
Contributor

neko-kai commented Nov 3, 2020

@smarter
That's what I mean by community refining their approaches — in this case by adding more variance to typeclasses instead of keeping them invariant, for example Monad works well with full unions when defined like so:

trait Monad[-F[_], +F1[_]] {
  extension [A, B](fa: F[A]) def flatMap(afb: A => F[B]): F1[B]
  extension [A](a: A) def pure: F1[A]
}

See example https://scastie.scala-lang.org/YIpUUo6HRD6GiWH9CahQEA

Note that if you remove variance from the example, the syntax would still work, but not the explicit summon for the union type. I guess that's because unions are widened when resolving extension methods.

@Kordyjan Kordyjan added this to the 3.0.0 milestone Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants