Change case class desugaring and decouple Products and name-based-pattern-matching #1938

OlivierBlanvillain · 2017-02-03T13:45:18Z

This PR changes name-based pattern matching as follows:

The eligibility condition is to extend a NameBasedPattern trait (instead of extending ProductN).
The pattern arity and types are determined by looking at the implemented _1 to _N methods, where N is the largest index such that _1, _2, ..., _N are all implemented (previously, N was obtained from the ProductN superclass).
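To make the arity rule concrete, here is a small model of it. `selectorArity` is a hypothetical helper that works on member names as plain strings; it is not Dotty's implementation. It returns the N for which `_1` ... `_N` all exist, stopping at the first gap.

```scala
// Hypothetical model of the arity rule, not compiler code: count how many
// consecutive selectors _1, _2, ... exist, stopping at the first missing one.
def selectorArity(members: Set[String]): Int =
  Iterator.from(1).takeWhile(i => members.contains(s"_$i")).size

// _1, _2, _3 present: arity 3 (other members such as isEmpty are ignored).
println(selectorArity(Set("_1", "_2", "_3", "isEmpty")))
// _3 is missing, so _4 does not count: arity 2.
println(selectorArity(Set("_1", "_2", "_4")))
```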

As a side effect, this PR lifts the 22-field limit on case classes.

@DarkDimius I remember you mentioning a change in this vein, what are your thoughts?

DarkDimius · 2017-02-03T13:52:39Z

While I like the marker-trait approach and I'd be in favor of it if we were designing a new language, I'd prefer if we don't go for it now.

The reason is that scalac actually supports name-based pattern matching, though it wasn't documented, and some codebases use it (e.g. parser combinators). The two approaches are incompatible with each other.

A middle ground could be to temporarily support both under -language:Scala2 mode, but that would mean implementing both schemes, which would only increase complexity.

I think it is possible to remain compatible with scalac by having a scheme that is very close to it, like the one currently used by Dotty. While I agree that it would be nice to simplify it, I believe we should remain compatible.

DarkDimius · 2017-02-03T13:54:28Z

compiler/src/dotty/tools/dotc/ast/Desugar.scala

+    val caseClassMeths = {
+      def syntheticProperty(name: TermName, rhs: Tree) =
+        DefDef(name, Nil, Nil, TypeTree(), rhs).withMods(synthetic)
+      // The override here is less than ideal: user defined productArity / productElement


why? I think it's used a lot in userland and I don't (yet?) see a value in removing it.

DarkDimius · 2017-02-03T13:55:22Z

compiler/src/dotty/tools/dotc/ast/Desugar.scala

+      // The override here is less than ideal: user defined productArity / productElement
+      // methods would be silently ignored. This is necessary to compile `scala.TupleN`.
+      // The long term solution is to remove `ProductN` entirely from stdlib.
+      def productArity =


there's now a LOT more code synthesized per case-class. I'm worried about both the bytecode footprint and the runtime code size.

I'd prefer to keep all those methods in superclasses, as it makes it a lot easier for the JIT to optimize a call site that sees products of the same arity. With your approach, such call sites would not be optimizable by contemporary VMs, to the best of my knowledge.

ping @sjrd I think this would also affect Scala.js size.

I think this is what scalac does, but I guess we could keep ProductN when N <= 22 and synthesize only after that.

Can someone summarize what the effective changes to generated code are, e.g., as snippets of Scala or Java code of what's generated per case class?

This is case class A[T1, T2](e1: T1, e2: T2) compiled by 2.12 vs compiled by dotty/master.

The PR replaces the ProductN superclasses with Product, which requires synthesizing productElement (and productArity), so the discussion is about reintroducing these lines in Dotty.
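For reference, here is a hand-written sketch of what a class extending plain `Product` has to provide itself; a `ProductN` superclass otherwise supplies `productArity` and `productElement` in terms of `_1` ... `_N`. The class name `Triple` and the exception message are illustrative assumptions, not the code Dotty synthesizes.

```scala
// Sketch: the members a case-class-like type must define itself when it
// extends plain Product instead of Product3 (names are illustrative).
final class Triple(val e1: Int, val e2: Int, val e3: Int) extends Product {
  def productArity: Int = 3

  // Product3 would derive this from _1, _2, _3; here it is spelled out.
  def productElement(n: Int): Any = n match {
    case 0 => e1
    case 1 => e2
    case 2 => e3
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  // Required by scala.Equals, which Product extends.
  def canEqual(that: Any): Boolean = that.isInstanceOf[Triple]
}

// productIterator is inherited from Product and built on the methods above.
println(new Triple(1, 2, 3).productIterator.toList)
```

Each such class carries its own copy of `productElement`, which is the bytecode-size and call-site concern raised above.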

OlivierBlanvillain · 2017-02-03T14:59:25Z

@DarkDimius I think I'm confusing names. If name-based pattern matching refers to def unapply(a: A): A / isEmpty & get, then it is not affected by this PR (and the trait should be called something other than NameBasedPattern). The changes in this PR affect the condition under which Dotty does its "case class" pattern matching, the one with neither allocation nor isEmpty check (is there another name for this one?).

Let me summarize my understanding to be sure we are on the same page.

The following works as expected on scalac / dotty-master / this PR:

class A(e1: Int, e2: Int, e3: Int) {
  def isEmpty = false
  def get     = this
  def _1      = e1
  def _2      = e2
  def _3      = e3
}

object A {
  def unapply(a: A): A = a
}
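A usage sketch for the extractor above, with the definitions repeated so the snippet compiles on its own. The single `unapply` returns the instance itself, `isEmpty`/`get` make it a name-based match, and `_1` ... `_3` supply the three sub-patterns, so no `Option` or tuple is allocated. `sum` is a hypothetical helper added for illustration.

```scala
class A(e1: Int, e2: Int, e3: Int) {
  def isEmpty = false // never empty: the match always succeeds
  def get     = this  // the "contents" are the instance itself
  def _1      = e1    // selectors bound to the sub-patterns
  def _2      = e2
  def _3      = e3
}

object A {
  def unapply(a: A): A = a // no Option wrapper
}

// Three sub-patterns are bound through _1, _2, _3 of `get`.
def sum(a: A): Int = a match {
  case A(x, y, z) => x + y + z
}

println(sum(new A(1, 2, 3)))
```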

Let's now consider case class A(e1: Int, e2: Int, e3: Int) desugared by the different compilers.

Desugared by scalac:

class A(e1: Int, e2: Int, e3: Int) extends Product // + case flag

object A {
  def unapply(a: A): Option[(Int, Int, Int)] = Some((a.e1, a.e2, a.e3))
}

Desugared by dotty/master:

class A(e1: Int, e2: Int, e3: Int) extends Product3[Int, Int, Int]

object A {
  def unapply(a: A): A = a
}

Desugared by this PR:

class A(e1: Int, e2: Int, e3: Int) extends Product with NameBasedPattern

object A {
  def unapply(a: A): A = a
}

Pattern matching on the scalac desugaring compiles with Dotty but goes through the unapply, thus allocating a Tuple3 and an Option (this can be observed with val a :: b = List(1)). Pattern matching on the dotty/master desugaring and on this PR's desugaring does not compile with scalac, and works as expected with dotty/master and this PR respectively.

So using a NameBasedPattern trait instead of ProductN to indicate "case class pattern matching" does not affect compatibility. Furthermore, adding isEmpty & get methods to the desugaring of case classes should be enough to make both the master desugaring and this PR's desugaring compatible with scalac.

DarkDimius · 2017-02-03T15:26:15Z

compiler/src/dotty/tools/dotc/core/Definitions.scala

-    else -1
+  /** Is this type eligible for name based pattern matching?
+   *
+   *  That means either extending `scala.ProductN` or `NameBasedPattern`.


I'd capitalize OR.
It would also be nice to "one day" make scala.ProductN inherit NameBasedPattern.
That would simplify the rule even more.

DarkDimius · 2017-02-03T15:26:39Z

@OlivierBlanvillain yes, we didn't speak about the same thing indeed. Thanks for pointing it out.

if name based pattern matching refers to def unapply(a: A): A / isEmpty & get

Yes, name-based pattern matching refers to an unapply that doesn't return an Option.
scala/scala#2848 introduced it initially.

I'd propose to rename this PR to something along the lines of "change case class desugaring and decouple Products and name-based-pattern-matching".
I'd also propose to keep the ProductN classes for the sake of bytecode size and to reduce the number of possible virtual call targets. I.e., with my proposal the correct desugaring would be

class A(e1: Int, e2: Int, e3: Int) extends Product3[Int, Int, Int] with NameBasedPattern

This change would also remove most of the new code introduced in this PR, as well as the workaround (I guess?).

OlivierBlanvillain · 2017-02-03T16:24:36Z

This change would also remove most of the new code introduced in this PR, as well as the workaround (I guess?).

@DarkDimius We would still need them for arity 23 and up. It makes sense to keep ProductN below that; I just need to add a test case to make sure that the synthesized code is covered :)

odersky · 2017-02-10T02:22:13Z

@OlivierBlanvillain Why the change? Is it about the 22 restriction, or are there other reasons?

OlivierBlanvillain · 2017-02-10T10:28:29Z

@odersky Yes, it's about the 22 restriction. I need something like this for my hlist branch where I synthesize TupleN for N > 22 and erase them to arrays. An alternative would be to also synthesize & erase ProductN as needed.

odersky · 2017-02-10T18:13:46Z

So with the current PR, case classes stop implementing ProductN traits once they have more than 22 parameters? In that case I'd prefer to generalize Product instead.

odersky · 2017-02-16T09:58:35Z

I like the idea of disentangling ProductN and name-based pattern matching, since the future status of ProductN is unclear. But I am reluctant to introduce yet another marker trait. Can we try using Product instead of ProductN?

OlivierBlanvillain · 2017-02-22T09:09:24Z

Things mostly work using scala.Product instead of a new marker trait for "product pattern matching" (the kind used by tuples and other case classes, which calls _i directly).

This is non-trivial because types such as Option and List are already valid return types for unapply but also extend Product, thus becoming candidates for two different patterns.

In the last commit I swapped the priority of the isProductMatch and isGetMatch tests, which is enough to make things work for Option and List. However, it breaks the "Scala-parser-combinators use case", where a single extractor is used with two different types of patterns. I could not find a way to make this one work without breaking something else... Could we deprecate this corner case?

DarkDimius · 2017-02-22T09:29:19Z

In the presence of both modes I would prefer name-based pattern matching to take precedence, as it is more efficient. I also think that we should not break source compatibility like the last change does. It makes it impossible to compile into efficient bytecode without source-incompatible changes to the public API of the library. It's as if we were propagating the problem we didn't solve to library authors.

odersky · 2017-02-22T09:53:53Z

I agree with @DarkDimius that I'd prefer product matches to take precedence over get matches.

This is non-trivial because types such as Option and List are already valid return types for unapply but also extend Product, thus becoming candidates for two different patterns.

I thought the test would be "extends Product and has the right number of _i selectors". As far as I can see, neither List nor Option has _i selectors?

OlivierBlanvillain · 2017-02-22T13:42:25Z

I've found a way to make Product work without breaking anything.

Actually, List & Option were considered valid Product patterns with arity 0. A simple fix is to make the inequality on this line strict, which adds a "you need at least a _1" constraint.
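A toy model of why the strict inequality matters. `arityOf`, `looseProductMatch` and `strictProductMatch` are hypothetical names, and the member set used for `Option` is an illustrative subset rather than what the compiler inspects. The point is only that a type with no `_i` members has a selector count of 0, so a `0 <= numArgs` check accepts it for zero patterns while `numArgs > 0` does not.

```scala
// Hypothetical model, not Dotty's code: count consecutive _1, _2, ... members.
def arityOf(members: Set[String]): Int =
  Iterator.from(1).takeWhile(i => members.contains(s"_$i")).size

// Non-strict bound: a type without any _i member "has arity 0".
def looseProductMatch(members: Set[String], numArgs: Int): Boolean =
  0 <= numArgs && arityOf(members) == numArgs

// Strict bound: at least _1 must exist.
def strictProductMatch(members: Set[String], numArgs: Int): Boolean =
  numArgs > 0 && arityOf(members) == numArgs

// Illustrative subset of Option's members: isEmpty/get, no _1.
val optionLike = Set("isEmpty", "get", "productArity", "productElement")

println(looseProductMatch(optionLike, 0))  // Option wrongly qualifies
println(strictProductMatch(optionLike, 0)) // rejected
```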

odersky · 2017-02-22T15:14:06Z

Actually List & Option were considered valid Product patterns with arity 0.

Ouch. I had not thought of that. I guess that swings the balance in favor of using a new subtrait of Product as our marker instead. What's a good name for it, I wonder?

OlivierBlanvillain · 2017-02-24T09:50:14Z

If we ignore this zero-arity case, we could even avoid using a marker trait altogether by saying that the def _1 method serves as a marker for product pattern matching. It would be more consistent with the rest of the language, which seems to be mostly name-based.

Also, it looks like scalac desugars case class C() with a def unapply(c: C): Boolean, could we do the same?

odersky · 2017-02-24T10:15:44Z

Yes, let's do that. That means we can leave the other criterion to be "extends Product and has exactly the right number of constructors". I'd tend to still require Product for safety.

odersky · 2017-03-09T09:18:21Z

What's the status here?

OlivierBlanvillain · 2017-03-09T17:48:38Z

The PR is in sync with the latest comments:

The change in this line updates the isProductMatch condition to require a _1 method, a Product superclass, and a matching number of patterns.
Dotty already generates a def unapply(c: C): Boolean for zero arity case classes (see test case in latest commit)

The changes are mergeable from my side!
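A minimal sketch of the Boolean pattern, both for a zero-arity case class and for a hand-written extractor. `Ping`, `Even`, `parity` and `describe` are hypothetical names; the exact signature of the compiler-synthesized `unapply` for `Ping` is not shown here and may differ between compiler versions.

```scala
// Zero-arity case class: the pattern Ping() has no sub-patterns to bind.
case class Ping()

// Hand-written Boolean extractor: it either matches or not, binding nothing.
object Even {
  def unapply(n: Int): Boolean = n % 2 == 0
}

def parity(n: Int): String = n match {
  case Even() => "even"
  case _      => "odd"
}

def describe(x: Any): String = x match {
  case Ping() => "ping"
  case _      => "other"
}
```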

OlivierBlanvillain · 2017-03-16T13:48:06Z

Here is another attempt to formalize Dotty's pattern matching (based on #1805). This covers the Extractor Patterns section of the spec. Dotty supports 4 different extractor patterns: Boolean Pattern, Product Pattern, Seq Pattern and Name Based Pattern.

Boolean Pattern

- Extractor defines def unapply(x: T): Boolean
- Pattern-matching on exactly 0 patterns

Product Pattern

- Extractor defines def unapply(x: T): U
- U <: Product (could be removed; kept for safety?)
- N > 0 is the maximum number of consecutive (parameterless def or val) _1: P1 ... _N: PN members in U
- Pattern-matching on exactly N patterns with types P1, P2, ..., PN

Seq Pattern

- Extractor defines def unapplySeq(x: T): U
- U has (parameterless def or val) members isEmpty: Boolean and get: S
- S <: Seq[V]
- Pattern-matching on any number of patterns with types V, V, ..., V

Name Based Pattern

- Extractor defines def unapply(x: T): U
- U has (parameterless def or val) members isEmpty: Boolean and get: S
- If there is exactly 1 pattern, pattern-matching on 1 pattern with type S
- Otherwise, N > 1 is the maximum number of consecutive (parameterless def or val) _1: P1 ... _N: PN members in U, and pattern-matching is on exactly N patterns with types P1, P2, ..., PN

In case of ambiguities, Product Pattern is preferred over Name Based Pattern.
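The Product, Seq and Name Based kinds can be sketched side by side. This assumes Scala 3 (the Dotty line): scalac rejects an `unapply` result type without `isEmpty`/`get`, so the `FirstTwo` product extractor does not compile there. `FirstTwo` extends `Product` to satisfy the `U <: Product` condition above and implements `canEqual`, `productArity` and `productElement` only because `Product` requires them. All names (`FirstTwo`, `Words`, `SmallNat`, `classify`, `initials`) are illustrative.

```scala
// Product pattern: unapply returns a value with _1/_2 and no isEmpty/get.
// Assumes the input string has at least two characters.
class FirstTwo(s: String) extends Product {
  def _1: Char = s.charAt(0)
  def _2: Char = s.charAt(1)
  // Required by Product/Equals, not used by the pattern matcher.
  def canEqual(that: Any): Boolean = that.isInstanceOf[FirstTwo]
  def productArity: Int = 2
  def productElement(n: Int): Any = n match {
    case 0 => _1
    case 1 => _2
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
}
object FirstTwo {
  def unapply(s: String): FirstTwo = new FirstTwo(s)
}

// Seq pattern: unapplySeq returns an Option (isEmpty/get) wrapping a Seq.
object Words {
  def unapplySeq(s: String): Option[Seq[String]] = {
    val ws = s.trim.split("\\s+").toSeq.filter(_.nonEmpty)
    if (ws.isEmpty) None else Some(ws)
  }
}

// Name-based pattern: a non-Product result with isEmpty/get; one
// sub-pattern binds the value of `get`.
class SmallNat(raw: String) {
  private val digitsOnly =
    raw.nonEmpty && raw.length <= 9 && raw.forall(c => c >= '0' && c <= '9')
  def isEmpty: Boolean = !digitsOnly
  def get: Int = raw.toInt
}
object SmallNat {
  def unapply(s: String): SmallNat = new SmallNat(s)
}

def classify(s: String): String = s match {
  case SmallNat(n)             => s"number $n"
  case Words(single)           => s"word $single"
  case Words(first, rest @ _*) => s"$first and ${rest.size} more"
  case _                       => "empty"
}

def initials(s: String): String = s match {
  case FirstTwo(a, b) => s"$a$b"
}
```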

OlivierBlanvillain · 2017-03-30T11:26:51Z

@DarkDimius Could you give your opinion on this last comment?

odersky · 2017-04-04T17:16:42Z

The spec looks accurate to me. One question concerns the near-equivalence of name-based and product patterns. Should we enforce that the result type of a get is a subtype of Product? In that case we could merge most of the two cases into one condition.

OlivierBlanvillain · 2017-04-05T07:38:12Z

Indeed that would be nice, either by adding a Product on one side or removing it on the other (the alternative being checking for the presence of a _1 member). I would lean towards removing Product altogether for the following reasons:

It shouldn't break existing code (adding a requirement would)
It completely solves the extends Product with Serializable problem
Product is a weird marker trait: it requires productElement & productArity which are not used for pattern matching
Less coupling with the stdlib

odersky · 2017-04-06T11:19:23Z

OK, let's drop Product from the name based matching rules. For the moment we should still generate a Product for case classes, because existing code uses productArity and productElement. But dropping Product from the pattern matching rules makes it easier to change that later.

Product pattern used to:
- have a `<: Product` requirement
- compute the arity of a pattern by looking at `N` in a `ProductN` superclass.

This commit replaces the `<: Product` requirement: instead we look for a `_1` member. The arity is determined by inspecting the `_1` to `_N` members instead.

---

Here is another attempt to formalize Dotty's pattern matching (based on scala#1805). This covers the *Extractor Patterns* [section of the spec](https://www.scala-lang.org/files/archive/spec/2.12/08-pattern-matching.html#extractor-patterns). Dotty supports 4 different extractor patterns: Boolean Pattern, Product Pattern, Seq Pattern and Name Based Pattern.

Boolean Pattern
- Extractor defines `def unapply(x: T): Boolean`
- Pattern-matching on exactly `0` patterns

Product Pattern
- Extractor defines `def unapply(x: T): U`
- `N > 0` is the maximum number of consecutive (parameterless `def` or `val`) `_1: P1` ... `_N: PN` members in `U`
- Pattern-matching on exactly `N` patterns with types `P1, P2, ..., PN`

Seq Pattern
- Extractor defines `def unapplySeq(x: T): U`
- `U` has (parameterless `def` or `val`) members `isEmpty: Boolean` and `get: S`
- `S <: Seq[V]`
- Pattern-matching on any number of patterns with types `V, V, ..., V`

Name Based Pattern
- Extractor defines `def unapply(x: T): U`
- `U` has (parameterless `def` or `val`) members `isEmpty: Boolean` and `get: S`
- If there is exactly `1` pattern, pattern-matching on `1` pattern with type `S`
- Otherwise fall back to Product Pattern on type `U`

In case of ambiguities, *Product Pattern* is preferred over *Name Based Pattern*.

- t7296 & case-class-23 are moved out of pending
- 1938 tests productElement > 23

OlivierBlanvillain · 2017-04-06T16:56:20Z

Updated to decouple Product and pattern-matching

odersky

Otherwise LGTM

odersky · 2017-04-10T08:50:37Z

compiler/src/dotty/tools/dotc/core/Definitions.scala

@@ -846,17 +848,9 @@ class Definitions {
  }

  def isProductSubType(tp: Type)(implicit ctx: Context) =
-    (tp derivesFrom ProductType.symbol) && tp.baseClasses.exists(isProductClass)
+    Applications.extractorMemberType(tp, nme._1).exists


The isProductSubType method should be renamed and moved to Applications.

odersky · 2017-04-10T08:53:41Z

compiler/src/dotty/tools/dotc/typer/Applications.scala

-    0 <= numArgs && numArgs <= Definitions.MaxTupleArity &&
-    tp.derivesFrom(defn.ProductNType(numArgs).typeSymbol)
+    numArgs > 0 && defn.isProductSubType(tp) &&
+    productSelectorTypes(tp).size == numArgs


Use productArity instead?

The change in question breaks the following pattern, commonly used in name-based pattern matching:

```scala
object ProdEmpty {
  def _1: Int = ???
  def _2: String = ???
  def isEmpty = true
  def unapply(s: String): this.type = this
  def get = this
}
```

This type defines both `_1` and `get` + `isEmpty` (but is not <: Product). After the changes in scala#1938 it becomes eligible for both the product and the name-based pattern. Because "in case of ambiguities, *Product Pattern* is preferred over *Name Based Pattern*", isEmpty wouldn't be used, breaking the scalac semantics.
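A sketch of the breakage, reusing `ProdEmpty` with a hypothetical `tryProdEmpty` caller. With scalac semantics `isEmpty` is consulted first, so the first case is skipped and the `???` selectors are never evaluated; under a rule that prefers the product pattern whenever `_1` exists, `_1` would be called and throw `NotImplementedError`. The assertion reflects the scalac behaviour; Scala 3 is expected to behave the same since `ProdEmpty` does not extend `Product`, but treat that as an assumption worth checking.

```scala
object ProdEmpty {
  def _1: Int = ???    // would throw if a product match called it
  def _2: String = ???
  def isEmpty = true   // name-based match: "no result"
  def unapply(s: String): this.type = this
  def get = this
}

// With isEmpty respected, the first case never matches.
def tryProdEmpty(s: String): String = s match {
  case ProdEmpty(n, str) => s"matched $n $str"
  case _                 => "no match"
}
```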

OlivierBlanvillain force-pushed the named-based-patmat branch from b056cb2 to b5742d9 February 3, 2017 13:51

DarkDimius previously requested changes Feb 3, 2017

DarkDimius reviewed Feb 3, 2017

OlivierBlanvillain changed the title from "Named based patmat" to "Change case class desugaring and decouple Products and name-based-pattern-matching" Feb 3, 2017

OlivierBlanvillain force-pushed the named-based-patmat branch from c53bdae to 6c56acd February 3, 2017 21:48

OlivierBlanvillain force-pushed the named-based-patmat branch from f3b7cdd to 0a3fb0f February 22, 2017 09:32

OlivierBlanvillain force-pushed the named-based-patmat branch 2 times, most recently from 721cc47 to b19b134 February 22, 2017 12:48

OlivierBlanvillain force-pushed the named-based-patmat branch 2 times, most recently from 70c4083 to 46c680c March 9, 2017 17:38

OlivierBlanvillain force-pushed the named-based-patmat branch from 99a0913 to b4fbadf March 30, 2017 19:41

felixmulder requested review from liufengyun and odersky April 4, 2017 09:47

OlivierBlanvillain added 5 commits April 6, 2017 18:28

Add {before,after}-pickling.txt to gitignore

0378330

Workaround scala#1932 (bug triggered by desugaring changes)

0ad5956

Generate synthetic productElement/productArity methods above 22

c0ff8ad

Add tests

5bf9d2b

OlivierBlanvillain force-pushed the named-based-patmat branch from b4fbadf to 5bf9d2b April 6, 2017 16:54

OlivierBlanvillain mentioned this pull request Apr 7, 2017

Redesign Tuples with HList-like structure #2199

Closed

odersky reviewed Apr 10, 2017

Move isProductSubType to Applications & rename to canProductMatch

198b5ce

OlivierBlanvillain force-pushed the named-based-patmat branch from 4699adb to 198b5ce April 11, 2017 06:48

felixmulder merged commit 4868fb2 into scala:master Apr 11, 2017

felixmulder deleted the named-based-patmat branch April 11, 2017 10:01

OlivierBlanvillain mentioned this pull request Apr 13, 2017

Revert <: Product requirement in pattern matching #2249

Merged

senia-psm mentioned this pull request Nov 18, 2017

Add an error message for "invalid unapply return type" error. #3501

Merged
