Change case class desugaring and decouple Products and name-based-pattern-matching #1938

Merged
merged 6 commits into from
Apr 11, 2017

Conversation

OlivierBlanvillain
Contributor

This PR changes the name based pattern matching the following way:

Eligibility condition is to extend a NameBasedPattern trait (instead of extending ProductN).
The pattern arity and types are determined by looking at the implemented _1 to _N methods, where N is the index of the last consecutively implemented method of that shape (N was previously obtained from the ProductN superclass).

As a side effect, this PR lifts the 22 limitation on case classes.

@DarkDimius I remember you mentioning a change in this vein, what are your thoughts?

@DarkDimius
Contributor

While I like the marker-trait approach and I'd be in favor of it if we were designing a new language, I'd prefer if we don't go for it now.

The reason is that scalac actually supports name-based pattern matching, though it wasn't documented, and some code bases use it (e.g. parser combinators). The proposed scheme would make the two approaches incompatible.

The intermediate ground could be to temporarily support both under -language:Scala2 mode, but it would mean implementing both schemes, which would only increase complexity.

I think there's a possibility to remain compatible with scalac, by having a scheme that is very close to it, like the current one used by dotty. While I agree that it would be nice to simplify it, I believe we should remain compatible.

val caseClassMeths = {
def syntheticProperty(name: TermName, rhs: Tree) =
DefDef(name, Nil, Nil, TypeTree(), rhs).withMods(synthetic)
// The override here is less than ideal: user defined productArity / productElement
Contributor

why? I think it's used a lot in userland and I don't (yet?) see a value in removing it.

// The override here is less than ideal: user defined productArity / productElement
// methods would be silently ignored. This is necessary to compile `scala.TupleN`.
// The long term solution is to remove `ProductN` entirely from stdlib.
def productArity =
Contributor

@DarkDimius DarkDimius Feb 3, 2017

there's now a LOT more code synthesized per case-class. I'm worried about both the bytecode footprint and the runtime code size.

Contributor

I'd prefer to keep all those methods in super-classes, as it makes it a lot easier for the JIT to optimize a callsite that sees products of the same arity. In your case, it would not be optimizable by contemporary VMs, to the best of my knowledge.

Contributor

ping @sjrd I think this would also affect Scala.js size.

Contributor Author

@OlivierBlanvillain OlivierBlanvillain Feb 3, 2017

I think this is what scalac does, but I guess we could keep ProductN when N <= 22 and synthesize only after that.

Member

Can someone summarize what the effective changes to generated code are, e.g., as snippets of Scala or Java code of what's generated per case class?

Contributor Author

This is case class A[T1, T2](e1: T1, e2: T2) compiled by 2.12 vs compiled by dotty/master.

The PR replaces the ProductN superclasses with Product, which requires synthesizing productElement, so the discussion is about reintroducing these lines in dotty.

@OlivierBlanvillain
Contributor Author

OlivierBlanvillain commented Feb 3, 2017

@DarkDimius I think I'm confusing names. If name based pattern matching refers to def unapply(a: A): A / isEmpty & get, then it's not affected by this PR (and the trait should be called something other than NameBasedPattern). The changes in this PR affect the condition under which dotty does its "case class" pattern matching, the one without allocation or isEmpty check (is there another name for this one?).

Let me summarize my understanding to be sure we are on the same page.

The following works as expected on scalac / dotty-master / this PR:

class A(e1: Int, e2: Int, e3: Int) {
  def isEmpty = false
  def get     = this
  def _1      = e1
  def _2      = e2
  def _3      = e3
}

object A {
  def unapply(a: A): A = a
}
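
To make the snippet above concrete, here is a minimal sketch of how such a class can be used in a match. The names `Triple` and `NameBasedDemo` are hypothetical and only mirror the `A` example; the extractor returns the scrutinee itself, so no `Option` or tuple is built.

```scala
// Hypothetical stand-in for the class `A` above.
final class Triple(e1: Int, e2: Int, e3: Int) {
  def isEmpty = false // consulted by name-based pattern matching
  def get     = this  // the value whose _1.._3 members are selected
  def _1      = e1
  def _2      = e2
  def _3      = e3
}

object Triple {
  // Returns the scrutinee itself instead of an Option.
  def unapply(t: Triple): Triple = t
}

object NameBasedDemo {
  // Binds a, b, c through _1, _2, _3 after checking isEmpty.
  def sum(t: Triple): Int = t match {
    case Triple(a, b, c) => a + b + c
  }
}
```

For instance, `NameBasedDemo.sum(new Triple(1, 2, 3))` evaluates to `6`.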

Let's now consider case class A(e1: Int, e2: Int, e3: Int) desugared by the different compilers.

  1. Desugared by scalac:
class A(e1: Int, e2: Int, e3: Int) extends Product // + case flag

object A {
  def unapply(a: A): Option[(Int, Int, Int)] = Some((a.e1, a.e2, a.e3))
}
  2. Desugared by dotty/master:
class A(e1: Int, e2: Int, e3: Int) extends Product3[Int, Int, Int]

object A {
  def unapply(a: A): A = a
}
  3. Desugared by this PR:
class A(e1: Int, e2: Int, e3: Int) extends Product with NameBasedPattern

object A {
  def unapply(a: A): A = a
}

Pattern matching on 1. with dotty compiles but goes through the unapply, thus allocating a Tuple3 and an Option (can be observed with val a :: b = List(1)). Pattern matching on 2. and 3. won't compile with scalac, and works as expected with dotty/master / this PR respectively.

So using a NameBasedPattern trait instead of ProductN to indicate "case class pattern matching" does not affect compatibility. Furthermore, adding isEmpty & get methods to the desugaring of case classes should be enough to achieve compatibility both from master and from here.
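
For contrast, a hand-written sketch of the Option-based extractor shape from desugaring 1 shows where the allocations come from. `Point3` and `OptionBasedDemo` are illustrative names, and this is a plain class with a manual `unapply`, not actual compiler output.

```scala
// Illustrative plain class with a hand-written, scalac-style extractor.
final class Point3(val e1: Int, val e2: Int, val e3: Int)

object Point3 {
  // Every call builds a Tuple3 and wraps it in a Some.
  def unapply(p: Point3): Option[(Int, Int, Int)] =
    Some((p.e1, p.e2, p.e3))
}

object OptionBasedDemo {
  // Going through unapply means the Option and the tuple are allocated
  // before a, b and c can be bound.
  def sum(p: Point3): Int = p match {
    case Point3(a, b, c) => a + b + c
    case _               => 0
  }
}
```

Both this shape and the `unapply(a: A): A` shape accept the same pattern syntax at the use site; they differ only in what the extractor returns.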

else -1
/** Is this type eligible for name based pattern matching?
*
* That means either extending `scala.ProductN` or `NameBasedPattern`.
Contributor

I'd capitalize OR.
It would be also nice to "one day" make scala.ProductN inherit NameBasedPattern.
That would simplify the rule even more.

@DarkDimius
Contributor

DarkDimius commented Feb 3, 2017

@OlivierBlanvillain yes, we didn't speak about the same thing indeed. Thanks for pointing it out.

if name based pattern matching refers to def unapply(a: A): A / isEmpty & get

Yes, name-based pattern-matching refers to an unapply that doesn't return an Option.
scala/scala#2848 introduced it initially.

I'd propose to rename this PR to something along the lines of "change case class desugaring and decouple Products and name-based-pattern-matching".
I'd also propose to keep ProductN classes for the sake of bytecode size and to reduce the number of possible virtual call targets. I.e., after my proposal the correct desugaring would be

class A(e1: Int, e2: Int, e3: Int) extends Product3[Int, Int, Int] with NameBasedPattern

This change would also remove most of the new code introduced in this PR as well as the workaround (I guess?).

@OlivierBlanvillain OlivierBlanvillain changed the title Named based patmat Change case class desugaring and decouple Products and name-based-pattern-matching Feb 3, 2017
@OlivierBlanvillain
Contributor Author

This change would also remove most of the new code introduced in this PR as well as the workaround (I guess?).

@DarkDimius We would still need them for arity 23 and up. It makes sense to keep ProductN below that, I just need to add a test case to make sure that the synthesized code is covered :)

@odersky
Contributor

odersky commented Feb 10, 2017

@OlivierBlanvillain Why the change? Is it about the 22 restriction, or are there other reasons?

@OlivierBlanvillain
Contributor Author

@odersky Yes, it's about the 22 restriction. I need something like this for my hlist branch where I synthesize TupleN for N > 22 and erase them to arrays. An alternative would be to also synthesize & erase ProductN as needed.

@odersky
Contributor

odersky commented Feb 10, 2017

So that means with the current PR case classes stop implementing ProductN traits once they have more than 22 parameters? In that case I'd prefer to generalize Product instead.

@odersky
Contributor

odersky commented Feb 16, 2017

I like the idea of disentangling ProductN and name-based pattern matching, since the future status of ProductN is unclear. But I am reluctant to introduce yet another marker trait. Can we try instead with Product instead of ProductN?

@OlivierBlanvillain
Contributor Author

Things mostly work using scala.Product instead of a new marker trait for "product pattern matching" (the one used for tuples and other case classes, which calls _i directly).

This is non-trivial because types such as Option and List are already valid return types for unapply methods but also extend Product, thus becoming candidates for two different patterns.

In the last commit I swapped the priority of the isProductMatch and isGetMatch tests, which is enough to make things work for Option and List. However, it breaks the "Scala-parser-combinators use case", where a single extractor is used with two different types of patterns. I could not find a way to make this one work without breaking something else... Could we deprecate this corner case?

@DarkDimius
Contributor

DarkDimius commented Feb 22, 2017 via email

@odersky
Contributor

odersky commented Feb 22, 2017

I agree with @DarkDimius that I'd prefer product matches to take precedence over get matches.

This is non-trivial because types such as Option and List are already valid return types for unapply methods but also extend Product, thus becoming candidates for two different patterns.

I thought the test would be "extends Product and has the right number of _i selectors". As far as I can see, neither List nor Option has _i selectors?

@OlivierBlanvillain OlivierBlanvillain force-pushed the named-based-patmat branch 2 times, most recently from 721cc47 to b19b134 Compare February 22, 2017 12:48
@OlivierBlanvillain
Contributor Author

I've found a way to make Product work without breaking anything.

Actually List & Option were considered valid Product patterns with arity 0. A simple fix is to make the inequality on this line strict, which adds a "you need at least a _1" constraint.
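
As a rough illustration of why the strict inequality matters, here is a toy model of the arity check. It is not the compiler's code: a type is represented only by the set of its parameterless member names, and `ArityModel`, `productArity` and `isProductMatch` are hypothetical helpers written for this sketch.

```scala
object ArityModel {
  // Arity = number of consecutive members _1, _2, ... starting at _1.
  def productArity(members: Set[String]): Int =
    Iterator.from(1).takeWhile(i => members.contains(s"_$i")).length

  // Strict version: a product match needs at least `_1` (numArgs > 0).
  def isProductMatch(members: Set[String], numArgs: Int): Boolean =
    numArgs > 0 && productArity(members) == numArgs

  // Non-strict version (0 <= numArgs), shown only for comparison.
  def isProductMatchLoose(members: Set[String], numArgs: Int): Boolean =
    numArgs >= 0 && productArity(members) == numArgs

  // Option-like type: no _i selectors at all.
  val optionLike: Set[String] = Set("isEmpty", "get")
  // The gap at _4 stops the count at 3.
  val tripleLike: Set[String] = Set("_1", "_2", "_3", "_5")
}
```

In this model `isProductMatchLoose(optionLike, 0)` is true, which corresponds to the arity-0 case described above, while the strict `isProductMatch(optionLike, 0)` is false.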

@odersky
Contributor

odersky commented Feb 22, 2017

Actually List & Option were considered valid Product patterns with arity 0.

Ouch. I had not thought of that. I guess that swings the balance in favor of using a new subtrait of Product as our marker instead. What's a good name for it, I wonder?

@OlivierBlanvillain
Contributor Author

If we ignore this zero arity case we could even avoid using a marker trait altogether by saying that the def _1 method serves as a marker for product pattern matching. It would be more consistent with the rest of the language, which seems to be mostly name based.

Also, it looks like scalac desugars case class C() with a def unapply(c: C): Boolean, could we do the same?

@odersky
Contributor

odersky commented Feb 24, 2017 via email

@odersky
Contributor

odersky commented Mar 9, 2017

What's the status here?

@OlivierBlanvillain OlivierBlanvillain force-pushed the named-based-patmat branch 2 times, most recently from 70c4083 to 46c680c Compare March 9, 2017 17:38
@OlivierBlanvillain
Contributor Author

OlivierBlanvillain commented Mar 9, 2017

The PR is in sync with the latest comments:

  • The change in this line updates the isProductMatch condition to have a _1 method, a Product superclass, and a matching number of patterns.

  • Dotty already generates a def unapply(c: C): Boolean for zero arity case classes (see test case in latest commit)

The changes are mergeable from my side!

@OlivierBlanvillain
Contributor Author

OlivierBlanvillain commented Mar 16, 2017

Here is another attempt to formalize Dotty's pattern-matching (based on #1805). This covers the Extractor Patterns section of the spec. Dotty supports 4 different extractor patterns: Boolean Pattern, Product Pattern, Seq Pattern and Name Based Pattern.

Boolean Pattern

  • Extractor defines def unapply(x: T): Boolean
  • Pattern-matching on exactly 0 patterns

Product Pattern

  • Extractor defines def unapply(x: T): U
  • U <: Product (could be removed, kept for safety?)
  • N > 0 is the maximum number of consecutive (parameterless def or val) _1: P1 ... _N: PN members in U
  • Pattern-matching on exactly N patterns with types P1, P2, ..., PN

Seq Pattern

  • Extractor defines def unapplySeq(x: T): U
  • U has (parameterless def or val) members isEmpty: Boolean and get: S
  • S <: Seq[V]
  • Pattern-matching on any number of patterns with types V, V, ..., V

Name Based Pattern

  • Extractor defines def unapply(x: T): U
  • U has (parameterless def or val) members isEmpty: Boolean and get: S
  • If there is exactly 1 pattern, pattern-matching on 1 pattern with type S
  • Otherwise N > 1 is the maximum number of consecutive (parameterless def or val) _1: P1 ... _N: PN members in U
  • Pattern-matching on exactly N patterns with types P1, P2, ..., PN

In case of ambiguities, Product Pattern is preferred over Name Based Pattern.
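
A small sketch of two of the extractor shapes listed above, the Boolean Pattern and the Seq Pattern, may help. The extractors `Even` and `Words` and the wrapper `ExtractorDemo` are hypothetical names chosen for illustration.

```scala
object ExtractorDemo {
  // Boolean Pattern: unapply returns Boolean, the pattern has 0 sub-patterns.
  object Even {
    def unapply(n: Int): Boolean = n % 2 == 0
  }

  // Seq Pattern: unapplySeq returns a type with isEmpty and get: Seq[V]
  // (Option[Seq[String]] here, so V = String).
  object Words {
    def unapplySeq(s: String): Option[Seq[String]] =
      Some(s.split(" ").toSeq)
  }

  def parity(n: Int): String = n match {
    case Even() => "even"
    case _      => "odd"
  }

  // Matches when the sequence has at least two elements.
  def firstTwo(s: String): String = s match {
    case Words(a, b, _*) => a + "," + b
    case _               => "too short"
  }
}
```

Here `ExtractorDemo.parity(4)` gives `"even"`, and `ExtractorDemo.firstTwo("hello big world")` gives `"hello,big"`.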

@OlivierBlanvillain
Contributor Author

@DarkDimius Could you give your opinion on this last comment?

@odersky
Contributor

odersky commented Apr 4, 2017

The spec looks accurate to me. One question concerns the almost equivalence of name-based and product patterns. Should we enforce that the result type of a get is a subtype of Product? In this case we could merge most of the two cases into one condition.

@OlivierBlanvillain
Contributor Author

OlivierBlanvillain commented Apr 5, 2017

Indeed that would be nice, either by adding a Product on one side or removing it on the other (the alternative being checking for the presence of a _1 member). I would lean towards removing Product altogether for the following reasons:

  • It shouldn't break existing code (adding a requirement would)
  • It completely solves the extends Product with Serializable problem
  • Product is a weird marker trait: it requires productElement & productArity which are not used for pattern matching
  • Less coupling with the stdlib

@odersky
Contributor

odersky commented Apr 6, 2017

OK, let's drop Product from the name based matching rules. For the moment we should still generate a Product for case classes, because existing code uses productArity and productElement. But dropping Product from the pattern matching rules makes it easier to change that later.

Product pattern used to:
- have a `<: Product` requirement
- compute the arity of a pattern by looking at `N` in a `ProductN` superclass.

This commit drops the `<: Product` requirement; instead we look for a `_1` member. The arity is now determined by inspecting the `_1` to `_N` members.

---

Here is another attempt to formalize Dotty's pattern-matching (based on scala#1805). This covers the *Extractor Patterns* [section of the spec](https://www.scala-lang.org/files/archive/spec/2.12/08-pattern-matching.html#extractor-patterns). Dotty supports 4 different extractor patterns: Boolean Pattern, Product Pattern, Seq Pattern and Name Based Pattern.

Boolean Pattern

- Extractor defines `def unapply(x: T): Boolean`
- Pattern-matching on exactly `0` patterns

Product Pattern

- Extractor defines `def unapply(x: T): U`
- `N > 0` is the maximum number of consecutive (parameterless `def` or `val`) `_1: P1` ... `_N: PN` members in `U`
- Pattern-matching on exactly `N` patterns with types `P1, P2, ..., PN`

Seq Pattern

- Extractor defines `def unapplySeq(x: T): U`
- `U` has (parameterless `def` or `val`) members `isEmpty: Boolean` and `get: S`
- `S <: Seq[V]`
- Pattern-matching on any number of patterns with types `V, V, ..., V`

Name Based Pattern

- Extractor defines `def unapply(x: T): U`
- `U` has (parameterless `def` or `val`) members `isEmpty: Boolean` and `get: S`
- If there is exactly `1` pattern, pattern-matching on `1` pattern with type `S`
- Otherwise fall back to Product Pattern on type `U`

In case of ambiguities, *Product Pattern* is preferred over *Name Based Pattern*.
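
The Product Pattern rules can be sketched as follows. This is an illustration for Dotty / Scala 3 only (scalac rejects an `unapply` result without `isEmpty`/`get`), and `Halves` and `ProductPatternDemo` are hypothetical names. The result type extends `Product` here as an assumption, so that the example does not depend on whether a given compiler version still asks for `<: Product`; only `_1` and `_2` are used by the match.

```scala
// Result type of the extractor: exposes _1 and _2, has no isEmpty/get.
final class Halves(s: String) extends Product {
  private val mid = s.length / 2
  def _1: String = s.substring(0, mid)
  def _2: String = s.substring(mid)

  // Product members, not used by pattern matching itself.
  def canEqual(that: Any): Boolean = that.isInstanceOf[Halves]
  def productArity: Int = 2
  def productElement(n: Int): Any = n match {
    case 0 => _1
    case 1 => _2
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
}

object Halves {
  // Irrefutable extractor: always returns a Halves, never an Option.
  def unapply(s: String): Halves = new Halves(s)
}

object ProductPatternDemo {
  // l and r are bound by calling _1 and _2 directly.
  def swap(s: String): String = s match {
    case Halves(l, r) => r + l
  }
}
```

With these definitions, `ProductPatternDemo.swap("abcd")` yields `"cdab"`.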
- t7296 & case-class-23 are moved out of pending

- 1938 tests productElement > 23
@OlivierBlanvillain
Contributor Author

Updated to decouple Product and pattern-matching

Contributor

@odersky odersky left a comment

Otherwise LGTM

@@ -846,17 +848,9 @@ class Definitions {
}

  def isProductSubType(tp: Type)(implicit ctx: Context) =
-   (tp derivesFrom ProductType.symbol) && tp.baseClasses.exists(isProductClass)
+   Applications.extractorMemberType(tp, nme._1).exists
Contributor

The isProductSubType method should be renamed and moved to Applications.

- 0 <= numArgs && numArgs <= Definitions.MaxTupleArity &&
- tp.derivesFrom(defn.ProductNType(numArgs).typeSymbol)
+ numArgs > 0 && defn.isProductSubType(tp) &&
+ productSelectorTypes(tp).size == numArgs
Contributor

Use productArity instead?

@felixmulder felixmulder merged commit 4868fb2 into scala:master Apr 11, 2017
@felixmulder felixmulder deleted the named-based-patmat branch April 11, 2017 10:01
OlivierBlanvillain added a commit to dotty-staging/dotty that referenced this pull request Apr 13, 2017
The change in question breaks the following pattern, commonly used in name based pattern matching:

```scala
object ProdEmpty {
  def _1: Int = ???
  def _2: String = ???
  def isEmpty = true
  def unapply(s: String): this.type = this
  def get = this
}
```

This type defines both `_1` and `get` + `isEmpty` (but is not <: Product). After the changes in scala#1938 it becomes eligible for both product and name based patterns. Because "in case of ambiguities, *Product Pattern* is preferred over *Name Based Pattern*", isEmpty wouldn't be used, breaking the scalac semantics.
OlivierBlanvillain added a commit to dotty-staging/dotty that referenced this pull request Apr 13, 2017
The change in question broke the following pattern, commonly used in name based pattern matching:

```scala
object ProdEmpty {
  def _1: Int = ???
  def _2: String = ???
  def isEmpty = true
  def get = this
}
```

This type defines both `_1` and `get` + `isEmpty` (but is not <: Product). After scala#1938, `ProdEmpty` became eligible for both product and name based patterns. Because "in case of ambiguities, *Product Pattern* is preferred over *Name Based Pattern*", isEmpty wouldn't be used, breaking the scalac semantics.
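
The expected scalac semantics for this extractor can be sketched as below. `ProdEmptyDemo` and `describe` are hypothetical names added for the illustration; the sketch assumes a compiler that consults `isEmpty` for this type, which is the scalac behaviour described above.

```scala
object ProdEmpty {
  def _1: Int = ???
  def _2: String = ???
  def isEmpty = true
  def unapply(s: String): this.type = this
  def get = this
}

object ProdEmptyDemo {
  // Under name-based semantics isEmpty is checked first; since it is
  // always true, the first case never matches and _1/_2 are never called.
  def describe(s: String): String = s match {
    case ProdEmpty(_, _) => "matched"
    case _               => "no match"
  }
}
```

A compiler that treated `ProdEmpty` as a Product Pattern would skip the `isEmpty` check and take the first case instead, which is the incompatibility described above.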