SIP-43 - Pattern matching with named fields #44

Closed
wants to merge 21 commits into from
Closed

@Jentsch Jentsch commented Jun 16, 2022

State of implementation is here: scala/scala3#15437

@julienrf julienrf self-assigned this Jun 16, 2022
@julienrf julienrf assigned raulraja and sjrd and unassigned julienrf Jul 1, 2022
@julienrf julienrf changed the title SIP for pattern matching with named fields SIP-43 - Pattern matching with named fields Jul 1, 2022
Member

@sjrd sjrd left a comment

Here are some first comments, without digging too much into the details.

case User(age = _) => "Just wanted to use the extractor, lol!"
```

### Reordering of user code
Member

I don't think this is an issue, because Extractors are supposed to be pure. This is already specified today: the pattern matcher is under no obligation to respect the order in which extractors are evaluated, nor how many times they will be evaluated, if any. I don't think it's a problem to extend that requirement to the def _x accessors.
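
To make the purity requirement concrete, here is a minimal sketch in current Scala 3. The `Even` extractor and the `classify` function are hypothetical names, not taken from the SIP. Because `unapply` has no side effects, the outcome is the same however often the pattern matcher decides to call it.

```scala
object Even:
  // Pure: the result depends only on the argument, so it does not matter
  // how many times (or whether) the pattern matcher evaluates it.
  def unapply(n: Int): Option[Int] =
    if n % 2 == 0 then Some(n / 2) else None

def classify(n: Int): String = n match
  case Even(half) if half > 10 => "large even"
  case Even(_)                 => "even"
  case _                       => "odd"
```

An extractor that incremented a counter or logged on every call would observe behaviour the compiler does not promise to keep stable.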

Comment on lines 164 to 165
Whenever a single named argument is used in a pattern, the pattern can have fewer arguments than the unapply provides. This is driven by the motivation to make pattern matching extensible.
But this leads to (arguably small) inconsistency, as pointed out by Lionel Parreaux in the [Scala Contributors Thread](https://contributors.scala-lang.org/t/pattern-matching-with-named-fields/1829/44). This could lead users to use a named pattern, just to skip all parameters.
Member

This particular aspect troubles me. It will cause fields to be silently ignored if we forget some.

I strongly believe we should have explicit syntax to deliberately ignore the remaining fields, if that's what we want. Perhaps something like

case User(name, city = c, _*) =>

with the generalized understanding that _* deliberately ignores what's left, whether it's single fields or variadic fields.

@sjrd I don't understand what troubles you. Normally one destructures in order to use the extracted values, so in what case does forgetting a field cause an issue?

Member

Basically, refactoring.

If I already have some case User(name, city = c) => in my code, and then later I change the definition of User to add a new field, there is a pretty good chance that there are at least some case User(...) that will have to take this field into account. In many cases, for me, most if not all of my case User will need to take it into account. I want the compiler to help me with that refactoring. If we silently ignore any omitted field, refactoring becomes risky.
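
The refactoring help described here already exists for positional patterns. The following is a minimal sketch in current Scala 3, assuming a hypothetical two-field `User` and a hypothetical `greet` function:

```scala
// Hypothetical two-field version of User, matched with today's positional pattern.
case class User(name: String, city: String)

def greet(u: User): String = u match
  case User(name, city) => s"$name from $city"

// If User later becomes `case class User(name: String, city: String, age: Int)`,
// the pattern `User(name, city)` stops compiling because its arity is wrong,
// so the compiler points at every match that needs another look.
```

The concern is that a named pattern which may omit fields would keep compiling after such a change, so that signal would be lost.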

Author

@Jentsch Jentsch left a comment

@sjrd Thank you very much for your review. Now I have something to chew on.


If a pattern with a name is encountered, the compiler looks up the list of provided names and places the trees accordingly.

The list of names is either provided by the return type of the unapply method or by the constructor list of the case class.
Author

This would simplify things. But I have two concerns:

  • This would change the signature of all generated unapply methods. For most cases this should be fine, because the new type is more specific than the old type. I'm unsure about libraries, like Slick, which rely on the unapply method. But that's why we have the community build, right?
  • The current implementation would retroactively add named patterns for libraries compiled with an older Scala version. That's maybe a good thing.

I'll give it a try and see how much breaks.

@dwijnand
Member

Thanks @Jentsch for the SIP and impl proposal! This would be an exciting feature to have in the language.

Here's a concern I've read about way back when (https://grokbase.com/t/gg/scala-debate/12ad0n33b0/default-and-named-arguments-in-extractors#20121015a7udamnwcyasnybzqfsq7kmvlq) about this, which I'll reproduce:

One potential point of contention: should a missing parameter with a default argument be treated as the _ pattern, or as equal to the default arg? I can see the intuitiveness of _, but I wonder if the broken symmetry is worth it:

case class Foo(a: Int, b: Int = 42)

assert(Foo(a = 1) == Foo(a = 1, b = 42))
assert(Foo(a = 1) != Foo(a = 1, b = 0))

val Foo(a = x) = Foo(a = 1, b = 42)
val Foo(a = x) = Foo(a = 1, b = 0)

I'm really on the fence on this one. If we can find one, I think it would be good to find a way to break this similarity/difference. I'm thinking maybe with some sort of "and ignore the rest" syntax, as in Foo(a = x, <ignore the rest>), but what to use? Foo(a = x, _) looks like one element, Foo(a = x, _*), looks like the last varargs element. So must it be a new token in the language?

@sjrd
Member

sjrd commented Jul 11, 2022

Foo(a = x, _*), looks like the last varargs element

IMO this is a plus. The varargs element can also be read as "and ignore the rest". So we're truly generalizing it; not inventing a different concept.
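
For reference, this is how `_*` already reads in a sequence pattern in current Scala 3. It is a minimal sketch; `firstOrZero` is a hypothetical helper, not part of the proposal.

```scala
// Today's meaning of `_*` in a sequence pattern: match the first element
// and deliberately ignore however many elements follow.
def firstOrZero(xs: List[Int]): Int = xs match
  case List(first, _*) => first
  case _               => 0
```

The suggestion in this thread is to let the same token also stand for "ignore the remaining fields" of a non-variadic extractor.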

@dwijnand
Member

dwijnand commented Jul 11, 2022

Foo(a = x, _*), looks like the last varargs element

IMO this is a plus. The varargs element can also be read as "and ignore the rest". So we're truly generalizing it; not inventing a different concept.

Yeah, I buy that. Nice. So that would mean:

case class Foo(a: Int, b: Int = 42)

assert(Foo(a = 1) == Foo(a = 1, b = 42))
assert(Foo(a = 1) != Foo(a = 1, b = 0))
 
val Foo(a = x, _*) = Foo(a = 1, b = 42)
val Foo(a = x, _*) = Foo(a = 1, b = 0)
// and
val Foo(a = x, b = y) = Foo(a = 1, b = 0)
val Foo(a = x) = Foo(a = 1, b = 0) // error

@julienrf
Contributor

is it easy to ship / package branches for experiments?

The easiest way is to locally build and publish the compiler, and then use it.

Otherwise, you have to convince the compiler team to merge your feature behind a flag.

@chrisandrews-ms

I suspect that applications with a rich model of the real world would benefit more.

Proprietary example

In the (closed source, Scala 2) application I work on, we have >1,200 occurrences of four consecutive placeholders in pattern matches. Here's an example monstrosity (class names altered to avoid even the tiniest leakage of proprietary code, but syntax unchanged):

case Foo(Bar(_, Baz(_, _, _, _, _, _, _), _, _, _, _, true, _), _, v) =>

With the proposed syntax including _* to indicate missing fields, it would be:

case Foo(thing = Bar(flavor = Baz(_*), switch = true, _*), value = v, _*) =>

I think it's much clearer, especially since there are multiple Boolean parameters to Foo and it's very helpful as a reviewer to know which one is being checked.

If we don't use _* to indicate missing fields, it would be:

case Foo(thing = Bar(flavor = Baz(_, _, _, _, _, _, _), switch = true), value = v) =>

Now Foo and Bar are more readable but don't have a nice way to write "any Baz". Perhaps we'd just write:

case Foo(thing = Bar(flavor = _: Baz, switch = true), value = v) =>

(And of course we could have done that for Baz even without this SIP). But people are reluctant to do that because we all know that type tests are a little bit Javaish, and of course it wouldn't work if Baz was some custom extractor.

Open source example

I found plenty of examples of multiple placeholders in this dusty old compiler 😄

Here's one example:

case ctor @ DefDef(_, _, _, vparamss, _, cbody @ Block(cstats, cunit)) =>

Arguably it's fairly readable already to anyone who works on the compiler because vparamss matches the name of the field, and we know Block is a tree so that last part must be the rhs. But I still think it's clearer as:

case ctor @ DefDef(vparamss = vparamss, rhs = cbody @ Block(cstats, cunit)) =>

Hypothetical example:

It's not that unusual to have multiple fields with the same type, e.g. suppose we have case class Color(red: Double, green: Double, blue: Double), then case Color(1.0, 1.0, 0.0) conveys a lot less information than case Color(red = 1.0, green = 1.0, blue = 0.0). After all, if we don't think that names are important to see at usage sites, why do we have named parameters for method calls? I guess it could be argued that we should introduce separate types for Red, Green, and Blue, but that seems a little overkill.

@gabro

gabro commented Aug 19, 2022

Quite a few examples in the wild in Metals and Scalameta

This is a very similar domain to @chrisandrews-ms's compiler example, since it's common to match on big case classes when dealing with compiler internals.

Another egregious example I've found in my own code:

    assert(api.routes.collectFirst {
      // format: off
      case Route(_, List(
        RouteSegment.String("campings"),
        RouteSegment.String("getByCoolnessAndSize"),
      ), _, _, _, _, _, _, _, _, _) => ()
      // format: on
    }.isDefined)

(source)

It's quite telling that I had to turn off scalafmt for this particular snippet, or else each _, would have ended up on a separate line.

In proprietary code, I've surely seen similar examples when dealing with database records which tend to be big case classes.

@sjrd
Member

sjrd commented Nov 18, 2022

Hey, sorry for the radio silence here. I've been meaning to reply here for a while, but never got around to doing it.

We had an offline discussion a while ago with @odersky on how to make progress here. Some principles that guided the reasoning were:

  • We should not compromise on the property of case classes that they can be explained as a desugaring into regular extractors; so no case class-only magic.
  • We don't want annotations to affect type checking.

I also rehashed some of the ideas with @adpi2 this morning.

The conclusion I've reached from all of that is that we could design things as follows:

  • It's based on the idea of case vals I exposed earlier. They are looked up within the type U (Product-based) or S (name-based) used in the reference.
  • We would probably need a different modifier than case, because it would be very confusing that a case val and a case object don't mean the same thing at all. I'll keep using case for now.
  • Stick to the order and number of case definitions in a class to match them up with positional arguments.
  • If a class does not define _N fields, it cannot be extracted with positional arguments; it can only be extracted with named arguments.
  • If it does define _N fields, the number of case defs must match the number of _N fields, and their types must correspond pairwise. This way, we know the link between both for the purposes of exhaustivity.
  • The idea of "deprecated case defs" is dropped. It's still possible to keep a former case def as an @deprecated non-case def to preserve binary and TASTy compatibility; but source compatibility is broken in that situation.
  • Named arguments coming before positional arguments in a pattern must correspond to the positional arguments that would be positioned where they are.
  • A named argument cannot be used for the variadic components of a variadic extractor.

All of that is orthogonal to whether patterns should enforce exhaustivity by default or not. I have nothing to say about that that I haven't said before.

Example:

class User(case val lastName: String, case val age: Int, case val city: String) extends Product:
  def _1: String = lastName
  def _2: Int = age
  def _3: String = city

object User:
  def unapply(u: User): User = u // Product-based extractor

x match
  case User(n, a, c) => // all positional
  case User(lastName = n, age = a, city = c) => // all named
  case User(age = a, city = c, lastName = n) => // all named, order changed
  case User(n, age = a, c) => // mix; named arg 'age' is used at its corresponding position
  case User(n, city = c, a) => // illegal; named arg city comes before positional arg 'a' and does not correspond to its position
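
For comparison, here is a runnable approximation in today's Scala 3 that relies only on existing features. It is a sketch under stated assumptions: the hypothetical `case val` modifier is replaced by plain `val`, so only the positional pattern is available; the `Product` members (`canEqual`, `productArity`, `productElement`) are filled in because the sketch above leaves them out; and `describe` is an illustrative helper.

```scala
final class User(val lastName: String, val age: Int, val city: String) extends Product:
  // The `_N` accessors are what a Product-based match reads.
  def _1: String = lastName
  def _2: Int = age
  def _3: String = city

  // Members required by the Product trait.
  def canEqual(that: Any): Boolean = that.isInstanceOf[User]
  def productArity: Int = 3
  def productElement(n: Int): Any = n match
    case 0 => lastName
    case 1 => age
    case 2 => city
    case _ => throw new IndexOutOfBoundsException(n.toString)

object User:
  // Product-based extractor: returns the scrutinee itself.
  def unapply(u: User): User = u

def describe(x: Any): String = x match
  case User(n, a, c) => s"$n is $a and lives in $c" // all positional
  case _             => "not a user"
```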

If later, we want to deprecate lastName in favor of name, we can do:

class User(case val name: String, case val age: Int, case val city: String) extends Product:
  def _1: String = name
  def _2: Int = age
  def _3: String = city

  @deprecated
  def lastName: String = name
end User

while retaining backward binary and TASTy compatibility.

Some illegal definitions:

// Not same number of `case` defs versus `_N` defs
class User(case val name: String, case val age: Int):
  def _1: String = name
  def _2: Int = age
  def _3: String = city

// Type mismatch between `case` def and `_N` def
class User(case val name: String, case val age: Int):
  def _1: Int = age
  def _2: String = name

@soronpo
Contributor

soronpo commented Nov 18, 2022

  • We would probably need a different modifier than case, because it would be very confusing that a case val and a case object don't mean the same thing at all. I'll keep using case for now.

How about product val? (Soft keyword)

@lrytz
Member

lrytz commented Dec 16, 2022

@sjrd IIUC, in your proposal, fields cannot be omitted in a match? To me, being able to skip the _s is probably the most valuable part of this SIP.

@sjrd
Member

sjrd commented Dec 16, 2022

We can omit fields, but only if we add a , _* at the end. Which we can do whether we use named arguments or not, by the way.

I really don't want to lose the safety of matching all fields just because I write an explicit name for a Boolean field.

@Jentsch
Author

Jentsch commented Feb 3, 2023

I'm also sorry for the radio silence. Here is a batch of thoughts:

Keywordwise: Could we use match as modifier keyword? Like:

class User(match val name: String, ...)

I'll be using that syntax for now, to see how much it hurts the eyes.

Pairing the name and the position by their position seems a bit brittle. Checking the types helps, but even the User example here has two fields with the same type, and Boolean fields are another driver for this SIP. It also ties constructor parameters to the number of fields in the pattern, which is maybe not always wanted. (I'm unsure about it.)

So here is an alternative: if a match def or match val just references another match definition, it becomes an alias of the referenced definition. This is position-independent and allows multiple match defs to reference the same field, including deprecated ones. (The main motivation here is to make the schema position-independent; deprecated names are just a benefit.)

class User(match val name: String, match val age: Int, match val city: String):
  def _1: String = name
  def _2: Int = age
  def _3: String = city

  @deprecated
  match def lastName: String = name

The first position (_1) and lastName are aliases for name, because each just calls name / this.name. We could add another modifier to make this rule more explicit, but that becomes a bit verbose:

class User(match val name: String, ...):
  alias def _1: String = name
  @deprecated
  alias match def lastName: String = name

  • If a class does not define _N fields, it cannot be extracted with positional arguments; it can only be extracted with named arguments.

Regardless of how we recognize aliases, with the option of having fields without corresponding positional arguments we need to think about the space engine for exhaustivity checking of patterns. Would it be okay to leave room here for future improvement?

class User(match val name: String, ...):
  match def firstName: String = name.split(" ")(0)

user match
  case User(firstName = "Anna", ...) => // is equivalent to
  case u @ User(...) if u.firstName == "Anna" =>  // so exhaustivity checking is limited

  • Named arguments coming before positional arguments in a pattern must correspond to the positional arguments that would be positioned where they are.
  • A named argument cannot be used for the variadic components of a variadic extractor.

@sjrd What is the reasoning for the first rule? Especially because it seems to enforce the second rule / limitation.

Regarding using the token _*: I'm still skeptical about it, because of its two different meanings at the same position. Maybe we can use the same token as Rust here: ..?

And now to the spicy bit: exhaustivity of names

I have a question: if we make _* mandatory, can we still phase out the restriction in a later release? If yes, we could make it part of the SIP and then wait for feedback from the community using the feature.

Or another approach: Instead of deciding at the usage side, would it be better to decide on the declaration side? Like:

object User:
  match_all_names def unapply(user: User): User = user

@gabro, @chrisandrews-ms, @soronpo: Sorry for not coming back to your examples. I couldn't come up with a questionnaire that does @sjrd's concerns justice. I think every short, self-contained example is extremely biased against enforcing the usage of all names.

@julienrf julienrf assigned smarter and unassigned sjrd Feb 6, 2023
@julienrf julienrf requested a review from smarter February 6, 2023 09:24
@julienrf
Contributor

julienrf commented Feb 6, 2023

The reviewers of this proposal are now @smarter, @raulraja, and @chrisandrews-ms. I’ve removed @sjrd from the team of reviewers because he will work actively on the proposal with @Jentsch.

@anatoliykmetyuk
Contributor

anatoliykmetyuk commented Aug 23, 2023

@sjrd any progress on this one? I think it would be good to discuss this at the next SIP meeting.

@sjrd
Member

sjrd commented Aug 23, 2023

Ouch ... another one that keeps slipping away. :(

Keywordwise: Could we use match as modifier keyword?

Why not? I don't think it collides with any existing syntax, since match cannot currently start a statement.

Pairing the name and the position by their position seems a bit brittle. [...]

So here is an alternative: if a match def or match val just references another match definition, it becomes an alias of the referenced definition. This is position-independent and allows multiple match defs to reference the same field, including deprecated ones. (The main motivation here is to make the schema position-independent; deprecated names are just a benefit.)

The main issue with that is that it makes the body of a val/def part of the public API. In terms of library evolution, that is a big problem. We would need a publicly visible way to ensure that.

Regarding brittleness, perhaps we can enforce that if the positions match, then the bodies are aliasing. That enforces internal consistency, while still not exposing the body to the public API. It still does not allow deprecated things, however.

Regardless of how we recognize aliases, with the option of having fields without corresponding positional arguments we need to think about the space engine for exhaustivity checking of patterns. Would it be okay to leave room here for future improvement?

class User(match val name: String, ...):
  match def firstName: String = name.split(" ")(0)

user match
  case User(firstName = "Anna", ...) => // is equivalent to
  case u @ User(...) if u.firstName == "Anna" =>  // so exhaustivity checking is limited

If we can indeed recognize aliases, I don't see why we would need to downgrade exhaustivity like that. What am I missing?

  • Named arguments coming before positional arguments in a pattern must correspond to the positional arguments that would be positioned where they are.
  • A named argument cannot be used for the variadic components of a variadic extractor.

@sjrd What is the reasoning for the first rule? Especially because it seems to enforce the second rule / limitation.

The rationale is that it makes sure that a positional argument at position 2 unambiguously corresponds to position 2 in the definitions. The same restriction exists when mixing named and positional arguments in calls:

scala> def foo(a: Int, b: Int, c: Int) = (a, b, c)
def foo(a: Int, b: Int, c: Int): (Int, Int, Int)
                                                                                               
scala> foo(1, b = 2, 3)
val res0: (Int, Int, Int) = (1,2,3)
                                                                                               
scala> foo(1, c = 2, 3)
-- [E171] Type Error: ----------------------------------------------------------
1 |foo(1, c = 2, 3)
  |^^^^^^^^^^^^^^^^
  |missing argument for parameter b of method foo: (a: Int, b: Int, c: Int): (Int, Int, Int)
1 error found

And now to the spicy bit: exhaustivity of names

I have a question: if we make _* mandatory, can we still phase out the restriction in a later release? If yes, we could make it part of the SIP and then wait for feedback from the community using the feature.

Yes, absolutely. We can keep it mandatory at this point, and it would be backward compatible to remove that restriction in the future if so desired.

@soronpo
Contributor

soronpo commented Sep 19, 2023

Is it worth introducing just _* as a separate SIP? IMO it's already worthwhile having, even without named parameters.

@anatoliykmetyuk
Contributor

Since there hasn't been much activity on this one for a long time, we're withdrawing it. This feature, however, is something we're definitely interested in. So, if there's someone interested in picking this proposal up - feel free to reopen the PR or submit a new SIP from scratch!

@SethTisue
Member

SethTisue commented Oct 24, 2023

@lrytz and @dwijnand are looking into it

@odersky
Contributor

odersky commented Nov 22, 2023

Another scheme would be possible if we had named tuples.

Named pattern matches would be supported for

  • case classes
  • extractors that return named tuples

Example:

 object Address:
   def unapply(x: Any): Option[(city: String, zip: Int, street: String)]

   x match
     case Address(city = c, zip = z) => ...

I am against an explicit marker for named fields. Object deconstruction is usually by selecting fields, and there's no check that we have selected all the fields. Named pattern matches should be analogous. In my mind, it's a feature that we can add more fields to a class and not go through the ceremony of updating all patterns. In a case where we want to insist on complete matches, just define an unapply that returns a regular tuple. If that gets too large and you want to use names, I'd argue your spec is unreasonable. You should not have a great number of fields and at the same time insist that all fields are matched. So in balance I think I prefer partial named matches.
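
The analogy with field selection can be sketched in current Scala 3. This is an illustration only: the `Address` case class and the `label` function are hypothetical and use no feature from the proposal.

```scala
// Hypothetical Address class, used only for illustration.
case class Address(city: String, zip: Int, street: String)

// Deconstruction by field selection: only `city` and `zip` are read,
// and nothing checks that `street` (or any field added later) is also used.
def label(a: Address): String =
  val c = a.city
  val z = a.zip
  s"$z $c"
```

Under the scheme argued for here, a partial named pattern such as `case Address(city = c, zip = z)` would behave the same way: adding a field to `Address` would not require touching it.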

Note that F# does the same thing:

The record pattern is used to decompose records to extract the values of fields. The pattern does not have to reference all fields of the record; any omitted fields just do not participate in matching and are not extracted.

@Dessix

Dessix commented Nov 23, 2023

Coming from Rust and Haskell, I keep running into exactly this syntactic limitation, so I'm definitely glad to see recent activity on it.

The only thing I'd suggest, given the recent code snippets, is that it should also be possible to name the binding separately from the match, for example when an Option field should be bound only when it's Some, or to destructure the field for further conditions on its content without creating any actual bindings.
