-
Notifications
You must be signed in to change notification settings - Fork 28
SIP-43 - Pattern matching with named fields #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some first comments, without digging too much into the details.
case User(age = _) => "Just wanted to use the extractor, lol!" | ||
``` | ||
|
||
### Reordering of user code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is an issue, because Extractors are supposed to be pure. This is already specified today: the pattern matcher is under no obligation to respect the order in which extractors are evaluated, nor how many times they will be evaluated, if any. I don't think it's a problem to extend that requirement to the def _x
accessors.
Whenever a single named argument is used in pattern, the pattern can have fewer arguments than the unapply provides. This is driven by the motivation the make pattern matching extensible. | ||
But this leads to (arguably small) inconsistency, as pointed out by Lionel Parreaux in the [Scala Contributors Thread](https://contributors.scala-lang.org/t/pattern-matching-with-named-fields/1829/44). This could lead users to use a named pattern, just to skip all parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular aspect troubles me. This will cause to silently ignore fields if we forget some.
I strongly believe we should have explicit syntax to deliberately ignore the remaining fields, if that's what we want. Perhaps something like
case User(name, city = c, _*) =>
with the generalized understanding that _*
deliberately ignores what's left, whether it's single fields or variadic fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sjrd I did not understand what troubles you, normally one destructures to use the extracted value, so what is a case in which forgetting fields causes an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, refactoring.
If I already have some case User(name, city = c) =>
in my code, and then later I change the definition of User
to add a new field, there is a pretty good chance that there are at least some case User(...)
that will have to take this field into account. In many cases, for me, most if not all of my case User
will need to take it into account. I want the compiler to help me with that refactoring. If we silently ignore any omitted field, refactoring becomes risky.
Co-authored-by: Sébastien Doeraene <[email protected]>
Co-authored-by: Sébastien Doeraene <[email protected]>
Co-authored-by: Sébastien Doeraene <[email protected]>
Co-authored-by: Sébastien Doeraene <[email protected]>
Co-authored-by: Sébastien Doeraene <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sjrd Thank you very much for your review. Now I have something to chew on.
|
||
If a pattern with a name is encountered, the compiler looks up list of provided names and places the trees accordingly. | ||
|
||
The list of names either provided by the return type of the unapply method or by the constructor list of the case class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would simplify things. But I have two concerns:
- This would change the signature of all generated
unapply
s. For most cases this should be fine, because the new type is more specific than the old type. I'm unsure about libraries, like slick, which rely on the unapply method. But thats why we have the community build, right? - The current implementation would retroactively add named pattern for libraries compiled with an older scala version. That's maybe a good thing.
I'll give a try and see how much breaks.
Thanks @Jentsch for the SIP and impl proposal! This would be an exciting feature to have in the language. Here's a concern I've read about way back when (https://grokbase.com/t/gg/scala-debate/12ad0n33b0/default-and-named-arguments-in-extractors#20121015a7udamnwcyasnybzqfsq7kmvlq) about this, which I'll reproduce:
I'm really on the fence on this one. If we can find one, I think it would be good to find a way to break this similarity/difference. I'm thinking maybe with some sort of "and ignore the rest" syntax, as in |
IMO this is a plus. The varargs element can also be read as "and ignore the rest". So we're truly generalizing it; not inventing a different concept. |
Yeah, I buy that. Nice. So that would mean: case class Foo(a: Int, b: Int = 42)
assert(Foo(a = 1) == Foo(a = 1, b = 42))
assert(Foo(a = 1) != Foo(a = 1, b = 0))
val Foo(a = x, _*) = Foo(a = 1, b = 42)
val Foo(a = x, _*) = Foo(a = 1, b = 0)
// and
val Foo(a = x, b = y) = Foo(a = 1, b = 0)
val Foo(a = x) = Foo(a = 1, b = 0) // error |
The easiest way is to locally build and publish the compiler, and then use it. Otherwise, you have to convince the compiler team to merge your feature behind a flag. |
Proprietary example In the (closed source, Scala 2) application I work on, we have >1,200 occurrences of four consecutive placeholders in pattern matches. Here's an example monstrosity (class names altered to avoid even the tiniest leakage of proprietary code, but syntax unchanged):
With the proposed syntax including
I think it's much clearer, especially since there are multiple Boolean parameters to Foo and it's very helpful as a reviewer to know which one is being checked. If we don't use
Now
(And of course we could have done that for Open source example I found plenty of examples of multiple placeholders in this dusty old compiler 😄 Here's one example:
Arguably it's fairly readable already to anyone who works on the compiler because vparamss matches the name of the field, and we know
Hypothetical example: It's not that unusual to have multiple fields with the same type, e.g. suppose we have |
Quite a few examples in the wild in Metals and Scalameta This is a very similar domain as @chrisandrews-ms's compiler example, since it's common to match on big case classes when dealing with compiler internals. Another egregious example I've found in my own code: assert(api.routes.collectFirst {
// format: off
case Route(_, List(
RouteSegment.String("campings"),
RouteSegment.String("getByCoolnessAndSize"),
), _, _, _, _, _, _, _, _, _) => ()
// format: on
}.isDefined) (source) It's quite telling that I had to turn off scalafmt for this particular snippet, or else each In proprietary code, I've surely seen similar examples when dealing with database records which tend to be big case classes. |
Hey, sorry for the radio silence here. I've been meaning to reply here for a while, but never got around to doing it. We had an offline discussion a while ago with @odersky on how to make progress here. Some principles that guided the reasoning were:
I also rehashed some of the ideas with @adpi2 this morning. The conclusion I've reached from all of that is that we could design things as follows:
All of that is orthogonal to whether patterns should enforce exhaustivity by default or not. I have nothing to say about that that I haven't said before. Example: class User(case val lastName: String, case val age: Int, case val city: String) extends Product:
def _1: String = lastName
def _2: Int = age
def _3: String = city
object User:
def unapply(u: User): User = u // Product-based extractor
x match
case User(n, a, c) => // all positional
case User(lastName = n, age = a, city = c) => // all named
case User(age = a, city = c, lastName = n) => // all named, order changed
case User(n, age = a, c) => // mix; named arg 'age' is used at its corresponding position
case User(n, city = c, a) => // illegal; named arg city comes before positional arg 'a' and does not correspond to its position If later, we want to deprecate class User(case val name: String, case val age: Int, case val city: String) extends Product:
def _1: String = name
def _2: Int = age
def _3: String = city
@deprecated
def lastName: String = name
end User while retaining backward binary and TASTy compatibility. Some illegal definitions: // Not same number of `case` defs versus `_N` defs
class User(case val name: String, case val age: Int):
def _1: String = name
def _2: Int = age
def _3: String = city
// Type mismatch between `case` def and `_N` def
class User(case val name: String, case val age: Int):
def _1: Int = age
def _2: String = name |
How about |
@sjrd IIUC, in your proposal, fields cannot be omitted in a match? To me, allowing to skip the |
We can omit fields, but only if we add a I really don't want to loose the safety of matching of all fields just because I write an explicit name for a |
I'm also sorry for the radio silence. Here a bulk of thoughts: Keywordwise: Could we use class User(match val name: String, ...) I'll using that syntax for now, to see how much it hurts the eyes. Pairing the name and the position by their position seems a brittle. Checking the types helps, but even the So here an alternative: If a class User(match val name: String, match val age: Int, match val city: String):
def _1: String = name
def _2: Int = age
def _3: String = city
@deprecated
match def lastName: String = name The first position (
Regardless how we recognize aliases, with the option of having fields without corresponding positional arguments we need have to think about the space engine for exhaustiv checking of patterns. Would it be okay to leave here room for future improvement? class User(match val name: String, ...):
match def firstName: String = name.split(" ")(0)
user match:
User(firstName = "Anna", ...) => // is equivalent to
u @ User(...) if u == "Anna" => // so exhaustiv checking is limited
@sjrd What the reasoning for the first rule? Especially because it seems to enforce the second rule / limitation. Regarding using the token And now to the spicy bit: exhaustivity of names I have a question: If we make Or another approach: Instead of deciding at the usage side, would it be better to decide on the declaration side? Like:
@gabro, @chrisandrews-ms, @soronpo: Sorry for not coming back to your examples. I couldn't come up with a questionary that does @sjrd concerns justice. I think every short self containing example is extremly biased against enforcing the usage of all names. |
The reviewers of this proposal are now @smarter, @raulraja, and @chrisandrews-ms. I’ve removed @sjrd from the team of reviewers because he will work actively on the proposal with @Jentsch. |
@sjrd any progress on this one? I think it would be good to discuss this on the next SIP meeting. |
Ouch ... another one that keeps slipping away. :(
Why not? I don't think it collides with any existing syntax, since
The main issue with that is that it makes the body of a val/def part of the public API. In terms of library evolution, that is a big problem. We would need a publicly visible way to ensure that. Regarding brittleness, perhaps we can enforce that if the positions match, then the bodies are aliasing. That enforces internal consistency, while still not exposing the body to the public API. It still does not allow deprecated things, however.
If we can indeed recognize aliases, I don't see why we would need to downgrade exhaustivity like that. What am I missing?
The rationale is that it makes sure that a positional argument at position 2 unambiguously correspond to position 2 in the definitions. The same restriction exists when mixing named and positional arguments in calls: scala> def foo(a: Int, b: Int, c: Int) = (a, b, c)
def foo(a: Int, b: Int, c: Int): (Int, Int, Int)
scala> foo(1, b = 2, 3)
val res0: (Int, Int, Int) = (1,2,3)
scala> foo(1, c = 2, 3)
-- [E171] Type Error: ----------------------------------------------------------
1 |foo(1, c = 2, 3)
|^^^^^^^^^^^^^^^^
|missing argument for parameter b of method foo: (a: Int, b: Int, c: Int): (Int, Int, Int)
1 error found
Yes, absolutely. We can keep it mandatory at this point, and it would be backward compatible to remove that restriction in the future if so desired. |
Is it worth to introduce just |
Since there hasn't been much activity on this one for a long time, we're withdrawing it. This feature, however, is something we're definitely interested in. So, if there's someone interested in picking this proposal up - feel free to reopen the PR or submit a new SIP from scratch! |
Another scheme would be possible if we had named tuples. Named pattern matches would be supported for
Example: object Address:
def unapply(x: Any): Option[(city: String, zip: Int, street: String)]
x match
case Address(city = c, zip = z) => ... I am against an explicit marker for named fields. Object deconstruction is usually by selecting fields, and there's no check that we have selected all the fields. Named pattern matches should be analogous. In my mind, it's a feature that we can add more fields to a class and not go through the ceremony of updating all patterns. In a case where we want to insist on complete matches, just define an unapply that returns a regular tuple. If that gets too large and you want to use names, I'd argue your spec is unreasonable. You should not have a great number of fields and at the same time insist that all fields are matched. So in balance I think I prefer partial named matches. Note that F# does the same thing:
|
Coming from Rust and Haskell, I keep running into exactly this syntactic limitation, so I'm definitely glad to see recent activity on it. The only thing I'd suggest given recent code snippets is that it should also be possible to name the binding separately from the match- for example in cases of an |
State of implementation is here: scala/scala3#15437