|
| 1 | +# Dart Tagged Strings (generalized) |
| 2 | + |
| 3 | +Author: @lrhn<br>Version: 1.0 |
| 4 | + |
| 5 | +## Problem statement |
| 6 | + |
| 7 | +String interpolations are awesome. With [interpolation elements][] allowing `if`/`for`/etc.-elements inside an interpolation, maybe even comma-separated expressions, string interpolations will be even better! |
| 8 | + |
| 9 | +Sometimes you need to build something that is not a string, but for which a string template with embedded values would be a useful notation. String interpolations, however, can only create strings. |
| 10 | + |
| 11 | +Taking inspiration from other languages, this is a proposal for “tagged strings”, or “tagged interpolations”: a language feature that allows something that looks like a string literal or string interpolation to be interpreted by user code. The literal is prefixed with an expression, called the “tag” because it’s often a single identifier, and that expression’s value gets access to the individual string parts and expression values of the interpolation. |
| 12 | + |
| 13 | +This is a *generalization* of [Munificent’s feature specification](feature-specification.md "Feature specification"), allowing *interpolation elements* inside interpolations, and allowing any (primary) *expression* as the tag. |
| 14 | + |
| 15 | +[interpolation elements]: https://github.com/dart-lang/language/issues/1478 "String interpolation elements issue" |
| 16 | + |
| 17 | +## Proposal |
| 18 | + |
| 19 | +### Grammar |
| 20 | + |
| 21 | +The grammar is updated by moving some of the current `<primary>` productions to `<primaryOrTag>`, which can produce everything the current `<primary>` can except `<literal>` and `<functionExpression>`, and then adding the following new `<primary>` production: |
| 22 | + |
| 23 | +```ebnf |
| 24 | +<primaryOrTag> ::= |
| 25 | + <thisExpression> |
| 26 | + | `super' <unconditionalAssignableSelector> |
| 27 | + | `super' <argumentPart> |
| 28 | + | <identifier> |
| 29 | + | <newExpression> |
| 30 | + | <constObjectExpression> |
| 31 | + | <constructorInvocation> |
| 32 | + | `(' <expression> `)' |
| 33 | + |
| 34 | +<primary> ::= |
| 35 | + <primaryOrTag> |
| 36 | + | <functionExpression> |
| 37 | + | <literal> |
| 38 | + | <primaryOrTag> <stringLiteral> -- aka. tagged string. |
| 39 | +``` |
| 40 | + |
| 41 | +_(This avoids adjacent `<stringLiteral>`s because a `<stringLiteral>` can itself consist of multiple `<singleLineString>` and/or `<multiLineString>`s, so adjacent `<stringLiteral>`s would be ambiguous.)_ |
| 42 | + |
| 43 | +This updated grammar is incremental and unambiguous. A string literal can occur *as* a `<primary>` or *after* one non-string-literal. The former is the same as the existing grammar, and the latter is not allowed in the existing grammar, so all new syntax was previously invalid syntax, and all existing syntax is still valid and parses the same way. |
| 44 | + |
| 45 | +### Static semantics |
| 46 | + |
| 47 | +All `<primary>` productions that were also allowed by the existing grammar are parsed and treated the same way as the corresponding existing production. |
| 48 | + |
| 49 | +Type inference of `e s`, where `e` is a `<primaryOrTag>` and `s` is a `<stringLiteral>`, with context type scheme `C` proceeds as follows: |
| 50 | + |
| 51 | +* Perform horizontal type inference on `e` and the interpolation elements, `e1` … `en`, of `s`, *as if* inferring types for an invocation `apply(e, [e1, …, en])` in context `C` where `apply` has signature |
| 52 | + |
| 53 | + ```dart |
| 54 | + R Function</*out*/ R, /*in*/ E>(StringInterpolation<R, E> e, List<E> es) |
| 55 | + ``` |
| 56 | + |
| 57 | + _We want the type of elements to be able to affect inference of the string interpolation object, and vice versa, so we use the one kind of inference we have that allows inference direction to depend on available information._ |
| 58 | + |
| 59 | +* Let the elaborated expression `e'` with static type `S` be the type inference result for `e`, and let the elaborated elements `e1'`…`en'` with static element types `S1`…`Sn` be the type inference results for the elements `e1`…`en`. Let `R` and `E` be the inferred type arguments to `apply`. |
| 60 | +
|
| 61 | +* It’s a compile-time error if `S` is not a subtype of `StringInterpolation<Object?, Object?>`. (If `e` would have type `dynamic`, type inference has downcast it to `StringInterpolation<R1, E1>` for some types `R1` and `E1`.) |
| 62 | +
|
| 63 | +* If `S` implements `StringInterpolation<R1, E1>` for some types `R1`, `E1`, then let `R1`, `E1` be those types. |
| 64 | +
|
| 65 | +* Otherwise `S` is a bottom type. Then let `E1` be `dynamic` and `R1` be `Never`. |
| 66 | +
|
| 67 | +* It’s a compile-time error if any of the types `S1`…`Sn` is not a subtype of `E1`. |
| 68 | +
|
| 69 | +* The inference result of `e s` is `e' s'`, where `s'` is `s` with each interpolation expression `e1`…`en` replaced by the corresponding elaborated expression `e1'`…`en'`. The static element type of `s'` is `U`, and the static type of `e' s'` is *T*, which is `R1` coerced to `C` if necessary. |
| 70 | +
|
| 71 | +### Runtime semantics |
| 72 | +
|
| 73 | +Evaluation of `e s` where `e` has static type `S` proceeds as follows: |
| 74 | +
|
| 75 | +* Evaluate `e` to a value *v*. By soundness `S` must not be a bottom type, so it implements `StringInterpolation<R1, E1>` for some types `R1` and `E1`, and therefore *v* must implement `StringInterpolation<R2, E2>` with `R2` \<: `R1` and `E2` \<: `E1` (the latter only until we get variance annotations, then the direction switches). |
| 76 | +
|
| 77 | +* For each single-line or multi-line string, `si`, in `s` in source order: |
| 78 | +
|
| 79 | + 1. Let *p0* be the start of the string literal. _For a multi-line string with only whitespace on the first line, that position is at the start of the next line._ |
| 80 | +
|
| 81 | + 2. Let *p1* be the position of the `$` of the first interpolation in the string literal after *p0*, or the end of the string literal if there are no further string interpolations _(always the case for a raw string)_. |
| 82 | +
|
| 83 | + 3. If *p0* \< *p1*: |
| 84 | +
|
| 85 | + * Let *s* be a string containing the characters denoted by the string literal content from *p0* to *p1*. |
| 86 | + * Invoke the `addString` member of *v* with the value *s*. |
| 87 | +
|
| 88 | + 4. If *p1* is not at the end of the string: |
| 89 | +
|
| 90 | + * Let *ei* be the interpolation element of the interpolation starting at *p1*. |
| 91 | +
|
| 92 | + * Execute *ei* as an element, and for each yielded value *w*, invoke the `add` member of *v* with the value *w*. |
| 93 | +
|
| 94 | + * Let *p0* be the position after the interpolation starting at *p1*. _The position after the identifier or closing `}`._ |
| 95 | + * Goto 2. |
| 96 | +
|
| 97 | +* Invoke the `close` method of *v* with no arguments, and let *r* be the returned value. |
| 98 | +
|
| 99 | +* Then `e s` evaluates to *r*. |
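
The evaluation steps above can be traced by hand in today's Dart. The following is a minimal sketch, not the specified implementation: `StringInterpolation` is the interface proposed in this document (it is redeclared here because it does not exist in `dart:core`), while `Trace`, `expand`, `items` and `last` are hypothetical names made up for illustration. The function body is a manual expansion of what the proposed `trace"n=${for (var i in items) i};" "${last}"` would do, assuming `trace` evaluates to a fresh `Trace`.

```dart
// Proposed interface from this document; not part of dart:core today.
abstract interface class StringInterpolation<R, E> {
  void addString(String string);
  void add(E value);
  R close();
}

// Hypothetical tag that records every call it receives.
final class Trace implements StringInterpolation<List<String>, Object?> {
  final List<String> _calls = [];

  @override
  void addString(String string) => _calls.add('addString($string)');

  @override
  void add(Object? value) => _calls.add('add($value)');

  @override
  List<String> close() => _calls..add('close()');
}

// Manual expansion of the proposed
//   trace"n=${for (var i in items) i};" "${last}"
List<String> expand(List<int> items, int last) {
  final v = Trace(); // Evaluate the tag expression to a value v.

  // First string: "n=${for (var i in items) i};"
  v.addString('n='); // p0 < p1, so the text before the `$` is added.
  for (var i in items) {
    v.add(i); // One `add` call per value yielded by the element.
  }
  v.addString(';'); // Text between the closing `}` and the end.

  // Second string: "${last}"
  // p0 == p1 at the start and again at the end, so no addString calls.
  v.add(last);

  return v.close(); // The result of `close` is the value of `e s`.
}

void main() {
  expand([1, 2], 3).forEach(print);
}
```

Note that the empty string parts around `${last}` produce no `addString` calls, because step 3 only fires when *p0* \< *p1*.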
| 100 | +
|
| 101 | +### Support class |
| 102 | +
|
| 103 | +This supposes an interface definition in the platform libraries (in `dart:core` most likely): |
| 104 | +
|
| 105 | +```dart |
| 106 | +abstract interface class StringInterpolation</*out*/ R, /*in*/ E> { |
| 107 | + void addString(String string); |
| 108 | + void add(E value); |
| 109 | + R close(); |
| 110 | +} |
| 111 | +``` |
| 112 | +
|
| 113 | +An instance of this class should support having `addString` and `add` invoked any number of times in any order, and then a final invocation of `close` should produce a result from those strings and values. Any further calls after calling `close` are allowed to fail. |
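
As an illustration of that contract, here is a small sketch of one possible implementer. It assumes the interface exactly as declared above (redeclared so the example is self-contained); `HtmlTag`, the `html` getter and `renderParagraph` are hypothetical names, and the escaping is deliberately simplistic. Since tagged-string syntax is not available, `renderParagraph` spells out by hand the calls that the proposed `html"<p>${userInput}</p>"` would make.

```dart
// Proposed interface from this document; it does not exist in dart:core today.
abstract interface class StringInterpolation<R, E> {
  void addString(String string);
  void add(E value);
  R close();
}

// Hypothetical tag: literal parts are trusted markup, values are escaped.
final class HtmlTag implements StringInterpolation<String, Object?> {
  final StringBuffer _buffer = StringBuffer();

  @override
  void addString(String string) => _buffer.write(string);

  @override
  void add(Object? value) => _buffer.write(_escape('$value'));

  @override
  String close() => _buffer.toString();

  // Minimal escaping for illustration only; not a complete HTML escaper.
  static String _escape(String text) => text
      .replaceAll('&', '&amp;')
      .replaceAll('<', '&lt;')
      .replaceAll('>', '&gt;');
}

// Hypothetical getter: a fresh collector per use, as a tag name would need.
HtmlTag get html => HtmlTag();

// Hand-written equivalent of the proposed `html"<p>${userInput}</p>"`.
String renderParagraph(String userInput) {
  final tag = html;
  tag.addString('<p>');
  tag.add(userInput);
  tag.addString('</p>');
  return tag.close();
}

void main() {
  print(renderParagraph('a < b')); // <p>a &lt; b</p>
}
```

The distinction between `addString` and `add` is what lets the tag treat source text and interpolated values differently, which a plain string interpolation cannot do.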
| 114 | +
|
| 115 | +## Alternatives and considerations |
| 116 | +
|
| 117 | +### Formatting |
| 118 | +
|
| 119 | +I would suggest formatting a tagged string interpolation with no space between the tag and the string. |
| 120 | +
|
| 121 | +```dart |
| 122 | +var x = color"FF8080"; |
| 123 | +var y = hex"DEADBEEF"; |
| 124 | +Uint8List z = utf8"☃️"; |
| 125 | +var w = Template<B>(defaultB: const B(42))"this is a template<${inject<B>()}> or something"; |
| 126 | +``` |
| 127 | +
|
| 128 | +(This shows that one might want some character `Encoding`s to implement `StringInterpolation`. Or maybe other types that can accept a `String` in some way. Not all of them make sense, but the ones that are mainly conversions, and where it may make sense to apply them to a literal, might.) |
| 129 | +
|
| 130 | +In general, being a primary expression suggests not having internal whitespace. |
| 131 | +
|
| 132 | +### Can’t use `r` as tag. |
| 133 | +
|
| 134 | +Since `r"a"` is a single string, the identifier `r` cannot be used as a tag name without parentheses. Parentheses are allowed since it’s a primary expression, so `(r)"tag${content}"` will work. |
| 135 | +
|
| 136 | +### Not using a live object as tag |
| 137 | +
|
| 138 | +Instead of using the active interpolation as the “tag” value, which likely implies the “tag name” being a getter producing a new value for each use, a tag could be a constant with a factory function creating the actual collector object. |
| 139 | +
|
| 140 | +Nothing much is gained from that. It just postpones the allocation and introduces an extra interface and an extra step at each tagged interpolation. |
| 141 | +
|
| 142 | +On the other hand, asking for a method instead of an interface to create the live value collector object *could* allow an extension method on a non-traditional object: |
| 143 | +
|
| 144 | +```dart |
| 145 | +extension Nyah on int { |
| 146 | + StringInterpolation<int, int> get stringBuilder => _IntEval(); |
| 147 | +} |
| 148 | +void main() { |
| 149 | + print(5"+${4}-${3}"); // Prints 6. |
| 150 | +} |
| 151 | +``` |
| 152 | +
|
| 153 | +Not sure anything *good* can come from allowing that. |
| 154 | +
|
| 155 | +### Providing an iterator instead of calling `add` methods |
| 156 | +
|
| 157 | +Rather than calling `add` methods on a live object, a single method could be called with an iterable that iterates through the strings and values of the interpolation, then the tag implementation can iterate as far as it needs. |
| 158 | +
|
| 159 | +That would complicate the evaluation massively, requiring the implementation to be suspendable after each value of an element. It would effectively introduce iterable literals as `iter"${...[ iterable elements here ]...}"`. |
| 160 | +
|
| 161 | +The only extra power is the ability to stop evaluation early, which is likely just making code less readable. Neither list literals nor string interpolations are lazy, so we don’t need laziness here either. |
| 162 | +
|
| 163 | +### Evaluating all interpolation elements before creating the tag. |
| 164 | +
|
| 165 | +Instead of an iterator, the context could eagerly evaluate all the values, then pass a list of strings and values as a single argument. |
| 166 | +
|
| 167 | +That requires extra allocation that isn’t necessarily needed. If a tag implementation wants a list of all elements and strings, it can build one. If it doesn’t, it can choose not to. By the interpolation not doing anything other than providing the string or value *as soon as possible*, the tag implementation has maximal control and minimal overhead. |
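
To make the trade-off concrete, here is a sketch of a tag that keeps a list of the values but not of the strings. It assumes the `StringInterpolation` interface proposed in this document (redeclared for self-containment); `SqlTag`, `Query` and `findUser` are hypothetical names, and the SQL text is only an example. The body of `findUser` writes out by hand what the proposed `sql"SELECT * FROM users WHERE name = ${name} AND age >= ${minAge}"` would invoke.

```dart
// Proposed interface from this document; not part of dart:core today.
abstract interface class StringInterpolation<R, E> {
  void addString(String string);
  void add(E value);
  R close();
}

// Hypothetical result type: query text with placeholders, plus parameters.
typedef Query = ({String sql, List<Object?> params});

// Hypothetical tag: concatenates the strings eagerly and collects the
// values into a list, because that is what this particular tag needs.
final class SqlTag implements StringInterpolation<Query, Object?> {
  final StringBuffer _sql = StringBuffer();
  final List<Object?> _params = [];

  @override
  void addString(String string) => _sql.write(string);

  @override
  void add(Object? value) {
    _sql.write('?'); // Placeholder in the text; the value travels separately.
    _params.add(value);
  }

  @override
  Query close() => (sql: _sql.toString(), params: _params);
}

// Hand-written equivalent of the proposed tagged string shown above.
Query findUser(String name, int minAge) {
  final v = SqlTag();
  v.addString('SELECT * FROM users WHERE name = ');
  v.add(name);
  v.addString(' AND age >= ');
  v.add(minAge);
  return v.close();
}

void main() {
  final query = findUser('Ann', 30);
  print(query.sql);
  print(query.params);
}
```

A tag that only needs a concatenated string, like a plain `StringBuffer` wrapper, would allocate no list at all under the same protocol.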
| 168 | +
|
| 169 | +### Works with async |
| 170 | +
|
| 171 | +If the function is `async`, then any interpolation element expression can `await`. That just works: the tag implementation doesn’t do anything when not invoked, so it can wait as long as it takes for the next value or a `close`. |
| 172 | +
|
| 173 | +It’s not possible to have a delay inside an interpolation *other* than by using `await`. |
| 174 | +
|
| 175 | +### The tag implementation cannot be async |
| 176 | +
|
| 177 | +The `add` and `addString` methods are not asynchronous. There is no way to *delay* the execution of a string interpolation. Once it starts, it runs to completion unless the element expressions themselves use `await`. If combining the values requires time, the result type must itself be a future, a `StringInterpolation<Future<R>, E>`. |
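
A sketch of that last case, under the same assumptions as the proposal: the `StringInterpolation` interface is redeclared here because it is not in `dart:core`, and `AwaitAll` and `greet` are hypothetical names. The `add` and `addString` calls return immediately; only the value returned by `close` is a future. The body of `greet` is a manual expansion of what a tagged string such as `awaitAll"Hello, ${name}!"` would do.

```dart
// Proposed interface from this document; not part of dart:core today.
abstract interface class StringInterpolation<R, E> {
  void addString(String string);
  void add(E value);
  R close();
}

// Hypothetical tag whose values are futures and whose result is a future.
final class AwaitAll
    implements StringInterpolation<Future<String>, Future<Object?>> {
  final List<Future<Object?>> _parts = [];

  // Synchronous: only stores the string, wrapped so all parts look alike.
  @override
  void addString(String string) => _parts.add(Future.value(string));

  // Synchronous: stores the future without waiting for it.
  @override
  void add(Future<Object?> value) => _parts.add(value);

  // The only place where time can pass, visible in the result type.
  @override
  Future<String> close() async => (await Future.wait(_parts)).join();
}

// Hand-written equivalent of a tagged string with one future-valued element.
Future<String> greet(Future<String> name) {
  final v = AwaitAll();
  v.addString('Hello, ');
  v.add(name);
  v.addString('!');
  return v.close();
}

Future<void> main() async {
  print(await greet(Future.value('Dart'))); // Hello, Dart!
}
```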
| 178 | +
|
| 179 | +### Allowing more than just identifiers as tags |
| 180 | +
|
| 181 | +By allowing many non-literal primary expressions as the “tag”, some mistakes that would be syntax errors become type errors instead. Forgetting a comma in a list can leave `Banana() "banana"`, which gets the error “A 'Banana' value does not implement 'StringInterpolation'.” That *is* worse than the current “Expected to find ','.” |
| 182 | +
|
| 183 | +On the other hand, we could probably allow `<primary> <selector+> <stringLiteral>`, making a string literal a *selector*. |
| 184 | +
|
| 185 | +It can probably safely be restricted to not allow literals or function expressions. Those are incapable of implementing `StringInterpolation` anyway. |
| 186 | +
|
| 187 | +## Versions |
| 188 | +
|
| 189 | +* 1.0 (2024-06-27): Initial version |
| 190 | +
|
| 191 | +
|
| 192 | +
|
| 193 | +
|
| 194 | +
|
| 195 | +
|
| 196 | +
|