Skip to content

Add type-safe "read" methods #1903

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matanlurey opened this issue Apr 13, 2018 · 28 comments
Closed

Add type-safe "read" methods #1903

matanlurey opened this issue Apr 13, 2018 · 28 comments
Assignees
Labels
package:yaml type-enhancement A request for a change that isn't a bug

Comments

@matanlurey
Copy link
Contributor

Very similar to dart-lang/core#39.

Helps address

... and other, unfiled issues that we've hit trying to use package:yaml with Dart2 semantics.

We don't need full coverage of every possible type to start delivering lots of value. Here's my idea:

abstract class YamlMap {
  T read<T>(dynamic key);
  List<T> readList<T>(dynamic key);
  Map<K, V> readMap<K, V>(dynamic key);
}

Example simplified use:

void _parseMap(YamlMap config) {
  // Use <T>
  final name = config.read<String>('name');

  // OR, use inference:
  final String name = config.read('name');

  // Also works well with methods that already expect types
  _validateName(config.read('name'));
}

void _validateName(String name) { ... }

We could guard against trying to read collections, as well:

T read<T>(dynamic key) {
  if (T is Iterable || T is Map) {
    throw new UnsupportedError('Use readList or readMap instead');
  }
  ...
}

Here is an example implementation of readList:

List<T> readList<T>(dynamic key) {
  List<dynamic> list = this[key];
  return list.cast<T>();
}

One trickier case is readMap, for example, values could be a List<String>.

Two ideas:

  1. Special case:
Map<K, V> readMap<K, V>(dynamic key) {
  if (v is Iterable) {
    // Special case, and use readList here with .cast.
  }
}
  1. Add yet-another method:
Map<K, List<V>> readMapList<K, V>(dynamic key);

Thoughts? @srawlins @eseidelGoogle @nex3 @natebosch @jakemac53

@jakemac53
Copy link
Contributor

jakemac53 commented Apr 13, 2018

Pretty much all users of package:yaml will end up needing something like this to run under dart 2. I would prefer to have a shared implemention here then a smattering of implementations everywhere.

@nex3
Copy link
Member

nex3 commented Apr 13, 2018

I don't think this is inherently Dart 2-specific—it's always been important to have good type-checking for YAML files if you want to produce decent error messages. It's just that Dart 2 now reminds users more emphatically of that fact.

I'd really like to provide a general API for type-safe reading of untyped structured data. It's not just YAML where this comes up, but args (as Matan points out), JSON, and cross-isolate messaging. json_rpc_2's Parameters class is an implementation of this idea, but I think it could be generalized. Something like:

/// An untyped value that can be accessed in a typed way.
///
/// All typed accessors throw [FormatException]s if the underlying value isn't
/// of the appropriate type.
abstract class Value {
  bool get asBool;
  int get asInt;
  // ...
  ValueMap get asMap;
  ValueList get asList;

  /// Creates a [Value] that provides access to the given [object].
  factory Value(object);

  /// Asserts that this value is a [String] and passes that string to
  /// [callback].
  ///
  /// If [callback] throws a [FormatException], modifies that [FormatException]
  /// to indicate that it's associated with this value.
  T parse<T>(T callback(String value));
}

/// A [Value] that's known to be a map.
abstract class ValueMap implements Value, Map<Object, Object> {
  // Similar to [YamlMap.nodes].
  Map<Value, Value> get wrapped;

  /// Asserts that this has the given [key] and returns its value.
  ///
  /// Throws a [FormatException] if this doesn't have the given key.
  Value require(Object key);

  /// If this has the given [key], calls [callback] with its value and returns
  /// the result.
  ///
  /// Otherwise, calls [orElse] and returns the result.
  T valueOr<T>(Object key, T callback(Value value), T orElse());
}

This could live in its own package and provide mixins that would make it easy for packages like yaml and args to implement it and provide custom error handling (such as producing SpanFormatExceptions).

@matanlurey
Copy link
Contributor Author

That seems pretty heavyweight compared to 3 methods (lots of wrappers, lots of code to write and test, the overhead of wrapping simple objects - like booleans - in a class), and also just more code to write. I want to write this (assuming config is YamlMap or similar, and log takes a String):

log(config.read('name'))

not

log(config.read('name').asString);

I don't want to try and block the perfect API that anyone wants to implement for there own packages, but as far as something that's both easy to read, write, and consume, and doesn't require creating yet-another shared package, I think the 3x read methods is superior.

@nex3
Copy link
Member

nex3 commented Apr 13, 2018

Those three methods are simple, but they're also difficult to extend, difficult to provide user-friendly errors for, and prone to incorrect typing due to their sensitivity to inference from the surrounding code. I suspect that in practice raw read() calls would be confusing and error-prone enough that a convention would evolve to always include the type parameter, in which case read<String>('name') doesn't provide value over ['name'].asString. It's not even clear to me that it's better than ['name'] as String if it doesn't improve the error message.

I've written a lot of code that parses untyped structured data, and I've always eventually found myself needing more advanced features like intelligent handling of sub-parsing and default values. If we leave those off now we'll just be addressing a third of the problem, and if we don't have a central package for this API to live in we won't have a way of building it out once we want to add more features.

@matanlurey
Copy link
Contributor Author

Given our deadlines I don't think we need more features.

I suspect that in practice raw read() calls would be confusing and error-prone enough that a convention would evolve to always include the type parameter, in which case read<String>('name') doesn't provide value over ['name'].asString.

There's quite a bit of code where <T> is automatically filled in (assignments, passing to methods).

I've written a lot of code that parses untyped structured data, and I've always eventually found myself needing more advanced features like intelligent handling of sub-parsing and default values.

Sure. I'm not talking about that though, I'm talking about 3 methods to make Dart2 palatable.

@natebosch
Copy link
Member

When I used Parameters from json_rpc_2 I found it to be one of the more frustrated bits of the API.

@nex3
Copy link
Member

nex3 commented Apr 13, 2018

Given our deadlines I don't think we need more features.

I'm fine with getting something simple out the door quickly, but it should be in a direction that will allow for more thorough work later. We don't have to add all the methods I outlined, but we should put what methods we do add in a package and interface that can be extended later.

There's quite a bit of code where <T> is automatically filled in (assignments, passing to methods).

There's also quite a bit of code where inferring this will lead to hidden errors that aren't detectable until runtime—if the method you're passing it to changes its argument type, read() will continue statically checking even if it's not compatible at all. That's exactly what the Dart 2 type system was meant to prevent.

@natebosch Can you go into more detail?

@natebosch
Copy link
Member

natebosch commented Apr 13, 2018

Using Parameters mainly doesn't feel like any other Dart code I'm used to writing, including when I interact with other untyped stuff like jsonDecode.

  • I don't want to have to use .value because it's noise.
  • I don't want to use asString because Dart already has a way to do that - as String (luckily I can avoid this one with .value as String).
  • I don't want to use valueOr(something)because Dart already has a way to do that - ?? something. IIRC this one I can't avoid because I'm not allowed to read out the null, I'd have to catch an exception.

I also don't see how the Parameter concept gets us any closer to solving the as List<String> issue... It changes parsed.readList<String>('key') into parsed['key'].asList<String>() I guess? Which also means that even in the cases where we don't need the noise we have the noise of parsed['key'].value

@nex3
Copy link
Member

nex3 commented Apr 13, 2018

You're right that Dart has built-in support for casting, but that support isn't customizable—it only ever throws CastErrors with no information about the value that produced the error. That's fine when you're sure out-of-band that your typing is correct—and in that case I agree that casting is a fine solution today.

But that's not the case for user-provided configuration like you get from YAML or for a protocol with well-defined error-handling behavior like JSON-RPC 2.0. In those situations, it's important for the errors to have the correct type and for them to be very comprehensible by the end user. To do that, you need domain-specific code to have a chance to learn that the type doesn't match and produce a nice error that says "this part of the YAML file was supposed to be a string" (or "this parameter of the RPC").

I don't want to use valueOr(something)because Dart already has a way to do that - ?? something. IIRC this one I can't avoid because I'm not allowed to read out the null, I'd have to catch an exception.

You could do parameters.asMap[key] ?? something, or assign the result of asMap to a variable like you'd assign jsonDecode(string) as Map. It's worth noting that YamlMap takes a different approach here, where the default operator [] returns dynamic and you have to opt in to the version wrapped with extra behavior. I'm not sure which is the better paradigm.

@matanlurey
Copy link
Contributor Author

matanlurey commented Apr 13, 2018

I think in general the consensus here unfortunately is not for Parameter-like behavior.

Given that we have very little time, and there isn't consensus, I'd like to push forward to adding the three read* methods, and we can always revisit this when time isn't a factor and the language team can also be more involved thinking of creative solutions (AFAIK, @leafpetersen and others are very motivated to figure out better patterns for JSON decoding post Dart-2).

@nex3
Copy link
Member

nex3 commented Apr 13, 2018

Three people chiming in on an issue isn't a consensus-gathering process. If you think it's worth spending the time trying for a real consensus, we can write up a doc that both of us agree lays out our positions accurately and send it out to the team for review and discussion. If you believe this needs to be addressed quickly, I've already proposed a compromise: create a small interface that implements as little as possible to make this work well going forward. But I'm not willing to allow the excuse of time constraints to push us into making decisions that aren't future-proof and accumulating more and more technical debt.

As Nate points out, Dart already has a built in mechanism for casting that works pretty well here if you're confident the input data is valid and so don't care about generating user-friendly errors. If you do care about generating nice errors, read<T>() is not a sufficient API.

@natebosch
Copy link
Member

As Nate points out, Dart already has a built in mechanism for casting that works pretty well here

The reason this issue was opened is that these mechanisms stop working in Dart 2 runtime semantics for the Map and List case. as List<String> will fail, but we can make readList<String>(key) work.

The reason I brought up the Dart cast was that it does work for non-generics so the extra asString doesn't add value for me like readList<T>() would, but it imposes a syntax tax on every parameter.

@matanlurey
Copy link
Contributor Author

Natalie, you can't hold this issue hostage because you disagree with the resolution.

I've individually talked to folks the use the YAML package most often

Sure, we can ask the entire of the rest of the team, but it's going to be similar to you asking the rest of the team about a SASS API, they aren't likely to have a lot of context (or desire to get that context).

I look forward to alternative proposals post-Dart2, but I'm planning on moving forward here.

@matanlurey matanlurey added the type-enhancement A request for a change that isn't a bug label Apr 13, 2018
@matanlurey matanlurey self-assigned this Apr 13, 2018
@natebosch
Copy link
Member

For more background on the motivation - here's an example of trying to get yaml parsing into typed collections working in Dart 2. This PR still doesn't solve everything but solves at least the first few issues I hit.

Note especially the awkward Map.map with a List.cast.
https://github.com/dart-lang/build/pull/1295/files#diff-145acce6f4d7418297151f983356688dR257

@nex3
Copy link
Member

nex3 commented Apr 14, 2018

@matanlurey Let's see if we can find a solution that works for all the stakeholders—including pub, test, and json_rpc_2 which all have strong error-reporting needs—before we land something that we'll have to either maintain forever or roll back. This issue has only been open for a few hours, and I've only heard from one user. I'm confident we can find something that works well for all use-cases.

@natebosch Here's what I'd like to see you be able to write there:

return options.require(_buildExtensions).asMap.wrapped
    .map((key, value) => new MapEntry(key.asString, value.asList.ofStrings);
  • You'd be able to remove the existing explicit error checks.
  • If the map comes from a YAML file, those errors are even nicer: they include source span information that points to the exact location of the invalid config.
  • You also get a nice error if a non-string ends up in a map value, where currently you just get a CastError lazily when the list is accessed.
  • Because the same interface that's used to wrap a plain Map would also be implemented by YamlNode, you'd be able to share validation code without needing to translate one into the other beforehand. This code could transparently produce whatever type of error makes the most sense for the input—either an ArgumentError or a SourceSpanFormatException.

Also FYI, I don't think _actualDartType is necessary in that code today—loadYaml() returns normal Dart objects. Only loadYamlNode() will wrap scalar values in YamlScalars, and when you have a YamlMap you need to explicitly access YamlMap.nodes to see YamlScalars—other methods like .map() return normal Dart values. It's designed to act just like JSON decoding, but with the ability to opt in to extra metadata.

@matanlurey
Copy link
Contributor Author

matanlurey commented Apr 14, 2018

I don't see any of those packages as related, sorry. I've talked to the maintainers of several packages who are currently hitting the problem, got some feedback (entirely positive other than your objection, which is noted) and I am pushing for moving forward here.

Here is my first pass at completing this feature: dart-archive/yaml#41.

@itsjoeconway
Copy link

Running into some of this as well. Some thoughts:

I prefer using a generic 'read' method. In an ideal world, type parameters to functions would be available at runtime and the type check is performed in the read. This is a common pattern in languages that support generics. If looking towards the future, I'd suspect that Dart will eventually have this ability.

T read<T>(dynamic key, {T onCastError(CastError e)}) {
  try {
    return this[key] as T;
  } on CastError catch (e) {  
    if (onCastError == null) rethrow;
    return onCastError(e);
  }
}

It's good practice, imo, to transition dynamic data structures like JSON and YAML into a concrete type where each key is a property of that type. Again in an ideal world, the type parameter is inferred to avoid specifying the type at the call-site:

class T implements YamlDecodable {
  String key;
  
  void decode(YamlMap map) {
    key = map.read('key');
  }
}

This then supports nested objects of a decodable type:

class U implements YamlDecodable {
  T nested;
  
  void decode(YamlMap map) {
    nested = map.readObject('nested');
  }
}

Using annotations + code generation seems like a good next step. My hunch is that a code generator would be much easier to write when using generic methods.

Here is a real world example of this. This could be further improved if generic methods could be overloaded, but it's not terrible. (FYI, Dart2 will force decode to be split into decode and decodeObject, if you read that far into it.)

(Additionally, asMap() is preferable to asMap because of the ability to override and provide optional arguments. asMap also looks like a member method reference to me and easy to mistake.)

@leafpetersen
Copy link
Member

I've been thinking about ways to make the json decoding API work nicely with types, and I wonder if the same approach might work here. The idea I'd like to push on is providing a way to build up schema descriptions and use that either to convert untyped parsed json to a typed format (lazily or eagerly), or better to drive the parsing itself so that we could parse directly into a typed format. I think the resulting client API would look something like this:

import 'cast.dart' as cast;

const c = cast.Keyed<String, dynamic>({
  "id": cast.int,
  "fields": cast.Keyed({"field1": cast.int, "field2": cast.int})
});

void test(dynamic json) {
  // Correct reified type
  Map<String, dynamic> result = c.castJson(json);
  int id = result["id"];
  // Correct reified type
  Map<String, int> fields = result["field"];
}

void main() {
  var json = <String, dynamic>{
    "id": 0,
    "fields": <String, dynamic>{"field1": 1, "field2": 2}
  };

  test(json);
}

This seems like it might work well for yaml as well. Is it worth exploring this further in this context?

@jakemac53
Copy link
Contributor

jakemac53 commented Apr 18, 2018

I wonder how error reporting might work with an api like this @leafpetersen ? Especially with the nested casts - it is important to be able to report the full context of which field failed to parse and why.

@matanlurey
Copy link
Contributor Author

There are definitely ~3 (or 4) different ways to handle YAML/JSON structured documents in Dart.

Let's take the following example from above:

{
  "id": 101,
  "fields": {
    "field1": 1,
    "field2": 2,
  }
}

Cowboy Style

... Or, have an intimate knowledge of Dart's type system and collections.

This is basically what is required today out of the box. You'd have to write:

void decodeJson(dynamic jsonResponse) {
  final map = jsonResponse as Map<String, dynamic>;
  final id = map['id'] as int;
  final fields = map['fields'] as Map<String, dynamic>;
  final field1 = fields['field1'] as int;
  ...  
}

... note, this is different for YAML (where there is no type guarantees, even for Map). If you get one of these explicit or implicit casts wrong, you'll get a CastError at runtime in strong-mode runtimes.

PROs:

  • Least overhead (no generated code, could use implicit casts to help Dart2JS in some places)
  • Only uses built-in language/libraries

CONs:

  • Requires PhD in Dart 2.0
  • Strategy differs depending on serialization format (JSON, YAML, ...)
  • Runtime errors only that are difficult to debug
  • No tooling assists (better keep referencing that JSON/YAML schema!)

Assisted Style

... Or, my idea above, using read* methods (or similar) to assist with casts:

void decodeJson(dynamic jsonResponse) {
  final map = jsonResponse as Map;
  final id = readTyped<int>(map, 'id');
  final fields = readTypedMap<String, dynamic>(map, 'fields');
  final field1 = readTyped<int>(fields, 'field1');
  ...  
}

PROs:

  • Similar overhead to "cowboy style" if enough inlining happens in the VM/Dart2JS

  • Plays really nicely with type inference in some cases, i.e:

    printAListOfStrings(List<String> x) {}
    
    decodeJson(jsonResponse) {
      // Automatically inserts <String> below using type inference.
      printAListOfStrings(readTypedList(jsonResponse['names']));
    }

CONs:

  • Still requires (nearly) a PhD in Dart 2.0
  • Runtime errors only (though you could catch CastError and give nicer errors?)
  • No tooling assists (better keep referencing that JSON/YAML schema!)

Structured casts

Or, @leafpetersen's suggestion above, so I won't copy and paste sample code.

PROs:

  • Does "the right thing" reification-wise
  • Fairly lightweight (no build system required)
  • A bit more intuitive to the average user (requires less Dart 2.0 knowledge if used)

CONs:

  • Runtime errors only (though you could catch CastError and give nicer errors?)
  • No tooling assists even though you spent the time defining a schema in imperative Dart code

Build system-based serialization

Or, @kevmoo's json_serializable or @davidmorgan's build_value, as examples.

PROs:

  • Does "the right thing" reification-wise
  • Tooling, glorious tooling (compile-time errors, auto-complete)

CONs:

  • Requires a build system, which is awkward in Dart, hard to show as copy-paste sample code
  • Heavyweight (or will at least seem so compare to other languages, even if not true)

Built-in language support

... i.e. something like value types or external types (@JS() might be an example today).

Unless this is planned and will ship in 6-12 months, it's probably not worth considering at all.

@matanlurey
Copy link
Contributor Author

matanlurey commented Apr 18, 2018

@leafpetersen:

This seems like it might work well for yaml as well. Is it worth exploring this further in this context?

See above. My worry with that suggestion is that while it does somewhat solve the issue of how to deserialize tightly structured unstructured data, and remove some of the need to become a Dart 2.0 PhD, you will still require at least a masters degree to handle more complex pieces of structured data.

It also doesn't provide any tooling or assists other than at runtime. So If I want to access document -> fields -> field1 -> name, I will need to do entirely through dynamic['fields']['field1']['name'] or insert casts and hope they are correct. You are basically describing the entire document schema, but all you get as a benefit is that Dart doesn't throw a CastError if used properly.

@natebosch
Copy link
Member

Another thing that I'd like our solution to allow is handling flexible schemas.

For instance if I have a field which must end up being a List<String> I might want to allow my user to enter only a String to avoid the extra syntax noise. Compare:

targets:
  $defaults:
    sources: "lib/**"
targets:
  $default:
    sources:
      - "lib/**"

It's a minor difference, but one that I think makes the configuration feel more friendly.

I don't think any of the approaches discussed have explicit support for this, but some of them are likely to get in the way.

I'll add a con for the "cowboy style" which is solved by the others - it requires different solutions for different fields. It's safe to as int, but not safe to as List<int> so in those cases we need to use something like (field as List).cast<int> and it feels very inconsistent.

@matanlurey
Copy link
Contributor Author

True that's probably the biggest concern. Two patterns for casting (at least if not more depending on the structure and type).

@leafpetersen
Copy link
Member

For instance if I have a field which must end up being a List I might want to allow my user to enter only a String to avoid the extra syntax noise. Compare:

I don't think this is hard to support:

import 'cast.dart' as cast;

const c = cast.Keyed<String, dynamic>({
  "id": cast.int,
  "fields": cast.Keyed({"field1": cast.int, "field2": cast.int}),
  "extra1": cast.OneOf(cast.String, cast.List(cast.String)),
  "extra2": cast.OneOf(cast.String, cast.List(cast.String)),
});

void test(dynamic json) {
  // Correct reified type
  Map<String, dynamic> result = c.castJson(json);
  int id = result["id"];
  Map<String, int> fields = result["field"];
  String extra1 = result["extra1"];
  List<String> extra2 = result["extra2"];
}

void main() {
  var json = <String, dynamic>{
    "id": 0,
    "fields": <String, dynamic>{"field1": 1, "field2": 2},
    "extra1": "hello",
    "extra2": <dynamic>["hello"],
  };

  test(json);
}

@leafpetersen
Copy link
Member

It also doesn't provide any tooling or assists other than at runtime. So If I want to access document -> fields -> field1 -> name, I will need to do entirely through dynamic['fields']['field1']['name'] or insert casts and hope they are correct.

To the extent that the result has a type expressible in Dart, this is not true: the result type of the cast will be as precise as possible. But it's true that absent union types, as precise as possible is not all that precise. :(

I think this approach could be made to support inserting user defined wrapper classes as you parse so that you could parse directly into a fully nominally typed API, but I haven't tried to write that API.

@leafpetersen
Copy link
Member

I wonder how error reporting might work with an api like this @leafpetersen ? Especially with the nested casts - it is important to be able to report the full context of which field failed to parse and why.

It doesn't seem harder to me than error reporting in general, but I'm not sure if that's helpful or not. Certainly when decoding from already parsed json, you can give essentially a stack trace showing the path through the json object that led to the error (probably at some cost to performance). Not sure if that's good enough or not. If you're parsing directly from a string, I think it's basically the same, but haven't worked it through.

@matanlurey
Copy link
Contributor Author

I talked to @leafpetersen offline, and his cast library seems to be a decent approach - specifically because implementations could offer optimized ways of creating these structures (helpful, especially for JSON - less so for YAML since there isn't a "native" parser).

@leafpetersen do you think this will still happen soon (TM), or should we have a stop-gap? I think it would be very reasonable to put something in package:convert for now, and if we want to add an optimized version in dart:convert in the future I wouldn't be opposed.

@matanlurey
Copy link
Contributor Author

So this didn't end up happening and probably nobody was better for it. Oh well :-/

@mosuem mosuem transferred this issue from dart-archive/yaml Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:yaml type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants