-
Notifications
You must be signed in to change notification settings - Fork 310
Systematically survey message content for unimplemented features. #917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
95688ba
to
5728c5f
Compare
e9133c3
to
43682cd
Compare
Should be ready for review once the CI passes |
ba7190e
to
b0d8d5a
Compare
Pushed some typo fixes. |
Still needs to support:
|
e87250c
to
1a20f60
Compare
Exciting!! Would it be easy to also include the number of messages with each feature, in the output of |
Yeah, it should be straightforward. |
1a20f60
to
66466f7
Compare
fbc0745
to
0bc1c06
Compare
TODO: |
c6e9c88
to
ad9f338
Compare
be9d79a
to
fb790a7
Compare
Nice! This worked for me locally :) and nothing stood out to me from a quick skim of the code. Marking for @gnprice's review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @PIG208 for building this, and @chrisbobbe for the previous review!
I'm not yet done reading this, but initial comments below. Most are small.
pubspec.yaml
Outdated
ini: ^2.1.0 | ||
# Keep list sorted when adding dependencies; it helps prevent merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note comment 🙂
ini: | ||
dependency: "direct dev" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit message understates this change:
deps: Make args and ini direct dev dependencies.
For args
we're just converting it from a transitive dependency to a direct (dev) dependency.
But for ini
the change is bigger: it wasn't a dependency at all, and now it is one.
So the commit message should describe the most important change. Then the args
part is minor, and is fine to squash into the same commit and just mention in passing.
/// See also: | ||
/// * lib/model/content.dart, which implements of the content parser. | ||
/// * tools/content/fetch_messages.dart, which produces the corpuses. | ||
void main() async { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test can be run manually via:
`flutter test --dart-define=corpusDir=path/to/corpusDir tools/content`
This is a pretty funky command line.
There's a good reason it's done this way, and we smooth it over in the end by having a nice wrapper script. But the wrapper script should appear in the same commit as this file does, so that the "test" file offers a reasonable way to run it from the beginning.
If you want to split the changes into two commits, probably a good way to do so would be one commit for fetching and another for parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting them in the same commit makes sense. There is not much complexity that can be resolved by a sequence of separate ones.
@@ -0,0 +1,147 @@ | |||
// Override `fluter test`'s default timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no #!
line ("shebang line") here, so trying to execute it directly won't work. It therefore shouldn't have the executable flag set (unlike check-features
).
tools/content/fetch_messages.dart
Outdated
@@ -0,0 +1,223 @@ | |||
#!/usr/bin/env dart |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably cleanest to leave the shebang line (and the executable flag) off of this file, too. Then it's clear there's one intended way to invoke this functionality, via the check-features
wrapper script.
tools/content/fetch_messages.dart
Outdated
exit(0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this exit
call differ from simply falling off the end of main
here and returning?
tools/content/fetch_messages.dart
Outdated
final fetchNewer = parsedArguments['fetch-newer'] as bool; | ||
int? anchorMessageId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a CLI program I think it's generally helpful to split the parsing of the command line (and of config that's closely tied to the command line) from the rest of the logic.
So between these two lines is basically where that transition happens — we're done inspecting parsedArguments
(and parsedConfig
), and we're starting to do the work and to have side effects. It'd therefore be good to take the bottom half of the function, below this point, and move it to its own function; main
would end by calling that function, passing a bunch of arguments like email
and outputDirStr
.
tools/content/model.dart
Outdated
import 'package:json_annotation/json_annotation.dart'; | ||
|
||
/// A data structure representing a message. | ||
@JsonSerializable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this JsonSerializable
(and so the json_annotation
import) doesn't end up getting used.
/// * lib/model/content.dart, which implements of the content parser. | ||
/// * tools/content/fetch_messages.dart, which produces the corpuses. | ||
void main() async { | ||
Future<void> checkForUnimplementedFeatureInFile(File file) async { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
Future<void> checkForUnimplementedFeatureInFile(File file) async { | |
Future<void> checkForUnimplementedFeaturesInFile(File file) async { |
It's checking for any possible unimplemented features, right? Vs. some particular unimplemented feature.
if (htmlNode.className.isEmpty) { | ||
featureName = '<${htmlNode.localName!}>'; | ||
} else { | ||
featureName = '<${htmlNode.localName!} class="${htmlNode.classes.join(" ")}">'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
featureName = '<${htmlNode.localName!} class="${htmlNode.classes.join(" ")}">'; | |
featureName = '<${htmlNode.localName!} class="${htmlNode.className}">'; |
Or is the difference useful?
(className
is the original data; classes
is a view on it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The classes
view just trims each individual name, I think it is not necessary here.
fb790a7
to
8b662c7
Compare
Thanks for the review! The PR has been updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've read through the whole thing — and I've also now run it myself, and browsed through the output. Thanks again @PIG208 for building this!
Below are a few small comments. Just one (the last) affects behavior; it's a papercut I ran into when using the script.
Then after these small things are fixed, I'd like to go ahead and merge this. It works and has served its purpose; any deeper changes can wait until a possible future where we're using this script more extensively.
(I discovered a few new tidbits from reading through the output myself: #921 (comment) . Definitely glad the script makes nice detailed output to look at.)
tools/content/fetch_messages.dart
Outdated
required bool fetchNewer, | ||
}) async { | ||
int? anchorMessageId; | ||
IOSink output = stdout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This initializer never gets used now. Can delete this line, and just say final output = …
below.
tools/content/fetch_messages.dart
Outdated
|
||
// Avoid any Flutter-related dependencies so this can be run as a CLI program. | ||
import 'package:args/args.dart'; | ||
import 'package:http/http.dart'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
import 'package:http/http.dart'; | |
import 'package:http/http.dart' as http; |
Otherwise the names like Client
are too generic.
tools/content/fetch_messages.dart
Outdated
// This fallback will only be used when first fetching from a server. | ||
'anchor': anchorMessageId != null ? jsonEncode(anchorMessageId) : 'newest', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this logic really makes most sense in the caller
// This fallback will only be used when first fetching from a server. | |
'anchor': anchorMessageId != null ? jsonEncode(anchorMessageId) : 'newest', | |
'anchor': anchorString, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then newest
effectively assumes fetchNewer
is false, right? Probably just say oldest
instead if fetchNewer
is true.
final outputLines = <String>[]; | ||
int failedMessageCount = 0; | ||
if (messageIdsByFeature.isNotEmpty) { | ||
failedMessageCount = messageIdsByFeature.values.map((x) => x.length).reduce((a, b) => a + b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: line too long (the a + b
, which is a critical piece of its meaning, is past 80 columns)
Could put .reduce
on the next line; or could use .sum
, from package:collection
.
// `_walk` modifies `messageIdsByFeature` and `contentsByFeature` | ||
// in-place. | ||
_walk(message.id, parseContent(message.content).toDiagnosticsNode(), | ||
messageIdsByFeature: messageIdsByFeature, | ||
contentsByFeature: contentsByFeature); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This structure works fine, and it's not worth spending time to rework it in this PR — this script works, so I'd like to merge it after just fixing some nits.
But FWIW a useful alternative way to structure this sort of thing is to make a class that would have messageIdsByFeature
and contentsByFeature
(and for that matter totalMessageCount
) as fields. Then we'd _walk
would become a method on that class; we'd construct an instance of that class just before the loop, and call the method in the loop.
// This buffer allows us to avoid using prints directly. | ||
final outputLines = <String>[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the usual Dart solution here would be:
// This buffer allows us to avoid using prints directly. | |
final outputLines = <String>[]; | |
final buf = StringBuffer(); |
and then buf.write('foo\n')
below in place of outputLines.add('foo')
.
That's a little cleaner, partly because it's more generic: it doesn't need \n
to be treated differently from other characters.
outputLines.addAll([ | ||
'Unsupported feature #$unsupportedCounter: $featureName', | ||
'message IDs:\n${messageIdsByFeature[featureName]!.join(', ')}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputLines.addAll([ | |
'Unsupported feature #$unsupportedCounter: $featureName', | |
'message IDs:\n${messageIdsByFeature[featureName]!.join(', ')}', | |
final messageIds = messageIdsByFeature[featureName]!; | |
outputLines.addAll([ | |
'Unsupported feature #$unsupportedCounter: $featureName', | |
'message IDs (up to 100): ${messageIds.take(100).join(', ')}', |
As is, this list is long enough that it gets in the way for very high-frequency unimplemented features, like Twitter previews.
Additionally, args is made a direct dev dependency. We will later use them to write CLI scripts for fetching messages. Signed-off-by: Zixuan James Li <[email protected]>
… features. We added 2 scripts and a wrapper for them both. - fetch_messages.dart, the script that fetches messages from a given Zulip server, that does not depend on Flutter or other involved Zulip Flutter packages, so that it can run without Flutter. It is meant to be run first to produce the corpora needed for surveying the unimplemented features. The fetched messages are formatted in JSON Lines format, where each individual entry is JSON containing the message ID and the rendered HTML content. The script stores output in separate files for messages from each server, because message IDs are not unique across them. - unimplemented_features_test.dart, a test that goes over all messages collected, parses then with the content parser, and report the unimplemented features it discovered. This is implemented as a test mainly because of its dependency on the content parser, which depends on the Flutter engine (and `flutter test` conveniently sets up a test device). We mostly avoid prints (https://dart.dev/tools/linter-rules/avoid_print) in both scripts. While we don't lose much by disabling this lint rule for them, because they are supposed to be CLI programs after all, the rule (potentially) helps with reducing developer inclination to be verbose. See comments from the scripts for more details on the implementations. ===== Some main benefits of having the wrapper script to access dart code are that we can provide a more intuitive interface consistent with other tools, for fetching message corpora and/or running the check for unimplemented features. Very rarely, you might want to use fetch_messages.dart directly, to use the `fetch-newer` flag for example to update an existing corpus file. If we find it helpful, the flag can be added to check-features as well, but we are skipping that for now. The script is intended to be run manually, not as a part of the CI, because it is very slow, and it relies on some out of tree files like API configs (zuliprc files) and big dumps of chat history. For the most part, we intend to only keep the detailed explanations in the underlying scripts close to the implementation, and selectively repeat some of the helpful information in the wrapper. This also repeats some easy checks for options, so that we can produce nicer error messages for some common errors (like missing zuliprc for `fetch`). Fixes: zulip#190 Signed-off-by: Zixuan James Li <[email protected]>
8b662c7
to
1540427
Compare
Thanks for the review! I have updated the PR skipping the |
Thanks for the revision! All looks good — merging. |
Currently a WIP. The code needs a bit of cleanup.