Systematically survey message content for unimplemented features. #917

PIG208 · 2024-08-27T21:44:47Z

Currently a WIP. The code needs a bit of cleanup.

PIG208 · 2024-08-29T06:56:38Z

Should be ready for review once CI passes.

PIG208 · 2024-08-29T18:38:50Z

Pushed some typo fixes.

PIG208 · 2024-08-29T19:47:07Z

Still needs to support:

Run it on all public messages in realms that publicly list themselves as open communities. For example, have the script sign up a test user in each realm and log in as that test user.

chrisbobbe · 2024-08-30T03:15:43Z

Exciting!!

Would it be easy to also include the number of messages with each feature, in the output of unimplemented_features_test.dart?

PIG208 · 2024-08-30T03:17:31Z

Yeah, it should be straightforward.

PIG208 · 2024-09-07T05:13:45Z

TODO: ~~Add this to tools/check making it easier to run.~~

chrisbobbe · 2024-10-08T22:06:30Z

Nice! This worked for me locally :) and nothing stood out to me from a quick skim of the code. Marking for @gnprice's review.

gnprice

Thanks @PIG208 for building this, and @chrisbobbe for the previous review!

I'm not yet done reading this, but initial comments below. Most are small.

gnprice · 2024-10-11T00:54:58Z

pubspec.yaml

+  ini: ^2.1.0
  # Keep list sorted when adding dependencies; it helps prevent merge conflicts.


Note the comment 🙂

gnprice · 2024-10-11T00:56:34Z

pubspec.lock

+  ini:
+    dependency: "direct dev"


The commit message understates this change:

deps: Make args and ini direct dev dependencies.

For args we're just converting it from a transitive dependency to a direct (dev) dependency.

But for ini the change is bigger: it wasn't a dependency at all, and now it is one.

So the commit message should describe the most important change. Then the args part is minor, and is fine to squash into the same commit and just mention in passing.

gnprice · 2024-10-11T01:00:39Z

tools/content/unimplemented_features_test.dart

+/// See also:
+/// * lib/model/content.dart, which implements of the content parser.
+/// * tools/content/fetch_messages.dart, which produces the corpuses.
+void main() async {


The test can be run manually via: `flutter test --dart-define=corpusDir=path/to/corpusDir tools/content`

This is a pretty funky command line.

There's a good reason it's done this way, and we smooth it over in the end by having a nice wrapper script. But the wrapper script should appear in the same commit as this file does, so that the "test" file offers a reasonable way to run it from the beginning.

If you want to split the changes into two commits, probably a good way to do so would be one commit for fetching and another for parsing.

Putting them in the same commit makes sense. There isn't much complexity that splitting them into separate commits would help untangle.

gnprice · 2024-10-11T01:02:33Z

tools/content/unimplemented_features_test.dart

@@ -0,0 +1,147 @@
+// Override `fluter test`'s default timeout


There's no #! line ("shebang line") here, so trying to execute it directly won't work. It therefore shouldn't have the executable flag set (unlike check-features).

gnprice · 2024-10-11T01:03:48Z

tools/content/fetch_messages.dart

@@ -0,0 +1,223 @@
+#!/usr/bin/env dart


Probably cleanest to leave the shebang line (and the executable flag) off of this file, too. Then it's clear there's one intended way to invoke this functionality, via the check-features wrapper script.

gnprice · 2024-10-11T01:15:57Z

tools/content/fetch_messages.dart

+  exit(0);
+}


Does this exit call differ from simply falling off the end of main here and returning?

gnprice · 2024-10-11T01:21:00Z

tools/content/fetch_messages.dart

+  final fetchNewer = parsedArguments['fetch-newer'] as bool;
+  int? anchorMessageId;


In a CLI program I think it's generally helpful to split the parsing of the command line (and of config that's closely tied to the command line) from the rest of the logic.

So between these two lines is basically where that transition happens — we're done inspecting parsedArguments (and parsedConfig), and we're starting to do the work and to have side effects. It'd therefore be good to take the bottom half of the function, below this point, and move it to its own function; main would end by calling that function, passing a bunch of arguments like email and outputDirStr.
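The split suggested above can be sketched as follows. This is a minimal illustration, not the PR's code: `FetchSettings`, `parseSettings`, `fetchMessages`, and the `--email`/`--output` flags are hypothetical names, and a tiny hand-rolled parser stands in for package:args so the example has no dependencies.

```dart
/// Hypothetical bundle of parsed settings. The field names echo variables
/// from the review (email, outputDirStr, fetchNewer) but are illustrative.
class FetchSettings {
  const FetchSettings({
    required this.email,
    required this.outputDirStr,
    required this.fetchNewer,
  });

  final String email;
  final String outputDirStr;
  final bool fetchNewer;
}

/// Step 1: interpret the command line. Pure: no I/O, no side effects.
/// (A stand-in for the real package:args-based parsing.)
FetchSettings parseSettings(List<String> args) {
  String? email;
  String? outputDirStr;
  var fetchNewer = false;
  for (var i = 0; i < args.length; i++) {
    final arg = args[i];
    if (arg == '--fetch-newer') {
      fetchNewer = true;
      continue;
    }
    if (arg != '--email' && arg != '--output') {
      throw ArgumentError('Unrecognized argument: $arg');
    }
    if (i + 1 >= args.length) {
      throw ArgumentError('Missing value for $arg');
    }
    final value = args[++i];
    if (arg == '--email') {
      email = value;
    } else {
      outputDirStr = value;
    }
  }
  if (email == null || outputDirStr == null) {
    throw ArgumentError('--email and --output are required');
  }
  return FetchSettings(
      email: email, outputDirStr: outputDirStr, fetchNewer: fetchNewer);
}

/// Step 2: do the actual work. All side effects (network requests,
/// opening and writing the output file) belong in here.
Future<void> fetchMessages({
  required String email,
  required String outputDirStr,
  required bool fetchNewer,
}) async {
  // Fetching and writing would happen here.
}
```

`main` would then just call `parseSettings(args)` and pass the resulting fields to `fetchMessages`, so everything before that call stays free of side effects.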

gnprice · 2024-10-11T01:22:32Z

tools/content/model.dart

+import 'package:json_annotation/json_annotation.dart';
+
+/// A data structure representing a message.
+@JsonSerializable()


It looks like this JsonSerializable (and so the json_annotation import) doesn't end up getting used.

gnprice · 2024-10-11T01:25:02Z

tools/content/unimplemented_features_test.dart

+/// * lib/model/content.dart, which implements of the content parser.
+/// * tools/content/fetch_messages.dart, which produces the corpuses.
+void main() async {
+  Future<void> checkForUnimplementedFeatureInFile(File file) async {


nit:

Suggested change

Future<void> checkForUnimplementedFeatureInFile(File file) async {

Future<void> checkForUnimplementedFeaturesInFile(File file) async {

It's checking for any possible unimplemented features, right? Vs. some particular unimplemented feature.

gnprice · 2024-10-11T01:29:01Z

tools/content/unimplemented_features_test.dart

+    if (htmlNode.className.isEmpty) {
+      featureName = '<${htmlNode.localName!}>';
+    } else {
+      featureName = '<${htmlNode.localName!} class="${htmlNode.classes.join(" ")}">';


Suggested change

featureName = '<${htmlNode.localName!} class="${htmlNode.classes.join(" ")}">';

featureName = '<${htmlNode.localName!} class="${htmlNode.className}">';

Or is the difference useful?

(className is the original data; classes is a view on it.)

The classes view just trims each individual name; I don't think that is necessary here.
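For illustration, a hypothetical helper (`featureNameFor` is not from the PR) that builds the feature label from the raw `className` string, as in the suggested change:

```dart
/// Hypothetical helper: builds the label used to group unimplemented HTML.
/// Uses the raw class attribute string rather than re-joining a parsed
/// view of the individual class names.
String featureNameFor(String localName, String className) {
  if (className.isEmpty) return '<$localName>';
  return '<$localName class="$className">';
}
```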

PIG208 · 2024-10-11T21:12:01Z

Thanks for the review! The PR has been updated.

gnprice

OK, I've read through the whole thing — and I've also now run it myself, and browsed through the output. Thanks again @PIG208 for building this!

Below are a few small comments. Just one (the last) affects behavior; it's a papercut I ran into when using the script.

Then after these small things are fixed, I'd like to go ahead and merge this. It works and has served its purpose; any deeper changes can wait until a possible future where we're using this script more extensively.

(I discovered a few new tidbits from reading through the output myself: #921 (comment). Definitely glad the script makes nice detailed output to look at.)

gnprice · 2024-10-17T22:55:57Z

tools/content/fetch_messages.dart

+  required bool fetchNewer,
+}) async {
+  int? anchorMessageId;
+  IOSink output = stdout;


This initializer never gets used now. Can delete this line, and just say final output = … below.

gnprice · 2024-10-17T23:12:22Z

tools/content/fetch_messages.dart

+
+// Avoid any Flutter-related dependencies so this can be run as a CLI program.
+import 'package:args/args.dart';
+import 'package:http/http.dart';


nit:

Suggested change

import 'package:http/http.dart';

import 'package:http/http.dart' as http;

Otherwise the names like Client are too generic.

gnprice · 2024-10-17T23:15:57Z

tools/content/fetch_messages.dart

+      // This fallback will only be used when first fetching from a server.
+      'anchor': anchorMessageId != null ? jsonEncode(anchorMessageId) : 'newest',


nit: this logic really makes most sense in the caller

Suggested change

// This fallback will only be used when first fetching from a server.

'anchor': anchorMessageId != null ? jsonEncode(anchorMessageId) : 'newest',

'anchor': anchorString,

Then newest effectively assumes fetchNewer is false, right? Probably just say oldest instead if fetchNewer is true.
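A sketch of computing the anchor in the caller, including the `oldest` fallback when `fetchNewer` is set. `anchorStringFor` is a hypothetical name; `newest` and `oldest` are anchor values accepted by Zulip's get-messages endpoint.

```dart
import 'dart:convert';

/// Hypothetical helper computing the `anchor` request parameter.
/// Resumes from the last message already saved, if any. Otherwise, on a
/// first fetch, starts from whichever end matches the fetch direction.
String anchorStringFor({
  required int? anchorMessageId,
  required bool fetchNewer,
}) {
  if (anchorMessageId != null) return jsonEncode(anchorMessageId);
  return fetchNewer ? 'oldest' : 'newest';
}
```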

gnprice · 2024-10-17T23:24:06Z

tools/content/unimplemented_features_test.dart

+    final outputLines = <String>[];
+    int failedMessageCount = 0;
+    if (messageIdsByFeature.isNotEmpty) {
+      failedMessageCount = messageIdsByFeature.values.map((x) => x.length).reduce((a, b) => a + b);


nit: line too long (the a + b, which is a critical piece of its meaning, is past 80 columns)

Could put .reduce on the next line; or could use .sum, from package:collection.
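Another dependency-free option is `fold`, which returns 0 for an empty map, so the `isNotEmpty` guard around `reduce` (which throws on an empty iterable) is no longer needed. `countFailedMessages` is a hypothetical name and the map type is assumed. As in the original expression, a message containing two different unimplemented features is counted once per feature.

```dart
/// Sums the number of message IDs recorded under each feature.
int countFailedMessages(Map<String, List<int>> messageIdsByFeature) =>
    messageIdsByFeature.values
        .map((ids) => ids.length)
        .fold<int>(0, (total, n) => total + n);
```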

gnprice · 2024-10-17T23:27:07Z

tools/content/unimplemented_features_test.dart

+      // `_walk` modifies `messageIdsByFeature` and `contentsByFeature`
+      // in-place.
+      _walk(message.id, parseContent(message.content).toDiagnosticsNode(),
+        messageIdsByFeature: messageIdsByFeature,
+        contentsByFeature: contentsByFeature);


This structure works fine, and it's not worth spending time to rework it in this PR — this script works, so I'd like to merge it after just fixing some nits.

But FWIW a useful alternative way to structure this sort of thing is to make a class that would have messageIdsByFeature and contentsByFeature (and for that matter totalMessageCount) as fields. Then _walk would become a method on that class; we'd construct an instance of that class just before the loop, and call the method in the loop.
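A minimal sketch of that class-based structure, assuming the maps are keyed by feature name as in the test. The real `_walk` traverses a `DiagnosticsNode` tree (which needs Flutter), so the hypothetical `recordFeature` method stands in for the per-node bookkeeping only; `FeatureSurvey` and the `Set<int>` value type are also assumptions.

```dart
/// Hypothetical container for the survey's state, replacing the maps that
/// are passed to `_walk` as named arguments.
class FeatureSurvey {
  /// Message IDs per feature name; a Set so each message is recorded once.
  final messageIdsByFeature = <String, Set<int>>{};

  /// Sample HTML contents per feature name.
  final contentsByFeature = <String, List<String>>{};

  int totalMessageCount = 0;

  /// Call once per message examined.
  void recordMessage() {
    totalMessageCount++;
  }

  /// Stand-in for the per-node work done in `_walk`: records that
  /// [messageId] contains [featureName], keeping [content] only the first
  /// time that message is seen for this feature.
  void recordFeature(int messageId, String featureName, String content) {
    final ids = messageIdsByFeature.putIfAbsent(featureName, () => <int>{});
    if (ids.add(messageId)) {
      contentsByFeature.putIfAbsent(featureName, () => <String>[]).add(content);
    }
  }
}
```

The test would then construct one `FeatureSurvey` before iterating over messages and call its methods inside the loop, instead of threading the maps through named parameters.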

gnprice · 2024-10-17T23:29:45Z

tools/content/unimplemented_features_test.dart

+    // This buffer allows us to avoid using prints directly.
+    final outputLines = <String>[];


nit: I think the usual Dart solution here would be:

Suggested change

// This buffer allows us to avoid using prints directly.

final outputLines = <String>[];

final buf = StringBuffer();

and then buf.write('foo\n') below in place of outputLines.add('foo').

That's a little cleaner, partly because it's more generic: it doesn't need \n to be treated differently from other characters.
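A short sketch of the `StringBuffer` approach, reusing the two output lines from the test; `buildReport` is a hypothetical wrapper and the map type is assumed. The embedded `\n` after `message IDs:` needs no special handling.

```dart
/// Builds the whole report in one StringBuffer, to be emitted once at the end.
String buildReport(Map<String, List<int>> messageIdsByFeature) {
  final buf = StringBuffer();
  var unsupportedCounter = 0;
  for (final entry in messageIdsByFeature.entries) {
    unsupportedCounter++;
    buf.write('Unsupported feature #$unsupportedCounter: ${entry.key}\n');
    buf.write('message IDs:\n${entry.value.join(', ')}\n');
  }
  return buf.toString();
}
```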

gnprice · 2024-10-17T23:48:42Z

tools/content/unimplemented_features_test.dart

+      outputLines.addAll([
+        'Unsupported feature #$unsupportedCounter: $featureName',
+        'message IDs:\n${messageIdsByFeature[featureName]!.join(', ')}',


Suggested change

outputLines.addAll([

'Unsupported feature #$unsupportedCounter: $featureName',

'message IDs:\n${messageIdsByFeature[featureName]!.join(', ')}',

final messageIds = messageIdsByFeature[featureName]!;

outputLines.addAll([

'Unsupported feature #$unsupportedCounter: $featureName',

'message IDs (up to 100): ${messageIds.take(100).join(', ')}',

As is, this list is long enough that it gets in the way for very high-frequency unimplemented features, like Twitter previews.
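The capped line from the suggested change, pulled into a hypothetical helper (`messageIdsLine` and its `limit` parameter are illustrative):

```dart
/// Formats at most [limit] message IDs, so very common features
/// don't flood the report.
String messageIdsLine(Iterable<int> messageIds, {int limit = 100}) {
  return 'message IDs (up to $limit): ${messageIds.take(limit).join(', ')}';
}
```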

… features.

We added 2 scripts and a wrapper for them both.

- fetch_messages.dart, a script that fetches messages from a given Zulip server. It does not depend on Flutter or other Zulip Flutter packages, so it can run without Flutter. It is meant to be run first, to produce the corpora needed for surveying unimplemented features. The fetched messages are stored in JSON Lines format, where each entry is a JSON object containing the message ID and the rendered HTML content. The script writes messages from each server to a separate file, because message IDs are not unique across servers.

- unimplemented_features_test.dart, a test that goes over all collected messages, parses them with the content parser, and reports the unimplemented features it discovers. This is implemented as a test mainly because the content parser depends on the Flutter engine (and `flutter test` conveniently sets up a test device).

We mostly avoid prints (https://dart.dev/tools/linter-rules/avoid_print) in both scripts. Since they are CLI programs after all, we wouldn't lose much by disabling this lint rule for them, but the rule potentially helps discourage overly verbose output.

See the comments in the scripts for more details on the implementations.

=====

The main benefit of having a wrapper script around the Dart code is that it provides a more intuitive interface, consistent with other tools, for fetching message corpora and/or running the check for unimplemented features.

Very rarely, you might want to use fetch_messages.dart directly, for example to pass the `fetch-newer` flag to update an existing corpus file. If we find it helpful, the flag can be added to check-features as well, but we are skipping that for now.

The script is intended to be run manually, not as part of CI, because it is very slow and relies on out-of-tree files like API configs (zuliprc files) and big dumps of chat history.

For the most part, we intend to keep the detailed explanations in the underlying scripts, close to the implementation, and selectively repeat some of the helpful information in the wrapper. The wrapper also repeats some easy checks for options, so that it can produce nicer error messages for some common errors (like a missing zuliprc for `fetch`).

Fixes: zulip#190
Signed-off-by: Zixuan James Li <[email protected]>
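The JSON Lines corpus format described above can be sketched like this. The key names `id` and `content` and the `CorpusMessage` class are assumptions for illustration; the PR's actual model lives in tools/content/model.dart. Because `jsonEncode` escapes newlines inside strings, each message occupies exactly one line even when its HTML spans several.

```dart
import 'dart:convert';

/// Hypothetical corpus entry: a message ID plus its rendered HTML.
class CorpusMessage {
  const CorpusMessage({required this.id, required this.content});

  final int id;
  final String content;

  Map<String, Object?> toJson() => {'id': id, 'content': content};

  factory CorpusMessage.fromJson(Map<String, Object?> json) =>
      CorpusMessage(id: json['id'] as int, content: json['content'] as String);
}

/// Encodes messages as JSON Lines: one JSON object per line.
String encodeJsonLines(Iterable<CorpusMessage> messages) =>
    messages.map((m) => '${jsonEncode(m.toJson())}\n').join();

/// Decodes JSON Lines text, skipping blank lines.
List<CorpusMessage> decodeJsonLines(String text) => [
      for (final line in const LineSplitter().convert(text))
        if (line.trim().isNotEmpty)
          CorpusMessage.fromJson(jsonDecode(line) as Map<String, Object?>),
    ];
```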

PIG208 · 2024-10-21T21:28:33Z

Thanks for the review! I have updated the PR, skipping the _walk refactor, which does sound like a good practice to be aware of.

gnprice · 2024-10-22T01:34:20Z

Thanks for the revision! All looks good — merging.

PIG208 changed the title ~~Systematically survey unimplemented message content features.~~ Systematically survey message content for unimplemented features. Aug 27, 2024

PIG208 force-pushed the pr-check-features branch 4 times, most recently from 95688ba to 5728c5f August 29, 2024 06:25

PIG208 mentioned this pull request Aug 29, 2024

Handle the case when a code block has no grandchildren #919

Merged

PIG208 force-pushed the pr-check-features branch 2 times, most recently from e9133c3 to 43682cd August 29, 2024 06:54

PIG208 marked this pull request as ready for review August 29, 2024 06:54

PIG208 added the maintainer review label Aug 29, 2024

PIG208 requested a review from chrisbobbe August 29, 2024 06:56

PIG208 force-pushed the pr-check-features branch 2 times, most recently from ba7190e to b0d8d5a August 29, 2024 18:38

PIG208 linked an issue Aug 29, 2024 that may be closed by this pull request

Systematically survey message content for unimplemented features #190

Closed

PIG208 removed the maintainer review label Aug 29, 2024

PIG208 removed the request for review from chrisbobbe August 29, 2024 19:09

PIG208 force-pushed the pr-check-features branch from e87250c to 1a20f60 August 30, 2024 01:24

PIG208 force-pushed the pr-check-features branch from 1a20f60 to 66466f7 August 30, 2024 03:34

PIG208 mentioned this pull request Aug 30, 2024

☂️ Handle remaining unimplemented content parsing features #921

Closed

14 tasks

PIG208 force-pushed the pr-check-features branch 2 times, most recently from fbc0745 to 0bc1c06 September 7, 2024 04:18

PIG208 force-pushed the pr-check-features branch 3 times, most recently from c6e9c88 to ad9f338 September 9, 2024 20:03

PIG208 removed the request for review from chrisbobbe September 13, 2024 17:38

PIG208 removed the maintainer review label Sep 13, 2024

PIG208 force-pushed the pr-check-features branch from be9d79a to fb790a7 September 25, 2024 07:13

PIG208 requested a review from chrisbobbe September 25, 2024 07:13

PIG208 assigned chrisbobbe Sep 25, 2024

PIG208 added the maintainer review label Sep 25, 2024

chrisbobbe assigned gnprice Oct 8, 2024

chrisbobbe requested a review from gnprice October 8, 2024 22:06

chrisbobbe added the integration review label and removed the maintainer review label Oct 8, 2024

gnprice reviewed Oct 11, 2024

PIG208 mentioned this pull request Oct 11, 2024

deps: Reduce flutter_checks and legacy_checks to dev dependencies #994

Merged

PIG208 force-pushed the pr-check-features branch from fb790a7 to 8b662c7 October 11, 2024 19:52

This was referenced Oct 18, 2024

Handle "tex-error" spans #1003

Open

Handle old forms of HTML for math/TeX #1004

Open

gnprice reviewed Oct 18, 2024

PIG208 added 2 commits October 21, 2024 13:47

deps: Add ini as a direct dev dependency.

a2c94cc

Additionally, args is made a direct dev dependency. We will later use both to write CLI scripts for fetching messages.

Signed-off-by: Zixuan James Li <[email protected]>

PIG208 force-pushed the pr-check-features branch from 8b662c7 to 1540427 October 21, 2024 21:24

gnprice mentioned this pull request Oct 21, 2024

Handle message_embed website previews #1016

Closed

gnprice merged commit 1540427 into zulip:main Oct 22, 2024
1 check passed

PIG208 deleted the pr-check-features branch October 22, 2024 14:30
