Skip to content

store: Add PerAccountStore.users data structure, as Map<int, User> #84

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 5, 2023

Conversation

chrisbobbe
Copy link
Collaborator

Seeing the convenient line flutter: initial fetch time: … in the logs, I decided to do some trials before and after these commits. πŸ™‚ First after, then I went back and did before (i.e. main 9152167).

After:

3790ms
2842ms
2774ms
2781ms
2735ms
2851ms
2954ms
2926ms
2888ms
2998ms

Before:

2798ms
2551ms
2916ms
3017ms
2832ms
3032ms
2281ms
3400ms
2868ms
2387ms

@chrisbobbe chrisbobbe requested a review from gnprice April 21, 2023 23:01
@chrisbobbe
Copy link
Collaborator Author

I guess those numbers do include the time spent in the new from-JSON code, but not the time spent assembling PerAccountStore.users, hmm.

@chrisbobbe
Copy link
Collaborator Author

(And this is on CZO where as we know there are very many users.)

@gnprice
Copy link
Member

gnprice commented Apr 21, 2023

Seeing the convenient line flutter: initial fetch time: … in the logs, I decided to do some trials before and after

Good idea!

Taking the medians of those trials (select them so they're on my clipboard, then xclip -o | sort -n | nl -ba, and look at the middle lines), it's 2850ms before and 2870ms after, suggesting a cost of about 20ms. That's milder than I'd hoped, really.

Getting some times that include the fromInitialSnapshot time would be good. Should be a matter of restarting that Stopwatch and then checking it again.

@gnprice
Copy link
Member

gnprice commented Apr 21, 2023

suggesting a cost of about 20ms. That's milder than I'd hoped, really.

Though that's also a lot smaller than the variation within both sets of samples. So really what the data says is that the cost is smaller than this experiment could detect β€” which means smaller than something like 500ms.

@chrisbobbe
Copy link
Collaborator Author

Sure:

diff --git lib/model/store.dart lib/model/store.dart
index 051883a87..05c0f027f 100644
--- lib/model/store.dart
+++ lib/model/store.dart
@@ -319,11 +319,16 @@ class LivePerAccountStore extends PerAccountStore {
     // TODO log the time better
     if (kDebugMode) print("initial fetch time: ${t.inMilliseconds}ms");
 
+    stopwatch.start();
     final store = LivePerAccountStore.fromInitialSnapshot(
       account: account,
       connection: connection,
       initialSnapshot: initialSnapshot,
     );
+    final t2 = (stopwatch..stop()).elapsed;
+    // TODO log the time better
+    if (kDebugMode) print("time with LivePerAccountStore.fromInitialSnapshot: ${t2.inMilliseconds}ms");
+
     store.poll();
     return store;
   }
BEFORE:

flutter: initial fetch time: 2625ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2625ms

flutter: initial fetch time: 2572ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2572ms

flutter: initial fetch time: 2296ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2296ms

flutter: initial fetch time: 2787ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2787ms

flutter: initial fetch time: 2510ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2510ms

AFTER:

flutter: initial fetch time: 2075ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2081ms

flutter: initial fetch time: 2379ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2387ms

flutter: initial fetch time: 2138ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2144ms

flutter: initial fetch time: 2712ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2717ms

flutter: initial fetch time: 2730ms
flutter: time with LivePerAccountStore.fromInitialSnapshot: 2739ms

So the time spent in LivePerAccountStore.fromInitialSnapshot is:

Before:

  • <0.5ms
  • <0.5ms
  • <0.5ms
  • <0.5ms
  • <0.5ms

After:

  • 6ms
  • 8ms
  • 6ms
  • 5ms
  • 9ms

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for building this! Comments below: some nits and an optimization.

From those timings, it looks like constructing the users map in LivePerAccountStore.fromInitialSnapshot is taking about 5-8 ms. That's pretty OK β€” especially given that the users are by far the biggest swath of the initial data in a large realm.

Specifically on CZO:

$ curl -sX POST --netrc-file ~/z/.netrc https://chat.zulip.org/api/v1/register >data.json
$ <data.json jq '.realm_users + .realm_non_active_users + .cross_realm_bots' -c | wc -c
11386100
$ <data.json jq . -c | wc -c
12480847

they're 11.4MB of the total 12.5MB, so 91% as measured by bytes of JSON.

I guess the other thing to improve in the initial-fetch timing would be to separate the time spent (a) in the network request and (b) decoding JSON from that spent (c) in fromJson. The former two are largely out of our control (short of server or API changes to reduce the amount of data sent and/or the time the server spends preparing the data), but the last one is a potential source of bottlenecks we introduce, and a potential site of optimizations.

No need to do that as part of this PR, though.

Comment on lines 214 to 216
if (user.profileData == null) {
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (user.profileData == null) {
return;
}
final profileData = (user.profileData ??= {});

Seems like if we do get an event saying to add some custom profile field data for a user that has null there, we may as well play along.

Copy link
Collaborator Author

@chrisbobbe chrisbobbe May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to set user.profileData with {} is getting flagged by the analyzer because it's final:

'profileData' can't be used as a setter because it's final. (Documentation)

Try finding a different setter, or making 'profileData' non-final.

It also has an auto-fix button that says "Add 'late' modifier".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it'd go with making it non-final. This is a mutable class in general β€” the rest of the fields aren't final β€” so that's fine.

Copy link
Collaborator Author

@chrisbobbe chrisbobbe May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mutable class in general β€” the rest of the fields aren't final β€” so that's fine.

Sure. I think it's reasonable to prefer making a property that's a Map final, so that all the code using the property will definitely be working with the same Map instance.

But here there's this performance thing that involves not making it final, and anyway we're not thinking of introducing multiple Map instances, we just want to introduce one instance in the case where we don't have one yet.

String timezone;
String? avatarUrl; // TODO distinguish null from missing https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/flutter.3A.20omitted.20vs.2E.20null.20in.20JSON/near/1551759
int avatarVersion;
final Map<int, ProfileFieldUserData>? profileData;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A further tweak, which would represent an optimization for us, is that when we get {} for profile_data, we could represent it as null. This won't even really make any other code more complicated, because when consuming this we already have to handle the possibility that it's null. (It requires the simplification from my previous comment that uses ??= {} on getting an event to update one of these.)

That should be doable with a small readValue, I think.

The advantage of doing it that way is that we avoid allocating a huge number of LinkedHashMap data structures that we can do without. A hash table is inevitably going to involve some overhead (several words, at minimum), even when nothing's stored in it yet.

Here's some quick data from chat.zulip.org:

$ curl -sX POST --netrc-file ~/z/.netrc https://chat.zulip.org/api/v1/register --data-urlencode 'event_types=["realm_user"]' >data.json
$ <data.json jq '.realm_users[] | .profile_data' -c | sort | uniq -c | sort -n | tail -3
     47 {"13":{"value":"1"}}
    709 null
  23957 {}
$ <data.json jq '.realm_users | length'
25870

So of 25870 users in realm_users, 23957 of them (93%) have profile_data: {} and this optimization would apply. Doing the calculation more carefully including the other two lists, it's 24635/26822 ∼ 92%.

Copy link
Collaborator Author

@chrisbobbe chrisbobbe May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be just as good to do this instead:

late final Map<int, ProfileFieldUserData> profileData = {};

?

From the doc:

When you mark a variable as late but initialize it at its declaration, then the initializer runs the first time the variable is used. This lazy initialization is handy in a couple of cases:

  • The variable might not be needed, and initializing it is costly.
  • […]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That version would still mean that we'd end up allocating these if we ever look at the users' profile data. I'm not sure we necessarily would end up ever looking en masse at all or most users' profile data β€” but if we did, that'd be on the face of it a pretty innocent bit of code, and so this version of the field would be laying a performance trap that we could later walk into.

For that reason, better to have an explicit null. Then the consuming code will just have to write ?. for method calls on it and so on (and code that's updating it will need to write ??= {}); the type-checker means we can't forget to do those, and the performance characteristics will always be straightforward.

@chrisbobbe chrisbobbe force-pushed the pr-user-data-structure branch from aacf44f to fd20b71 Compare May 1, 2023 18:13
@chrisbobbe
Copy link
Collaborator Author

Thanks for the review! Revision pushed, with an approach to that performance thing that uses a language feature (late final with an initializer at the declaration) but that I'm not yet quite sure about.

Comment on lines 103 to 104
if (profileData != null && profileData.isNotEmpty) {
profileData.addAll(profileData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copying all the elements of the one Map into another Map. Better to just assign the maps:

  this.profileData = profileData;

String timezone;
String? avatarUrl; // TODO distinguish null from missing https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/flutter.3A.20omitted.20vs.2E.20null.20in.20JSON/near/1551759
int avatarVersion;
final Map<int, ProfileFieldUserData>? profileData;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That version would still mean that we'd end up allocating these if we ever look at the users' profile data. I'm not sure we necessarily would end up ever looking en masse at all or most users' profile data β€” but if we did, that'd be on the face of it a pretty innocent bit of code, and so this version of the field would be laying a performance trap that we could later walk into.

For that reason, better to have an explicit null. Then the consuming code will just have to write ?. for method calls on it and so on (and code that's updating it will need to write ??= {}); the type-checker means we can't forget to do those, and the performance characteristics will always be straightforward.

@gnprice
Copy link
Member

gnprice commented May 3, 2023

Thanks for the revision! Comments remaining only on the profileData performance/efficiency question. See above and also #84 (comment) further above.

@gnprice
Copy link
Member

gnprice commented May 3, 2023

Merging the first commit:
901a99e lint: Suppress unnecessary_cast in generated files

because it's ready, and to avoid further rebases becoming spammy in the thread of the mentioned issue. We do in our style guide say:

You might find that a reference in a commit message causes GitHub to show a noisy list of backlinks in the issue thread, after you make revisions to the branch and push new versions of the commits. Please include the reference anyway, despite that UI bug in GitHub. We'll live with a little noise on the issue threads, and the references are extremely helpful when reading the Git log.

but when the reference is to someone else's GitHub issue β€” so we get the benefit in commit-message clarity, but they get the noise β€” then I'm more hesitant. (In this case, I'd probably have left the link in just the code comment, where it doesn't cause GitHub to generate such a backlink.)

@chrisbobbe
Copy link
Collaborator Author

chrisbobbe commented May 4, 2023

to avoid further rebases becoming spammy in the thread of the mentioned issue.

Oh hmm, yeah I guess GitHub makes those backlinks not just for Fixes: #NNNN (and #NNNN and org/repo#NNNN) but also for full GitHub issue URLs of the form https://github.com/org/repo/issues/NNNN, as in this case. (The GitHub detail page for 901a99e makes it look like I wrote an org/repo#NNNN reference, but I actually wrote a URL.)

@chrisbobbe chrisbobbe force-pushed the pr-user-data-structure branch from fd20b71 to 1ebe080 Compare May 4, 2023 17:46
@chrisbobbe
Copy link
Collaborator Author

Thanks for the review! Revision pushed.

Comment on lines 84 to 85
static Map<int, ProfileFieldUserData>? _readProfileData(json, key) {
final value = (json[key] as Map<int, ProfileFieldUserData>?);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this doesn't seem right. See this in the doc:

Note: using this feature does not change any of the subsequent decoding logic for the field. For instance, if the field is of type DateTime we expect the function provided here to return a String.

That's a bit opaque; but more concretely, see the generated code (which this revision is missing, btw):

      profileData:
          (User._readProfileData(json, 'profile_data') as Map<String, dynamic>?)
              ?.map(
        (k, e) => MapEntry(int.parse(k),
            ProfileFieldUserData.fromJson(e as Map<String, dynamic>)),
      ),

That also reminds me of something I should have thought of earlier: we should have some unit tests for the nontrivial logic in our JSON decoding. No need to have it for all the boring fields β€” I don't think tests for those would do much good β€” but for basically the places we're using readValue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Similar to how we have a test for MessageEvent.fromJson, in test/api/model/events_test.dart.)

@chrisbobbe chrisbobbe force-pushed the pr-user-data-structure branch from 1ebe080 to 4f9fe3c Compare May 4, 2023 20:04
@chrisbobbe
Copy link
Collaborator Author

Thanks for the review! Revision pushed, with those added tests.

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just small comments in the tests; please go ahead and merge after.

Comment on lines 38 to 41
check(mkUser({'profile_data': {'1': {'value': 'foo'}}}).profileData)
.isNotNull().entries.single
..has((s) => s.key, 'key').equals(1)
..has((s) => s.value, 'value').isA<ProfileFieldUserData>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fun equivalent version:

Suggested change
check(mkUser({'profile_data': {'1': {'value': 'foo'}}}).profileData)
.isNotNull().entries.single
..has((s) => s.key, 'key').equals(1)
..has((s) => s.value, 'value').isA<ProfileFieldUserData>();
check(mkUser({'profile_data': {'1': {'value': 'foo'}}}).profileData)
.isNotNull().deepEquals({1: it()..isA<ProfileFieldUserData>()});

Docs:
https://pub.dev/documentation/checks/latest/checks/MapChecks/deepEquals.html

But further, the isA<ProfileFieldUserData>() is redundant with type soundness, because this is statically a Map<int, ProfileFieldUserData>?. So we can just write:

      check(mkUser({'profile_data': {'1': {'value': 'foo'}}}).profileData)
        .isNotNull().deepEquals({1: it()});

import 'package:zulip/api/model/model.dart';

void main() {
group('user', () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
group('user', () {
group('User', () {

});

test('is_system_bot', () {
assert(baseJson['is_system_bot'] == null); // could refactor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment meant as a TODO ("we could perhaps stand to refactor this test"), or as an explanation of why the assert is needed ("we could refactor how baseJson and mkUser work, and accidentally defeat this test")?

For the latter, I think we can add a test case that effectively covers that:

Suggested change
assert(baseJson['is_system_bot'] == null); // could refactor
check(mkUser({}).isSystemBot).equals(null);

As a bonus, that makes this test no longer care about the internals of how mkUser does its job, in the form of the relationship of the latter to baseJson.

Copy link
Collaborator Author

@chrisbobbe chrisbobbe May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or as an explanation of why the assert is needed ("we could refactor how baseJson and mkUser work, and accidentally defeat this test")?

Yeah, that's why I included the assert. But I didn't state that reason explicitly (oops); my comment "could refactor" was like a TODO to fix an awkwardness with the test setup: the test wants is_system_bot to be absent, but it has no way to ensure that, using the available interface (mkUser's specialJson arg). I think ideally it would have a way to do that. But I didn't say TODO because it won't really call out for a refactor until we change baseJson/mkUser in a way that trips up the assert, if we ever do.

As a bonus, that makes this test no longer care about the internals of how mkUser does its job, in the form of the relationship of the latter to baseJson.

Sure, that sounds good. The awkwardness I mentioned is still there, but at least the check gives the same guarantee as the assert did: it makes sure mkUser won't include isSystemBot unless asked by the caller. (I guess it now also tests a feature of the User implementation: it confirms that User.isSystemBot can be null, which is redundant with the static types.)

Comment on lines 25 to 27
final json = Map.of(baseJson);
json.addAll(specialJson);
return User.fromJson(json);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a more fun way to spell this:

Suggested change
final json = Map.of(baseJson);
json.addAll(specialJson);
return User.fromJson(json);
return User.fromJson({ ...baseJson, ...specialJson });

Should be pretty much equivalent even in performance terms, too. Here's the implementation of Map.of:

  factory LinkedHashMap.of(Map<K, V> other) =>
      LinkedHashMap<K, V>()..addAll(other);

so the current PR revision's version makes an empty map and calls addAll with the two input maps in succession, and I expect that's basically what the ... syntax expands to as well.

Comment on lines 23 to 25
});
User mkUser (Map<String, dynamic> specialJson) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
});
User mkUser (Map<String, dynamic> specialJson) {
});
User mkUser (Map<String, dynamic> specialJson) {

@chrisbobbe chrisbobbe force-pushed the pr-user-data-structure branch from 4f9fe3c to cdfa189 Compare May 5, 2023 16:00
@chrisbobbe
Copy link
Collaborator Author

Thanks for the review! Merging, with those changes.

@chrisbobbe chrisbobbe merged commit cdfa189 into zulip:main May 5, 2023
@chrisbobbe chrisbobbe deleted the pr-user-data-structure branch May 5, 2023 16:01
gnprice added a commit to gnprice/zulip-flutter that referenced this pull request May 30, 2023
This happens automatically if you rerun build_runner.  Seems like
its omission was basically a mismerge between 901a99e, when it
was merged early from zulip#84:
  zulip#84 (comment)
and 780b092 / zulip#22, which added this file.

Chalk it up as another occasion where zulip#60 would have helped keep
things clean.
gnprice added a commit to gnprice/zulip-flutter that referenced this pull request May 31, 2023
This happens automatically if you rerun build_runner.  Seems like
its omission was basically a mismerge between 901a99e, when it
was merged early from zulip#84:
  zulip#84 (comment)
and 780b092 / zulip#22, which added this file.

Chalk it up as another occasion where zulip#60 would have helped keep
things clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants