-
Notifications
You must be signed in to change notification settings - Fork 310
store: Add PerAccountStore.users data structure, as Map<int, User> #84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I guess those numbers do include the time spent in the new from-JSON code, but not the time spent assembling |
(And this is on CZO where as we know there are very many users.) |
Good idea! Taking the medians of those trials (select them so they're on my clipboard, then Getting some times that include the |
Though that's also a lot smaller than the variation within both sets of samples. So really what the data says is that the cost is smaller than this experiment could detect β which means smaller than something like 500ms. |
Sure: diff --git lib/model/store.dart lib/model/store.dart
index 051883a87..05c0f027f 100644
--- lib/model/store.dart
+++ lib/model/store.dart
@@ -319,11 +319,16 @@ class LivePerAccountStore extends PerAccountStore {
// TODO log the time better
if (kDebugMode) print("initial fetch time: ${t.inMilliseconds}ms");
+ stopwatch.start();
final store = LivePerAccountStore.fromInitialSnapshot(
account: account,
connection: connection,
initialSnapshot: initialSnapshot,
);
+ final t2 = (stopwatch..stop()).elapsed;
+ // TODO log the time better
+ if (kDebugMode) print("time with LivePerAccountStore.fromInitialSnapshot: ${t2.inMilliseconds}ms");
+
store.poll();
return store;
}
So the time spent in Before:
After:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for building this! Comments below: some nits and an optimization.
From those timings, it looks like constructing the users map in LivePerAccountStore.fromInitialSnapshot
is taking about 5-8 ms. That's pretty OK β especially given that the users are by far the biggest swath of the initial data in a large realm.
Specifically on CZO:
$ curl -sX POST --netrc-file ~/z/.netrc https://chat.zulip.org/api/v1/register >data.json
$ <data.json jq '.realm_users + .realm_non_active_users + .cross_realm_bots' -c | wc -c
11386100
$ <data.json jq . -c | wc -c
12480847
they're 11.4MB of the total 12.5MB, so 91% as measured by bytes of JSON.
I guess the other thing to improve in the initial-fetch timing would be to separate the time spent (a) in the network request and (b) decoding JSON from that spent (c) in fromJson
. The former two are largely out of our control (short of server or API changes to reduce the amount of data sent and/or the time the server spends preparing the data), but the last one is a potential source of bottlenecks we introduce, and a potential site of optimizations.
No need to do that as part of this PR, though.
lib/model/store.dart
Outdated
if (user.profileData == null) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (user.profileData == null) { | |
return; | |
} | |
final profileData = (user.profileData ??= {}); |
Seems like if we do get an event saying to add some custom profile field data for a user that has null
there, we may as well play along.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to set user.profileData
with {}
is getting flagged by the analyzer because it's final
:
'profileData' can't be used as a setter because it's final. (Documentation)
Try finding a different setter, or making 'profileData' non-final.
It also has an auto-fix button that says "Add 'late' modifier".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it'd go with making it non-final
. This is a mutable class in general β the rest of the fields aren't final
β so that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a mutable class in general β the rest of the fields aren't
final
β so that's fine.
Sure. I think it's reasonable to prefer making a property that's a Map
final
, so that all the code using the property will definitely be working with the same Map
instance.
But here there's this performance thing that involves not making it final
, and anyway we're not thinking of introducing multiple Map
instances, we just want to introduce one instance in the case where we don't have one yet.
lib/api/model/model.dart
Outdated
String timezone; | ||
String? avatarUrl; // TODO distinguish null from missing https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/flutter.3A.20omitted.20vs.2E.20null.20in.20JSON/near/1551759 | ||
int avatarVersion; | ||
final Map<int, ProfileFieldUserData>? profileData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A further tweak, which would represent an optimization for us, is that when we get {}
for profile_data
, we could represent it as null
. This won't even really make any other code more complicated, because when consuming this we already have to handle the possibility that it's null. (It requires the simplification from my previous comment that uses ??= {}
on getting an event to update one of these.)
That should be doable with a small readValue
, I think.
The advantage of doing it that way is that we avoid allocating a huge number of LinkedHashMap
data structures that we can do without. A hash table is inevitably going to involve some overhead (several words, at minimum), even when nothing's stored in it yet.
Here's some quick data from chat.zulip.org:
$ curl -sX POST --netrc-file ~/z/.netrc https://chat.zulip.org/api/v1/register --data-urlencode 'event_types=["realm_user"]' >data.json
$ <data.json jq '.realm_users[] | .profile_data' -c | sort | uniq -c | sort -n | tail -3
47 {"13":{"value":"1"}}
709 null
23957 {}
$ <data.json jq '.realm_users | length'
25870
So of 25870 users in realm_users
, 23957 of them (93%) have profile_data: {}
and this optimization would apply. Doing the calculation more carefully including the other two lists, it's 24635/26822 βΌ 92%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be just as good to do this instead:
late final Map<int, ProfileFieldUserData> profileData = {};
?
From the doc:
When you mark a variable as
late
but initialize it at its declaration, then the initializer runs the first time the variable is used. This lazy initialization is handy in a couple of cases:
- The variable might not be needed, and initializing it is costly.
- [β¦]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That version would still mean that we'd end up allocating these if we ever look at the users' profile data. I'm not sure we necessarily would end up ever looking en masse at all or most users' profile data β but if we did, that'd be on the face of it a pretty innocent bit of code, and so this version of the field would be laying a performance trap that we could later walk into.
For that reason, better to have an explicit null. Then the consuming code will just have to write ?.
for method calls on it and so on (and code that's updating it will need to write ??= {}
); the type-checker means we can't forget to do those, and the performance characteristics will always be straightforward.
aacf44f
to
fd20b71
Compare
Thanks for the review! Revision pushed, with an approach to that performance thing that uses a language feature ( |
lib/api/model/model.dart
Outdated
if (profileData != null && profileData.isNotEmpty) { | ||
profileData.addAll(profileData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copying all the elements of the one Map into another Map. Better to just assign the maps:
this.profileData = profileData;
lib/api/model/model.dart
Outdated
String timezone; | ||
String? avatarUrl; // TODO distinguish null from missing https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/flutter.3A.20omitted.20vs.2E.20null.20in.20JSON/near/1551759 | ||
int avatarVersion; | ||
final Map<int, ProfileFieldUserData>? profileData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That version would still mean that we'd end up allocating these if we ever look at the users' profile data. I'm not sure we necessarily would end up ever looking en masse at all or most users' profile data β but if we did, that'd be on the face of it a pretty innocent bit of code, and so this version of the field would be laying a performance trap that we could later walk into.
For that reason, better to have an explicit null. Then the consuming code will just have to write ?.
for method calls on it and so on (and code that's updating it will need to write ??= {}
); the type-checker means we can't forget to do those, and the performance characteristics will always be straightforward.
Thanks for the revision! Comments remaining only on the |
Merging the first commit: because it's ready, and to avoid further rebases becoming spammy in the thread of the mentioned issue. We do in our style guide say:
but when the reference is to someone else's GitHub issue β so we get the benefit in commit-message clarity, but they get the noise β then I'm more hesitant. (In this case, I'd probably have left the link in just the code comment, where it doesn't cause GitHub to generate such a backlink.) |
Oh hmm, yeah I guess GitHub makes those backlinks not just for |
fd20b71
to
1ebe080
Compare
Thanks for the review! Revision pushed. |
lib/api/model/model.dart
Outdated
static Map<int, ProfileFieldUserData>? _readProfileData(json, key) { | ||
final value = (json[key] as Map<int, ProfileFieldUserData>?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this doesn't seem right. See this in the doc:
Note: using this feature does not change any of the subsequent decoding logic for the field. For instance, if the field is of type DateTime we expect the function provided here to return a String.
That's a bit opaque; but more concretely, see the generated code (which this revision is missing, btw):
profileData:
(User._readProfileData(json, 'profile_data') as Map<String, dynamic>?)
?.map(
(k, e) => MapEntry(int.parse(k),
ProfileFieldUserData.fromJson(e as Map<String, dynamic>)),
),
That also reminds me of something I should have thought of earlier: we should have some unit tests for the nontrivial logic in our JSON decoding. No need to have it for all the boring fields β I don't think tests for those would do much good β but for basically the places we're using readValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Similar to how we have a test for MessageEvent.fromJson
, in test/api/model/events_test.dart
.)
1ebe080
to
4f9fe3c
Compare
Thanks for the review! Revision pushed, with those added tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Just small comments in the tests; please go ahead and merge after.
test/api/model/model_test.dart
Outdated
check(mkUser({'profile_data': {'1': {'value': 'foo'}}}).profileData) | ||
.isNotNull().entries.single | ||
..has((s) => s.key, 'key').equals(1) | ||
..has((s) => s.value, 'value').isA<ProfileFieldUserData>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A fun equivalent version:
check(mkUser({'profile_data': {'1': {'value': 'foo'}}}).profileData) | |
.isNotNull().entries.single | |
..has((s) => s.key, 'key').equals(1) | |
..has((s) => s.value, 'value').isA<ProfileFieldUserData>(); | |
check(mkUser({'profile_data': {'1': {'value': 'foo'}}}).profileData) | |
.isNotNull().deepEquals({1: it()..isA<ProfileFieldUserData>()}); |
Docs:
https://pub.dev/documentation/checks/latest/checks/MapChecks/deepEquals.html
But further, the isA<ProfileFieldUserData>()
is redundant with type soundness, because this is statically a Map<int, ProfileFieldUserData>?
. So we can just write:
check(mkUser({'profile_data': {'1': {'value': 'foo'}}}).profileData)
.isNotNull().deepEquals({1: it()});
test/api/model/model_test.dart
Outdated
import 'package:zulip/api/model/model.dart'; | ||
|
||
void main() { | ||
group('user', () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
group('user', () { | |
group('User', () { |
test/api/model/model_test.dart
Outdated
}); | ||
|
||
test('is_system_bot', () { | ||
assert(baseJson['is_system_bot'] == null); // could refactor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment meant as a TODO ("we could perhaps stand to refactor this test"), or as an explanation of why the assert is needed ("we could refactor how baseJson
and mkUser
work, and accidentally defeat this test")?
For the latter, I think we can add a test case that effectively covers that:
assert(baseJson['is_system_bot'] == null); // could refactor | |
check(mkUser({}).isSystemBot).equals(null); |
As a bonus, that makes this test no longer care about the internals of how mkUser
does its job, in the form of the relationship of the latter to baseJson
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or as an explanation of why the assert is needed ("we could refactor how
baseJson
andmkUser
work, and accidentally defeat this test")?
Yeah, that's why I included the assert. But I didn't state that reason explicitly (oops); my comment "could refactor" was like a TODO to fix an awkwardness with the test setup: the test wants is_system_bot
to be absent, but it has no way to ensure that, using the available interface (mkUser
's specialJson
arg). I think ideally it would have a way to do that. But I didn't say TODO
because it won't really call out for a refactor until we change baseJson
/mkUser
in a way that trips up the assert
, if we ever do.
As a bonus, that makes this test no longer care about the internals of how
mkUser
does its job, in the form of the relationship of the latter tobaseJson
.
Sure, that sounds good. The awkwardness I mentioned is still there, but at least the check
gives the same guarantee as the assert
did: it makes sure mkUser
won't include isSystemBot
unless asked by the caller. (I guess it now also tests a feature of the User
implementation: it confirms that User.isSystemBot
can be null, which is redundant with the static types.)
test/api/model/model_test.dart
Outdated
final json = Map.of(baseJson); | ||
json.addAll(specialJson); | ||
return User.fromJson(json); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a more fun way to spell this:
final json = Map.of(baseJson); | |
json.addAll(specialJson); | |
return User.fromJson(json); | |
return User.fromJson({ ...baseJson, ...specialJson }); |
Should be pretty much equivalent even in performance terms, too. Here's the implementation of Map.of
:
factory LinkedHashMap.of(Map<K, V> other) =>
LinkedHashMap<K, V>()..addAll(other);
so the current PR revision's version makes an empty map and calls addAll
with the two input maps in succession, and I expect that's basically what the ...
syntax expands to as well.
test/api/model/model_test.dart
Outdated
}); | ||
User mkUser (Map<String, dynamic> specialJson) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
}); | |
User mkUser (Map<String, dynamic> specialJson) { | |
}); | |
User mkUser (Map<String, dynamic> specialJson) { |
4f9fe3c
to
cdfa189
Compare
Thanks for the review! Merging, with those changes. |
This happens automatically if you rerun build_runner. Seems like its omission was basically a mismerge between 901a99e, when it was merged early from zulip#84: zulip#84 (comment) and 780b092 / zulip#22, which added this file. Chalk it up as another occasion where zulip#60 would have helped keep things clean.
This happens automatically if you rerun build_runner. Seems like its omission was basically a mismerge between 901a99e, when it was merged early from zulip#84: zulip#84 (comment) and 780b092 / zulip#22, which added this file. Chalk it up as another occasion where zulip#60 would have helped keep things clean.
Seeing the convenient line
flutter: initial fetch time: β¦
in the logs, I decided to do some trials before and after these commits. π First after, then I went back and did before (i.e.main
9152167).