-
-
Notifications
You must be signed in to change notification settings - Fork 164
ref(metrics): Add normalization and update set metrics hashing #658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5f41028
to
801d7ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks correct and has good test coverage, so no problem there.
My main concerns are:
- performance, quite a bit so
- dependencies
At the very least, you should follow the performance best practices of the regex
crate here: https://docs.rs/regex/latest/regex/#avoid-re-compiling-regexes-especially-in-a-loop
I would consider that a blocker, as you are compiling the same regex
over and over again in every single call.
Both for performance and dependencies: Do we really care that tags are limited to N graphemes? Why not unicode char
s or even bytes? As in a bunch of cases (keys mainly?) we are filtering for ASCII chars anyway, iterating over graphemes is absolute overkill.
Otherwise, there is a bunch of String
allocation happening all over the place which might be avoidable. Those are just a drop in the bucket compared to the regex
topic mentioned above, so it might not be worth micro-optimizing the last drop of performance out of this.
Speaking of micro-optimization, this very much reminds me of the work I did recently in rust-lang/rust#121150 which I see is about to land today 🎉
Thanks @Swatinem! I will optimize the regex usage. The graphemes can be removed for tag keys since, as you mentioned, we already filter away multi-byte characters. But for tag values, which allow the full UTF-8 character range, the code will panic if we truncate the tag value in the middle of a multi-byte character. Do you have any suggestion for how to truncate the tag value safely without graphemes? Maybe we can catch the panic and then reduce the truncation with 1 byte until it works? |
"catching panics" is not really a thing because people can (and often do) use
Depending on what our goal here is (bytes, chars or graphemes), you might as well just copy over the underlying implementation from |
Great thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but please wait for Arpad's approve to merge and release
c08c0b4
to
c8c0aeb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good. I have one criticism, and it's admittedly nitpicky: I don't think the From
impls on NormalizedName/Tags/Unit
are appropriate. Specifically, they are neither lossless nor value-preserving, as per https://doc.rust-lang.org/std/convert/trait.From.html#when-to-implement-from. I think free functions normalize_name/tags/unit
exported from the normalization
module would be better here.
90d0182
to
77f1a42
Compare
77f1a42
to
835f025
Compare
@Swatinem Fixed the regex optimization, removed graphemes, and reduced the number of new string allocations 👍 |
@loewenheim On it! |
cab00f7
to
26e4893
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #658 +/- ##
==========================================
+ Coverage 73.09% 73.56% +0.46%
==========================================
Files 62 66 +4
Lines 7448 7727 +279
==========================================
+ Hits 5444 5684 +240
- Misses 2004 2043 +39 |
@loewenheim I removed the name and unit structs and just use a function instead now. For the tags, I export a function but keep the struct. Does this look ok? |
26e4893
to
75179bf
Compare
75179bf
to
f3e7a57
Compare
@@ -31,9 +31,12 @@ UNSTABLE_cadence = ["dep:cadence", "UNSTABLE_metrics"] | |||
|
|||
[dependencies] | |||
cadence = { version = "0.29.0", optional = true } | |||
crc32fast = "1.4.0" | |||
itertools = "0.13.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we still using this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, crc32 is used for hashing set values and itertools is used for sorting and joining the metric tags! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to #659, we should make both these dependencies optional.
Also instead of manually sorting the metric tags, how about you switch to a BTreeMap
which is sorted by definition?
Add metrics normalization in accordance with the metrics developer docs.
Add metric name, unit, and tag truncation to adhere to the metrics user docs.
Add
to_envelope
for a singleMetric
instance to facilitate sending metrics from sentry-cli.Change hash function to crc32 as described here, this ensures compatibility between different SDKs using the same metric.
Use
clone_from
instead ofclone
andclone_into
instead ofto_owned
in a couple of places as suggested by clippy.