Skip to content

feat(replay): Use new afterSend hook to improve error linking #7390

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 16, 2023

Conversation

mydea
Copy link
Member

@mydea mydea commented Mar 9, 2023

This replaces our current way of connection errors & replays with a new hook afterSendErrorEvent.

This hook is sent after transport.send() has completed, and receives the event & response as arguments.

When we get a successful (e.g. statusCode >200<300) response, we add the event id to errorIds and eventually sample the error session.

This has one main wrinkle, which is that when users provide a custom transport, this may not respond with the new TransportMakeRequestResponse but can also just return void. This will change in V8, but until then, a custom transport may not be implementing this.

To avoid us never connecting an error with a replay in such a case, we detect if the transport is not the base transport (which is guaranteed to return a non-void response), we circumvent this check and simply always connect the error. This may connect an error that has been dropped, but there is no way for us to know that for certain. IMHO this is an acceptable outcome.

Note that for the case where a custom client is provided that does not implement hooks (unlikely, but possible), we can quite easily make this work by falling back to using the functionality in the global event processor.

This should fix/improve the following behaviors:

  • Errors that are dropped via e.g. beforeSend or a later-registered global event processor will not trigger an error session anymore
  • Errors that have not been sent successfully to Sentry (e.g. rate limited, ...) will not trigger an error session anymore
  • Errors that have been not been sent successfully to sentry (e.g. API error, ...) will not be added to errorIds anymore
  • We do not depend on event processor order anymore, e.g. the linking always runs after any regular event processor (think deduping, ...)
  • Overall, this should avoid us sending error-sessions to Sentry that have no corresponding error in Sentry.

@mydea mydea added the Package: replay Issues related to the Sentry Replay SDK label Mar 9, 2023
@mydea mydea requested review from billyvg, Lms24 and AbhiPrasad March 9, 2023 12:01
@mydea mydea self-assigned this Mar 9, 2023

if (hasHooks) {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(client as BaseClient<any>).on('afterSendErrorEvent', handleAfterSendError(replay));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a more general hook that is called afterSendEvent? Then in handleAfterSendError we can check for isErrorEvent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't entirely sure here 🤔 I can make it more generic. My main reasoning was to avoid the beforeSend/beforeSendTransaction naming mess we have, down the line 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for generic -- we'll want to do something similar with traceIds, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make it generic, but please note that this has downsides to - as Event can also be a replay event or profile event, we'll have to guard this everywhere etc., even though actually as of now these events go through a different code path and can't even get in there.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2023

size-limit report 📦

Path Size
@sentry/browser - ES5 CDN Bundle (gzipped + minified) 20.46 KB (+0.28% 🔺)
@sentry/browser - ES5 CDN Bundle (minified) 63.42 KB (+0.2% 🔺)
@sentry/browser - ES6 CDN Bundle (gzipped + minified) 19.02 KB (+0.19% 🔺)
@sentry/browser - ES6 CDN Bundle (minified) 56.32 KB (+0.18% 🔺)
@sentry/browser - Webpack (gzipped + minified) 20.69 KB (+0.18% 🔺)
@sentry/browser - Webpack (minified) 67.54 KB (+0.16% 🔺)
@sentry/react - Webpack (gzipped + minified) 20.71 KB (+0.18% 🔺)
@sentry/nextjs Client - Webpack (gzipped + minified) 52.16 KB (+0.08% 🔺)
@sentry/browser + @sentry/tracing - ES5 CDN Bundle (gzipped + minified) 33.72 KB (+0.18% 🔺)
@sentry/browser + @sentry/tracing - ES6 CDN Bundle (gzipped + minified) 26.05 KB (+0.16% 🔺)
@sentry/replay ES6 CDN Bundle (gzipped + minified) 43.5 KB (+0.13% 🔺)
@sentry/replay - Webpack (gzipped + minified) 37.57 KB (+0.25% 🔺)
@sentry/browser + @sentry/tracing + @sentry/replay - ES6 CDN Bundle (gzipped + minified) 61.69 KB (+0.17% 🔺)
@sentry/browser + @sentry/replay - ES6 CDN Bundle (gzipped + minified) 54.75 KB (+0.17% 🔺)


if (hasHooks) {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(client as BaseClient<any>).on('afterSendErrorEvent', handleAfterSendError(replay));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for generic -- we'll want to do something similar with traceIds, no?

Copy link
Member

@Lms24 Lms24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good from my point of view!

Comment on lines 7 to 19
/*
* This scenario currently shows somewhat unexpected behavior from the PoV of a user:
* The error is dropped, but the recording is started and continued anyway.
* If folks only sample error replays, this will lead to a lot of confusion as the resulting replay
* won't contain the error that started it (possibly none or only additional errors that occurred later on).
*
* This is because in error-mode, we start recording as soon as replay's eventProcessor is called with an error.
* If later event processors or beforeSend drop the error, the recording is already started.
*
* We'll need a proper SDK lifecycle hook (WIP) to fix this properly.
* TODO: Once we have lifecycle hooks, we should revisit this test and make sure it behaves as expected.
* This means that the recording should not be started or stopped if the error that triggered it is not sent.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 👏 👏

@github-actions
Copy link
Contributor

github-actions bot commented Mar 14, 2023

Replay SDK metrics 🚀

    Plain +Sentry +Replay
Revision Value Value Diff Ratio Value Diff Ratio
LCP This PR 603d188 59.66 ms 67.60 ms +7.94 ms +13.32 % 68.90 ms +9.24 ms +15.50 %
Baseline 7399b99 70.49 ms 65.18 ms -5.31 ms -7.53 % 63.63 ms -6.86 ms -9.73 %
CLS This PR 603d188 0.00 ms 0.00 ms 0.00 ms 0.00 % 0.24 ms +0.23 ms +5026.54 %
Baseline 7399b99 0.00 ms 0.00 ms 0.00 ms 0.00 % 0.24 ms +0.23 ms +5077.55 %
CPU This PR 603d188 17.65 % 20.62 % +2.98 pp +16.87 % 36.60 % +18.95 pp +107.41 %
Baseline 7399b99 21.63 % 21.31 % -0.32 pp -1.50 % 37.48 % +15.85 pp +73.24 %
JS heap avg This PR 603d188 3.54 MB 6.9 MB +3.37 MB +95.11 % 11.02 MB +7.48 MB +211.51 %
Baseline 7399b99 3.52 MB 6.89 MB +3.37 MB +95.87 % 10.98 MB +7.46 MB +212.18 %
JS heap max This PR 603d188 3.89 MB 8.33 MB +4.44 MB +114.08 % 14.18 MB +10.29 MB +264.51 %
Baseline 7399b99 3.88 MB 8.29 MB +4.41 MB +113.73 % 13.88 MB +10 MB +257.86 %
netTx This PR 603d188 0 B 360.46 kB +360.46 kB n/a 107.67 kB +107.67 kB n/a
Baseline 7399b99 0 B 360.47 kB +360.47 kB n/a 107.04 kB +107.04 kB n/a
netRx This PR 603d188 17.79 kB 19.06 kB +1.28 kB +7.18 % 16.61 kB -1.18 kB -6.64 %
Baseline 7399b99 16.4 kB 20.34 kB +3.94 kB +24.01 % 19 kB +2.6 kB +15.83 %
netCount This PR 603d188 1 2 +1 +100.00 % 5 +4 +400.00 %
Baseline 7399b99 1 2 +1 +100.00 % 5 +4 +400.00 %
netTime This PR 603d188 333.60 ms 357.82 ms +24.22 ms +7.26 % 500.75 ms +167.15 ms +50.11 %
Baseline 7399b99 372.87 ms 432.39 ms +59.52 ms +15.96 % 609.92 ms +237.05 ms +63.57 %

Baseline results on branch: develop

RevisionLCPCLSCPUJS heap avgJS heap maxnetTxnetRxnetCountnetTime
b278429-14.33 ms+0.25 ms+15.16 pp+7.66 MB+10.24 MB+107.71 kB-706 B+4+278.27 ms
ef6b3c7-7.70 ms+0.25 ms+16.28 pp+7.64 MB+10.02 MB+107.46 kB+57 B+4+222.84 ms
ef6b3c7-10.10 ms+0.23 ms+15.85 pp+7.66 MB+10.56 MB+107.49 kB+836 B+4+209.40 ms
b1ef00d+5.17 ms+0.26 ms+15.95 pp+7.59 MB+10.18 MB+107.02 kB-220 B+4+306.46 ms
42e542e+12.94 ms+0.25 ms+18.24 pp+7.8 MB+10.31 MB+107.39 kB-2.17 kB+4+289.66 ms
2f3c93c+6.35 ms+0.26 ms+16.82 pp+7.44 MB+9.99 MB+106.58 kB+1.53 kB+4+303.23 ms
4bff5a9+0.60 ms+0.26 ms+15.70 pp+7.44 MB+10.2 MB+106.93 kB-2.04 kB+4+163.67 ms
ba99e7c-0.34 ms+0.23 ms+16.45 pp+7.17 MB+10.05 MB+106.7 kB+2.46 kB+4+119.03 ms
a70376e-7.64 ms+0.25 ms+17.86 pp+6.5 MB+10.21 MB+106.82 kB-35 B+4+321.76 ms

Previous results on branch: fn/afterSend

RevisionLCPCLSCPUJS heap avgJS heap maxnetTxnetRxnetCountnetTime
7399b99-6.86 ms+0.23 ms+15.85 pp+7.46 MB+10 MB+107.04 kB+2.6 kB+4+237.05 ms
b278429-14.33 ms+0.25 ms+15.16 pp+7.66 MB+10.24 MB+107.71 kB-706 B+4+278.27 ms
ef6b3c7-7.70 ms+0.25 ms+16.28 pp+7.64 MB+10.02 MB+107.46 kB+57 B+4+222.84 ms
ef6b3c7-10.10 ms+0.23 ms+15.85 pp+7.66 MB+10.56 MB+107.49 kB+836 B+4+209.40 ms
b1ef00d+5.17 ms+0.26 ms+15.95 pp+7.59 MB+10.18 MB+107.02 kB-220 B+4+306.46 ms
42e542e+12.94 ms+0.25 ms+18.24 pp+7.8 MB+10.31 MB+107.39 kB-2.17 kB+4+289.66 ms
2f3c93c+6.35 ms+0.26 ms+16.82 pp+7.44 MB+9.99 MB+106.58 kB+1.53 kB+4+303.23 ms
4bff5a9+0.60 ms+0.26 ms+15.70 pp+7.44 MB+10.2 MB+106.93 kB-2.04 kB+4+163.67 ms
ba99e7c-0.34 ms+0.23 ms+16.45 pp+7.17 MB+10.05 MB+106.7 kB+2.46 kB+4+119.03 ms
a70376e-7.64 ms+0.25 ms+17.86 pp+6.5 MB+10.21 MB+106.82 kB-35 B+4+321.76 ms

*) pp - percentage points - an absolute difference between two percentages.
Last updated: Thu, 16 Mar 2023 13:04:53 GMT

@mydea mydea force-pushed the fn/afterSend branch 2 times, most recently from 5f06d61 to 467671f Compare March 16, 2023 12:20
@mydea mydea merged commit 7aa20d0 into develop Mar 16, 2023
@mydea mydea deleted the fn/afterSend branch March 16, 2023 13:19
@lucas-zimerman
Copy link
Contributor

lucas-zimerman commented Apr 27, 2023

This has one main wrinkle, which is that when users provide a custom transport, this may not respond with the new TransportMakeRequestResponse but can also just return void. This will change in V8, but until then, a custom transport may not be implementing this.

Wish this was on the changelog because it took me a while to figure out why it stopped sending Replays since version 4.45 :C

@mydea
Copy link
Member Author

mydea commented Apr 27, 2023

This has one main wrinkle, which is that when users provide a custom transport, this may not respond with the new TransportMakeRequestResponse but can also just return void. This will change in V8, but until then, a custom transport may not be implementing this.

Wish this was on the changelog because it took me a while to figure out why it stopped sending Replays since version 4.45 :C

Hmm, we should actually be handling this case - we fall back to the "old" behavior in that case. What exactly is happening for you? If replays are not being sent when errors happen, that would be a bug!

@lucas-zimerman
Copy link
Contributor

lucas-zimerman commented Apr 27, 2023

This has one main wrinkle, which is that when users provide a custom transport, this may not respond with the new TransportMakeRequestResponse but can also just return void. This will change in V8, but until then, a custom transport may not be implementing this.

Wish this was on the changelog because it took me a while to figure out why it stopped sending Replays since version 4.45 :C

Hmm, we should actually be handling this case - we fall back to the "old" behavior in that case. What exactly is happening for you? If replays are not being sent when errors happen, that would be a bug!

It's kinda hard to debug but What I noticed is the replay integration starts as usual but doesn't flush/send replays when the send funciton is void

The issue could be here
https://github.com/getsentry/sentry-javascript/pull/7390/files#diff-a9838d39c7470a876de4e928887c7f8fcab924f9aecf69acbe20d6fc9a8fbcdeR53-R55

Because I noticed that afterSendHandler was undefined on my case

In this case I am only altering the Transport, I am not touching the default client/hub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Package: replay Issues related to the Sentry Replay SDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants