-
Notifications
You must be signed in to change notification settings - Fork 469
fix(profiling): reinstall signal handler if needed #15415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(profiling): reinstall signal handler if needed #15415
Conversation
|
|
a61e958 to
8afa0e9
Compare
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 246 ± 3 ms. The average import time from base is: 248 ± 3 ms. The import time difference between this PR and base is: -2.3 ± 0.1 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate kowalski/fix-profiling-reinstall-signal-handler-if-needed (ba89443) with baseline main (4753e14) 📈 Performance Regressions (2 suites)📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 4.943µs (SLO: <10.000µs 📉 -50.6%) vs baseline: 📈 +20.6% Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +5.1% ✅ ospathbasename_noaspectTime: ✅ 1.086µs (SLO: <10.000µs 📉 -89.1%) vs baseline: -0.4% Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.4% ✅ ospathjoin_aspectTime: ✅ 6.176µs (SLO: <10.000µs 📉 -38.2%) vs baseline: -0.6% Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.9% ✅ ospathjoin_noaspectTime: ✅ 2.275µs (SLO: <10.000µs 📉 -77.3%) vs baseline: -0.7% Memory: ✅ 40.344MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +5.0% ✅ ospathnormcase_aspectTime: ✅ 3.393µs (SLO: <10.000µs 📉 -66.1%) vs baseline: -0.3% Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.9% ✅ ospathnormcase_noaspectTime: ✅ 0.571µs (SLO: <10.000µs 📉 -94.3%) vs baseline: -1.6% Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.5% ✅ ospathsplit_aspectTime: ✅ 4.783µs (SLO: <10.000µs 📉 -52.2%) vs baseline: +0.9% Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.7% ✅ ospathsplit_noaspectTime: ✅ 1.603µs (SLO: <10.000µs 📉 -84.0%) vs baseline: +1.4% Memory: ✅ 40.226MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.5% ✅ ospathsplitdrive_aspectTime: ✅ 3.603µs (SLO: <10.000µs 📉 -64.0%) vs baseline: -1.1% Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.8% ✅ ospathsplitdrive_noaspectTime: ✅ 0.699µs (SLO: <10.000µs 📉 -93.0%) vs baseline: +0.1% Memory: ✅ 40.187MB (SLO: <41.000MB 🟡 -2.0%) vs baseline: +4.6% ✅ ospathsplitext_aspectTime: ✅ 4.447µs (SLO: <10.000µs 📉 -55.5%) vs baseline: +0.4% Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.9% ✅ ospathsplitext_noaspectTime: ✅ 1.386µs (SLO: <10.000µs 📉 -86.1%) vs baseline: ~same Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.9% 📈 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 3.415µs (SLO: <20.000µs 📉 -82.9%) vs baseline: 📈 +15.2% Memory: ✅ 35.271MB (SLO: <35.500MB 🟡 -0.6%) vs baseline: +4.9% ✅ 1-count-metrics-100-timesTime: ✅ 201.223µs (SLO: <220.000µs -8.5%) vs baseline: -1.3% Memory: ✅ 35.134MB (SLO: <35.500MB 🟡 -1.0%) vs baseline: +4.4% ✅ 1-distribution-metric-1-timesTime: ✅ 3.247µs (SLO: <20.000µs 📉 -83.8%) vs baseline: -1.8% Memory: ✅ 35.154MB (SLO: <35.500MB 🟡 -1.0%) vs baseline: +4.8% ✅ 1-distribution-metrics-100-timesTime: ✅ 214.080µs (SLO: <230.000µs -6.9%) vs baseline: -0.8% Memory: ✅ 35.212MB (SLO: <35.500MB 🟡 -0.8%) vs baseline: +4.9% ✅ 1-gauge-metric-1-timesTime: ✅ 2.188µs (SLO: <20.000µs 📉 -89.1%) vs baseline: -1.3% Memory: ✅ 35.173MB (SLO: <35.500MB 🟡 -0.9%) vs baseline: +4.9% ✅ 1-gauge-metrics-100-timesTime: ✅ 137.409µs (SLO: <150.000µs -8.4%) vs baseline: -0.3% Memory: ✅ 35.232MB (SLO: <35.500MB 🟡 -0.8%) vs baseline: +5.0% ✅ 1-rate-metric-1-timesTime: ✅ 3.062µs (SLO: <20.000µs 📉 -84.7%) vs baseline: -2.2% Memory: ✅ 35.173MB (SLO: <35.500MB 🟡 -0.9%) vs baseline: +4.8% ✅ 1-rate-metrics-100-timesTime: ✅ 215.199µs (SLO: <250.000µs 📉 -13.9%) vs baseline: -0.7% Memory: ✅ 35.232MB (SLO: <35.500MB 🟡 -0.8%) vs baseline: +4.9% ✅ 100-count-metrics-100-timesTime: ✅ 20.346ms (SLO: <22.000ms -7.5%) vs baseline: +0.6% Memory: ✅ 35.252MB (SLO: <35.500MB 🟡 -0.7%) vs baseline: +5.0% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.238ms (SLO: <2.300ms -2.7%) vs baseline: -0.2% Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +4.1% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.410ms (SLO: <1.550ms -9.0%) vs baseline: ~same Memory: ✅ 35.173MB (SLO: <35.500MB 🟡 -0.9%) vs baseline: +4.7% ✅ 100-rate-metrics-100-timesTime: ✅ 2.225ms (SLO: <2.550ms 📉 -12.7%) vs baseline: +1.0% Memory: ✅ 35.291MB (SLO: <35.500MB 🟡 -0.6%) vs baseline: +4.9% ✅ flush-1-metricTime: ✅ 4.457µs (SLO: <20.000µs 📉 -77.7%) vs baseline: -0.4% Memory: ✅ 35.271MB (SLO: <35.500MB 🟡 -0.6%) vs baseline: +5.3% ✅ flush-100-metricsTime: ✅ 175.257µs (SLO: <250.000µs 📉 -29.9%) vs baseline: +0.6% Memory: ✅ 35.271MB (SLO: <35.500MB 🟡 -0.6%) vs baseline: +4.8% ✅ flush-1000-metricsTime: ✅ 2.178ms (SLO: <2.500ms 📉 -12.9%) vs baseline: +0.5% Memory: ✅ 36.156MB (SLO: <36.500MB 🟡 -0.9%) vs baseline: +5.0% 🟡 Near SLO Breach (19 suites)🟡 coreapiscenario - 10/10 (1 unstable)
|
8afa0e9 to
ba89443
Compare
| @@ -1,3 +1,5 @@ | |||
| #pragma once | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was missing, causing duplicate declarations of types/classes if we re-included the file somewhere.
| // hello | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙂
| static std::once_flag segv_handler_once; | ||
| if (use_alternative_copy_memory()) { | ||
| std::call_once(segv_handler_once, init_segv_catcher); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call_once is far from free, but this is only called once at the start of the Sampling thread, so it's OK as far as I can tell.
Description
https://datadoghq.atlassian.net/browse/PROF-13114
This PR makes sure the Python Profiler's signal handler (for
SIGSEGVandSIGBUS) is properly installed when the Sampler thread starts.Note that this (reinstalling our signal handler) does NOT break any other signal handler (Python's or another extension's) as our signal handler only swallows faults / jumps to the recovery path if it's been "armed" (otherwise it re-raises). What matters is that we should be the "first responder" when a fault happens.
This is an attempt to fix a crash we saw in the testing environment where some workloads receive segmentation faults clearly coming from
safe_memcpyin FastAPI / Django apps.The "real" root cause isn't yet known of me – Django and FastAPI don't seem to use
PYTHONFAULTHANDLERorfaulthandlerbased on the Github code – but after deploying those changes, we stopped seeing those crashes (0 in the past 3 days).