Add logs to debug this bug: Wrong UDP Average Connect Time metric #1590
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1590      +/-   ##
===========================================
+ Coverage    84.60%   85.14%   +0.53%
===========================================
  Files          287      287
  Lines        21447    22302     +855
  Branches     21447    22302     +855
===========================================
+ Hits         18146    18989     +843
- Misses        2985     2992       +7
- Partials       316      321       +5
…s::repository::Repository
7652c27 to
5fc255f
I think the problem could be a race condition between recalculating the average and incrementing the total number of requests. Since we need the total number of requests to calculate the running average, we may calculate the average before the request counter has been incremented. In that case, the initial average would be infinite.

However, I have added a test and, although this is a bug, it's not the same problem I'm getting in production.

#[tokio::test]
async fn it_should_handle_moving_average_calculation_before_any_connections_are_recorded() {
    let repo = Repository::new();
    let now = CurrentClock::now();

    // This test checks the behavior of `recalculate_udp_avg_connect_processing_time_ns`
    // when no connections have been recorded yet. The first call should
    // handle division by zero gracefully and return an infinite average,
    // which is the current behavior.

    // TODO: the first average should be 2000ns, not infinity.
    // This is because the first connection is not counted in the average
    // calculation if the counter is increased after calculating the average.
    // The problem is that we count requests when they are accepted, not
    // when they are processed. And we calculate the average when the
    // response is sent.

    // First calculation: no connections recorded yet, should result in infinity
    let processing_time_1 = Duration::from_nanos(2000);
    let avg_1 = repo.recalculate_udp_avg_connect_processing_time_ns(processing_time_1).await;

    // Division by zero: 1000 + (2000 - 1000) / 0 = infinity
    assert!(
        avg_1.is_infinite(),
        "First calculation should be infinite due to division by zero"
    );

    // Now add one connection and try again
    let ipv4_labels = LabelSet::from([("server_binding_address_ip_family", "inet"), ("request_kind", "connect")]);
    repo.increase_counter(&metric_name!(UDP_TRACKER_SERVER_REQUESTS_ACCEPTED_TOTAL), &ipv4_labels, now)
        .await
        .unwrap();

    // Second calculation: 1 connection, but previous average is infinity
    let processing_time_2 = Duration::from_nanos(3000);
    let avg_2 = repo.recalculate_udp_avg_connect_processing_time_ns(processing_time_2).await;

    assert!(
        (avg_2 - 3000.0).abs() < f64::EPSILON,
        "Second calculation should be 3000ns, but got {avg_2}"
    );
}
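To make the division-by-zero case concrete, here is a minimal standalone sketch of an incremental running average. It is not the tracker's actual implementation: the function names `running_average_unguarded` and `running_average_guarded`, the initial average of `0.0`, and the guard strategy are all assumptions chosen for illustration.

```rust
/// Hypothetical helper (not the tracker's real code).
/// Incremental mean: new_avg = previous_avg + (sample - previous_avg) / count,
/// where `count` is expected to already include the current sample.
fn running_average_unguarded(previous_avg: f64, sample_ns: f64, count: u64) -> f64 {
    // If the counter has not been incremented yet, `count` is 0 and this
    // divides by zero. For a positive numerator, IEEE 754 yields +infinity.
    previous_avg + (sample_ns - previous_avg) / count as f64
}

/// Hypothetical guarded variant: treats "no requests counted yet" as the
/// first sample, so the divisor is never zero.
fn running_average_guarded(previous_avg: f64, sample_ns: f64, count: u64) -> f64 {
    let divisor = count.max(1) as f64;
    previous_avg + (sample_ns - previous_avg) / divisor
}
```

Under these assumptions, calling the unguarded function with a previous average of `0.0`, a sample of `2000.0`, and a count of `0` gives an infinite result, while the guarded one returns the sample itself. Whether clamping the divisor is the right fix depends on when the counter is incremented relative to the average calculation; it only hides the ordering problem rather than resolving it.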
ACK bf9d16a
Relates to: #1589
I am unable to reproduce the bug described here locally.
My plan is to: