
Conversation

@josecelano
Member

@josecelano josecelano commented Jun 18, 2025

Relates to: #1589

I am unable to reproduce the bug described here locally.

My plan is to:

  • Add unit tests to see if I can reproduce the problem with edge cases.
  • Enable debugging with tracing and redeploy the tracker demo to collect data from it (see the sketch after this list).
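A minimal sketch of the kind of tracing setup this implies, using the `tracing` and `tracing-subscriber` crates; the subscriber configuration and names here are assumptions for illustration, not the tracker's actual logging code:

    use tracing::Level;

    // Hypothetical example: emit all events up to DEBUG level to stdout so the
    // demo deployment produces enough data to diagnose the metric.
    fn init_debug_tracing() {
        tracing_subscriber::fmt()
            .with_max_level(Level::DEBUG)
            .init();
    }

    fn main() {
        init_debug_tracing();
        // Events like this one would then appear in the demo's logs.
        tracing::debug!(request_kind = "connect", "recalculating UDP average connect time");
    }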

@josecelano josecelano requested a review from da2ce7 June 18, 2025 10:10
@josecelano josecelano self-assigned this Jun 18, 2025
@josecelano josecelano added the Bug Incorrect Behavior label Jun 18, 2025
@codecov

codecov bot commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 98.71795% with 11 lines in your changes missing coverage. Please review.

Project coverage is 85.14%. Comparing base (b254ffd) to head (bf9d16a).
Report is 5 commits behind head on develop.

Files with missing lines Patch % Lines
...es/udp-tracker-server/src/statistics/repository.rs 96.72% 4 Missing and 7 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1590      +/-   ##
===========================================
+ Coverage    84.60%   85.14%   +0.53%     
===========================================
  Files          287      287              
  Lines        21447    22302     +855     
  Branches     21447    22302     +855     
===========================================
+ Hits         18146    18989     +843     
- Misses        2985     2992       +7     
- Partials       316      321       +5     

☔ View full report in Codecov by Sentry.

@josecelano josecelano force-pushed the 1589-wrong-udp-average-connect-time-metric branch from 7652c27 to 5fc255f Compare June 18, 2025 11:12
@josecelano
Member Author

I think the problem could be a race condition between recalculating the average and incrementing the total number of requests. Since the running average needs the total number of requests, it is possible to recalculate the average before the request counter has been incremented; in that case the initial average is infinite. I have added a test for this case, but although it is a bug, it is not the same problem I am seeing in production.

    #[tokio::test]
    async fn it_should_handle_moving_average_calculation_before_any_connections_are_recorded() {
        let repo = Repository::new();
        let now = CurrentClock::now();

        // This test checks the behavior of `recalculate_udp_avg_connect_processing_time_ns`
        // when no connections have been recorded yet. The first call should
        // handle division by zero gracefully and return an infinite average,
        // which is the current behavior.

        // todo: the first average should be 2000ns, not infinity.
        // This is because the first connection is not counted in the average
        // calculation if the counter is increased after calculating the average.
        // The problem is that we count requests when they are accepted, not
        // when they are processed. And we calculate the average when the
        // response is sent.

        // First calculation: no connections recorded yet, should result in infinity
        let processing_time_1 = Duration::from_nanos(2000);
        let avg_1 = repo.recalculate_udp_avg_connect_processing_time_ns(processing_time_1).await;

        // Division by zero: 1000 + (2000 - 1000) / 0 = infinity
        assert!(
            avg_1.is_infinite(),
            "First calculation should be infinite due to division by zero"
        );

        // Now add one connection and try again
        let ipv4_labels = LabelSet::from([("server_binding_address_ip_family", "inet"), ("request_kind", "connect")]);
        repo.increase_counter(&metric_name!(UDP_TRACKER_SERVER_REQUESTS_ACCEPTED_TOTAL), &ipv4_labels, now)
            .await
            .unwrap();

        // Second calculation: 1 connection, but previous average is infinity
        let processing_time_2 = Duration::from_nanos(3000);
        let avg_2 = repo.recalculate_udp_avg_connect_processing_time_ns(processing_time_2).await;

        assert!(
            (avg_2 - 3000.0).abs() < f64::EPSILON,
            "Second calculation should be 3000ns, but got {avg_2}"
        );
    }
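For reference, a minimal sketch of the incremental ("running") average update described above, to illustrate the ordering issue; the function name and the numbers are hypothetical, not the repository's actual API:

    // Hypothetical sketch: the standard incremental average update,
    // new_avg = old_avg + (new_value - old_avg) / count
    fn recalculate_running_avg(old_avg_ns: f64, new_value_ns: f64, request_count: u64) -> f64 {
        old_avg_ns + (new_value_ns - old_avg_ns) / request_count as f64
    }

    fn main() {
        // If the average is recalculated before the request counter has been
        // incremented, the divisor is 0 and the result is +infinity.
        let too_early = recalculate_running_avg(0.0, 2000.0, 0);
        assert!(too_early.is_infinite());

        // With the counter already incremented, the first average is simply
        // the first sample.
        let on_time = recalculate_running_avg(0.0, 2000.0, 1);
        assert!((on_time - 2000.0).abs() < f64::EPSILON);
    }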

@josecelano josecelano changed the title Fix bug: Wrong UDP Average Connect Time metric Add logs to debug this bug: Wrong UDP Average Connect Time metric Jun 18, 2025
@josecelano josecelano marked this pull request as ready for review June 18, 2025 11:37
@josecelano
Copy link
Member Author

ACK bf9d16a

@josecelano josecelano merged commit 09f52e0 into torrust:develop Jun 18, 2025
62 of 69 checks passed