
Conversation

@josecelano
Member

@josecelano josecelano commented Jun 18, 2025

Relates to: #1589

I am unable to reproduce the bug described here locally.

My plan is to:

  • Add unit tests to see if I can reproduce the problem with edge cases.
  • Enable debugging with tracing and redeploy the tracker demo to collect data from it (see the sketch after this list).
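A minimal sketch of the kind of tracing setup this implies, using the `tracing` and `tracing-subscriber` crates; the subscriber configuration and names here are assumptions for illustration, not the tracker's actual logging code:

    use tracing::Level;

    // Hypothetical example: emit all events up to DEBUG level to stdout so the
    // demo deployment produces enough data to diagnose the metric.
    fn init_debug_tracing() {
        tracing_subscriber::fmt()
            .with_max_level(Level::DEBUG)
            .init();
    }

    fn main() {
        init_debug_tracing();
        // Events like this one would then appear in the demo's logs.
        tracing::debug!(request_kind = "connect", "recalculating UDP average connect time");
    }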

@josecelano josecelano requested a review from da2ce7 June 18, 2025 10:10
@josecelano josecelano self-assigned this Jun 18, 2025
@josecelano josecelano added the Bug Incorrect Behavior label Jun 18, 2025
@codecov

codecov bot commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 98.71795% with 11 lines in your changes missing coverage. Please review.

Project coverage is 85.14%. Comparing base (b254ffd) to head (bf9d16a).
Report is 5 commits behind head on develop.

Files with missing lines Patch % Lines
...es/udp-tracker-server/src/statistics/repository.rs 96.72% 4 Missing and 7 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1590      +/-   ##
===========================================
+ Coverage    84.60%   85.14%   +0.53%     
===========================================
  Files          287      287              
  Lines        21447    22302     +855     
  Branches     21447    22302     +855     
===========================================
+ Hits         18146    18989     +843     
- Misses        2985     2992       +7     
- Partials       316      321       +5     

☔ View full report in Codecov by Sentry.

@josecelano josecelano force-pushed the 1589-wrong-udp-average-connect-time-metric branch from 7652c27 to 5fc255f Compare June 18, 2025 11:12
@josecelano
Member Author

I think the problem could be a race condition between recalculating the average and incrementing the total number of requests. Since the running average needs the total number of requests, it is possible to recalculate the average before the request counter has been incremented; in that case the initial average is infinite. I have added a test for this case, but although it is a bug, it is not the same problem I am seeing in production.

    #[tokio::test]
    async fn it_should_handle_moving_average_calculation_before_any_connections_are_recorded() {
        let repo = Repository::new();
        let now = CurrentClock::now();

        // This test checks the behavior of `recalculate_udp_avg_connect_processing_time_ns`
        // when no connections have been recorded yet. The first call should
        // handle division by zero gracefully and return an infinite average,
        // which is the current behavior.

        // todo: the first average should be 2000ns, not infinity.
        // This is because the first connection is not counted in the average
        // calculation if the counter is increased after calculating the average.
        // The problem is that we count requests when they are accepted, not
        // when they are processed. And we calculate the average when the
        // response is sent.

        // First calculation: no connections recorded yet, should result in infinity
        let processing_time_1 = Duration::from_nanos(2000);
        let avg_1 = repo.recalculate_udp_avg_connect_processing_time_ns(processing_time_1).await;

        // Division by zero: 1000 + (2000 - 1000) / 0 = infinity
        assert!(
            avg_1.is_infinite(),
            "First calculation should be infinite due to division by zero"
        );

        // Now add one connection and try again
        let ipv4_labels = LabelSet::from([("server_binding_address_ip_family", "inet"), ("request_kind", "connect")]);
        repo.increase_counter(&metric_name!(UDP_TRACKER_SERVER_REQUESTS_ACCEPTED_TOTAL), &ipv4_labels, now)
            .await
            .unwrap();

        // Second calculation: 1 connection, but previous average is infinity
        let processing_time_2 = Duration::from_nanos(3000);
        let avg_2 = repo.recalculate_udp_avg_connect_processing_time_ns(processing_time_2).await;

        assert!(
            (avg_2 - 3000.0).abs() < f64::EPSILON,
            "Second calculation should be 3000ns, but got {avg_2}"
        );
    }
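For reference, a minimal sketch of the incremental ("running") average update described above, to illustrate the ordering issue; the function name and the numbers are hypothetical, not the repository's actual API:

    // Hypothetical sketch: the standard incremental average update,
    // new_avg = old_avg + (new_value - old_avg) / count
    fn recalculate_running_avg(old_avg_ns: f64, new_value_ns: f64, request_count: u64) -> f64 {
        old_avg_ns + (new_value_ns - old_avg_ns) / request_count as f64
    }

    fn main() {
        // If the average is recalculated before the request counter has been
        // incremented, the divisor is 0 and the result is +infinity.
        let too_early = recalculate_running_avg(0.0, 2000.0, 0);
        assert!(too_early.is_infinite());

        // With the counter already incremented, the first average is simply
        // the first sample.
        let on_time = recalculate_running_avg(0.0, 2000.0, 1);
        assert!((on_time - 2000.0).abs() < f64::EPSILON);
    }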

@josecelano josecelano changed the title Fix bug: Wrong UDP Average Connect Time metric Add logs to debug this bug: Wrong UDP Average Connect Time metric Jun 18, 2025
@josecelano josecelano marked this pull request as ready for review June 18, 2025 11:37
@josecelano
Copy link
Member Author

ACK bf9d16a

@josecelano josecelano merged commit 09f52e0 into torrust:develop Jun 18, 2025
62 of 69 checks passed