@naoNao89 naoNao89 commented Aug 21, 2025

No description provided.

josecelano added 30 commits May 1, 2025 18:21
…itory to TorrentRepository

InMemoryTorrentRepository is now a wrapper over TorrentRepository.

It's planned to make the InMemoryTorrentRepository responsible for triggering events.
Unneeded wrapper for TorrentRepository now that there is only one
implementation.
…ge before adding events

7215f6e refactor: [torrust#1491] clean tests in torrent-repository (Jose Celano)
1f5d18f refactor: [torrust#1491] remove duplicate code (Jose Celano)
09bbef7 refactor: [torrust#1491] rename variable (Jose Celano)
32acbb1 refactor: [torrust#1491] rename method (Jose Celano)
21b18e4 refactor: [torrust#1491] move functionality from InMemoryTorrentRepository to TorrentRepository (Jose Celano)
df00533 refactor: [torrust#1491] rename main types in torrent-repository pkg (Jose Celano)
9be7c68 refactor: [torrust#1491] remove redundant type aliases (Jose Celano)
71aa8d0 fix: [torrust#1491] deadlock running tests (Jose Celano)
f106c01 refactor: [torrust#1491] move type alias to torrent-repository pkg (Jose Celano)
2b0727e refactor: [torrust#1491] reorganize repository module (Jose Celano)
0acfc8f refactor: [torrust#1491] remove unneeded type alias EntrySingle (Jose Celano)
f868b04 refactor: [torrust#1491] put implementation and type in the same module (Jose Celano)
e87479e refactor: [torrust#1491] extract mod torrent (Jose Celano)
e0a4aac feat!: [torrust#1491] remove unneeded generic (Jose Celano)
ecd2266 feat!: [torrust#1491] remove unused traits Entry and EntrySync (Jose Celano)
382e0af feat!: [torrust#1491] remove unused trait Repository (Jose Celano)
89123fa feat!: [torrust#1491] remove unused traits RepositoryAsync and EntryAsync (Jose Celano)
b2a9684 feat!: [torrust#1491] remove unused torrent repository entry types (Jose Celano)
16a6d08 feat!: [torrust#1491] remove unused torrent repositories (Jose Celano)
3d53b23 feat: [torrust#1491] copy torrent repo benchmarking into a new pkg (Jose Celano)

Pull request description:

  Simplify `torrent-repository` package [before adding events](torrust#1358). It would be overkill to maintain all the implementations.

  ### Subtasks

  - [x] Copy benchmarking code to a new workspace package `torrent-repository-benchmarking`.
  - [x] Remove code not used in production from the `torrent-repository` package.
    - [x] Unused repositories
    - [x] Unused entries
  - [x] Remove unused traits `RepositoryAsync`, `EntryAsync`, `Entry`, `EntrySync`, etc.
  - [x] Remove generics.
  - [x] Rename main types
  - [x] ~~Import the `bittorrent_tracker_core::torrent::repository::in_memory::InMemoryTorrentRepository` type from the `tracker-repo` package.~~ **DISCARDED**. See torrust#1358 (comment).
  - [x] Clean tests

ACKs for top commit:
  josecelano:
    ACK 7215f6e

Tree-SHA512: a7616eae597ad15302bd30f216437bdf2bcf101e7bcc5d2391c6d23646826aed92cf393d5b79865a57ad10e827da06b6adbb830c1cd5b55eb44dcd0b985853dc
```
cargo update
    Updating crates.io index
     Locking 18 packages to latest compatible versions
    Updating async-executor v1.13.1 -> v1.13.2
    Updating axum v0.8.3 -> v0.8.4
    Updating bytemuck v1.22.0 -> v1.23.0
    Updating cc v1.2.20 -> v1.2.21
    Updating chrono v0.4.40 -> v0.4.41
    Updating hashbrown v0.15.2 -> v0.15.3
    Updating miette v7.5.0 -> v7.6.0
    Updating miette-derive v7.5.0 -> v7.6.0
    Updating openssl-sys v0.9.107 -> v0.9.108
    Updating rustix v1.0.5 -> v1.0.7
    Updating sha2 v0.10.8 -> v0.10.9
    Updating syn v2.0.100 -> v2.0.101
    Updating synstructure v0.13.1 -> v0.13.2
    Updating toml v0.8.20 -> v0.8.22
    Updating toml_datetime v0.6.8 -> v0.6.9
    Updating toml_edit v0.22.24 -> v0.22.26
      Adding toml_write v0.1.1
    Updating winnow v0.7.7 -> v0.7.8
```
62e57ae chore(deps): update dependencies (Jose Celano)

Pull request description:

  ```
  cargo update
      Updating crates.io index
       Locking 18 packages to latest compatible versions
      Updating async-executor v1.13.1 -> v1.13.2
      Updating axum v0.8.3 -> v0.8.4
      Updating bytemuck v1.22.0 -> v1.23.0
      Updating cc v1.2.20 -> v1.2.21
      Updating chrono v0.4.40 -> v0.4.41
      Updating hashbrown v0.15.2 -> v0.15.3
      Updating miette v7.5.0 -> v7.6.0
      Updating miette-derive v7.5.0 -> v7.6.0
      Updating openssl-sys v0.9.107 -> v0.9.108
      Updating rustix v1.0.5 -> v1.0.7
      Updating sha2 v0.10.8 -> v0.10.9
      Updating syn v2.0.100 -> v2.0.101
      Updating synstructure v0.13.1 -> v0.13.2
      Updating toml v0.8.20 -> v0.8.22
      Updating toml_datetime v0.6.8 -> v0.6.9
      Updating toml_edit v0.22.24 -> v0.22.26
        Adding toml_write v0.1.1
      Updating winnow v0.7.7 -> v0.7.8
  ```

ACKs for top commit:
  josecelano:
    ACK 62e57ae

Tree-SHA512: dd96e03058cde72abe3071376686d805c651adfc8d6f81037edcf718c4c65dd5a6e7c56fdbd9a81b8d877e2d32dcdf10df71991499069cdda6cd5a068664a1e4
This change prevents duplicate peers with the same address but different IDs, ensuring more accurate peer tracking.
- Moved responsibility for keeping metadata to the Swarm type.
- The number of seeders and leechers is now calculated when the Swarm
  changes, not on demand. This avoids iterating over the peers to get
  the number of seeders and leechers.
- The number of downloads is also calculated now in the Swarm. It will
  be removed from the TrackedTorrent.
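The incremental bookkeeping described above can be sketched as follows. This is a minimal standalone sketch with hypothetical type and field names (`Swarm`, `PeerKind`, string peer keys); the real types in the tracker codebase are richer:

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq)]
enum PeerKind { Seeder, Leecher }

#[derive(Default)]
struct Swarm {
    // Keyed by peer address, so a re-announcing peer replaces its old entry.
    peers: BTreeMap<String, PeerKind>,
    seeders: u64,
    leechers: u64,
}

impl Swarm {
    // Counters are updated when the swarm changes, so reads are O(1)
    // instead of iterating over all peers on demand.
    fn upsert_peer(&mut self, addr: &str, kind: PeerKind) {
        if let Some(old) = self.peers.insert(addr.to_string(), kind) {
            self.decrement(old);
        }
        self.increment(kind);
    }

    fn remove_peer(&mut self, addr: &str) {
        if let Some(old) = self.peers.remove(addr) {
            self.decrement(old);
        }
    }

    fn increment(&mut self, kind: PeerKind) {
        match kind {
            PeerKind::Seeder => self.seeders += 1,
            PeerKind::Leecher => self.leechers += 1,
        }
    }

    fn decrement(&mut self, kind: PeerKind) {
        match kind {
            PeerKind::Seeder => self.seeders -= 1,
            PeerKind::Leecher => self.leechers -= 1,
        }
    }
}

fn main() {
    let mut swarm = Swarm::default();
    swarm.upsert_peer("10.0.0.1:6881", PeerKind::Leecher);
    swarm.upsert_peer("10.0.0.2:6881", PeerKind::Seeder);
    // A leecher completing the download becomes a seeder; counters stay consistent.
    swarm.upsert_peer("10.0.0.1:6881", PeerKind::Seeder);
    assert_eq!((swarm.seeders, swarm.leechers), (2, 0));
    swarm.remove_peer("10.0.0.2:6881");
    assert_eq!((swarm.seeders, swarm.leechers), (1, 0));
}
```

Keying by address is also what prevents the duplicate-peer problem: a second announce from the same address updates the existing entry rather than adding one.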
Changed from:

```rust
/// Returns true if the torrents meets the retention policy, meaning that
/// it should be kept in the tracker.
pub fn meets_retaining_policy(&self, policy: &TrackerPolicy) -> bool {
    if policy.persistent_torrent_completed_stat && self.metadata().downloaded > 0 {
        return true;
    }

    if policy.remove_peerless_torrents && self.is_empty() {
        return false;
    }

    true
}
```

To:

```rust
pub fn meets_retaining_policy(&self, policy: &TrackerPolicy) -> bool {
    !(policy.remove_peerless_torrents && self.is_empty())
}
```

I think the first condition was introduced to avoid losing the number of
downloads when the torrent is removed because there are no peers.

Now, we load that number from database when the torrent is added again
after removing it from the tracker.
josecelano and others added 24 commits June 18, 2025 12:15
…nect Time metric

bf9d16a tests(udp-tracker-server): [torrust#1589] add unit tests to statistics::metrics::Metrics (Jose Celano)
7e9d982 fix(udp-tracker-server): metric description (Jose Celano)
5fc255f tests(udp-tracker-server): [torrust#1589] add unit tests to statistics::repository::Repository (Jose Celano)
f7ab993 refactor: [torrust#1589] add logs for debugging (Jose Celano)

Pull request description:

  Relates to: torrust#1589

  I am unable to reproduce the bug described [here](torrust#1589) locally.

  My plan is to:

  - Add unit tests to see if I can reproduce the problem with edge cases.
  - Enable debugging with tracing and redeploy to the tracker demo to collect data from the demo.

ACKs for top commit:
  josecelano:
    ACK bf9d16a

Tree-SHA512: 8ebebbd26b323de28abe9b0e95e6f469e1560a0379e85e99ebb969bfb89dd822b7876c1b6d69cf18e951d0b5a1915832de0ed5edbe4440a931cb7d650cff685a
… update

520fd8b chore: [torrust#1589] add debug logs for avg processing time metric update (Jose Celano)

Pull request description:

  Related to: torrust#1589

  Add more debug logs for the average processing time metric update.

  I will deploy to the tracker demo and collect more data.

ACKs for top commit:
  josecelano:
    ACK 520fd8b

Tree-SHA512: 92396908fa854d6e781e893d4963c710a68662023ba460dfd1d7cd283d52d414dd3d0306bfc7b6eef261a0ae6d7b3e19525799a47801260f1b4cf183709688aa
…time metric and update atomic

It also fixes a division by zero bug when the metric is updated before
the counter for the number of connections has been increased.

This only avoids the division by zero. It will be properly fixed with an
independent request counter for the moving average calculation.
… time metric and update atomic

It also fixes a division by zero bug when the metric is updated before
the counter for the number of connections has been increased.

This only avoids the division by zero. It will be properly fixed with an
independent request counter for the moving average calculation.
…ime metric and update atomic

It also fixes a division by zero bug when the metric is updated before
the counter for the number of connections has been increased.

This only avoids the division by zero. It will be properly fixed with an
independent request counter for the moving average calculation.
…e series

We can't count the total number of UDP requests while calculating the
moving average but updating it only for a concrete label set (time
series).

Averages are calculated for each label set. They could be aggregated by
calculating the average for all time series.
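The fix can be illustrated with a per-label-set state map, where each time series carries its own request counter and average. This is a simplified standalone sketch (label sets are strings here; the real metric storage uses richer types):

```rust
use std::collections::HashMap;

// One (count, avg) pair per time series, keyed by its label set.
#[derive(Default)]
struct PerSeriesAvg {
    series: HashMap<String, (u64, f64)>,
}

impl PerSeriesAvg {
    // Count and average are updated together for the SAME label set,
    // so one series' average can no longer be computed with another
    // series' (or a global) request counter.
    fn record(&mut self, label_set: &str, sample_ns: f64) -> f64 {
        let entry = self.series.entry(label_set.to_string()).or_insert((0, 0.0));
        entry.0 += 1;
        entry.1 += (sample_ns - entry.1) / entry.0 as f64;
        entry.1
    }
}

fn main() {
    let mut m = PerSeriesAvg::default();
    m.record("port=6868", 1000.0);
    let a = m.record("port=6868", 3000.0); // avg over {1000, 3000}
    let b = m.record("port=6969", 500.0);  // an independent series
    assert_eq!(a, 2000.0);
    assert_eq!(b, 500.0);
}
```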
…n moving average calculation

Add a new metric `UDP_TRACKER_SERVER_PERFORMANCE_PROCESSED_REQUESTS_TOTAL` to track
requests processed specifically for performance metrics, eliminating race conditions
in the moving average calculation.

**Changes:**
- Add new metric constant `UDP_TRACKER_SERVER_PERFORMANCE_PROCESSED_REQUESTS_TOTAL`
- Update `recalculate_udp_avg_processing_time_ns()` to use dedicated counter instead of accepted requests total
- Add `udp_processed_requests_total()` method to retrieve the new metric value
- Add `increment_udp_processed_requests_total()` helper method
- Update metric descriptions to include the new counter

**Problem Fixed:**
Previously, the moving average calculation used the accepted requests counter that could be
updated independently, causing race conditions where the same request count was used for
multiple calculations. The new implementation increments its own dedicated counter atomically
during the calculation, ensuring consistency.

**Behavior Change:**
The counter now starts at 0 and gets incremented to 1 on the first calculation call,
then uses proper moving average formula for subsequent calls. This eliminates division
by zero issues and provides more accurate moving averages.

**Tests Updated:**
Updated repository tests to reflect the new atomic behavior where the processed requests
counter is managed specifically for moving average calculations.

Fixes race conditions in UDP request processing time metrics while maintaining
backward compatibility of all public APIs.
Implements a new aggregate function for calculating averages of metric samples
that match specific label criteria, complementing the existing Sum aggregation.

- **metrics/src/metric/aggregate/avg.rs**: New metric-level average trait and implementations
  - `Avg` trait with `avg()` method for calculating averages
  - Implementation for `Metric<Counter>` returning `f64`
  - Implementation for `Metric<Gauge>` returning `f64`
  - Comprehensive unit tests with edge cases (empty samples, large values, etc.)

- **metrics/src/metric_collection/aggregate/avg.rs**: New collection-level average trait
  - `Avg` trait for `MetricCollection` and `MetricKindCollection<T>`
  - Delegates to metric-level implementations
  - Handles mixed counter/gauge collections by trying counters first, then gauges
  - Returns `None` for non-existent metrics
  - Comprehensive test suite covering various scenarios

- **metrics/src/metric/aggregate/mod.rs**: Export new `avg` module
- **metrics/src/metric_collection/aggregate/mod.rs**: Export new `avg` module

- **metrics/README.md**: Add example usage of the new `Avg` trait in the aggregation section

- **Type Safety**: Returns appropriate types (`f64` for both counters and gauges)
- **Label Filtering**: Supports filtering samples by label criteria like existing `Sum`
- **Edge Case Handling**: Returns `0.0` for empty sample sets
- **Performance**: Uses iterator chains for efficient sample processing
- **Comprehensive Testing**: 205 tests pass including new avg functionality

```rust
use torrust_tracker_metrics::metric_collection::aggregate::Avg;

// Calculate average of all matching samples
let avg_value = metrics.avg(&metric_name, &label_criteria);
```

The implementation follows the same patterns as the existing `Sum` aggregate function,
ensuring consistency in the codebase and maintaining the same level of type safety
and performance characteristics.
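The averaging behavior described above (including the empty-set edge case) boils down to the following. This is a hypothetical standalone function, not the crate's actual API:

```rust
/// Average of a slice of sample values; returns 0.0 for an empty
/// sample set, matching the edge-case handling described above.
fn avg(samples: &[f64]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    samples.iter().sum::<f64>() / samples.len() as f64
}

fn main() {
    assert_eq!(avg(&[]), 0.0);
    // Averaging two per-server averages, as the global metric does.
    assert_eq!(avg(&[40326.0, 43498.0]), 41912.0);
}
```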
Improve AI-generated code.

Moves the collect_matching_samples helper method from individual aggregate
implementations to the generic Metric<T> implementation, making it reusable
across all aggregate functions.

- Add collect_matching_samples method to Metric<T> for filtering samples
  by label criteria
- Remove code duplication between Sum and Avg aggregate implementations
- Improve code organization by centralizing sample collection logic
- Maintain backward compatibility and all existing functionality

This refactoring improves maintainability by providing a single, well-tested
implementation of sample filtering that can be used by current and future
aggregate functions.
The division by zero issue was solved. It can't happen now because we
increase the counter at the beginning of the function.

```rust
    #[allow(clippy::cast_precision_loss)]
    pub fn recalculate_udp_avg_processing_time_ns(
        &mut self,
        req_processing_time: Duration,
        label_set: &LabelSet,
        now: DurationSinceUnixEpoch,
    ) -> f64 {
        self.increment_udp_processed_requests_total(label_set, now);

        let processed_requests_total = self.udp_processed_requests_total(label_set) as f64;
        let previous_avg = self.udp_avg_processing_time_ns(label_set);
        let req_processing_time = req_processing_time.as_nanos() as f64;

        // Moving average: https://en.wikipedia.org/wiki/Moving_average
        let new_avg = previous_avg as f64 + (req_processing_time - previous_avg as f64) / processed_requests_total;

        tracing::debug!(
            "Recalculated UDP average processing time for labels {}: {} ns (previous: {} ns, req_processing_time: {} ns, request_processed_total: {})",
            label_set,
            new_avg,
            previous_avg,
            req_processing_time,
            processed_requests_total
        );

        self.update_udp_avg_processing_time_ns(new_avg, label_set, now);

        new_avg
    }
```
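As a sanity check of the formula above: incrementing the counter first means the divisor is 1 on the first call (no division by zero), and the incremental update converges to the arithmetic mean of all samples seen so far. A standalone sketch, not the repository's code:

```rust
fn main() {
    let samples = [40_000.0_f64, 50_000.0, 60_000.0];
    let mut n = 0u64;
    let mut avg = 0.0_f64;
    for &x in &samples {
        // Counter is incremented first, mirroring the fixed function:
        // n is 1 on the first iteration, so no division by zero.
        n += 1;
        avg += (x - avg) / n as f64;
    }
    // Running values: 40000, then 45000, then 50000 — the mean of the samples.
    assert_eq!(avg, 50_000.0);
}
```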
…etrics

When calculating aggregated values for processing time metrics across
multiple servers, we need to use the average (.avg()) instead of sum
(.sum()) because the metric samples are already averages per server.

Using sum() on pre-averaged values would produce incorrect results,
as it would add up the averages rather than computing the true average
across all servers.

Changes:
- Add new *_averaged() methods that use .avg() for proper aggregation
- Update services.rs to use the corrected averaging methods
- Import Avg trait for metric collection averaging functionality

Fixes incorrect metric aggregation for:
- udp_avg_connect_processing_time_ns
- udp_avg_announce_processing_time_ns
- udp_avg_scrape_processing_time_ns
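The difference between the two aggregations can be seen with the connect-time numbers from the metrics example (a standalone arithmetic sketch):

```rust
fn main() {
    // Per-server averages for the connect request kind (ns).
    let per_server_avgs = [40_326.0_f64, 43_497.714_285_714_28];

    let wrong_sum: f64 = per_server_avgs.iter().sum();
    let correct_avg = wrong_sum / per_server_avgs.len() as f64;

    // Summing pre-averaged values roughly doubles the reported time...
    assert!(wrong_sum > 83_000.0);
    // ...while the true cross-server average stays near the per-server values.
    assert!((correct_avg - 41_911.857).abs() < 0.01);
}
```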
Adds a comprehensive unit test to validate thread safety when updating
UDP_TRACKER_SERVER_PERFORMANCE_AVG_PROCESSING_TIME_NS metrics under
concurrent load.

The test:
- Spawns 200 concurrent tasks (100 per server) simulating two UDP servers
- Server 1: cycles through [1000, 2000, 3000, 4000, 5000] ns processing times
- Server 2: cycles through [2000, 3000, 4000, 5000, 6000] ns processing times
- Validates request counts, average calculations, and metric relationships
- Uses tolerance-based assertions (±50ns) to account for moving average
  calculation variations in concurrent environments
- Ensures thread safety and mathematical correctness of the metrics system

This test helps ensure the UDP tracker server's metrics collection remains
accurate and thread-safe under high-concurrency scenarios.
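The core property the test validates — that the read-recalculate-update sequence is atomic under concurrency — can be sketched with plain threads and a mutex guarding the counter and the average together (a simplified sketch using `std::thread`; the real test uses async tasks and the metrics repository):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // (count, avg) guarded by one lock, so no task can read a stale
    // average or counter while another task is mid-update.
    let state = Arc::new(Mutex::new((0u64, 0.0f64)));
    let samples = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0];

    let handles: Vec<_> = (0..100_usize)
        .map(|i| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                let x = samples[i % samples.len()];
                let mut guard = state.lock().unwrap();
                guard.0 += 1;
                guard.1 += (x - guard.1) / guard.0 as f64;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    let (count, avg) = *state.lock().unwrap();
    assert_eq!(count, 100);
    // 100 tasks cycle evenly through the five samples, so the exact mean
    // is 3000 ns; a tolerance absorbs floating-point ordering effects.
    assert!((avg - 3000.0).abs() < 50.0);
}
```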
…cs race condition test

Restructures the race condition test to follow clear Arrange-Act-Assert pattern
and eliminates code duplication through helper function extraction.

The test maintains identical functionality while being more maintainable,
readable, and following DRY principles. All 200 concurrent tasks still
validate thread safety and mathematical correctness of the metrics system.
b423bf6 refactor: [torrust#1589] improve readability of UDP performance metrics race condition test (Jose Celano)
dc8d4a9 test: [torrust#1589] add race condition test for UDP performance metrics (Jose Celano)
a9acca5 refactor: [torrust#1589] rename methods and remove unused code (Jose Celano)
4c082fa refactor: [torrust#1589] make methods private (Jose Celano)
cd57f7a fix: [torrust#1589] use average aggregation for UDP processing time metrics (Jose Celano)
ba3d8a9 fix: format (Jose Celano)
caa69ae test: [torrust#1589] remove unneeded test (Jose Celano)
f402b02 chore: remove deprecated comment (Jose Celano)
8fbcf90 refactor(metrics): extract collect_matching_samples to Metric<T> impl (Jose Celano)
384b887 feat(metrics): [torrust#1589] add Avg (average) aggregate function (Jose Celano)
ed5f1e6 fix: [torrust#1589] add dedicated metric for UDP request processing in moving average calculation (Jose Celano)
164de92 refactor: [torrust#1589] remove duplicate code (Jose Celano)
1c13b12 fix: [torrust#1589] partially. Moving average calculated for each time series (Jose Celano)
47c2949 refactor: [torrust#1598] make recalculate udp avg scrape processing time metric and update atomic (Jose Celano)
59fbb39 refactor: [torrust#1598] make recalculate udp avg announce processing time metric and update atomic (Jose Celano)
d50948e refactor: [torrust#1598] make recalculate udp avg connect processing time metric and update atomic (Jose Celano)
e6c05b6 refactor(udp-tracker-server): [torrust#1589] move average processing time calculation from Repository to Metrics (Jose Celano)

Pull request description:

  Fix bug: Wrong UDP Average Connect Time metric.

  ### Context

  See torrust#1589.

  ### Subtasks

  - [x] Read, recalculate, and update average should be atomic.
  - [x] Average should be calculated for a time series (label set). We are mixing series: we update the average using a label set (segregate average), but count the requests for the average globally (not segregated).
  - [x] Add a new metric to count requests for the moving average. This counter is also increased atomically when the new average is updated.
  - [x] Fix global (all trackers, no labels) value for the average. It should be the average of all average samples, not the `SUM`.
    - [x] Add new average aggregate function to the `metrics` package.
  - [x] Add tests:
      - [x] With two time series. Two trackers running on different ports (`6868`, `6969`)
      - [x] For race conditions, running multiple requests in parallel. To ensure the average is accurate after many iterations.

  ### How to test

  1. Run the tracker: `cargo run`
  2. Simulate one `announce` request per UDP server

  ```
  cargo run -p torrust-tracker-client --bin udp_tracker_client announce udp://127.0.0.1:6868 000620bbc6c52d5a96d98f6c0f1dfa523a40df82 | jq

  cargo run -p torrust-tracker-client --bin udp_tracker_client announce udp://127.0.0.1:6969 000620bbc6c52d5a96d98f6c0f1dfa523a40df82 | jq
  ```

  3. Check metrics

  In the labelled metrics, there should be two metric samples like these:

  ```
  curl -s "http://localhost:1212/api/v1/metrics?token=MyAccessToken&format=prometheus"
  ```

  ```
  # HELP udp_tracker_server_performance_avg_processing_time_ns Average time to process a UDP request in nanoseconds
  # TYPE udp_tracker_server_performance_avg_processing_time_ns gauge
  udp_tracker_server_performance_avg_processing_time_ns{request_kind="announce",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6868",server_binding_protocol="udp"} 54773
  udp_tracker_server_performance_avg_processing_time_ns{request_kind="connect",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6868",server_binding_protocol="udp"} 40326
  udp_tracker_server_performance_avg_processing_time_ns{request_kind="announce",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6969",server_binding_protocol="udp"} 66063.71428571429
  udp_tracker_server_performance_avg_processing_time_ns{request_kind="connect",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6969",server_binding_protocol="udp"} 43497.71428571428
  ```

  In the global aggregated metrics:

  ```
  curl -s "http://localhost:1212/api/v1/metrics?token=MyAccessToken&format=prometheus"
  ```

  The values should be the average of the server's averages:

  ```
  udp_avg_connect_processing_time_ns 41911
  udp_avg_announce_processing_time_ns 60418
  udp_avg_scrape_processing_time_ns 0
  ```

  41911 = (40326 + 43497.71428571428) / 2 = 41911.857142857
  60418 = (54773 + 66063.71428571429) / 2 = 60418.357142857

  The values are rounded because we use a u64 for the global aggregated metrics.

ACKs for top commit:
  josecelano:
    ACK b423bf6

Tree-SHA512: 3d2544860838d0817f0700587af87bc6890eaef596db6cc15d02340e2ad9e459e425d2589d4e5c542ec3f0fc250b6e49fb4964be343e670a13d1de3a59b8f712
- Prefer parsed sample timestamps over fallback
- Preserve HELP as description; keep unsupported families as errors (for now)
- Remove gauge normalization in tests; only ensure trailing newline
- Pin openmetrics-parser to 0.4.4 (latest)

Refs: torrust#1582
@naoNao89 naoNao89 requested a review from a team as a code owner August 21, 2025 18:41
@naoNao89 naoNao89 closed this Aug 21, 2025
@naoNao89 naoNao89 deleted the feat/metrics-prometheus-deser-1582 branch August 21, 2025 18:43
@naoNao89 naoNao89 restored the feat/metrics-prometheus-deser-1582 branch August 21, 2025 18:47

da2ce7 commented Aug 24, 2025

Hello @naoNao89, it looks like you are trying to apply your commit to a very old version; please get the latest version of torrust and remake your changes :)
