feat/metrics prometheus deser 1582 #1610
Closed: naoNao89 wants to merge 951 commits into torrust:main from naoNao89:feat/metrics-prometheus-deser-1582
Conversation
…itory to TorrentRepository

InMemoryTorrentRepository is now a wrapper over TorrentRepository. It's planned to make the InMemoryTorrentRepository responsible for triggering events.
Unneeded wrapper for TorrentRepository now that there is only one implementation.
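For orientation, a minimal sketch of the wrapper arrangement described above, with `InMemoryTorrentRepository` delegating to `TorrentRepository`; the fields and methods here are hypothetical stand-ins, not the crate's actual API:

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Simplified stand-in for the tracker's torrent repository.
struct TorrentRepository {
    /// info-hash -> number of completed downloads (heavily simplified).
    torrents: RwLock<BTreeMap<String, u64>>,
}

/// Thin wrapper over `TorrentRepository`; the plan is to make this type
/// responsible for triggering events.
struct InMemoryTorrentRepository {
    inner: TorrentRepository,
}

impl InMemoryTorrentRepository {
    fn increment_downloads(&self, info_hash: &str) {
        // Today this only delegates to the wrapped repository; an event such
        // as "torrent downloaded" would be emitted from here later.
        let mut torrents = self.inner.torrents.write().unwrap();
        *torrents.entry(info_hash.to_string()).or_insert(0) += 1;
    }
}

fn main() {
    let repo = InMemoryTorrentRepository {
        inner: TorrentRepository {
            torrents: RwLock::new(BTreeMap::new()),
        },
    };
    repo.increment_downloads("000620bbc6c52d5a96d98f6c0f1dfa523a40df82");
    assert_eq!(
        repo.inner.torrents.read().unwrap()["000620bbc6c52d5a96d98f6c0f1dfa523a40df82"],
        1
    );
}
```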
…ge before adding events

7215f6e refactor: [torrust#1491] clean tests in torrent-repository (Jose Celano)
1f5d18f refactor: [torrust#1491] remove duplicate code (Jose Celano)
09bbef7 refactor: [torrust#1491] rename varaible (Jose Celano)
32acbb1 refactor: [torrust#1491] rename method (Jose Celano)
21b18e4 refactor: [torrust#1491] move functionality from InMemoryTorrentRepository to TorrentRepository (Jose Celano)
df00533 refactor: [torrust#1491] rename main types in torrent-repository pkg (Jose Celano)
9be7c68 refactor: [torrust#1491] remove redundant type aliases (Jose Celano)
71aa8d0 fix: [torrust#1491] deadlock running tests (Jose Celano)
f106c01 refactor: [torrust#1491] move type alias to torrent-repository pkg (Jose Celano)
2b0727e refactor: [torrust#1491] reorganize repository module (Jose Celano)
0acfc8f refactor: [torrust#1491] remove unneeded type alias EntrySingle (Jose Celano)
f868b04 refactor: [torrust#1491] put implementation and type in the same module (Jose Celano)
e87479e refactor: [torrust#1491] extract mod torrent (Jose Celano)
e0a4aac feat!: [torrust#1491] remove unneeded generic (Jose Celano)
ecd2266 feat!: [torrust#1491] remove unused traits Entry and EntrySync (Jose Celano)
382e0af faet!: [torrust#1491] remove unused trait Repository (Jose Celano)
89123fa feat!: [torrust#1491] remove unused traits RepositoryAsync and EntryAsync (Jose Celano)
b2a9684 feat!: [torrust#1491] remove unused torrent repository entry types (Jose Celano)
16a6d08 feat!: [torrust#1491] remove unused torrent repositories (Jose Celano)
3d53b23 feat: [torrust#1491] copy torrent repo benchmarking into a new pkg (Jose Celano)

Pull request description:

Simplify `torrent-repository` package [before adding events](torrust#1358). It would be overkill to maintain all the implementations.

### Subtasks

- [x] Copy benchmarking code to a new workspace package `torrent-repository-benchmarking`.
- [x] Remove code not used in production from the `torrent-repository` package.
  - [x] Unused repositories
  - [x] Unused entries
- [x] Remove unused traits `RepositoryAsync`, `EntryAsync`, `Entry`, `EntrySync`, etc.
- [x] Remove generics.
- [x] Rename main types
- [x] ~~Import the `bittorrent_tracker_core::torrent::repository::in_memory::InMemoryTorrentRepository` type from the `tracker-repo` package.~~ **DISCARDED**. See torrust#1358 (comment).
- [x] Clean tests

ACKs for top commit:
  josecelano:
    ACK 7215f6e

Tree-SHA512: a7616eae597ad15302bd30f216437bdf2bcf101e7bcc5d2391c6d23646826aed92cf393d5b79865a57ad10e827da06b6adbb830c1cd5b55eb44dcd0b985853dc
```
cargo update
Updating crates.io index
Locking 18 packages to latest compatible versions
Updating async-executor v1.13.1 -> v1.13.2
Updating axum v0.8.3 -> v0.8.4
Updating bytemuck v1.22.0 -> v1.23.0
Updating cc v1.2.20 -> v1.2.21
Updating chrono v0.4.40 -> v0.4.41
Updating hashbrown v0.15.2 -> v0.15.3
Updating miette v7.5.0 -> v7.6.0
Updating miette-derive v7.5.0 -> v7.6.0
Updating openssl-sys v0.9.107 -> v0.9.108
Updating rustix v1.0.5 -> v1.0.7
Updating sha2 v0.10.8 -> v0.10.9
Updating syn v2.0.100 -> v2.0.101
Updating synstructure v0.13.1 -> v0.13.2
Updating toml v0.8.20 -> v0.8.22
Updating toml_datetime v0.6.8 -> v0.6.9
Updating toml_edit v0.22.24 -> v0.22.26
Adding toml_write v0.1.1
Updating winnow v0.7.7 -> v0.7.8
```
62e57ae chore(deps): udpate dependencies (Jose Celano)

Pull request description: (see the `cargo update` output above)

ACKs for top commit:
  josecelano:
    ACK 62e57ae

Tree-SHA512: dd96e03058cde72abe3071376686d805c651adfc8d6f81037edcf718c4c65dd5a6e7c56fdbd9a81b8d877e2d32dcdf10df71991499069cdda6cd5a068664a1e4
This change prevents duplicate peers with the same address but different IDs, ensuring more accurate peer tracking.
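A hedged sketch of that deduplication idea: keying the swarm's peer map by socket address so that a re-announce with a different peer ID replaces the old entry instead of adding a second one. The types are illustrative, not the tracker's real peer or swarm types:

```rust
use std::collections::BTreeMap;
use std::net::SocketAddr;

#[derive(Debug)]
struct Peer {
    id: [u8; 20],
    addr: SocketAddr,
}

#[derive(Debug, Default)]
struct Swarm {
    // Keyed by address: a peer announcing again with a different ID
    // overwrites its previous entry instead of creating a duplicate.
    peers: BTreeMap<SocketAddr, Peer>,
}

impl Swarm {
    fn upsert(&mut self, peer: Peer) {
        self.peers.insert(peer.addr, peer);
    }
}

fn main() {
    let addr: SocketAddr = "203.0.113.10:6881".parse().unwrap();
    let mut swarm = Swarm::default();
    swarm.upsert(Peer { id: [1; 20], addr });
    swarm.upsert(Peer { id: [2; 20], addr }); // same address, new peer ID
    assert_eq!(swarm.peers.len(), 1); // no duplicate entry for the address
    println!("{swarm:?}");
}
```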
- Moved responsibility for keeping metadata to the Swarm type.
- The number of seeders and leechers is now calculated when the Swarm changes, not on demand, so we avoid iterating over the peers to get those counts.
- The number of downloads is also calculated in the Swarm now. It will be removed from the TrackedTorrent.
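A rough sketch of the "update the counters when the Swarm changes" approach, so that seeders and leechers can be read in O(1) without iterating over the peers; field and method names are assumptions for the example only:

```rust
#[derive(Default)]
struct Swarm {
    seeders: u64,
    leechers: u64,
    downloaded: u64,
}

impl Swarm {
    /// Called when a peer joins as a leecher; the counter is kept in sync here
    /// instead of being recomputed by iterating all peers on every read.
    fn peer_joined_as_leecher(&mut self) {
        self.leechers += 1;
    }

    /// Called when a peer becomes a seeder (e.g. after completing a download).
    fn peer_became_seeder(&mut self, was_leecher: bool) {
        if was_leecher {
            self.leechers = self.leechers.saturating_sub(1);
            self.downloaded += 1; // a completed download
        }
        self.seeders += 1;
    }

    /// O(1) read: no iteration over the peer list.
    fn counts(&self) -> (u64, u64) {
        (self.seeders, self.leechers)
    }
}

fn main() {
    let mut swarm = Swarm::default();
    swarm.peer_joined_as_leecher();
    swarm.peer_became_seeder(true);
    assert_eq!(swarm.counts(), (1, 0));
    assert_eq!(swarm.downloaded, 1);
}
```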
…ad of from TrackedTorrent
Changed from:
```
/// Returns true if the torrents meets the retention policy, meaning that
/// it should be kept in the tracker.
pub fn meets_retaining_policy(&self, policy: &TrackerPolicy) -> bool {
if policy.persistent_torrent_completed_stat && self.metadata().downloaded > 0 {
return true;
}
if policy.remove_peerless_torrents && self.is_empty() {
return false;
}
true
}
```
To:
```
pub fn meets_retaining_policy(&self, policy: &TrackerPolicy) -> bool {
!(policy.remove_peerless_torrents && self.is_empty())
}
```
I think the first condition was introduced to avoid losing the number of
downloads when the torrent is removed because there are no peers.
Now, we load that number from the database when the torrent is added again
after having been removed from the tracker.
…s::metrics::Metrics
…nect Time metric

bf9d16a tests(udp-tracker-server): [torrust#1589] add unit tests to statistics::metrics::Metrics (Jose Celano)
7e9d982 fix(udt-tracker-server): metric description (Jose Celano)
5fc255f tests(udp-tracker-server): [torrust#1589] add unit tests to statistics::repository::Repository (Jose Celano)
f7ab993 refactor: [torrust#1589] add logs for debugging (Jose Celano)

Pull request description:

Relates to: torrust#1589

I am unable to reproduce the bug described [here](torrust#1589) locally. My plan is to:

- Add unit tests to see if I can reproduce the problem with edge cases.
- Enable debugging with tracing and redeploy to the tracker demo to collect data from the demo.

ACKs for top commit:
  josecelano:
    ACK bf9d16a

Tree-SHA512: 8ebebbd26b323de28abe9b0e95e6f469e1560a0379e85e99ebb969bfb89dd822b7876c1b6d69cf18e951d0b5a1915832de0ed5edbe4440a931cb7d650cff685a
… update

520fd8b chore: [torrust#1589] add debug logs for avg processing time metric update (Jose Celano)

Pull request description:

Related to: torrust#1589

Add more debug logs for the average processing time metric update. I will deploy to the tracker demo and collect more data.

ACKs for top commit:
  josecelano:
    ACK 520fd8b

Tree-SHA512: 92396908fa854d6e781e893d4963c710a68662023ba460dfd1d7cd283d52d414dd3d0306bfc7b6eef261a0ae6d7b3e19525799a47801260f1b4cf183709688aa
…time calculation from Repository to Metrics
…time metric and update atomic. It also fixes a division-by-zero bug that occurs when the metric is updated before the counter for the number of connections has been increased. This only avoids the division by zero; it will be fixed properly with an independent request counter for the moving average calculation.
… time metric and update atomic. It also fixes a division-by-zero bug that occurs when the metric is updated before the counter for the number of connections has been increased. This only avoids the division by zero; it will be fixed properly with an independent request counter for the moving average calculation.
…ime metric and update atomic. It also fixes a division-by-zero bug that occurs when the metric is updated before the counter for the number of connections has been increased. This only avoids the division by zero; it will be fixed properly with an independent request counter for the moving average calculation.
…e series. We can't use the global total number of UDP requests while calculating the moving average for a concrete label set (time series). Averages are calculated for each label set. They could be aggregated by calculating the average over all time series.
…n moving average calculation

Add a new metric `UDP_TRACKER_SERVER_PERFORMANCE_PROCESSED_REQUESTS_TOTAL` to track requests processed specifically for performance metrics, eliminating race conditions in the moving average calculation.

**Changes:**

- Add new metric constant `UDP_TRACKER_SERVER_PERFORMANCE_PROCESSED_REQUESTS_TOTAL`
- Update `recalculate_udp_avg_processing_time_ns()` to use the dedicated counter instead of the accepted requests total
- Add `udp_processed_requests_total()` method to retrieve the new metric value
- Add `increment_udp_processed_requests_total()` helper method
- Update metric descriptions to include the new counter

**Problem Fixed:**

Previously, the moving average calculation used the accepted requests counter, which could be updated independently, causing race conditions where the same request count was used for multiple calculations. The new implementation increments its own dedicated counter atomically during the calculation, ensuring consistency.

**Behavior Change:**

The counter now starts at 0 and gets incremented to 1 on the first calculation call, then uses the proper moving average formula for subsequent calls. This eliminates division-by-zero issues and provides more accurate moving averages.

**Tests Updated:**

Updated repository tests to reflect the new atomic behavior where the processed requests counter is managed specifically for moving average calculations.

Fixes race conditions in UDP request processing time metrics while maintaining backward compatibility of all public APIs.
Implements a new aggregate function for calculating averages of metric samples that match specific label criteria, complementing the existing Sum aggregation.

- **metrics/src/metric/aggregate/avg.rs**: New metric-level average trait and implementations
  - `Avg` trait with `avg()` method for calculating averages
  - Implementation for `Metric<Counter>` returning `f64`
  - Implementation for `Metric<Gauge>` returning `f64`
  - Comprehensive unit tests with edge cases (empty samples, large values, etc.)
- **metrics/src/metric_collection/aggregate/avg.rs**: New collection-level average trait
  - `Avg` trait for `MetricCollection` and `MetricKindCollection<T>`
  - Delegates to metric-level implementations
  - Handles mixed counter/gauge collections by trying counters first, then gauges
  - Returns `None` for non-existent metrics
  - Comprehensive test suite covering various scenarios
- **metrics/src/metric/aggregate/mod.rs**: Export new `avg` module
- **metrics/src/metric_collection/aggregate/mod.rs**: Export new `avg` module
- **metrics/README.md**: Add example usage of the new `Avg` trait in the aggregation section

- **Type Safety**: Returns appropriate types (`f64` for both counters and gauges)
- **Label Filtering**: Supports filtering samples by label criteria like the existing `Sum`
- **Edge Case Handling**: Returns `0.0` for empty sample sets
- **Performance**: Uses iterator chains for efficient sample processing
- **Comprehensive Testing**: 205 tests pass including the new avg functionality

```rust
use torrust_tracker_metrics::metric_collection::aggregate::Avg;

// Calculate average of all matching samples
let avg_value = metrics.avg(&metric_name, &label_criteria);
```

The implementation follows the same patterns as the existing `Sum` aggregate function, ensuring consistency in the codebase and maintaining the same level of type safety and performance characteristics.
Improve AI-generated code.

Moves the collect_matching_samples helper method from individual aggregate implementations to the generic Metric<T> implementation, making it reusable across all aggregate functions.

- Add collect_matching_samples method to Metric<T> for filtering samples by label criteria
- Remove code duplication between Sum and Avg aggregate implementations
- Improve code organization by centralizing sample collection logic
- Maintain backward compatibility and all existing functionality

This refactoring improves maintainability by providing a single, well-tested implementation of sample filtering that can be used by current and future aggregate functions.
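A condensed sketch of how the two pieces above could fit together: an `Avg` aggregate built on a shared `collect_matching_samples` helper. The types are simplified stand-ins, not the metrics package's real `Metric<T>`, `LabelSet`, or trait definitions:

```rust
use std::collections::BTreeMap;

type LabelSet = BTreeMap<String, String>;

struct Sample {
    labels: LabelSet,
    value: f64,
}

struct Metric {
    samples: Vec<Sample>,
}

impl Metric {
    /// Shared helper (the one the refactor centralizes): values of the samples
    /// whose labels contain every (key, value) pair in `criteria`.
    fn collect_matching_samples(&self, criteria: &LabelSet) -> Vec<f64> {
        self.samples
            .iter()
            .filter(|s| criteria.iter().all(|(k, v)| s.labels.get(k) == Some(v)))
            .map(|s| s.value)
            .collect()
    }
}

/// Average of the matching samples, `0.0` when nothing matches.
trait Avg {
    fn avg(&self, criteria: &LabelSet) -> f64;
}

impl Avg for Metric {
    fn avg(&self, criteria: &LabelSet) -> f64 {
        let matching = self.collect_matching_samples(criteria);
        if matching.is_empty() {
            return 0.0;
        }
        matching.iter().sum::<f64>() / matching.len() as f64
    }
}

fn main() {
    let labels =
        |port: &str| LabelSet::from([("server_binding_port".to_string(), port.to_string())]);
    let metric = Metric {
        samples: vec![
            Sample { labels: labels("6868"), value: 40_326.0 },
            Sample { labels: labels("6969"), value: 43_498.0 },
        ],
    };
    // Empty criteria match every sample: the global average of the two series.
    assert_eq!(metric.avg(&LabelSet::new()), 41_912.0);
    // A concrete label set narrows the average down to one time series.
    assert_eq!(metric.avg(&labels("6868")), 40_326.0);
}
```

In the real package the helper lives on the generic `Metric<T>` implementation, so `Sum` and `Avg` (and future aggregates) reuse the same filtering logic.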
The division-by-zero issue was solved. It can't happen now because we
increase the counter at the beginning of the function.
```rust
#[allow(clippy::cast_precision_loss)]
pub fn recalculate_udp_avg_processing_time_ns(
    &mut self,
    req_processing_time: Duration,
    label_set: &LabelSet,
    now: DurationSinceUnixEpoch,
) -> f64 {
    self.increment_udp_processed_requests_total(label_set, now);

    let processed_requests_total = self.udp_processed_requests_total(label_set) as f64;
    let previous_avg = self.udp_avg_processing_time_ns(label_set);
    let req_processing_time = req_processing_time.as_nanos() as f64;

    // Moving average: https://en.wikipedia.org/wiki/Moving_average
    let new_avg = previous_avg as f64 + (req_processing_time - previous_avg as f64) / processed_requests_total;

    tracing::debug!(
        "Recalculated UDP average processing time for labels {}: {} ns (previous: {} ns, req_processing_time: {} ns, request_processed_total: {})",
        label_set,
        new_avg,
        previous_avg,
        req_processing_time,
        processed_requests_total
    );

    self.update_udp_avg_processing_time_ns(new_avg, label_set, now);

    new_avg
}
```
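As a quick sanity check of the formula above, here is a tiny, self-contained reimplementation of just the arithmetic (`new_avg = previous_avg + (sample - previous_avg) / n`) with a worked three-sample run; it is not the repository code itself:

```rust
fn moving_avg(previous_avg: f64, sample: f64, samples_seen: u64) -> f64 {
    // Same update rule as `recalculate_udp_avg_processing_time_ns` above; the
    // counter is incremented before this point, so `samples_seen` >= 1 and the
    // division can never be by zero.
    previous_avg + (sample - previous_avg) / samples_seen as f64
}

fn main() {
    let mut avg = 0.0;
    for (n, sample) in [1_000.0_f64, 2_000.0, 3_000.0].iter().copied().enumerate() {
        avg = moving_avg(avg, sample, (n + 1) as u64);
    }
    // 1st call: 0 + (1000 - 0) / 1       = 1000
    // 2nd call: 1000 + (2000 - 1000) / 2 = 1500
    // 3rd call: 1500 + (3000 - 1500) / 3 = 2000  -> the true mean of the samples
    assert_eq!(avg, 2_000.0);
}
```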
…etrics

When calculating aggregated values for processing time metrics across multiple servers, we need to use the average (`.avg()`) instead of the sum (`.sum()`) because the metric samples are already averages per server. Using `sum()` on pre-averaged values would produce incorrect results, as it would add up the averages rather than computing the true average across all servers.

Changes:

- Add new `*_averaged()` methods that use `.avg()` for proper aggregation
- Update services.rs to use the corrected averaging methods
- Import the `Avg` trait for metric collection averaging functionality

Fixes incorrect metric aggregation for:

- udp_avg_connect_processing_time_ns
- udp_avg_announce_processing_time_ns
- udp_avg_scrape_processing_time_ns
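A minimal numeric illustration of the point above, using two hypothetical per-server averages close to the figures quoted later in this thread:

```rust
fn main() {
    // Per-server samples that are already averages (one per UDP server).
    let per_server_avg = [40_326.0_f64, 43_498.0];

    let summed: f64 = per_server_avg.iter().sum();
    let averaged = summed / per_server_avg.len() as f64;

    // Summing pre-averaged values roughly doubles the figure; averaging the
    // averages gives the intended global value.
    assert_eq!(summed, 83_824.0); // misleading as a "global average"
    assert_eq!(averaged, 41_912.0); // what the *_averaged() methods are meant to report
}
```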
Adds a comprehensive unit test to validate thread safety when updating UDP_TRACKER_SERVER_PERFORMANCE_AVG_PROCESSING_TIME_NS metrics under concurrent load.

The test:

- Spawns 200 concurrent tasks (100 per server) simulating two UDP servers
- Server 1: cycles through [1000, 2000, 3000, 4000, 5000] ns processing times
- Server 2: cycles through [2000, 3000, 4000, 5000, 6000] ns processing times
- Validates request counts, average calculations, and metric relationships
- Uses tolerance-based assertions (±50 ns) to account for moving average calculation variations in concurrent environments
- Ensures thread safety and mathematical correctness of the metrics system

This test helps ensure the UDP tracker server's metrics collection remains accurate and thread-safe under high-concurrency scenarios.
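For context, a heavily stripped-down sketch of that testing pattern: many tokio tasks updating one mutex-guarded moving average, then asserting the request count and the expected mean. It assumes the tokio runtime (macros and sync features) and uses illustrative types rather than the tracker's metrics repository:

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

#[derive(Default)]
struct AvgMetric {
    count: u64,
    avg_ns: f64,
}

impl AvgMetric {
    fn record(&mut self, sample_ns: f64) {
        self.count += 1;
        self.avg_ns += (sample_ns - self.avg_ns) / self.count as f64;
    }
}

#[tokio::main]
async fn main() {
    let metric = Arc::new(Mutex::new(AvgMetric::default()));
    let samples = [1_000.0, 2_000.0, 3_000.0, 4_000.0, 5_000.0];

    let mut handles = Vec::new();
    for i in 0..100u64 {
        let metric = Arc::clone(&metric);
        handles.push(tokio::spawn(async move {
            // Read, recalculate, and update happen under one lock, so the
            // whole moving-average step is effectively atomic.
            let mut m = metric.lock().await;
            m.record(samples[(i % 5) as usize]);
        }));
    }
    for handle in handles {
        handle.await.unwrap();
    }

    let m = metric.lock().await;
    assert_eq!(m.count, 100);
    // Each sample value appears 20 times, so the mean is 3000 ns regardless of
    // the (nondeterministic) order in which the tasks ran.
    assert!((m.avg_ns - 3_000.0).abs() < 1e-6);
}
```

Because the cumulative moving average of a fixed set of samples equals their arithmetic mean regardless of update order, the expected value here is deterministic even though task scheduling is not; the real test additionally allows a small tolerance (±50 ns).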
…cs race condition test

Restructures the race condition test to follow a clear Arrange-Act-Assert pattern and eliminates code duplication through helper function extraction. The test maintains identical functionality while being more maintainable, more readable, and aligned with DRY principles. All 200 concurrent tasks still validate thread safety and mathematical correctness of the metrics system.
b423bf6 refactor: [torrust#1589] improve readability of UDP performance metrics race condition test (Jose Celano)
dc8d4a9 test: [torrust#1589] add race condition test for UDP performance metrics (Jose Celano)
a9acca5 refactor: [torrust#1589] rename methods and remove unused code (Jose Celano)
4c082fa refactor: [torrust#1589] make methods private (Jose Celano)
cd57f7a fix: [torrust#1589] use average aggregation for UDP processing time metrics (Jose Celano)
ba3d8a9 fix: format (Jose Celano)
caa69ae test: [torrust#1589] remove uneeded test (Jose Celano)
f402b02 chore: remove deprecated comment (Jose Celano)
8fbcf90 refactor(metrics): extract collect_matching_samples to Metric<T> impl (Jose Celano)
384b887 feat(metrics): [torrust#1589] add Avg (average) aggregate function (Jose Celano)
ed5f1e6 fix: [torrust#1589] add dedicated metric for UDP request processing in moving average calculation (Jose Celano)
164de92 refactor: [torrust#1589] remvoe duplicate code (Jose Celano)
1c13b12 fix: [torrust#1589] partially. Moving average calculated for each time series (Jose Celano)
47c2949 refactor: [torrust#1598] make recalculate udp avg scrape processing time metric and update atomic (Jose Celano)
59fbb39 refactor: [torrust#1598] make recalculate udp avg announce processing time metric and update atomic (Jose Celano)
d50948e refactor: [torrust#1598] make recalculate udp avg connect processing time metric and update atomic (Jose Celano)
e6c05b6 refactor(udp-tracker-server): [torrust#1589] move average processing time calculation from Repository to Metrics (Jose Celano)

Pull request description:

Fix bug: Wrong UDP Average Connect Time metric.

### Context

See torrust#1589.

### Subtasks

- [x] Read, recalculate, and update of the average should be atomic.
- [x] The average should be calculated per time series (label set). We were mixing series: we updated the average using a label set (segregated average), but counted the requests for the average globally (not segregated).
- [x] Add a new metric to count requests for the moving average. This counter is also increased atomically when the new average is updated.
- [x] Fix the global (all trackers, no labels) value for the average. It should be the average of all average samples, not the `SUM`.
- [x] Add a new average aggregate function to the `metrics` package.
- [x] Add tests:
  - [x] With two time series: two trackers running on different ports (`6868`, `6969`).
  - [x] For race conditions, running multiple requests in parallel, to ensure the average is accurate after many iterations.

### How to test

1. Run the tracker: `cargo run`
2. Simulate one `announce` request per UDP server:

   ```
   cargo run -p torrust-tracker-client --bin udp_tracker_client announce udp://127.0.0.1:6868 000620bbc6c52d5a96d98f6c0f1dfa523a40df82 | jq
   cargo run -p torrust-tracker-client --bin udp_tracker_client announce udp://127.0.0.1:6969 000620bbc6c52d5a96d98f6c0f1dfa523a40df82 | jq
   ```

3. Check the metrics:

   curl -s "http://localhost:1212/api/v1/metrics?token=MyAccessToken&format=prometheus"

   In the labelled metrics, there should be two metric samples like these:

   ```
   # HELP udp_tracker_server_performance_avg_processing_time_ns Average time to process a UDP request in nanoseconds
   # TYPE udp_tracker_server_performance_avg_processing_time_ns gauge
   udp_tracker_server_performance_avg_processing_time_ns{request_kind="announce",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6868",server_binding_protocol="udp"} 54773
   udp_tracker_server_performance_avg_processing_time_ns{request_kind="connect",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6868",server_binding_protocol="udp"} 40326
   udp_tracker_server_performance_avg_processing_time_ns{request_kind="announce",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6969",server_binding_protocol="udp"} 66063.71428571429
   udp_tracker_server_performance_avg_processing_time_ns{request_kind="connect",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6969",server_binding_protocol="udp"} 43497.71428571428
   ```

   In the global aggregated metrics, the values should be the average of the servers' averages:

   ```
   udp_avg_connect_processing_time_ns 41911
   udp_avg_announce_processing_time_ns 60418
   udp_avg_scrape_processing_time_ns 0
   ```

   41911 = (40326 + 43497.71428571428) / 2 = 41911.857142857
   60418 = (54773 + 66063.71428571429) / 2 = 60418.357142857

   The values are truncated because we use a `u64` for the global aggregated metrics.

ACKs for top commit:
  josecelano:
    ACK b423bf6

Tree-SHA512: 3d2544860838d0817f0700587af87bc6890eaef596db6cc15d02340e2ad9e459e425d2589d4e5c542ec3f0fc250b6e49fb4964be343e670a13d1de3a59b8f712
- Prefer parsed sample timestamps over fallback
- Preserve HELP as description; keep unsupported families as errors (for now)
- Remove gauge normalization in tests; only ensure trailing newline
- Pin openmetrics-parser to 0.4.4 (latest)

Refs: torrust#1582
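As a small illustration of the first bullet (prefer the timestamp parsed from the sample and fall back only when it is absent), here is a hypothetical helper; `ParsedSample` and its field are invented for the example and are not types from `openmetrics-parser`:

```rust
/// Invented stand-in for a sample produced by the Prometheus text parser.
struct ParsedSample {
    /// Timestamp (in ms) as written on the exposition line, if any.
    timestamp_ms: Option<i64>,
}

/// Prefer the parsed timestamp; only use the fallback when the line had none.
fn sample_timestamp_ms(sample: &ParsedSample, fallback_now_ms: i64) -> i64 {
    sample.timestamp_ms.unwrap_or(fallback_now_ms)
}

fn main() {
    let with_ts = ParsedSample { timestamp_ms: Some(1_700_000_000_000) };
    let without_ts = ParsedSample { timestamp_ms: None };
    let now = 1_800_000_000_000;

    assert_eq!(sample_timestamp_ms(&with_ts, now), 1_700_000_000_000);
    assert_eq!(sample_timestamp_ms(&without_ts, now), now);
}
```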
Contributor
hello @naoNao89, it looks like you are trying to apply your commits on a very old version; please get the latest version of torrust and remake your changes :)
No description provided.