hotfix for alerts #1283
Conversation
Walkthrough

The pull request introduces significant changes to the alerting system. Core modifications include updates to alert evaluation logic, query string construction, and aggregate configuration handling. The HTTP endpoints for alert management have been revised, and the internal alert scheduling mechanism has transitioned to an asynchronous model using message passing. In addition, serialization for alert targets has been updated from lowercase to camelCase, and SQL query formatting has been adjusted. A new GitHub Actions workflow step now fixes the CMakeLists.txt file for librdkafka.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Sender as Alert Task Sender
    participant Runtime as Alert Runtime
    participant Evaluator as Alert Evaluator Task
    Sender->>Runtime: Send AlertTask (Create/Delete)
    alt AlertTask is Create
        Runtime->>Runtime: Check if alert exists
        alt Alert does NOT exist
            Runtime->>Evaluator: Spawn periodic evaluation task
            Evaluator-->>Runtime: Return evaluation results/errors
        else Alert exists
            Runtime-->>Sender: Log error & skip task creation
        end
    else AlertTask is Delete
        Runtime->>Runtime: Check if alert exists
        alt Alert exists
            Runtime->>Evaluator: Abort evaluation task
            Runtime-->>Sender: Log deletion warning
        else
            Runtime-->>Sender: Log error: Alert not found
        end
    end
```
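The message-passing model in the diagram can be sketched in plain Rust. This is an illustrative reduction, not the PR's code: the real runtime is asynchronous, the names (`AlertTask`, `handle_task`) are assumptions, and evaluator spawning/aborting is reduced to set membership.

```rust
use std::collections::HashSet;

/// Task messages sent to the alert runtime (names illustrative, not the PR's).
pub enum AlertTask {
    Create(String),
    Delete(String),
}

/// Apply one task to the set of running evaluator ids, mirroring the
/// branches in the diagram: duplicate creates and missing deletes are errors.
pub fn handle_task(running: &mut HashSet<String>, task: AlertTask) -> Result<(), String> {
    match task {
        AlertTask::Create(id) => {
            if !running.insert(id.clone()) {
                // Alert exists: log error & skip task creation
                Err(format!("alert {id} already exists; skipping task creation"))
            } else {
                Ok(()) // real runtime would spawn a periodic evaluator here
            }
        }
        AlertTask::Delete(id) => {
            if running.remove(&id) {
                Ok(()) // real runtime would abort the evaluator task here
            } else {
                Err(format!("alert {id} not found"))
            }
        }
    }
}
```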
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/rbac/role.rs (1)

`51-311`: **Consider adding AddLLM and ListLLM to higher privilege roles**

While `GetLLM` has been added to all role levels (editor, writer, reader), the other LLM actions (`AddLLM`, `DeleteLLM`, `ListLLM`) are only available to admin roles. Consider whether `AddLLM` and `ListLLM` should be available to editor roles, as they have other creation and listing permissions.

For editor roles, consider adding:

```diff
 Action::QueryLLM,
 Action::GetLLM,
+Action::AddLLM,
+Action::ListLLM,
 Action::CreateFilter,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/alerts/alerts_utils.rs (2 hunks)
- src/alerts/mod.rs (7 hunks)
- src/handlers/http/alerts.rs (5 hunks)
- src/rbac/role.rs (5 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/alerts/alerts_utils.rs (1)
- src/handlers/http/query.rs (1): `query` (69-155)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
🔇 Additional comments (18)
src/handlers/http/alerts.rs (4)
`56-57`: **Authorization updated to use stream-based approach**

The code now constructs a SQL query from `alert.stream` for authorization checks, aligning with the transition from a query-based to a stream-based alert model. This maintains security while simplifying the authorization flow.

`93-94`: **Consistent authorization check in get handler**

This change applies the same stream-based authorization approach in the get handler, ensuring consistent security enforcement across all alert operations.

`108-109`: **Stream-based authorization in delete handler**

The delete handler now uses the stream-based authorization check, maintaining consistency with the other handlers.

`192-193`: **Stream-based authorization in update_state handler**

The update_state handler implements the stream-based authorization check, completing the transition across all alert-related endpoints.
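The stream-based check described in these comments reduces to building a fixed query string from the alert's stream and handing it to the query authorizer. A minimal sketch — the helper name is hypothetical, and the exact query formatting in the PR may differ:

```rust
/// Build the authorization query for a stream-based alert.
/// (Illustrative helper; the PR passes an equivalent string to its
/// query-authorization function rather than defining this exact function.)
fn auth_query_for_stream(stream: &str) -> String {
    format!("SELECT * FROM {stream}")
}
```

Because the string is derived from the stored stream name rather than user input at check time, every handler performs the same check against the same canonical query.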
src/alerts/mod.rs (7)
`393-393`: **Field renamed from 'query' to 'stream' in AlertRequest**

This change reflects the shift from a query-based to a stream-based model for alerts, improving clarity about what this field represents.

`407-408`: **Updated From implementation to use stream field**

The `From<AlertRequest>` implementation has been correctly updated to map the renamed field from `query` to `stream`.

`425-426`: **Field renamed from 'query' to 'stream' in AlertConfig**

The `stream` field in `AlertConfig` aligns with the changes in `AlertRequest`, maintaining consistency across the alert data structures.

`438-439`: **Modified alert.modify() to use stream field**

The `modify` method has been updated to use the new `stream` field from the alert request, ensuring consistency with the structural changes.

`483-486`: **Simplified query construction using stream field**

The query construction now uses a standard `SELECT * FROM {stream}` format based on the `stream` field. This simplifies query handling and makes the alert system more maintainable.

`506-507`: **Updated error message to use select_query**

The error message has been correctly updated to reference the new `select_query` variable.

`755-756`: **Updated list_alerts_for_user to use stream-based queries**

The authorization check in `list_alerts_for_user` now uses a formatted SQL query based on `alert.stream`, maintaining consistency with the changes in other parts of the codebase.

src/alerts/alerts_utils.rs (2)
`105-106`: **Updated evaluate_alert to use stream-based query**

The `evaluate_alert` function now uses a formatted `SELECT * FROM {stream}` query based on `alert.stream`, aligning with the new stream-based alert model. This simplifies query execution and makes alert evaluation more consistent.

`122-123`: **Updated prepare_query to use stream-based approach**

The `prepare_query` function now constructs a logical plan using a standard `SELECT * FROM {stream}` query based on `alert.stream`, ensuring consistency with the changes in other parts of the alert system.

src/rbac/role.rs (5)
`51-54`: **Added new LLM-related actions to the Action enum**

New actions related to LLM functionality (`AddLLM`, `DeleteLLM`, `GetLLM`, `ListLLM`) have been added to the `Action` enum, extending the role-based access control system to support LLM features.

`122-126`: **Updated RoleBuilder to include new LLM actions**

The RoleBuilder's match statement has been updated to handle the new LLM-related actions, ensuring they are properly processed when building permissions.

`245-246`: **Added GetLLM to editor permissions**

The `GetLLM` action has been added to the editor permission builder, allowing users with editor privileges to access LLM information.

`288-289`: **Added GetLLM to writer permissions**

The `GetLLM` action has been added to the writer permission builder, allowing users with writer privileges to access LLM information.

`311-312`: **Added GetLLM to reader permissions**

The `GetLLM` action has been added to the reader permission builder, allowing even users with the most basic read privileges to access LLM information.
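The permission shape being reviewed can be reduced to a small sketch. This is illustrative only — the real `Action` enum and role builders in `src/rbac/role.rs` carry many more variants and fields:

```rust
/// Reduced view of the RBAC change: the enum gains LLM actions, and each
/// role builder lists which actions it grants. Only GetLLM reaches the
/// lower-privilege roles; Add/Delete/List stay admin-only (via Action::All
/// in the real code).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    QueryLLM,
    AddLLM,
    DeleteLLM,
    GetLLM,
    ListLLM,
}

pub fn editor_actions() -> Vec<Action> {
    // Editors keep read/query access to LLM features only.
    vec![Action::QueryLLM, Action::GetLLM]
}

pub fn reader_actions() -> Vec<Action> {
    // Readers get the single read-only LLM action.
    vec![Action::GetLLM]
}
```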
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/workflows/coverage.yaml (1 hunks)
- src/alerts/alerts_utils.rs (2 hunks)
- src/alerts/mod.rs (7 hunks)
- src/handlers/http/alerts.rs (5 hunks)
- src/rbac/role.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/alerts/alerts_utils.rs
- src/handlers/http/alerts.rs
- src/alerts/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
🔇 Additional comments (6)
.github/workflows/coverage.yaml (1)
`13-14`: **Global definition of `CMAKE_FLAGS` enhances consistency**

Introducing the `CMAKE_FLAGS` environment variable at the job level ensures that all steps under the `coverage` job can uniformly access this configuration. This removes the redundancy of declaring the variable only in individual steps (like the previous "Check with clippy" step) and aligns with best practices for centralized configuration management.

src/rbac/role.rs (5)
`51-54`: **LGTM: New LLM-related actions added to the Action enum**

These additional actions enhance the RBAC system with fine-grained control over LLM-related functionality, which is a welcome addition for security and access control.

`122-125`: **LGTM: Proper integration of LLM actions into the permission system**

The new LLM actions are correctly added to the match statement as Unit permissions, ensuring they are handled consistently with other global actions.

`288-288`: **LGTM: Consistent permission model for Writer role**

Adding the `GetLLM` action to the writer role is consistent with the existing permission model, where writers have read access to resources.

`311-311`: **LGTM: Appropriate read-only permissions for Reader role**

The addition of `GetLLM` to the reader role is appropriate, as it is a read-only operation, maintaining the principle of least privilege.
`22-74`: ❓ **Verification inconclusive**

**Document the relationship between LLM actions and alerts**

The PR is titled "hotfix for alerts" but introduces new LLM-related actions. While the code changes are well implemented, the connection between these new actions and the alert system isn't immediately evident.

Please document how these LLM actions relate to the alert hotfix mentioned in the PR description. Are they part of a new alert capability that leverages LLMs?

🏁 Script executed:

```shell
#!/bin/bash
# Check for connections between alerts and LLM in the codebase
rg -A 5 -B 5 "alerts.*LLM|LLM.*alerts" --type rust
```

Length of output: 52
**LLM-Alert Relationship Documentation Needed**

The new LLM actions (e.g., `QueryLLM`, `AddLLM`, `DeleteLLM`, `GetLLM`, `ListLLM`) in the `Action` enum do not show any explicit connection to the alert system in the code. Since the PR is titled "hotfix for alerts," please update the documentation (or add inline comments) to clarify whether these LLM actions are part of a new alert capability leveraging LLM functionality, or whether they are meant to serve a different purpose.
src/rbac/role.rs (outdated)

```diff
@@ -234,6 +242,7 @@ pub mod model {
 Action::GetAlert,
 Action::DeleteAlert,
 Action::QueryLLM,
+Action::GetLLM,
```
💡 Verification agent
🧩 Analysis chain
**Consider explaining the LLM permission distribution**

You've added the `GetLLM` permission to the editor role, which seems appropriate. However, the other LLM actions (`AddLLM`, `DeleteLLM`, `ListLLM`) are not explicitly assigned to any role builder, except possibly admin through `Action::All`.

Are the other LLM actions intentionally restricted to admin users only? This access control decision should be documented.

🏁 Script executed:

```shell
#!/bin/bash
# Check if the admin privileges would include the missing LLM actions
grep -A 10 "admin_perm_builder" src/rbac/role.rs
```

Length of output: 928
**Clarify LLM Permission Distribution Between Editor and Admin Roles**

The current implementation explicitly grants the `GetLLM` action to the editor role, while the admin role, via `admin_perm_builder()`, receives `Action::All`, which implicitly covers `AddLLM`, `DeleteLLM`, and `ListLLM`. Please update the code documentation and/or inline comments in `src/rbac/role.rs` to clearly explain this design decision and confirm that the additional LLM actions are intentionally restricted to admin users only.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/rbac/role.rs (1)
`209-215`: **Verify that LLM actions align with RBAC design principles**

The admin role gets all privileges through `Action::All`, which implicitly includes the new LLM actions. However, it would be more explicit and maintainable to list all actions individually in `admin_perm_builder()` instead of using the `Action::All` shortcut.

Consider refactoring `admin_perm_builder()` to explicitly list all actions for better clarity:

```diff
 fn admin_perm_builder() -> RoleBuilder {
     RoleBuilder {
-        actions: vec![Action::All],
+        actions: vec![
+            // Explicitly list all actions, including LLM actions
+            Action::Ingest,
+            Action::Query,
+            // ... other actions
+            Action::AddLLM,
+            Action::DeleteLLM,
+            Action::GetLLM,
+            Action::ListLLM,
+            // ... remaining actions
+        ],
         stream: Some("*".to_string()),
         tag: None,
     }
 }
```

This approach makes the admin permissions more explicit and maintainable as new actions are added.
src/alerts/alerts_utils.rs (3)
`105-106`: **Query structure simplified to use stream directly**

This change implements the transition from a query-based to a stream-based model for alerts: instead of executing a custom query, all data is now selected from the specified stream. This simplifies the alert configuration model but may reduce alert precision where previous queries had more specific filtering conditions.

Consider adding a comment explaining this architectural change for future maintainers, since it is a significant shift in how alerts work.
`137-138`: **Ensure parameters are correctly documented**

The `execute_base_query` signature accepts both the query object and an `original_query` string parameter, which is now used to pass the new select query. The parameter name is confusing, since it no longer contains the original query.

Consider renaming the parameter from `original_query` to `query_string` or `select_query` to better reflect its current purpose:

```diff
-    async fn execute_base_query(
-        query: &crate::query::Query,
-        original_query: &str,
-    ) -> Result<DataFrame, AlertError> {
+    async fn execute_base_query(
+        query: &crate::query::Query,
+        query_string: &str,
+    ) -> Result<DataFrame, AlertError> {
```
`139-141`: **Query debugging message references renamed field**

The error message still refers to "query," but the field has been renamed to "stream" in the alert configs. This could be confusing when debugging.

Update the error message to reflect the new terminology:

```diff
-    AlertError::CustomError(format!("Table name not found in query- {}", original_query))
+    AlertError::CustomError(format!("Table name not found in stream query- {}", original_query))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/workflows/coverage.yaml (1 hunks)
- src/alerts/alerts_utils.rs (2 hunks)
- src/alerts/mod.rs (7 hunks)
- src/handlers/http/alerts.rs (5 hunks)
- src/rbac/role.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/alerts/mod.rs
- src/handlers/http/alerts.rs
- .github/workflows/coverage.yaml
🧰 Additional context used
🧬 Code Definitions (1)
src/alerts/alerts_utils.rs (1)
- src/handlers/http/query.rs (1): `query` (69-155)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (4)
src/rbac/role.rs (3)

`51-54`: **New LLM actions added to the RBAC model**

Four new LLM-related actions have been added to the RBAC model, which is great for expanding the LLM management capabilities. This aligns with the PR objectives of implementing a hotfix for alerts and adding new actions.

`122-125`: **LLM actions properly integrated into the permission model**

The new LLM actions are correctly added to the permission-building logic as unit-level permissions, consistent with other global actions that aren't stream-specific.

`245-245`: **Clarify LLM Permission Distribution Between Editor and Admin Roles**

The current implementation explicitly grants only the `GetLLM` action to the editor, writer, and reader roles, while the admin role, via `admin_perm_builder()`, receives `Action::All`, which implicitly covers `AddLLM`, `DeleteLLM`, and `ListLLM`. This appears to be intentional but lacks documentation explaining the design decision.

Please update the code documentation and/or inline comments to clearly explain why only the `GetLLM` permission is granted to non-admin roles, while other LLM actions are restricted to admin users.

Also applies to: 288-288, 311-311
src/alerts/alerts_utils.rs (1)

`122-123`: ❓ **Verification inconclusive**

**Logical plan creation now uses stream instead of query**

The logical plan is now created from a simple `SELECT * FROM {stream}` query rather than a potentially more complex custom query. This is consistent with the change in the `evaluate_alert` function. However, this simplification might lose some filtering capabilities that were previously available through custom queries.

🏁 Script executed:

```shell
#!/bin/bash
# Check for any existing alerts that use complex queries that might break with this change
grep -r "query:" --include="*.rs" --include="*.yaml" --include="*.json" src/alerts/

# Also check if there are any tests that validate query functionality in alerts
find tests -type f -name "*alert*test*" -exec grep -l "query" {} \;
```

Length of output: 924
**Attention: Verify Logical Plan Simplification Impact**

Creating the logical plan from a simple `SELECT * FROM {stream}` query (see `src/alerts/alerts_utils.rs`, lines 122-123) aligns with the updated behavior of the `evaluate_alert` function. The codebase search shows that, while query-related functions (like `get_tables_from_query` and associated imports) still exist, there is no clear evidence in the repository (and no dedicated tests found in a standard `tests/` directory) that relies on more complex custom queries in this context.

To ensure that no required filtering functionality is lost, please verify:

- That the simplified query construction does not impact any filtering logic expected by alert processing.
- Whether any integration or unit tests (perhaps located outside a standard `tests/` directory) should be updated or added to cover this change.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/coverage.yaml (1)

`36-36`: **Remove trailing whitespace on line 36**

There is extra trailing whitespace on line 36, as flagged by YAMLlint. Removing it keeps the formatting clean and avoids linting issues.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 36-36: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/workflows/coverage.yaml (1 hunks)
- src/alerts/alerts_utils.rs (2 hunks)
- src/alerts/mod.rs (7 hunks)
- src/handlers/http/alerts.rs (5 hunks)
- src/rbac/role.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/handlers/http/alerts.rs
- src/alerts/alerts_utils.rs
- src/alerts/mod.rs
- src/rbac/role.rs
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/coverage.yaml
[error] 36-36: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (1)
.github/workflows/coverage.yaml (1)

`37-47`: **New "Patch librdkafka CMakeLists.txt" step: functional and clear**

The newly added step correctly locates the `librdkafka` directory within the Cargo registry and updates the minimum required CMake version from 3.2 to 3.5. The conditional handles the error case by exiting with a non-zero status if the directory isn't found.

Please verify that this change aligns with all downstream build requirements and that the new minimum CMake version (3.5) is compatible with the systems relying on this library. Optionally, consider adding more logging, or handling the case where multiple directories match.
Force-pushed from 2184320 to a68bd8d (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (2)
.github/workflows/coverage.yaml (2)

`37-64`: **Review of the new "Find and fix librdkafka CMakeLists.txt" step**

This new step script correctly:

- Downloads the Rust package via `cargo fetch`.
- Locates the `rdkafka-sys` package directory.
- Finds the `CMakeLists.txt` file for librdkafka.
- Creates a backup and updates the minimum required CMake version from 3.2 to 3.5.

Suggestions:

- Add error handling for the directory lookup: if `find ~/.cargo/registry/src -name "rdkafka-sys-*" -type d | head -n 1` returns an empty string, subsequent commands referencing `$RDKAFKA_SYS_DIR` may fail.
- Check the exit status of `cargo fetch` to ensure the package has been downloaded successfully before proceeding.
- While the current `diff` with `|| true` prevents the step from failing, consider whether unexpected file differences should be surfaced more explicitly.

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 41-41: trailing spaces
(trailing-spaces)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 48-48: trailing spaces
(trailing-spaces)
[error] 51-51: trailing spaces
(trailing-spaces)
[error] 54-54: trailing spaces
(trailing-spaces)
[error] 57-57: trailing spaces
(trailing-spaces)
`41-41`: **Remove trailing whitespace**

Static analysis flagged trailing whitespace on lines 41, 45, 48, 51, 54, and 57. Removing these improves the file's overall cleanliness and maintainability.

Also applies to: 45-45, 48-48, 51-51, 54-54, 57-57
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 41-41: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/workflows/coverage.yaml (1 hunks)
- src/alerts/alerts_utils.rs (2 hunks)
- src/alerts/mod.rs (7 hunks)
- src/handlers/http/alerts.rs (5 hunks)
- src/rbac/role.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/alerts/alerts_utils.rs
- src/handlers/http/alerts.rs
- src/alerts/mod.rs
- src/rbac/role.rs
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/coverage.yaml
[error] 41-41: trailing spaces
(trailing-spaces)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 48-48: trailing spaces
(trailing-spaces)
[error] 51-51: trailing spaces
(trailing-spaces)
[error] 54-54: trailing spaces
(trailing-spaces)
[error] 57-57: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
Actionable comments posted: 1
🔭 Outside diff range comments (1)
src/alerts/alerts_utils.rs (1)

`630-632`: ⚠️ **Potential issue**

**Critical bug: incorrect slice index for second aggregate**

At line 641, `let agg2 = &self.aggregates.aggregate_config[0];` reuses index `0` instead of `1`, causing the same config to be validated twice. This likely prevents the second aggregate check from running correctly.

Apply this patch to fix the slice index:

```diff
-    let agg2 = &self.aggregates.aggregate_config[0];
+    let agg2 = &self.aggregates.aggregate_config[1];
```

Also applies to: 640-644
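Beyond patching the index, this class of copy-paste bug can be prevented by pattern-matching the slice instead of hand-indexing it, so the two configs are bound in one place. An illustrative sketch with a reduced `AggregateConfig` (not the PR's actual type):

```rust
/// Reduced stand-in for the real AggregateConfig (illustrative only).
#[derive(Debug, PartialEq)]
pub struct AggregateConfig {
    pub column: String,
}

/// Bind both aggregate configs at once. A slice pattern makes it impossible
/// to accidentally reuse index 0 for the second config, and the length
/// check becomes explicit instead of an out-of-bounds panic.
pub fn validate_pair(
    configs: &[AggregateConfig],
) -> Result<(&AggregateConfig, &AggregateConfig), String> {
    match configs {
        [first, second] => Ok((first, second)),
        _ => Err(format!(
            "expected exactly 2 aggregate configs, got {}",
            configs.len()
        )),
    }
}
```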
🧹 Nitpick comments (1)
src/alerts/alerts_utils.rs (1)

`660-667`: **Aggregate & condition columns extraction**

Using a `HashSet` to collect required columns ensures unique references. The logic in `get_agg_config_cols` and `get_condition_cols` looks correct aside from the earlier index issue. Keep in mind that any mismatch in index references could cause missing columns.

Consider adding unit tests for multiple conditions with AND/OR to verify no columns are missed or duplicated.

Also applies to: 677-681, 693-695, 700-701
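The column-collection idea discussed here reduces to merging iterators into a `HashSet`. A minimal sketch with hypothetical string inputs — the real functions walk typed aggregate and condition configs rather than string slices:

```rust
use std::collections::HashSet;

/// Merge aggregate columns and condition columns into one deduplicated set,
/// so a column referenced by both an aggregate and a condition appears once.
/// (Illustrative helper; names and signature are assumptions.)
pub fn required_columns<'a>(
    agg_cols: &[&'a str],
    condition_cols: &[&'a str],
) -> HashSet<&'a str> {
    agg_cols
        .iter()
        .chain(condition_cols.iter())
        .copied()
        .collect()
}
```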
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/alerts/alerts_utils.rs
(12 hunks)src/alerts/mod.rs
(13 hunks)
🧰 Additional context used
🪛 GitHub Actions: Lint, Test and Coverage Report

src/alerts/mod.rs

[error] 272-272: method `from_str` can be confused for the standard trait method `std::str::FromStr::from_str`. Consider implementing the trait `std::str::FromStr` or choosing a less ambiguous method name.
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (13)
src/alerts/alerts_utils.rs (6)

`21-31`: **Imports look good**

The added imports for `LogicalOperator`, the `Aggregate*` structs, and the `datafusion` expansions appear correct for supporting stream-based alert queries and aggregates.

Also applies to: 43-43
`101-105`: **Proper use of `alert.stream` in `evaluate_alert`**

Switching from `alert.query` to `alert.stream` here is consistent with the new stream-based approach. The logic — preparing the query, building a `SELECT * FROM ...`, and then fetching a DataFrame — looks correct.
`146-147`: **Confirm slicing of aggregate configurations**

In `evaluate_aggregates`, slicing `&agg_config.aggregate_config[0..2]` for complex logic (when `operator` is `Some(..)`) vs. `[0..1]` for a single condition is correct. Ensure that caller code always provides at least the required number of aggregates.

Would you like to verify usage with a quick script to confirm that no calls pass fewer aggregates than required?

Also applies to: 153-154
`206-209`: **Extended operators in `evaluate_condition`**

The additions for `Equal`, `NotEqual`, `GreaterThanOrEqual`, and `LessThanOrEqual` align with the newly introduced alert operators. This implementation looks correct.
`216-217`: **Logical operators in `calculate_final_result`**

Using `Iterator::all()` for AND logic and `Iterator::any()` for OR logic is concise and readable.
`708-709`: **Rolling window frequency retrieval**

Extracting `eval_frequency` from `rolling_window` is consistent with the new `EvalConfig`.

src/alerts/mod.rs (7)
`186-197`: **Renamed and expanded `AlertOperator`**

Introducing `Equal`, `NotEqual`, `GreaterThanOrEqual`, `LessThanOrEqual`, `IsNull`, and `IsNotNull` is a clear improvement in expressiveness. The `Display` implementation is also consistent with these new variants.

Also applies to: 205-210
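A minimal sketch of such an operator enum with a `Display` impl. The variant list follows the review; the rendered symbol strings are assumptions, not taken from the PR:

```rust
use std::fmt;

/// Comparison operators available to alert conditions (variant set per the
/// review; string forms below are illustrative).
pub enum AlertOperator {
    Equal,
    NotEqual,
    GreaterThan,
    LessThan,
    GreaterThanOrEqual,
    LessThanOrEqual,
    IsNull,
    IsNotNull,
}

impl fmt::Display for AlertOperator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            AlertOperator::Equal => "=",
            AlertOperator::NotEqual => "!=",
            AlertOperator::GreaterThan => ">",
            AlertOperator::LessThan => "<",
            AlertOperator::GreaterThanOrEqual => ">=",
            AlertOperator::LessThanOrEqual => "<=",
            AlertOperator::IsNull => "is null",
            AlertOperator::IsNotNull => "is not null",
        };
        write!(f, "{s}")
    }
}
```

Keeping the string rendering in one exhaustive `match` means the compiler flags any variant added later without a display form.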
`301-303`: **Aggregates and aggregation functions**

Adjusting `AggregateFunction` for display logging is fine. The enumerations in lines 311-319 match the text-based representation.

Also applies to: 311-319
`339-341`: **Condition configuration changes**

Switching from `AlertOperator` to `WhereConfigOperator` is consistent with the new design. The indexing usage for conditions `[0]` and `[1]` is correct for AND/OR logic.

Also applies to: 346-348, 354-357
`377-380`: **Struct expansions for `AggregateConfig`, `Aggregates`, `GroupBy`, and `LogicalOperator`**

These modifications appear structurally sound. They align with treating aggregates and their condition sets in a more flexible, typed manner.

Also applies to: 385-387, 395-398, 401-405
`474-478`: **Adoption of `stream` and `aggregates` in `AlertRequest` and `AlertConfig`**

Renaming `query` to `stream` and `aggregate_config` to `aggregates` is consistent with the stream-based approach. This also aligns with the changes in `alerts_utils.rs`.

Also applies to: 488-492, 506-510
`519-523`: **Validation flow updates**

Using a rolling-window approach and verifying the presence of columns in the new `SELECT * FROM {self.stream}` logic is sound. The updated validations ensure well-formed alerts.

Also applies to: 530-530, 561-563
`833-834`: **Authorization check renamed to stream-based usage**

Switching from `alert.query` to building `SELECT * FROM {alert.stream}` and checking permission with `user_auth_for_query` is logical.
Added new actions
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/alerts/mod.rs (1)

`249-292`: **`from_str` approach might collide with `std::str::FromStr`**

A prior reviewer suggested directly implementing the `FromStr` trait or renaming this method to avoid confusion. If this feedback still applies, consider addressing it now.
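Implementing the standard trait, as the clippy lint suggests, would look like the following. This is sketched on `LogicalOperator` for illustration; the flagged method at line 272 of the PR may be on a different type:

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
pub enum LogicalOperator {
    And,
    Or,
}

// Implementing std::str::FromStr (instead of an inherent `from_str` method)
// satisfies the lint and lets callers use the idiomatic `"and".parse()`.
impl FromStr for LogicalOperator {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "and" => Ok(LogicalOperator::And),
            "or" => Ok(LogicalOperator::Or),
            other => Err(format!("unknown logical operator: {other}")),
        }
    }
}
```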
🧹 Nitpick comments (2)

.github/workflows/coverage.yaml (2)

`65-92`: **Duplicate step with identical name**

A second "Find and fix librdkafka CMakeLists.txt" step (lines 65-91) duplicates the existing step at lines 37-63. Consider removing this duplicate, or giving it a different name, to avoid confusion and unnecessary build steps.

```diff
-      - name: Find and fix librdkafka CMakeLists.txt
-        run: |
-          ...
+      # Remove or rename this duplicate block to avoid confusion
```
🪛 YAMLlint (1.35.1)
[error] 69-69: trailing spaces
(trailing-spaces)
[error] 73-73: trailing spaces
(trailing-spaces)
[error] 76-76: trailing spaces
(trailing-spaces)
[error] 79-79: trailing spaces
(trailing-spaces)
[error] 82-82: trailing spaces
(trailing-spaces)
[error] 85-85: trailing spaces
(trailing-spaces)
`69-69`: **Remove trailing spaces**

The static analyzer flagged trailing spaces on these lines. Trim them to comply with style guidelines.

```diff
-          cargo fetch 
+          cargo fetch
```

Also applies to: 73-73, 76-76, 79-79, 82-82, 85-85
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 69-69: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.github/workflows/coverage.yaml
(1 hunks)src/alerts/alerts_utils.rs
(12 hunks)src/alerts/mod.rs
(13 hunks)src/handlers/http/alerts.rs
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/handlers/http/alerts.rs
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/coverage.yaml
[error] 69-69: trailing spaces
(trailing-spaces)
[error] 73-73: trailing spaces
(trailing-spaces)
[error] 76-76: trailing spaces
(trailing-spaces)
[error] 79-79: trailing spaces
(trailing-spaces)
[error] 82-82: trailing spaces
(trailing-spaces)
[error] 85-85: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (20)
src/alerts/alerts_utils.rs (11)
21-26: Import changes look fine. You are now using DataFusion's `Operator` and `BinaryExpr` for condition building. This appears consistent with the new operators introduced below.
31-31: Newly imported `LogicalOperator`. Including `LogicalOperator` here aligns with the multi-condition logic (AND/OR) used in subsequent functions.
43-43: Extended imports of alert-related types. Bringing in all relevant alert structs and enums (like `WhereConfigOperator`, `AggregateConfig`, etc.) is straightforward and maintains consistency with the refactor.
112-118: Hardcoded end_time to "now". By forcing `end_time` to "now", you assume the window extends up to the current time. If future changes require a flexible end time, this may need revisiting. Would you like to confirm if all rolling-window alerts rely on "now" for their end time?
145-145: Function signature update. This function now accepts `&Aggregates` instead of a previous config type. No immediate issues.
153-154: Slicing two aggregates vs one. You rely on slicing `[0..2]` if `operator` is `Some`, and `[0..1]` if `None`. The earlier validation checks for length, so this should be safe. Just ensure you maintain code alignment if you ever add new logic for more aggregates.
206-209: Extended operators in `evaluate_condition`. Support for `Equal`, `NotEqual`, `GreaterThanOrEqual`, and `LessThanOrEqual` is correct. The `_ => unreachable!()` will panic if new variants get introduced without updating this match. Please confirm future coverage by ensuring new operators are tested or lead to compile-time checks.
214-217: Final result calculation with `LogicalOperator`. Using `.all(...)` for `AND` and `.any(...)` for `OR` is correct and concise.
250-260: Adjusted template placeholders in alert message. Displaying the `aggregate_function`, `column`, and `operator` with the actual value helps debugging. Looks good.
305-346: Generating expressions for multiple aggregates and filters. The code carefully handles two aggregate conditions (for AND/OR) or a single one otherwise. The design is straightforward to follow.
102-105: ❓ Verification inconclusive

Replacing old query references with `alert.stream`. Constructing `select_query` from `alert.stream` is a clear approach. Verify all other uses of the old `alert.query` have been updated to prevent mismatched references.

🏁 Script executed:

```shell
#!/bin/bash
# Searching for old references to `alert.query`
rg "alert\.query"
```

Length of output: 19
Action Required: Verify All References to `alert.query` Are Updated

The updated code snippet in `src/alerts/alerts_utils.rs` (lines 102–105) correctly constructs `select_query` using `alert.stream`. An automated search for the old `alert.query` reference returned no hits, suggesting the replacement was completed. However, because the search output was unexpectedly empty, please perform a manual verification to ensure no lingering `alert.query` references exist elsewhere in the codebase.

- Confirm that no files reference `alert.query` (in addition to using automated tools).
- Ensure that any removed or replaced references in other parts of the code are consistent with this update.
src/alerts/mod.rs (9)
186-197: Renamed and newly introduced operators in `AlertOperator`. The renamed variants (e.g., `Equal`, `NotEqual`, `GreaterThanOrEqual`, etc.) are more aligned with typical comparisons. Adding `IsNull` and `IsNotNull` provides a broader feature set.
214-247: New `WhereConfigOperator` enum. Great expansion to handle more specialized string matching (e.g., `BeginsWith`, `EndsWith`, etc.). Make sure your front-end or any integrators know about these changes to keep consistent operator usage.
468-478: Changed alert fields to `stream`, `aggregates`, and `eval_config`. Switching from `query` to `stream` and from old aggregates to the new `Aggregates` struct is consistent with the overall refactor.
501-514: Updates in `AlertConfig`. Moving to `stream`, `aggregates`, and `eval_config` ensures uniform usage of new naming conventions throughout.
518-523: `modify` function replaced old fields with new ones. All references to `alert.query` or older config fields are replaced. This keeps the config object consistent after modifications.
611-611
: Ensuring correct number of conditions for AND/OR or single condition.The error messages and logic for 2 or 1 condition are clear. This prevents undefined slicing if the user misconfigures the alert.
Also applies to: 619-619
630-644: Same logic repeated for aggregates array slices. The same approach is used here: 2 aggregates if operator is present, 1 otherwise. The slice checks remain consistent with the earlier conditions approach.
663-684: Column extraction in `get_agg_config_cols`. Good usage of sets to gather columns from both the aggregate config and conditional clauses. This helps ensure you have all columns needed for the final DataFrame.
708-710: Retrieving `eval_frequency` from rolling window. The function remains straightforward, just returning the underlying frequency. No issues found.
- Added helper methods to fetch base query, eval details
- Ensured DataFusion accepts columns and table names as is
- Removed clockwerk scheduler
- Modified alert config JSON
- Modified triggered alert message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/alerts/mod.rs (1)
272-276: Potential naming conflict in WhereConfigOperator. The `from_str` method might conflict with the standard `FromStr` trait. This was previously flagged in a review.

`WhereConfigOperator` and potential naming conflict: the `WhereConfigOperator` enum with string-based comparisons, plus `from_str` logic, is coherent. However, the pipeline flagged `from_str` (line 272) for colliding with `std::str::FromStr::from_str`. Consider implementing the `FromStr` trait directly or renaming the helper to something like `parse_where_op` to avoid confusion and follow Rust conventions:

```diff
- pub fn from_str(s: &str) -> Option<Self> {
+ impl std::str::FromStr for WhereConfigOperator {
+     type Err = ();
+     fn from_str(s: &str) -> Result<Self, Self::Err> {
+         // ...
+         Ok(op)
+     }
+ }
```
🧹 Nitpick comments (3)
src/handlers/http/modal/query_server.rs (1)
131-141: Alert loading moved to server initialization phase. This change moves alerts loading from the `load_on_init` function to the server initialization phase, spawning it as a background task. This aligns with the architectural changes in the alerts system. However, there's a commented-out error handling line. Consider whether alert loading failures should prevent server startup or if the current approach of just logging errors is sufficient.

```diff
 // Run the alerts scheduler
 tokio::spawn(async {
     match ALERTS.load().await {
         Ok(_) => info!("Alerts loaded successfully"),
         Err(e) => {
             error!("Failed to load alerts: {}", e);
-            // return Err(e);
+            // Alert loading failures are logged but don't prevent server startup
         }
     };
 });
```

src/sync.rs (1)
239-245
: Duplicate alert detection is sound, but consider revisiting the policy.
When an alert with the same ID already exists, the code logs an error and continues. This is acceptable if you strictly want to prevent re-creation. However, if you ever need to replace an existing alert, you might want to handle that scenario differently.

src/alerts/alerts_utils.rs (1)
190-194: Potential unnecessary unwrapping. The call to `or(None)` on an `Option<String>` is redundant since `Option::or` would return the first option if it's `Some`. This doesn't affect functionality but could be simplified.

```diff
- agg.conditions
-     .as_ref()
-     .map(|config| config.generate_filter_message())
-     .or(None)
+ agg.conditions
+     .as_ref()
+     .map(|config| config.generate_filter_message())
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
src/alerts/alerts_utils.rs (13 hunks)
src/alerts/mod.rs (23 hunks)
src/alerts/target.rs (6 hunks)
src/handlers/airplane.rs (1 hunks)
src/handlers/http/alerts.rs (6 hunks)
src/handlers/http/modal/mod.rs (1 hunks)
src/handlers/http/modal/query_server.rs (3 hunks)
src/handlers/http/modal/server.rs (1 hunks)
src/rbac/role.rs (1 hunks)
src/sync.rs (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/rbac/role.rs
🧰 Additional context used
🧬 Code Definitions (4)
src/handlers/http/alerts.rs (2)
- src/alerts/alerts_utils.rs (1): user_auth_for_query (59-89)
- src/alerts/mod.rs (2): update_state (444-480), update_state (878-912)

src/alerts/alerts_utils.rs (1)
- src/alerts/mod.rs (2): get_eval_window (735-742), get_eval_frequency (730-734)

src/handlers/http/modal/server.rs (2)
- src/handlers/http/alerts.rs (1): update_state (124-162)
- src/alerts/mod.rs (2): update_state (444-480), update_state (878-912)

src/alerts/mod.rs (2)
- src/sync.rs (1): alert_runtime (233-278)
- src/utils/mod.rs (1): user_auth_for_query (85-117)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
🔇 Additional comments (44)
src/handlers/airplane.rs (1)
172-172
: Improved SQL query safety with proper table name quoting. Adding double quotes around the table name in the SQL query properly handles cases where stream names might contain special characters or reserved SQL keywords. This is a good security practice to prevent potential SQL injection issues.
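The quoting rule this comment describes can be illustrated with a small sketch; the `base_query` helper and its exact output format here are illustrative, not the project's actual code:

```rust
// Build a SELECT over a stream whose name may be a reserved word
// (e.g. "table") or contain special characters (e.g. "app-logs").
// Double-quoting the identifier keeps it literal in SQL.
fn base_query(stream: &str) -> String {
    // Escape embedded double quotes per the SQL standard ("" inside "...")
    let escaped = stream.replace('"', "\"\"");
    format!("select * from \"{escaped}\"")
}
```

Without the quotes, a stream named `table` or `app-logs` would either fail to parse or be interpreted as an expression rather than an identifier.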
src/handlers/http/modal/query_server.rs (2)
21-21
: LGTM: New import for ALERTS. This import is needed for the new alerts loading functionality.
37-37
: LGTM: Updated tracing import to include both error and info. Correctly expanded the tracing import to include the info macro for better logging coverage.
src/handlers/http/modal/mod.rs (1)
168-168: Removed alerts loading from load_on_init function. The update from `future::join4` to `future::join3` reflects the removal of alerts loading from this function, which is now handled in the QueryServer initialization. This change is aligned with the architectural refactoring of the alert system.

src/handlers/http/modal/server.rs (1)
231-235: Improved alert state update endpoint. The route handler for PUT requests to the `/alerts/{alert_id}` endpoint has been updated to use `alerts::update_state` instead of `alerts::modify`. Based on the provided code snippets, this change implements a more RESTful approach where state transitions are handled via query parameters (e.g., `?state=silenced`), and validation is performed to ensure only valid state transitions are allowed. This is a better design for state management in a REST API.
src/sync.rs (5)
20-20
: No functional concerns.
This import addition is straightforward and has no apparent issues.
23-23
: No functional concerns.
This import is appropriate and has no apparent issues.
29-29: No functional concerns. The import from `alerts_utils` and `AlertTask` is aligned with the new alert runtime logic.
231-240
: Alert runtime initialization looks good.
The top-level function signature and documentation are clear, indicating an asynchronous runtime. This neatly replaces the old scheduling mechanism.
265-275: Alert deletion logic is clear. Removing from the `HashMap` and aborting the underlying task correctly stops its evaluation loop.
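The create/delete lifecycle described in these comments can be modeled without tokio; the sketch below uses std threads and an `AtomicBool` stop flag as a stand-in for tokio's `JoinHandle::abort`, and all names are hypothetical:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

// A running evaluation loop plus the flag used to stop it.
struct AlertHandle {
    stop: Arc<AtomicBool>,
    join: JoinHandle<()>,
}

// Mirrors the create path: spawn a periodic loop and track it by alert id.
fn spawn_eval_loop(tasks: &mut HashMap<String, AlertHandle>, id: &str) {
    let stop = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stop);
    let join = thread::spawn(move || {
        while !flag.load(Ordering::Relaxed) {
            // evaluate_alert(...) would run here on each tick
            thread::sleep(Duration::from_millis(10));
        }
    });
    tasks.insert(id.to_string(), AlertHandle { stop, join });
}

// Mirrors the delete path: remove the entry, then stop its loop.
fn delete_alert(tasks: &mut HashMap<String, AlertHandle>, id: &str) -> bool {
    match tasks.remove(id) {
        Some(handle) => {
            handle.stop.store(true, Ordering::Relaxed);
            handle.join.join().is_ok()
        }
        None => false,
    }
}
```

In the real code a `tokio::task::JoinHandle` makes the stop flag unnecessary, since `abort()` cancels the task directly; the HashMap bookkeeping is the same either way.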
19-20
: Import changes.
These import adjustments are minimal and do not introduce concerns.
22-24
: Removal of commented-out code is fine.
No issues with removing or commenting old references if they are no longer needed.
59-60
: Query-based user authorization looks good.
Usingalert.get_base_query()
centralizes the logic for constructing the query to validate user access.
72-73
: Starting the alert task.
InvokingALERTS.start_task(alert.clone())
appears consistent with the newly introduced alert runtime.
85-86
: Authorization for GET endpoint looks good.
Again, ensuring that users can only retrieve alerts they have permission to query is appropriate.
100-101
: Authorization for DELETE endpoint looks good.
This is the same pattern of building a base query for permissions.
124-162
: High-level note on the newupdate_state
handler.
Overall, this function is structured clearly, first retrieving the alert and checking authorization before updating the state.
145-148
: Potential out-of-bounds error when parsing query string.
Splitting on'='
and immediately accessingtokens[1]
may panic if the query string does not include'='
or is malformed. Consider adding a length check before accessing indices.let tokens = query_string.split('=').collect::<Vec<&str>>(); -if tokens.len() < 2 { - return Err(AlertError::InvalidStateChange("Missing parameter".to_string())); -} let state_key = tokens[0]; let state_value = tokens[1];
157-159
: State update logic is appropriate.
Converting the string to an `AlertState` and calling `update_state` aligns with the rest of the alert system design.
160-161
: Returning the updated alert is consistent.
Good final step to confirm the new alert state.src/alerts/target.rs (6)
38-40
: Serialization rename to camelCase.
Changing from another case to camelCase may break compatibility for existing clients. Verify that downstream systems or front-end usage is updated accordingly.
may break compatibility for existing clients. Verify that downstream systems or front-end usage is updated accordingly.
52-53
: Target struct rename attribute updated.
This also changes the serialized field names to camelCase. Ensure your APIs or front-end code conform to this new JSON format.
189-190
: Verifier struct rename attribute updated.
Aligned with camelCase. The same caution about backward compatibility applies here.
232-242
: Enum variants with custom renames.
Renaming the variants to `slack` and `alertManager` is consistent with camelCase. Double-check that clients expect these new variant names.
290-290
: `OtherWebHook` struct rename.
Again, adopting camelCase might break existing consumers.
327-328
: `AlertManager` struct rename.
Be sure to synchronize this rename with any other parts of your system that consume alert manager configuration in JSON form.src/alerts/alerts_utils.rs (10)
28-28
: Import usage efficiency in datafusion::logical_expr. The explicit imports of `BinaryExpr`, `Literal`, `Operator` from `datafusion::logical_expr` indicate new regex functionality, but they're only used in a small part of the file. This is good practice for clarity.
106-109
: Improved query generation approach. The alert evaluation now correctly builds queries from the alert.stream instead of using a hardcoded query. This makes authorization checks more consistent across the application.
210-214
: Complete operator enum coverage. The operator enum now correctly handles all comparison operations including equals, not equals, and range operators. This improves the flexibility of the alert system.
217-222
: Correct logical operator handling. The function now properly evaluates alert conditions based on logical operators (AND/OR). This is a critical improvement for complex alert scenarios.
232-236
: Enhanced alert messaging. The alert message now includes important context about evaluation window and frequency which helps with troubleshooting and understanding alert behavior.
393-397
: Value type handling improvement. The code now intelligently handles both numeric and string values by attempting to parse as a number first. This makes the alert system more flexible with different data types.
401-431
: Comprehensive operator support. The match_alert_operator function now supports a wide range of operators including string operations like "contains" and "begins with". This significantly enhances the filtering capabilities.
435-445
: Clean aggregate function mapping. The new match_aggregate_operation function provides a clean way to map between aggregate function types and their implementations, improving code readability.
447-467
: Smart value handling with NumberOrString. The introduction of the NumberOrString enum elegantly solves the problem of handling both numeric and string literals in query conditions. This is a clean design pattern.
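A minimal reconstruction of the idea follows; the enum shape and helper names here are guesses for illustration, not the actual definitions in `alerts_utils.rs`:

```rust
// Try the value as a number first; fall back to a string literal.
#[derive(Debug, PartialEq)]
enum NumberOrString {
    Number(f64),
    Text(String),
}

fn classify(value: &str) -> NumberOrString {
    match value.parse::<f64>() {
        Ok(n) => NumberOrString::Number(n),
        Err(_) => NumberOrString::Text(value.to_string()),
    }
}

// When rendering into a query condition, only string literals get
// quoted (with embedded single quotes doubled, SQL-style).
fn to_sql_literal(value: &str) -> String {
    match classify(value) {
        NumberOrString::Number(n) => n.to_string(),
        NumberOrString::Text(s) => format!("'{}'", s.replace('\'', "''")),
    }
}
```

This keeps numeric comparisons numeric (`status = 500`) while still supporting string filters (`level = 'error'`) from the same user-supplied field.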
116-118
:❓ Verification inconclusive
Simplified time range handling
Hardcoding the end time to "now" instead of calculating it from the rolling window is a cleaner approach, but ensure this doesn't affect time-sensitive alerts.
Verify that using "now" as the end time is consistent with all alert use cases:
🏁 Script executed:
```shell
#!/bin/bash
# Check for any code references that might be expecting a custom end time
rg -A 3 "eval_end" --type rust
```

Length of output: 354
Action Required: Confirm Consistent End Time Usage Across Alerts
The updated approach in `src/alerts/alerts_utils.rs` (lines 116–118) now hardcodes the end time to "now" instead of using `rolling_window.eval_end`. However, our search shows that elsewhere in the code—specifically in `src/alerts/mod.rs`—there are still references to a custom `eval_end` (e.g., in the alert struct and when both `eval_start` and `eval_end` are used). Please verify that replacing the custom end time with "now" does not adversely affect any time-sensitive alert evaluations. If any components require the original behavior, consider updating them accordingly.

src/alerts/mod.rs (8)
54-64
: Task management refactored with message passing. The ALERTS initialization now uses a message-passing system with mpsc channels, which is more scalable than the previous approach. This is a good architectural improvement.
73-75
: Clear task representation with AlertTask enum. The AlertTask enum provides a clean way to represent different alert operations (Create/Delete), making the code more maintainable and self-documenting.
223-256
: Comprehensive operator set with WhereConfigOperator. The new WhereConfigOperator enum provides a rich set of comparison operators, especially for string values, significantly enhancing the filtering capabilities of alerts.
443-481
: Robust state transition implementation. The new AlertState implementation adds proper state transition validation to prevent invalid state changes. This is a good defensive programming practice that will prevent inconsistent alert states.
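A rough model of the transition table implied by this description; the real rule set lives in `update_state`, and this sketch only encodes the transitions the review mentions (Triggered may be silenced or resolved, Silenced may resolve, Resolved is terminal, self-transitions rejected):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AlertState {
    Triggered,
    Silenced,
    Resolved,
}

// Returns true only for transitions the state machine permits.
fn is_valid_transition(current: AlertState, next: AlertState) -> bool {
    use AlertState::*;
    matches!(
        (current, next),
        (Triggered, Silenced) | (Triggered, Resolved) | (Silenced, Resolved)
    )
}
```

Encoding the whitelist in a single `matches!` makes invalid pairs (e.g. Silenced → Triggered) fall through to `false` by default, which is the safe direction for a defensive state machine.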
563-565
: Simplified query generation. The get_base_query function provides a consistent way to generate base queries from stream names, which helps maintain consistency across the application.
600-617
: Proper schema validation. The validation now correctly checks that referenced columns exist in the stream schema, preventing runtime errors from invalid column references.
735-742
: Enhanced logging with evaluation window details. The get_eval_window method provides useful information for alert messages, improving debuggability and user experience when analyzing alert behavior.
936-943
: Task management through channels. The start_task method now uses message passing to manage alert lifecycle, which is a more scalable approach than direct task management. This aligns well with the alert_runtime implementation seen in the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/sync.rs (1)
249-270
: Spawned task logic and retry behavior
- Once the task fails more than 3 times, it breaks from the loop, but the corresponding entry remains in the
alert_tasks
map. You might consider removing it from the map before exiting, avoiding references to tasks that have already terminated.- If the
evaluate_alert
recovers on a subsequent iteration, you may want to resetsleep_duration
back to the original frequency, unless the permanent fallback to 1 minute is intentional.+ // Example approach to remove the completed entry from the map: + // (In the main runtime loop, after the task ends, do something like:) + if handle.await.is_ok() { + alert_tasks.remove(&id); + }src/alerts/mod.rs (1)
54-65: Global static ALERTS initialization. Leveraging `Lazy<Alerts>` to bootstrap the mpsc channel and thread spawn is a succinct pattern. Spawning `alert_runtime` in a separate thread is fine, but be mindful that thread-level panics cannot be automatically restarted by Tokio. Check logs or consider more robust crash handling if needed.

```diff
-thread::spawn(|| alert_runtime(rx));
+thread::spawn(move || {
+    if let Err(e) = alert_runtime(rx) {
+        error!("alert_runtime crashed: {e}");
+    }
+});
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/alerts/mod.rs (23 hunks)
src/sync.rs (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/alerts/mod.rs (2)
- src/sync.rs (1): alert_runtime (233-285)
- src/utils/mod.rs (1): user_auth_for_query (85-117)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: coverage
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
🔇 Additional comments (42)
src/sync.rs (5)
20-20
: New imports for alert handling. These added imports (HashMap, mpsc, oneshot, alerts_utils, and AlertTask) seem appropriate for the new alert-related functionality and message passing.
Also applies to: 23-23, 29-29
231-234
: Introducing a dedicated alert task runtime. Defining a separate runtime for alert tasks is a clear design choice, isolating alert-related scheduling from other behaviors. This helps keep the main handler uncluttered.
236-244
: Handling new tasks in the select! loop. Creating and tracking per-alert tasks in a HashMap is a valid approach for managing concurrency. The check against duplicates prevents conflicting tasks for the same alert ID.
272-280
: Deleting tasks and aborting the handle. Aborting the spawned task and removing it from the HashMap assures no further evaluation attempts occur. Logging the result clarifies which alerts were successfully removed.
283-284
: Function exit. Returning `Ok(())` at the end provides a clean exit. No further changes needed.

src/alerts/mod.rs (37)
23-24
: New imports for parsing logic. The added derive_more imports (`FromStr` and `FromStrError`) facilitate simpler conversion from string representations for enums like `AlertState` or `WhereConfigOperator`.
31-32
: Thread usage and oneshot channels. Introducing
std::thread
and oneshot channels suggests a separation of concerns between the main async runtime and any blocking or alternative concurrency approach. Ensure that tasks that might block do not stall the primary async executor.
45-45
: Importing the new `alert_runtime`
Pulling in
alert_runtime
from `src/sync.rs` creates a clear boundary: `Alerts`
can spawn tasks by forwarding them to this dedicated runtime.
66-70
: Alerts struct and new `sender` field. Adding the
sender: mpsc::Sender<AlertTask>
field centralizes how alerts schedule creation/deletion tasks. This design looks consistent and maintainable.
72-75
: AlertTask enum definition. This enum neatly encapsulates create and delete alert commands. Scaling to more task types in the future will be trivial (e.g., AlertTask::Update).
93-93
: Introducing `AggregateResult` with a Debug derive. Declaring an
AggregateResult
type clarifies how results are passed around or logged. TheDebug
derive is helpful for diagnosing evaluation outcomes.
124-124
: Formatting the alert notification messageThe updated format string now includes a clearer layout, showing an alert name, timestamp, severity, and a placeholder for additional messages.
201-207
: Renamed or modified operator variants inAlertOperator
Renaming variants to
Equal
,NotEqual
,GreaterThanOrEqual
,LessThanOrEqual
(with matching serde tags) improves clarity. Ensuring these changes are reflected in calling code avoids serialization/deserialization mismatches.
215-218
: Display implementation for new variantsWriting the correct symbols (
=
,!=
,>=
,<=
) is crucial for generating queries. These lines correctly align with the updated operator variant names.
223-280
: IntroducingWhereConfigOperator
and related logic
- Providing a
FromStr
implementation and aserde(rename_all = "camelCase")
ensures consistent parsing from user input or JSON.- The broad set of operators (
IsNull
,IsNotNull
,ILike
, etc.) accommodates common filtering use cases.- Exposing
as_str()
helps with query building.
Overall, this is a flexible, readable solution for where-operator handling.
281-286
:Display
implementation forWhereConfigOperator
Delegating to
as_str()
keeps the display logic DRY. This approach is easy to maintain if future operators are added.
290-308
:AggregateFunction
and Display for aggregatesRenaming or clarifying the aggregator methods (Avg, Count, Min, Max, Sum) aligns well with typical user expectations. The
Display
implementation ensures they appear consistently in logs or queries.
326-326
: Switchingoperator
toWhereConfigOperator
inConditionConfig
Transitioning from
AlertOperator
toWhereConfigOperator
for finer-grained filtering is a solid improvement, ensuring the correct operator type is used for condition logic.
333-334
: Renaming fields inConditions
Replacing the old
conditions
field withcondition_config
, plus theoperator
now beingOption<LogicalOperator>
, conveys more explicit meaning.
341-345
: Handling compound conditions with AND/ORThis logic constructs a descriptive message for two conditions. Verify indexing carefully; the code always references
[0]
and[1]
but checks the length earlier, so it’s safe.
356-356
: Single-condition fallbackWhen there is no logical operator, only one condition is permitted. This fallback is consistent with the earlier validation logic, preventing out-of-bounds indexing.
363-368
:GroupBy
struct introductionIncluding a
group_by
field in aggregates expands the alerting options to handle grouped aggregations. This is a beneficial step for more complex queries.
372-374
: Broadening AggregateConfig
- Renaming
agg
toaggregate_function
clarifies usage.- Adding optional
conditions
andgroup_by
fields extends aggregate queries to more flexible scenarios.
382-384
:Aggregates
structThe new
operator
(AND/OR) plus anaggregate_config
array of individual aggregate instructions is a more modular representation than the previous approach.
394-401
: LogicalOperator DisplaySurfacing “AND” or “OR” as uppercase aligns with typical SQL usage or textual representations of boolean logic.
424-424
: AddingFromStr
toAlertState
Allowing
AlertState
to be parsed from strings is useful for user-facing or config-based state changes. Just confirm that unrecognized states are handled gracefully (a potential concern for the derivedFromStr
).Would you like to run a quick test verifying that invalid strings produce an error or default state? I can generate a small snippet or script if needed.
443-481
: Refinedupdate_state
methodThe state machine logic is stricter, preventing nonsensical transitions such as Triggered → Triggered or Silenced → Triggered. The final fallback for Resolved → any-other-state ensures correct usage. Logging or clearly surfacing the reason for a blocked state change can help maintain user clarity.
510-513
: Replacingquery
withstream
, plus new aggregates and eval_configUsing
stream
clarifies that the alert references a specific ingestion stream rather than an arbitrary query. This rework (aggregates, eval_config) matches the improved condition definitions.
524-527
: Mappingstream
,aggregates
, andeval_config
from AlertRequest to AlertConfigConsistently forwarding these fields affirms the revised structure. No issues identified.
542-545
: Consolidated fields inAlertConfig
Matching the naming scheme from
AlertRequest
ensures a smooth extension or future changes.
555-558
:modify
method updates relevant fieldsResetting the alert to
Resolved
on modification aligns with the typical approach of revalidating any newly changed alert. The reassignments look correct.
563-564
:get_base_query
utilityReplacing the old
query
usage with a format that referencesself.stream
is straightforward. This method can be extended if more complex base queries become necessary.
570-596
: Validating eval frequency and targets
- Confirming that
RollingWindow.eval_start
is humantime-compatible is a helpful user safeguard.- Checking that the sum of target retry intervals does not exceed
eval_frequency
prevents stuck or repeated notifications.
600-600
: Retrieving columns for aggregator checkingCapturing columns from aggregator configs first ensures we can compare them to the schema for correctness.
602-608
: Fetching schema fromPARSEABLE.get_stream
This gracefully handles a missing or invalid stream via
?
error propagation. The subsequent.fields()
iteration is well-structured.
610-615
: Ensuring the column existsRaising a
CustomError
if the column isn’t in the schema is a direct pointer for users. The format is expressive enough to avoid confusion.
633-633
: Error message for multi-conditions with AND/OR. The user workflow is more predictable with this explicit restriction, preventing more than two conditions in an AND/OR scenario.
639-641
: Validation for single-condition usage. Similarly restricting one condition for non-AND/OR cases ensures consistent indexing in
Conditions
.
650-679
: Aggregates operator validation (AND/OR vs single)
- If
operator
isSome(LogicalOperator)
, exactly two aggregator configs must be used and each internally validated.- Otherwise, exactly one aggregator config must be used.
This helps avoid out-of-bounds references or ambiguous aggregator setups.
685-707
: Gathering columns from aggregator-based conditions. Similar logic in
get_agg_config_cols
, ensuring columns from each aggregator’s condition are recognized. This comprehensive approach prevents unexpected schema mismatch.
730-743
: `get_eval_frequency` and `get_eval_window`. Splitting out these fields from
EvalConfig
is clean. Both methods are small wrappers for clarity, which reads well.
836-837
: (No explicit annotation was provided for lines 836-837, but referencing them from the snippet context.) In-memory and on-disk alert consistency.
Line 836 shows reloading each alert from disk and scheduling it via
AlertTask::Create
. This ensures the in-memory map matches persisted alerts consistently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/sync.rs (1)
246-268
: Consider maintaining the original evaluation schedule after retries. When an error occurs, you're setting `sleep_duration = 1` to retry after 1 minute. However, after a successful retry, you might want to restore the original evaluation schedule instead of continuing with the potentially shorter interval.

```diff
 match alerts_utils::evaluate_alert(&alert).await {
     Ok(_) => {
         retry_counter = 0;
+        sleep_duration = alert.get_eval_frequency(); // Restore original frequency after success
     }
     Err(err) => {
         warn!("Error while evaluation- {}\nRetrying after sleeping for 1 minute", err);
         sleep_duration = 1;
         retry_counter += 1;
         if retry_counter > 3 {
             error!("Alert with id {} failed to evaluate after 3 retries with err- {}", id, err);
             break;
         }
     }
 }
```
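The suggested policy can also be modeled synchronously to see the counter/interval interaction in isolation. Names such as `run_schedule` are illustrative; `evaluate` stands in for `alerts_utils::evaluate_alert`, and the real loop sleeps between ticks instead of iterating immediately:

```rust
// On failure: drop to a 1-minute interval and give up after more than
// 3 consecutive retries. On success: reset the counter and restore the
// configured evaluation frequency.
fn run_schedule(
    eval_frequency: u64,
    mut evaluate: impl FnMut() -> Result<(), String>,
    max_ticks: usize,
) -> Result<(), String> {
    let mut sleep_duration = eval_frequency;
    let mut retry_counter = 0;
    for _ in 0..max_ticks {
        match evaluate() {
            Ok(_) => {
                retry_counter = 0;
                sleep_duration = eval_frequency; // restore original cadence
            }
            Err(err) => {
                sleep_duration = 1; // retry sooner
                retry_counter += 1;
                if retry_counter > 3 {
                    return Err(err); // give up, like the `break` in the loop
                }
            }
        }
        let _ = sleep_duration; // a real loop would sleep this long here
    }
    Ok(())
}
```

Without the reset on success, a single transient failure would pin the loop at the 1-minute cadence forever, which is the behavior the comment flags.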
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/alerts/mod.rs (23 hunks)
- src/sync.rs (2 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
src/sync.rs (2)

- src/alerts/mod.rs (4): `mpsc` (55-55), `new` (114-120), `new` (155-167), `new` (178-184)
- src/alerts/alerts_utils.rs (1): `evaluate_alert` (102-113)
src/alerts/mod.rs (2)

- src/sync.rs (1): `alert_runtime` (233-287)
- src/alerts/alerts_utils.rs (1): `user_auth_for_query` (59-89)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (8)
src/sync.rs (3)
231-270: Improved alert error handling with retry mechanism.

The new alert runtime effectively implements the suggested retry logic from previous review comments, replacing the previously criticized "break on first error" approach. The current implementation will make up to 3 retry attempts before giving up, with a reduced evaluation frequency (1 minute) during retries.
271-273: Good practice: tracking alert tasks by ID.

The HashMap provides an efficient way to track and manage alert tasks by their ID, which is a sound approach for proper task lifecycle management.
274-283: Alert deletion properly handles task cancellation.

The implementation for deleting alerts correctly aborts the task and removes it from the tracking map, with appropriate logging for both success and failure cases.
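The create/abort lifecycle the two comments above describe can be sketched with stdlib primitives. This is a simplified analogue, not the PR's code: a cancellation flag stands in for tokio's `JoinHandle::abort`, and the type names are invented.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Simplified stand-in for the tokio task map in src/sync.rs:
// each alert id maps to a cancellation flag plus a thread handle.
struct AlertTasks {
    tasks: HashMap<u64, (Arc<AtomicBool>, JoinHandle<()>)>,
}

impl AlertTasks {
    fn new() -> Self {
        Self { tasks: HashMap::new() }
    }

    // Refuse duplicates, mirroring the "alert already exists" check.
    fn create(&mut self, id: u64) -> bool {
        if self.tasks.contains_key(&id) {
            return false;
        }
        let cancel = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&cancel);
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Relaxed) {
                // periodic evaluation would happen here
                thread::sleep(Duration::from_millis(10));
            }
        });
        self.tasks.insert(id, (cancel, handle));
        true
    }

    // Abort + remove, mirroring the AlertTask::Delete handling.
    fn delete(&mut self, id: u64) -> bool {
        match self.tasks.remove(&id) {
            Some((cancel, handle)) => {
                cancel.store(true, Ordering::Relaxed);
                handle.join().ok();
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut tasks = AlertTasks::new();
    assert!(tasks.create(1));
    assert!(!tasks.create(1)); // duplicate id is rejected
    assert!(tasks.delete(1));
    assert!(!tasks.delete(1)); // already gone
    println!("lifecycle ok");
}
```

Keeping the handle in the map is what makes deletion (and the logging on "not found") possible; dropping handles on the floor would leak running evaluators.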
src/alerts/mod.rs (5)
223-286: New WhereConfigOperator provides comprehensive comparison capabilities.

The introduction of `WhereConfigOperator` significantly extends the comparison capabilities with a wide range of operators including NULL checks, pattern matching, and string operations. Implementation follows good practices with proper serialization/deserialization and display formatting.
363-375: Enhanced aggregate configuration with group by functionality.

The introduction of the `GroupBy` structure and updating `AggregateConfig` to include support for grouping and logical conditions improves the expressiveness of alert definitions, allowing for more sophisticated alert criteria.
443-481: Alert state transition validation prevents invalid state changes.

The implementation of `update_state` provides clear rules about allowed state transitions with appropriate error messages when invalid transitions are attempted. This prevents users from making illogical state changes like manually triggering a resolved alert.
936-943: Alert task management via channels improves separation of concerns.

The use of message passing through channels for alert task management is a significant improvement, providing better separation of concerns between the alert management logic and task execution.
945-953: Alert task deletion properly propagates errors.

The implementation for deleting alert tasks correctly propagates errors from the channel send operation with meaningful error messages.
src/alerts/mod.rs

Outdated

```rust
pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
    let (tx, rx) = mpsc::channel::<AlertTask>(1);
    let alerts = Alerts {
        alerts: RwLock::new(HashMap::new()),
        sender: tx,
    };

    thread::spawn(|| alert_runtime(rx));

    alerts
});
```
🛠️ Refactor suggestion
Thread spawn for alert_runtime lacks error handling and shutdown mechanism.
The current implementation spawns a thread for the alert runtime but doesn't keep its join handle, which means there's no way to gracefully shut it down. Additionally, there's no mechanism to handle errors from the alert_runtime function.
Consider using a more robust approach:
```diff
 pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
     let (tx, rx) = mpsc::channel::<AlertTask>(1);
     let alerts = Alerts {
         alerts: RwLock::new(HashMap::new()),
         sender: tx,
     };
-    thread::spawn(|| alert_runtime(rx));
+    let handle = thread::spawn(|| {
+        if let Err(e) = alert_runtime(rx) {
+            error!("Alert runtime failed: {}", e);
+        }
+    });
+    // Store the handle somewhere for graceful shutdown
     alerts
 });
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
     let (tx, rx) = mpsc::channel::<AlertTask>(1);
     let alerts = Alerts {
         alerts: RwLock::new(HashMap::new()),
         sender: tx,
     };
-    thread::spawn(|| alert_runtime(rx));
+    let handle = thread::spawn(|| {
+        if let Err(e) = alert_runtime(rx) {
+            error!("Alert runtime failed: {}", e);
+        }
+    });
+    // Store the handle somewhere for graceful shutdown
     alerts
 });
```
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/alerts/mod.rs (2)
54-64
: ⚠️ Potential issue — Thread spawn for alert_runtime lacks error handling and shutdown mechanism

The current implementation spawns a thread for the alert runtime but doesn't keep its join handle, which means there's no way to gracefully shut it down. Additionally, there's no mechanism to handle errors from the alert_runtime function.

Implement proper error handling and shutdown for the alert runtime thread:

```diff
 pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
     let (tx, rx) = mpsc::channel::<AlertTask>(1);
     let alerts = Alerts {
         alerts: RwLock::new(HashMap::new()),
         sender: tx,
     };
-    thread::spawn(|| alert_runtime(rx));
+    let handle = thread::spawn(|| {
+        if let Err(e) = alert_runtime(rx) {
+            error!("Alert runtime failed: {}", e);
+        }
+    });
+    // Store the handle somewhere for graceful shutdown
     alerts
 });
```
563-565
: ⚠️ Potential issue — SQL Injection Risk in `get_base_query`: Ensure Stream Name Is Properly Escaped

The current implementation directly interpolates the stream name into the SQL query without properly escaping it, which poses a SQL injection risk.

```diff
 pub fn get_base_query(&self) -> String {
-    format!("SELECT * FROM \"{}\"", self.stream)
+    format!("SELECT * FROM \"{}\"", self.stream.replace("\"", "\"\""))
 }
```
🧹 Nitpick comments (1)
src/sync.rs (1)
249-268
: Good implementation of retry logic with progressive backoff

This implementation properly addresses the past concern about breaking permanently on errors by adding a retry mechanism that attempts up to 3 times with a shorter evaluation interval.

One potential enhancement would be to use an exponential backoff strategy instead of a fixed 1-minute retry interval for more resilient error handling.

```diff
-    sleep_duration = 1;
+    // Use an exponential backoff starting at 1 minute, then 2, then 4
+    sleep_duration = 1 << retry_counter;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/alerts/mod.rs (23 hunks)
- src/alerts/target.rs (9 hunks)
- src/sync.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/alerts/target.rs
🧰 Additional context used
🧬 Code Definitions (2)
src/sync.rs (2)

- src/alerts/mod.rs (4): `mpsc` (55-55), `new` (114-120), `new` (155-167), `new` (178-184)
- src/alerts/alerts_utils.rs (1): `evaluate_alert` (102-113)
src/alerts/mod.rs (2)

- src/sync.rs (1): `alert_runtime` (233-287)
- src/utils/mod.rs (1): `user_auth_for_query` (85-117)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (10)
src/sync.rs (3)
231-243: Alert runtime implementation with proper task management

This introduces a well-structured alert runtime that manages alert tasks using tokio and a message queue approach. The implementation properly validates alert existence before creating new tasks and handles task lifecycle cleanly.
246-248: LGTM: Clean task preparation

The code correctly clones the alert and stores the ID for later reference before spawning the task.
274-282: Alert removal implementation with abort handling

This properly handles alert deletion by aborting the running task and removing it from the HashMap, with appropriate error logging when alerts don't exist.
src/alerts/mod.rs (7)
72-75: LGTM: Well-structured AlertTask enum

The AlertTask enum provides a clean interface for the alert system to handle creation and deletion operations through the message queue.
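The Create/Delete message-passing shape can be sketched with a std channel. This is an assumption-laden sketch, not the PR's code: the PR uses tokio's bounded `mpsc` and the real variants carry an `AlertConfig` and a `Ulid`, which are replaced with plain ids here.

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;

// Shape of the enum from the review; payloads are simplified.
enum AlertTask {
    Create(u64),
    Delete(u64),
}

// Stand-in for alert_runtime: drains the channel on its own thread and
// returns how many alerts are still scheduled once all senders are dropped.
fn run_alert_runtime(tasks: Vec<AlertTask>) -> usize {
    let (tx, rx) = mpsc::channel::<AlertTask>();

    let runtime = thread::spawn(move || {
        let mut live = HashSet::new();
        for task in rx {
            match task {
                AlertTask::Create(id) => {
                    if !live.insert(id) {
                        eprintln!("alert {} already exists", id);
                    }
                }
                AlertTask::Delete(id) => {
                    if !live.remove(&id) {
                        eprintln!("alert {} not found", id);
                    }
                }
            }
        }
        live.len()
    });

    for t in tasks {
        tx.send(t).unwrap();
    }
    drop(tx); // closing the channel ends the runtime loop

    runtime.join().unwrap()
}

fn main() {
    use AlertTask::*;
    let remaining = run_alert_runtime(vec![Create(1), Create(2), Delete(1)]);
    assert_eq!(remaining, 1);
    println!("{} alert still scheduled", remaining);
}
```

Because only the runtime thread touches the live set, callers never contend on it — they just enqueue `Create`/`Delete` messages, which is the separation-of-concerns win the comment describes.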
196-208: Improved AlertOperator naming for better readability

The renamed operator variants (`Equal`, `NotEqual`, etc.) are more consistent and clearer than the previous names (`EqualTo`, `NotEqualTo`, etc.).
223-286
: LGTM: Comprehensive WhereConfigOperator enum implementationThe new WhereConfigOperator enum provides a wide range of comparison operators that will allow for more expressive alert conditions, along with proper string conversion methods.
661-666: Bug fix in aggregate condition validation

This correctly fixes the previously identified bug where agg2 was set to the same value as agg1, causing validation to be performed twice on the first condition instead of once on each.
735-742: LGTM: New helper method for evaluation window information

The addition of `get_eval_window()` provides a useful utility for formatting the evaluation window information, which improves code organization.
937-943: LGTM: Updated task creation with message passing

The refactored `start_task` method correctly uses the new message-passing approach with proper error handling.
947-953: LGTM: Updated task deletion with message passing

The refactored `delete_task` method correctly uses the new message-passing approach with proper error handling.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/alerts/mod.rs (2)
54-64
: Thread spawn for alert_runtime lacks error handling

The current implementation spawns a thread for the alert runtime but doesn't keep its join handle, which means there's no way to handle errors from the alert_runtime function.

Consider using a more robust approach:

```diff
 pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
     let (tx, rx) = mpsc::channel::<AlertTask>(10);
     let alerts = Alerts {
         alerts: RwLock::new(HashMap::new()),
         sender: tx,
     };
-    thread::spawn(|| alert_runtime(rx));
+    let handle = thread::spawn(|| {
+        if let Err(e) = alert_runtime(rx) {
+            error!("Alert runtime failed: {}", e);
+        }
+    });
+    // Store the handle somewhere for graceful shutdown
     alerts
 });
```
563-565
: ⚠️ Potential issue — SQL Injection Risk in `get_base_query`

The implementation directly interpolates the stream name into the SQL query, which could lead to SQL injection if the stream name isn't properly validated or escaped.

```diff
 pub fn get_base_query(&self) -> String {
-    format!("SELECT * FROM \"{}\"", self.stream)
+    format!("SELECT * FROM \"{}\"", self.stream.replace("\"", "\"\""))
 }
```
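Why doubling quotes works: a stream name containing `"` would otherwise terminate the quoted identifier early and let the rest of the name run as SQL. A quick standalone check of the escaping behaviour (the free function here mirrors the suggested fix, not the PR's method):

```rust
// Doubling embedded double quotes keeps a hostile stream name
// inside the quoted SQL identifier.
fn base_query(stream: &str) -> String {
    format!("SELECT * FROM \"{}\"", stream.replace('"', "\"\""))
}

fn main() {
    assert_eq!(base_query("logs"), "SELECT * FROM \"logs\"");
    // an attempted breakout stays inside the identifier
    assert_eq!(
        base_query("x\"; DROP TABLE y; --"),
        "SELECT * FROM \"x\"\"; DROP TABLE y; --\""
    );
    println!("identifier stays quoted");
}
```

Validating stream names against an allowlist at creation time would be a complementary defence on top of the escaping.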
🧹 Nitpick comments (4)
src/sync.rs (1)
236-285
: Consider adding graceful shutdown mechanism

The alert runtime doesn't have a mechanism to handle graceful shutdowns. This could lead to resource leaks or incomplete alert evaluations during server shutdown.

Consider modifying the function to accept a shutdown signal:

```diff
 #[tokio::main(flavor = "multi_thread")]
-pub async fn alert_runtime(mut rx: mpsc::Receiver<AlertTask>) -> Result<(), anyhow::Error> {
+pub async fn alert_runtime(mut rx: mpsc::Receiver<AlertTask>, mut shutdown: mpsc::Receiver<()>) -> Result<(), anyhow::Error> {
     let mut alert_tasks = HashMap::new();

     // this is the select! loop which will keep waiting for the alert task to finish or get cancelled
-    while let Some(task) = rx.recv().await {
-        match task {
+    loop {
+        tokio::select! {
+            Some(task) = rx.recv() => match task {
                 AlertTask::Create(alert) => {
                     // check if the alert already exists
                     if alert_tasks.contains_key(&alert.id) {
                         error!("Alert with id {} already exists", alert.id);
                         continue;
                     }
                     // ...rest of the task handling...
                 }
                 AlertTask::Delete(ulid) => {
                     // ...task deletion logic...
                 }
+            },
+            _ = shutdown.recv() => {
+                // Cancel all alert tasks
+                for (id, handle) in alert_tasks.drain() {
+                    handle.abort();
+                    warn!("Alert with id {} aborted due to shutdown", id);
+                }
+                break;
+            },
+            else => break,
         }
     }

     Ok(())
 }
```

src/alerts/mod.rs (3)
835-849
: Implement exponential backoff for alert task creation retries

The current retry logic for sending alert tasks only tries once more with the same delay. Consider implementing an exponential backoff strategy or using a more sophisticated retry mechanism.

```diff
 match self.sender.send(AlertTask::Create(alert.clone())).await {
     Ok(_) => {}
     Err(e) => {
-        warn!("Failed to create alert task: {e}\nRetrying...");
-        // Retry sending the task
-        match self.sender.send(AlertTask::Create(alert.clone())).await {
-            Ok(_) => {}
-            Err(e) => {
-                error!("Failed to create alert task: {e}");
-                continue;
-            }
-        }
+        warn!("Failed to create alert task: {e}\nRetrying with backoff...");
+        // Exponential backoff retry
+        let mut retry_count = 0;
+        let max_retries = 3;
+        let mut delay_ms = 100;
+
+        while retry_count < max_retries {
+            tokio::time::sleep(tokio::time::Duration::from_millis(delay_ms)).await;
+            match self.sender.send(AlertTask::Create(alert.clone())).await {
+                Ok(_) => break,
+                Err(e) => {
+                    retry_count += 1;
+                    delay_ms *= 2; // Exponential backoff
+                    if retry_count >= max_retries {
+                        error!("Failed to create alert task after {retry_count} retries: {e}");
+                        continue;
+                    }
+                    warn!("Retry {}/{} failed: {e}", retry_count, max_retries);
+                }
+            }
+        }
     }
 };
```
949-956
: Consider buffering messages when channel is full

The `start_task` method might fail if the channel is full (max capacity is 10). Consider implementing a buffering mechanism or increasing the channel capacity.

```diff
 pub async fn start_task(&self, alert: AlertConfig) -> Result<(), AlertError> {
-    self.sender
-        .send(AlertTask::Create(alert))
-        .await
-        .map_err(|e| AlertError::CustomError(e.to_string()))?;
-    Ok(())
+    // Try sending with timeout to avoid blocking indefinitely
+    match tokio::time::timeout(
+        tokio::time::Duration::from_secs(5),
+        self.sender.send(AlertTask::Create(alert)),
+    ).await {
+        Ok(Ok(_)) => Ok(()),
+        Ok(Err(e)) => {
+            // Channel error, likely full or closed
+            warn!("Failed to send alert task: {}", e);
+            Err(AlertError::CustomError(format!("Failed to schedule alert: {}", e)))
+        },
+        Err(_) => {
+            // Timeout occurred
+            warn!("Timeout while sending alert task");
+            Err(AlertError::CustomError("Timeout while scheduling alert".to_string()))
+        }
+    }
 }
```
733-742
: Add comprehensive logging for alert evaluation window

The `get_eval_window` method is useful, but could be improved by adding more context to make the logs more useful for debugging.

```diff
 pub fn get_eval_window(&self) -> String {
     match &self.eval_config {
         EvalConfig::RollingWindow(rolling_window) => format!(
-            "Start={}\tEnd={}",
-            rolling_window.eval_start, rolling_window.eval_end
+            "Evaluation window: Start={} End={} Frequency={}min",
+            rolling_window.eval_start, rolling_window.eval_end, rolling_window.eval_frequency
         ),
     }
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/alerts/mod.rs (22 hunks)
- src/alerts/target.rs (9 hunks)
- src/sync.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/alerts/target.rs
🧰 Additional context used
🧬 Code Definitions (2)
src/sync.rs (1)

- src/alerts/alerts_utils.rs (1): `evaluate_alert` (102-113)

src/alerts/mod.rs (2)

- src/sync.rs (1): `alert_runtime` (233-287)
- src/alerts/alerts_utils.rs (1): `user_auth_for_query` (59-89)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: coverage
🔇 Additional comments (2)
src/sync.rs (2)
231-237
: Good addition of a dedicated runtime for alert tasks

The implementation of a separate runtime for alerts is a good design choice as it separates concerns and provides a clear boundary for alert processing. This new asynchronous approach using message passing is more scalable and maintainable than the previous approach.
246-265
: Improved error handling with retry mechanism

You've addressed the previous review comment by implementing a retry mechanism instead of breaking on the first error. The graduated approach with retry counter and reduced sleep duration is a good pattern for handling transient errors.
good to merge
Added new actions along with a hotfix for alerts
Fixes #XXXX.
Description
This PR has:
Summary by CodeRabbit
New Features
Refactor
Chores