
hotfix for alerts #1283


Merged · 4 commits · Apr 9, 2025

Conversation

@parmesant (Contributor) commented Apr 2, 2025

Added new actions along with a hotfix for alerts

Fixes #XXXX.

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • New Features

    • Improved alert configuration with a broader set of comparison operators and enhanced condition management.
    • Updated alert state updates for a more streamlined and secure experience.
    • Refined JSON formatting for alert targets, now using camelCase for better consistency.
  • Refactor

    • Overhauled the alert evaluation and scheduling processes for enhanced reliability and responsiveness.
    • Consolidated alert update endpoints to simplify state transitions and user authorization.
  • Chores

    • Enhanced build workflows by updating external dependency configurations and optimizing CI steps.

coderabbitai bot (Contributor) commented Apr 2, 2025

Walkthrough

The pull request introduces significant changes to the alerting system. Core modifications include updates to alert evaluation logic, query string construction, and aggregate configuration handling. The HTTP endpoints for alert management have been revised, and the internal alert scheduling mechanism has transitioned to an asynchronous model using message passing. In addition, serialization for alert targets has been updated from lowercase to camelCase and SQL query formatting has been adjusted. A new GitHub Actions workflow step now fixes a CMakeLists.txt file for librdkafka.

Changes

| File(s) | Change Summary |
| --- | --- |
| src/alerts/alerts_utils.rs, src/alerts/mod.rs | Updated core alert evaluation and configuration logic: use of alert.get_base_query(), a shift from eval_type to eval_config, modified aggregate types and condition operators, new enums (WhereConfigOperator, AlertTask), and a transition to a message-passing system using mpsc for alert tasks. |
| src/handlers/http/alerts.rs, src/handlers/http/modal/server.rs | Revised HTTP alert handlers: endpoints now build queries via get_base_query(), drop legacy oneshot channel usage in favor of direct task initiation through update_state, and consolidate routes by removing the separate update_state service. |
| .github/workflows/coverage.yaml | Introduced a workflow step to locate and update the librdkafka CMakeLists.txt (raising the minimum required CMake version from 3.2 to 3.5) and removed the CMAKE_FLAGS environment variable from the Clippy check. |
| src/alerts/target.rs | Updated serialization attributes from lowercase to camelCase for multiple structs and enums; changed the endpoint field type from String to Url and added explicit renaming for specific enum variants (e.g., Slack to "slack"). |
| src/handlers/airplane.rs | Modified SQL query string formatting to include double quotes around the table name, ensuring proper query parsing. |
| src/rbac/role.rs, src/sync.rs | Minor update in src/rbac/role.rs (added a newline at EOF) and a major restructuring in src/sync.rs, replacing the synchronous schedule_alert_task function with the asynchronous alert_runtime that uses an mpsc Receiver, a HashMap for tracking tasks, tokio spawning for alert evaluation, and improved error logging. |
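The stream-based query construction and quoted table name described above can be sketched as follows. Note that `get_base_query` here is a stand-in for the method named in the PR; the actual signature and location in src/alerts/mod.rs may differ.

```rust
// Sketch of the stream-based query construction described in the summary.
// Double quotes around the table name ensure that stream names containing
// reserved words or special characters still parse as valid SQL identifiers.
fn get_base_query(stream: &str) -> String {
    format!("SELECT * FROM \"{}\"", stream)
}

fn main() {
    let q = get_base_query("app-logs");
    assert_eq!(q, "SELECT * FROM \"app-logs\"");
    println!("{}", q);
}
```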

Sequence Diagram(s)

sequenceDiagram
    participant Sender as Alert Task Sender
    participant Runtime as Alert Runtime
    participant Evaluator as Alert Evaluator Task

    Sender->>Runtime: Send AlertTask (Create/Delete)
    alt AlertTask is Create
        Runtime->>Runtime: Check if alert exists
        alt Alert does NOT exist
            Runtime->>Evaluator: Spawn periodic evaluation task
            Evaluator-->>Runtime: Return evaluation results/errors
        else Alert exists
            Runtime-->>Sender: Log error & skip task creation
        end
    else AlertTask is Delete
        Runtime->>Runtime: Check if alert exists
        alt Alert exists
            Runtime->>Evaluator: Abort evaluation task
            Runtime-->>Sender: Log deletion warning
        else
            Runtime-->>Sender: Log error: Alert not found
        end
    end
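The message-passing flow in the sequence diagram can be sketched in std-only Rust. The real implementation in src/sync.rs uses a tokio mpsc channel and stores JoinHandles so that Delete can abort the spawned evaluation task; this simplified sketch models only the create/delete bookkeeping so it stays runnable, and the names are illustrative.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Simplified stand-in for the AlertTask enum described in the walkthrough.
enum AlertTask {
    Create(String), // alert id
    Delete(String),
}

// Receives tasks until the channel closes, tracking "running" alerts in a map.
// The real alert_runtime stores tokio JoinHandles here instead of booleans.
fn alert_runtime(rx: mpsc::Receiver<AlertTask>) -> HashMap<String, bool> {
    let mut running: HashMap<String, bool> = HashMap::new();
    for task in rx {
        match task {
            AlertTask::Create(id) => {
                if running.contains_key(&id) {
                    eprintln!("error: alert {id} already exists, skipping");
                } else {
                    // real code: spawn a periodic evaluation task here
                    running.insert(id, true);
                }
            }
            AlertTask::Delete(id) => {
                if running.remove(&id).is_some() {
                    // real code: abort the evaluation task via its JoinHandle
                    eprintln!("warning: alert {id} deleted");
                } else {
                    eprintln!("error: alert {id} not found");
                }
            }
        }
    }
    running
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(AlertTask::Create("a1".into())).unwrap();
    tx.send(AlertTask::Create("a2".into())).unwrap();
    tx.send(AlertTask::Delete("a1".into())).unwrap();
    drop(tx); // closing the channel ends the runtime loop
    let running = alert_runtime(rx);
    assert!(running.contains_key("a2"));
    assert!(!running.contains_key("a1"));
}
```

Dropping the sender ends the loop, which mirrors how the runtime shuts down when no task senders remain.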

Suggested labels

for next release

Suggested reviewers

  • nikhilsinhaparseable

Poem

I'm a rabbit on the run,
Hopping through lines of code and fun,
Evaluating alerts with precise delight,
Spawning tasks in the moonlit night,
With queries clear and targets bright,
I nibble bugs out of sight! 🐰✨


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/rbac/role.rs (1)

51-311: Consider adding AddLLM and ListLLM to higher privilege roles

While GetLLM has been added to all role levels (editor, writer, reader), the other LLM actions (AddLLM, DeleteLLM, ListLLM) are only available to admin roles. Consider whether AddLLM and ListLLM should be available to editor roles as they have other creation and listing permissions.

For editor roles, consider adding:

 Action::QueryLLM,
 Action::GetLLM,
+Action::AddLLM,
+Action::ListLLM,
 Action::CreateFilter,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40876cd and f2b341d.

📒 Files selected for processing (4)
  • src/alerts/alerts_utils.rs (2 hunks)
  • src/alerts/mod.rs (7 hunks)
  • src/handlers/http/alerts.rs (5 hunks)
  • src/rbac/role.rs (5 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/alerts/alerts_utils.rs (1)
src/handlers/http/query.rs (1)
  • query (69-155)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
🔇 Additional comments (18)
src/handlers/http/alerts.rs (4)

56-57: Authorization updated to use stream-based approach

The code now constructs a SQL query from alert.stream for authorization checks, aligning with the transition from a query-based to a stream-based alert model. This maintains security while simplifying the authorization flow.


93-94: Consistent authorization check in get handler

This change correctly applies the same stream-based authorization approach in the get handler, ensuring consistent security enforcement across all alert operations.


108-109: Stream-based authorization in delete handler

The delete handler now properly uses the stream-based authorization check, maintaining consistency with the other handlers.


192-193: Stream-based authorization in update_state handler

The update_state handler successfully implements the stream-based authorization check, completing the transition across all alert-related endpoints.

src/alerts/mod.rs (7)

393-393: Field renamed from 'query' to 'stream' in AlertRequest

This change reflects the shift from a query-based to a stream-based model for alerts, improving clarity about what this field represents.


407-408: Updated From implementation to use stream field

The From<AlertRequest> implementation has been correctly updated to map the renamed field from query to stream.


425-426: Field renamed from 'query' to 'stream' in AlertConfig

The stream field in AlertConfig aligns with the changes in AlertRequest, maintaining consistency across the alert data structures.


438-439: Modified alert.modify() to use stream field

The modify method has been updated to use the new stream field from the alert request, ensuring consistency with the structural changes.


483-486: Simplified query construction using stream field

The query construction now uses a standard SELECT * FROM {stream} format based on the stream field. This simplifies query handling and makes the alert system more maintainable.


506-507: Updated error message to use select_query

The error message has been correctly updated to reference the new select_query variable.


755-756: Updated list_alerts_for_user to use stream-based queries

The authorization check in list_alerts_for_user now uses a formatted SQL query based on alert.stream, maintaining consistency with the changes in other parts of the codebase.

src/alerts/alerts_utils.rs (2)

105-106: Updated evaluate_alert to use stream-based query

The evaluate_alert function now uses a formatted SELECT * FROM {stream} query based on alert.stream, aligning with the new stream-based alert model. This simplifies query execution and makes the alert evaluation more consistent.


122-123: Updated prepare_query to use stream-based approach

The prepare_query function now constructs a logical plan using a standard SELECT * FROM {stream} query based on alert.stream, ensuring consistency with the changes in other parts of the alert system.

src/rbac/role.rs (5)

51-54: Added new LLM-related actions to the Action enum

New actions related to LLM functionality (AddLLM, DeleteLLM, GetLLM, ListLLM) have been added to the Action enum, enhancing the role-based access control system to support LLM features.


122-126: Updated RoleBuilder to include new LLM actions

The RoleBuilder's match statement has been updated to handle the new LLM-related actions, ensuring they are properly processed when building permissions.
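The match-arm pattern described above can be sketched as follows; the enum variants and the Unit/Stream permission split mirror what the review describes in src/rbac/role.rs, but the names here are simplified stand-ins, not the actual Parseable types.

```rust
// Sketch of LLM actions handled as "unit" (non-stream-scoped) permissions
// in a RoleBuilder-style match, as described in the review comment.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action { Ingest, Query, AddLLM, DeleteLLM, GetLLM, ListLLM }

#[derive(Debug, PartialEq)]
enum Permission {
    Unit(Action),           // global, not tied to a stream
    Stream(Action, String), // scoped to a stream name
}

fn build_permission(action: Action, stream: &str) -> Permission {
    match action {
        // LLM actions are global, like other non-stream actions.
        Action::AddLLM | Action::DeleteLLM | Action::GetLLM | Action::ListLLM => {
            Permission::Unit(action)
        }
        // Stream-scoped actions carry the stream name.
        Action::Ingest | Action::Query => Permission::Stream(action, stream.to_string()),
    }
}

fn main() {
    assert_eq!(build_permission(Action::GetLLM, "logs"), Permission::Unit(Action::GetLLM));
    assert_eq!(
        build_permission(Action::Query, "logs"),
        Permission::Stream(Action::Query, "logs".to_string())
    );
}
```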


245-246: Added GetLLM to editor permissions

The GetLLM action has been added to the editor permission builder, allowing users with editor privileges to access LLM information.


288-289: Added GetLLM to writer permissions

The GetLLM action has been added to the writer permission builder, allowing users with writer privileges to access LLM information.


311-312: Added GetLLM to reader permissions

The GetLLM action has been added to the reader permission builder, allowing even users with the most basic read privileges to access LLM information.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2b341d and 17c3ab1.

📒 Files selected for processing (5)
  • .github/workflows/coverage.yaml (1 hunks)
  • src/alerts/alerts_utils.rs (2 hunks)
  • src/alerts/mod.rs (7 hunks)
  • src/handlers/http/alerts.rs (5 hunks)
  • src/rbac/role.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/alerts/alerts_utils.rs
  • src/handlers/http/alerts.rs
  • src/alerts/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
🔇 Additional comments (6)
.github/workflows/coverage.yaml (1)

13-14: Global Definition of CMAKE_FLAGS Enhances Consistency
The introduction of the CMAKE_FLAGS environment variable at the job level ensures that all steps under the coverage job can uniformly access this configuration. This change removes redundancy compared to having the variable declared only in individual steps (like the previous Check with clippy step), and aligns with best practices for centralized configuration management.

src/rbac/role.rs (5)

51-54: LGTM: New LLM-related actions added to the Action enum

These additional actions enhance the RBAC system with fine-grained control over LLM-related functionality, which is a welcome addition for security and access control.


122-125: LGTM: Proper integration of LLM actions into the permission system

The new LLM actions are correctly added to the match statement as Unit permissions, ensuring they are handled consistently with other global actions.


288-288: LGTM: Consistent permission model for Writer role

Adding the GetLLM action to the writer role is consistent with the existing permission model where writers have read access to resources.


311-311: LGTM: Appropriate read-only permissions for Reader role

The addition of GetLLM to the reader role is appropriate as it's a read-only operation, maintaining the principle of least privilege.


22-74:

❓ Verification inconclusive

Document the relationship between LLM actions and alerts

The PR is titled "hotfix for alerts" but introduces new LLM-related actions. While the code changes are well-implemented, the connection between these new actions and the alert system isn't immediately evident.

Please document how these LLM actions relate to the alert hotfix mentioned in the PR description. Are they part of a new alert capability that leverages LLMs?


🏁 Script executed:

#!/bin/bash
# Check for connections between alerts and LLM in the codebase
rg -A 5 -B 5 "alerts.*LLM|LLM.*alerts" --type rust

Length of output: 52


LLM-Alert Relationship Documentation Needed

The new LLM actions (e.g., QueryLLM, AddLLM, DeleteLLM, GetLLM, ListLLM) in the Action enum do not show any explicit connection to the alert system in the code. Since the PR is titled "hotfix for alerts," please update the documentation (or add inline comments) to clarify whether these LLM actions are part of a new alert capability leveraging LLM functionality or if they are meant to serve a different purpose.

src/rbac/role.rs Outdated
@@ -234,6 +242,7 @@ pub mod model {
Action::GetAlert,
Action::DeleteAlert,
Action::QueryLLM,
Action::GetLLM,
coderabbitai bot (Contributor) commented:
💡 Verification agent

🧩 Analysis chain

Consider explaining the LLM permission distribution

You've added GetLLM permission to the editor role, which seems appropriate. However, the other LLM actions (AddLLM, DeleteLLM, ListLLM) are not explicitly assigned to any role builder except possibly admin through Action::All.

Are the other LLM actions intentionally restricted to admin users only? This access control decision should be documented.


🏁 Script executed:

#!/bin/bash
# Check if the admin privileges would include the missing LLM actions
grep -A 10 "admin_perm_builder" src/rbac/role.rs

Length of output: 928


Clarify LLM Permission Distribution Between Editor and Admin Roles

The current implementation explicitly grants the GetLLM action to the editor role, while the admin role—via admin_perm_builder()—receives Action::All, which implicitly covers AddLLM, DeleteLLM, and ListLLM. Please update the code documentation and/or inline comments in src/rbac/role.rs to clearly explain this design decision and confirm that the additional LLM actions are intentionally restricted to admin users only.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/rbac/role.rs (1)

209-215: Verify that LLM actions align with RBAC design principles

The admin role gets all privileges through Action::All, which implicitly includes the new LLM actions. However, it would be more explicit and maintainable to list all actions individually in admin_perm_builder() instead of using the Action::All shortcut.

Consider refactoring admin_perm_builder() to explicitly list all actions for better clarity:

 fn admin_perm_builder() -> RoleBuilder {
     RoleBuilder {
-        actions: vec![Action::All],
+        actions: vec![
+            // Explicitly list all actions, including LLM actions
+            Action::Ingest,
+            Action::Query,
+            // ... other actions
+            Action::AddLLM,
+            Action::DeleteLLM,
+            Action::GetLLM,
+            Action::ListLLM,
+            // ... remaining actions
+        ],
         stream: Some("*".to_string()),
         tag: None,
     }
 }

This approach makes the admin permissions more explicit and maintainable as new actions are added.

src/alerts/alerts_utils.rs (3)

105-106: Query structure simplified to use stream directly

This change implements the transition from a query-based to a stream-based model for alerts. Now instead of executing a custom query, we're selecting all data from the specified stream.

This simplifies the alert configuration model but may impact alert precision if previous queries had more specific filtering conditions.

Consider adding a comment explaining this architectural change for future maintainers, especially since this appears to be a significant shift in how alerts work.


137-138: Ensure parameters are correctly documented

The function signature for execute_base_query accepts both the query object and an original_query string parameter, which is now being used to pass the new select query. This parameter naming might be confusing as it now contains the new query rather than the original one.

Consider renaming the parameter from original_query to query_string or select_query to better reflect its current purpose:

- async fn execute_base_query(
-    query: &crate::query::Query,
-    original_query: &str,
- ) -> Result<DataFrame, AlertError> {
+ async fn execute_base_query(
+    query: &crate::query::Query,
+    query_string: &str,
+ ) -> Result<DataFrame, AlertError> {

139-141: Query debugging message references renamed field

The error message still refers to "query" but the field has been renamed to "stream" in the alert configs. This could be confusing when debugging.

Update the error message to reflect the new terminology:

- AlertError::CustomError(format!("Table name not found in query- {}", original_query))
+ AlertError::CustomError(format!("Table name not found in stream query- {}", original_query))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17c3ab1 and 6352f87.

📒 Files selected for processing (5)
  • .github/workflows/coverage.yaml (1 hunks)
  • src/alerts/alerts_utils.rs (2 hunks)
  • src/alerts/mod.rs (7 hunks)
  • src/handlers/http/alerts.rs (5 hunks)
  • src/rbac/role.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/alerts/mod.rs
  • src/handlers/http/alerts.rs
  • .github/workflows/coverage.yaml
🧰 Additional context used
🧬 Code Definitions (1)
src/alerts/alerts_utils.rs (1)
src/handlers/http/query.rs (1)
  • query (69-155)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
🔇 Additional comments (4)
src/rbac/role.rs (3)

51-54: New LLM actions added to the RBAC model

You've added four new LLM-related actions to the RBAC model, which is great for expanding the LLM management capabilities. This aligns with the PR objectives of implementing a hotfix for alerts and adding new actions.


122-125: LLM actions properly integrated into the permission model

The new LLM actions are correctly added to the permission building logic as unit-level permissions, consistent with other global actions that aren't stream-specific.


245-245: Clarify LLM Permission Distribution Between Editor and Admin Roles

The current implementation explicitly grants only the GetLLM action to editor, writer, and reader roles, while the admin role—via admin_perm_builder()—receives Action::All, which implicitly covers AddLLM, DeleteLLM, and ListLLM. This appears to be intentional, but lacks documentation explaining this design decision.

Please update the code documentation and/or inline comments to clearly explain why only the GetLLM permission is granted to non-admin roles, while other LLM actions are restricted to admin users.

Also applies to: 288-288, 311-311

src/alerts/alerts_utils.rs (1)

122-123:

❓ Verification inconclusive

Logical plan creation now uses stream instead of query

The logical plan is now created from a simple "SELECT * FROM {stream}" query rather than a potentially more complex custom query. This is consistent with the change in the evaluate_alert function.

However, I'm curious if this simplification might lose some filtering capabilities that were previously available through custom queries.


🏁 Script executed:

#!/bin/bash
# Check for any existing alerts that use complex queries that might break with this change
grep -r "query:" --include="*.rs" --include="*.yaml" --include="*.json" src/alerts/
# Also check if there are any tests that validate query functionality in alerts
find tests -type f -name "*alert*test*" -exec grep -l "query" {} \;

Length of output: 924


Attention: Verify Logical Plan Simplification Impact

The change to create the logical plan with a simple "SELECT * FROM {stream}" query (as seen in src/alerts/alerts_utils.rs, lines 122–123) aligns with the updated behavior in the evaluate_alert function. Our codebase search shows that while query-related functions (like get_tables_from_query and associated imports) still exist, there is no clear evidence in the repository (and no dedicated tests in a standard tests/ directory) that anything relies on more complex custom queries in this context.

To ensure that no required filtering functionality is lost, please verify:

  • That the simplified query construction does not impact any filtering logic expected from alerts processing.
  • If any integration or unit tests (perhaps located outside a standard tests/ directory) should be updated or added to cover this change.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/coverage.yaml (1)

36-36: Remove Trailing Whitespace on Line 36

There is extra trailing whitespace on line 36 as indicated by YAMLlint. Removing this whitespace will help maintain clean formatting and avoid potential linting issues.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 36-36: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6352f87 and 158d947.

📒 Files selected for processing (5)
  • .github/workflows/coverage.yaml (1 hunks)
  • src/alerts/alerts_utils.rs (2 hunks)
  • src/alerts/mod.rs (7 hunks)
  • src/handlers/http/alerts.rs (5 hunks)
  • src/rbac/role.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/handlers/http/alerts.rs
  • src/alerts/alerts_utils.rs
  • src/alerts/mod.rs
  • src/rbac/role.rs
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/coverage.yaml

[error] 36-36: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (1)
.github/workflows/coverage.yaml (1)

37-47: New "Patch librdkafka CMakeLists.txt" Step: Functional and Clear

The newly added step correctly locates the librdkafka directory within the Cargo registry and updates the minimum required CMake version from 3.2 to 3.5. The conditional handles the error case by exiting with a non-zero status if the directory isn’t found.

Please verify that this change aligns with all downstream build requirements and that the new CMake version (3.5) is compatible with the systems relying on this library. Optionally, consider adding more logging or handling scenarios where multiple directory matches might exist.

@parmesant force-pushed the hotfix-alerts branch 2 times, most recently from 2184320 to a68bd8d on April 3, 2025 at 08:47
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
.github/workflows/coverage.yaml (2)

37-64: Review of the New "Find and fix librdkafka CMakeLists.txt" Step

This new step script correctly:

  • Downloads the Rust package via cargo fetch.
  • Locates the rdkafka-sys package directory.
  • Finds the CMakeLists.txt file for librdkafka.
  • Creates a backup and updates the minimum required CMake version from 3.2 to 3.5.

Suggestions:

  • Consider adding additional error handling for the directory lookup. For example, if find ~/.cargo/registry/src -name "rdkafka-sys-*" -type d | head -n 1 returns an empty string, subsequent commands referencing $RDKAFKA_SYS_DIR may fail.
  • It might be beneficial to check the exit status of cargo fetch to ensure that the package has been downloaded successfully before proceeding.
  • While the current diff with || true prevents the step from failing, consider whether you want to alert on unexpected file differences in a more explicit manner.
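A rough sketch of the step as reviewed follows. For a runnable demonstration, a fake registry layout is created instead of running `cargo fetch`; the directory names, paths, and the exact sed expression are assumptions, not the literal workflow contents.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the "Find and fix librdkafka CMakeLists.txt" step reviewed above.
# The real step searches ~/.cargo/registry/src after `cargo fetch`; here we
# fabricate the layout so the script is self-contained.
REGISTRY="$(mktemp -d)"
mkdir -p "$REGISTRY/rdkafka-sys-4.3.0/librdkafka"
echo 'cmake_minimum_required(VERSION 3.2)' \
  > "$REGISTRY/rdkafka-sys-4.3.0/librdkafka/CMakeLists.txt"

# Locate the rdkafka-sys package directory; fail loudly if it is missing,
# per the review suggestion about empty lookup results.
RDKAFKA_SYS_DIR="$(find "$REGISTRY" -name "rdkafka-sys-*" -type d | head -n 1)"
if [ -z "$RDKAFKA_SYS_DIR" ]; then
  echo "rdkafka-sys directory not found" >&2
  exit 1
fi

CMAKE_FILE="$RDKAFKA_SYS_DIR/librdkafka/CMakeLists.txt"
cp "$CMAKE_FILE" "$CMAKE_FILE.bak"   # backup before patching

# Raise the minimum required CMake version from 3.2 to 3.5.
sed -i.tmp 's/cmake_minimum_required(VERSION 3\.2)/cmake_minimum_required(VERSION 3.5)/' "$CMAKE_FILE"

grep 'VERSION 3.5' "$CMAKE_FILE"
```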
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 41-41: trailing spaces

(trailing-spaces)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 48-48: trailing spaces

(trailing-spaces)


[error] 51-51: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)


41-41: Remove Trailing Whitespace

Static analysis has flagged trailing whitespace on lines 41, 45, 48, 51, 54, and 57. Removing these will improve the file’s overall cleanliness and maintainability.

Also applies to: 45-45, 48-48, 51-51, 54-54, 57-57

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 41-41: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2184320 and a68bd8d.

📒 Files selected for processing (5)
  • .github/workflows/coverage.yaml (1 hunks)
  • src/alerts/alerts_utils.rs (2 hunks)
  • src/alerts/mod.rs (7 hunks)
  • src/handlers/http/alerts.rs (5 hunks)
  • src/rbac/role.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/alerts/alerts_utils.rs
  • src/handlers/http/alerts.rs
  • src/alerts/mod.rs
  • src/rbac/role.rs
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/coverage.yaml

[error] 41-41: trailing spaces

(trailing-spaces)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 48-48: trailing spaces

(trailing-spaces)


[error] 51-51: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
src/alerts/alerts_utils.rs (1)

630-632: ⚠️ Potential issue

Critical bug: incorrect slice index for second aggregate

At line 641, let agg2 = &self.aggregates.aggregate_config[0]; reuses index 0 instead of 1, causing the same config to be validated twice. This likely prevents the second aggregate check from running correctly.

Apply this patch to fix the slice index:

-                let agg2 = &self.aggregates.aggregate_config[0];
+                let agg2 = &self.aggregates.aggregate_config[1];

Also applies to: 640-644

🧹 Nitpick comments (1)
src/alerts/alerts_utils.rs (1)

660-667: Aggregate & condition columns extraction

Using a HashSet to collect required columns ensures unique references. The logic in get_agg_config_cols and get_condition_cols looks correct aside from the earlier index issue. Keep in mind that any mismatch in index references could cause missing columns.

Consider adding unit tests for multiple conditions with AND/OR to verify no columns are missed or duplicated.

Also applies to: 677-681, 693-695, 700-701

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a68bd8d and 856be6a.

📒 Files selected for processing (2)
  • src/alerts/alerts_utils.rs (12 hunks)
  • src/alerts/mod.rs (13 hunks)
🧰 Additional context used
🪛 GitHub Actions: Lint, Test and Coverage Report
src/alerts/mod.rs

[error] 272-272: method from_str can be confused for the standard trait method std::str::FromStr::from_str. Consider implementing the trait std::str::FromStr or choosing a less ambiguous method name.

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (13)
src/alerts/alerts_utils.rs (6)

21-31: Imports Look Good

These added imports for LogicalOperator, Aggregate* structs, and expansions in datafusion appear correct for supporting stream-based alert queries and aggregates.

Also applies to: 43-43


101-105: Proper use of alert.stream in evaluate_alert

Switching from alert.query to alert.stream here looks consistent with the new stream-based approach. The logic—preparing the query, building a SELECT * FROM ..., and then fetching a DataFrame—seems correct.


146-147: Confirm slicing of aggregate configurations

In evaluate_aggregates, slicing &agg_config.aggregate_config[0..2] for complex logic (when operator is Some(..)) vs. [0..1] for a single condition is correct. Ensure that caller code always provides at least the required number of aggregates.

Would you like to verify usage using a quick script to confirm that no calls pass fewer aggregates than required?

Also applies to: 153-154


206-209: Extended operators in evaluate_condition

The additions for Equal, NotEqual, GreaterThanOrEqual, and LessThanOrEqual align with the newly introduced alert operators. This implementation looks correct.


216-217: Logical operators in calculate_final_result

Using Iterator::all() for AND logic and Iterator::any() for OR logic is concise and readable.


708-709: Rolling window frequency retrieval

Extracting eval_frequency from rolling_window is consistent with the new EvalConfig.

src/alerts/mod.rs (7)

186-197: Renamed and expanded AlertOperator

Introducing Equal, NotEqual, GreaterThanOrEqual, LessThanOrEqual, IsNull, and IsNotNull is a clear improvement in expressiveness. The Display implementation is also consistent with these new variants.

Also applies to: 205-210


301-303: Aggregates and Aggregation Functions

Adjusting AggregateFunction for display logging is fine. The enumerations in lines 311-319 match up with the text-based representation.

Also applies to: 311-319


339-341: Condition configuration changes

Switching from AlertOperator to WhereConfigOperator is consistent with the new design. The indexing usage for conditions [0] and [1] is correct for AND/OR logic.

Also applies to: 346-348, 354-357


377-380: Struct expansions for AggregateConfig, Aggregates, GroupBy, and LogicalOperator

These modifications appear structurally sound. They align with treating aggregates and their condition sets in a more flexible, typed manner.

Also applies to: 385-387, 395-398, 401-405


474-478: Adoption of stream and aggregates in AlertRequest and AlertConfig

Renaming query → stream and aggregate_config → aggregates is consistent with the stream-based approach. This also aligns with changes in alerts_utils.rs.

Also applies to: 488-492, 506-510


519-523: Validation flow updates

Using a rolling-window approach and verifying the presence of columns in the new SELECT * FROM {self.stream} logic is sound. The updated validations ensure well-formed alerts.

Also applies to: 530-530, 561-563


833-834: Authorization check renamed to stream-based usage

Switching from alert.query to building SELECT * FROM {alert.stream} and checking permission with user_auth_for_query is logical.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/alerts/mod.rs (1)

249-292: from_str approach might collide with std::str::FromStr.

A prior reviewer suggested directly implementing the FromStr trait or renaming this method to avoid confusion. If this feedback still applies, consider addressing it now.

🧹 Nitpick comments (2)
.github/workflows/coverage.yaml (2)

65-92: Duplicate step with identical name.

You have introduced a second "Find and fix librdkafka CMakeLists.txt" step (lines 65–91) that duplicates the existing step at lines 37–63. Consider removing this duplicate or giving it a different name to avoid confusion and unnecessary build steps.

-      - name: Find and fix librdkafka CMakeLists.txt
-        run: |
-          ...
+      # Remove or rename this duplicate block to avoid confusion
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 69-69: trailing spaces

(trailing-spaces)


[error] 73-73: trailing spaces

(trailing-spaces)


[error] 76-76: trailing spaces

(trailing-spaces)


[error] 79-79: trailing spaces

(trailing-spaces)


[error] 82-82: trailing spaces

(trailing-spaces)


[error] 85-85: trailing spaces

(trailing-spaces)


69-69: Remove trailing spaces.

The static analyzer flagged trailing spaces on these lines. Trim them to comply with style guidelines.

-          cargo fetch  
+          cargo fetch

Also applies to: 73-73, 76-76, 79-79, 82-82, 85-85

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 69-69: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 856be6a and 74fa378.

📒 Files selected for processing (4)
  • .github/workflows/coverage.yaml (1 hunks)
  • src/alerts/alerts_utils.rs (12 hunks)
  • src/alerts/mod.rs (13 hunks)
  • src/handlers/http/alerts.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/handlers/http/alerts.rs
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/coverage.yaml

[error] 69-69: trailing spaces

(trailing-spaces)


[error] 73-73: trailing spaces

(trailing-spaces)


[error] 76-76: trailing spaces

(trailing-spaces)


[error] 79-79: trailing spaces

(trailing-spaces)


[error] 82-82: trailing spaces

(trailing-spaces)


[error] 85-85: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (20)
src/alerts/alerts_utils.rs (11)

21-26: Import changes look fine.

You are now using DataFusion’s Operator and BinaryExpr for condition building. This appears consistent with the new operators introduced below.


31-31: Newly imported LogicalOperator.

Including LogicalOperator here aligns with the multi-condition logic (AND/OR) used in subsequent functions.


43-43: Extended imports of alert-related types.

Bringing in all relevant alert structs and enums (like WhereConfigOperator, AggregateConfig, etc.) is straightforward and maintains consistency with the refactor.


112-118: Hardcoded end_time to "now".

By forcing end_time to "now", you assume the window extends up to the current time. If future changes require a flexible end time, this may need revisiting.

Would you like to confirm if all rolling-window alerts rely on “now” for their end time?


145-145: Function signature update.

This function now accepts &Aggregates instead of a previous config type. No immediate issues.


153-154: Slicing two aggregates vs one.

You rely on slicing [0..2] if operator is Some, and [0..1] if None. The earlier validation checks for length, so this should be safe. Just ensure you maintain code alignment if you ever add new logic for more aggregates.


206-209: Extended operators in evaluate_condition.

Support for Equal, NotEqual, GreaterThanOrEqual, and LessThanOrEqual is correct. The _ => unreachable!() will panic if new variants get introduced without updating this match.

Please confirm future coverage by ensuring new operators are tested or lead to compile-time checks.


214-217: Final result calculation with LogicalOperator.

Using .all(...) for AND and .any(...) for OR is correct and concise.


250-260: Adjusted template placeholders in alert message.

Displaying the aggregate_function, column, and operator with the actual value helps debugging. Looks good.


305-346: Generating expressions for multiple aggregates and filters.

The code carefully handles two aggregate conditions (for AND/OR) or a single one otherwise. The design is straightforward to follow.


102-105:

❓ Verification inconclusive

Replacing old query references with alert.stream.

Constructing select_query from alert.stream is a clear approach. Verify all other uses of the old alert.query have been updated to prevent mismatched references.


🏁 Script executed:

#!/bin/bash
# Searching for old references to `alert.query`
rg "alert\.query"

Length of output: 19


Action Required: Verify All References to alert.query Are Updated

The updated code snippet in src/alerts/alerts_utils.rs (lines 102–105) correctly constructs select_query using alert.stream. An automated search for the old alert.query reference returned no hits, suggesting the replacement was completed. However, because the search output was unexpectedly empty, please perform a manual verification to ensure no lingering alert.query references exist elsewhere in the codebase.

  • Confirm that no files reference alert.query (in addition to using automated tools).
  • Ensure that any removed or replaced references in other parts of the code are consistent with this update.
src/alerts/mod.rs (9)

186-197: Renamed and newly introduced operators in AlertOperator.

The renamed variants (e.g., Equal, NotEqual, GreaterThanOrEqual, etc.) are more aligned with typical comparisons. Adding IsNull and IsNotNull provides a broader feature set.


214-247: New WhereConfigOperator enum.

Great expansion to handle more specialized string matching (e.g., BeginsWith, EndsWith, etc.). Make sure your front-end or any integrators know about these changes to keep consistent operator usage.


468-478: Changed alert fields to stream, aggregates, and eval_config.

Switching from query to stream and from old aggregates to the new Aggregates struct is consistent with the overall refactor.


501-514: Updates in AlertConfig.

Moving to stream, aggregates, and eval_config ensures uniform usage of new naming conventions throughout.


518-523: modify function replaced old fields with new ones.

All references to alert.query or older config fields are replaced. This keeps the config object consistent after modifications.


611-611: Ensuring correct number of conditions for AND/OR or single condition.

The error messages and logic for 2 or 1 condition are clear. This prevents undefined slicing if the user misconfigures the alert.

Also applies to: 619-619


630-644: Same logic repeated for aggregates array slices.

The same approach is used here: 2 aggregates if operator is present, 1 otherwise. The slice checks remain consistent with the earlier conditions approach.


663-684: Column extraction in get_agg_config_cols.

Good usage of sets to gather columns from both the aggregate config and conditional clauses. This helps ensure you have all columns needed for the final DataFrame.


708-710: Retrieving eval_frequency from rolling window.

The function remains straightforward, just returning the underlying frequency. No issues found.

- Added helper methods to fetch base query, eval details
- Ensured DataFusion accepts columns and table names as is
- Removed clockwerk scheduler
- Modified alert config JSON
- Modified triggered alert message
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
src/alerts/mod.rs (1)

272-276: Potential naming conflict in WhereConfigOperator

The from_str method might conflict with the standard FromStr trait. This was previously flagged in a review.

WhereConfigOperator and potential naming conflict

The WhereConfigOperator enum with string-based comparisons, plus from_str logic, is coherent. However, the pipeline flagged from_str (line 272) for colliding with std::str::FromStr::from_str.

Consider implementing the FromStr trait directly or renaming the helper to something like parse_where_op to avoid confusion and follow Rust conventions:

- pub fn from_str(s: &str) -> Option<Self> {
+ impl std::str::FromStr for WhereConfigOperator {
+     type Err = ();
+     fn from_str(s: &str) -> Result<Self, Self::Err> {
+         // ...
+         Ok(op)
+     }
+ }
🧹 Nitpick comments (3)
src/handlers/http/modal/query_server.rs (1)

131-141: Alert loading moved to server initialization phase

This change moves alerts loading from the load_on_init function to the server initialization phase, spawning it as a background task. This aligns with the architectural changes in the alerts system.

However, there's a commented-out error handling line. Consider whether alert loading failures should prevent server startup or if the current approach of just logging errors is sufficient.

 // Run the alerts scheduler
 tokio::spawn(async {
     match ALERTS.load().await {
         Ok(_) => info!("Alerts loaded successfully"),
         Err(e) => {
             error!("Failed to load alerts: {}", e);
-            // return Err(e);
+            // Alert loading failures are logged but don't prevent server startup
         }
     };
 });
src/sync.rs (1)

239-245: Duplicate alert detection is sound, but consider revisiting the policy.
When an alert with the same ID already exists, the code logs an error and continues. This is acceptable if you strictly want to prevent re-creation. However, if you ever need to replace an existing alert, you might want to handle that scenario differently.

src/alerts/alerts_utils.rs (1)

190-194: Potential unnecessary unwrapping

The call to or(None) on an Option<String> is redundant: Option::or returns the original value unchanged when it is Some, and None or'd with None is still None, so the call is a no-op. It doesn't affect functionality but can simply be dropped.

-    agg.conditions
-        .as_ref()
-        .map(|config| config.generate_filter_message())
-        .or(None)
+    agg.conditions
+        .as_ref()
+        .map(|config| config.generate_filter_message())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74fa378 and ece7953.

📒 Files selected for processing (10)
  • src/alerts/alerts_utils.rs (13 hunks)
  • src/alerts/mod.rs (23 hunks)
  • src/alerts/target.rs (6 hunks)
  • src/handlers/airplane.rs (1 hunks)
  • src/handlers/http/alerts.rs (6 hunks)
  • src/handlers/http/modal/mod.rs (1 hunks)
  • src/handlers/http/modal/query_server.rs (3 hunks)
  • src/handlers/http/modal/server.rs (1 hunks)
  • src/rbac/role.rs (1 hunks)
  • src/sync.rs (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/rbac/role.rs
🧰 Additional context used
🧬 Code Definitions (4)
src/handlers/http/alerts.rs (2)
src/alerts/alerts_utils.rs (1)
  • user_auth_for_query (59-89)
src/alerts/mod.rs (2)
  • update_state (444-480)
  • update_state (878-912)
src/alerts/alerts_utils.rs (1)
src/alerts/mod.rs (2)
  • get_eval_window (735-742)
  • get_eval_frequency (730-734)
src/handlers/http/modal/server.rs (2)
src/handlers/http/alerts.rs (1)
  • update_state (124-162)
src/alerts/mod.rs (2)
  • update_state (444-480)
  • update_state (878-912)
src/alerts/mod.rs (2)
src/sync.rs (1)
  • alert_runtime (233-278)
src/utils/mod.rs (1)
  • user_auth_for_query (85-117)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
🔇 Additional comments (44)
src/handlers/airplane.rs (1)

172-172: Improved SQL query safety with proper table name quoting

Adding double quotes around the table name in the SQL query properly handles cases where stream names might contain special characters or reserved SQL keywords. This is a good security practice to prevent potential SQL injection issues.

src/handlers/http/modal/query_server.rs (2)

21-21: LGTM: New import for ALERTS

This import is needed for the new alerts loading functionality.


37-37: LGTM: Updated tracing import to include both error and info

Correctly expanded the tracing import to include the info macro for better logging coverage.

src/handlers/http/modal/mod.rs (1)

168-168: Removed alerts loading from load_on_init function

The update from future::join4 to future::join3 reflects the removal of alerts loading from this function, which is now handled in the QueryServer initialization. This change is aligned with the architectural refactoring of the alert system.

src/handlers/http/modal/server.rs (1)

231-235: Improved alert state update endpoint

The route handler for PUT requests to the /alerts/{alert_id} endpoint has been updated to use alerts::update_state instead of alerts::modify. Based on the provided code snippets, this change implements a more RESTful approach where state transitions are handled via query parameters (e.g., ?state=silenced), and validation is performed to ensure only valid state transitions are allowed.

This is a better design for state management in a REST API.

src/sync.rs (5)

20-20: No functional concerns.
This import addition is straightforward and has no apparent issues.


23-23: No functional concerns.
This import is appropriate and has no apparent issues.


29-29: No functional concerns.
The import from alerts_utils and AlertTask is aligned with the new alert runtime logic.


231-240: Alert runtime initialization looks good.
The top-level function signature and documentation are clear, indicating an asynchronous runtime. This neatly replaces the old scheduling mechanism.


265-275: Alert deletion logic is clear.
Removing from the HashMap and aborting the underlying task correctly stops its evaluation loop.

src/handlers/http/alerts.rs (10)

19-20: Import changes.
These import adjustments are minimal and do not introduce concerns.


22-24: Removal of commented-out code is fine.
No issues with removing or commenting old references if they are no longer needed.


59-60: Query-based user authorization looks good.
Using alert.get_base_query() centralizes the logic for constructing the query to validate user access.


72-73: Starting the alert task.
Invoking ALERTS.start_task(alert.clone()) appears consistent with the newly introduced alert runtime.


85-86: Authorization for GET endpoint looks good.
Again, ensuring that users can only retrieve alerts they have permission to query is appropriate.


100-101: Authorization for DELETE endpoint looks good.
This is the same pattern of building a base query for permissions.


124-162: High-level note on the new update_state handler.
Overall, this function is structured clearly, first retrieving the alert and checking authorization before updating the state.


145-148: Potential out-of-bounds error when parsing query string.
Splitting on '=' and immediately accessing tokens[1] may panic if the query string does not include '=' or is malformed. Consider adding a length check before accessing indices.

 let tokens = query_string.split('=').collect::<Vec<&str>>();
+if tokens.len() < 2 {
+    return Err(AlertError::InvalidStateChange("Missing parameter".to_string()));
+}
 let state_key = tokens[0];
 let state_value = tokens[1];

157-159: State update logic is appropriate.
Converting the string to an AlertState and calling update_state aligns with the rest of the alert system design.


160-161: Returning the updated alert is consistent.
Good final step to confirm the new alert state.

src/alerts/target.rs (6)

38-40: Serialization rename to camelCase.
Changing from another case to camelCase may break compatibility for existing clients. Verify that downstream systems or front-end usage is updated accordingly.


52-53: Target struct rename attribute updated.
This also changes the serialized field names to camelCase. Ensure your APIs or front-end code conform to this new JSON format.


189-190: Verifier struct rename attribute updated.
Aligned with camelCase. The same caution about backward compatibility applies here.


232-242: Enum variants with custom renames.
Renaming the variants to slack and alertManager is consistent with camelCase. Double-check that clients expect these new variant names.


290-290: OtherWebHook struct rename.
Again, adopting camelCase might break existing consumers.


327-328: AlertManager struct rename.
Be sure to synchronize this rename with any other parts of your system that consume alert manager configuration in JSON form.

src/alerts/alerts_utils.rs (10)

28-28: Import usage efficiency in datafusion::logical_expr

The explicit imports of BinaryExpr, Literal, Operator from datafusion::logical_expr indicate new regex functionality, but they're only used in a small part of the file. This is good practice for clarity.


106-109: Improved query generation approach

The alert evaluation now correctly builds queries from the alert.stream instead of using a hardcoded query. This makes authorization checks more consistent across the application.


210-214: Complete operator enum coverage

The operator enum now correctly handles all comparison operations including equals, not equals, and range operators. This improves the flexibility of the alert system.


217-222: Correct logical operator handling

The function now properly evaluates alert conditions based on logical operators (AND/OR). This is a critical improvement for complex alert scenarios.


232-236: Enhanced alert messaging

The alert message now includes important context about evaluation window and frequency which helps with troubleshooting and understanding alert behavior.


393-397: Value type handling improvement

The code now intelligently handles both numeric and string values by attempting to parse as a number first. This makes the alert system more flexible with different data types.


401-431: Comprehensive operator support

The match_alert_operator function now supports a wide range of operators including string operations like "contains" and "begins with". This significantly enhances the filtering capabilities.


435-445: Clean aggregate function mapping

The new match_aggregate_operation function provides a clean way to map between aggregate function types and their implementations, improving code readability.


447-467: Smart value handling with NumberOrString

The introduction of NumberOrString enum elegantly solves the problem of handling both numeric and string literals in query conditions. This is a clean design pattern.


116-118:

❓ Verification inconclusive

Simplified time range handling

Hardcoding the end time to "now" instead of calculating it from the rolling window is a cleaner approach, but ensure this doesn't affect time-sensitive alerts.

Verify that using "now" as the end time is consistent with all alert use cases:


🏁 Script executed:

#!/bin/bash
# Check for any code references that might be expecting a custom end time
rg -A 3 "eval_end" --type rust

Length of output: 354


Action Required: Confirm Consistent End Time Usage Across Alerts

The updated approach in src/alerts/alerts_utils.rs (lines 116–118) now hardcodes the end time to "now" instead of using rolling_window.eval_end. However, our search shows that elsewhere in the code—specifically in src/alerts/mod.rs—there are still references to a custom eval_end (e.g., in the alert struct and when both eval_start and eval_end are used). Please verify that replacing the custom end time with "now" does not adversely affect any time-sensitive alert evaluations. If any components require the original behavior, consider updating them accordingly.

src/alerts/mod.rs (8)

54-64: Task management refactored with message passing

The ALERTS initialization now uses a message-passing system with mpsc channels, which is more scalable than the previous approach. This is a good architectural improvement.


73-75: Clear task representation with AlertTask enum

The AlertTask enum provides a clean way to represent different alert operations (Create/Delete), making the code more maintainable and self-documenting.


223-256: Comprehensive operator set with WhereConfigOperator

The new WhereConfigOperator enum provides a rich set of comparison operators, especially for string values, significantly enhancing the filtering capabilities of alerts.


443-481: Robust state transition implementation

The new AlertState implementation adds proper state transition validation to prevent invalid state changes. This is a good defensive programming practice that will prevent inconsistent alert states.


563-565: Simplified query generation

The get_base_query function provides a consistent way to generate base queries from stream names, which helps maintain consistency across the application.


600-617: Proper schema validation

The validation now correctly checks that referenced columns exist in the stream schema, preventing runtime errors from invalid column references.


735-742: Enhanced logging with evaluation window details

The get_eval_window method provides useful information for alert messages, improving debuggability and user experience when analyzing alert behavior.


936-943: Task management through channels

The start_task method now uses message passing to manage alert lifecycle, which is a more scalable approach than direct task management. This aligns well with the alert_runtime implementation seen in the context.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/sync.rs (1)

249-270: Spawned task logic and retry behavior

  1. Once the task fails more than 3 times, it breaks from the loop, but the corresponding entry remains in the alert_tasks map. You might consider removing it from the map before exiting, avoiding references to tasks that have already terminated.
  2. If the evaluate_alert recovers on a subsequent iteration, you may want to reset sleep_duration back to the original frequency, unless the permanent fallback to 1 minute is intentional.
+ // Example approach to remove the completed entry from the map:
+ // (In the main runtime loop, after the task ends, do something like:)
+ if handle.await.is_ok() {
+     alert_tasks.remove(&id);
+ }
src/alerts/mod.rs (1)

54-65: Global static ALERTS initialization

Leveraging Lazy<Alerts> to bootstrap the mpsc channel and thread spawn is a succinct pattern. Spawning alert_runtime in a separate thread is fine, but be mindful that thread-level panics cannot be automatically restarted by Tokio. Check logs or consider more robust crash handling if needed.

-thread::spawn(|| alert_runtime(rx));
+thread::spawn(move || {
+    if let Err(e) = alert_runtime(rx) {
+        error!("alert_runtime crashed: {e}");
+    }
+});
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ece7953 and 31f650b.

📒 Files selected for processing (2)
  • src/alerts/mod.rs (23 hunks)
  • src/sync.rs (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/alerts/mod.rs (2)
src/sync.rs (1)
  • alert_runtime (233-285)
src/utils/mod.rs (1)
  • user_auth_for_query (85-117)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: coverage
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
🔇 Additional comments (42)
src/sync.rs (5)

20-20: New imports for alert handling

These added imports (HashMap, mpsc, oneshot, alerts_utils, and AlertTask) seem appropriate for the new alert-related functionality and message passing.

Also applies to: 23-23, 29-29


231-234: Introducing a dedicated alert task runtime

Defining a separate runtime for alert tasks is a clear design choice, isolating alert-related scheduling from other behaviors. This helps keep the main handler uncluttered.


236-244: Handling new tasks in the select! loop

Creating and tracking per-alert tasks in a HashMap is a valid approach for managing concurrency. The check against duplicates prevents conflicting tasks for the same alert ID.


272-280: Deleting tasks and aborting the handle

Aborting the spawned task and removing it from the HashMap assures no further evaluation attempts occur. Logging the result clarifies which alerts were successfully removed.


283-284: Function exit

Returning Ok(()) at the end provides a clean exit. No further changes needed.

src/alerts/mod.rs (37)

23-24: New imports for parsing logic

The added derive_more imports (FromStr and FromStrError) facilitate simpler conversion from string representations for enums like AlertState or WhereConfigOperator.


31-32: Thread usage and oneshot channels

Introducing std::thread and oneshot channels suggests a separation of concerns between the main async runtime and any blocking or alternative concurrency approach. Ensure that tasks that might block do not stall the primary async executor.


45-45: Importing the new alert_runtime

Pulling in alert_runtime from src/sync.rs creates a clear boundary: Alerts can spawn tasks by forwarding them to this dedicated runtime.


66-70: Alerts struct and new sender field

Adding the sender: mpsc::Sender<AlertTask> field centralizes how alerts schedule creation/deletion tasks. This design looks consistent and maintainable.


72-75: AlertTask enum definition

This enum neatly encapsulates create and delete alert commands. Scaling to more task types in the future will be trivial (e.g., AlertTask::Update).


93-93: Introducing AggregateResult with a Debug derive

Declaring an AggregateResult type clarifies how results are passed around or logged. The Debug derive is helpful for diagnosing evaluation outcomes.


124-124: Formatting the alert notification message

The updated format string now includes a clearer layout, showing an alert name, timestamp, severity, and a placeholder for additional messages.


201-207: Renamed or modified operator variants in AlertOperator

Renaming variants to Equal, NotEqual, GreaterThanOrEqual, LessThanOrEqual (with matching serde tags) improves clarity. Ensuring these changes are reflected in calling code avoids serialization/deserialization mismatches.


215-218: Display implementation for new variants

Writing the correct symbols (=, !=, >=, <=) is crucial for generating queries. These lines correctly align with the updated operator variant names.


223-280: Introducing WhereConfigOperator and related logic

  1. Providing a FromStr implementation and serde(rename_all = "camelCase") ensures consistent parsing from user input or JSON.
  2. The broad set of operators (IsNull, IsNotNull, ILike, etc.) accommodates common filtering use cases.
  3. Exposing as_str() helps with query building.

Overall, this is a flexible, readable solution for where-operator handling.

281-286: Display implementation for WhereConfigOperator

Delegating to as_str() keeps the display logic DRY. This approach is easy to maintain if future operators are added.
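As a compact sketch of the pattern the review describes (the enum name and variants here are an illustrative subset, not the crate's actual definitions; camelCase parsing is hand-rolled to stand in for serde's rename_all):

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical subset of a where-config operator enum.
#[derive(Debug, PartialEq)]
enum WhereOp {
    Equal,
    NotEqual,
    IsNull,
}

impl WhereOp {
    // Symbol used when building the SQL predicate.
    fn as_str(&self) -> &'static str {
        match self {
            WhereOp::Equal => "=",
            WhereOp::NotEqual => "!=",
            WhereOp::IsNull => "is null",
        }
    }
}

// camelCase parsing, mirroring what serde(rename_all = "camelCase") does for JSON.
impl FromStr for WhereOp {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "equal" => Ok(WhereOp::Equal),
            "notEqual" => Ok(WhereOp::NotEqual),
            "isNull" => Ok(WhereOp::IsNull),
            other => Err(format!("unknown operator: {other}")),
        }
    }
}

// Display delegates to as_str(), keeping the logic DRY.
impl fmt::Display for WhereOp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.as_str())
    }
}

fn main() {
    let op: WhereOp = "notEqual".parse().unwrap();
    assert_eq!(op.to_string(), "!=");
    assert!("bogus".parse::<WhereOp>().is_err());
    println!("ok");
}
```

The single source of truth in as_str() means Display and query building cannot drift apart when operators are added.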


290-308: AggregateFunction and Display for aggregates

Renaming or clarifying the aggregator methods (Avg, Count, Min, Max, Sum) aligns well with typical user expectations. The Display implementation ensures they appear consistently in logs or queries.


326-326: Switching operator to WhereConfigOperator in ConditionConfig

Transitioning from AlertOperator to WhereConfigOperator for finer-grained filtering is a solid improvement, ensuring the correct operator type is used for condition logic.


333-334: Renaming fields in Conditions

Replacing the old conditions field with condition_config, plus the operator now being Option<LogicalOperator>, conveys more explicit meaning.


341-345: Handling compound conditions with AND/OR

This logic constructs a descriptive message for two conditions. Verify indexing carefully; the code always references [0] and [1] but checks the length earlier, so it’s safe.


356-356: Single-condition fallback

When there is no logical operator, only one condition is permitted. This fallback is consistent with the earlier validation logic, preventing out-of-bounds indexing.


363-368: GroupBy struct introduction

Including a group_by field in aggregates expands the alerting options to handle grouped aggregations. This is a beneficial step for more complex queries.


372-374: Broadening AggregateConfig

  1. Renaming agg to aggregate_function clarifies usage.
  2. Adding optional conditions and group_by fields extends aggregate queries to more flexible scenarios.

382-384: Aggregates struct

The new operator (AND/OR) plus an aggregate_config array of individual aggregate instructions is a more modular representation than the previous approach.


394-401: LogicalOperator Display

Surfacing “AND” or “OR” as uppercase aligns with typical SQL usage or textual representations of boolean logic.


424-424: Adding FromStr to AlertState

Allowing AlertState to be parsed from strings is useful for user-facing or config-based state changes. Just confirm that unrecognized states are handled gracefully (a potential concern for the derived FromStr).

Would you like to run a quick test verifying that invalid strings produce an error or default state? I can generate a small snippet or script if needed.


443-481: Refined update_state method

The state machine logic is stricter, preventing nonsensical transitions such as Triggered → Triggered or Silenced → Triggered. The final fallback for Resolved → any-other-state ensures correct usage. Logging or clearly surfacing the reason for a blocked state change can help maintain user clarity.
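A minimal sketch of such a transition table (variant names follow the PR; the specific blocked pairs below are illustrative, not a copy of the crate's exact rules):

```rust
// Hypothetical mirror of the stricter state-transition rules described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AlertState {
    Triggered,
    Silenced,
    Resolved,
}

// Returns the new state, or an error message for a blocked transition.
fn update_state(current: AlertState, new: AlertState) -> Result<AlertState, String> {
    use AlertState::*;
    match (current, new) {
        // No-op transitions are rejected so users get explicit feedback.
        (Triggered, Triggered) => Err("alert is already triggered".into()),
        // A silenced alert cannot be manually re-triggered.
        (Silenced, Triggered) => Err("cannot trigger a silenced alert".into()),
        // A resolved alert only leaves Resolved when the system re-triggers it.
        (Resolved, Silenced) | (Resolved, Resolved) => {
            Err("resolved alerts cannot be manually changed".into())
        }
        (_, next) => Ok(next),
    }
}

fn main() {
    assert!(update_state(AlertState::Silenced, AlertState::Triggered).is_err());
    assert_eq!(
        update_state(AlertState::Triggered, AlertState::Resolved),
        Ok(AlertState::Resolved)
    );
    println!("ok");
}
```

Centralizing the rules in one match makes the blocked transitions easy to audit and gives each rejection a human-readable reason, which addresses the clarity point raised above.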


510-513: Replacing query with stream, plus new aggregates and eval_config

Using stream clarifies that the alert references a specific ingestion stream rather than an arbitrary query. This rework (aggregates, eval_config) matches the improved condition definitions.


524-527: Mapping stream, aggregates, and eval_config from AlertRequest to AlertConfig

Consistently forwarding these fields affirms the revised structure. No issues identified.


542-545: Consolidated fields in AlertConfig

Matching the naming scheme from AlertRequest ensures a smooth extension or future changes.


555-558: modify method updates relevant fields

Resetting the alert to Resolved on modification aligns with the typical approach of revalidating any newly changed alert. The reassignments look correct.


563-564: get_base_query utility

Replacing the old query usage with a format that references self.stream is straightforward. This method can be extended if more complex base queries become necessary.


570-596: Validating eval frequency and targets

  1. Confirming that RollingWindow.eval_start is humantime-compatible is a helpful user safeguard.
  2. Checking that the sum of target retry intervals does not exceed eval_frequency prevents stuck or repeated notifications.

600-600: Retrieving columns for aggregator checking

Capturing columns from aggregator configs first ensures we can compare them to the schema for correctness.


602-608: Fetching schema from PARSEABLE.get_stream

This gracefully handles a missing or invalid stream via ? error propagation. The subsequent .fields() iteration is well-structured.


610-615: Ensuring the column exists

Raising a CustomError if the column isn’t in the schema is a direct pointer for users. The format is expressive enough to avoid confusion.


633-633: Error message for multi-conditions with AND/OR

The user workflow is more predictable with this explicit restriction, preventing more than two conditions in an AND/OR scenario.


639-641: Validation for single-condition usage

Similarly restricting one condition for non-AND/OR cases ensures consistent indexing in Conditions.


650-679: Aggregates operator validation (AND/OR vs single)

  1. If operator is Some(LogicalOperator), exactly two aggregator configs must be used and each internally validated.
  2. Otherwise, exactly one aggregator config must be used.

This helps avoid out-of-bounds references or ambiguous aggregator setups.

685-707: Gathering columns from aggregator-based conditions

Similar logic in get_agg_config_cols, ensuring columns from each aggregator’s condition are recognized. This comprehensive approach prevents unexpected schema mismatch.


730-743: get_eval_frequency and get_eval_window

Splitting out these fields from EvalConfig is clean. Both methods are small wrappers for clarity, which reads well.


836-837: In-memory and on-disk alert consistency

These lines reload each alert from disk and schedule it via AlertTask::Create, ensuring the in-memory map stays consistent with the persisted alerts.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/sync.rs (1)

246-268: Consider maintaining the original evaluation schedule after retries.

When an error occurs, you're setting sleep_duration = 1 to retry after 1 minute. However, after a successful retry, you might want to restore the original evaluation schedule instead of continuing with the potentially shorter interval.

match alerts_utils::evaluate_alert(&alert).await {
    Ok(_) => {
        retry_counter = 0;
+       sleep_duration = alert.get_eval_frequency(); // Restore original frequency after success
    }
    Err(err) => {
        warn!("Error while evaluation- {}\nRetrying after sleeping for 1 minute", err);
        sleep_duration = 1;
        retry_counter += 1;

        if retry_counter > 3 {
            error!("Alert with id {} failed to evaluate after 3 retries with err- {}", id, err);
            break;
        }
    }
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31f650b and 87e262b.

📒 Files selected for processing (2)
  • src/alerts/mod.rs (23 hunks)
  • src/sync.rs (2 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
src/sync.rs (2)
src/alerts/mod.rs (4)
  • mpsc (55-55)
  • new (114-120)
  • new (155-167)
  • new (178-184)
src/alerts/alerts_utils.rs (1)
  • evaluate_alert (102-113)
src/alerts/mod.rs (2)
src/sync.rs (1)
  • alert_runtime (233-287)
src/alerts/alerts_utils.rs (1)
  • user_auth_for_query (59-89)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (8)
src/sync.rs (3)

231-270: Improved alert error handling with retry mechanism.

The new alert runtime effectively implements the suggested retry logic from previous review comments, replacing the previously criticized "break on first error" approach. The current implementation will make up to 3 retry attempts before giving up, with a reduced evaluation frequency (1 minute) during retries.


271-273: Good practice: tracking alert tasks by ID.

The HashMap provides an efficient way to track and manage alert tasks by their ID, which is a sound approach for proper task lifecycle management.


274-283: Alert deletion properly handles task cancellation.

The implementation for deleting alerts correctly aborts the task and removes it from the tracking map, with appropriate logging for both success and failure cases.
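The create/delete lifecycle around the HashMap can be sketched with a std-only analogue (the PR uses tokio JoinHandles and `.abort()`; here plain threads with a shared stop flag play that role, and all names are illustrative):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

// Registry of running alert tasks keyed by alert id.
// In the PR this holds tokio JoinHandles that are aborted on delete;
// here a shared stop flag serves the same purpose for plain threads.
struct Registry {
    tasks: HashMap<u64, (Arc<AtomicBool>, JoinHandle<()>)>,
}

impl Registry {
    fn new() -> Self {
        Registry { tasks: HashMap::new() }
    }

    // Refuse duplicates, mirroring the "alert already exists" check.
    fn create(&mut self, id: u64) -> bool {
        if self.tasks.contains_key(&id) {
            return false;
        }
        let stop = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop);
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Relaxed) {
                // the periodic evaluate_alert() call would run here
                thread::yield_now();
            }
        });
        self.tasks.insert(id, (stop, handle));
        true
    }

    // Stop the task and drop it from the map, like AlertTask::Delete.
    fn delete(&mut self, id: u64) -> bool {
        match self.tasks.remove(&id) {
            Some((stop, handle)) => {
                stop.store(true, Ordering::Relaxed);
                handle.join().is_ok()
            }
            None => false,
        }
    }
}

fn main() {
    let mut reg = Registry::new();
    assert!(reg.create(1));
    assert!(!reg.create(1)); // duplicate rejected
    assert!(reg.delete(1));
    assert!(!reg.delete(1)); // already removed
    println!("ok");
}
```

Keying the map by alert id makes both the duplicate check on create and the lookup on delete O(1), which is why the review calls this sound lifecycle management.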

src/alerts/mod.rs (5)

223-286: New WhereConfigOperator provides comprehensive comparison capabilities.

The introduction of WhereConfigOperator significantly extends the comparison capabilities with a wide range of operators including NULL checks, pattern matching, and string operations. Implementation follows good practices with proper serialization/deserialization and display formatting.


363-375: Enhanced aggregate configuration with group by functionality.

The introduction of the GroupBy structure and updating AggregateConfig to include support for grouping and logical conditions improves the expressiveness of alert definitions, allowing for more sophisticated alert criteria.


443-481: Alert state transition validation prevents invalid state changes.

The implementation of update_state provides clear rules about allowed state transitions with appropriate error messages when invalid transitions are attempted. This prevents users from making illogical state changes like manually triggering a resolved alert.


936-943: Alert task management via channels improves separation of concerns.

The use of message passing through channels for alert task management is a significant improvement, providing better separation of concerns between the alert management logic and task execution.


945-953: Alert task deletion properly propagates errors.

The implementation for deleting alert tasks correctly propagates errors from the channel send operation with meaningful error messages.

Comment on lines 54 to 64
pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
    let (tx, rx) = mpsc::channel::<AlertTask>(1);
    let alerts = Alerts {
        alerts: RwLock::new(HashMap::new()),
        sender: tx,
    };

    thread::spawn(|| alert_runtime(rx));

    alerts
});
Contributor

🛠️ Refactor suggestion

Thread spawn for alert_runtime lacks error handling and shutdown mechanism.

The current implementation spawns a thread for the alert runtime but doesn't keep its join handle, which means there's no way to gracefully shut it down. Additionally, there's no mechanism to handle errors from the alert_runtime function.

Consider using a more robust approach:

pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
    let (tx, rx) = mpsc::channel::<AlertTask>(1);
    let alerts = Alerts {
        alerts: RwLock::new(HashMap::new()),
        sender: tx,
    };

-   thread::spawn(|| alert_runtime(rx));
+   let handle = thread::spawn(|| {
+       if let Err(e) = alert_runtime(rx) {
+           error!("Alert runtime failed: {}", e);
+       }
+   });
+   // Store the handle somewhere for graceful shutdown

    alerts
});

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/alerts/mod.rs (2)

54-64: ⚠️ Potential issue

Thread spawn for alert_runtime lacks error handling and shutdown mechanism

The current implementation spawns a thread for the alert runtime but doesn't keep its join handle, which means there's no way to gracefully shut it down. Additionally, there's no mechanism to handle errors from the alert_runtime function.

Implement proper error handling and shutdown for the alert runtime thread:

pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
    let (tx, rx) = mpsc::channel::<AlertTask>(1);
    let alerts = Alerts {
        alerts: RwLock::new(HashMap::new()),
        sender: tx,
    };

-   thread::spawn(|| alert_runtime(rx));
+   let handle = thread::spawn(|| {
+       if let Err(e) = alert_runtime(rx) {
+           error!("Alert runtime failed: {}", e);
+       }
+   });
+   // Store the handle somewhere for graceful shutdown

    alerts
});

563-565: ⚠️ Potential issue

SQL Injection Risk in get_base_query: Ensure Stream Name Is Properly Escaped

The current implementation directly interpolates the stream name into the SQL query without properly escaping it, which poses a SQL injection risk.

 pub fn get_base_query(&self) -> String {
-    format!("SELECT * FROM \"{}\"", self.stream)
+    format!("SELECT * FROM \"{}\"", self.stream.replace("\"", "\"\""))
 }
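The quote-doubling fix suggested above can be isolated into a small helper; this sketch uses hypothetical free functions rather than the crate's actual method:

```rust
// Doubling embedded double quotes neutralizes a crafted stream name;
// function names here are illustrative, not the crate's actual API.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

fn base_query(stream: &str) -> String {
    format!("SELECT * FROM {}", quote_ident(stream))
}

fn main() {
    assert_eq!(base_query("app_logs"), "SELECT * FROM \"app_logs\"");
    // A malicious name can no longer terminate the quoted identifier.
    assert_eq!(
        base_query("x\"; DROP TABLE y; --"),
        "SELECT * FROM \"x\"\"; DROP TABLE y; --\""
    );
    println!("ok");
}
```

Escaping alone is a minimal mitigation; validating stream names against an allow-list at creation time would be a stronger complement.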
🧹 Nitpick comments (1)
src/sync.rs (1)

249-268: Good implementation of retry logic with progressive backoff

This implementation properly addresses the past concern about breaking permanently on errors by adding a retry mechanism that attempts up to 3 times with a shorter evaluation interval.

One potential enhancement would be to use an exponential backoff strategy instead of a fixed 1-minute retry interval for more resilient error handling.

- sleep_duration = 1;
+ // Use an exponential backoff starting at 1 minute, then 2, then 4
+ sleep_duration = 1 << retry_counter;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87e262b and b71b7f0.

📒 Files selected for processing (3)
  • src/alerts/mod.rs (23 hunks)
  • src/alerts/target.rs (9 hunks)
  • src/sync.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/alerts/target.rs
🧰 Additional context used
🧬 Code Definitions (2)
src/sync.rs (2)
src/alerts/mod.rs (4)
  • mpsc (55-55)
  • new (114-120)
  • new (155-167)
  • new (178-184)
src/alerts/alerts_utils.rs (1)
  • evaluate_alert (102-113)
src/alerts/mod.rs (2)
src/sync.rs (1)
  • alert_runtime (233-287)
src/utils/mod.rs (1)
  • user_auth_for_query (85-117)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (10)
src/sync.rs (3)

231-243: Alert runtime implementation with proper task management

This introduces a well-structured alert runtime that manages alert tasks using tokio and a message queue approach. The implementation properly validates alert existence before creating new tasks and handles task lifecycle cleanly.


246-248: LGTM: Clean task preparation

The code correctly clones the alert and stores the ID for later reference before spawning the task.


274-282: Alert removal implementation with abort handling

This properly handles alert deletion by aborting the running task and removing it from the HashMap, with appropriate error logging when alerts don't exist.

src/alerts/mod.rs (7)

72-75: LGTM: Well-structured AlertTask enum

The AlertTask enum provides a clean interface for the alert system to handle creation and deletion operations through the message queue.


196-208: Improved AlertOperator naming for better readability

The renamed operator variants (Equal, NotEqual, etc.) are more consistent and clearer than the previous names (EqualTo, NotEqualTo, etc.).


223-286: LGTM: Comprehensive WhereConfigOperator enum implementation

The new WhereConfigOperator enum provides a wide range of comparison operators that will allow for more expressive alert conditions, along with proper string conversion methods.


661-666: Bug fix in aggregate condition validation

This correctly fixes the previously identified bug where agg2 was set to the same value as agg1, causing validation to be performed twice on the first condition instead of once on each.


735-742: LGTM: New helper method for evaluation window information

The addition of get_eval_window() provides a useful utility for formatting the evaluation window information, which improves code organization.


937-943: LGTM: Updated task creation with message passing

The refactored start_task method correctly uses the new message-passing approach with proper error handling.


947-953: LGTM: Updated task deletion with message passing

The refactored delete_task method correctly uses the new message-passing approach with proper error handling.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/alerts/mod.rs (2)

54-64: Thread spawn for alert_runtime lacks error handling

The current implementation spawns a thread for the alert runtime but doesn't keep its join handle, which means there's no way to handle errors from the alert_runtime function.

Consider using a more robust approach:

pub static ALERTS: Lazy<Alerts> = Lazy::new(|| {
    let (tx, rx) = mpsc::channel::<AlertTask>(10);
    let alerts = Alerts {
        alerts: RwLock::new(HashMap::new()),
        sender: tx,
    };

-   thread::spawn(|| alert_runtime(rx));
+   let handle = thread::spawn(|| {
+       if let Err(e) = alert_runtime(rx) {
+           error!("Alert runtime failed: {}", e);
+       }
+   });
+   // Store the handle somewhere for graceful shutdown

    alerts
});

563-565: ⚠️ Potential issue

SQL Injection Risk in get_base_query

The implementation directly interpolates the stream name into the SQL query, which could lead to SQL injection if the stream name isn't properly validated or escaped.

 pub fn get_base_query(&self) -> String {
-    format!("SELECT * FROM \"{}\"", self.stream)
+    format!("SELECT * FROM \"{}\"", self.stream.replace("\"", "\"\""))
 }
🧹 Nitpick comments (4)
src/sync.rs (1)

236-285: Consider adding graceful shutdown mechanism

The alert runtime doesn't have a mechanism to handle graceful shutdowns. This could lead to resource leaks or incomplete alert evaluations during server shutdown.

Consider modifying the function to accept a shutdown signal:

 #[tokio::main(flavor = "multi_thread")]
-pub async fn alert_runtime(mut rx: mpsc::Receiver<AlertTask>) -> Result<(), anyhow::Error> {
+pub async fn alert_runtime(mut rx: mpsc::Receiver<AlertTask>, mut shutdown: mpsc::Receiver<()>) -> Result<(), anyhow::Error> {
     let mut alert_tasks = HashMap::new();
 
     // this is the select! loop which will keep waiting for the alert task to finish or get cancelled
-    while let Some(task) = rx.recv().await {
-        match task {
+    loop {
+        tokio::select! {
+            Some(task) = rx.recv() => match task {
             AlertTask::Create(alert) => {
                 // check if the alert already exists
                 if alert_tasks.contains_key(&alert.id) {
                     error!("Alert with id {} already exists", alert.id);
                     continue;
                 }
                 // ...rest of the task handling...
             }
             AlertTask::Delete(ulid) => {
                 // ...task deletion logic...
             }
+            },
+            _ = shutdown.recv() => {
+                // Cancel all alert tasks
+                for (id, handle) in alert_tasks.drain() {
+                    handle.abort();
+                    warn!("Alert with id {} aborted due to shutdown", id);
+                }
+                break;
+            },
+            else => break,
         }
     }
     Ok(())
 }
src/alerts/mod.rs (3)

835-849: Implement exponential backoff for alert task creation retries

The current retry logic for sending alert tasks only tries once more with the same delay. Consider implementing an exponential backoff strategy or using a more sophisticated retry mechanism.

 match self.sender.send(AlertTask::Create(alert.clone())).await {
     Ok(_) => {}
     Err(e) => {
-        warn!("Failed to create alert task: {e}\nRetrying...");
-        // Retry sending the task
-        match self.sender.send(AlertTask::Create(alert.clone())).await {
-            Ok(_) => {}
-            Err(e) => {
-                error!("Failed to create alert task: {e}");
-                continue;
-            }
-        }
+        warn!("Failed to create alert task: {e}\nRetrying with backoff...");
+        // Exponential backoff retry
+        let mut retry_count = 0;
+        let max_retries = 3;
+        let mut delay_ms = 100;
+        
+        while retry_count < max_retries {
+            tokio::time::sleep(tokio::time::Duration::from_millis(delay_ms)).await;
+            match self.sender.send(AlertTask::Create(alert.clone())).await {
+                Ok(_) => break,
+                Err(e) => {
+                    retry_count += 1;
+                    delay_ms *= 2; // Exponential backoff
+                    if retry_count >= max_retries {
+                        error!("Failed to create alert task after {retry_count} retries: {e}");
+                        continue;
+                    }
+                    warn!("Retry {}/{} failed: {e}", retry_count, max_retries);
+                }
+            }
+        }
     }
 };

949-956: Consider buffering messages when channel is full

The start_task method might fail if the channel is full (max capacity is 10). Consider implementing a buffering mechanism or increasing the channel capacity.

 pub async fn start_task(&self, alert: AlertConfig) -> Result<(), AlertError> {
-    self.sender
-        .send(AlertTask::Create(alert))
-        .await
-        .map_err(|e| AlertError::CustomError(e.to_string()))?;
+    // Try sending with timeout to avoid blocking indefinitely
+    match tokio::time::timeout(
+        tokio::time::Duration::from_secs(5),
+        self.sender.send(AlertTask::Create(alert.clone())),
+    ).await {
+        Ok(Ok(_)) => Ok(()),
+        Ok(Err(e)) => {
+            // Channel error, likely full or closed
+            warn!("Failed to send alert task: {}", e);
+            Err(AlertError::CustomError(format!("Failed to schedule alert: {}", e)))
+        },
+        Err(_) => {
+            // Timeout occurred
+            warn!("Timeout while sending alert task");
+            Err(AlertError::CustomError("Timeout while scheduling alert".to_string()))
+        }
+    }
-    Ok(())
 }

733-742: Add comprehensive logging for alert evaluation window

The get_eval_window method is useful, but could be improved by adding more context to make the logs more useful for debugging.

 pub fn get_eval_window(&self) -> String {
     match &self.eval_config {
         EvalConfig::RollingWindow(rolling_window) => format!(
-            "Start={}\tEnd={}",
-            rolling_window.eval_start, rolling_window.eval_end
+            "Evaluation window: Start={} End={} Frequency={}min",
+            rolling_window.eval_start, rolling_window.eval_end, rolling_window.eval_frequency
         ),
     }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b71b7f0 and be76b5c.

📒 Files selected for processing (3)
  • src/alerts/mod.rs (22 hunks)
  • src/alerts/target.rs (9 hunks)
  • src/sync.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/alerts/target.rs
🧰 Additional context used
🧬 Code Definitions (2)
src/sync.rs (1)
src/alerts/alerts_utils.rs (1)
  • evaluate_alert (102-113)
src/alerts/mod.rs (2)
src/sync.rs (1)
  • alert_runtime (233-287)
src/alerts/alerts_utils.rs (1)
  • user_auth_for_query (59-89)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: coverage
🔇 Additional comments (2)
src/sync.rs (2)

231-237: Good addition of a dedicated runtime for alert tasks

The implementation of a separate runtime for alerts is a good design choice as it separates concerns and provides a clear boundary for alert processing. This new asynchronous approach using message passing is more scalable and maintainable than the previous approach.


246-265: Improved error handling with retry mechanism

You've addressed the previous review comment by implementing a retry mechanism instead of breaking on the first error. The graduated approach with retry counter and reduced sleep duration is a good pattern for handling transient errors.

Contributor

@nikhilsinhaparseable nikhilsinhaparseable left a comment

good to merge

@nitisht nitisht merged commit 22901bf into parseablehq:main Apr 9, 2025
15 of 16 checks passed