feat: prism post datasets API #1236
Conversation
Walkthrough

This change adds a dataset service endpoint to the HTTP server. The implementation updates the route configuration in both the query server and server modules by registering a new method for datasets. A new asynchronous handler is introduced to process POST requests for datasets, including JSON parsing and authorization checks. Additionally, new request and response structures, along with an updated error enum in the Prism module, support dataset queries. Finally, minor improvements have been made to JSON conversion for Arrow record batches.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Handler
    participant PrismModule
    Client->>Server: POST /datasets (JSON payload)
    Server->>Handler: Forward dataset request
    Handler->>PrismModule: Execute get_datasets()
    PrismModule-->>Handler: Return dataset info or error
    Handler->>Server: Wrap response as JSON
    Server-->>Client: HTTP JSON response
```
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/prism/logstream/mod.rs (2)
196-213: Well-documented dataset response structure.

`PrismDatasetResponse` encapsulates essential stream metadata, statistics, retention, and query results. The doc comments are thorough. Consider storing `distinct_sources` in a typed struct if the shape is known, but using `serde_json::Value` is acceptable for flexibility.
229-345: Robust dataset retrieval and aggregation logic.

- The loop processes each stream, skipping unavailable ones, which strengthens resilience.
- Hot tier information and distinct entry queries are well-integrated.
- Swallowing errors in distinct lookups is suitable for partial success, but consider logging warnings on failure for better observability (see the sketch below).
- Parallelizing the loop (using tasks or rayon) might improve performance for large stream lists, but the current sequential approach is simpler to maintain.

Overall, the code is well-structured, and the error handling strategy aligns with partial-success semantics.
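To make the observability suggestion concrete, here is a minimal sketch of logging instead of silently swallowing distinct-lookup failures. `get_distinct_entries` and `collect_distinct` are hypothetical names, not the PR's actual functions:

```rust
use tracing::warn;

// Hypothetical stand-in for the module's distinct-entries query.
async fn get_distinct_entries(stream: &str, field: &str) -> Result<Vec<String>, String> {
    Err(format!("no distinct index for {field} on {stream}"))
}

// Partial-success aggregation: failed lookups are skipped, but each
// failure is surfaced as a warning instead of disappearing silently.
async fn collect_distinct(stream: &str, fields: &[&str]) -> Vec<(String, Vec<String>)> {
    let mut out = Vec::new();
    for field in fields {
        match get_distinct_entries(stream, field).await {
            Ok(values) => out.push((field.to_string(), values)),
            Err(e) => warn!("distinct lookup failed for {}/{}: {}", stream, field, e),
        }
    }
    out
}
```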
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/handlers/http/modal/query_server.rs
(1 hunks)src/handlers/http/modal/server.rs
(2 hunks)src/handlers/http/prism_logstream.rs
(1 hunks)src/prism/logstream/mod.rs
(5 hunks)src/response.rs
(1 hunks)src/utils/arrow/mod.rs
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (23)
src/handlers/http/modal/query_server.rs (1)
79-80: Looks good! Added prism datasets service.

The addition of the datasets service aligns well with the existing infrastructure and properly follows the established pattern next to the logstream service.
src/handlers/http/modal/server.rs (1)
184-193: Implementation of datasets endpoint follows best practices.

The implementation follows the established pattern for route declaration and properly includes the necessary authorization checks for stream info, stats, and retention.
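For reference, a minimal sketch of what such a route registration looks like in Actix Web; the handler body and any authorization wrappers are placeholders, since the PR's exact code is not shown here:

```rust
use actix_web::{web, HttpResponse, Responder};

// Placeholder handler; the PR wires this to the Prism dataset logic.
async fn post_datasets() -> impl Responder {
    HttpResponse::Ok().json(serde_json::json!([]))
}

// Route registration in the shape the review describes; the real scope
// path and authorization checks in the PR may differ.
fn configure(cfg: &mut web::ServiceConfig) {
    cfg.service(web::resource("/datasets").route(web::post().to(post_datasets)));
}
```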
src/response.rs (1)
36-36: Good refactoring to simplify JSON conversion.

This change removes an unnecessary intermediate step, directly passing the record batches to the conversion function. This simplifies the code while maintaining the same functionality.
src/handlers/http/prism_logstream.rs (3)
20-20: Updated imports to support JSON handling.

Added the `Json` type import to support parsing request bodies for the new datasets endpoint.
24-24: Added necessary imports for dataset functionality.

Included `PrismDatasetRequest` to support the new datasets endpoint functionality.
33-38: Well-implemented handler for the datasets endpoint.

This handler properly:

- Takes a JSON payload via the `Json` extractor
- Processes the request asynchronously
- Returns a properly formatted JSON response
- Includes appropriate error handling

The comment also clearly explains that this endpoint combines functionality from multiple other endpoints.
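A minimal sketch of the handler pattern being praised here, assuming Actix Web's `Json` extractor and an optional body; types and names are illustrative, not the PR's exact code:

```rust
use actix_web::{web::Json, HttpRequest, Responder};
use serde::Deserialize;

// Illustrative request type standing in for the crate's own.
#[derive(Deserialize, Default)]
#[serde(rename_all = "camelCase")]
struct PrismDatasetRequest {
    #[serde(default)]
    streams: Vec<String>,
}

// JSON body in, session key from the request, JSON response out.
async fn post_datasets(
    body: Option<Json<PrismDatasetRequest>>,
    req: HttpRequest,
) -> actix_web::Result<impl Responder> {
    // An absent or empty body falls back to the default (all streams).
    let dataset_req = body.map(Json::into_inner).unwrap_or_default();
    // A real implementation would extract the session key from `req`,
    // authorize, and run the dataset query before responding.
    let _ = (&dataset_req, &req);
    Ok(Json(Vec::<serde_json::Value>::new()))
}
```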
src/utils/arrow/mod.rs (3)
93-93: Refactored function signature for enhanced clarity.

Switching to a slice of owned `RecordBatch` objects (`&[RecordBatch]`) ensures a more standard API and simplifies usage, preventing potential reference lifetime pitfalls.
96-98: Looping over batches is straightforward and concise.

Writing each `RecordBatch` in a loop is a clear and maintainable approach. The usage of the `?` operator for error propagation is consistent with idiomatic Rust error handling.
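As an illustration of this loop-plus-`?` pattern, a sketch using arrow-json's `ArrayWriter`; the crate's `record_batches_to_json` may differ in return type and details:

```rust
use arrow_array::RecordBatch;
use arrow_json::ArrayWriter;
use arrow_schema::ArrowError;

// Serialize a slice of owned batches to a JSON array of rows.
fn batches_to_json_bytes(batches: &[RecordBatch]) -> Result<Vec<u8>, ArrowError> {
    let mut writer = ArrayWriter::new(Vec::new());
    for batch in batches {
        writer.write(batch)?; // `?` propagates a failed write for any batch
    }
    writer.finish()?;
    Ok(writer.into_inner())
}
```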
194-194: Consistent with the updated function signature.

Switching from `vec![&r]` to `vec![r]` matches the new function parameter style. This change eliminates the need for references to `RecordBatch` in this context.

src/prism/logstream/mod.rs (14)
25-27: New imports for serialization, JSON handling, and debugging.

Adding `serde`, `serde_json`, and `tracing` imports is appropriate for the newly introduced dataset and logging features.
36-36: Extended query imports.

Introducing `Query` and `QueryError` in the imports indicates new query functionalities specific to this module, streamlining error handling and query building.
38-38: Hot Tier Manager integration.

Importing `HotTierError`, `HotTierManager`, and `StreamHotTier` suggests advanced stream management and a new dimension of error handling for hot tier capabilities.
40-40: Expanded query utilities.

Bringing in `execute`, `CountsRequest`, `CountsResponse`, and `QUERY_SESSION` aligns with your new approach to handling count queries within the streaming logic.
43-46: Utility imports for record batch JSON conversion and time parsing.

Using `record_batches_to_json` and `TimeRange` helps unify record transformations and date handling, improving modularity and readability across the codebase.
215-227: Clear request structure for dataset queries.

`PrismDatasetRequest` neatly defines the data needed to retrieve datasets. The user-friendly doc comments help clarify field usage. Ensure upstream validation rejects invalid or empty time strings if needed.
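A hypothetical upstream check in the spirit of that note; the real module parses human-readable ranges like "1h" and "now", which this sketch does not attempt:

```rust
// Reject obviously invalid time strings before they reach the query layer.
fn validate_time_bounds(start_time: &str, end_time: &str) -> Result<(), String> {
    for (name, value) in [("startTime", start_time), ("endTime", end_time)] {
        if value.trim().is_empty() {
            return Err(format!("{name} must be a non-empty time string"));
        }
    }
    Ok(())
}
```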
355-355: New error annotation for hot tier.

`#[error("Hottier: {0}")]` extends clarity by rendering a descriptive message when a hot tier-related error occurs.
356-357: Introducing Hottier variant with from-attribute.

`Hottier(#[from] HotTierError)` and `Query(#[from] QueryError)` make error handling consistent and concise through `thiserror`.
359-359: Readable time parse error message.

`#[error("TimeParse: {0}")]` ensures that users can quickly identify problems with time string inputs.
360-361: Time parsing and execute error variants.

`TimeParse(#[from] TimeParseError)` and `#[error("Execute: {0}")]` unify typical parse and execution failures within the same error enumeration.
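Taken together, the variants discussed above follow the usual `thiserror` shape. A self-contained sketch (the inner error types here are trivial stand-ins for the crate's real ones):

```rust
use thiserror::Error;

// Stand-in inner errors; the crate's HotTierError, QueryError and
// TimeParseError are richer types.
#[derive(Debug, Error)]
#[error("hot tier unavailable")]
struct HotTierError;

#[derive(Debug, Error)]
#[error("query failed")]
struct QueryError;

#[derive(Debug, Error)]
#[error("bad time string")]
struct TimeParseError;

// `#[from]` derives the conversions that make `?` work seamlessly.
#[derive(Debug, Error)]
enum PrismLogstreamError {
    #[error("Hottier: {0}")]
    Hottier(#[from] HotTierError),
    #[error("Query: {0}")]
    Query(#[from] QueryError),
    #[error("TimeParse: {0}")]
    TimeParse(#[from] TimeParseError),
}
```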
363-364: Empty stream list checks.

Returning a `BAD_REQUEST` (see below) for an empty stream list is a sensible choice, guiding the user to provide at least one stream.
373-373: Mapping hot tier errors to 500.

`PrismLogstreamError::Hottier(_) => StatusCode::INTERNAL_SERVER_ERROR` is consistent with critical server-side failure logic.
374-375: Query and time parse error statuses.

- Query errors remain 500, indicating back-end failures.
- Time parse errors map to 404, guiding the user to correct the time string usage.
377-377: Empty request yields a 400 Bad Request.

Returning `StatusCode::BAD_REQUEST` helps users recognize that their request must contain valid streams.
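Putting the mappings above together, a standalone sketch of the status-code match; the enum is re-declared compactly here so the snippet compiles on its own, and `Empty` is an assumed name for the empty-streams variant:

```rust
use actix_web::http::StatusCode;
use thiserror::Error;

#[derive(Debug, Error)]
enum PrismLogstreamError {
    #[error("Hottier error")]
    Hottier,
    #[error("Query error")]
    Query,
    #[error("TimeParse error")]
    TimeParse,
    #[error("Empty Stream List")]
    Empty,
}

// Hot tier and query failures are server-side (500); bad time strings
// map to 404; an empty stream list is a client error (400).
fn status_code(err: &PrismLogstreamError) -> StatusCode {
    match err {
        PrismLogstreamError::Hottier | PrismLogstreamError::Query => {
            StatusCode::INTERNAL_SERVER_ERROR
        }
        PrismLogstreamError::TimeParse => StatusCode::NOT_FOUND,
        PrismLogstreamError::Empty => StatusCode::BAD_REQUEST,
    }
}
```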
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/prism/logstream/mod.rs (1)
244-315: Consider parallel processing for multiple streams.

For requests with many streams, processing each stream sequentially might lead to increased response times. Consider using parallel processing with `futures::future::join_all` to improve performance for multi-stream requests.

```diff
-let mut responses = vec![];
-for stream in self.streams.iter() {
-    // existing processing logic
-}
+// Create a vector of futures for each stream
+let futures = self.streams.iter().map(|stream| {
+    let stream = stream.clone();
+    let start_time = self.start_time.clone();
+    let end_time = self.end_time.clone();
+    let key = key.clone();
+
+    async move {
+        // Skip if unauthorized or stream not found
+        if Users.authorize(key, Action::ListStream, Some(&stream), None)
+            != crate::rbac::Response::Authorized
+        {
+            warn!("Unauthorized access requested for stream: {stream}");
+            return None;
+        }
+
+        if PARSEABLE.check_or_load_stream(&stream).await {
+            debug!("Stream not found: {stream}");
+            return None;
+        }
+
+        // Process stream (existing logic)
+        match process_single_stream(&stream, &start_time, &end_time).await {
+            Ok(response) => Some(response),
+            Err(e) => {
+                warn!("Error processing stream {stream}: {e}");
+                None
+            }
+        }
+    }
+});
+
+// Process all streams in parallel
+let responses: Vec<_> = futures::future::join_all(futures)
+    .await
+    .into_iter()
+    .filter_map(|result| result)
+    .collect();
```

You would need to extract the stream processing logic into a separate `process_single_stream` function.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/handlers/http/prism_logstream.rs (1 hunks)
- src/prism/logstream/mod.rs (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (5)
src/handlers/http/prism_logstream.rs (1)
32-38: Function implementation looks good.

The new `post_datasets` function correctly extracts the session key from the request, processes the dataset query, and returns a JSON response. The implementation follows the established pattern in the codebase.

src/prism/logstream/mod.rs (4)
197-214: Well-structured response model with comprehensive documentation.

The `PrismDatasetResponse` struct is appropriately designed to encapsulate all relevant information about a dataset, including stream info, stats, retention policy, and query results. The documentation clearly explains each field's purpose.
216-228: Request model follows good design practices.

The `PrismDatasetRequest` struct properly uses Rust idioms like `#[serde(rename_all = "camelCase")]` for JSON serialization/deserialization and provides clear field documentation.
367-376: Good error handling extension.

The additions to the `PrismLogstreamError` enum appropriately cover all the new error cases that might occur during dataset operations. This ensures proper error propagation and handling throughout the system.
385-389: Appropriate HTTP status codes for error types.

The status code mapping for the new error variants is appropriate, with authentication errors returning 401, time parsing errors returning 404, and internal processing errors returning 500.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/handlers/http/prism_logstream.rs (1)
36-45: New endpoint implementation for dataset retrieval.

The new `post_datasets` function is well-structured and follows established patterns:

- It extracts the session key for authentication
- Passes the session key to the dataset retrieval method
- Returns the result as JSON

The function correctly uses async/await with proper error propagation through the `?` operator.

Consider adding more specific documentation about:

- The expected format of the request payload
- The response structure
- Any authorization requirements

Enhance the function documentation to be more specific about the request/response format:

```diff
-/// A combination of /stats, /retention, /hottier, /info, /counts and /query
+/// Retrieves dataset information for a Prism logstream
+///
+/// Takes a JSON payload with stream identifier and time range, and returns
+/// dataset details including creation time, stream type, log source, and
+/// various statistics related to ingestion and storage.
+///
+/// Requires authentication via session key.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/handlers/http/prism_logstream.rs
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (4)
src/handlers/http/prism_logstream.rs (4)
19-22: Added HttpRequest and Json imports for new functionality.

The imports are correctly added to support the new `post_datasets` function, providing the necessary components for handling JSON payloads and accessing HTTP request details.
24-27: Updated imports to include PrismDatasetRequest type.

The imports are properly organized to include the new `PrismDatasetRequest` type and the session key extraction utility, maintaining good code organization.
37-40: Function signature follows best practices.

The function signature correctly uses Actix's JSON extractor pattern for the request body and includes the HttpRequest parameter for session key extraction.
41-44: ❓ Verification inconclusive

Verify authorization checks in the get_datasets implementation.

The session key extraction and passing look good. However, this code doesn't show the implementation of the actual authorization check that happens within `get_datasets`.

🏁 Script executed:

```bash
#!/bin/bash
# Check if there are proper authorization checks in the get_datasets implementation
rg -A 10 "fn get_datasets.*session_key" --type rust
```

Length of output: 53
ACTION: Additional authorization verification required via manual or extended regex search

The session key extraction logic is solid, but our initial regex search did not locate the `get_datasets` implementation for verifying its internal authorization checks. Please investigate further by:

- Locating the full definition of `fn get_datasets` to inspect for any authorization-related logic.
- Checking for keywords like `authorize`, `permission`, or `auth` in the function's implementation.

For example, run this extended shell script to search all Rust files for the definition and check for authorization patterns:

```bash
#!/bin/bash
# Find all occurrences of the get_datasets function and inspect following lines for authorization logic
rg "fn get_datasets" --type rust

# For each file found, display the function's context and look for authorization keywords
for file in $(rg -l "fn get_datasets" --type rust); do
  echo "Inspecting file: $file"
  rg -A 15 "fn get_datasets" "$file" | grep -iE "authorize|permission|auth"
done
```

Please verify the output manually to ensure that appropriate authorization checks are implemented.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/prism/logstream/mod.rs (1)
226-312: Consider making time ranges configurable in dataset retrieval.

The implementation is robust with good error handling and authorization checks. However, the time ranges for `CountsRequest` are hardcoded to "1h" and "now". Consider making these configurable through the API request to improve flexibility.

```diff
 #[derive(Deserialize, Default)]
 #[serde(rename_all = "camelCase")]
 pub struct PrismDatasetRequest {
     /// List of stream names to query
     #[serde(default)]
     streams: Vec<String>,
+    /// Start time for queries (default: "1h")
+    #[serde(default = "default_start_time")]
+    start_time: String,
+    /// End time for queries (default: "now")
+    #[serde(default = "default_end_time")]
+    end_time: String,
 }
+
+fn default_start_time() -> String {
+    "1h".to_string()
+}
+
+fn default_end_time() -> String {
+    "now".to_string()
+}
```

Then update the relevant parts of the implementation to use these values:

```diff
 let records = CountsRequest {
     stream: stream.clone(),
-    start_time: "1h".to_owned(),
-    end_time: "now".to_owned(),
+    start_time: self.start_time.clone(),
+    end_time: self.end_time.clone(),
     num_bins: 1,
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/handlers/http/prism_logstream.rs (1 hunks)
- src/prism/logstream/mod.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
src/prism/logstream/mod.rs (1)
Learnt from: de-sh
PR: parseablehq/parseable#1236
File: src/prism/logstream/mod.rs:332-332
Timestamp: 2025-03-13T11:39:52.587Z
Learning: SQL injection concerns can be ignored in this codebase as all SQL queries are run against immutable data streams, limiting the potential impact of any injection.
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
🔇 Additional comments (8)
src/handlers/http/prism_logstream.rs (3)
20-21: New imports support dataset API endpoint functionality.

The additional imports (`Json`, `HttpRequest`) facilitate the JSON parsing and request handling capabilities needed for the new dataset POST endpoint.
24-27: Module imports updated to include dataset-related components.

The imports correctly include the new `PrismDatasetRequest` type and reuse the existing authorization utility function.
36-49: Well-implemented dataset POST handler with clear error handling.

The implementation follows established patterns for Actix Web handlers:

- Properly extracts session key for authorization
- Handles optional JSON payload with sensible defaults
- Delegates dataset retrieval logic to the model layer
- Returns a properly formatted JSON response

This maintains separation of concerns between HTTP handling and business logic.

src/prism/logstream/mod.rs (5)
src/prism/logstream/mod.rs (5)
197-214
: Well-structured and documented response type for dataset queries.The
PrismDatasetResponse
struct provides a comprehensive representation of dataset information with clear field documentation. The structure follows good API design by grouping related information together.
216-224: Request structure uses proper serialization attributes.

The `PrismDatasetRequest` struct is properly annotated with `#[serde(rename_all = "camelCase")]` for consistent JSON field naming in the API. The `#[serde(default)]` on the `streams` field ensures the API remains backward compatible if the field is omitted.
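To see both attributes in action, a small self-contained sketch with the field set taken from the review excerpts:

```rust
use serde::Deserialize;

#[derive(Debug, Default, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct PrismDatasetRequest {
    /// Stream names to query; an empty list is treated as "all streams".
    #[serde(default)]
    streams: Vec<String>,
}

fn main() {
    // `{}` deserializes cleanly thanks to #[serde(default)].
    let empty: PrismDatasetRequest = serde_json::from_str("{}").unwrap();
    assert!(empty.streams.is_empty());

    // The rename attribute matters for multi-word fields (e.g. startTime).
    let req: PrismDatasetRequest =
        serde_json::from_str(r#"{"streams":["app-logs"]}"#).unwrap();
    assert_eq!(req.streams, ["app-logs"]);
}
```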
323-353: SQL query construction appears safe in this context.

The method builds a SQL query by directly interpolating user-provided field names. While this would normally be a SQL injection risk, I understand from the provided context that SQL injection concerns are mitigated in this codebase because queries are run against immutable data streams.

The method effectively retrieves distinct entries and processes the results appropriately. Consider also making the time range configurable here, matching the previous recommendation.
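For context, the interpolation pattern being discussed looks roughly like this; the actual query text and quoting in the PR may differ:

```rust
// Build a DISTINCT query by interpolating the stream and field names.
fn distinct_query(stream: &str, field: &str) -> String {
    format!(r#"SELECT DISTINCT("{field}") FROM "{stream}""#)
}

fn main() {
    assert_eq!(
        distinct_query("app-logs", "host"),
        r#"SELECT DISTINCT("host") FROM "app-logs""#
    );
}
```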
364-373: Error types appropriately expanded to cover all failure modes.

The new error variants in `PrismLogstreamError` properly account for all possible failure modes in the dataset retrieval process. This ensures that errors are accurately reported and can be handled appropriately.
382-386: Status codes for new error types are appropriate.

The mapping of error types to HTTP status codes follows RESTful conventions:

- Internal server errors for processing failures
- Not found for time parsing errors (consistent with resource-not-found semantics)
- Unauthorized for authentication errors
Signed-off-by: Devdutt Shenoi <[email protected]>
Fixes #XXXX.
Description
Example
Request:
Response:
NOTE: returns results for all streams when none are specified or the body is empty
This PR has:
Summary by CodeRabbit
New Features
Refactor