
fix: ensure current slot data goes into current slot file and isn't flushed until end of the slot #1238


Merged
merged 15 commits from disk-writer into parseablehq:main on Mar 16, 2025

Conversation

@de-sh de-sh commented Mar 14, 2025

Fixes #1240

Summary by CodeRabbit

  • New Features

    • Introduced a new disk writing mechanism for enhanced on-disk data storage.
    • Added a new file extension constant for incomplete arrow files.
    • Enhanced TimeRange functionality with methods for generating granular time ranges and checking timestamp containment.
  • Refactor

    • Streamlined the flushing process for disk writes, improving performance and reducing complexity.
    • Updated method names and signatures for clarity and consistency.
    • Updated test cases to reflect changes in the writing strategy and file naming conventions.

de-sh changed the title from "feat: DiskWriter abstraction to handle write to arrows files" to "refactor: DiskWriter abstraction to handle write to arrows files" on Mar 14, 2025
coderabbitai bot commented Mar 14, 2025

Walkthrough

The changes introduce a new DiskWriter struct to encapsulate disk writing functionality previously handled by StreamWriter. The new struct wraps a buffered writer and manages file paths, error logging, and proper finalization through a dedicated finish method and a Drop implementation. Additionally, the Stream struct now creates and flushes disk writers using DiskWriter instead of StreamWriter, streamlining the write and flush operations. A new constant for file extensions is added, and various method signatures are updated to reflect these changes.
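To make that lifecycle concrete, here is a minimal sketch of the shape described above, assuming the arrow-ipc StreamWriter API; the actual PR additionally uses the StagingError type, the PART_FILE_EXTENSION and ARROW_FILE_EXTENSION constants, error logging, and an is_current method that are not shown here:

use std::{
    fs::{File, OpenOptions},
    io::BufWriter,
    path::PathBuf,
};

use arrow_array::RecordBatch;
use arrow_ipc::writer::StreamWriter;
use arrow_schema::Schema;

pub struct DiskWriter {
    // Buffered Arrow IPC stream writer over the on-disk file
    inner: StreamWriter<BufWriter<File>>,
    // Path of the in-progress ".part" file, renamed to ".arrows" on finish
    path: PathBuf,
}

impl DiskWriter {
    pub fn try_new(path: impl Into<PathBuf>, schema: &Schema) -> std::io::Result<Self> {
        let mut path = path.into();
        path.set_extension("part");
        let file = OpenOptions::new()
            .write(true)
            .truncate(true)
            .create(true)
            .open(&path)?;
        let inner = StreamWriter::try_new_buffered(file, schema).map_err(std::io::Error::other)?;
        Ok(Self { inner, path })
    }

    pub fn write(&mut self, rb: &RecordBatch) -> std::io::Result<()> {
        self.inner.write(rb).map_err(std::io::Error::other)
    }
}

impl Drop for DiskWriter {
    fn drop(&mut self) {
        // Finalize the IPC stream, then promote ".part" to ".arrows" so only
        // complete files carry the arrows extension
        if self.inner.finish().is_ok() {
            let _ = std::fs::rename(&self.path, self.path.with_extension("arrows"));
        }
    }
}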

Changes

  • src/parseable/staging/writer.rs: Introduced the DiskWriter struct wrapping a StreamWriter<BufWriter<File>> and a PathBuf. Added the methods try_new, is_current, and write, plus a Drop implementation for automatic finalization and error logging.
  • src/parseable/streams.rs: Updated the Stream struct to use DiskWriter instead of StreamWriter in the push method and simplified flush to retain only current writers. Renamed path_by_current_time to filename_by_partition.
  • src/parseable/mod.rs: Added the constant PART_FILE_EXTENSION, defined as "part", for incomplete arrow files.
  • src/parseable/staging/reader.rs: Updated the test module to replace StreamWriter with DiskWriter, changed the file extension used in test cases, and adjusted write_test_batches to match the new writing mechanism.
  • src/utils/time.rs: Added the granularity_range and contains methods to TimeRange for calculating granular time ranges and checking timestamp containment.
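Based on the src/parseable/streams.rs entry above, the simplified flush keeps only writers whose time slot is still current and lets Drop finalize the rest. The following is a rough, self-contained sketch of that retention logic; the field and method names (disk, is_current, the explicit "now" argument) are assumptions inferred from the change summary, not the PR's exact code:

use std::collections::HashMap;

struct DiskWriter {
    // Assumed: the minute at which this writer's slot ends
    slot_end_minute: i64,
}

impl DiskWriter {
    // Assumed shape of is_current: the writer stays current until its slot ends
    fn is_current(&self, now_minute: i64) -> bool {
        now_minute < self.slot_end_minute
    }
}

struct Writer {
    disk: HashMap<String, DiskWriter>,
}

// Flush retains the current-slot writers so their data keeps going into the
// current slot file; dropped writers are finalized (finish + rename) by Drop.
fn flush(writer: &mut Writer, now_minute: i64) {
    writer.disk.retain(|_, w| w.is_current(now_minute));
}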

Possibly related PRs

  • refactor: DRY object_storage #1147: both PRs introduce and use the DiskWriter struct for handling on-disk writes.
  • refactor: utility Minute handles slotting by minute #1203: both PRs change the Stream struct's methods, moving from StreamWriter to DiskWriter and handling time-slot logic.
  • feat: merge finish .arrows and convert to .parquet #1200: both PRs modify how the Stream struct writes data, changing the flush and conversion path.

Suggested labels

for next release

Suggested reviewers

  • nikhilsinhaparseable

Poem

I'm a rabbit, hopping code with glee,
DiskWriter's here, as neat as can be.
With buffered strokes and file renames,
I savor each change as it proclaims.
A codey warren, safe and sound,
Bound in bytes where joy is found.
🥕 Hop on, team, to code profound!


coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/parseable/streams.rs (2)

135-136: Avoid panicking on file creation to ensure graceful error handling.

Using .expect("File and RecordBatch both are checked") will terminate the program on failure. Consider returning a StagingError instead of panicking to allow the caller to handle file-creation errors more gracefully.

-let mut writer = DiskWriter::new(file_path, &record.schema())
-    .expect("File and RecordBatch both are checked");
+let mut writer = match DiskWriter::new(file_path, &record.schema()) {
+    Ok(dw) => dw,
+    Err(e) => {
+        return Err(e);
+    }
+};

361-365: Consider capturing errors from DiskWriter::finish() rather than just draining.

Draining the disk HashMap relies on the Drop for each DiskWriter to finalize writes, meaning any finalization errors are only logged and not propagated. If you need stronger reliability or post-flush checks, consider a synchronous finish call for each writer so that errors can be handled.
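If stronger guarantees are wanted, a synchronous pass could propagate those failures; a minimal sketch, assuming finish() were changed to return a Result (DiskWriter here is a stub standing in for the PR's type):

use std::collections::HashMap;

struct DiskWriter;

impl DiskWriter {
    fn finish(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

fn flush_all(disk: &mut HashMap<String, DiskWriter>) -> std::io::Result<()> {
    for (_stream, mut writer) in disk.drain() {
        // Surface finalization errors to the caller instead of only logging in Drop
        writer.finish()?;
    }
    Ok(())
}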

src/parseable/staging/writer.rs (2)

45-48: Make fields private to preserve encapsulation unless they must be public.

Having inner and path as public could allow unwanted external manipulation. If only used internally, consider making them private for better maintainability.

-pub struct DiskWriter {
-    pub inner: StreamWriter<BufWriter<File>>,
-    pub path: PathBuf,
+pub struct DiskWriter {
+    inner: StreamWriter<BufWriter<File>>,
+    path: PathBuf,
 }

63-74: Surface rename failures more visibly if needed.

If renaming the file fails, the error is merely logged. If critical, consider re-trying or signaling the caller. Otherwise, the silent fallback is acceptable for non-blocking logs.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db4a68d and cd0e1dd.

📒 Files selected for processing (2)
  • src/parseable/staging/writer.rs (2 hunks)
  • src/parseable/streams.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (7)
src/parseable/streams.rs (1)

61-61: Good import of the new DiskWriter.

By re-exporting both DiskWriter and Writer from the same module, you keep the code organized while enabling the new abstraction to coexist with the existing logic.

src/parseable/staging/writer.rs (6)

22-24: Import changes for file handling and buffering look good.

Using File, OpenOptions, and BufWriter aligns well with your new approach for disk-based writes.


33-33: Additional imports for logging, constants, and errors are properly included.

Bringing in tracing::error, ARROW_FILE_EXTENSION, adapt_batch, and StagingError is consistent with the extended functionality needed for disk-based writes and improved logging.

Also applies to: 35-35, 37-37


42-43: Replacing the HashMap<String, StreamWriter<File>> with HashMap<String, DiskWriter> is a solid abstraction step.

This change centralizes the on-disk streaming logic in a dedicated struct, making it clearer to track and finalize writes.


59-61: Delegate writing logic with minimal overhead.

Forwarding to self.inner.write(rb) is concise. The direct mapping of the error to StagingError is equally straightforward.


77-80: Automatic resource finalization on drop is a useful safeguard.

Calling finish() within Drop helps ensure streams are properly closed. Just note that any final error remains in logs without the chance to recover.


50-57:

❓ Verification inconclusive

Be cautious with append(true) to avoid data collisions in existing files.

Currently, you open files in append mode, which might unintentionally append to a pre-existing file rather than fully replacing it. Verify this matches the intended behavior (e.g., continuing partial batch writes), and consider truncate(true) if complete overwrites are desired.


🏁 Script executed:

#!/bin/bash
# Ensure DiskWriter is only used for newly created files or handle existing data intentionally.
rg -A 3 "DiskWriter::new" 

Length of output: 325


Attention: Verify File Open Mode Usage in DiskWriter

The current implementation of DiskWriter::new in src/parseable/staging/writer.rs (lines 50–57) uses append(true) when opening the file. This behavior means that new data will be added to any pre-existing content, which can lead to unintended data collisions if a stale file is present. In src/parseable/streams.rs, we see that DiskWriter::new is invoked consistently without any additional safeguards around file state.

  • Confirm whether the intent is to continuously write to an existing file (e.g., for accumulating partial batch writes).
  • If the design requires a fresh file for each operation, consider using truncate(true) to replace any existing content instead of appending.

Please review the design intent and update accordingly if necessary.
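For reference, the two open modes discussed above differ as follows (a generic illustration, not the PR's exact call site):

use std::fs::{File, OpenOptions};

fn open_for_append(path: &str) -> std::io::Result<File> {
    // Keeps any existing bytes and writes after them; a stale .part file would be extended
    OpenOptions::new().create(true).append(true).open(path)
}

fn open_fresh(path: &str) -> std::io::Result<File> {
    // Discards any existing content, so each writer starts from a clean file
    OpenOptions::new().create(true).write(true).truncate(true).open(path)
}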

coderabbitai bot previously approved these changes Mar 14, 2025
coderabbitai bot previously approved these changes Mar 14, 2025
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/parseable/staging/writer.rs (1)

73-85: Consider returning Result from finish method instead of swallowing errors

The finish method currently logs errors but doesn't propagate them, which might hide problems and make debugging difficult. While logging is good, callers might benefit from being able to handle or react to these errors.

Additionally, the file renaming operation could fail if the target file already exists - consider adding additional error handling or cleanup to address this edge case.

- pub fn finish(&mut self) {
+ pub fn finish(&mut self) -> Result<(), std::io::Error> {
-     if let Err(err) = self.inner.finish() {
-         error!("Couldn't finish arrow file {:?}, error = {err}", self.path);
-         return;
-     }
+     self.inner.finish().map_err(|err| {
+         error!("Couldn't finish arrow file {:?}, error = {err}", self.path);
+         std::io::Error::new(std::io::ErrorKind::Other, err)
+     })?;

      let mut arrow_path = self.path.to_owned();
      arrow_path.set_extension(ARROW_FILE_EXTENSION);
-     if let Err(err) = std::fs::rename(&self.path, &arrow_path) {
-         error!("Couldn't rename file {:?}, error = {err}", self.path);
-     }
+     std::fs::rename(&self.path, &arrow_path).map_err(|err| {
+         error!("Couldn't rename file {:?}, error = {err}", self.path);
+         err
+     })
  }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd62135 and fd82d8c.

📒 Files selected for processing (1)
  • src/parseable/staging/writer.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
🔇 Additional comments (6)
src/parseable/staging/writer.rs (6)

48-51: Well-designed DiskWriter struct with appropriate fields

The DiskWriter struct is well-designed with two essential fields:

  1. inner: A buffered stream writer for performance optimization
  2. path: The file path stored for later operations like renaming

This encapsulation properly separates concerns and provides a cleaner API than directly using StreamWriter<File>.


55-66: Good initialization pattern with proper error handling

The try_new function follows best practices by:

  1. Converting the input path to a standardized type
  2. Setting the temporary extension before file creation
  3. Using appropriate OpenOptions for file creation
  4. Properly propagating errors with the correct error type
  5. Using buffered writing for performance

The builder pattern for OpenOptions is used correctly and the function returns a well-structured Result.


68-71: Simple and effective write method

This wrapper method correctly maps errors to the appropriate domain-specific error type, maintaining a clean abstraction boundary.


88-92: Good use of Drop trait for resource cleanup

Implementing Drop ensures that files are properly finalized even if clients forget to call finish() explicitly. This helps prevent resource leaks and incomplete files.

Note that if you modify finish() to return a Result as suggested, the drop implementation would still need to swallow errors since drop cannot return values.


22-38: Clean import organization with appropriate dependencies

The imports have been properly organized and include all necessary components for file handling, buffering, error logging, and file extensions. The reorganization of imports from the crate helps maintain a clean structure.


45-46: Good abstraction update in Writer struct

Changing from HashMap<String, StreamWriter<File>> to HashMap<String, DiskWriter> improves the code by encapsulating file handling logic and providing a cleaner API. This change is consistent with the refactoring goal.

coderabbitai bot previously approved these changes Mar 14, 2025
coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd82d8c and 8706176.

📒 Files selected for processing (1)
  • src/parseable/staging/writer.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
🔇 Additional comments (6)
src/parseable/staging/writer.rs (6)

22-24: New imports support buffered file operations

These imports are necessary for the new DiskWriter struct's file handling capabilities, providing access to file operations, buffering, and path manipulation.


33-33: Good addition of error logging

The tracing::error import is appropriately added for logging file operation errors during the cleanup phase in the Drop implementation.


35-38: Well-organized imports for file extensions

Nice reorganization of imports to bring in the necessary file extension constants that will be used for file naming in the DiskWriter implementation.


45-45: Updated disk field to use new DiskWriter abstraction

The change from StreamWriter<File> to DiskWriter for the disk field is appropriate, reflecting the shift to a more structured approach for file handling.


48-51: Well-designed DiskWriter struct

Good design for the new struct that wraps the StreamWriter with a buffered writer and keeps track of the file path, enabling proper file management.


53-72: Clean implementation of DiskWriter creation and writing

The implementation is well-structured with good error handling. The try_new method appropriately sets up files with the temporary extension, and the write method correctly maps errors to the appropriate type.

coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/parseable/staging/writer.rs (2)

74-93: ⚠️ Potential issue

Incomplete error handling in Drop implementation

While the code correctly handles file renaming and checks for existing files, it doesn't clean up partial files if finish() fails. This could leave corrupted .part files in the filesystem.

Add cleanup for the part file when finish() fails:

fn drop(&mut self) {
    if let Err(err) = self.inner.finish() {
        error!("Couldn't finish arrow file {:?}, error = {err}", self.path);
+       // Clean up the corrupted part file
+       if let Err(clean_err) = std::fs::remove_file(&self.path) {
+           error!("Failed to clean up corrupted part file {:?}, error = {clean_err}", self.path);
+       } else {
+           warn!("Removed corrupted part file {:?}", self.path);
+       }
        return;
    }

    let mut arrow_path = self.path.to_owned();
    arrow_path.set_extension(ARROW_FILE_EXTENSION);

    if arrow_path.exists() {
        warn!("File {arrow_path:?} exists and will be overwritten");
    }

    if let Err(err) = std::fs::rename(&self.path, &arrow_path) {
        error!("Couldn't rename file {:?}, error = {err}", self.path);
+       // Try to clean up the part file on rename failure
+       if let Err(clean_err) = std::fs::remove_file(&self.path) {
+           error!("Failed to clean up part file after rename failure {:?}, error = {clean_err}", self.path);
+       }
    }
}

89-91: 🛠️ Refactor suggestion

Rename error handling could be improved

When renaming fails, the error is logged but the partial file is left behind, potentially causing confusion or disk space issues.

if let Err(err) = std::fs::rename(&self.path, &arrow_path) {
    error!("Couldn't rename file {:?}, error = {err}", self.path);
+   // Clean up the part file on rename failure to avoid leaving partial files
+   if let Err(clean_err) = std::fs::remove_file(&self.path) {
+       error!("Failed to clean up part file after rename failure {:?}, error = {clean_err}", self.path);
+   } else {
+       warn!("Removed part file {:?} after rename failure", self.path);
+   }
}
🧹 Nitpick comments (2)
src/parseable/staging/writer.rs (2)

53-66: Well-structured initialization method

The try_new method properly handles file creation with appropriate options and error handling. One suggestion would be to add a check for an existing .part file before opening.

pub fn try_new(path: impl Into<PathBuf>, schema: &Schema) -> Result<Self, StagingError> {
    let mut path = path.into();
    path.set_extension(PART_FILE_EXTENSION);
+   
+   // Check if a part file already exists - this could indicate a previous failure
+   if path.exists() {
+       warn!("Found existing part file {:?}, it will be overwritten", path);
+   }
    
    let file = OpenOptions::new()
        .write(true)
        .truncate(true)
        .create(true)
        .open(&path)?;
    let inner = StreamWriter::try_new_buffered(file, schema)?;

    Ok(Self { inner, path })
}

85-87: Target file existence check is good but handling could be improved

The code correctly checks if the target file exists before renaming, but only logs a warning without offering special handling like backup or unique naming.

Consider implementing a more robust strategy for handling existing files:

if arrow_path.exists() {
    warn!("File {arrow_path:?} exists and will be overwritten");
+   // Option 1: Create a backup of the existing file
+   let backup_path = arrow_path.with_extension(format!("{}.bak", ARROW_FILE_EXTENSION));
+   if let Err(err) = std::fs::rename(&arrow_path, &backup_path) {
+       error!("Failed to backup existing file {:?}, error = {err}", arrow_path);
+       // Continue with overwrite
+   } else {
+       warn!("Created backup of existing file at {:?}", backup_path);
+   }
+   
+   // Option 2: Use a unique name by appending timestamp
+   // let now = std::time::SystemTime::now().duration_since(std::time::UNIX_EPOCH).unwrap().as_secs();
+   // arrow_path = arrow_path.with_extension(format!("{}.{}", ARROW_FILE_EXTENSION, now));
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8706176 and cce0cb6.

📒 Files selected for processing (1)
  • src/parseable/staging/writer.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: coverage
🔇 Additional comments (5)
src/parseable/staging/writer.rs (5)

22-24: Import additions align with new DiskWriter design

The added imports for File, OpenOptions, BufWriter, and PathBuf support the new DiskWriter implementation which encapsulates file I/O operations.


35-38: Good organization of imports

The reorganized imports improve clarity by grouping related items and explicitly importing the needed constants.


45-45: HashMap type change reflects new abstraction

The change from StreamWriter<File> to DiskWriter properly encapsulates the Arrow file writing functionality.


48-51: Good encapsulation in DiskWriter struct

The DiskWriter struct effectively encapsulates both the writer and file path, facilitating proper cleanup on drop.


68-71: Straightforward write implementation

The write method is concise and maps errors appropriately to the StagingError type.

coderabbitai bot previously approved these changes Mar 14, 2025
coderabbitai bot previously approved these changes Mar 15, 2025
de-sh changed the title from "refactor: DiskWriter abstraction to handle write to arrows files" to "fix: ensure current slot data goes into current slot file and isn't flushed until end of the slot" on Mar 15, 2025
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/utils/time.rs (1)

265-282: The granularity_range method implementation looks good.

This method correctly calculates the time range for a given timestamp based on the specified data granularity. It ensures that seconds and nanoseconds are set to 0, and properly computes the start and end times of the containing block.

However, consider adding validation for the data_granularity parameter to handle edge cases like 0 or negative values.

pub fn granularity_range(timestamp: DateTime<Utc>, data_granularity: u32) -> Self {
+   assert!(data_granularity > 0, "data_granularity must be positive");
    let time = timestamp
        .time()
        .with_second(0)
        .and_then(|time| time.with_nanosecond(0))
        .expect("Within expected time range");
    let timestamp = timestamp.with_time(time).unwrap();
    let block_n = timestamp.minute() / data_granularity;
    let block_start = block_n * data_granularity;
    let start = timestamp
        .with_minute(block_start)
        .expect("Within minute range");
    let end = start + TimeDelta::minutes(data_granularity as i64);

    Self { start, end }
}
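As a usage example of the method above, and assuming the TimeRange type from src/utils/time.rs is in scope with public start and end fields, a 5-minute granularity maps 10:37:42 into the half-open block [10:35, 10:40):

use chrono::{TimeZone, Utc};

fn main() {
    let ts = Utc.with_ymd_and_hms(2025, 3, 14, 10, 37, 42).unwrap();
    let range = TimeRange::granularity_range(ts, 5);
    // start == 10:35:00, end == 10:40:00 (end is excluded)
    assert_eq!(range.start, Utc.with_ymd_and_hms(2025, 3, 14, 10, 35, 0).unwrap());
    assert_eq!(range.end, Utc.with_ymd_and_hms(2025, 3, 14, 10, 40, 0).unwrap());
}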
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c74671 and b031901.

📒 Files selected for processing (1)
  • src/utils/time.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (3)
src/utils/time.rs (3)

284-287: LGTM! The contains method is well-implemented.

This method correctly checks if a timestamp falls within the time range using a half-open interval [start, end), which is the standard approach for time range checks.
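A minimal stand-alone sketch of that semantics, redefining a small TimeRange locally purely for illustration (the crate's own struct in src/utils/time.rs is the one the PR extends):

use chrono::{DateTime, Utc};

pub struct TimeRange {
    pub start: DateTime<Utc>,
    pub end: DateTime<Utc>,
}

impl TimeRange {
    // Half-open check: start is included, end is excluded
    pub fn contains(&self, timestamp: DateTime<Utc>) -> bool {
        self.start <= timestamp && timestamp < self.end
    }
}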


349-349: LGTM! Import addition is necessary.

The addition of TimeZone to the imports is necessary for the new test methods that use Utc.with_ymd_and_hms().


540-616: Comprehensive test coverage for granularity_range.

These tests thoroughly validate the granularity_range method across different granularity values (1, 5, 15, 30 minutes) and edge cases. The test cases provide good coverage for boundary conditions and hour transitions.

nikhilsinhaparseable (Contributor) left a comment

looks good

nitisht merged commit d147f48 into parseablehq:main on Mar 16, 2025
13 of 14 checks passed
de-sh deleted the disk-writer branch on Mar 16, 2025 at 12:20
Development

Successfully merging this pull request may close these issues.

regression: dataloss as a result of prematurely "finish"ed arrows