Skip to content

Conversation

@dannyzaken
Copy link
Member

@dannyzaken dannyzaken commented Nov 18, 2025

Describe the Problem

  • In read_object_stream, we used the requested_size passed to the _read function as the value to acquire from the io semaphore. By default, this is 32 MB (the stream's highWaterMark).
  • For datasets with mostly small objects, this limits the number of concurrent reads more than necessary.

Explain the Changes

  • Changed io_sem_size to reflect the actual size requested by the current read.
  • Also, avoid entering the code under the semaphore if there is nothing more to read.
  • changed debug level of some read\upload messages to log1 instead of log0

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed early termination handling in read stream operations.
    • Improved error handling and reporting during upload operations.
    • Optimized resource allocation calculations for read operations.
  • Refactor

    • Reduced logging verbosity across upload and read paths to minimize output noise.

- In read_object_stream, we used the requested_size passed to the _read function as the value to acquire from the io semaphore. By default, this is 32 MB (the stream's highWaterMark).
- For datasets with mostly small objects, this limits the number of concurrent reads more than necessary.
- Changed io_sem_size to reflect the actual size requested by the current read.
- Also, avoid entering the code under the semaphore if there is nothing more to read.
- changed debug level of some read\upload messages to log1 instead of log0

Signed-off-by: Danny Zaken <[email protected]>
@coderabbitai
Copy link

coderabbitai bot commented Nov 18, 2025

Walkthrough

Modifications to object stream IO handling include converting verbose debug logs to less verbose levels across upload and read operations, adding early-termination logic in read_object_stream, adjusting IO semaphore sizing calculation, and adding explicit error re-throwing in the upload path.

Changes

Cohort / File(s) Summary
Debug logging standardization and IO logic refinements
src/sdk/object_io.js
Converts multiple debug logs from verbose (log0) to less verbose (log1) levels across upload and read paths; adds early-termination guard in read_object_stream when requested_end ≤ reader.pos; aligns IO semaphore sizing to use requested_end - reader.pos; adds explicit error re-throw in upload_object_range catch block; introduces log line for streaming target bucket/key; minor comment formatting adjustment

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Semaphore sizing alignment: Verify the change from requested_size to requested_end - reader.pos matches intended buffer behavior and doesn't introduce performance regressions
  • Early-termination guard logic: Confirm the new guard condition in read_object_stream correctly handles edge cases and stream closure semantics
  • Error re-throw placement: Ensure error re-throw in upload_object_range catch block maintains proper error propagation and doesn't mask underlying issues

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly reflects the main technical change: adjusting semaphore acquisition to use the exact read size instead of the default stream size, which is the core optimization described in the PR objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 149c7f5 and 4abf62d.

📒 Files selected for processing (1)
  • src/sdk/object_io.js (10 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (8)
src/sdk/object_io.js (8)

176-176: LGTM: Log verbosity reduction is appropriate.

The changes from log0 to log1 for routine upload operations reduce log noise while maintaining visibility for debugging when needed.

Also applies to: 216-216, 231-231, 278-278, 292-292


175-181: LGTM: Error re-throwing ensures proper error propagation.

The explicit throw err; on line 180 is essential to propagate the error to callers. Without it, the error would be logged but swallowed, leading to silent failures.


380-380: LGTM: Enhanced logging context with reduced verbosity.

The log message now includes the streaming target (bucket/key) while reducing verbosity from log0 to log1.


493-493: LGTM: Reduced verbosity for internal upload steps.

Changing part upload logging from log0 to log1 is consistent with the goal of reducing log noise.


503-512: LGTM: Comment formatting improvement.

The JSDoc comment formatting is clearer with no functional changes.


601-605: LGTM: Early termination guard improves efficiency.

The guard correctly detects when the read position has reached or exceeded the requested end, avoiding unnecessary semaphore acquisition when there's nothing more to read. This is a valuable optimization that aligns with the PR objectives.


607-607: Excellent fix: Accurate semaphore sizing for actual read size.

This change correctly uses the actual bytes to be read (requested_end - reader.pos) instead of the stream's requested_size (which defaults to 32MB). This is the core fix that allows more concurrent reads for datasets with mostly small objects, directly addressing the PR objective. The early termination guard on lines 601-605 ensures this value is always positive.


636-636: LGTM: Reduced verbosity for routine position updates.

Changing reader position logging from log0 to log1 is appropriate for routine internal operations.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

// instead of getting multiple calls from the stream with small slices to return.

const requested_end = Math.min(params.end, reader.pos + requested_size);
if (requested_end <= reader.pos) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking here, maybe the logic will be clearer:
if (requested_size === 0 || reader.pos >= params.end) {
As this will give us more understanding that reading is finished

Copy link
Contributor

@alphaprinz alphaprinz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. LGTM.

re Jacky's comment, can't say I have a preference. Options are equivalent and both make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants