use exact read size to acquire from io semaphore #9287

dannyzaken · 2025-11-18T14:48:39Z

Describe the Problem

In read_object_stream, we used the requested_size passed to the _read function as the value to acquire from the io semaphore. By default, this is 32 MB (the stream's highWaterMark).
For datasets with mostly small objects, this limits the number of concurrent reads more than necessary.

Explain the Changes

Changed io_sem_size to reflect the actual size requested by the current read.
Also, avoid entering the code under the semaphore if there is nothing more to read.
changed debug level of some read\upload messages to log1 instead of log0

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

Doc added/updated
Tests added

Summary by CodeRabbit

Release Notes

Bug Fixes
- Fixed early termination handling in read stream operations.
- Improved error handling and reporting during upload operations.
- Optimized resource allocation calculations for read operations.
Refactor
- Reduced logging verbosity across upload and read paths to minimize output noise.

- In read_object_stream, we used the requested_size passed to the _read function as the value to acquire from the io semaphore. By default, this is 32 MB (the stream's highWaterMark). - For datasets with mostly small objects, this limits the number of concurrent reads more than necessary. - Changed io_sem_size to reflect the actual size requested by the current read. - Also, avoid entering the code under the semaphore if there is nothing more to read. - changed debug level of some read\upload messages to log1 instead of log0 Signed-off-by: Danny Zaken <[email protected]>

coderabbitai · 2025-11-18T14:48:53Z

Walkthrough

Modifications to object stream IO handling include converting verbose debug logs to less verbose levels across upload and read operations, adding early-termination logic in read_object_stream, adjusting IO semaphore sizing calculation, and adding explicit error re-throwing in the upload path.

Changes

Cohort / File(s)	Summary
Debug logging standardization and IO logic refinements `src/sdk/object_io.js`	Converts multiple debug logs from verbose (log0) to less verbose (log1) levels across upload and read paths; adds early-termination guard in read_object_stream when requested_end ≤ reader.pos; aligns IO semaphore sizing to use requested_end - reader.pos; adds explicit error re-throw in upload_object_range catch block; introduces log line for streaming target bucket/key; minor comment formatting adjustment

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Semaphore sizing alignment: Verify the change from requested_size to requested_end - reader.pos matches intended buffer behavior and doesn't introduce performance regressions
Early-termination guard logic: Confirm the new guard condition in read_object_stream correctly handles edge cases and stream closure semantics
Error re-throw placement: Ensure error re-throw in upload_object_range catch block maintains proper error propagation and doesn't mask underlying issues

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly reflects the main technical change: adjusting semaphore acquisition to use the exact read size instead of the default stream size, which is the core optimization described in the PR objectives.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 149c7f5 and 4abf62d.

📒 Files selected for processing (1)

src/sdk/object_io.js (10 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Build Noobaa Image
GitHub Check: run-package-lock-validation
GitHub Check: run-jest-unit-tests

🔇 Additional comments (8)

src/sdk/object_io.js (8)

176-176: LGTM: Log verbosity reduction is appropriate.

The changes from log0 to log1 for routine upload operations reduce log noise while maintaining visibility for debugging when needed.

Also applies to: 216-216, 231-231, 278-278, 292-292

175-181: LGTM: Error re-throwing ensures proper error propagation.

The explicit throw err; on line 180 is essential to propagate the error to callers. Without it, the error would be logged but swallowed, leading to silent failures.

380-380: LGTM: Enhanced logging context with reduced verbosity.

The log message now includes the streaming target (bucket/key) while reducing verbosity from log0 to log1.

493-493: LGTM: Reduced verbosity for internal upload steps.

Changing part upload logging from log0 to log1 is consistent with the goal of reducing log noise.

503-512: LGTM: Comment formatting improvement.

The JSDoc comment formatting is clearer with no functional changes.

601-605: LGTM: Early termination guard improves efficiency.

The guard correctly detects when the read position has reached or exceeded the requested end, avoiding unnecessary semaphore acquisition when there's nothing more to read. This is a valuable optimization that aligns with the PR objectives.

607-607: Excellent fix: Accurate semaphore sizing for actual read size.

This change correctly uses the actual bytes to be read (requested_end - reader.pos) instead of the stream's requested_size (which defaults to 32MB). This is the core fix that allows more concurrent reads for datasets with mostly small objects, directly addressing the PR objective. The early termination guard on lines 601-605 ensures this value is always positive.

636-636: LGTM: Reduced verbosity for routine position updates.

Changing reader position logging from log0 to log1 is appropriate for routine internal operations.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

Provide your own instructions using the high_level_summary_instructions setting.
Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

📝 Description — Summarize the main change in 50–60 words, explaining what was done.

📓 References — List relevant issues, discussions, documentation, or related PRs.

📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.

📊 Contributor Summary — Include a Markdown table showing contributions:
| Contributor | Lines Added | Lines Removed | Files Changed |

✔️ Additional Notes — Add any extra reviewer context.
Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jackyalbo · 2025-11-18T15:47:58Z

src/sdk/object_io.js

            // instead of getting multiple calls from the stream with small slices to return.

            const requested_end = Math.min(params.end, reader.pos + requested_size);
+            if (requested_end <= reader.pos) {


Just thinking here, maybe the logic will be clearer:
if (requested_size === 0 || reader.pos >= params.end) {
As this will give us more understanding that reading is finished

alphaprinz

Nice. LGTM.

re Jacky's comment, can't say I have a preference. Options are equivalent and both make sense.

dannyzaken requested a review from guymguym November 18, 2025 14:48

pull-request-size bot added the size/S label Nov 18, 2025

jackyalbo reviewed Nov 18, 2025

View reviewed changes

alphaprinz approved these changes Nov 18, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

use exact read size to acquire from io semaphore #9287

use exact read size to acquire from io semaphore #9287

dannyzaken commented Nov 18, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 18, 2025 •

edited

Loading

Uh oh!

jackyalbo Nov 18, 2025

Uh oh!

alphaprinz left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

use exact read size to acquire from io semaphore #9287

Are you sure you want to change the base?

use exact read size to acquire from io semaphore #9287

Conversation

dannyzaken commented Nov 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe the Problem

Explain the Changes

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

jackyalbo Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

alphaprinz left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dannyzaken commented Nov 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 18, 2025 •

edited

Loading